The eNTERFACE06_ARCTIC Database

The eNTERFACE06_ARCTIC database is an audio-visual database that can be used for designing an avatar control system and performing voice conversion. It was designed during the eNTERFACE'06 workshop (in the framework of the eNTERFACE 1-month summer workshops), for a collaborative project on multimodal character control [1].

CONTENTS

The final version of the database contains one male speaker, speaking 199 sentences uniformly sampled from the CMU_ARCTIC database [2] (which is itself composed of sentences taken from Project Gutenberg). The list of filenames and sentences is included in the selected.txt file.
For each sentence, a .txt, a .avi, and a .wav file are available. The .avi file contains 320x240-pixel video, at 30 frames per second, of the speaker pronouncing the sentence, with an audio track sampled at Fs = 44100 Hz. The .wav file contains the same sound recording as the .avi file, but resampled to 16 kHz.
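
As a quick sanity check after unpacking, the per-sentence file triplets described above can be verified with a short script. This is only a minimal sketch: it assumes that all files sit in a single flat directory, that files of one sentence share a stem such as arctic_a0430, and that selected.txt lives in the same directory. The function names are illustrative and are not part of the database.

```python
import wave
from pathlib import Path

EXPECTED_EXTENSIONS = (".txt", ".avi", ".wav")
EXPECTED_WAV_RATE = 16000  # Hz, as stated in the description of the .wav files


def group_by_sentence(root, ignore=("selected",)):
    """Map each file stem (e.g. 'arctic_a0430') to the set of extensions found.

    Assumption: a flat directory; 'selected' is skipped because selected.txt
    is the sentence list, not a per-sentence transcript.
    """
    groups = {}
    for path in Path(root).iterdir():
        suffix = path.suffix.lower()
        if not path.is_file() or suffix not in EXPECTED_EXTENSIONS:
            continue
        if path.stem in ignore:
            continue
        groups.setdefault(path.stem, set()).add(suffix)
    return groups


def incomplete_sentences(groups):
    """Return {stem: sorted list of missing extensions} for incomplete triplets."""
    missing = {}
    for stem, found in groups.items():
        absent = sorted(set(EXPECTED_EXTENSIONS) - found)
        if absent:
            missing[stem] = absent
    return missing


def wav_sample_rate(path):
    """Read the sampling rate (Hz) from a PCM .wav header."""
    with wave.open(str(path), "rb") as handle:
        return handle.getframerate()
```

Calling incomplete_sentences(group_by_sentence("path/to/database")) lists any sentence that lacks one of the three files, and wav_sample_rate(...) can be compared against EXPECTED_WAV_RATE; the directory path here is a placeholder.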

TECHNICAL ASPECTS

The database was recorded using a standard mini-DV digital video camera. The resolution of the camera was 800,000 pixels. The speech signal was recorded with a high-quality microphone designed specifically for speech recordings. The microphone was positioned roughly 30 cm below the subject's mouth, outside the camera's field of view.
The background consisted of a monochromatic dark green panel that covered the entire area behind the subject, to allow easier face detection and tracking. Natural lighting was used, so some slight illumination variation can be encountered among the files.
The recordings were made using the paNannyRecord tool provided by UPC Barcelona, which lets the speaker hear each sentence twice before recording it. The source speaker used for the recordings was the awb speaker of CMU_ARCTIC. The eNTERFACE06_ARCTIC speaker was asked to keep the prosody (timing, pitch movements) of the source, while using his own acoustic realization of phonemes and, of course, his own voice (i.e., not trying to imitate the target voice). This particular setting has made it possible for the eNTERFACE06_ARCTIC recordings to be largely aligned with the corresponding CMU_ARCTIC recordings.
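
A coarse way to inspect how closely a recording follows its CMU_ARCTIC counterpart is to compare the durations of the two .wav files. The sketch below does only that: a small duration gap is consistent with similar timing but does not prove frame-level alignment, and differences in leading or trailing silence will also show up in the gap. The function names and any file paths passed to them are hypothetical; only the Python standard library is used.

```python
import wave


def wav_duration_seconds(path):
    """Duration of a PCM .wav file, computed from its header."""
    with wave.open(str(path), "rb") as handle:
        return handle.getnframes() / handle.getframerate()


def duration_gap(path_a, path_b):
    """Absolute duration difference in seconds between two recordings."""
    return abs(wav_duration_seconds(path_a) - wav_duration_seconds(path_b))
```

For example, duration_gap could be called with the eNTERFACE06_ARCTIC file and the awb file of the same sentence, wherever the two are stored locally. Because durations are computed in seconds, the two files do not need to share a sampling rate.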

ERRATA

Some minor problems have been detected. You'll have to live with them...
Sentence arctic_a0430.wav says "Nevertheless we found ourselves one more in the high seat of abundance." instead of "Nevertheless we found ourselves once more in the high seat of abundance."
arctic_a0085.* does not exist (this file was badly recorded).

LICENSE

This database is available under MIT-like license conditions (the terms of this very open license are provided with the database).

DOWNLOAD

Please note that the database is very big (1.4 GB, rar archive). Click HERE to proceed.

[1] T. Dutoit et al., "Multimodal Speaker Conversion - his master's voice... and face", in Proc. eNTERFACE 2006, July 17th-August 11th, Dubrovnik, Croatia. Available on the eNTERFACE '06 website.
[2] http://festvox.org/cmu_arctic/