published on CD-ROM by the
Linguistic Data Consortium (LDC)
University of Pennsylvania
The OGI Spelled and Spoken Telephone Corpus consists of speech recordings from over 3650 telephone calls, each made by a different speaker to an automated prompting/recording system installed at OGI. Speakers were asked to say their name, where they were calling from, and where they grew up; they were asked to answer a couple of yes/no questions, and to spell their first and last names; many were also asked to repeat a few specific words, and to recite the letters of the alphabet.
Each response to a prompt is stored as a separate waveform file, and the files are organized according to prompt (response type) in the "speech" directory. (Each call is assigned a unique index number, so all responses from a given call are identifiable by that number in their respective file names.) Time-aligned transcriptions were generated and checked by hand for a subset of utterances, and these are organized in a parallel fashion under the "handlabl" directory. Further details about file and directory organization are found in "doc/overview.doc". Descriptions of the phonetic labeling used in the transcriptions are found in "doc/phn_labl.doc" and "doc/timitcde.doc".
The speech data are stored in compressed format; each waveform file begins with an uncompressed 1024-byte header in NIST SPHERE format, followed by the compressed sample data. The data may be uncompressed using the SPHERE 2.0 software package, which is available free of charge from NIST or the LDC. See the file "doc/formats.doc" for details on obtaining SPHERE software. The file "doc/header.doc" gives a detailed description of the NIST SPHERE header specification.
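As an illustration of the header layout described above, the following is a minimal sketch in Python of reading the plain-text SPHERE header fields. It assumes the usual SPHERE layout: a "NIST_1A" line, a line giving the header size, then "name -type value" lines terminated by "end_head". The function names and the sample field values below are hypothetical and are not taken from this corpus; "doc/header.doc" remains the authoritative description. The sketch does not uncompress the sample data, which still requires the SPHERE software.

```python
def parse_sphere_header(raw: bytes):
    """Parse the text header at the start of a SPHERE file.

    Returns (header_size, fields), where fields maps names to values.
    """
    lines = raw.decode("ascii", errors="replace").split("\n")
    if lines[0].strip() != "NIST_1A":
        raise ValueError("not a NIST SPHERE header")
    header_size = int(lines[1].strip())  # 1024 for the files described here

    fields = {}
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):
            continue  # skip blank lines and comment lines
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip anything that is not "name -type value"
        name, ftype, value = parts
        if ftype == "-i":
            fields[name] = int(value)      # integer field
        elif ftype == "-r":
            fields[name] = float(value)    # real-valued field
        else:
            fields[name] = value           # "-sN": string of N characters
    return header_size, fields


def make_sample_header() -> bytes:
    """Build a synthetic 1024-byte header (hypothetical values) for testing."""
    body = (
        "NIST_1A\n"
        "   1024\n"
        "sample_rate -i 8000\n"
        "channel_count -i 1\n"
        "sample_coding -s4 ulaw\n"
        "end_head\n"
    )
    return body.encode("ascii").ljust(1024, b" ")
```

For a real file, the first 1024 bytes would be read in binary mode (for example, open(path, "rb").read(1024), with the path supplied by the user) and passed to parse_sphere_header.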
A paper describing the collection and summary statistics of the corpus may be found in "doc/icslp92.ps" (printable PostScript format) and also in "doc/icslp92.tex" (LaTeX format).
The "db" directory contains the complete database files "phonedb.lo" and "phonedb.hi", which describe each call in the corpus with regard to speaker gender, orthographic transcription of each response, quality assessment, and so on; the "db" directory also provides some source code in C and Perl for organizing the "phonedb" files into a relational database.
For users who are interested in using the speech files for training and testing of speech recognition or other systems, OGI has also provided a partitioning of the corpus into training, development test, and evaluation test sets (using roughly a 60-20-20 split for training, development, and evaluation, respectively). These sets are specified simply as lists of call numbers, contained in the "partitn" directory.
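Since the partitions are given as lists of call numbers, a user would typically turn them into a lookup from call number to set name. The following Python sketch shows one way to do this. It assumes each list is plain text with whitespace-separated integer call numbers; the actual layout of the files in the "partitn" directory is not described here, and the function names and set labels ("train", "dev", "eval") are hypothetical.

```python
def parse_call_list(text: str) -> set:
    """Parse whitespace-separated call numbers (assumed format) into a set."""
    return {int(token) for token in text.split()}


def build_partition_lookup(train: set, dev: set, evaluation: set) -> dict:
    """Map each call number to its set name, rejecting overlapping sets."""
    lookup = {}
    for name, calls in (("train", train), ("dev", dev), ("eval", evaluation)):
        for call in calls:
            if call in lookup:
                raise ValueError(
                    f"call {call} appears in both {lookup[call]} and {name}"
                )
            lookup[call] = name
    return lookup


def partition_sizes(lookup: dict) -> dict:
    """Count how many calls fall into each set."""
    sizes = {"train": 0, "dev": 0, "eval": 0}
    for name in lookup.values():
        sizes[name] += 1
    return sizes
```

Checking that the three sets are disjoint before training guards against a call leaking from a test set into the training set, and the sizes give a quick check on the approximate 60-20-20 proportions.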
(In preparing this corpus for publication on CD-ROM, the LDC has modified some of the file naming conventions originally employed by OGI. All documentation files on this CD-ROM have been modified as necessary to reflect these changes.)