Item Name: OGI Spelled and Spoken Word
Author(s): Ronald Cole, Yeshwant Muthusamy
LDC Catalog No.: LDC94S18
ISBN: 1-58563-036-5
ISLRN: 718-988-956-252-7
Member Year(s): 1994
DCMI Type(s): Sound
Sample Type: 1-channel pcm compressed
Sample Rate: 8000
Data Source(s): telephone speech
Application(s): speech recognition
Language(s): English
Language ID(s): eng
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC94S18 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Cole, Ronald, and Yeshwant Muthusamy. OGI Spelled and Spoken Word LDC94S18. Web Download. Philadelphia: Linguistic Data Consortium, 1994.
The OGI Spelled and Spoken Telephone Corpus consists of speech recordings from over 3,650 telephone calls, each made by a different speaker to an automated prompting/recording system installed at the Oregon Graduate Institute. Speakers were asked to say their name, where they were calling from and where they grew up; they were asked to answer a couple of yes/no questions and to spell their first and last names; many were also asked to repeat a few specific words and to recite the letters of the alphabet.

Each response to a prompt is stored as a separate waveform file, and the files are organized by prompt (response type). Every response from a given call carries that call's unique caller-index number as part of its file name, so responses can easily be sorted by speaker. Waveform data are stored in compressed form using the NIST SPHERE 2.0 software package, which is available separately at no charge. SPHERE 2.0 provides the decompression software needed to extract the waveform data, as well as tools for accessing and modifying file headers.
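For quick inspection without the SPHERE 2.0 tools, the ASCII header at the start of each file can be read directly. The sketch below assumes the standard NIST SPHERE layout: a `NIST_1A` magic line, a line giving the header size in bytes, one `name -type value` field per line (`-i` integer, `-r` real, `-sN` string), and a terminating `end_head`. The function name `parse_sphere_header` is hypothetical. It reads only the header. It does not decompress the waveform data, which still requires SPHERE 2.0.

```python
def parse_sphere_header(data: bytes) -> dict:
    """Parse the ASCII header of a NIST SPHERE file (minimal sketch).

    Assumes the standard layout: "NIST_1A", then the header size in bytes,
    then "name -type value" lines up to "end_head". Sample data after the
    header (possibly compressed) is ignored.
    """
    # First two lines: magic identifier and header length in bytes.
    magic, size_line, _ = data.split(b"\n", 2)
    if magic.strip() != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    header_size = int(size_line.decode("ascii").strip())

    # Decode only the fixed-size ASCII header; binary samples follow it.
    header_text = data[:header_size].decode("ascii", errors="replace")

    fields = {}
    for line in header_text.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        name, ftype, value = parts
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        else:
            fields[name] = value  # "-sN": string field of N characters
    return fields


if __name__ == "__main__":
    # Made-up header for illustration only, not taken from this corpus.
    header = (b"NIST_1A\n   1024\nsample_rate -i 8000\n"
              b"channel_count -i 1\nsample_coding -s3 pcm\nend_head\n")
    data = header.ljust(1024, b" ") + b"\x00\x01"
    print(parse_sphere_header(data))
    # {'sample_rate': 8000, 'channel_count': 1, 'sample_coding': 'pcm'}
```

The actual field names and values in this corpus's files (for example, how compression is recorded) should be checked against the headers themselves or the SPHERE 2.0 documentation.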

Time-aligned phonetic transcriptions are provided for a subset of responses. A complete log of each call (giving speaker sex, quality judgments, and orthographic transcriptions of all responses) is included in a form suitable for use as a relational database.

