OGI Multilanguage Corpus
|Item Name:||OGI Multilanguage Corpus|
|Author(s):||Ronald Allan Cole, Yeshwant Muthusamy|
|LDC Catalog No.:||LDC94S17|
|Sample Type:||1-channel pcm compressed|
|Data Source(s):||telephone speech|
|Language(s):||Vietnamese, Tamil, Korean, Japanese, Hindi, French, English, German, Spanish, Mandarin Chinese, Persian, Dari, Iranian Persian|
|Language ID(s):||vie, tam, kor, jpn, hin, fra, eng, deu, spa, cmn, fas, prs, pes|
|License(s):||LDC User Agreement for Non-Members|
|Online Documentation:||LDC94S17 Documents|
|Licensing Instructions:||Subscription & Standard Members, and Non-Members|
|Citation:||Cole, Ronald Allan, and Yeshwant Muthusamy. OGI Multilanguage Corpus LDC94S17. Web Download. Philadelphia: Linguistic Data Consortium, 1994.|
Speech was collected using an automated system that answered the telephone, played digitized prompts in the appropriate language to request the speech samples, and digitized the callers' responses for a designated period of time.
Log files are included that provide a set of automatic measurements made on each utterance. In addition, some utterances were automatically segmented into broad phonetic categories. The speech data are compressed, with NIST SPHERE headers.
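Because the files carry NIST SPHERE headers, their metadata can be inspected before any decompression. The following is a minimal sketch, assuming the usual SPHERE layout: a `NIST_1A` magic line, a line giving the header size in bytes, then `name type value` fields terminated by `end_head`. The function name `read_sphere_header` and the demo header values are hypothetical and are not taken from the corpus; the sketch reads only the header and does not decompress the audio samples.

```python
import io


def read_sphere_header(stream):
    """Parse a NIST SPHERE header from a seekable binary stream into a dict."""
    start = stream.tell()
    magic = stream.readline().strip()
    if magic != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    # The second line holds the total header size in bytes (commonly 1024).
    header_size = int(stream.readline().strip())
    stream.seek(start)
    raw = stream.read(header_size).decode("ascii", errors="replace")

    fields = {}
    for line in raw.splitlines()[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):  # skip blanks and comments
            continue
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        name, ftype, value = parts
        if ftype == "-i":        # integer field
            fields[name] = int(value)
        elif ftype == "-r":      # real-valued field
            fields[name] = float(value)
        else:                    # string field, e.g. -s3
            fields[name] = value
    return fields


# Synthetic header for illustration only; the values are made up.
demo = (
    b"NIST_1A\n   1024\n"
    b"sample_rate -i 8000\n"
    b"channel_count -i 1\n"
    b"sample_coding -s3 pcm\n"
    b"end_head\n"
).ljust(1024, b" ")

fields = read_sphere_header(io.BytesIO(demo))
print(fields)
```

For a file on disk, the same function could be called on a handle opened with `open(path, "rb")`. The stream is left positioned at the end of the header, where the (still compressed) sample data begins.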