CALLHOME Spanish Speech
| Item Name: | CALLHOME Spanish Speech |
| Author(s): | Alexandra Canavan, George Zipperlen |
| LDC Catalog No.: | LDC96S35 |
| ISBN: | 1-58563-083-7 |
| ISLRN: | 321-477-528-167-2 |
| DOI: | https://doi.org/10.35111/2skn-2002 |
| Member Year(s): | 1996, 1997 |
| DCMI Type(s): | Sound |
| Sample Type: | 2-channel ulaw |
| Sample Rate: | 8000 Hz |
| Data Source(s): | telephone conversations |
| Project(s): | Hub5-LVCSR |
| Application(s): | speech recognition |
| Language(s): | Spanish |
| Language ID(s): | spa |
| License(s): | LDC User Agreement for Non-Members |
| Online Documentation: | LDC96S35 Documents |
| Licensing Instructions: | Subscription & Standard Members, and Non-Members |
| Citation: | Canavan, Alexandra, and George Zipperlen. CALLHOME Spanish Speech LDC96S35. Web Download. Philadelphia: Linguistic Data Consortium, 1996. |
Introduction
CALLHOME Spanish Speech was developed by the Linguistic Data Consortium (LDC) and contains approximately 38 hours of speech from 120 unscripted telephone conversations between native Spanish speakers.
The CALLHOME series consists of telephone conversations, transcripts and lexicons developed by LDC and Rutgers, The State University of New Jersey, in support of research in speaker identification, language identification and related technologies. Languages in the series include American English, Egyptian Arabic, German, Japanese, Mandarin Chinese, and Spanish.
Data
The conversational telephone speech in this release represents training and development data and a subset of evaluation data. Calls originated in North America and were placed to locations overseas. Most participants called family members or close friends. Participants spoke on topics of their choice in a single telephone call lasting up to 30 minutes.
Audio files are 2-channel, 8 kHz u-law SPHERE files compressed with SHORTEN.
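As a rough illustration of the file format, the sketch below reads the plain-text header that begins a NIST SPHERE file and returns its fields. The function name parse_sphere_header and the example header bytes are hypothetical, built here for illustration and not copied from the corpus. The sketch assumes the usual SPHERE layout: a "NIST_1A" line, a header-size line, then "name -type value" fields up to "end_head". It does not decode the SHORTEN-compressed audio that follows the header; a separate decompression tool is needed for that.

```python
def parse_sphere_header(data: bytes) -> dict:
    """Parse the ASCII header of a NIST SPHERE file into a dict.

    Only the header is read; the (possibly SHORTEN-compressed) audio
    payload after the header is left untouched.
    """
    first_lines = data.split(b"\n", 2)
    if len(first_lines) < 3 or first_lines[0] != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")

    # The second line gives the total header size in bytes (commonly 1024).
    header_size = int(first_lines[1].strip())
    text = data[:header_size].decode("ascii", errors="replace")

    fields = {}
    for line in text.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip lines that do not match "name -type value"
        name, type_code, value = parts
        if type_code == "-i":
            fields[name] = int(value)
        elif type_code == "-r":
            fields[name] = float(value)
        elif type_code.startswith("-s"):
            fields[name] = value
    return fields


# Synthetic header for illustration only (not taken from the corpus),
# padded with spaces to the declared 1024-byte header size.
example_header = (
    b"NIST_1A\n   1024\n"
    b"sample_rate -i 8000\n"
    b"channel_count -i 2\n"
    b"sample_coding -s4 ulaw\n"
    b"end_head\n"
).ljust(1024)

if __name__ == "__main__":
    print(parse_sphere_header(example_header))
```

For a real file, passing the first few kilobytes read in binary mode to parse_sphere_header should be enough to check that the sample rate and channel count match the values listed in the table above.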
Corresponding transcripts (LDC96T17) and an associated lexicon (LDC96L16) are available separately.
Samples
Please listen to this audio sample.
Updates
06/12/2018: 16 SPHERE files in the train and devtest directories were found to be corrupted. Corrected versions of these files have been included with the corpus as of the date above.