
CALLHOME Spanish Speech

Item Name:	CALLHOME Spanish Speech
Author(s):	Alexandra Canavan, George Zipperlen
LDC Catalog No.:	LDC96S35
ISBN:	1-58563-083-7
ISLRN:	321-477-528-167-2
DOI:	https://doi.org/10.35111/2skn-2002
Member Year(s):	1996, 1997
DCMI Type(s):	Sound
Sample Type:	2-channel ulaw
Sample Rate:	8000 Hz
Data Source(s):	telephone conversations
Project(s):	Hub5-LVCSR
Application(s):	speech recognition
Language(s):	Spanish
Language ID(s):	spa
License(s):	LDC User Agreement for Non-Members
Online Documentation:	LDC96S35 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Canavan, Alexandra, and George Zipperlen. CALLHOME Spanish Speech LDC96S35. Web Download. Philadelphia: Linguistic Data Consortium, 1996.
Related Works:
  hasVersion
    LDC2026S04 CALLHOME Spanish Second Edition
  hasAnnotation
    LDC96T17 CALLHOME Spanish Transcripts
    LDC2001T60 Syllable-Final /s/ Lenition
  isSimilarWith
    LDC96S57 CALLFRIEND Spanish-Caribbean Dialect
    LDC96S58 CALLFRIEND Spanish-Non-Caribbean Dialect
    LDC2018S12 Multi-Language Conversational Telephone Speech 2011 -- Spanish
    LDC2026S07 Multi-Language Conversational Telephone Speech 2014 - Spanish & Portuguese
  relatesTo
    LDC2008S08 LDC Spoken Language Sampler

Introduction

CALLHOME Spanish Speech was developed by the Linguistic Data Consortium (LDC) and contains approximately 38 hours of speech from 120 unscripted telephone conversations between native Spanish speakers.

The CALLHOME series consists of telephone conversations, transcripts and lexicons developed by LDC and Rutgers, The State University of New Jersey, in support of research in large vocabulary conversational speech recognition (LVCSR) and related technologies. Languages in the series include American English, Egyptian Arabic, German, Japanese, Mandarin Chinese, and Spanish.

Data

The conversational telephone speech in this release represents training and development data and a subset of evaluation data. Calls originated in North America and were placed to locations overseas. Most participants called family members or close friends. Participants spoke on topics of their choice in a single telephone call lasting up to 30 minutes.

Audio files are presented as 8 kHz, 2-channel u-law SPHERE files compressed with shorten.
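
As a rough illustration of what this format implies for anyone reading the files, the following Python sketch parses a SPHERE-style text header and decodes interleaved u-law bytes into linear PCM samples. It is a minimal sketch and not an LDC tool. The function names (parse_sphere_header, ulaw_to_linear, decode_ulaw_channels) are hypothetical. The header layout (a "NIST_1A" line, a header-size line, then "name -type value" fields ending with "end_head") follows the usual NIST SPHERE convention and is assumed here, not taken from this corpus's documentation. The decoder also assumes the sample data has already been uncompressed with an external tool such as sph2pipe, since shorten decoding is not implemented.

```python
import io


def parse_sphere_header(stream):
    """Parse a SPHERE-style text header from a binary stream.

    Returns (header_size, fields). Assumes the conventional layout:
    'NIST_1A', then the header size in bytes, then one field per line
    as 'name -type value', terminated by 'end_head'.
    """
    magic = stream.readline()
    if magic.strip() != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    size_line = stream.readline()
    header_size = int(size_line.strip())
    # Read the remainder of the fixed-size header block.
    rest = stream.read(header_size - len(magic) - len(size_line))

    fields = {}
    for line in rest.decode("ascii", errors="replace").splitlines():
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3 or line.startswith(";"):
            continue  # skip blank lines, comments, and padding
        name, ftype, value = parts
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        else:  # '-sN' marks a string field of N bytes
            fields[name] = value
    return header_size, fields


def ulaw_to_linear(byte):
    """Expand one 8-bit G.711 u-law code to a signed 16-bit-range sample."""
    u = ~byte & 0xFF  # u-law codes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_ulaw_channels(payload, channel_count):
    """Split interleaved u-law bytes into per-channel lists of PCM samples."""
    return [
        [ulaw_to_linear(b) for b in payload[ch::channel_count]]
        for ch in range(channel_count)
    ]
```

To use it, open a file in binary mode, call parse_sphere_header on the file object, and inspect fields such as sample_rate, channel_count and sample_coding before touching the sample data. If sample_coding indicates embedded shorten compression, uncompress the file first. Once uncompressed, each sample occupies one byte per channel, so a 30-minute call amounts to 8000 × 2 × 1800 = 28,800,000 bytes of sample data.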

Corresponding transcripts (LDC96T17) and an associated lexicon (LDC96L16) are available separately.

Samples

Please listen to this audio sample.

Updates

06/12/2018: 16 SPHERE files in the train and devtest directories were found to be corrupted. Corrected versions of these files have been included with the corpus since the date above.
