HUB5 Spanish Transcripts

Item Name:	HUB5 Spanish Transcripts
Author(s):	Elisa Munoz, Jennifer Alabiso, Robert MacIntyre, David Graff
LDC Catalog No.:	LDC98T27
ISBN:	1-58563-134-5
ISLRN:	997-940-878-462-1
DOI:	https://doi.org/10.35111/z46b-j130
Member Year(s):	1998
DCMI Type(s):	Text
Data Source(s):	telephone conversations
Project(s):	Hub5-LVCSR
Application(s):	speech recognition
Language(s):	Spanish
Language ID(s):	spa
License(s):	LDC User Agreement for Non-Members
Online Documentation:	LDC98T27 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Munoz, Elisa, et al. HUB5 Spanish Transcripts LDC98T27. Web Download. Philadelphia: Linguistic Data Consortium, 1998.
Related Works:
isAnnotationOf:	LDC96S57 CALLFRIEND Spanish-Caribbean Dialect; LDC96S58 CALLFRIEND Spanish-Non-Caribbean Dialect; LDC98S70 HUB5 Spanish Telephone Speech Corpus
isContinuationOf:	LDC98T26 HUB5 Mandarin Transcripts; LDC2018S18 HUB5 Mandarin Telephone Speech and Transcripts Second Edition
isSimilarWith:	LDC96T17 CALLHOME Spanish Transcripts; LDC2002S13 2001 HUB5 English Evaluation; LDC2002S23 1997 HUB5 English Evaluation; LDC2002T39 1997 HUB5 Arabic Transcripts; LDC2002T43 2000 HUB5 English Evaluation Transcripts; LDC2003T01 2001 HUB5 Mandarin Transcripts; LDC2003T02 1998 HUB5 English Transcripts; LDC2003T03 1997 HUB5 German Transcripts; LDC2003T04 1997 HUB5 Spanish Transcripts

LDC98S70 - Speech data
LDC98T27 - Transcripts

Introduction

This release of HUB5 Spanish training data consists of 106 calls derived from the CALLFRIEND Spanish (Language ID) collection. Each transcript covers a contiguous 10-30 minute segment of a recorded conversation lasting up to 30 minutes. These calls were originally collected by the LDC in support of the project on Language Recognition, sponsored by the U.S. Department of Defense. All of these calls are being designated as additional training data for the project on Large Vocabulary Conversational Speech Recognition (LVCSR) in Spanish.

Data

Speakers were solicited by the LDC to participate in this telephone speech collection effort via the internet, publications (advertisements) and personal contacts. A total of 200 call originators were found, each of whom placed a telephone call via a toll-free robot operator maintained by the LDC. Access to the robot operator was possible via a unique Personal Identification Number (PIN) issued by the recruiting staff at the LDC when the caller enrolled in the project.

Once a caller was recruited to participate, he/she was given a free choice of whom to call. Recruits were given no guidelines concerning what they should talk about. Most participants called family members or close friends. All calls originated in North America and were placed to various locations within North America, Puerto Rico or the Dominican Republic. Both the callers and the call recipients were made aware that their telephone call would be recorded. The call was allowed only if both parties agreed to be recorded. Each caller was allowed to talk for up to 30 minutes. Upon successful completion of the call, the caller was paid $20 (in addition to making a free long-distance telephone call). Each caller was allowed to place only one telephone call.

HUB5 Spanish speech and transcript data may be obtained by contacting the LDC.

Updates

There are no updates at this time.
