HUB5 Mandarin Telephone Speech Corpus
Item Name: | HUB5 Mandarin Telephone Speech Corpus |
Author(s): | Linguistic Data Consortium |
LDC Catalog No.: | LDC98S69 |
ISBN: | 1-58563-131-0 |
ISLRN: | 333-068-970-015-5 |
DOI: | https://doi.org/10.35111/69dn-5z94 |
Member Year(s): | 1998 |
DCMI Type(s): | Sound |
Sample Type: | 2-channel ulaw |
Sample Rate: | 8000 Hz |
Data Source(s): | telephone conversations |
Project(s): | GALE, Hub5-LVCSR, EARS |
Application(s): | speech recognition |
Language(s): | Mandarin Chinese |
Language ID(s): | cmn |
License(s): | LDC User Agreement for Non-Members |
Online Documentation: | LDC98S69 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Linguistic Data Consortium. HUB5 Mandarin Telephone Speech Corpus LDC98S69. Web Download. Philadelphia: Linguistic Data Consortium, 1998. |
Related Works: | LDC98S69 - Speech data; LDC98T26 - Transcripts |
Introduction
This release of HUB5 Mandarin training data consists of 42 calls derived from the CALLFRIEND Mandarin Chinese Mainland Dialect (Language ID) collection. The transcribed data are intended as additional training data for the Large Vocabulary Conversational Speech Recognition (LVCSR) project, also sponsored by the U.S. Department of Defense. Each transcript covers a contiguous 5- to 30-minute segment of a recorded conversation lasting up to 30 minutes.
LDC has since released HUB5 Mandarin Telephone Speech and Transcripts Second Edition (LDC2018S18), which combines the speech and transcripts and makes some updates to the release. See that catalog entry for more details.
Data
The LDC solicited speakers for this telephone speech collection effort via the internet, publications (advertisements), and personal contacts. A total of 200 call originators were recruited, each of whom placed a telephone call through a toll-free robot operator maintained by the LDC. Access to the robot operator required a unique Personal Identification Number (PIN), issued by the LDC recruiting staff when the caller enrolled in the project. Both the callers and the call recipients were informed that the call would be recorded, and a call proceeded only if both parties agreed to be recorded. Each caller could talk for up to 30 minutes and was allowed to place only one call. Upon successful completion of the call, the caller was paid $20 (in addition to making a free long-distance telephone call). Callers were given no guidelines about what to talk about, and each was free to choose whom to call. Most participants called family members or close friends. All calls originated in North America and were placed to various locations within North America.
Updates
There are no updates at this time.