
CALLFRIEND Mandarin Chinese-Mainland Dialect Second Edition

Item Name:	CALLFRIEND Mandarin Chinese-Mainland Dialect Second Edition
Author(s):	Alexandra Canavan, George Zipperlen, John Bartlett
LDC Catalog No.:	LDC2018S09
ISBN:	1-58563-851-X
ISLRN:	466-791-939-707-1
DOI:	https://doi.org/10.35111/rmba-9w42
Release Date:	July 16, 2018
Member Year(s):	2018
DCMI Type(s):	Sound
Sample Type:	ulaw
Sample Rate:	8000
Data Source(s):	telephone conversations
Project(s):	EARS, GALE, LID
Application(s):	language identification
Language(s):	Mandarin Chinese
Language ID(s):	cmn
License(s):	LDC User Agreement for Non-Members
Online Documentation:	LDC2018S09 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Canavan, Alexandra, George Zipperlen, and John Bartlett. CALLFRIEND Mandarin Chinese-Mainland Dialect Second Edition LDC2018S09. Web Download. Philadelphia: Linguistic Data Consortium, 2018.
Related Works:
isVersionOf:
	LDC96S55 CALLFRIEND Mandarin Chinese-Mainland Dialect
isPartOf:
	LDC2025S04 BOLT CTS CALLFRIEND CALLHOME Mainland Mandarin Chinese Audio
isPartWith:
	LDC96S34 CALLHOME Mandarin Chinese Speech
	LDC96S55 CALLFRIEND Mandarin Chinese-Mainland Dialect
	LDC98S69 HUB5 Mandarin Telephone Speech Corpus
	LDC2002S12 2001 HUB5 Mandarin Evaluation
	LDC2018S18 HUB5 Mandarin Telephone Speech and Transcripts Second Edition
hasAnnotation:
	LDC98T26 HUB5 Mandarin Transcripts
hasOutcome:
	LDC98S69 HUB5 Mandarin Telephone Speech Corpus
	LDC2018S18 HUB5 Mandarin Telephone Speech and Transcripts Second Edition
isSimilarWith:
	LDC2019S04 CALLFRIEND Egyptian Arabic Second Edition
	LDC2019S18 CALLFRIEND Canadian French Second Edition
	LDC2019S21 CALLFRIEND American English-Non-Southern Dialect Second Edition
	LDC2020S06 CALLFRIEND Mandarin Chinese-Taiwan Dialect Second Edition
	LDC2020S08 CALLFRIEND American English-Southern Dialect Second Edition
	LDC2023S08 CALLFRIEND Russian Speech

Introduction

CALLFRIEND Mandarin Chinese-Mainland Dialect Second Edition was developed by the Linguistic Data Consortium (LDC) and consists of approximately 24 hours of unscripted telephone conversations between native speakers of the Mandarin Chinese dialect spoken in mainland China. This second edition converts the audio files to WAV format, simplifies the directory structure, and adds documentation and metadata. The first edition is available as CALLFRIEND Mandarin Chinese-Mainland Dialect (LDC96S55).

The CALLFRIEND series is a collection of telephone conversations in several languages conducted by LDC in support of language identification technology development. Languages covered in the collection include American English, Canadian French, Egyptian Arabic, Farsi, German, Hindi, Japanese, Korean, Mandarin Chinese, Spanish, Tamil and Vietnamese.

Data

All data was collected before July 1997. Participants could speak with a person of their choice on any topic; most called family members and friends. All calls originated in North America. The recorded conversations last up to 30 minutes.

The data was recorded as two-channel 8 kHz u-law files in SPH (NIST SPHERE) format, with one side of each call on its own channel. In this release, the files were converted to WAV format, and information from the original SPH headers is described in the documentation. SPH files are not included in this second edition.
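
As an illustration of the format described above, the following minimal sketch decodes interleaved two-channel u-law bytes into 16-bit linear samples, one list per channel. It assumes the raw sample payload has already been read from a file's data chunk; the function names `ulaw_to_linear` and `split_channels` are hypothetical and are not part of the corpus documentation. Which speaker sits on which channel is not stated here, so the channels are simply labeled 0 and 1.

```python
def ulaw_to_linear(code: int) -> int:
    """Decode one 8-bit G.711 u-law code to a signed 16-bit-range sample."""
    code = ~code & 0xFF              # u-law bytes are stored bit-inverted
    sign = code & 0x80               # top bit marks a negative sample
    exponent = (code >> 4) & 0x07    # 3-bit segment number
    mantissa = code & 0x0F           # 4-bit step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def split_channels(payload: bytes) -> tuple[list[int], list[int]]:
    """Split interleaved 2-channel u-law bytes into two decoded sample lists."""
    channel_0 = [ulaw_to_linear(b) for b in payload[0::2]]
    channel_1 = [ulaw_to_linear(b) for b in payload[1::2]]
    return channel_0, channel_1


# Two frames of made-up data: channel 0 is silent, channel 1 swings to both extremes.
side_a, side_b = split_channels(bytes([0xFF, 0x00, 0xFF, 0x80]))
print(side_a, side_b)
```

At 8000 samples per second, one byte per sample, and two channels, such a payload grows by 16,000 bytes per second of conversation.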

The audio files were originally split into train, dev and test folders of 20 recordings each, but they are combined in this release.

Completed calls passed through two human audits. The first audit verified that the participants spoke the target language and checked the quality of the recordings. The second audit was conducted by a native speaker familiar with the Mainland and Taiwanese Mandarin dialects, who classified each conversation as one of the two.
