Magic Data Chinese Mandarin Conversational Speech

Item Name: Magic Data Chinese Mandarin Conversational Speech
Author(s): Beijing Magic Data Technology Co.
LDC Catalog No.: LDC2019S23
ISBN: 1-58563-911-7
ISLRN: 636-430-467-703-3
Release Date: December 05, 2019
Member Year(s): 2019
DCMI Type(s): Sound, Text
Sample Type: pcm
Sample Rate: 16000
Data Source(s): microphone conversation
Application(s): speech recognition
Language(s): Mandarin Chinese
Language ID(s): cmn
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2019S23 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Beijing Magic Data Technology Co. Magic Data Chinese Mandarin Conversational Speech LDC2019S23. Web Download. Philadelphia: Linguistic Data Consortium, 2019.
Magic Data Chinese Mandarin Conversational Speech was developed by Beijing Magic Data Technology Co., Ltd. and consists of approximately 10 hours of Mandarin conversational speech from 60 speakers. Each conversation was recorded on multiple devices and is presented in multiple forms, resulting in a total of approximately 60 hours of audio with corresponding transcripts.


All participants were native Mandarin speakers from Mainland China, drawn from accent regions across the country. Speakers were paired for conversations on a range of topics, including travel, fitness, games, sports and pets.

Speech data was recorded on mobile devices and is presented as 16 kHz, 16-bit PCM WAV audio with FLAC compression. Most files are single channel; however, a stereo version of each conversation is also included.
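
As a rough illustration of the audio format described above, the following sketch checks that a file is 16 kHz, 16-bit PCM with one or two channels. It assumes the FLAC file has already been decoded to WAV with an external tool, because Python's standard wave module does not read FLAC directly. The function names describe_wav and matches_corpus_format are hypothetical and are not part of the release.

```python
import io
import wave

EXPECTED_RATE = 16000  # Hz, as stated in the corpus description
EXPECTED_WIDTH = 2     # bytes per sample, i.e. 16-bit


def describe_wav(source):
    """Return (channels, sample_rate, sample_width_bytes, duration_s).

    `source` is a path or a binary file object holding decoded WAV data.
    """
    with wave.open(source, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return w.getnchannels(), rate, w.getsampwidth(), frames / rate


def matches_corpus_format(source):
    """True if the WAV is 16 kHz, 16-bit, and mono or stereo."""
    channels, rate, width, _ = describe_wav(source)
    return rate == EXPECTED_RATE and width == EXPECTED_WIDTH and channels in (1, 2)
```

A call such as matches_corpus_format("decoded_conversation.wav") would return True for a conforming file; the file name here is only a placeholder.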

Transcript data is contained in UTF-8 encoded plain-text TextGrid files. Metadata such as topic, collection date, mobile device and speaker demographic information is found in the documentation accompanying this release.
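
A minimal sketch of how the time-aligned transcript text might be pulled out of a TextGrid follows. It assumes the files use Praat's long text format, in which each interval is written as an xmin, xmax and text triple; the release documentation should be checked for the actual layout and tier names. The function read_intervals is a hypothetical helper, not something shipped with the corpus.

```python
import re

# One interval in Praat's long TextGrid text format (assumed layout).
# Praat escapes a double quote inside the text by doubling it.
INTERVAL_RE = re.compile(
    r'intervals \[\d+\]:\s*'
    r'xmin = ([0-9.]+)\s*'
    r'xmax = ([0-9.]+)\s*'
    r'text = "((?:[^"]|"")*)"'
)


def read_intervals(textgrid_text):
    """Return (start_s, end_s, text) tuples for every non-empty interval."""
    intervals = []
    for xmin, xmax, text in INTERVAL_RE.findall(textgrid_text):
        text = text.replace('""', '"').strip()  # undo Praat's quote escaping
        if text:  # skip silent or unlabeled intervals
            intervals.append((float(xmin), float(xmax), text))
    return intervals
```

To use it on a file, read the contents with an explicit encoding, for example open(path, encoding="utf-8").read(), and pass the resulting string to read_intervals.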


Please view this stereo speech sample and transcript sample.


