Mandarin Chinese phonetic segmentation and tone corpus

The corpus contains 7,849 Mandarin Chinese "utterances" and their phonetic segmentation and tone labels. The utterances are time-stamped between-pause units in the LDC 1997 Mandarin Broadcast News Speech and Transcripts (LDC98S73 and LDC98T24). Utterances with background noise and music, and those from speakers whose names were not tagged or from speakers with accents, were excluded.

300 utterances were randomly selected from six speakers (50 utterances per speaker) to form the test set. The remaining 7,549 utterances form the training set.

The utterances in the test set were manually labeled and segmented into initials and finals in Pinyin. Tones were marked on the finals: Tone1 through Tone4, plus Tone0 for the neutral tone. Sandhi Tone3 was labeled as Tone2.

The phonetic segmentation and transcription of the training set were obtained automatically using a Hidden Markov Model (HMM) based forced aligner trained on the same utterances (Yuan et al. 2014). Evaluated on the test set, the aligner placed 93.1% of phone boundaries within 20 ms of the manual segmentation.

The quality of the phonetic transcription and tone labels of the training set was evaluated by checking 100 utterances randomly selected from it. These 100 utterances contained 1,252 syllables: 15 syllables (about 1.2%) had a mistaken tone transcription, 2 (about 0.16%) had a mistaken transcription of the final, and none had a transcription error on the initial.

The phonetic labels are listed below. "i", "ii", and "iii" are variants of /i/: "ii" only appears after an alveolar fricative/affricate; "iii" only appears after a retroflex fricative/affricate; "i" appears in all other contexts. "v" represents "ü" in Pinyin.
Initials: b, p, m, f, d, t, n, l, g, k, h, j, q, x, zh, ch, sh, r, z, c, s
Finals: a, ai, an, ang, ao, e, ei, en, eng, er, i, ii, iii, ia, ian, iang, iao, ie, in, ing, iong, iu, ong, ou, u, ua, uai, uan, uang, ui, un, ung, uo, v, van, ve, vn
Tones: 1, 2, 3, 4, 0
Silence: sil

The training and test utterances are placed in the train and test directories, respectively. Each utterance has three files: .wav is the audio; .txt is the word transcript in Chinese; and .phon contains the phonetic boundaries and labels. The first three letters of the filename, e.g., CHJ in CHJ000001, represent the speaker. All the speakers are listed below:

CHJ: Male
CHX: Female
DIL: Male
DOH: Male
FAJ: Female
HAT: Male
KOF: Female
LIS: Male
MAK: Male
OUT: Male
RUO: Female
SHH: Female
SUC: Male
TIK: Male
WAJ: Male
XIH: Male
XIJ: Male
XIN: Female
XIY: Male
XUL: Female

Reference:

Yuan, J., Ryant, N., and Liberman, M. (2014). "Automatic phonetic segmentation in Mandarin Chinese: boundary models, glottal features and tone," ICASSP 2014, May 4-9, Florence, Italy.
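
Example: grouping files by utterance and speaker

The file layout described above (three files per utterance, speaker code in the first three letters of the filename) can be traversed with a short script. The following is a minimal Python sketch, not part of the corpus distribution. It assumes that the files sit directly inside the train and test directories rather than in subdirectories; the function names collect_utterances, speaker_of, and count_by_speaker are hypothetical helpers introduced for illustration. It does not read the contents of the .phon files, whose internal format is not described here.

```python
from collections import defaultdict
from pathlib import Path

# Speaker codes and genders, copied from the speaker list in this document.
SPEAKER_GENDER = {
    "CHJ": "Male", "CHX": "Female", "DIL": "Male", "DOH": "Male",
    "FAJ": "Female", "HAT": "Male", "KOF": "Female", "LIS": "Male",
    "MAK": "Male", "OUT": "Male", "RUO": "Female", "SHH": "Female",
    "SUC": "Male", "TIK": "Male", "WAJ": "Male", "XIH": "Male",
    "XIJ": "Male", "XIN": "Female", "XIY": "Male", "XUL": "Female",
}

# The three files that accompany each utterance.
EXTENSIONS = (".wav", ".txt", ".phon")


def speaker_of(utterance_id):
    """Return the speaker code: the first three letters of the filename."""
    return utterance_id[:3]


def collect_utterances(directory):
    """Group the files in `directory` by utterance ID.

    Returns {utterance_id: {extension: Path}}, keeping only utterances
    for which all three files are present.
    Assumption: files are stored flat in `directory`.
    """
    found = defaultdict(dict)
    for path in Path(directory).iterdir():
        if path.suffix in EXTENSIONS:
            found[path.stem][path.suffix] = path
    return {
        utt: files
        for utt, files in sorted(found.items())
        if len(files) == len(EXTENSIONS)
    }


def count_by_speaker(utterances):
    """Count utterances per speaker code."""
    counts = defaultdict(int)
    for utt in utterances:
        counts[speaker_of(utt)] += 1
    return dict(counts)
```

As a sanity check, calling count_by_speaker(collect_utterances("test")) from the corpus root should, if the layout matches the assumption above, report six speakers with 50 utterances each, and the train directory should yield 7,549 utterances in total. SPEAKER_GENDER can then be used to look up the gender for each speaker code.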