This is the release of the CallFriend Mandarin Speech Corpus Taiwan Dialect, produced by the Linguistic Data Consortium.

This release contains speech data files ONLY, along with documentation describing speaker information (sex, age, education, callee telephone number) and call information (channel quality, number of speakers). These files are not compressed.

Summary of contents:
---------------------------

index.html              HTML page that links to everything in the docs folder.

docs/
  README.txt            This file.
  cf_man_t.txt          Description of the CallFriend telephone speech corpus
                        for Mandarin Taiwan Dialect.
  callinfo.txt          Explanation of the audit information provided in
                        "callinfo.tbl".
  callinfo.tbl          A list of audit information, as explained in
                        "callinfo.txt", giving the number and sex of the
                        speakers and several sound quality judgements. The
                        first three calls in the corpus have no call
                        information and are not listed in this table.
  headerinfo.txt        Explanation of the SPH header information provided in
                        "headerinfo.tbl".
  headerinfo.tbl        A table of the data that was originally stored in the
                        SPH header of each audio file before the files were
                        converted.
  spkrinfo.txt          Explanation of the speaker demographic information
                        provided in "spkrinfo.tbl".
  spkrinfo.tbl          A table of information about the speakers in each
                        phone call, such as age and hometown. No speaker
                        information is available for the first three calls in
                        the table, so their entries are empty.
  file_partitions.txt   Assigns each audio file in the corpus to its original
                        partition (train, devtest, evltest).

data/                   The speech data files. These files were originally
                        divided into train, devtest and evltest partitions,
                        which are now listed in file_partitions.txt.

                        Note that the division of the speech data into
                        "training", "development test" and "evaluation test"
                        sets reflects the original use of the speech files by
                        participants in the U.S. Government-sponsored project
                        on Language Identification (LID).
                        In this release, all 60 files are combined in a
                        single data folder.

METADATA:
---------------
Total Duration: 27:02:56
Duration by language:
  - Mandarin: 27:02:56
Calls per caller: 1