This is the CD-ROM release of the CallFriend Tamil Speech Corpus, produced
by the Linguistic Data Consortium. This release contains ONLY the speech
data files, along with documentation describing speaker information (sex,
age, education, callee telephone number) and call information (channel
quality, number of speakers). The files are not compressed.

Summary of CD-ROM contents:
---------------------------

0readme.1st    this file
cf_tam         path to the speech data files, divided into train,
               devtest and evltest partitions
doc            directory of documentation for CallFriend Tamil

Note that the partitioning of the speech data into "training",
"development test" and "evaluation test" sets reflects the original use
of the speech files by participants in the U.S. Government-sponsored
project on Language Identification (LID). As of this release, there are
20 conversations in the training set, 20 in the development test set,
and 20 in the evaluation test set. An additional (new) set of 20
evaluation test calls will be released as the benchmark tests for this
project are carried out.