This is the CD-ROM release of the CallFriend Tamil Speech Corpus,
produced by the Linguistic Data Consortium.  This release contains
speech data files ONLY, along with documentation describing speaker
information (sex, age, education, callee telephone number) and call
information (channel quality, number of speakers). These files are not
compressed.


Summary of CD-ROM contents:
---------------------------

0readme.1st		this file
cf_tam			path to the speech data files, divided into
			train, devtest and evltest partitions
doc			directory of documentation for CallFriend Tamil


Note that the partitioning of the speech data into "training",
"development test" and "evaluation test" sets reflects the original
usage of the speech files by participants in the U.S.
Government-sponsored project on Language Identification (LID).  As of
this release, there are 20 conversations in the training set, 20 in
the development test set, and 20 in the evaluation test set.  An
additional (new) set of 20 evaluation test calls will be released as
the benchmark tests are carried out for this project.
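

To check a local copy against the partition sizes given above, a short
script along the following lines may help.  This is only a sketch: it
assumes that cf_tam contains subdirectories named exactly "train",
"devtest" and "evltest", and it counts every file found beneath each
one.  Neither the subdirectory names nor the number of files per
conversation is confirmed by this readme, and the function name
count_files_by_partition is hypothetical.

```python
from pathlib import Path

# Assumed partition directory names under cf_tam; adjust them if the
# actual layout on the CD-ROM differs.
PARTITIONS = ("train", "devtest", "evltest")


def count_files_by_partition(corpus_root):
    """Return a dict mapping each partition name to its file count.

    A partition directory that does not exist is reported as 0.
    """
    root = Path(corpus_root)
    counts = {}
    for name in PARTITIONS:
        part_dir = root / name
        if part_dir.is_dir():
            # Count regular files at any depth below the partition.
            counts[name] = sum(1 for p in part_dir.rglob("*") if p.is_file())
        else:
            counts[name] = 0
    return counts
```

Calling count_files_by_partition with the location of cf_tam on the
mounted CD-ROM returns one count per partition, which can then be
compared with the 20 conversations listed for each set.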