This is the CD-ROM release of the Voicemail Corpus - Part I, collected from volunteers at various IBM sites in the United States and published by the Linguistic Data Consortium.

The corpus consists of 1801 messages in the training data set and 42 messages in the development test set. The average voicemail message is 31 seconds in duration and contains about 100 words. Approximately 38% of the messages correspond to male speakers; the remainder correspond to female speakers. All messages were transcribed by IBM.

In the root directory of this disc, you will find:

  README.1st           this file
  /devtest/speech      directory containing devtest speech files
  /devtest/transcrp    directory containing devtest transcript files
  /doc/ibm_vmd.doc     collection and transcription overview
  /doc/ibm_vmd.ps      PostScript version of "ibm_vmd.doc"
  /doc/results.doc     preliminary results and findings
  /doc/results.ps      PostScript version of "results.doc"
  /doc/transcrp.doc    transcription conventions documentation
  /train/speech        directory containing training data speech files
  /train/transcrp      directory containing training data transcript files