National Cellular Corpus
Release 2.3
Center for Spoken Language Understanding
UPDATED: 23 September 2002

Use of this corpus is permitted only under the conditions of the signed
license agreement. Use or redistribution of this corpus outside the
agreement is prohibited by law.

Overview
--------

The National Cellular Corpus consists of cellular telephone speech from
2336 callers located throughout the United States. The data collection
protocol includes prompts for both fixed-vocabulary and continuous-speech
utterances. About one minute of speech was collected from each caller.

Distribution Directory Structure
--------------------------------

This is the distribution for Release 2.3 of the National Cellular Corpus.
The corpus is distributed by the Center for Spoken Language Understanding
of the Oregon Graduate Institute. The directory structure of this release
is as follows:

readme.txt    General information regarding the corpus.

docs/         The documentation directory. This directory contains further
              documentation for the National Cellular Corpus.

labels/       The phonetic labeling directory. This directory contains
              time-aligned phoneme-level transcriptions produced by
              automatic forced alignment.

misc/         The miscellaneous directory, possibly containing software
              tools and scripts.

speech/       The speech directory. This directory contains the actual
              .wav files, organized into many numbered subdirectories.

trans/        The transcriptions directory. This directory contains
              word-level transcriptions (not time-aligned) for each of the
              speech files.

This corpus requires approximately 3.4 GB of disk space. Please see the
docs/ directory for further documentation.

Contact Information
-------------------

Further information about this corpus can be found at our web site:
Refer specific questions to:

  Center for Spoken Language Understanding
  Oregon Health & Science University
  Email:   corpora@cslu.ogi.edu
  Address: 20000 NW Walker Road
           Beaverton, OR 97006
           USA

Constructive feedback about this corpus is appreciated.
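After copying the corpus to disk, it can be useful to confirm that the
installed copy matches the layout described under "Distribution Directory
Structure". The following is a minimal, hypothetical sketch in Python; the
function names and the root-path argument are illustrative assumptions and
are not part of the distribution. It checks that the five top-level
directories exist and counts the .wav files found anywhere under speech/.
Note that the reported size covers speech/ only, so it will be smaller than
the approximately 3.4 GB needed for the full corpus.

```python
"""Hypothetical sanity check for an installed copy of the corpus.

Assumes the top-level layout described in readme.txt. The function names
and the command-line root argument are illustrative only.
"""
import os
import sys

# Top-level directories listed in readme.txt.
EXPECTED_DIRS = ["docs", "labels", "misc", "speech", "trans"]


def check_layout(root):
    """Return the expected top-level directories that are missing under root."""
    return [d for d in EXPECTED_DIRS if not os.path.isdir(os.path.join(root, d))]


def summarize_speech(root):
    """Count .wav files under speech/ (recursing into the numbered
    subdirectories) and return (file_count, total_size_in_bytes)."""
    count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(os.path.join(root, "speech")):
        for name in filenames:
            # Match the extension case-insensitively (.wav or .WAV).
            if name.lower().endswith(".wav"):
                count += 1
                total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return count, total_bytes


if __name__ == "__main__":
    # Corpus root defaults to the current directory if no argument is given.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = check_layout(root)
    if missing:
        print("missing directories:", ", ".join(missing))
    n, size = summarize_speech(root)
    print(f"{n} .wav files, {size / 1e9:.2f} GB under speech/")
```

Run it as, for example, `python check_corpus.py /path/to/corpus`, where the
script name and path are placeholders.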