                           Numbers Corpus
                            Release 1.3

              Center for Spoken Language Understanding


UPDATED: 23 August 2002


Overview
--------
The Numbers Corpus is a collection of naturally produced
numbers. The utterances were taken from other CSLU
telephone speech data collections, and include isolated
digit strings, continuous digit strings, and
ordinal/cardinal numbers. The corpus contains 23902 files
from 12618 speakers.

The source collections were all completed at the CSLU. In
most of them, callers were asked to leave their phone
number, birthdate, or zipcode at some point. Callers also
occasionally said numbers in the midst of another
utterance. In both cases the numbers were extracted from
the host utterance and added to the Numbers Corpus.

Each file in the Numbers Corpus has an orthographic
transcription following the CSLU Labeling Conventions.
In addition, many of the utterances have been phonetically
labeled.

Release 1.1 of this corpus contains 28626 files.  Of these,
6640 have been phonetically labeled.

Statistics
----------
The Numbers Corpus consists of about 15 hours of speech.

The following table gives the number of files for each
utterance type.

        Type        Files
        -----------------
        other1       5026
        other2       1332
        other3        292
        other4         79
        other5         28
        other6         14
        phone        2970
        street       7079
        zipcode      7076
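
The per-type counts can be totaled and converted to shares with a
few lines of Python. This is only an illustrative sketch: the
dictionary is copied by hand from the table above, and the names
`counts` and `summarize` are hypothetical, not part of any tool
distributed with the corpus.

```python
# File counts per utterance type, copied by hand from the table above.
counts = {
    "other1": 5026,
    "other2": 1332,
    "other3": 292,
    "other4": 79,
    "other5": 28,
    "other6": 14,
    "phone": 2970,
    "street": 7079,
    "zipcode": 7076,
}


def summarize(counts):
    """Return the total file count and each type's percentage share."""
    total = sum(counts.values())
    shares = {name: 100.0 * n / total for name, n in counts.items()}
    return total, shares


total, shares = summarize(counts)
print(f"files listed in table: {total}")
for name, n in counts.items():
    # Left-align the type name, right-align the count and the share.
    print(f"{name:<8} {n:>6} {shares[name]:>5.1f}%")
```

The table rows sum to 23896 files, with the street and zipcode
types together accounting for more than half of them.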