                           ISOLET Corpus
                            Release 1.3

              Center for Spoken Language Understanding


UPDATED: 19 August 2002


Overview
--------
ISOLET is a database of letters of the English alphabet spoken in isolation. 
The database consists of 7800 spoken letters: two productions of each of the 26 
letters by each of 150 speakers. It contains approximately 1.25 hours of speech. 
The recordings were made under quiet laboratory conditions with a noise-canceling 
microphone.
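
The corpus size quoted above can be checked with a few lines of arithmetic. The 
sketch below uses only the figures given in this overview; the mean duration it 
prints is derived from the approximate 1.25-hour total and is therefore only a 
rough estimate, not a documented property of the files.

```python
# Figures taken from the overview; nothing here is read from the corpus itself.
LETTERS = 26          # letters of the English alphabet
PRODUCTIONS = 2       # productions of each letter per speaker
SPEAKERS = 150
TOTAL_HOURS = 1.25    # approximate total amount of speech

utterances = LETTERS * PRODUCTIONS * SPEAKERS       # total spoken letters
per_speaker = LETTERS * PRODUCTIONS                 # utterances per speaker
mean_seconds = TOTAL_HOURS * 3600 / utterances      # rough mean length

print(utterances)               # 7800
print(per_speaker)              # 52
print(round(mean_seconds, 2))   # 0.58
```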

Recording Conditions
--------------------
Speech was recorded in the OGI speech recognition laboratory. The room is 15' by 15' 
with a tile floor and standard office wall board and drop ceiling. There are two Sun 
workstations in the room, and three disk drives.

The recording equipment was selected to mimic the equipment used to collect the 
TIMIT database as closely as possible. The speech was recorded with a Sennheiser 
HMD 224 noise-canceling microphone, lowpass filtered at 7.6 kHz. Data capture was 
performed using the AT&T DSP32 board installed in a Sun 4/110. The data were sampled 
at 16 kHz and have since been converted to RIFF format.
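
As a minimal sketch, the header of a single recording could be checked against 
the figures above with Python's standard wave module. This assumes the RIFF 
files are ordinary PCM WAVE files that the module can open; the function name 
and the path passed to it are hypothetical and not part of the corpus 
distribution.

```python
import wave

EXPECTED_RATE_HZ = 16000   # sampling rate stated above
LOWPASS_HZ = 7600          # lowpass cutoff stated above


def describe_recording(path):
    """Return basic header facts for one RIFF/WAVE file (assumed PCM)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "rate_hz": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,
            # True when the file matches the documented sampling rate.
            "rate_matches": rate == EXPECTED_RATE_HZ,
            # The 7.6 kHz cutoff should sit below the Nyquist frequency.
            "cutoff_below_nyquist": LOWPASS_HZ < rate / 2,
        }
```

At 16 kHz the Nyquist frequency is 8 kHz, so the 7.6 kHz lowpass filter sits 
just below it.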

The subjects were seated in front of a Sun workstation and prompted with letters in 
random order. After each prompt, the subject would strike the return key and say the 
letter. Two seconds of speech were recorded and immediately played back for 
verification. If the subject spoke too soon or too late and missed the two-second 
buffer, or if the experimenter or subject decided the letter was mis-spoken, the 
recording would be repeated. There was no attempt to elicit ideal speech. A letter 
was judged mis-spoken only if there was a significant departure from normal 
pronunciation.

The ISOLET corpus was collected in 1990.

Speaker Population
------------------
Subjects were obtained through advertising. Each subject was given a free dessert 
at a local restaurant in exchange for his or her participation. All speakers 
reported English as their native language. The ages varied from 14 to 72 years, with 
an average of 35.

The data collection included 75 male and 75 female subjects.

Annotation
----------
After the recording session, each utterance was verified by a human examiner. 
First, the examiner viewed a waveform of the utterance to verify that the speech 
was padded with silence. Second, the examiner listened to the speech and noted any 
ambiguous or mis-spoken utterances.

All utterances noted by the examiner were examined by two other human examiners. 
If a majority of the examiners perceived that an utterance was abnormal, that 
utterance, and the rest of the utterances from that speaker, were removed from the 
corpus.
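
The removal rule above can be sketched as a small filter. This is an 
illustration only, not the tooling used to build the corpus: the data layout, 
the function names, and the speaker and utterance labels are all hypothetical. 
Each flagged utterance carries one vote per examiner, where True means the 
examiner perceived the utterance as abnormal.

```python
def speakers_to_remove(flagged):
    """Return speakers with at least one utterance judged abnormal.

    flagged maps (speaker, utterance) to a list of examiner votes,
    True meaning "perceived as abnormal" (hypothetical layout).
    """
    removed = set()
    for (speaker, _utterance), votes in flagged.items():
        if sum(votes) > len(votes) / 2:   # strict majority of examiners
            removed.add(speaker)
    return removed


def filter_corpus(utterances, flagged):
    """Drop every utterance from any speaker who was removed."""
    removed = speakers_to_remove(flagged)
    return [(s, u) for (s, u) in utterances if s not in removed]


# Hypothetical example: two speakers, one flagged utterance each.
corpus = [("spkA", "b1"), ("spkA", "d1"), ("spkB", "b1"), ("spkB", "d1")]
votes = {
    ("spkA", "d1"): [True, True, False],    # majority abnormal
    ("spkB", "b1"): [True, False, False],   # no majority
}
print(filter_corpus(corpus, votes))   # [('spkB', 'b1'), ('spkB', 'd1')]
```

Note that a single utterance judged abnormal by the majority removes all of 
that speaker's utterances, which matches the rule described above.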

References
----------
This corpus is described in more detail in Technical Report No. CSE 90-004.