ISOLET Corpus Release 1.3
Center for Spoken Language Understanding
UPDATED: 19 August 2002

Overview
--------

ISOLET is a database of letters of the English alphabet spoken in
isolation. The database consists of 7800 spoken letters: two productions
of each of the 26 letters by each of 150 speakers. It contains
approximately 1.25 hours of speech. The recordings were made under quiet
laboratory conditions with a noise-canceling microphone.

Recording Conditions
--------------------

Speech was recorded in the OGI speech recognition laboratory. The room is
15' by 15' with a tile floor, standard office wall board, and a drop
ceiling. There are two Sun workstations and three disk drives in the
room. The recording equipment was selected to mimic the equipment used to
collect the TIMIT database as closely as possible.

The speech was recorded with a Sennheiser HMD 224 noise-canceling
microphone and lowpass filtered at 7.6 kHz. Data capture was performed
using the AT&T DSP32 board installed in a Sun 4/110. The data were
sampled at 16 kHz. The data have since been converted to RIFF format.

The subjects were seated in front of a Sun workstation and prompted with
letters in random order. After each prompt, the subject would strike the
return key and say the letter. Two seconds of speech were recorded and
immediately played back for verification. If the subject spoke too soon
or too late and missed the two-second buffer, or if the experimenter or
subject decided the letter was mis-spoken, the recording was repeated.
There was no attempt to elicit ideal speech. A letter was judged
mis-spoken only if there was a significant departure from normal
pronunciation.

The ISOLET corpus was collected in 1990.

Speaker Population
------------------

Subjects were obtained through advertising. Each subject was given a free
dessert at a local restaurant in exchange for his or her participation.
All speakers reported English as their native language. The ages varied
from 14 to 72 years, with an average of 35.
There were 75 male and 75 female subjects in this data collection.

Annotation
----------

After the recording session, each utterance was verified by a human
examiner. First, the examiner viewed a waveform of the utterance to
verify that the speech was padded with silence. Second, the examiner
listened to the speech and noted any ambiguous or mis-spoken utterances.
All utterances noted by the examiner were then reviewed by two other
human examiners. If a majority of the examiners perceived that an
utterance was abnormal, that utterance, and the rest of the utterances
from that speaker, were removed from the corpus.

References
----------

This corpus is described in more detail in Technical Report No.
CSE 90-004.
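
As a rough illustration of the figures quoted in this file, the Python
sketch below checks the utterance count (26 letters x 2 productions x 150
speakers) and reads the sample rate and duration from a RIFF/WAVE file
using the standard library `wave` module. It is not part of the corpus
distribution. The function names, the mono 16-bit layout, and the
synthetic two-second test file are assumptions made for illustration
only; released files may differ in sample width, channel count, and
length.

```python
import io
import wave

# Figures taken from this README.
LETTERS = 26
PRODUCTIONS_PER_LETTER = 2
SPEAKERS = 150
SAMPLE_RATE_HZ = 16000
BUFFER_SECONDS = 2


def expected_utterance_count():
    """Letters x productions x speakers, as described in the Overview."""
    return LETTERS * PRODUCTIONS_PER_LETTER * SPEAKERS


def describe_wav(source):
    """Return (sample_rate_hz, duration_seconds) for a RIFF/WAVE file.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnframes() / rate


def make_silent_wav(seconds=BUFFER_SECONDS, rate=SAMPLE_RATE_HZ):
    """Build an in-memory stand-in file (assumed mono, 16-bit) of silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # assumption: mono
        wav.setsampwidth(2)   # assumption: 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * (seconds * rate))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print("expected utterances:", expected_utterance_count())
    rate, duration = describe_wav(make_silent_wav())
    print("sample rate (Hz):", rate, "duration (s):", duration)
```

To inspect a real file, pass its path to `describe_wav` in place of the
synthetic buffer.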