SR4X Corpus
Release 1.2
Center for Spoken Language Understanding
UPDATED: 23 August 2002

Overview
--------

This corpus consists of speech recorded on four different channels from
36 speakers, each repeating the following eleven words:

    startrek
    supernova
    tektronix
    generation
    nebula
    processing
    singularity
    71523
    abracadabra
    sungeeta
    computer

Each speaker repeated each word six times on each channel. Each
utterance is recorded as a separate file. File names have the form:

    SD-1309-tektronix-t2-52.wav

    SD         abbreviation that identifies the corpus (speaker dependent)
    1309       speaker number
    tektronix  word spoken in this utterance
    t2         channel indicator (here, channel 2)
    52         serial number assigned during the course of each call

The four channels used are:

    1 - office phone
    2 - home phone
    3 - carbon microphone telephone
    4 - speaker phone (through speaker)

Gender Information
------------------

The following table shows the gender of each participant, listed by
speaker number.

    1030 m    1063 m    1111 f    1159 m    1227 f    1234 m
    1305 m    1309 m    1348 m    1381 f    1430 f    1436 m
    1561 f    1584 m    1637 m    1648 m    1683 f    2222 f
    3333 m    3335 m    3745 m    4444 f    5555 m    6666 f
    7011 m    7308 f    7315 m    7329 f    7339 f    7341 m
    7382 f    7488 m    7496 m    7502 f    7523 m    7876 m

    Male:   22
    Female: 14

Verification
------------

We classified each utterance in the corpus as one of: good, bad, noisy,
or different. We classified the whole corpus once, then repeated the
classification. We compared the results of the two passes and reviewed
every utterance on which they disagreed. Agreement was about 85%. The
following confusion matrix shows where most of the confusions occurred.

          g     b     n     d
    g  6877   314   628
    b    31    45     2
    n   142     3   414    60
    d   119     6    25   305

    1330 mismatches out of 8971 files

The four categories are defined in the document speaker.ps, which is
included in the /docs directory of this distribution.
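
The agreement figure can be checked from the confusion matrix. The short Python sketch below is illustrative and not part of the distribution. It assumes that the diagonal cells are 6877 (g), 45 (b), 414 (n), and 305 (d); that reading reproduces the stated 1330 mismatches. The variable names are hypothetical.

```python
# Each entry: (count where both passes agreed, counts where they disagreed).
# Values are copied row by row from the confusion matrix. The column
# position of the off-diagonal cells is not needed for this check.
rows = {
    "g": (6877, [314, 628]),
    "b": (45, [31, 2]),
    "n": (414, [142, 3, 60]),
    "d": (305, [119, 6, 25]),
}

total = sum(agree + sum(others) for agree, others in rows.values())
matches = sum(agree for agree, _ in rows.values())
mismatches = total - matches
agreement = matches / total

print(f"{mismatches} mismatches out of {total} files")
print(f"agreement: {agreement:.1%}")
```

The totals agree with the figures quoted with the matrix (1330 mismatches out of 8971 files), and the ratio 7641/8971 is consistent with the "about 85%" agreement.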

The results of the verification process are contained in four files in
the /docs directory of this distribution:

    good.txt
    bad.txt
    noisy.txt
    different.txt
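
The file naming convention described in the Overview can be split into its fields mechanically. The Python sketch below is illustrative and not part of the distribution. It assumes every file follows the pattern of the single example (SD-1309-tektronix-t2-52.wav) exactly. The names `Utterance`, `parse_name`, and `CHANNELS` are hypothetical.

```python
import re
from typing import NamedTuple

# Channel numbers and descriptions as listed in the Overview.
CHANNELS = {
    1: "office phone",
    2: "home phone",
    3: "carbon microphone telephone",
    4: "speaker phone (through speaker)",
}


class Utterance(NamedTuple):
    corpus: str   # corpus abbreviation, e.g. "SD"
    speaker: str  # speaker number, kept as text to match the gender table
    word: str     # word spoken, e.g. "tektronix" or "71523"
    channel: int  # 1 to 4
    serial: int   # serial number assigned during the call


# Assumed layout: <corpus>-<speaker>-<word>-t<channel>-<serial>.wav
_PATTERN = re.compile(
    r"^(?P<corpus>[A-Za-z]+)-(?P<speaker>\d+)-(?P<word>[^-]+)"
    r"-t(?P<channel>[1-4])-(?P<serial>\d+)\.wav$"
)


def parse_name(name: str) -> Utterance:
    """Split a file name into its fields, or raise ValueError."""
    m = _PATTERN.match(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return Utterance(
        m["corpus"], m["speaker"], m["word"],
        int(m["channel"]), int(m["serial"]),
    )


u = parse_name("SD-1309-tektronix-t2-52.wav")
print(u.speaker, u.word, CHANNELS[u.channel], u.serial)
```

The speaker number is kept as a string so that it can be looked up directly against the speaker numbers in the gender table.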