Spelled and Spoken Words Corpus
Release 1.2
Center for Spoken Language Understanding
UPDATED: 23 August 2002

Overview
--------

The Spelled and Spoken Words corpus consists of spelled and spoken
words. 3647 callers were prompted to say and spell their first and last
names, to say what city they grew up in and what city they were calling
from, and to answer two yes/no questions. In order to collect sufficient
instances of each letter, about 1371 callers also recited the English
alphabet with pauses between the letters. Each call was transcribed by
two people, and all differences were resolved. In addition, a subset of
the calls has been phonetically labeled.

Recording Conditions
--------------------

Each subject called the CSLU data collection system by dialing a
toll-free number. An analog telephone line was connected to a Gradient
Technologies box. Data from incoming calls were recorded by the Gradient
box. The sampling rate was 8 kHz and the files were stored in 16-bit
linear format on a UNIX file system. Each utterance was recorded as a
separate file.

The Gradient Desklabs used to record the database have 14-bit samples,
which are stored in 16-bit shorts. The least significant two bits are
always 0. To see this, add "original mean" to every sample.

The utterances were recorded with gain set to 10, which is the limit of
what the manufacturer recommends. No highpass filter was used. A lowpass
anti-aliasing filter was used.

Speaker Population
------------------

A press release describing our research project and the need for
volunteers produced newspaper, radio and television coverage. In
addition, we posted requests for callers on several university bulletin
boards and national computer newsgroups.

Annotation
----------

Each file in the corpus was listened to and transcribed by two
transcribers. Any differences between the two transcribers'
transcriptions were examined and resolved.
Some of the utterances were phonetically transcribed using a TIMIT-like
phonetic alphabet. The transcription followed conventions that provided
the groundwork for the more elaborate conventions described in The CSLU
Labeling Guide.

Protocol
--------

Protocol for First 3000 Calls

The caller heard the following instructions:

Thank you for calling the OGI speech research laboratory. We are
developing a computer system to recognize spelled names. To do this, we
need to record samples of speech from many speakers. We will ask you to
say and spell your last name, your first name, and to say the city and
state you grew up in. The rest of the call will take about one minute.
Please wait for the beep before speaking.

What city are you calling from?
What is your last name?
Please spell your last name.
Please spell your last name, with short pauses between letters.
Does your last name contain the letter A as in apple?
What is your first name?
Please spell your first name, with short pauses between letters.
What city and state did you grow up in?
Would you like to receive more information about the results of this project?

Thank you for calling. We appreciate your help. If you would like to
receive information about the results of this project, please leave your
name and address at the tone.

Protocol for Calls 3000--3647

After 3000 speakers were recorded, the protocol was changed to guarantee
more instances of each letter by asking speakers to recite the English
alphabet. In addition, three words were added to the protocol. The new
protocol consisted of the following instructions:

Thank you for calling the OGI speech research laboratory. We are
developing a computer system to recognize spelled names. To do this, we
need to record samples of speech from many speakers. We will ask you to
spell your last name, say where you are calling from and where you grew
up, and to say the alphabet. Please wait for the beep before speaking.

What city are you calling from?
What is your last name?
Please spell your last name.
What city and state did you grow up in?
Please say "apostrophe"
Please say "capital"
Please say "hyphen"

We will now ask you to say the alphabet. We need you to pause briefly
between letters, like this: A B C D E F G. You may hang up when you are
finished. Please begin speaking now.

Miscellaneous Information
-------------------------

The files phonedb-low.txt (calls 61-2000) and phonedb-high.txt (calls
2001-4218) are the log files created when the calls were processed. For
each call there are a number of entries all of which begin with the call
number and the response type, e.g. "1652 say_lname". The second field in
each line is either a response type, e.g. callfrom, or a keyword that
indicates global (applies to the whole call) information, e.g. age. The
corpus log is divided into two files due to its large size. For this
release, the log files are stored in the /docs directory.
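
As a rough illustration of the log layout described above, the following
Python sketch groups log entries by call number. It assumes that fields
are separated by whitespace and that anything after the second field is
free-form text; neither detail is specified in this document. The sample
lines are hypothetical, and parse_log_lines is an illustrative name, not
part of any corpus tool.

```python
from collections import defaultdict


def parse_log_lines(lines):
    """Group log entries by call number.

    Assumed layout (not confirmed by the corpus documentation):
    whitespace-separated fields, where field 1 is the call number,
    field 2 is a response type or global keyword, and any remaining
    text is kept verbatim.
    """
    calls = defaultdict(list)
    for line in lines:
        parts = line.split(None, 2)
        # Skip blank lines and lines that do not start with a call number.
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        call_number = int(parts[0])
        field = parts[1]
        rest = parts[2].strip() if len(parts) > 2 else ""
        calls[call_number].append((field, rest))
    return dict(calls)


# Hypothetical sample lines, for illustration only; real entries may
# carry additional fields after the second one.
sample = [
    "1652 say_lname",
    "1652 callfrom",
    "1652 age",
    "",
    "1653 say_lname",
]

entries = parse_log_lines(sample)
print(entries[1652])
```

To read the actual logs, the same function could be passed an open file
handle for phonedb-low.txt or phonedb-high.txt, since iterating over a
text file yields one line at a time.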