National Cellular Corpus
Release 2.3
Center for Spoken Language Understanding
UPDATED: 22 September 2002

Overview
--------
The Cellular Corpus consists of cellular telephone speech from 2000
callers from locations throughout the United States. The data collection
protocol contains requests for fixed vocabulary and continuous speech
utterances. A total of about one minute of speech from each caller is
collected.

Recording Conditions
--------------------
The data were collected with the CSLU T1 digital data collection system
described in "Digital Data Collection at CSLU" (please see our web site).
The sampling rate was 8 kHz and the files were stored in 8-bit mu-law
format on a UNIX file system. The 2.2 release offers the files in 16-bit
linearly encoded Windows wav (riff) format.

Transcription
-------------
Each utterance in the National Cellular corpus has an orthographic
transcription. The transcriptions are in the /trans directory.

Protocol
--------
Here is the protocol used for the data collection. These are the prompts
that each caller heard when they called the system. After the
introduction, we asked whether the caller was calling from within a
vehicle. If so, the in vehicle portion of the protocol was used;
otherwise the not in vehicle portion was used. At the end of the protocol
we asked for the caller's name and address. This information is kept
confidential and was requested only so that we could send a gift
certificate. Callers who did not want a gift certificate did not have to
leave this information.

Introduction

Thank you for calling the Oregon Graduate Institute. In collaboration
with Cellular One we are collecting speech samples from cellular phones.
The speech samples you provide will allow us to perform basic research
that may lead to improved services for cellular phone users. We will
share the speech data you provide with other researchers, but your
identity will be kept confidential.
Are you calling from within a vehicle? Please say yes or no.

In vehicle protocol

The first four questions provide us with background information. Please
wait for the beep before speaking.

Are you male or female?
What is your native language?
What city and state did you grow up in?
What is your date of birth?

The answers to the next set of questions will provide information about
driving conditions and the recording conditions in your car.

Please tell us if your window is open, or if you are using the windshield
wipers, heater or radio?
Briefly describe the traffic conditions.
About how fast are you traveling right now?
Are you using a digital or analog phone?
If you know the brand and model of your cellular phone, please tell us
now.
Are you using your phone's handset or a mounted microphone?

The answers to the next questions will provide us with some background
information about you and some examples of spoken digits and letters.

Please say your last name.
Please spell your last name.
Please say a familiar license plate number.
Please say a familiar phone number.
What time is it now?
Please say another phone number.
What is today's date?
Please say the days of the week.

The last question is designed to provide samples of natural continuous
speech. When you hear the beep, we would like you to talk for about half
a minute and
- tell us something about yourself.
- describe a typical day in your life.
- tell us what you like most about where you live.
- tell us about your family.
- tell us about your dream home.
- tell us something about the town where you grew up.
- tell us about your favorite restaurant.
- tell us about your favorite sport or hobby.
- tell us about your favorite movie or television show.
We would like you to keep talking until you hear two beeps. We will give
you a moment to collect your thoughts. Please begin speaking at the beep.

Thanks again for your help. If you would like to receive a gift
certificate to McDonalds, TCBY, B.
Dalton's Books, Baskin-Robbins, or Blockbuster Video, please let us know
which one, and leave your name and address. Hang up when you are
finished.

Not in vehicle protocol

The first four questions provide us with background information about
the caller. Please wait for the beep before speaking.

Are you male or female?
What is your native language?
What city and state did you grow up in?
What is your date of birth?

The answers to the next questions will provide information about the
source of background noise during your call.

Please describe your location.
Please identify any background noises that we may be hearing while you
speak. For example, is the radio or TV on? Are there other people
speaking nearby?
Are you using a digital or analog phone?
If you know the brand and model of your cellular phone, please tell us
now.
Are you speaking directly into your phone's handset or a speaker phone?

The answers to the next questions will provide us with some background
information about you and some examples of spoken digits and letters.

Please say your last name.
Please spell your last name.
Please say a familiar license plate number.
Please say a familiar phone number.
What time is it now?
Please say another phone number.
What is today's date?
Please say the days of the week.

The last question is designed to provide samples of natural continuous
speech. When you hear the beep, we would like you to talk for about half
a minute and
- tell us something about yourself.
- describe a typical day in your life.
- tell us what you like most about where you live.
- tell us about your family.
- tell us about your dream home.
- tell us something about the town where you grew up.
- tell us about your favorite restaurant.
- tell us about your favorite sport or hobby.
- tell us about your favorite movie or television show.
We would like you to keep talking until you hear two beeps. We will give
you a moment to collect your thoughts. Please begin speaking at the beep.
Thanks again for your help. If you would like to receive a gift
certificate to McDonalds, TCBY, B. Dalton's Books, Baskin-Robbins, or
Blockbuster Video, please let us know which one, and leave your name and
address. Hang up when you are finished.

In Vehicle and Not-In Vehicle percentages
-----------------------------------------
For release 2.x, approximately 38% of the callers were calling from
within a vehicle. This is based on the number of "how fast are you
traveling" answers received. Of those callers, only about 3% indicated
that they were using a mounted microphone.
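
As a rough guide to what these percentages mean in absolute terms, the
short Python sketch below turns them into approximate caller counts. It
assumes the percentages apply to the full set of 2000 callers mentioned
in the Overview; that may not hold exactly for every 2.x release, and
the percentages themselves are approximate. The function name
estimate_counts is made up for this illustration and is not part of the
corpus distribution.

```python
# Rough caller counts implied by the approximate figures in this README.
# Assumption: the percentages apply to all 2000 callers in the corpus.

TOTAL_CALLERS = 2000          # from the Overview section
IN_VEHICLE_FRACTION = 0.38    # "approximately 38%" of callers
MOUNTED_MIC_FRACTION = 0.03   # "about 3%" of the in-vehicle callers


def estimate_counts(total, in_vehicle_fraction, mounted_mic_fraction):
    """Return (in_vehicle, not_in_vehicle, mounted_mic) as rounded estimates."""
    in_vehicle = round(total * in_vehicle_fraction)
    not_in_vehicle = total - in_vehicle
    # The mounted-microphone share is taken relative to in-vehicle callers only.
    mounted_mic = round(in_vehicle * mounted_mic_fraction)
    return in_vehicle, not_in_vehicle, mounted_mic


if __name__ == "__main__":
    in_vehicle, not_in_vehicle, mounted_mic = estimate_counts(
        TOTAL_CALLERS, IN_VEHICLE_FRACTION, MOUNTED_MIC_FRACTION
    )
    print("in vehicle (estimate):         ", in_vehicle)
    print("not in vehicle (estimate):     ", not_in_vehicle)
    print("mounted microphone (estimate): ", mounted_mic)
```

Under these assumptions the estimate is on the order of 760 in-vehicle
callers, of whom only a few dozen used a mounted microphone. Exact
counts should be taken from the corpus files themselves.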