CSLU: Portland Cellular Telephone Speech Version 1.3


Introduction

CSLU: Portland Cellular Telephone Speech Version 1.3, LDC2008S01, ISBN 1-58563-463-8, was created by the Center for Spoken Language Understanding (CSLU) at OGI School of Science and Engineering, Oregon Health and Science University, Beaverton, Oregon. It consists of cellular telephone speech and corresponding transcripts: 7,571 utterances from 515 speakers who made calls in the Portland, Oregon area using cellular telephones.

Speakers called the CSLU data collection system on cellular telephones, and they were asked to repeat certain phrases and to respond to other prompts. Two prompt protocols were used: an In Vehicle Protocol for speakers calling from inside a vehicle and a Not in Vehicle Protocol for those calling from outside a vehicle. The protocols shared several questions, but each protocol contained distinct queries designed to probe the conditions of the caller's in vehicle/not in vehicle surroundings. Not every caller provided a response to each prompt.

Directory Structure

docs/ The documentation directory. This directory contains further documentation for CSLU: Portland Cellular Telephone Speech Version 1.3.
labels/ Phonetic labeling directory. This directory contains phonetic labels and phonetic transcriptions for corresponding speech files.
misc/ Miscellaneous directory. This directory contains software tools and scripts.
speech/ Speech directory. This directory contains the actual .wav files; there are subdirectories within this directory based on the speaker's ID number.
trans/ Transcriptions directory. This directory contains orthographic transcriptions for most of the speech files.
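
As a rough sketch of how the layout above might be traversed, the following Python snippet groups the .wav files under speech/ by the name of their immediate parent directory. It assumes each speaker's files sit directly in a subdirectory named for that speaker's ID; the exact nesting and file naming are not described here, so the function name group_wavs_by_speaker and the path handling are illustrative only.

```python
from collections import defaultdict
from pathlib import Path


def group_wavs_by_speaker(speech_root):
    """Map each subdirectory name under speech_root to its sorted .wav paths.

    Assumption: the directory that directly contains a .wav file is named
    after the speaker's ID number. Adjust if the release nests differently.
    """
    groups = defaultdict(list)
    for wav in Path(speech_root).rglob("*.wav"):
        groups[wav.parent.name].append(wav)
    return {speaker: sorted(paths) for speaker, paths in groups.items()}
```

Calling group_wavs_by_speaker("speech") from the top of the release would then return one entry per speaker directory that holds at least one .wav file.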

Recording Details

The speech data was captured digitally from CSLU's T1 connection and saved as 8 kHz, 16-bit linear.
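
The stated format can be checked per file with a few lines of Python. This is a minimal sketch that assumes the .wav files carry a RIFF/WAVE header readable by the standard wave module; if a file uses a different header, wave.open raises wave.Error and another reader is needed. The function name check_format is hypothetical.

```python
import wave


def check_format(path, rate=8000, sample_bytes=2):
    """Return True if a RIFF WAV file is 8 kHz with 16-bit (2-byte) samples.

    Assumption: the file has a RIFF/WAVE header. Files with a different
    header would raise wave.Error here and need a different reader.
    """
    with wave.open(str(path), "rb") as wf:
        return wf.getframerate() == rate and wf.getsampwidth() == sample_bytes
```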

Transcriptions

The text transcriptions in this corpus were produced using the non-time-aligned word-level conventions described in The CSLU Labeling Guide, which is included in the documentation for this release. CSLU: Portland Cellular Telephone Speech contains orthographic and phonetic transcriptions of the corresponding speech files. Non-time-aligned orthographic transcriptions provide quick access to the content of an utterance; they may contain markers for word boundaries to support access and retrieval at the lexical level. Phonetic/phonemic transcriptions represent the phonetic content of an utterance at a given level of detail that is made explicit by the use of diacritics. The phonetic phenomena transcribed include excessive nasalization, glottalization, frication on a stop, centralization, lateralization, rounding, and palatalization.

Protocol

Each caller was asked the same initial set of questions set forth below. The string in {} after the prompt is a "type key" used to identify those utterances that are responses to the corresponding prompt.

Thank you for calling the Oregon Graduate Institute. In collaboration with Cellular One we are collecting speech samples from cellular phones. The speech samples you provide will allow us to perform basic research that may lead to improved services for cellular phone users. We will share the speech data you provide with other researchers, but your identity will be kept confidential.

Are you calling from within a vehicle? Please say yes or no. {yorn}

The first four questions provide us with background information. Please wait for the beep before speaking. {iv_inst1}

Are you male or female? {morf}

What is your native language? {nlang}

What city and state did you grow up in? {growup}

What is your date of birth? {dob}
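
The brace-delimited type keys shown after the prompts above can be separated from the prompt text with a small regular expression. The sketch below works on the prompt lines as printed in this document; the names TYPE_KEY and split_prompt are illustrative and not part of the release.

```python
import re

# A type key is the brace-delimited token that ends a prompt line.
TYPE_KEY = re.compile(r"\{(\w+)\}\s*$")


def split_prompt(line):
    """Return (prompt_text, type_key); type_key is None if the line has none."""
    match = TYPE_KEY.search(line)
    if match is None:
        return line.strip(), None
    return line[:match.start()].strip(), match.group(1)
```

For example, split_prompt("What is your native language? {nlang}") yields the prompt text and the key "nlang" as a pair.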

In vehicle protocol

Those calling from within a vehicle were asked the following specific questions once the background portion was complete:

The answers to the next set of questions will provide information about driving conditions and the recording conditions in your car.

Please tell us if your window is open, or if you are using the windshield wipers, heater or radio? {environ}

Briefly describe the traffic conditions. {traffic}

About how fast are you traveling right now? {fast}

Are you using a digital or analog phone? {dora}

If you know the brand and model of your cellular phone, please tell us now. {brand}

Are you using your phone's handset or a mounted microphone? {horm}

The answers to the next questions will provide us with some background information about you and some examples of spoken digits and letters.

Please say your last name. {lastname}

Please spell your last name. {spelllastname}

Please say a familiar license plate number. {flpnum}

Please say a familiar phone number. {fphone}

What time is it now? {time}

Please say another phone number. {phone2}

What is today's date? {date}

Please say the days of the week. {week}

The last question is designed to provide samples of natural continuous speech. When you hear the beep, we would like you to talk for about half a minute and:

tell us something about yourself. {story1}

describe a typical day in your life. {story2}

tell us what you like most about where you live. {story3}

tell us about your family. {story4}

tell us about your dream home. {story5}

tell us something about the town where you grew up. {story6}

tell us about your favorite restaurant. {story7}

tell us about your favorite sport or hobby. {story8}

tell us about your favorite movie or television show. {story9}

We would like you to keep talking until you hear two beeps. We will give you a moment to collect your thoughts. Please begin speaking at the beep.

Thanks again for your help. If you would like to receive a gift certificate to McDonald's, TCBY, B. Dalton's books, Baskin-Robbins, or Blockbuster video, please let us know which one, and leave your name and address.

Hang up when you are finished. {address}

Not in vehicle protocol

Those calling from outside a vehicle were asked the following questions once the background portion was complete:

The answers to the next questions will provide information about the source of background noise during your call.

Please describe your location. {location}

Please identify any background noises that we may be hearing while you speak. For example, is the radio or TV on? Are there other people speaking nearby? {bnoise}

Are you using a digital or analog phone?

If you know the brand and model of your cellular phone, please tell us now.

Are you speaking directly into your phone's handset or a speaker phone? {horm_niv}

The answers to the next questions will provide us with some background information about you and some examples of spoken digits and letters.

Please say your last name.

Please spell your last name.

Please say a familiar license plate number.

Please say a familiar phone number.

What time is it now?

Please say another phone number.

What is today's date?

Please say the days of the week.

The last question is designed to provide samples of natural continuous speech. When you hear the beep, we would like you to talk for about half a minute and:

tell us something about yourself.

describe a typical day in your life.

tell us what you like most about where you live.

tell us about your family.

tell us about your dream home.

tell us something about the town where you grew up.

tell us about your favorite restaurant.

tell us about your favorite sport or hobby.

tell us about your favorite movie or television show.

We would like you to keep talking until you hear two beeps. We will give you a moment to collect your thoughts. Please begin speaking at the beep.

Thanks again for your help. If you would like to receive a gift certificate to McDonald's, TCBY, B. Dalton's books, Baskin-Robbins, or Blockbuster video, please let us know which one, and leave your name and address. Hang up when you are finished.

Statistics

Set forth below is the total number of utterances per type key:

bnoise 234
brand 380
date 359
dob 405
dora 396
environ 167
fast 165
flpnum 370
fphone 386
growup 405
horm 165
horm_niv 226
lastname 749
location 232
morf 417
nlang 411
phone2 364
spelllastname 382
story1 38
story2 38
story3 34
story4 37
story5 43
story6 43
story7 41
story8 44
story9 38
time 357
traffic 167
week 360
yorn 500
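
The table above is simple enough to load and tally programmatically. The following sketch parses the "type key, count" lines and sums them, which gives a quick consistency check against the corpus-level utterance count in the Introduction. The split into protocol-specific keys is inferred from the protocol listings in this document and is an assumption, as are all names in the code.

```python
def parse_counts(text):
    """Parse 'type_key count' lines into a dict of integer counts."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


# Counts copied from the table above.
TABLE = """
bnoise 234
brand 380
date 359
dob 405
dora 396
environ 167
fast 165
flpnum 370
fphone 386
growup 405
horm 165
horm_niv 226
lastname 749
location 232
morf 417
nlang 411
phone2 364
spelllastname 382
story1 38
story2 38
story3 34
story4 37
story5 43
story6 43
story7 41
story8 44
story9 38
time 357
traffic 167
week 360
yorn 500
"""

counts = parse_counts(TABLE)
total = sum(counts.values())

# Assumption: these keys appear in only one of the two protocols,
# judging from the prompt listings in this document.
IN_VEHICLE_ONLY = ("environ", "traffic", "fast", "horm")
NOT_IN_VEHICLE_ONLY = ("location", "bnoise", "horm_niv")
in_vehicle = {key: counts[key] for key in IN_VEHICLE_ONLY}
not_in_vehicle = {key: counts[key] for key in NOT_IN_VEHICLE_ONLY}

print(len(counts), "type keys,", total, "utterances in the table")
```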

Updates

Additional information, updates, and bug fixes may be available in the LDC catalog entry for this corpus, LDC2008S01.

Content Copyright

Portions © 1995, 1998, 2000, 2002 Center for Spoken Language Understanding, Oregon Health & Science University, © 2008 Trustees of the University of Pennsylvania


Contact: ldc@ldc.upenn.edu
© 2007 Linguistic Data Consortium, Trustees of the University of Pennsylvania. All Rights Reserved.