-----------------------------------------------------------
Description of the CallHome telephone speech and transcript corpus for Spanish
-----------------------------------------------------------

CONTENTS

   1. Summary abstract
   2. Data acquisition
   3. Data verification
   4. Speaker demographics
   5. Data transcription - General
   6. Data transcription - Spanish-specific
      6.1. Spanish transcription symbol table

-----------------------------------------------------------------------

1. Summary abstract

The CallHome Spanish corpus of telephone speech was collected and
transcribed by the Linguistic Data Consortium (LDC), primarily in
support of the project on Large Vocabulary Conversational Speech
Recognition (LVCSR), sponsored by the U.S. Department of Defense.

This release of the CallHome Spanish corpus consists of 120 unscripted
telephone conversations between native speakers of Spanish. Each
transcript covers a contiguous 5- or 10-minute segment (see section 2
below) taken from a recorded conversation lasting up to 30 minutes.

All speakers were aware that they were being recorded. They were given
no guidelines concerning what they should talk about. Once a caller was
recruited to participate, he/she was given a free choice of whom to
call. Most participants called family members or close friends
overseas. All calls originated in North America. The distribution of
call destinations can be found in the file "spkrinfo.tbl".

The transcripts are timestamped by speaker turn for alignment with the
speech signal, and are provided in standard orthography.

-----------------------------------------------------------------------

2. Data acquisition

Rutgers University and the LDC recruited speakers for this telephone
speech collection effort through personal contacts and via the
Internet. A total of 200 call originators were recruited, each of whom
placed a telephone call through a toll-free robot operator maintained
first by Rutgers University and later by the LDC.
Access to the robot operator required a unique Personal Identification
Number (PIN), issued by the recruiting staff at Rutgers or the LDC when
the caller enrolled in the project. Both the callers and the call
recipients were told that the call would be recorded, and a call was
allowed to proceed only if both parties agreed to being recorded. Each
caller was allowed to place only one telephone call, lasting up to 30
minutes.

In all, 200 calls were transcribed. Of these, 80 have been designated as
training calls, 20 as development test calls, and 100 as evaluation
test calls. For each of the training and development test calls, a
contiguous 10-minute region was selected for transcription; for the
evaluation test calls, a 5-minute region was transcribed.

For the present publication, only 20 of the evaluation test calls are
being released; the remaining 80 are being held in reserve for future
LVCSR benchmark tests. This release therefore contains 80 training, 20
development test, and 20 evaluation test calls, for a total of 120.

-----------------------------------------------------------------------

3. Data verification

After each successful call was completed, a human auditor reviewed it
to verify that the proper language was spoken, to check the quality of
the recording, and to select and describe the region to be transcribed.
The description of the transcribed region provides information about
channel quality, the number of speakers, their gender, and other
attributes. The results of this audit are in the file "callinfo.tbl",
whose contents are described in greater detail in "callinfo.doc".

-----------------------------------------------------------------------

4. Speaker demographics

Information on speaker demographics can be found in the file
"spkrinfo.tbl", whose contents are described in the file "spkrinfo.doc".

-----------------------------------------------------------------------

5. Data transcription - General

Transcription was carried out by Texas Instruments (TI) under contract
to the LDC.
Below are the general transcription instructions that TI gave to its
transcribers:

CALLHOME TRANSCRIPTION CONVENTIONS - General (TI)

1. Transcribe "verbatim", without correcting grammatical errors.

2. Do not try to imitate pronunciation details, including accents and
   mispronunciations. Write the words that you believe the speaker
   intended, using standard orthography.

3. Speaker identification: Label each speaker with A: or B: at the
   beginning of the line. Use A: for the lower speaker and B: for the
   upper speaker in the waveform. (A will be the person calling from
   the U.S., and B the person overseas.)

   If there is more than one speaker at one end of the conversation
   (e.g. the telephone is passed around, or multiple extensions are in
   use), add numbers for each new speaker:

      B:   (the first speaker on side B)
      B1:  (a different speaker)
      B2:  (yet another speaker)

   Try to label the speakers consistently. For example, if the first
   speaker returns, use "B:" again.

4. Speaker turns: Begin each speaker turn on a new line. Do not put
   carriage returns within a speaker line. (Don't worry if the screen
   shows a break in the middle of a word.)

   Each speaker turn begins and ends with a pause. That is, each
   continuous stretch of speech is transcribed as one turn. Any
   simultaneous speech on the other channel is transcribed separately,
   after the current turn is completed.

   Example: (x indicates speech, - indicates silence)

      channel B: xxxxxxxxxxxxxxxxxxxxxxxxx---------xxxxxxxxx--
      channel A: -------xxx-----xxx-----xxxxxxxxxxxxxxx--------
      time       0       1       2       3       4       5

   Sequence of turns in the transcription (times are not exact):

      0.1 3.1 B:
      1.0 1.3 A:
      2.0 2.3 A:
      3.0 5.0 A:
      4.6 5.9 B:

   A "turn" consisting entirely of noise is transcribed only if it is a
   vocal tract noise from the talker (laugh, cough, etc.); see 7 below.
   Channel noise is NOT transcribed.

5. Simultaneous speech on the same channel: If two people are speaking
   on the same channel (an extension phone or a speaker phone), and if
   they speak simultaneously, put pound signs # around the words spoken
   simultaneously. Example:

      B: #Oh, how interesting.#
      B1: #That's good news.#

   If only part of the utterance is simultaneous, mark only the part
   that is simultaneous, but transcribe the entire utterance as one
   turn. Put the other speaker's utterance on the next line, with its
   times. Example:

      10.5 12.5 B: Well, I agree with you. #I think# you're right.
      11.5 12.0 B1: #Oh yes, yes.#

   Note that # is used only for simultaneous speech on the same channel.
   Simultaneous speech on different channels is identifiable as such by
   reference to the time marks.

6. Partial words: If a speaker does not finish a word, write as much as
   you heard and end it with a hyphen. Put a space after the hyphen, but
   no space before it.

7. Non-speech sounds:

   a) Sounds made by the talker: When the participants in the
      conversation make sounds that are not speech, indicate them using
      a label between braces, for example: {cough} {laugh}

      Example:

         A: Oh, that's funny. {laugh} {cough} Excuse me, I have a cold.

      If the talker makes one of these sounds as an entire turn,
      transcribe it and show the times, for example:

         340.0 342.0 A: {laugh}

   b) Other sounds: Mark other sounds using brackets [ ]. This includes
      background noises, background speech, and noises on the line. Mark
      these sounds only when they are clearly audible and about as loud
      as the speech. If they are hard to hear, or quieter than the
      speech, then ignore them. Also, do not transcribe noises that
      occur when no one on that channel is speaking, even if the noises
      are loud and clear. For example, if B is speaking and there is a
      loud noise on channel A (which is not made by speaker A), do not
      transcribe it.

      Examples:

      A clearly audible noise occurs during speech:

         A: Yes [noise].
      If the event being described lasts longer than a few words, mark
      its beginning with brackets [ ] and its end with brackets
      containing a "/", [/ ]. For intermittent sounds, mark the
      beginning and end of the intermittent occurrence of the sound, not
      the beginning and end of each individual occurrence. Example:

         A: Well, it all depends, uh, on, you know, [baby crying] how the family reacts. I mean, it can be a positive or a negative thing, you know?
         B: Yes, you're right.
         A: So it's difficult to say what's best sometimes. [/baby crying]

      Note: Be sure to mark the end on the channel where it occurred (A,
      in the example above). If the noise ends while the other speaker
      is talking, mark it at the end of the turn of the speaker on the
      same channel. For example, if the baby stops crying while B is
      talking:

         A: Well, it all depends, uh, on, you know, [baby crying] how the family reacts. I mean, it can be a positive or a negative thing, you know? [/baby crying]
         B: Yes, you're right.
         A: So it's difficult to say what's best sometimes.

8. Speech to someone in the background: If the speaker talks to someone
   in the background, put the speech between double slash marks.
   Examples:

      A: Just a minute. // Mary, please bring me a pencil. //
      A: Sí //una llamada de// ¿quieres hablar un poquito con tu papá?

9. When a word or phrase is not clear, type double parentheses (( ))
   around what you think you hear. If there is no way to tell what the
   speaker said, leave one blank space between the double parentheses,
   indicating that speech has been left out because it was
   unintelligible. Examples:

      A: So when I finally did ((take up)) the violin, I progressed pretty quickly in the beginning.
      B: Of course, that was in college which was a long time ago, so (( )) I remember.

10. Comments: To put a comment in the transcription, use double square
    brackets: [[comment]]

    Comments should be used very sparingly, only when there is no other
    way to indicate some unusual event.
    Notations describing noises should use single brackets, not double
    brackets (see #7). Examples of comments:

       [[speaker is singing]]
       [[speaker imitates a little child]]
       [[previous word is exceptionally prolonged]]

    Comments may be used to indicate the reason for unintelligible
    speech. Example:

       (( )) [[distortion]]

    However, use such comments sparingly. If there is consistent
    distortion, note it on the conversation summary sheet and do NOT put
    it in the transcription every time. The same is true for mumbling,
    rapid speech, etc. In other words, use comments only for unusual
    cases.

-----------------------------------------------------------------------

6. Data transcription - Spanish-specific (TI)

1. Do not use abbreviations. Write out all words, including numbers.

2. Use normal capitalization on proper names.

3. Punctuation: Use normal punctuation to the extent that it is
   reasonable. However, many utterances in conversation are not
   grammatically correct, so there may be no correct punctuation. In
   these cases, keep the punctuation simple. When in doubt, just use
   commas to separate phrases or clauses.

4. Letter names: Use capital letters surrounded by spaces to represent
   spoken letter names (as in spelling a word). Example:

      A: Ah, Austin, ¿cómo se escribe--?
      B: A U S T O N.
      A: Ya.
      B: ah, T I N, perdón.

5. Hesitation sounds: Use the following:

      aaa eee iii mmm emm amm imm

   If you want to use something else, put % at the beginning of the
   word so we know it's a hesitation sound (example: %oo).

   NOTE: There are no occurrences of "%" in the CallHome Spanish
   transcripts. However, non-lexemes (interjections) are identified in
   the lexicon with the morphological tag "Int".

6. Different languages: When talkers use words in a language other than
   Spanish, or when they change languages for a short time, the speech
   that is not Spanish needs to be marked and the language labeled, if
   possible.

   a) Put angled brackets < > around speech that is not Spanish.
   b) Put the language name right after the left bracket. If you don't
      recognize the language, put ?.

   d) If you can transcribe the speech, put the transcription after the
      language name. If you can't transcribe it, mark it as
      unintelligible: (( ))

   Examples:

      A: Sí, y así le dices, sabes que yo estoy tratando de que me den la para poder trabajar.
      A: ....

-----------------------------------------------------------------------

6.1. Spanish transcription symbol table

Symbol            Meaning                                 Example(s)
----------------  --------------------------------------  ---------------------------------
{text}            sound made by the talker                {laugh} {cough} {sneeze} {breath}
[text]            sound not made by the talker            [distortion] [background noise]
                  (background or channel)                 [buzz]
[/text]           end of continuous or intermittent
                  sound not made by the talker
                  (beginning marked with previous
                  [text])
[[text]]          comment; most often used to describe    [[previous word lengthened]]
                  unusual characteristics of              [[speaker is singing]]
                  immediately preceding or following
                  speech (as opposed to a separate
                  noise event)
((text))          unintelligible; text is best guess      ((coffee klatch))
                  at transcription
(( ))             unintelligible; can't even guess text   (( ))
<language text>   speech in another language              <? (( ))>
                                                          ? indicates unrecognized language;
                                                          (( )) indicates untranscribable
                                                          speech
text-             partial word                            absolu-
#text#            simultaneous speech on the same
                  channel (simultaneous speech on
                  different channels is not explicitly
                  marked, but is identifiable as such
                  by reference to the time marks)
//text//          aside (talker addressing someone in     //quit it, I'm talking to your
                  the background)                         sister!//
+text+            mispronunciation
**text**          idiosyncratic word, not in common use,  **poodle-ish**
                  not necessarily included in lexicon
text --           "text --" marks the end of an
-- text           interrupted turn, and "-- text" its
                  continuation after the interruption,
                  e.g.:

                     A: I saw &Joe yesterday coming out of --
                     B: You saw &Joe?!
                     A: -- the music store on &Seventeenth and &Chestnut.

-----------------------------------------------------------------------
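Taken together, the conventions in sections 5 and 6 define a small, line-oriented format. The Python sketch below is an illustration only, not a tool distributed with the corpus. It assumes that each transcript line has the form "start end speaker: text" (as in the examples under items 4 and 5 of section 5), that speaker labels are A, B, A1, B1, and so on, and that language tags look like "<language text>". The names parse_turn, cross_channel_overlaps and spoken_words are hypothetical. The sketch shows two consequences of the conventions: overlap between the two channels must be computed from the time marks, and the symbols in the table above can be stripped to leave only the spoken words.

```python
import re
from dataclasses import dataclass

# Assumed line layout: "<start> <end> <speaker>: <text>", e.g.
# "10.5 12.5 B: Well, I agree with you." Speaker labels are A, B, A1, B1, ...
TURN_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s+([AB]\d*):\s*(.*)$")


@dataclass
class Turn:
    start: float
    end: float
    speaker: str  # e.g. "B" or "B1"
    text: str

    @property
    def channel(self) -> str:
        # A, A1, A2, ... all share channel A; likewise for B.
        return self.speaker[0]


def parse_turn(line: str) -> Turn:
    """Split one timed transcript line into its fields (hypothetical helper)."""
    m = TURN_RE.match(line)
    if m is None:
        raise ValueError(f"not a timed turn: {line!r}")
    start, end, speaker, text = m.groups()
    return Turn(float(start), float(end), speaker, text)


def cross_channel_overlaps(turns: list[Turn]) -> list[tuple[Turn, Turn]]:
    """Pairs of turns on different channels whose time spans intersect.

    Such overlap is never marked with #...#; only the time marks reveal it.
    """
    pairs = []
    for i, a in enumerate(turns):
        for b in turns[i + 1:]:
            if a.channel != b.channel and a.start < b.end and b.start < a.end:
                pairs.append((a, b))
    return pairs


# Rewrites applied in order; [[...]] must be removed before [...].
_MARKUP = [
    (re.compile(r"\[\[.*?\]\]"), " "),          # [[comment]]
    (re.compile(r"\[/?[^\]]*\]"), " "),         # [noise] and [/noise]
    (re.compile(r"\{[^}]*\}"), " "),            # {laugh}, {cough}, ...
    (re.compile(r"\(\(([^)]*)\)\)"), r" \1 "),  # ((best guess)) -> best guess; (( )) -> blank
    (re.compile(r"\*\*([^*]*)\*\*"), r"\1"),    # **idiosyncratic** -> word
    (re.compile(r"\+([^+]*)\+"), r"\1"),        # +mispronounced+ -> word
    (re.compile(r"<\S+"), " "),                 # "<English" or "<?": drop bracket and language label
    (re.compile(r"--|//|[#>]"), " "),           # interruption, aside, overlap, closing ">"
]
# A word starts with a letter, digit or "&" and may contain apostrophes and hyphens.
_TOKEN = re.compile(r"[&\w][\w'&-]*")


def spoken_words(text: str) -> list[str]:
    """Word tokens left after removing the markup in the symbol table.

    Partial words keep their trailing hyphen ("absolu-").
    """
    for pattern, repl in _MARKUP:
        text = pattern.sub(repl, text)
    return _TOKEN.findall(text)
```

Applied to the five turns in the timing example under item 4 of section 5, cross_channel_overlaps reports four pairs: the first B turn against each of the three A turns, and the 3.0-5.0 A turn against the 4.6-5.9 B turn. The pairwise check is quadratic, which is adequate for a single conversation; sorting turns by start time and sweeping would scale better for larger inputs.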