SWITCHBOARD: A User's Manual

TABLE OF CONTENTS

 1. Summary Abstract
 2. Overview of directory and file structure
 3. The .wav files
 4. The .txt files
 5. The .mrk files
 6. Ancillary text files: database tables
 7. Ancillary speech files: the collection prompts
 8. How the data was collected
 9. How the data was transcribed
10. The SWITCHBOARD dictionary
11. How the data was time aligned
12. Quality Control (QC) procedures
13. Technical problems in collection and processing
14. How to report errors
15. References

ATTACHMENTS

Attachment 1: SWITCHBOARD registration packet
Attachment 2: SWITCHBOARD prompts -- description and text
Attachment 3: Instruction manual for SWITCHBOARD transcribers

1. Summary Abstract

SWITCHBOARD is a corpus of spontaneous conversations which addresses the growing need for large multispeaker databases of telephone bandwidth speech. Collected at Texas Instruments with funding by DARPA, the complete set of CD-ROMs includes about 2430 conversations averaging 6 minutes in length; in other terms, over 240 hours of recorded speech, and about 3 million words of text, spoken by over 500 speakers of both sexes from every major dialect of American English. Apart from sheer volume, however, it has a number of unique features designed to support telephone-based speech technology development as well as basic research on spontaneous conversational speech and language.

First, SWITCHBOARD was collected without human intervention, under computer control. Interaction with the system was via touchtones and recorded instructions, but the two talkers, once connected, could "warm up" before recording began. From a human factors perspective, automation guards against the intrusion of experimenter bias, and guarantees a degree of uniformity throughout the long period of data collection. The protocols were further intended to elicit natural and spontaneous speech by the participants. The transcribers' ratings indicate that they perceived the conversations as highly natural.
Second, the use of T1 lines and automatic switching software made it possible to collect the digital version of the speech signals directly from the telephone network, and also to isolate the two sides of the conversations. The goal was to have real telephone speech, routed through the public network, but with no degradation due to the collection system. Isolation of the callers, within the limits of network echo cancelling performance, permits researchers to train on each speaker's voice separately, and then test on either one or both speakers in any conversation.

Third, the speech is fully transcribed, and the transcription conventions are documented. Court reporters produced most of the verbatim transcripts, following a manual prepared specifically for the project. Their work was checked for formatting errors by an awk script, then twice more by humans during quality control (QC) inspections.

Fourth, each transcript is accompanied by a time alignment file, which estimates the beginning time and duration of each word in the transcript in centiseconds. The time alignment was accomplished with supervised phone-based speech recognition, as described by Wheatley et al. [1]. The corpus is therefore capable of supporting not only purely text-independent approaches to speaker verification, but also those which make use of any degree of knowledge of the text, including phonetics. It should also facilitate studies of the phonetic characteristics of spontaneous speech on a scale not previously possible.

Fifth, SWITCHBOARD has both depth and breadth of coverage for studying speaker characteristics. Forty-eight people participated 20 times or more; this adds up to about an hour of speech each, enough for extensive training or modeling and for repeated testing with unseen material. Hundreds of others participated ten times or less, providing a pool large enough for many open-set experiments.
Sixth, the participants' demographics, as well as the dates, times, and other pertinent information about each phone call, are recorded in relational database tables. Except for personal information about the callers, these tables are included with the corpus. The volunteers who participated provided information relevant to studies of voice, dialect, and other aspects of speech style, including age, sex, education, current residence, and places of residence during formative years. The exact time and the area code of origin of each call are provided, as well as a means of telling which calls by the same person came from different telephones. Many callers made calls from multiple handsets, in order to facilitate study of the effects of that variable on voice recognition.

2. Overview of Directory and File Structure

There are 25 speech discs in the Switchboard Corpus (NIST Speech Discs 9-3.1 - 9-27.1). Each disc has a "readme.doc" file and a "swb1" subdirectory in the top-level directory. The "readme.doc" file contains information about the conversations on that disc. The "swb1" subdirectory contains the NIST SPHERE-headered binary files containing the speech by both speakers in each conversation. The wavefiles are named "swXXXX.wav" where XXXX is the conversation number.

All transcription files for the Switchboard corpus are on one CD-ROM (NIST Speech Disc 9-1.1). This disc has a "readme.doc" file in the top-level directory and a "trans" subdirectory. The "trans" subdirectory contains a "phase1" subdirectory and a "phase2" subdirectory. The "phase1" subdirectory contains 15 subdirectories, one for each disc in Phase 1 of the Switchboard Corpus. The "phase2" subdirectory contains 10 subdirectories, one for each disc in Phase 2 of the Switchboard Corpus. These subdirectories contain the transcription files. The orthographic transcription files are named "swXXXX.txt" where XXXX is a conversation number.
The time-aligned marked transcripts are named "swXXXX.mrk" where XXXX is a conversation number. For each word in the .txt file, the .mrk file gives an estimated start time and duration.

In the following sections are examples illustrating the contents of each file type, and some information on the conventions used in writing them.

3. The .wav Files

The information in the header of the file sw4940.wav can be read with the SPHERE utility h_read:

    speaker_id1 1423
    speaker_id2 1662
    recording_date 920508
    recording_time 2204
    conversation_id 4940
    database_id SWB1
    channel_count 2
    sample_max1 4015.500000
    sample_max2 4015.500000
    sample_coding mu-law
    channels_interleaved TRUE
    sample_count 4798496
    sample_rate 8000
    sample_n_bytes 1
    sample_sig_bits 8

"speaker_id1" is the number of the speaker who initiated the call. In the transcripts this speaker will be called "A". In the database tables the identification number will be under the attribute "CALLER_NO".

"speaker_id2" is the number of the speaker who received the call. In the transcripts this speaker will be called "B".

"recording_date" is in YYMMDD format, so the date of this conversation was May 8, 1992.

"recording_time" is in HHMM format; recording of this call began at 10:04 p.m. CDT.

"sample_max1" is the maximum amplitude of the signal on speaker_id1's channel, expressed as a positive linear value; 4015.5 is full scale.

"sample_max2" is the maximum amplitude of the signal on speaker_id2's channel, which was also full scale.

"sample_coding" tells how to interpret the binary data in the .wav file; these are coded as mu-law values, exactly as read from the digital telephone line.

"channels_interleaved" has the value TRUE indicating that alternate bytes are the values from alternate channels; speaker_id1's data are the odd bytes, speaker_id2's data are the even bytes (where the first byte is byte 1); summing successive pairs gives the entire conversation.

"sample_count" is the total number of bytes of speech data. Since there is one byte per sample, but both sides of the conversation are represented at each sample time, there are 16000 samples per second, or 960000 samples per minute. Thus a good rule of thumb is "one Megabyte per minute," so 4798496 samples represents nearly five minutes of speech.

"sample_rate" is 8000 samples per second.

"sample_n_bytes" is 1, the number of bytes per sample in the mu-law format.

"sample_sig_bits" is the number of bits per sample value, which is 8.

4. The .txt Files

The transcripts begin with a header-like section which can be ignored by skipping down to the line consisting entirely of "====". Some of this information matches the .wav header information, and was used to verify and maintain consistency between the two files when transcribers worked on the .txt files. The rest is information inserted by the original transcriber after completing the transcript, then reviewed and corrected if necessary by one or more QC transcribers. The instructions given to the transcribers for rating the difficulty, amount of echo or noise, etc. are found below in the section on "TRANSCRIPTION", but will be described very briefly here. A scale of 1 to 5 is used, where 1 implies good quality, easy to understand, etc., and 5 is bad quality, more difficult to deal with, etc. The header section of conversation 4940 is reproduced here for illustration:

    FILENAME: 4940_1423_1662
    TOPIC#: 302
    DATE: 920508
    TRANSCRIBER: nk
    DIFFICULTY: 2
    TOPICALITY: 1
    NATURALNESS: 2
    ECHO_FROM_B: 1
    ECHO_FROM_A: 1
    STATIC_ON_A: 2
    STATIC_ON_B: 1
    BACKGROUND_A: 2
    BACKGROUND_B: 2
    REMARKS: None
    ============================================================

The first four lines are self-explanatory.

"DIFFICULTY" means the overall difficulty of transcribing this conversation compared to the rest of the SWITCHBOARD conversations this transcriber has done.
It is a subjective catch-all, designed to alert the user, where no other standard category of problem is noted, that there may be a soft-spoken, mumbling, or otherwise difficult-to-understand caller. The transcriber thought conversation 4940 was not very difficult, but harder than some. "TOPICALITY" refers to whether the callers conversed generally about what was suggested by the recorded prompt. Conversations were not rejected if callers strayed from the prompt, or even ignored it entirely. However, those who wish to group calls for vocabulary studies, language modelling, etc., may find this a useful guide. The transcriber thought that the speakers in conversation 4940 stayed right with the topic suggested by the prompt. "NATURALNESS" is another very subjective rating, intended partly to study after the fact how well the human factors in SWITCHBOARD succeeded in eliciting natural conversational speech. The transcriber felt this was a natural sounding conversation, but less so than some others. "ECHO_FROM_B" estimates how loud the crosstalk from the other channel (B) was on this channel (A), at the times when A was silent and B was talking. A score of "1" means inaudible or almost so; a score of "5" means the crosstalk was almost as loud as the speech on the A channel itself. This conversation apparently had little or no crosstalk in either direction. "ECHO_FROM_A" is the same estimate, but for the B channel. To make these ratings, of course, transcribers had to listen to each channel separately as well as the combined signal. "STATIC_ON_A" was intended to isolate the occurrence of electrical noises often described as static, some of which were caused by the collection system, from other types of unwanted acoustic signals on channel A. It is not clear how well the transcribers understood this distinction, so there may be many "false positives" from acoustic noise in a caller's environment. 
But in the cases where strong digital noise was present, they did seem to note it and lower the ratings accordingly. The transcriber heard some static in this conversation, and noted two places in the transcript where it occurred with the term [static].

"STATIC_ON_B" is the same for the other channel. None was noted in this conversation, hence a rating of 1.

"BACKGROUND_A" refers to the presence of noise, including any unwanted signal of any kind, coming from the environment of caller A. In this example, the noise of people talking, children playing, and dishes being washed caused a rating of 2 on the A channel.

"BACKGROUND_B" refers to the same on channel B. In this conversation there were also voices and children on the B side, and the same rating was given.

"REMARKS" was a field for transcribers or QCers to insert unlimited free-form comments on the conversation; they were encouraged to note any unusual characteristics that might help in studying the speech, and especially any overall sources of difficulty not well identified in the ratings. For example, if one caller was eating all through the conversation, or had a head cold, this was the place to note it.

The remainder of the .txt file is the verbatim transcript of what was said, with the speakers indicated by "A:" and "B:", and a number of conventional symbols and expressions which will be explained in the TRANSCRIPTION section below. Here are the first fifty lines of the file sw4940.txt, the example used above:

    FILENAME: 4940_1423_1662
    TOPIC#: 302
    DATE: 920508
    TRANSCRIBER: nk
    DIFFICULTY: 2
    TOPICALITY: 1
    NATURALNESS: 2
    ECHO_FROM_B: 1
    ECHO_FROM_A: 1
    STATIC_ON_A: 2
    STATIC_ON_B: 1
    BACKGROUND_A: 2
    BACKGROUND_B: 2
    REMARKS: None
    ============================================================

    A: Okay [children].

    B: Okay Carol. So, air quality.

    A: Yeah. Is it, [noise] {sounds like water running and she is doing dishes} I know in here, uh, downtown Dallas, it's, you, I mean you drive by and you can just, you can see it.

    B: Uh-huh.

    A: But, then again [throat_clearing] I originally was from California and, uh, there is a big difference between Texas and California. #And, uh# --

    B: #Surely.#

    A: -- they'd have their smog alerts and where you'd have to stay indoors for so many hours with an air conditioner. And, of course, they don't have that here in Texas so, [breathing] there's ...

    B: You mean they don't have the, uh, the smog alerts?

    A: No, not in, not in Te-, well not in Dallas, that is.

    B: Right. I, I,

    A: [throat_clearing].

    B: yeah, I spent a summer i-, i-, in Tyler so I know, just east of Dallas there.

    A: Yeah. We're going there tomorrow.

    B: Oh, really #[laughter].#

5. The .mrk Files

For ease of use the .mrk files are arranged in fixed records of four fields, where the first field is the speaker (A or B), the second is the estimated start time in seconds, the third is the estimated duration in seconds, and the fourth is the word whose start time and duration are estimated. A "word" in the transcript is sometimes not actually a spoken word, and in these cases an asterisk is placed in the start time and duration fields. This occurs for certain punctuation marks, for bracketed expressions indicating acoustic events other than speech of the callers, for transcribers' comments in braces, etc. The same convention is used also where there is simultaneous speech--one talker's words are time marked in that case, and the other's are left with asterisks in the time fields. The first 100 lines of file sw4940.mrk are reproduced here to illustrate some of these conventions.

    A 0.04 0.42 Okay
    A * * [children].
    B 0.82 0.22 Okay
    B 1.06 0.34 Carol.
    B 3.58 0.34 So,
    B 3.92 0.20 air
    B 4.12 0.70 quality.
    A 5.40 0.22 Yeah.
    A 6.16 0.16 Is
    A 6.32 0.16 it,
    A * * [noise]
    A * * {sounds
    A * * like
    A * * water
    A * * running
    A * * and
    A * * she
    A * * is
    A * * doing
    A * * dishes}
    A 7.02 0.10 I
    A 7.12 0.22 know
    A 7.34 0.08 in
    A 7.42 0.30 here,
    A 7.80 0.22 uh,
    A 8.36 0.44 downtown
    A 8.80 0.46 Dallas,
    A 9.26 0.22 it's,
    A 9.60 0.20 you,
    A 9.82 0.10 I
    A 9.92 0.20 mean
    A 10.12 0.08 you
    A 10.20 0.28 drive
    A 10.52 0.26 by
    A 10.78 0.08 and
    A 10.86 0.08 you
    A 10.94 0.16 can
    A 11.10 0.24 just,
    A 11.96 0.10 you
    A 12.06 0.14 can
    A 12.20 0.40 see
    A 12.60 0.16 it.
    B 12.76 0.32 Uh-huh.
    A 13.78 0.38 But,
    A 14.34 0.42 then
    A 14.88 0.36 again
    A * * [throat_clearing]
    A 15.52 0.16 I
    A 15.90 0.54 originally
    A 16.58 0.22 was
    A 16.80 0.14 from
    A 16.94 0.66 California
    A 17.72 0.26 and,
    A 17.98 0.18 uh,
    A 18.60 0.16 there
    A 18.76 0.10 is
    A 18.86 0.08 a
    A 18.94 0.30 big
    A 19.28 0.58 difference
    A 20.36 0.34 between
    A 20.70 0.48 Texas
    A 21.18 0.12 and
    A 21.30 0.72 California.
    A * * #And,
    A * * uh#
    A * * --
    B 22.56 0.34 #Surely.#
    A * * --
    A 22.90 0.10 they'd
    A 23.00 0.28 have
    A 23.42 0.44 their
    A 23.86 0.42 smog
    A 24.28 0.34 alerts
    A 24.62 0.22 and
    A 25.50 0.10 where
    A 25.60 0.20 you'd
    A 25.80 0.10 have
    A 25.90 0.10 to
    A 26.00 0.48 stay
    A 26.48 0.44 indoors
    A 26.92 0.10 for
    A 27.04 0.24 so
    A 27.28 0.22 many
    A 27.50 0.30 hours
    A 27.80 0.16 with
    A 27.96 0.06 an
    A 28.06 0.12 air
    A 28.18 0.60 conditioner.
    A 28.78 0.16 And,
    A 28.94 0.02 of
    A 28.96 0.30 course,
    A 29.26 0.08 they
    A 29.34 0.12 don't
    A 29.46 0.14 have
    A 29.60 0.12 that
    A 29.72 0.14 here
    A 29.86 0.08 in
    A 29.94 0.52 Texas
    A 31.54 0.42 so,
    A * * [breathing]
    A 32.64 0.22 there's

6. Ancillary Text Files: Database Tables

In the directory /swb1/tables on the transcription disc (NIST Speech Disc 9-1.1) are the tables containing information about the callers, conversations, etc.
To design experiments with SWITCHBOARD, it is recommended that these tables be incorporated into a relational database management system (RDBMS) with at least the relations caller, conversation, and caller_conversation. The relations topic and rating may also be helpful.

To ensure anonymity, the names of the callers are not included in the tables, and the telephone numbers have been encoded as follows. The area code and first three digits of the phone number have not been altered. For each six-digit prefix, a list was made of all phone numbers. These lists were sorted into ascending order. For the first phone number in each list, we replaced the last four digits with "0000". For the second phone number in each list, we replaced the last four digits with "0001", and so on. This was done for all phone numbers in the tables so that it is still possible to tell when callers were using the same extension, but the actual phone number will not be revealed.

Here are the suggested relations, and a few rows from the tables to illustrate their structure:

The CALLER relation--

    SQL> describe caller
     Name                            Null?    Type
     ------------------------------- -------- ----
     CALLER_NO                       NOT NULL NUMBER(4)
     SEX                                      CHAR(6)
     BIRTH_YEAR                               NUMBER(4)
     DIALECT_AREA                             CHAR(13)
     EDUCATION                                NUMBER(1)
     REMARKS                                  CHAR(120)

    SQL> select * from caller where caller_no < 1046;

    CALLER SEX    BIRTH YEAR DIALECT       EDU REMARKS
    ------ ------ ---------- ------------- --- ------------------------
      1000 FEMALE       1954 SOUTH MIDLAND   1
      1001 MALE         1940 WESTERN         3
      1002 FEMALE       1963 SOUTHERN        2
      1003 MALE         1947 NORTH MIDLAND   2
      1004 FEMALE       1958 NORTHERN        2
      1005 FEMALE       1956 WESTERN         2
      1007 FEMALE       1965 NEW ENGLAND     2
      1008 FEMALE       1939 MIXED           1
      1010 MALE         1932 NEW ENGLAND     1
      1011 FEMALE       1964 SOUTH MIDLAND   2
      1013 FEMALE       1957 SOUTH MIDLAND   2
      1014 FEMALE       1947 MIXED           1
      1015 FEMALE       1967 NEW ENGLAND     2
      1016 FEMALE       1945 SOUTHERN        2
      1018 FEMALE       1962 SOUTH MIDLAND   3
      1019 MALE         1941 NEW ENGLAND     3
      1020 FEMALE       1956 NORTH MIDLAND   2
      1021 MALE         1957 NORTHERN        3
      1022 FEMALE       1959 SOUTH MIDLAND   2
      1023 MALE         1939 SOUTHERN        2
      1024 MALE         1964 NORTH MIDLAND   2
      1025 MALE         1953 SOUTH MIDLAND   2
      1026 FEMALE       1957 SOUTHERN        2
      1027 FEMALE       1961 NORTH MIDLAND   2
      1028 MALE         1965 NYC             3
      1031 FEMALE       1940 SOUTH MIDLAND   3
      1032 FEMALE       1943 SOUTHERN        2
      1033 FEMALE       1965 SOUTH MIDLAND   1
      1034 MALE         1961 NORTHERN        3
      1035 FEMALE       1953 NORTH MIDLAND   2
      1037 MALE         1947 WESTERN         3
      1038 FEMALE       1963 UNK             2
      1039 MALE         1943 SOUTHERN        3

The CONVERSATION relation--

    SQL> describe conversation
     Name                            Null?    Type
     ------------------------------- -------- ----
     CONVERSATION_NO                 NOT NULL NUMBER(5)
     CALLER_FROM                              NUMBER(4)
     CALLER_TO                                NUMBER(4)
     IVI_NO                                   NUMBER(4)
     TALK_DAY                                 CHAR(7)
     TIME_START                               NUMBER(6)
     TIME_STOP                                NUMBER(6)
     REMARKS                                  CHAR(240)

    SQL> /

    CONVERSATION NO CALLER FROM CALLER TO IVI NO TALK DAY  TSTART   TSTOP REMARKS
    --------------- ----------- --------- ------ -------- ------- ------- -----------------------
               2030        1071      1123    334 910306      1909    1919
               2031        1151      1126    353 910306      1912    1922
               2032        1167      1093    308 910306      1929    1937
               2033        1078      1024    360 910306      2056    2106
               2034        1000      1083    356 910307      1701    1706
               2035        1176      1107    358 910307      1721    1726
               2036        1013      1063    309 910307      1751    1757
               2037        1132      1175    336 910307      1828    1838
               2038        1073      1039    346 910307      1849    1859
               2039        1152      1101    339 910307      1911    1919
               2040        1130      1119    309 910307      1951    2001
               2041        1110      1179    356 910307      2038    2048
               2042        1221      1219    310 910307      2117    2127
               2043        1169      1139    315 910307      2122    2132
               2044        1219      1005    313 910307      2134    2144
               2045        1033      1055    364 910307      1834    1840

The CALLER_CONVERSATION relation--

    SQL> describe caller_conversation
     Name                            Null?    Type
     ------------------------------- -------- ----
     CONVERSATION_NO                 NOT NULL NUMBER(5)
     CALLER_NO                       NOT NULL NUMBER(4)
     PHONE_NUMBER                             CHAR(10)
     LENGTH                                   NUMBER(6)
     IVI_NO                          NOT NULL NUMBER(4)
     REMARKS                                  CHAR(240)
     ACTIVE                                   CHAR(1)

Note: IVI_NO is the number of the recorded prompt which was played before the conversation. See TOPIC below.
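The telephone-number encoding described earlier in this section (the values stored under PHONE_NUMBER) can be sketched in a few lines of Python. This is only an illustration of the procedure as it is worded above, not the program actually used to prepare the tables. The function name `encode_phone_numbers` is hypothetical, and two details are assumptions: the input is taken to be a list of 10-digit strings, and repeated occurrences of one number are taken to share a single encoded value.

```python
from collections import defaultdict


def encode_phone_numbers(numbers):
    """Map each 10-digit number to its six-digit prefix plus a 4-digit rank.

    Assumption: `numbers` holds 10-digit strings; duplicates of the same
    number receive the same encoded value.
    """
    # Group the distinct numbers by area code plus first three digits.
    by_prefix = defaultdict(set)
    for number in numbers:
        by_prefix[number[:6]].add(number)

    mapping = {}
    for prefix, group in by_prefix.items():
        # Sort each group into ascending order, then replace the last four
        # digits with the position in the sorted list: 0000, 0001, ...
        for rank, number in enumerate(sorted(group)):
            mapping[number] = f"{prefix}{rank:04d}"
    return mapping


if __name__ == "__main__":
    # Illustrative input only; these are not entries from the real tables.
    sample = ["2149950651", "2149950386", "2149919112", "2149950651"]
    for original, encoded in sorted(encode_phone_numbers(sample).items()):
        print(original, "->", encoded)
```

Because the rank depends only on the sorted order within a prefix, two calls from the same telephone always map to the same encoded number, which is the property the tables rely on.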
    SQL> select * from caller_conversation;

    CONVERSATION NO CALLER PHONE NUMBER  LENGTH IVI NO REMARKS
    --------------- ------ ------------ ------- ------ -------
               2022   1138 2145301431         5    357
               2022   1107 2144148439         5    357
               2023   1033 9034655243        10    304
               2023   1135 7132743525        10    304
               2024   1016 2149950386         7    311
               2024   1061 2149953417         7    311
               2025   1061 2149953417         6    341
               2025   1064 2143177874         6    341
               2026   1013 8174976701         4    311
               2026   1073 3157620226         4    311
               2027   1096 2144366786         9    303
               2027   1035 2149171371         9    303
               2028   1086 2144248977        10    313
               2028   1101 3015403172        10    313
               2029   1022 2144120124         7    349
               2029   1051 8179642327         7    349
               2030   1071 2145308909        10    334
               2030   1123 5134335177        10    334

The TOPIC relation--

    SQL> describe topic
     Name                            Null?    Type
     ------------------------------- -------- ----
     TOPIC_DESCRIPTION                        CHAR(30)
     IVI_NO                                   NUMBER(4)
     PROMPT                                   CHAR(240)
     FLG                                      CHAR(1)
     REMARKS                                  CHAR(120)
     PROMPT_CONT                              CHAR(50)

    SQL> select topic_description, ivi_no, prompt, prompt_cont from topic;

    DESCRIPTION          IVI NO PROMPT
    -------------------- ------ ------------------------------------------
    PROMPT_CONT
    --------------------------------------------------
    PUBLIC EDUCATION        353 DISCUSS WITH THE OTHER CALLER WHETHER THERE
                                IS SOMETHING SERIOUSLY WRONG WITH OUR PUBLIC
                                SCHOOL SYSTEMS TODAY, AND IF SO, WHAT CAN BE
                                DONE TO CORRECT IT.

    DRUG TESTING            354 HOW DO YOU FEEL ABOUT THE PRACTICE OF SOME
                                COMPANIES OR GOVERNMENT AGENCIES TESTING
                                EMPLOYEES OR PROSPECTIVE EMPLOYEES FOR DRUGS?
                                IS RANDOM SPOT TESTING JUSTIFIED? WHAT LIMITS
                                SHOULD THERE BE, IF ANY?

    FEDERAL BUDGET          359 WHAT SHORT AND LONG-TERM STEPS DO YOU AND THE
                                OTHER CALLER THINK SHOULD BE TAKEN TO IMPROVE
                                THE US BUDGET?

    FISHING                 360 FIND OUT WHAT KIND OF FISHING THE OTHER CALLER
                                ENJOYS. DO YOU HAVE SIMILAR OR DIFFERENT
                                INTERESTS IN THE KIND OF FISHING YOU ENJOY?

    GARDENING               361 FIND OUT WHAT THE OTHER CALLER DOES IN THE WAY
                                OF LAWN AND GARDEN WORK.DOES THE OTHER CALL
                                ENJOY DOING IT? COMPARE THIS TO YOUR OWN
                                SITUATION.

    BASEBALL                365 FIND OUT THE OTHER CALLER'S FAVORITE PRO
                                BASEBALL TEAM AND WHERE IT'S HEADED THIS YEAR.
                                DO YOU AGREE WITH THE CALLER'S PREDICTION?

The RATING relation--

    SQL> describe rating
     Name                            Null?    Type
     ------------------------------- -------- ----
     CONVERSATION_NO                 NOT NULL NUMBER(4)
     DIFFICULTY                               NUMBER(1)
     TOPICALITY                               NUMBER(1)
     NATURALNESS                              NUMBER(1)
     ECHO_A                                   NUMBER(1)
     ECHO_B                                   NUMBER(1)
     STATIC_A                                 NUMBER(1)
     STATIC_B                                 NUMBER(1)
     BACKGROUND_A                             NUMBER(1)
     BACKGROUND_B                             NUMBER(1)
     REMARKS                                  CHAR(120)

    SQL> select * from rating;

    CONVERSATION NO DIFFICULTY TOPICALITY NATURALNESS     ECHO_A     ECHO_B
    --------------- ---------- ---------- ----------- ---------- ----------
      STATIC_A   STATIC_B BACKGROUND_A BACKGROUND_B
    ---------- ---------- ------------ ------------
    REMARKS
    --------------------------------------------------------------------------------
               2001          1          1           2          1          3
             1          1            1            1

               2002          3          1           1          1          2
             4          4            1            3

               2003          4          2           1          3          2
             5          5            1            1

               2004          1          1           1          1          1
             2          4            1            1

               2005          4          1           2          3          3
             1          1            2            2

               2006          1          1           1          1          1
             1          1            1            1

               2007          1          1           1          1          4
             1          2            1            1

               2008          1          1           3          3          1
             1          3            1            3

               2009          1          1           1          4          2
             1          1            1            1

               2010          1          1           1          3          3
             1          2            1            1

7. Ancillary Speech Files: The Collection Prompts

The speech prompts which the callers heard over the phone were recorded by a female employee of Texas Instruments (Jane McDaniel) under laboratory conditions and digitized as 16-bit, 16 kHz samples. They were later filtered, downsampled to 8 kHz, and converted to 8-bit mu-law form before being transferred over the local network to the Robotoperator disk for use as prompts. In order to permit researchers to reconstruct the collection protocol, a number of the "direction" prompts and all of the topic prompts have been included. In the directory /swb1/prompts on the transcription disc (NIST Speech Disc 9-1.1) are NIST SPHERE-headered files containing the prompts. Also in this directory is a shell script named "demo.sh" that will demonstrate what prompts callers would have heard when setting up a Switchboard call.
Along with each prompt is the "Topic Description," a word or short phrase which summarizes its content.

8. How The Data Was Collected

HARDWARE: the Robotoperator

The search for an off-the-shelf hardware platform capable of meeting the SWITCHBOARD requirements led to the "Robotoperator," a PC-based voicemail and call management system from InterVoice, Inc. (IVI). The Robotoperator typically answers, transfers, forwards or otherwise handles incoming calls using touchtone detection and stored messages; it can also make outgoing calls, record speech directly from a T1 line, and consult a database (e.g., of bank account balances) to make decisions. See Figure 1 for a diagram of the hardware setup. Note that the Robotoperator includes a T1 interface and a software controllable switching network ("Switchware"), which can interconnect any of the T1 channels with each other and/or with the PS/2's message file on disk. This was a key capability for SWITCHBOARD which was lacking in other telephone interfaces: two callers on the line with the Robotoperator simultaneously could become one two-way telephone conversation by connecting the Transmit (T) side of one to the Receive (R) side of the other. But at the same time, by recording to disk from each T side separately (as if the two callers were leaving distinct "messages"), it was possible to isolate the two sides of this conversation. The isolation might not be perfect, since signal reflections ("echoes") often occur on the telephone network, but it would be far better than one could achieve by processing a single channel version after the fact.

SOFTWARE: the IVI application program

The software on the Robotoperator was licensed with the system. To achieve the functionality required by customers, IVI provides a user application program, which is created with a fourth generation programming language interface.
Although customers can learn to use this interface themselves, programming and debugging of the first user application is provided by IVI as part of the Robotoperator purchase and licensing agreement. The functional block diagram of Figure 2 contains the essentials of the application program. The basic idea of achieving the SWITCHBOARD scenario with the Robotoperator can be followed from the diagram. An incoming phone call is treated like a call to a business (e.g., a bank) in which a customer interacts with the computer via touchtones and recorded prompts, requests information (e.g., his account balance) which must be retrieved from a database, adds information to the database, and leaves a digitally recorded voice message. Meanwhile, the system makes an outgoing call to another customer who, if he wishes, follows the recorded prompts through a similar transaction, and also leaves a message. In this respect the functions of the Robotoperator are not unlike those of its customary commercial applications. The unusual requirement of SWITCHBOARD is that the computer coordinate the two calls, cause them to be connected together at a certain point, and start and end recording of the two talkers' messages at the same time. The application depends heavily on the Robotoperator's database manager, which is a version of Btrieve for the PS/2. With SWITCHBOARD, each completed call changed the conditions for future calls, so dynamic database management was a necessity. In practice, a new database was loaded under Btrieve weekly, in the form of four database tables that were both read and written to, which controlled events for the seven days, and another which was written by Btrieve to log completed calls. The recorded calls and the log file were transferred daily to the TI Speech Research computer system, where the speech files were processed and the log file used to update the ORACLE database (see above, section 6). The other IVI database files were saved and archived weekly. 
The remainder of this section will describe in detail how the collection system operated; first the internal database tables (the ones on the Robotoperator) must be explained, then a sample call can be used to illustrate the process.

One table, PINTOPIC, was keyed to callers' Personal Identification Numbers (PIN), and listed which of the seventy possible topics each registered participant was willing to talk about. One field was reserved for a flag indicating whether he or she had actually completed a call on the topic listed. Here are sample rows of a PINTOPIC table, with spaces inserted for readability:

    4533 301 0   (caller 4533, can talk on topic 301, has not done so yet)
    4533 356 0   (caller 4533, can talk on topic 356, has not done so yet)
    4533 328 1   (caller 4533, can talk on topic 328, has already done so)
    3429 301 1
    3429 305 0
    6798 301 0
    6798 325 0
    6798 356 1
    . . .

A second table, TOPICPIN, contained the same information but was keyed to the topic. This table was searched by topic to find a prospective partner to be called by the system.

    301 4533 0   (topic 301, caller 4533 is still a possible partner)
    356 4533 0
    328 4533 1   (topic 328, caller 4533 has already spoken on this)
    301 3429 1
    305 3429 0
    301 6798 0
    325 6798 0
    356 6798 1
    . . .

The third table, CALLER, was keyed to day of the week and PIN, and had fields for: PIN, day of the week, phone number to call during the person's first period of availability on that day, phone number for the second period, phone number for the third period, starting time for the first period, ending time, starting time for the second period, ending time, starting and ending time for the third period, and a counter of calls completed on that day.

    4533 1 2149950651 2149950651 2149919112 0800 0930 1700 1830 2000 2130 0
        (Monday: 3 time slots, no calls completed)
    4533 2 2149950651 2149950651 2149919112 0800 0930 1700 1830 2000 2130 1
        (Tuesday: same schedule, one call completed)
    4533 3 2149950651 2149919112 0000000000 0800 0930 2000 2130 0000 0000 1
        (Wednesday: 2 time slots, one call completed)
    4533 4 2149950651 0000000000 0000000000 0800 0930 0000 0000 0000 0000 0
        (Thursday: 1 time slot, no calls)
    . . .

The fourth table, TALK, was written by the Robotoperator in the format: PIN_A, PIN_B, COUNTER_AB, after a completed call.

    4533 6798 1
    3429 6798 1
    . . .

Each row thus records a pairing of callers who have spoken to each other, and how many times. COUNTER_AB is the number of times these two callers have spoken to each other, so it would be incremented after the first call. However, the number of calls permitted between any one pair of callers was always kept at one.

Another table, CONVER, created a log of successful calls. A new row of the CONVER table was written at the completion of each conversation. It contained the PINs of the callers, the phone number from which the call was made, the day of the week, a pointer to the B caller's time period (first, second, or third), the date (yymmdd), the time the incoming call was picked up (hhmmss), the start and end times of the recorded portion (hh:mm), the topic, and the "message numbers" for each side of the call (needed to retrieve the recordings).

    3429 6798 2148810028 1 1 920325 090832 9:15 9:20 0340 500 501
    4533 2792 8175400128 1 2 920325 091829 9:20 9:24 0359 502 503
    . . .

In this example, a call was initiated by 3429 at 9:08:32 am; it must have taken several tries for the Robotoperator to find a partner (6798), since recording did not start until 9:15. Both callers heard the prompt about taxes (topic 340).
Caller A's side of the recorded conversation, which lasted five minutes, could be found by extracting message number 500 from the Robotoperator's message file, and caller B's side by extracting number 501.

The Robotoperator also produced two other files, HIST (for "history logging") and LOG (for "special event logging"). These recorded detailed information about transactions and their times of occurrence: the time of every attempted call, incoming and outgoing, whether it was a "ring, no answer" or "hangup" or "busy", which options were selected by a caller as a call progressed, etc. This information was used mainly for debugging and is not described further here.

COLLECTION PROTOCOL

The program supported a number of ancillary functions, such as taking messages and comments from callers, playing recorded instructions on how to participate, giving error handling messages for busy or no-answer conditions, etc. These can be seen as branches in the flowchart in Figure 3, following the obvious logical paths. To facilitate understanding of the collection protocol, however, it is probably best to step through a typical successfully completed SWITCHBOARD call. "A" will represent the person calling in, "B" the one called, and "R" the Robotoperator. It should be possible to follow this on the flowchart, taking the correct branch at each decision point.

--Participant A initiates a call by dialing the 800 number.
--R picks up on A's line and plays the recorded greeting: "Welcome to the Texas Instruments Switchboard. Please respond to questions by pressing the appropriate buttons on your touchtone phone. If you would like instructions, press 0. To make some brief comments about the system, press 1. To participate in a conversation now, press 2."
--A presses 2.
--R plays recorded prompt: "Please enter your personal identification number."
--A presses four digits of his PIN, e.g., "4533".
--R checks PIN against CALLER file and verifies PIN.
--R prompts: "Thank you. Please enter the area code you are calling from."
--A presses 3 digits of area code, e.g., "214".
--R prompts: "Thank you. Now enter the 7-digit phone number you are calling from."
--A presses 7 digits of phone number, e.g., "9950651".
--R searches PINTOPIC for entries with A's PIN, a topic number (e.g., 301), and a "0" meaning "has not yet spoken on this topic", and takes the topic number from the first such entry (e.g., 45333010 --> 301).
--R announces topic to A; for example: "Discuss with the other caller whether there is something seriously wrong with our public school systems today, and if so, what can be done to correct it."
--R tells A to wait: "Please think about the topic while I locate another caller."
--R searches TOPICPIN for entries with the chosen topic (301), another PIN (not 4533), and a 0 (not yet used this topic), and extracts the PIN (e.g., 30167980 --> 6798).
--R searches the CALLER file for a match to the day of the week, the chosen PIN, and the current time of day, and checks the flag to be sure this caller has not completed a call on this day. If no match is found on the chosen PIN, TOPICPIN is searched again for another candidate.
--Once a match is found, the database returns the phone number listed for the time slot which contains the current time, and R dials this number.
--B answers the ring and hears the prompt: "Hello, this is Switchboard calling. If you are ready to participate, press 1; if the person participating can be called to the phone without delay, press 2; otherwise, press 3 to terminate the call."
--B presses 1.
--R prompts: "Please enter your Personal Identification Number."
--B enters 4-digit number.
--R verifies PIN, prompts B: "Discuss with the other caller", etc., the same prompt heard by A.
--R connects A to B, and prompts both: "Welcome to both of you and thanks for participating. Recording will begin when the person who called in presses 1. Until then, you may introduce yourselves and get acquainted."
--A and B converse for a while, without being recorded.
--A presses 1.
--R begins timing and recording two messages, one from each line. The association of a T1 channel with a message number having been made by the application software, "recording" is just writing the 8-bit mu-law values from the T1 interface to the disk without modification.
--If the time limit (a software parameter setting, normally 5 minutes in the later conversations and 10 in the earlier ones) is reached, R stops recording and prompts A and B, while they are still connected: "We're sorry, but our recording capacity is limited today. Please try to wind up your conversation in the next 30 seconds. Good Bye." Although the recording ends just before the prompt is played, the callers are never really cut off; their call only terminates when one of them hangs up.
--When A or B hangs up, R detects end of call, writes log information to the CONVER table, frees phone lines and resets program variables for another call.

THE TALKERS

Generally, the talkers were paid volunteers who gave written consent to the recording and use of their conversations. Their signatures are on file at TI along with their personal data and records of payment. Most were paid $5 cash per completed call, although TI employees received gifts of comparable cash value, and some callers refused payment of either kind. Additional premiums were paid to some who participated at least 25 times and used two or more different handsets in a systematic manner.

Subjects were recruited in several ways. A number were volunteers drawn from, or recruited by, DARPA contractors and government agencies. An announcement on TNET, TI's internal electronic news service, drew responses from about 200 interested TI employees. Email to a number of institutions involved in speech research attracted several dozen more. Finally, a posting on some national electronic bulletin boards elicited several hundred replies.
Anyone who responded was sent a registration form and a letter urging applicants to invite others to participate, which led to more applications. The letter and registration forms are included in Attachment 1. A total of 670 persons registered over the entire course of the project, and 542 participated in at least one of the published conversations of SWITCHBOARD.

It was intended that the talkers be broadly representative of adult speakers of American English between 20 and 60 years of age. From the beginning, a bias toward higher socioeconomic and educational levels was considered inevitable, due to the requirements of the task. It was also recognized that a serious effort would be required to ensure representation of all dialects.

The consent form asked where the applicant grew up during the first 10 years of life. This community was then located on a wall map of the United States with the boundaries between the major dialect areas drawn on it. The names for these seven areas, plus the term "MIXED," were then used to classify each person by dialect. This _a priori_ classification of callers into nominal dialect areas has only limited value in predicting their actual speech patterns. Accurate _a posteriori_ classification of the speech itself, however, would be a very expensive and time-consuming process. Since the _a priori_ procedure had been used in previous speech corpora, most notably TIMIT, it was used again in collecting SWITCHBOARD for consistency's sake.

Due to the number of TI employees, their relatives and acquaintances, and local residents who responded to notices, there is a far greater number of "SOUTH MIDLAND" callers than would be expected, for example, in a random nationwide sample.

NUMBER OF CALLERS PER DIALECT AREA

    DIALECT AREA     COUNT
    ----------------------
    SOUTH MIDLAND      155
    WESTERN             85
    NORTH MIDLAND       77
    NORTHERN            75
    SOUTHERN            56
    NYC                 33
    MIXED               26
    NEW ENGLAND         21

Callers were drawn principally from the age groups 20 to 60:

NUMBER OF CALLERS PER AGE RANGE

    AGE        COUNT
    ----------------
    20-29        140
    30-39        179
    40-49        112
    50-59         87
    60-69         13

The speakers were approximately 55% male and 45% female. Females volunteered to participate in greater numbers than males whenever a public announcement was made, and they tended to participate more actively as well. The resulting imbalance was finally redressed (in fact reversed) by posting electronic bulletin board announcements which asked for male applicants only.

NUMBER OF CALLERS PER SEX

    SEX        COUNT
    ----------------
    MALE         292
    FEMALE       239

The educational level was coded as 0 for less than high school, 1 for less than college (but not 0), 2 for college (but not 3), 3 for more than college, and 9 for unknown. The distribution was:

    EDUCATION  COUNT
    -------------------------------------
    0             14   less than high school
    1             39   less than college
    2            309   college
    3            176   more than college
    9              4   unknown

HANDSETS

Callers, especially those who were permitted to continue past 10 or 15 calls, were instructed to use more than one handset. To help keep track of this variable, for each call and caller a phone number is recorded in the CALLER_CONVERSATION table. For the outgoing side of the call, the Robotoperator simply keeps track of the number dialed; for the incoming side, the originator of the call is prompted to "enter the number you are calling from," the DTMFs are decoded, and the number written to the database.

The phone number is unfortunately the only objective indicator of what handset is being used. The association of phone numbers with handsets is probably very high, but surely not perfect, for at least two reasons. First, people make mistakes keying in numbers; dozens of cases were found and, where possible, corrected by hand.
Typical errors are transpositions, keying in a 1 before the area code and number (which causes the last digit not to be captured), or "bouncing" a key so that a digit is repeated. Also, there was no obvious error-recovery procedure if one began to enter the wrong number and then realized the error.

Second, compliance with our requests varied. Some participants, travelers in particular, used many phones of different types; some used one home and one work phone; some complied by using two handsets of different manufacture at the same extension. (In the latter case, when it became known to the project, callers were asked to key in a number like "9999999999" for one extension, and the correct phone number for the other.) In a few cases, we simply cannot determine whether more than one telephone instrument was used, because the same number was keyed in on every call.

Here are examples for a few of the callers who were asked to vary the handset they used.

    CALLER  PHONE NUMBER  COUNT(*)
    ------  ------------  ----------
    1013    2144243223     4
    1013    2145394862     6
    1013    2145745859     1
    1013    8068742424     1
    1013    8174976701    12
    1022    2142359387     1
    1022    2144120124    14
    1022    2144759048     1
    1022    2146800738     1
    1022    2146802232     1
    1022    2146906425     1
    1022    2149956257     5
    1028    7162716100     1
    1028    7162750661     1
    1028    7162750759     8
    1028    7164425557    15
    1028    7164425574     1
    1035    2146444314     9
    1035    2149171371    15
    1041    7035605000     8
    1041    7036202752    18
    1041    7036560500     1
    1043    8143793338     9
    1043    8143793361    13
    1043    9999999999     4
    1073    3153304581     1
    1073    3157620226    20
    1074    2147805813     1
    1074    8176663073     6
    1074    8177727098    19
    1104    2149959114    18
    1104    8174291805    10
    1112    2144230895     7
    1112    2145174227     2
    1112    2149171312    10
    1112    4059171312     1
    1112    4149626585     1
    1120    4122873879    10
    1120    8142263524    19
    1121    4013339846     1
    1121    5082229761     1
    1121    5086991823    21
    1121    5086993640     1
    1121    6033529215     1
    1121    6172477047     1
    1124    3013231010     1
    1124    3013231212     1
    1124    3015364327    18
    1124    3015430834     5

PROMPTS

The prompts were devised with several common sense criteria in mind:
covering many different topics of conversation; choosing subjects that interest large numbers of people, that tend to generate friendly differences of opinion or viewpoint, or invite exchanging of stories or shared experiences; and avoiding overlapping or subordinated topics and sensitive or personal issues as much as possible. Once approved, the prompts were recorded by an experienced female speaker at 20 kHz, then downsampled and transferred to the Intervoice disk. Attachment 2 is a complete list of the texts of the 70 prompts. In registering for SWITCHBOARD, participants were given a sheet containing all the Topic Descriptions, on which they could indicate whether they would be very interested, somewhat interested, not interested, or unwilling to talk about each one. (See Attachment 1.) These "topic preferences" were used in creating the TOPICPIN file described earlier, so that callers were matched on topics they both expressed interest in. DATA COLLECTION AND CONVERSION Returning to the section above entitled COLLECTION PROTOCOL, where a typical successful collection was described, let us follow the collection process from the point where the Robotoperator begins writing the speech from the T1 line to disk. All recording of speech on the Robotoperator is done in a special file called VOICE.VOX, which normally stores customer messages. The software keeps a pointer to the starting address of each "message" (in SWITCHBOARD, each side of each conversation) for later compression or extraction. SWITCHBOARD messages were kept in their original (uncompressed) 64 kbit mu-law form. The application which controlled the recording assigns numbers to these messages so that information in the database can later be attached to the proper message. 
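
For readers who want to work with the stored samples, here is a minimal Python sketch of expanding 8-bit mu-law bytes to linear values, together with the storage arithmetic implied by a 64 kbit/s stream. It assumes the standard G.711 mu-law encoding and a sampling rate of 8000 samples per second (8 bits x 8000 = 64 kbit/s); the function names are hypothetical and nothing here reproduces the Robotoperator's own software.

```python
BIAS = 0x84  # bias used by the standard G.711 mu-law expansion


def ulaw_to_linear(byte):
    """Expand one G.711 mu-law byte to a signed linear sample (about 14 bits plus sign)."""
    u = ~byte & 0xFF                 # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07       # 3-bit segment number
    mantissa = u & 0x0F              # 4-bit step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def decode_message(raw):
    """Decode a bytes object holding one side of a conversation."""
    return [ulaw_to_linear(b) for b in raw]


def side_bytes(seconds, sample_rate=8000):
    """Bytes needed to store one side at one byte per sample (assumed 8 kHz)."""
    return seconds * sample_rate
```

Under these assumptions a five-minute side occupies `side_bytes(300)`, i.e. 2,400,000 bytes. Any arithmetic on the signal, such as summing the two sides, would be done on the decoded linear values rather than on the mu-law bytes themselves.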
A program running on the PC caused the application to shut down each night at midnight, rebooted the PC, and ran a series of programs to extract the message files from VOICE.VOX and to transfer them and the database files via a network to the Speech Research Group computer system for further processing. Finally the application program was restarted for the next day's traffic. As described above, the CONVER file, written by the Robotoperator, contained the message numbers, speaker identification numbers, time of day, topic number, telephone number, and other information. A C program extracted this data and combined it with the binary message file to construct a single Unix speech data file with appropriate header, and also updated the Oracle database for that call. The Unix file contained both sides of the conversation, in mu-law format, with the data from the A and B sides interleaved. Thus playing back only the odd bytes (where the first byte is byte 1) resulted in hearing the A side, and only the even bytes the B side. Summing pairs of bytes produced the complete conversation. The Unix speech file was next played through a Sparcstation to a cassette recorder to produce an audio tape that could be sent out for transcription. Three recordings were made of each file: the combined version of the conversation, the A side only, and the B side only. This allowed transcribers to determine what was said during simultaneous speech. 9. How The Data Was Transcribed Approximately half of the transcriptions were done by court reporters, and half by transcribers working temporarily at TI. They were done from the audio tapes described above, following a transcription manual written just for SWITCHBOARD and revised several times over the course of the project. The text of the transcription manual follows as Attachment 3. The transcription style chosen had several goals. One was consistency, another was utility for research in speech and linguistics. 
Human readability, though not very important for most researchers, was also a consideration because it facilitates the later steps in the QC process. When no other principle ruled, court reporters' practice was followed. A number of symbols and conventions were borrowed from other projects, such as the London-Lund corpus and the AT&T transcription manual: use of (()) for doubtful words, use of "expr ... \expr" to enclose multi-word events, {} to enclose comments, -- for interruptions, etc. The marking of nonspeech events with [descriptor] was designed to signal the presence of acoustic events likely to bother a speech recognizer. The allowable expressions inside the [] were then limited to a fixed set in order to facilitate modeling classes of these events instead of a universal "garbage model." The comments in braces give information needed to understand events that are happening but are not clear from the text alone, as in {talks to child in room}. They should not represent acoustic events by themselves. Transcribers were not restricted in their use of these comments; they would also be a natural place for researchers to record and share their own comments (perhaps in double braces) as SWITCHBOARD is used for research. The treatment of simultaneous speech by bracketing the overlapping texts with pairs of #s evolved from a more complex scheme, which proved too difficult to enforce across several transcribers. Hundreds of files had to be corrected to the current standard, and the possibility of some inconsistencies cannot be ruled out. Transcribers were told to listen to the separate sides of every conversation as well as the joint version in order to resolve simultaneous talking. The RATINGS at the beginning of each conversation are an attempt to translate the extensive experience of the transcribers into rough indices of quality, subjective but potentially very useful. The instructions for using the rating scale are included in the Transcription Manual. 
These ratings have in most cases been reviewed at least once, during the QC phase; the QCer was considered the final authority, and was instructed to change any rating which differed from his or her own assessment by more than one. If, for example, an audio cassette tape was noisy because of a bad recording, the original transcriber might give a conversation a 4 on DIFFICULTY or BACKGROUND_NOISE for that reason. During QC, listening at a Sparcstation, the QCer would not hear the noise and should correct the rating, say to a 1. The presence of crosstalk was probably the most difficult of the ratings in terms of interobserver agreement, but on the whole still a reliable indicator. For example, an informal study of the first half of the corpus found that higher ratings (more crosstalk) were much more likely with local and intrastate calls, where echo cancelling is least likely, than on calls from more than 1000 miles away. See the Transcription Manual for further details. 10. The SWITCHBOARD Dictionary A dictionary of SWITCHBOARD was developed at TI as a byproduct of the automatic time alignment procedure. It was not part of the contract for SWITCHBOARD, and is not included in the first edition of the SWITCHBOARD corpus because it needs further work before being made public. If it is included in later editions, as is planned, it will be documented fully there. Nevertheless a brief description and sample entries are included here, since the dictionary did play a role in time alignment and QC. Each entry is a Prolog data statement, containing the spelling, a code for the part(s) of speech this word can be in English, a phonetic representation of one or more pronunciations the word may have, and a certification if the entry has been verified for spelling (v) or certified for accuracy by a linguist or other professional (c), and by whom. 
The surface phonetic level of representation uses a fairly common and widely accepted symbol set, with three levels of stress and some informal rules of syllabication. Phonetic elements are separated by commas, complete alternate pronunciations are separated by semicolons, but alternate subwords or phonetic units can be embedded with braces.

Note that, in order to accomplish the task of time alignment, it was necessary to enter as words many proper nouns and neologisms which would not belong in a dictionary otherwise. Of the 4893 "new words" encountered in SWITCHBOARD conversations, 49% are names. Here are some examples of entries:

lx("Weider","n",{2,w,iy,1,d,er},s).
lx("Noxy","n",{2,n,ao,k,1,s,iy},xs).
lx("nondefense","n",{1,n,ao,n,0,d,ih,2,f,eh,n,s},s).
lx("Andrea","n",{2,ae,n,0,d,r,iy,0,ah;1,aa,n,2,d,r,ey,0,ah},s).
lx("grandkid","n",{2,g,r,ae,n,d,1,k,ih,d},cs).
lx("Tijuana","n",{1,t,iy,0,ah,2,w,aa,0,n,ah},cs).
lx("nonproducing","g",{1,n,ao,n,0,p,r,{ow;ah},2,d,uw,0,s,ih,ng},cs).
lx("expedientially","a",{0,eh,k,1,s,p,iy,0,d,iy,2,eh,n,0,sh,ah,0,l,iy},cs).
lx("Rustoleum","n",{1,r,ah,s,t,2,ow,0,l,iy,0,ah,m},s).
lx("speckly","j",{2,s,p,eh,k,0,l,iy;2,s,p,eh,0,k,ah,0,l,iy},cs).
lx("uptight","nj",{1,ah,p,2,t,ay,t},cs).
lx("Ernie","n",{2,er,1,n,iy},s).
lx("Quayleisms","n",{2,k,w,ey,l,1,ih,0,z,ah,m,z},vs).
lx("stagflation","n",{1,s,t,ae,g,2,f,l,ey,0,sh,ah,n},vs).
lx("Colson","n",{2,k,ow,l,0,s,ah,n},vs).
lx("Amiga","n",{1,ah,2,m,iy,0,g,ah},s).
lx("Fortran","n",{2,f,ao,r,1,t,r,ae,n},s).
lx("formatter","n",{2,f,ao,r,1,m,ae,0,t,er},cs).
lx("Ian","n",{2,iy,0,ah,n},s).
lx("stepgrandmother","n",{1,s,t,eh,p,2,g,r,ae,n,d,0,m,ah,0,dh,er},cs).
lx("Shanahan","n",{2,sh,ae,0,n,ah,1,hh,ae,n},vs).
lx("eyebrows","pn",{2,ay,1,b,r,aw,z},cs).
lx("Logitek","n",{2,l,ao,0,jh,ih,1,t,eh,k},xs).
lx("retrofit","nj",{2,r,eh,0,t,r,ow,1,f,ih,t},xs).
lx("ROMs","pn",{2,r,ao,m,z},xs).
lx("great-grandad","n",{1,g,r,ey,t,2,g,r,ae,n,0,d,ae,d},xs).
lx("gups","pn",{2,g,ah,p,s},xs).
lx("undemanding","gj",{1,ah,n,0,d,iy,2,m,ae,n,0,d,ih,ng},xs).
lx("deisolation","n",{1,d,iy,1,ay,0,s,ow,2,l,ey,0,sh,ah,n},xs).
lx("overrecycled","f",{2,ow,0,v,er,0,r,iy,1,s,ay,0,k,ah,l,d},xs).
lx("cripe","x",{2,k,r,ay,p},xs).
lx("behaviorist","n",{1,b,iy,2,hh,ey,v,0,y,ao,r,0,ih,s,t},xs).
lx("Gibbs","n",{2,g,ih,b,z},s).
lx("trappings","pn",{2,t,r,ae,p,1,ih,ng,z},cs).
lx("laddervators","pn",{2,l,ae,0,d,er,1,v,ey,0,t,er,z},vs).
lx("baggies","pn",{2,b,ae,1,g,iy,z},s).
lx("Clarke","n",{2,k,l,aa,r,k},vs).
lx("nonissue","n",{1,n,{ao;aa},n,2,ih,0,sh,uw},cs).
lx("Fitz","n",{2,f,ih,t,z},s).
lx("Herbie","n",{2,hh,er,1,b,iy},s).
lx("Schenley","n",{2,sh,eh,n,0,l,iy},s).
lx("devaluing","g",{1,d,iy,2,v,ae,l,0,y,uw,0,ih,ng},cs).
lx("pisses","v",{2,p,ih,1,s,ih,z},cs).
lx("nonsafety","nj",{1,n,ao,n,2,s,ey,f,0,t,iy},cs).
lx("insignificant","nj",{1,ih,n,0,s,ih,g,2,n,ih,0,f,ih,0,k,ah,n,t},cs).
lx("multiuser","n",{1,m,ah,l,0,t,iy,2,y,uw,0,z,er},cs).
lx("freeware","n",{2,f,r,iy,1,w,eh,r},s).

11. How The Data Was Time Aligned

From the time SWITCHBOARD was first planned, two things were very clear: first, that the value of the corpus would be greatly enhanced by some form of time alignment between the speech signal and its transcription, and second, that the most desirable forms of alignment, e.g., word by word markings created or verified by human skill, would be far too expensive to justify. At the beginning of the project, therefore, two specifications were considered as possible cost effective alternatives: either mark in the transcript the time of each conversational turn, or indicate the time at regular intervals of about 5 or 10 seconds. At the time it was not thought feasible to mark stretches of several minutes of truly spontaneous conversational speech automatically, e.g., at the word level.
However, experiments conducted during the early months of SWITCHBOARD collection indicated that the technique of supervised recognition would probably succeed in aligning speech and text far more accurately than the specifications required, and at less cost. Beginning in July 1991, therefore, all the conversations were processed by this method, which is described briefly here. More details can be found in [1].

Each conversation in SWITCHBOARD has an orthographic transcription and a time-marked transcription. The time-marked transcription was generated using an automatic time alignment procedure involving the following steps:

1.) Create a supervision grammar from the orthographic transcription.
2.) Generate a grammar for each word in the transcription, based on an on-line dictionary and phonological rule set.
3.) Execute supervised recognition.
4.) Extract the timing information from the recognition output and merge it with the orthographic transcription.

From the orthographic transcription, we automatically generate a finite-state grammar uniquely characterizing the observed word sequence. This grammar dictates a strict linear progression through the text except for simultaneous speech, as discussed below. Nonspeech sounds, such as breath noises and laughter, are also indicated in the transcription but are not explicitly represented in the top level grammar; however, the grammar does have self-loops at each node, i.e., initially, finally, and between each pair of words. Acoustic models trained on the Texas Instruments Voice Across America (VAA) long-distance telephone corpus [2] are used for silence, inhalation, exhalation, and lipsmacks, while all other nonspeech sounds are accommodated through the use of a score threshold which automatically classifies as nonspeech any input frame not sufficiently close to any candidate recognition model.
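
The shape of the top-level supervision grammar described above can be sketched in a few lines of Python. This is only an illustration of a strictly linear word grammar with a nonspeech self-loop at every node; it is not the TI implementation, it leaves out the alternate paths used for simultaneous speech, and the label `<nonspeech>` and both function names are assumptions made for the example.

```python
NONSPEECH = "<nonspeech>"  # hypothetical label for silence, breath noise, etc.


def build_supervision_grammar(words, nonspeech=NONSPEECH):
    """Build a linear finite-state grammar over `words`.

    Node i is the state reached after i words. Each node carries a
    self-loop that absorbs nonspeech; each word is one forward arc.
    """
    arcs = []
    for i, word in enumerate(words):
        arcs.append((i, i, nonspeech))      # self-loop before word i
        arcs.append((i, i + 1, word))       # the only way forward
    final = len(words)
    arcs.append((final, final, nonspeech))  # self-loop after the last word
    return {"start": 0, "final": final, "arcs": arcs}


def accepts(grammar, tokens):
    """Return True if `tokens` is a complete path through the grammar."""
    state = grammar["start"]
    for token in tokens:
        targets = [dst for src, dst, label in grammar["arcs"]
                   if src == state and label == token]
        if not targets:
            return False
        state = targets[0]
    return state == grammar["final"]
```

A transcript of n words yields n + 1 nodes and 2n + 1 arcs, so the grammar accepts the word sequence in its original order only, with any amount of nonspeech initially, finally, and between words.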
Each word in a conversation generates a finite-state grammar representing one or more pronunciations, which are obtained from an on-line dictionary. A separate path through the word-level grammar is generated for each alternate pronunciation represented in the dictionary. In addition, alternate paths are added for optional variants derived by applying phonological rules, such as alveolar stop flapping. All the steps in conversation-level and word-level grammar creation are fully automated. The sole manual operation in the time-alignment procedure is adding new words to the dictionary as they occur in conversations. Initially, each conversation required the addition of 20-25 words, but this rate decayed rapidly to about 2 words per conversation, most of them proper nouns. Word pronunciations are realized in terms of a set of context-independent phoneme models. These phoneme models are continuous-density HMMs that have been trained for speaker-independent recognition of long-distance telephone speech on 1,000 phonetically balanced sentences (based on TIMIT sentences) in the VAA corpus. Each phoneme has two variants, one trained on male speakers and one on female speakers. The sex of each speaker determines which set of phoneme variants is specified in the supervision grammar. Each conversation is time-aligned by a hierarchical-grammar speech recognition algorithm [3], using the corresponding conversation, word, and phoneme models. The recognizer outputs the beginning time and duration for each word. Since the recognition models use 20 millisecond frames, all times are in multiples of 0.02 seconds. The recognition output is then combined with the original transcription to produce a time-marked transcription showing speaker turns. Two interrelated issues that arose in defining this procedure are use of the combined-channel signal versus the two single-channel signals, and treatment of simultaneous speech. 
For reasons of cost and time efficiency, the combined-channel signal was used, since aligning each channel separately would require twice the processing time. In addition, alignment of the single-channel signal is vulnerable to errors associated with the "silent" portions of each signal, i.e., the times when the other participant was speaking. For example, some conversations contain considerable cross-channel echo, resulting in a relatively strong speech signal not reflected in a supervision grammar representing only one side of the conversation. This unrepresented signal tends to introduce spurious alignments, resulting in overall alignment failure. Aligning the entire conversation with the combined-channel signal, however, requires an effective method of handling simultaneous speech segments. Stretches of simultaneous speech are labeled as such during transcription, but it is not generally feasible to specify a precise interleaving of words during simultaneous speech. Hence, a simple nonbranching supervision grammar based directly on the transcription would not yield satisfactory alignment performance. The solution was to insert alternate paths in the grammar for the duration of the simultaneous speech portion. Constrained by such a grammar, the recognizer aligns the words for one participant or the other, but not both; it automatically selects between the two paths, based on which aligns better. This method was successful in enabling the alignment procedure to handle simultaneous speech without going astray. The disadvantage is that it yields word-level timing data for only one participant during simultaneous speech segments. However, since simultaneous speech is typically rather brief, even the unaligned words are localized to a small stretch of time. The automatic time-marking procedure seems to be fairly robust. Out of about 2,500 files, only 12 had to be marked manually for at least some portions of the file. 
The primary failure mode in these files is an extremely quiet speaker; when the energy level is exceptionally low, the alignment process may fail to find expected words, resulting in overall alignment failure. The accuracy of the automatic alignment was estimated by marking 10 randomly selected 30-second excerpts by hand and comparing the results with the automatically determined times. Table 1 shows the difference between hand-marked and automatically marked word alignments, measured separately for word beginning times, word ending times, and word durations. For all data, the mean difference in beginning and ending times is approximately one frame (0.02 second). For 95% of the words, the mean difference is 0.005 second or less, with a standard deviation of approximately three frames or fewer. Independent support for this level of accuracy is provided by comparisons performed at NIST, where keywords occurring in a selected subset of the corpus were marked by hand and the times compared with the automatically generated times. About 95% of these words were marked "correctly" in the sense that the centroid of the word according to the automatic marking fell within the hand assigned beginning and ending times. 
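
The two comparisons described above can be expressed as a short Python sketch. It is offered under stated assumptions only: the "centroid" of an automatically marked word is taken to be the midpoint of its begin time and duration, differences are computed as automatic minus hand-marked times (the sign convention is not stated in this manual), and the function names are hypothetical.

```python
from statistics import mean, stdev


def centroid_within(auto_begin, auto_duration, hand_begin, hand_end):
    """Check whether the midpoint of the automatic marking falls inside
    the hand-assigned begin and end times (all values in seconds)."""
    centroid = auto_begin + auto_duration / 2.0
    return hand_begin <= centroid <= hand_end


def difference_stats(auto_times, hand_times):
    """Mean, standard deviation, and range of (automatic - hand) differences."""
    if len(auto_times) != len(hand_times):
        raise ValueError("time lists must have the same length")
    diffs = [a - h for a, h in zip(auto_times, hand_times)]
    return {
        "mean": mean(diffs),
        "std_dev": stdev(diffs),        # sample standard deviation
        "range": (min(diffs), max(diffs)),
    }
```

`difference_stats` would be called once each for word beginning times, ending times, and durations to fill the three columns of a table like Table 1.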

----------------------------------------------------------------------------
  Differences (sec)  |    Begin Times    |   End Times    |    Durations
---------------------|-------------------|----------------|-----------------
ALL       | Mean     |      -0.019       |     -0.022     |     -0.003
DATA      | Std Dev  |       0.134       |      0.137     |      0.080
(N=1025)  | Range    |   -1.60 to 0.51   | -1.77 to 0.54  |  -0.62 to 0.42
          |          |                   |                |
----------|----------|-------------------|----------------|-----------------
EXCLUDING | Mean     |      -0.005       |     -0.004     |     -0.001
OUTLIERS  | Std Dev  |       0.048       |      0.050     |      0.064
(N=975)   | Range    |   -0.22 to 0.22   | -0.22 to 0.21  |  -0.22 to 0.22
          |          |                   |                |
----------------------------------------------------------------------------

As the third row in Table 1 shows, the remaining 5% exhibit wider variation; in a few cases, alignment errors exceeded 1.5 seconds. Examination of these cases indicated that the failures were attributable to exceptionally prolonged words. The acoustic models used for time marking are finite-duration models, which are generally more robust than infinite-duration models for telephone-quality speech. However, such models impose a maximum duration on each word, leading to errors when the input violates the durational assumptions built into the models.

In summary, although the time alignments provided with SWITCHBOARD are approximately two orders of magnitude more accurate than the original specifications, they still leave room for improvement. As experiments are performed with SWITCHBOARD, researchers who refine the time alignments provided should contact NIST so that these improved measurements can be incorporated into future versions of the corpus. See Section 14 below for more information.

12. Quality Control Procedures

The transcription quality control procedures began with the daily taping of the previous day's conversations from the digital files which were downloaded from the Robotoperator to the Unix network each night.
(The cassette recordings were needed because most transcriptionists work on analog equipment which allows them to stop and replay short sections rapidly, change speeds over a useful range, etc.) These files were already converted to a format used by the TI Speech Research Group. A technician at TI loaded a cassette tape into a deck, then executed a script on her Sun Sparcstation which played back first the complete conversation (i.e., the algebraic sum, sample by sample, of the two sides), then the A side alone, then the B side alone, so that all three versions were recorded on one side of a cassette with long pauses separating them.

Note that the speech file had already been named automatically according to the convention: CONV#_SPKRA_SPKRB.ext, as in 4940_1423_1662, where speaker 1423 called in, speaker 1662 answered, and conversation number 4940 took place.

While the recording took place, the technician listened to the conversation, checked for problems, verified the information in the filename, and labeled the cassette for the transcriber. Problems such as excessive noise or apparent technical glitches, inappropriate behavior by callers, or misidentification of speakers were reported to the project manager.

Since the speakers were already registered on the database, the technician listened for whether the sex of each speaker matched the database. With this and other available information, she attempted to verify that the speaker ID numbers in the filename on the tape label were correct and in the right order: the speaker whose number occurs first in the filename should be the person who called in, who would be labeled "A" in the transcript. Note that this is not necessarily the person who speaks first in the combined recording, hence the possibility of confusion if a transcriber does not listen to the A and B sides alone.
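
The naming convention CONV#_SPKRA_SPKRB.ext lends itself to a simple check of the kind the technician performed by hand. The following Python sketch splits such a name into its three parts; the function name and the returned dictionary keys are hypothetical, and the extension is simply discarded because this section does not say which extensions were in use at that stage.

```python
import os


def parse_speech_filename(name):
    """Split CONV#_SPKRA_SPKRB[.ext] into conversation and speaker IDs.

    The first speaker ID is the person who called in ("A" in the
    transcript); the second is the person who was called ("B").
    """
    stem = os.path.splitext(os.path.basename(name))[0]
    parts = stem.split("_")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError("not a CONV#_SPKRA_SPKRB name: %r" % name)
    conversation, speaker_a, speaker_b = parts
    return {"conversation": conversation,
            "speaker_a": speaker_a,
            "speaker_b": speaker_b}
```

For the example name 4940_1423_1662 this returns conversation 4940 with speaker 1423 as A and speaker 1662 as B, which can then be compared against the registration database.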
The technician then filled out an electronic form with information to help the transcriber: the topic prompt, the speaker assignments, and the first few words spoken by "A" and by "B". This form accompanied the tape when it was sent out. When a transcription returned from the contractor, it was processed by an _awk_ program which identified obvious format errors, missing information in the header or ratings, illegal bracketed expressions, etc. The program corrected some of the minor errors, and flagged others to be reworked. Next the transcript was run through a Unix spell checker (ispell) and a local version which checked the SWITCHBOARD dictionary. Words unknown to both programs were flagged for correction (if misspelled) or entry into the dictionary. Any serious problems up to this point were resolved by listening to the conversation and fixing the transcript accordingly. The automatic time alignment of the speech file with its transcription took place next. If this did not succeed, the reason was sought by listening and checking the transcript for errors. After time alignment each conversation was audited from beginning to end while reading the transcript, checking for misidentification of speakers (e.g., switching of A and B during the transcript), and looking for errors of language, spelling, or format. A checklist of the most common kinds of errors (its, it's, they're, their, and the like) was made up for this task. Finally, a rough check of time alignment was made by playing samples of the speech file at several places early, mid, and late in the file; the playback times were taken from the ".marked" file, and the task was to verify that the words heard were the ones covered by those times in the file. Errors of up to a second or so were considered tolerable; usually, errors of that magnitude or greater revealed more general problems requiring that the file be reprocessed. 13. 
Technical Problems in Collection

In spite of all precautions and guarantees, a few technical problems did occur with the collection system. These fall into two major categories: digital noise, or "static", and loss of synchrony between the A and B sides of conversations.

STATIC: During the first two months of collection, one of the four telephone interface cards began to fail intermittently. When this failure occurred, data from the affected channel was replaced by apparently random values, which are heard as very loud static, for periods ranging from a few samples up to several seconds at a time. The same type of noise is occasionally heard on the public network when a T1 line loses synchrony. For this reason, and because most calls were not affected, collection continued for over a month before the problem was traced to the interface and the hardware was replaced. Conversations collected before March 23, 1991, which contained more than a few seconds of the noise were later dropped from the SWITCHBOARD collection. Those with only short episodes were retained; the occurrences are marked in the transcripts with the notation [static], and the noise itself should not be qualitatively different from what may be encountered on the network, and indeed elsewhere in the SWITCHBOARD collection.

LOSS OF SYNCHRONY BETWEEN A AND B: In this category there were four problems. Unfortunately, they were subtle and difficult to detect; fortunately, once detected, they could be corrected to a degree that should satisfy most users. All four synchrony problems had one common symptom: an unusual time lag between the speech signal on one side of a conversation and its echo on the other side. This time lag could be unusual in magnitude, direction, or variability. The problems were thus discovered serially, in the course of searching for the cause of the time lag anomalies.

i.) The first problem was an asynchronous startup of recording between A and B.
In the original specifications, "simultaneous" startup was called for, but the need for scientific precision was not understood by the applications programmers at Intervoice. As a result the recording of the A side of each call was being started either 55, 110 or 165 ms after the B side. The delay was due to the nature of communication between the application software and the microcode (IDSP) which runs on the interface boards. This communication depends on DOS pseudo-multitasking protocols, which allocate time slices to servicing each phone line. The number of instructions to be executed between the time the first (higher, outbound) phone line starts recording and the time the second line gets its instruction to begin would differ depending on a number of factors. The original applications programmer did not check either the delay (which, had it been constant, would not have caused a problem) or the consistency of the delay. As it turned out, the amount of time required to perform various housekeeping tasks, after detecting the DTMF signal to start recording, was not constant. In some cases the recording of A started in the next time slice (55 msec delay), in other cases in the second time slice (110 msec delay), and on a few occasions in the third (165 msec delay). As a result, when the two sides of the conversation were played together, if there was strong echo on one or both sides, it could be heard distinctly, especially at the longer delays. Moreover, in the case of echo on B's side from A's speech, it would appear to lead the speech rather than lag it. A solution was finally found which invariably started the recordings within a few samples of each other. It involved re-writing some routines in the application program so that all the housekeeping functions were performed first, before recording began. Files containing this error were corrected as described below under "Corrections." ii.) The second problem was more serious but quite rare. 
It manifested itself as relatively large changes in synchrony between the A and B sides of a conversation, caused by losses of 100 ms or more of data on one side at a time. It is apparently caused by contention for the Robotoperator disk when other programs were run on the PC while a call was being recorded. InterVoice engineers did not realize that this condition could occur, and during the early weeks of the project they conducted tests while calls were being collected. Executing DOS commands such as "DIR" while recording was in progress in some cases caused the application program to stop recording and then resume recording without any indication of trouble, so that speech data was lost. The condition was discovered when InterVoice tried to create worst-case conditions to investigate the third problem, described below. Three files were found with this defect, and they were eliminated from the corpus. If users identify others, they should notify NIST immediately. iii.) The third problem was small changes in synchrony between A and B, due to a pseudorandom dropping of 2 ms chunks of data on either side. Over the course of a 10 minute conversation, these could accumulate to a differential of 30 or 40 msec between sides--enough to change a cross-channel echo from inaudible to audible, for example, or from barely audible to very noticeable, for a human listener. When this bug was finally run down, it turned out to be a piece of code in the utility which extracts conversations ("messages") from the Robotoperator message master file. The code performed a check at each data block boundary to see if the first two bytes had the values "FF FF"; if so, these were interpreted as header information, and the 16 bytes beginning with "FF FF" were discarded as not part of the speech data. This code was a relic from an earlier version of the Robotoperator which did not deal with mu-law values, and thus never encountered FF in data. 
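The effect of the header check described above can be illustrated with a short Python sketch. This is a reconstruction for illustration only, not the Robotoperator utility's code: the function names and the block size are hypothetical, and the 8 kHz, one-byte-per-sample rate is an assumption chosen because it makes a 16-byte discard equal to the 2 ms chunks mentioned in the text.

```python
SAMPLE_RATE_HZ = 8000  # assumption: 8-bit mu-law, one byte per sample
DISCARD_BYTES = 16     # from the text: 16 bytes discarded per supposed header


def extract_with_legacy_check(data: bytes, block_size: int) -> bytes:
    """Copy data block by block, dropping 16 bytes whenever a block starts with FF FF.

    This mimics the faulty behavior: the check cannot tell a real header
    from speech data that happens to begin with the bytes FF FF.
    """
    out = bytearray()
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        if block[:2] == b"\xff\xff":
            block = block[DISCARD_BYTES:]  # treated as a header and thrown away
        out += block
    return bytes(out)


def lost_milliseconds(original: bytes, extracted: bytes) -> float:
    """Duration of the discarded data at the assumed sample rate."""
    return 1000.0 * (len(original) - len(extracted)) / SAMPLE_RATE_HZ
```

Under these assumptions, feeding the function 1024 bytes of FF with a hypothetical block size of 256 removes 16 bytes at each of the four block boundaries, a loss of 64 bytes or 8 ms, while data that never begins a block with FF FF passes through unchanged.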
In mu-law data, FF is one of two ways of representing zero signal level ("minus zero"). The offending lines of code were removed and the problem ceased. There is, of course, only one circumstance in which the chances of seeing "FF FF" at a data block boundary are good--during long stretches of total silence. Since the SWITCHBOARD recordings included the silent times when the other speaker is talking, 2-msec chunks were dropped fairly often but not necessarily symmetrically, resulting in changes in the synchrony between the two sides of some conversations.

With respect to the changes caused by this phenomenon, the recorded conversations fall into three classes.

a.) Conversations with strong crosstalk on both sides typically show no loss of data, since there were no long silences.

b.) In conversations with weak or no crosstalk on both sides, any loss of data is likely to be fairly symmetrical, and could not be detected in any case, since it is basically a dropout of silence in the midst of silence. These calls appear unaffected, and should pose no problem for research purposes.

c.) Conversations with strong crosstalk on one side typically show slippage in the 10-50 msec range over the entire file, which is detectable because of the changing lag in the crosstalk. These are corrected as described below under "Corrections."

CORRECTIONS. The files known to be subject to these problems (all files numbered less than 2453) were processed at NIST to correct the asynchronous offset and the slippage. The A and B sides were compared with a cross correlation measure at various delays. The lag time which showed the best peak in the correlation function between speech on one side and its echo on the other was measured throughout the file. Early in the file, this lag was a good estimate of the initial offset (55, 110, 165 msec), which was corrected by removing that amount of data at the beginning of the B side.
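The lag measurement and offset correction described so far can be sketched in Python as follows. The details of the NIST procedure are not given here, so this is only a brute-force illustration under stated assumptions: samples are taken to be linear (already decoded) values at 8 kHz, and the names ms_to_samples, best_lag, and trim_start are hypothetical.

```python
SAMPLE_RATE_HZ = 8000  # assumption: 8 kHz telephone speech


def ms_to_samples(milliseconds: float) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(milliseconds * SAMPLE_RATE_HZ / 1000.0)


def best_lag(speech, echo, max_lag):
    """Return the lag in samples, within [-max_lag, max_lag], that maximizes
    the cross-correlation sum(speech[i] * echo[i + lag]).

    A positive result means the echo trails the speech by that many samples.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        lo = max(0, -lag)
        hi = min(len(speech), len(echo) - lag)
        score = sum(speech[i] * echo[i + lag] for i in range(lo, hi))
        if score > best_score:
            best, best_score = lag, score
    return best


def trim_start(samples, offset_samples):
    """Remove offset_samples from the beginning of one side of a conversation."""
    return samples[offset_samples:]
```

With these assumptions one 55 msec time slice corresponds to 440 samples, so an early-file lag estimate near 440, 880, or 1320 samples would point to a startup delay of one, two, or three slices. The sketch searches one window only; measuring the lag "throughout the file" would mean repeating the search over successive windows and watching for changes.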
Later in the file, whenever this lag time changed it was considered evidence of the loss of data in 2 msec increments from a silent period, and 2 msec of silence was inserted in an appropriate place on the side that had been shortened. In files with not enough crosstalk to determine a lag in the cross-correlation, only a 55 msec offset correction was made. 14. How to Report Errors Switchboard users discovering any kind of error in the corpus should fill out the following form, which is available via ftp in "/bugs/data/doc". The form with the error report should be emailed to "debugger@jaguar.ncsl.nist.gov". SWITCHBOARD CORPUS ERROR REPORT Conversation: Start time: End time: Problem Description: Suggested Solution: Revised files: Revised tables: Revised documents: 15. References [1] B. Wheatley, G. Doddington, C. Hemphill, J. Godfrey, E.C. Holliman, J. McDaniel, and D. Fisher, "Robust Automatic Time Alignment of Orthographic Transcriptions with Unconstrained Speech," Proc. ICASSP-92, Vol. I, 533-536, 1992. [2] B. Wheatley and J. Picone, "Voice Across America: Toward Robust Speaker-Independent Speech Recognition for Telecommunications Applications," Digital Signal Processing 1:2, 1991. [3] G.R. Doddington, "Phonetically Sensitive Discriminants for Improved Speech Recognition," Proc ICASSP-89, 1989. ======================== ATTACHMENTS ============================ ATTACHMENT 1: Contents of the SWITCHBOARD registration packet: letter, signup sheets, consent form, schedule form, topic selection form. TEXAS INSTRUMENTS Speech and Image Understanding Laboratory Switchboard Information and Sign-up Package In the future computers will understand human speech, and you can contribute towards this goal by participating in the Switchboard Speech Database. Participation involves taking part in brief, natural conversations over the telephone with others. Your speech will be recorded and used to develop speech technology. 
The conversational topics will be drawn from a list you express interest in. No expertise, just basic conversational skills are expected. The calls are free, and you will be compensated for participating. You will find the details in the following pages. To sign up electronically, you may fill out the following pages and email them back to the above email address. You will receive an "official" signup packet in the mail within the next few days. We will need a signed consent form from the packet so you should send back to us either a hardcopy of this filled-out electronic version or the "official" filled-out copy (be sure to sign either one). NO PAYMENT WILL BE MADE FOR ANY CALLS PLACED UNTIL WE RECEIVE YOUR SIGNED CONSENT FORM. We are seeking many participants with a wide variety of American English speaking patterns. If you know of other males between the ages of 20 and 60 who would be interested, please copy and pass this information on, or have them contact: Texas Instruments, Inc. ATTN: John J. Godfrey Speech Research Group, MS 238 P.O. Box 655474 Dallas, TX 75265 (214) 995-0651 Email: swboard@csc.ti.com Thank you. Your help is much appreciated. Sincerely, John J. Godfrey SWITCHBOARD INFORMATION PARTICIPANT REQUIREMENTS We are seeking speakers with the following qualifications: 1. Your first language is American English. 2. You are a male between 20 and 60 years of age. 3. You can comfortably converse with people you don't already know. 4. You have access to a touchtone (not pulse or rotary) telephone. HOW TO PARTICIPATE The calls are computer guided, and the steps are easy to follow. Either you call our 800 number or you will be called; the computer will connect you with another participant, tell you the topic, and record the conversation. You will need to enter an identification number using the touchtone keypad. CONVERSATION REQUIREMENTS In this package you will select conversational topics that you find interesting and comfortable to talk about. 
You are not expected to be an expert on topics, but simply able to: a) converse for about 5 minutes in a natural manner and b) stay on the assigned topic (as much as possible). The number of calls you may participate in will depend on many factors, such as the number of participants, your topics, and the times you are available. The average number of conversations per person is 10. You may make 1 call per day, starting as soon as you are notified in the mail.

COMPENSATION OPTIONS FOR CALLS COMPLETED

Participants will receive thank-you gifts or payment of $5 for each completed call. Texas Instruments employees, as well as others whose circumstances do not permit them to receive payment, should choose the "gift" option, or they may decline both. The number and type of gifts will vary with the number of conversations completed.

DISTRIBUTION OF THE SWITCHBOARD SPEECH DATABASE

Your speech will be recorded, transcribed, and made available for research and development of speech technology. It will be archived at the National Institute of Standards and Technology (NIST) in Maryland. Your name will not be released with the database.

WHO TO CONTACT

Contact the Speech Research Branch (214-995-0785) if you have any questions. If you have problems with a conversation that was recorded, call within 5 days, and it will be erased.

SWITCHBOARD SIGN-UP SHEET

I. GIFT/CASH INFORMATION

   A. Are you an employee of Texas Instruments (TI)? _____

   B. Which compensation option do you select? (TI employees may not receive cash.)

   C. Address for receiving your gift or check:

      Name                    _______________________________________________
      Street Address          _______________________________________________
      City, State, Zip code   _______________________________________________
      Social Security Number  _______________________________________________

II. BACKGROUND INFORMATION

   A. Are you a man ____ or a woman ____?

   B. Birth year: 19____

   C. Highest educational level achieved: _______________________

   D.
Where did you grow up during your first 10 years? _____________________________________________________________________ III. Legal Consent Statement I have read and understood the attached description of the Switchboard Speech Database collection project. I consent for Texas Instruments, Incorporated (TI) to record and monitor my voice over the telephone during computer-controlled conversations with other participants. The recordings and transcripts of my speech will be part of a publicly available database; universities, government laboratories, contractors, and other qualified persons will be able to use them for research and development of automatic speech recognition, speech understanding, and speaker identification. TI agrees to protect my privacy by not telling anyone who receives the recordings which ones are mine. My name, address, telephone number(s), and social security number will not be released with the speech database. I understand that this is work for hire; I will be given a gift or payment for each conversation completed according to the requirements; this comprises TI's complete obligation to me. Participant's signature: ___________________________ Participant's printed name: ___________________________ Date: ___________________________ TIMES AVAILABLE FOR PARTICIPATION On the calendar please fill in the time periods you expect to be available for participating and the appropriate phone numbers. Your times should be between 6 A.M. and 11 P.M. Central Time. Please include A.M. and P.M. when you specify your times. 
WEEKLY CALENDAR Starting Times Ending Times Phone Number MON ______________ ____________ ____________ ______________ ____________ ____________ ______________ ____________ ____________ TUES ______________ ____________ ____________ ______________ ____________ ____________ ______________ ____________ ____________ WED ______________ ____________ ____________ ______________ ____________ ____________ ______________ ____________ ____________ THURS ______________ ____________ ____________ ______________ ____________ ____________ ______________ ____________ ____________ FRI ______________ ____________ ____________ ______________ ____________ ____________ ______________ ____________ ____________ SAT ______________ ____________ ____________ ______________ ____________ ____________ ______________ ____________ ____________ SUN ______________ ____________ ____________ ______________ ____________ ____________ ______________ ____________ ____________ Your area code: __________ Please circle your time zone: Eastern Central Mountain Pacific SWITCHBOARD TOPICS Please select the topics you are interested in discussing from the list below by deleting those which you are not interested in. We recommend you select at least 15 topics in order to increase the likelihood of quickly matching your topic preferences, time schedule, and availability with those of other callers. AIDS Air Pollution Auto repairs Baseball Basketball Boating and sailing Buying a car Camping Capital punishment Care of the elderly Child care Choosing a college Clothing and dress Computers Consumer goods (appliances, etc.) Crime Drug testing Elections and voting Ethics in government Exercise and fitness Family finance Family life and activities Family reunions Federal Budget Fishing Football Golf Gun control Home Repair Immigration Job benefits Latin America Magazines Metric system Middle East Music News media Painting (e.g. 
house painting) Pets Politics Public education Recycling Right to Privacy Social Change Soviet Union Space flight and exploration Taxes Trial by jury Universal health insurance Universal public service Vietnam War Woodworking

Your preferences will be followed as much as possible; however, you may be asked to speak on topics that you have not selected. In such cases you may continue with the call and do your best, or abort the call at the start by simply hanging up.

ATTACHMENT 2: Prompts used to start SWITCHBOARD conversations.

PROMPT#: DESCRIPTION -- Prompting text
-------- -----------    --------------

353: PUBLIC EDUCATION -- Discuss with the other caller whether there is something seriously wrong with our public school systems today, and if so, what can be done to correct it.

354: DRUG TESTING -- How do you feel about the practice of some companies or government agencies testing employees or prospective employees for drugs? Is random spot testing justified? What limits should there be, if any?

359: FEDERAL BUDGET -- What short and long-term steps do you and the other caller think should be taken to improve the US budget?

360: FISHING -- Find out what kind of fishing the other caller enjoys. Do you have similar or different interests in the kind of fishing you enjoy?

361: GARDENING -- Find out what the other caller does in the way of lawn and garden work. Does the other caller enjoy doing it? Compare this to your own situation.

365: BASEBALL -- Find out the other caller's favorite pro baseball team and where it's headed this year. Do you agree with the caller's prediction?

366: CONSUMER GOODS -- Find out from the other caller whether they have had to return a product they bought recently. Are consumer goods generally getting better or worse in quality?

317: AFFIRMATIVE ACTION -- Do you think affirmative action in hiring and promotion is a good policy for private industry? Will it accomplish the government's goals?
Can you distinguish between affirmative action and a quota system?

318: AUTO REPAIRS -- What was the last auto repair you performed or had done on your car? Are there some types of repairs or maintenance tasks you prefer to do yourself? Discuss your experiences in this area with the other caller.

301: AIDS -- Please discuss funding for AIDS research. Should the US spend more, less, or about the same amount of money it currently is? Why do you think so?

302: AIR POLLUTION -- Please discuss air pollution. Find out what substances the other caller thinks contribute the most to air pollution today. What can individuals or society do to improve air quality?

303: CLOTHING AND DRESS -- The topic is clothing. Please find out how the other caller typically dresses for work. How much variation is there from day to day? How much variation is there from season to season?

304: CREDIT CARD USE -- Please discuss credit cards. Find out how the other caller makes use of credit cards. How do they compare to your own?

305: CARE OF THE ELDERLY -- Please discuss care of the elderly. Find out how the other caller feels about sending an elderly family member to a nursing home. What should one know about the nursing home environment when making this decision?

306: RECIPES, FOOD, COOKING -- Please discuss food and cooking. What foods would you include in the menu for a dinner party? Share the recipe for one of these foods with the other caller.

307: FOOTBALL -- Please discuss professional football. Find out the other caller's favorite pro football team and where it's headed this year. Do you agree with the caller's prediction?

308: MUSIC -- Please discuss music. Can you find musicians, singers, instruments, or types of music that both you and the other caller like?

309: PUERTO RICAN STATEHOOD -- The topic is Puerto Rico. Please find out whether the other caller favors statehood, independence, or the status quo for Puerto Rico. Why?
338: SOVIET UNION -- Find out whether or not the other caller considers the Soviet Union a threat to the United States. Take an opposing view in your discussion with the other caller.

339: TV PROGRAMS -- Find out what the other caller's favorite TV shows are and why. Are your interests similar or different?

340: TAXES -- Talk about whether Americans, like you, are paying too much in taxes -- be it taxes in general or income tax. You might discuss whether Americans in general get back what they pay for.

341: TRIAL BY JURY -- Discuss possible changes in the way trials by jury are conducted. For example, what do you and the other caller think about leaving the sentencing to the judge? Must criminal cases require unanimous verdicts?

343: HOUSES -- Find out about the other caller's home. Is it a typical home for the area? How does it compare to your home?

344: IMMIGRATION -- Find out how the other caller feels about America's immigration policy. If there are problems, what might the solutions be?

346: LATIN AMERICA -- What do you think about current or recent American actions in Latin America, or about our policy toward that part of the world?

348: MOVIES -- Find out what the other caller thought about the last few movies they saw. What movies have you seen lately?

349: NEWS MEDIA -- Discuss how you and the other caller keep up on current events. Do you get most of your news from TV, radio, newspapers, or people you know? ARE YOU SATISFIED WITH THE QUALITY OF COVERAGE?

351: PETS -- Find out what kind of pets the other caller has, if any. Discuss in general why people keep pets.

325: COMPUTERS -- Find out the other caller's preference and level of interest in personal computers. How does it compare to your interest and preference?

327: UNIVERSAL PUBLIC SERVICE -- See how the other caller feels about the proposal that all young Americans should spend a year or two doing some kind of public service, such as joining the Peace Corps.
328: VIETNAM WAR -- Try to find out what the other caller's views are on the Vietnam War. Was it justified? Was it worth the cost in dollars and lives?

329: WOMEN'S ROLES -- Discuss the changes in the roles of women in American society over the past generation or two. Which changes have been the most significant? Do you have an opinion on what further changes will take place over the next generation?

330: DIRECTIONS -- Get directions from the other caller on how to get from their place of work to the nearest major airport.

331: FAMILY REUNIONS -- Discuss planning a family reunion. Draw on your experiences and those of the other caller for making the next get-together successful and memorable.

332: HOME REPAIRS -- Find out what the last home repair or remodeling project the other caller undertook. How successful was it? How does it compare to your own experience?

333: VOTING -- Find out from the other caller whether or not they think that low voter turnout in American elections is a serious problem. Should anything be done to raise voter turnout?

334: SOCIAL CHANGE -- Discuss recent social changes. How is life in America different today compared to living ten, twenty, or thirty years ago?

336: RIGHT TO PRIVACY -- Find out what everyday occurrences the other caller considers to be an intrusion of privacy. What can be done to prevent them? Do you agree or disagree?

310: VACATION SPOTS -- Please discuss types of vacations and trips you enjoy. Find out whether the other caller can interest you in a vacation spot you haven't visited.

311: BOOKS AND LITERATURE -- Find out what books the other caller reads for enjoyment or self-improvement. Do you have similar or different interests in books?

312: CRIME -- Discuss crime in American cities today. What are your concerns and the concerns of the other caller? What steps can be taken to reduce crime?

313: WEATHER AND CLIMATE -- Discuss the weather. What has it been like in your area? Has it been typical for this time of year?
Compare it with the other caller's weather.

314: GUN CONTROL -- Discuss gun control. Where do you and the other caller stand on a scale from 1 to 10, with 1 being a total ban on firearms and 10 being no restrictions on any kind of weapon?

315: MIDDLE EAST -- Find out what the other caller thinks about current US policy in the Middle East. Should US policy be changed or not?

316: RESTAURANTS -- What kind of dining out do you enjoy? What things do you look for in a restaurant that would get you to go back again? See whether the other caller's preferences are similar to yours.

319: BASKETBALL -- Find out the other caller's favorite pro basketball team and where it's headed this year. Do you agree with the caller's prediction?

321: CAMPING -- Find out from the other caller what kind of camping they have done. How does it compare with your own experiences?

323: CHILD CARE -- Find out what criteria the other caller would use in selecting child care services for a preschooler. Is it easy or difficult to find such care?

324: CHOOSING A COLLEGE -- What advice or experience can you offer to a parent on how to help a son or daughter choose a college to attend?

320: BUYING A CAR -- What kind of car do you think you might buy next? What sorts of things will enter into your decision? See if your requirements and the other caller's requirements are similar.

322: CAPITAL PUNISHMENT -- Compare your opinions and those of the other caller on capital punishment. Do either of you think it should be restricted to certain crimes or circumstances? How do the policies and practices of your state fit with your opinions?

335: RECYCLING -- What is being done in your community or area about recycling waste materials? Do you think more should be done? Do you have any ideas on how to encourage more recycling or on what other materials should be included?

337: SAVINGS AND LOAN BAILOUT -- What do you think were the causes of the current savings and loan crisis?
Do you believe that the problem is mostly under control? Is it being handled correctly? Could it happen again?

347: METRIC SYSTEM -- Do you think the United States should adopt the metric system? Why do you think the last effort to adopt it failed? What would have to be done differently to guarantee success?

355: ELECTIONS AND VOTING -- Why do you think that only about half of eligible voters in America take part in national elections, and even fewer in local elections? Is this a serious problem? Can you suggest a solution?

362: GOLF -- Discuss golf. Are you a spectator or a player? What are the aspects of the game that you think are most challenging? What do you enjoy the most about playing or watching golf?

363: HEALTH CARE -- Discuss our health care system today, particularly as it affects you and your family. Do you think good medical attention is available to most people? Do you think the costs are reasonable?

364: HOBBIES AND CRAFTS -- What hobbies do you have in your spare time? Do they include any handicrafts, such as knitting, painting, woodworking?

342: UNIVERSAL HEALTH INSURANCE -- Do you believe that the US government should provide universal health insurance, or should at least make it a long term goal? How far in that direction would you be willing to go? WHAT DO YOU SEE AS THE

350: PAINTING -- Have you done any painting projects recently, either indoors or outdoors? What types of painting are you willing to take on by yourself? Are you usually satisfied when you finish, or do you wish you hired a professional? SEE IF THE OTHER

352: POLITICS -- Discuss any recent political elections or movement that you and the other caller consider interesting or important. Or, if you prefer, discuss political trends or changes taking place in the US. See if the other caller shares your views.

356: EXERCISE AND FITNESS -- Do you do any exercise regularly to maintain your health or fitness level? If so, describe what you do; if not, have you considered doing so?
Do you enjoy the exercise you get, or do it as a task? COMPARE YOUR HABITS AND YOUR MOTIVES

357: FAMILY FINANCE -- Does your family keep a monthly budget, or even a long-term financial plan? If not, how do you control expenses? If so, can you give a general description of your procedures, and how successful they have been? SEE HOW SIMILAR THEY ARE TO

358: FAMILY LIFE -- If you have children, can you describe how much time you and your spouse spend with them, and what activities you all do together? Is it difficult to find time for these kinds of activities? What DO YOU THINK ARE THE CURRENT TRENDS IN THE

345: JOB BENEFITS -- What do you consider the most important benefits besides salary in a job with a large organization? How satisfied are you with the current benefits of your job, and what changes in benefits would you like to see?

326: BOATING AND SAILING -- Do you sail or enjoy some other form of boating? Do you have your own boat? Find out what the other caller enjoys or thinks about boating or sailing. Or you might discuss the pros and cons of boat ownership.

368: SPACE FLIGHT AND EXPLORATION -- What do we gain from our space flight and exploration efforts? Should we continue to support the space program at current levels? You MIGHT ALSO DISCUSS WHETHER SPACE FLIGHT WILL EVER BECOME COMMON, OR WHETHER, GIVEN THE CHANCE, YOU

369: MAGAZINES -- Do you have magazines that you subscribe to or read on a regular basis? What do you like or dislike about magazines, compared to other media?

367: ETHICS IN GOVERNMENT -- Do you think it is possible to have an honest government? Are most politicians in government more for personal gain or public service? How much self-serving activity do you think goes on? IS IT POSSIBLE TO MAKE

370: WOODWORKING -- Please discuss woodworking. Is it a hobby for you, or something you do to save money? What kinds of projects do you like to do, and what kind do you avoid? Do you usually finish what you start?
Would you do more if you had more tools?

ATTACHMENT 3: SWITCHBOARD Transcription Manual, Revision 4: 17 March 1992

Part I: HEADER FORMAT AND INSTRUCTIONS

1. When the transcription is finished, fill out the template at the top of the text file as in the following example:

FILENAME: 3021_1279_1108
TOPIC#: 314
DATE: 910606
TRANSCRIBER: RDL
DIFFICULTY: 1
TOPICALITY: 1
NATURALNESS: 1
ECHO_FROM_B: 1
ECHO_FROM_A: 1
STATIC_ON_A: 1
STATIC_ON_B: 2
BACKGROUND_A: 1
BACKGROUND_B: 3
REMARKS: Conversation was dominated by Speaker A. Near the end of the conversation there was a silence of about 30 seconds while B went to answer the doorbell.

============================================================

2. The first three items are filled in from information provided on the log sheets for each conversation; the fourth is the transcriber's initials; the fifth through the thirteenth are "ratings", which are to be given by the transcriber immediately after finishing a conversation. The key to the ratings is given below in #3. The last item, "REMARKS:", is for brief comments about unusual characteristics of the conversation, if any. See #4 below for more details. If there are no comments, just type the word "None." There should be a blank line after the end of the remarks and two more blank lines after the "======" line, before the transcription itself begins.

3. Use the following key in rating each conversation; remember that 1 is good and 5 is bad.

SWITCHBOARD CONVERSATION RATING KEY

On a scale of 1 to 5, please rate the conversation according to the following characteristics:

DIFFICULTY: The conversation was very easy (1) or very difficult (5) to transcribe.    1 2 3 4 5
TOPICALITY: The conversation generally stayed on one topic (1) or strayed far from it (5).    1 2 3 4 5
NATURALNESS: The conversation sounded natural (1) or artificial or forced (5).    1 2 3 4 5
ECHO_FROM_B (Caller A's side): In listening to A separately, B could hardly be heard (1) or was nearly as loud as A (5).    1 2 3 4 5
ECHO_FROM_A (Caller B's side): In listening to B separately, A could hardly be heard (1) or was nearly as loud as B (5).    1 2 3 4 5
STATIC_ON_A (Caller A's side): There was no static-like noise or distortion (1) or a great deal of it (5) FROM THE TELEPHONE LINE ITSELF.    1 2 3 4 5
STATIC_ON_B (Caller B's side): There was no static-like noise or distortion (1) or a great deal of it (5) FROM THE TELEPHONE LINE ITSELF.    1 2 3 4 5
BACKGROUND_A (Caller A's side): The conversation was mostly clear and intelligible (1) or distorted, muffled, or otherwise hard to understand (5) BECAUSE OF THE SPEAKERS' BEHAVIOR OR THE BACKGROUND WHERE THEY WERE CALLING FROM.    1 2 3 4 5
BACKGROUND_B (Caller B's side): The conversation was mostly clear and intelligible (1) or distorted, muffled, or otherwise hard to understand (5) BECAUSE OF THE SPEAKERS' BEHAVIOR OR THE BACKGROUND WHERE THEY WERE CALLING FROM.    1 2 3 4 5

4. In rating the conversations, remember that you are listening to an audio cassette recording of a computerized recording of a live phone conversation. Any problem caused by the taping will not be part of the database, and should NOT be noted in the transcription or the ratings, but rather in a separate note to TI. However, it can be difficult to distinguish between problems that might originate on the phone lines, on the computer recording, or on the tape recording. Perhaps the following will help:

The most common problem from tape recording is a type of "dropout" caused when the computer, while playing back the speech to the cassette recorder, stops playing and then starts again. This leaves up to several seconds of silence on the tape, but no speech is lost--that is, the recording picks up exactly where it quit, even in the middle of a syllable.
Ignore this in transcribing; if it gets bad enough to affect the ability to transcribe, return the tape to TI for re-recording.

Dropout can also occur on phone lines, usually on long distance calls, or even in the computer recording process. In these cases, however, some speech does get lost during the silences. If this occurs, use a descriptive comment like {dropout, part of a word lost} in the text. If it occurs often, mention this in the REMARKS.

Slowing down or speeding up of speech would be caused by magnetic tape slipping or sticking, and should not be noted in the transcript. Return the tape for re-recording if the problem is serious.

In general, DO NOT REFER to tape-related problems in rating the conversation, or in the REMARKS, or in {comments} in the text (see below). If in doubt, say so in the comments and in the REMARKS section. If a tape has several such events that you cannot identify, or that make it very hard to transcribe, call the TI lab number or return the tape to TI with a note as soon as possible.

EXAMPLE of a comment in the text:
{dropout, possibly on phone line?}

EXAMPLE of a REMARK in the header:
REMARKS: Several episodes of very brief dropout on A's side might have been from the telephone line rather than the tape. Too short to be sure.

Part II: GENERAL INSTRUCTIONS

1. Transcribe "verbatim", without correcting grammatical errors: "I seen him," "me and him gone to the movies," etc.

2. Do not try to imitate pronunciation; use a dictionary form: "no" will do for "naw," "nah," etc.; "oh" for "aw"; "going to" (not "gonna" or "goin to"); "you all" rather than "y'all"; "kind of" instead of "kinda"; etc. Nonstandard words which are not in the dictionary (e.g., kiddo) should be typed normally, i.e., without quotes or other special notation.

3. Follow the dictionary on hyphenating compounds in clear-cut cases. But "when in doubt, leave them out."

4. Try to avoid word abbreviations: Fort Worth, not Ft. Worth; percent, not %; dollars, cents, and so forth.

5.
Contractions are allowed, but be conservative. For example, contraction of "is" (it's a boy, running's fun) is common and standard, but there'll (there will) be forms that're (that are) better left uncontracted. It is always permitted to spell out forms in full, even if the pronunciation suggests the contracted form. Thus it is O K to type "he is," "they are," and "we would" even if what you heard was "he's," "they're," and "we'd."

6. Use normal capitalization on proper names of persons, streets, restaurants, cities, states, etc., but put titles (of books, journals, movies, songs, plays, TV shows, etc.--what would properly be in italics) in ALL CAPS, i.e., uppercase letters.

7. If it is necessary to use accent marks, insert the number 3 before the letter which would receive the accent, e.g., fianc3e.

8. Punctuation: although normal punctuation rules apply, spontaneous conversational speech is full of difficult situations. Strive for simplicity and consistency, with the following specific guidelines:
-- terminate each sentence with a period unless a question mark or exclamation point is clearly justified;
-- use a comma instead of ... or -- or fancier punctuation when speakers change thoughts or grammatical structures in the middle of a sentence;
-- for more detail, and for special rules involving interruptions, etc., see below under SPECIAL CONVENTIONS.

9. Be sure to run a spell check upon completion of the transcript. Remember to watch for common spelling confusions like: its and it's, they're and there and their, by and bye, etc.

Part III: SPECIAL CONVENTIONS FOR SWITCHBOARD CONVERSATIONS

1. Speakers should be indicated by "A: " and "B: " at the left margin, with two spaces after the colon, and with a blank line between speakers (i.e., an extra carriage return before each A: or B: ). On the audio tape, A will be THE SPEAKER ON THE FIRST OF THE TWO SEPARATELY RECORDED SIDES.
IT IS IMPERATIVE TO KEEP THIS DESIGNATION CORRECT AND CONSISTENT, even when the crosstalk or echo is so strong that both speakers are equally loud. The log sheet for each conversation will show the first few words by each speaker, to help you confirm the assignment.

EXAMPLE:

A:  Blah blah blah blah.

B:  Blah blah blah.

A:  Etcetera.

2. Spell out letter and number sequences: D F W, seven forty-seven, U S A, one oh one, F B I, etc., unless the letter sequence is pronounced as a word, as in NASA, ROM, DOS. Transcribe years like 1983 as "nineteen eighty-three," with hyphens only between the tens and ones digits. When a letter sequence is used as part of an inflected word, add the inflection with a dash: T I -er, B S -ing, the Oakland A -s, a witness I D -ed him. This leads to clumsy-looking possessive forms, as in: the U S -'s policy, the T I -er's last name, all the C E O -s' votes, but it saves lots of time later on.

3. Partial words: if a speaker does not finish a word, and you think you know what the word was, you may spell out as much of the word as is pronounced, and then use a single dash followed by a comma, -,. If you cannot tell what word the speaker is trying to say, leave it out.

EXAMPLE:

A:  Well, th-, that's what they kept tell-, wanted me to believe.

B:  I, I, I just am not to-, totally sure, uh, about that.

4. Hesitation sounds: use "uh" for all hesitations consisting of a vowel sound (rather than trying to distinguish uh, ah, er, etc.), and "um" for all hesitations with a nasal sound (rather than uhm, hm, mm, etc.).

5. Yes/no sounds: use "uh-huh" (yes) and "huh-uh" (no) for anything remotely resembling these sounds of assent or denial; you may use "yeah," "yep," and "nope" if that is what the words sound like.

6. Punctuation: use commas instead of ... or -- or other "fancy" punctuation when speakers change thoughts or grammatical structures in the middle of a "sentence."
Terminate each sentence with a period unless a question mark or exclamation point is clearly justified. Only use suspension dots ... if a speaker leaves a sentence unfinished at the end of his/her turn, and a period cannot be used, or at the end of a conversation where the speaker's turn was cut off by the computer timing out:

EXAMPLE:

A:  I was going to do that, but then I ...

B:  Right, me too.

Use a double dash if a speaker breaks a sentence off and picks it up at the beginning of the next turn, with another double dash where the pickup begins:

EXAMPLE:

A:  I was going to do that, but then I --

B:  Right, me too.

A:  -- thought I better not after all.

7. Non-speech sounds during conversations: indicate these using only the following list of expressions in brackets. When making judgments, pick the closest description; [noise] will be adequate to describe most sounds that are not represented below. Note underscores (not spaces or hyphens) to connect the double word descriptions.

[TV] [baby] [baby_crying] [baby_talking] [barking] [beep] [bell] [bird_squawk]
[breathing] [buzz] [buzzer] [child] [child_crying] [child_laughing] [child_talking]
[child_whining] [child_yelling] [children] [children_talking] [children_yelling]
[chiming] [clanging] [clanking] [click] [clicking] [clink] [clinking] [cough]
[dishes] [door] [footsteps] [gasp] [groan] [hiss] [horn] [hum] [inhaling]
[laughter] [meow] [motorcycle] [music] [noise] [nose_blowing] [phone_ringing]
[popping] [pounding] [printer] [rattling] [ringing] [rustling] [scratching]
[screeching] [sigh] [singing] [siren] [smack] [sneezing] [sniffing] [snorting]
[squawking] [squeak] [static] [swallowing] [talking] [tapping] [throat_clearing]
[thumping] [tone] [tones] [trill] [tsk] [typewriter] [ugh] [wheezing]
[whispering] [whistling] [yawning] [yelling]

If the event being described lasts longer than a few words, then indicate the beginning in brackets [ ], and the end in brackets with a "/", [/ ].

EXAMPLES:

1.
Separate multiple sounds by a space, each one in brackets:

A:  Oh, that's funny. [laughter] [cough] Excuse me, I have a cold.

B:  That's all right, [sneezing] so do I. [barking] [child_talking]

2. Use "/" to show end of a continuous sound:

A:  Well, it all depends, uh, on, you know, [baby_crying] how the family reacts. I mean, it can be a positive or a negative thing, you know?

B:  Yeah, well, I guess so. It just seems [/baby_crying] to me that it's a very difficult, uh, difficult issue.

8. When a comment is needed to describe an event, put the comment in curly braces { }: {very faint}, {sounds like speaker is talking to someone else in the room}, {speaker imitates a woman's voice here}.

EXAMPLES:

1. Curly braces to describe the speech:

B:  Yeah, yeah, I agree {very faint} right.

2. Combine curly braces and brackets if more explanation is needed to describe the word in the brackets:

A:  Did it sound like this? [clicking] {sounds made with mouth}

B:  No, more like [clicking] {sounds like a pencil tapping on a table} this.

9. When a word or phrase is not clear, type DOUBLE PARENTHESES (( )) around what you think you hear. If there is no way to tell what the speaker said, leave 1 blank space between the double parentheses, indicating speech has been left out because it was unintelligible.

EXAMPLE:

A:  So when I finally did ((take up)) the violin, I progressed pretty quickly in the beginning.

B:  Of course, that was in college which was a long time ago, so (( )) I remember.

10. Marking untopical speech for possible trimming: Use an "at sign", @, and a double "at sign", @@, to designate potential "trim points" at the beginning or end of conversations. These would exclude speech that either is not part of the conversation itself, or refers directly to the protocol. For example, it sometimes happens that callers accidentally press the touchtone button that begins recording, and are being recorded during the "warmup period" and don't know it.
All such speech should be marked for trimming. Other examples would be speech that:
a.) explicitly refers to the SWITCHBOARD protocols;
b.) refers to the process of making the call;
c.) uses the TITLE of the prompt (e.g., "music"); or
d.) repeats or paraphrases the PROMPT itself.
[The TITLE and the PROMPT for each topic will be found on your information sheet; they are keyed to the topic number, which is on the log sheet for each conversation.]

Marking these trim points means that EVERYTHING BEFORE '@' AND/OR EVERYTHING AFTER '@@' may be discarded without losing the main body of the conversation on the topic. These symbols may therefore only be used ONCE AT THE BEGINNING (@) AND/OR ONCE AT THE END (@@) of the conversation. They must also be used ONLY AT TURN-TAKING POINTS, i.e., at the left hand margin, before an "A:" or "B:", NOT part of the way through someone's turn. One or both may be used in a single conversation, i.e., trimming of material at the beginning is independent of trimming at the end.

Social niceties and transitional talk are neutral. That is, they may be left alone, but should be trimmed if they occur next to material that definitely deserves trimming.

EXAMPLE:

A:  Okay, so what am I supposed to do now? Wait, let me read,

B:  I think you're supposed to push one.

A:  let's see, it says here to push, okay, but I think I already, okay are you ready?

B:  Yep.
[Talking about protocol up to here.]

A:  Here we go. Alright, now, tell me, what is your favorite kind of music?
[Using topic TITLE explicitly.]

@B:  I enjoy Mozart and reggae, but I really love rap.
[OK]

. . .

A:  I've certainly enjoyed hearing what you have to say.
[Trim optional here.]

@@B:  Well, if we've talked enough, do I need to push a button or anything? I guess not, we can just hang up. So long.
[Talk of protocol should be trimmed.]

A:  Bye. Nice talking to you.

ANOTHER EXAMPLE:

A:  Hi, there, how are you doing?

B:  Fine, how about you?

A:  Just great, except for all this heat.
[Chitchat up to here could be left alone if no reason to trim occurred.]

B:  Well. Care of the elderly, huh? That's our topic?
[Need to trim because it mentions the topic TITLE.]

@A:  Yes. Do you have any relatives that need special care?
[This is OK as part of the conversation, since only the word "care" is repeated from the prompt. It is not trimmed--initial trimming ends with the '@'.]

. . .

@@B:  Well, I guess we have solved the problem of care of the elderly, and how to choose nursing homes, haven't we?
[Trimmed because it contains both TITLE and a paraphrase of prompt.]

A:  Sure did. I hope your grandmother gets better. So long now, it's been fun talking to you.
[Social pleasantries would not be trimmed themselves, but no harm in trimming them in order to get rid of the previous turn.]

11. Simultaneous talking: Wherever possible, mark where both speakers talked simultaneously with TWO PAIRS of pound signs (#), ONE BEFORE AND ONE AFTER each of the segments spoken at the same time. One of these segments MUST BEGIN A TURN; in other words, if one person is an "interruptor", his interruption starts a new turn. Remember, BOTH speakers' turns must contain TWO pound signs each.

A SIMPLE EXAMPLE:

A:  Okay, well, I guess that's about it.

B:  Yeah.

A:  Nice talking to you.

B:  # Right, bye. #

A:  # Bye bye. #

ANOTHER EXAMPLE:

A:  I never heard such nonsense, you know,

B:  # Yeah, I know. #
[B interrupts while A continues.]

A:  # as I heard that # day when I blah blah blah.
[A continues beyond the simultaneously spoken words.]

WHICH COULD ALSO BE WRITTEN:

A:  I never heard such nonsense, you know, # as I heard that #

B:  # Yeah, I know. #

A:  day when I blah blah blah.

ANOTHER EXAMPLE:

A:  I never heard such nonsense, # you know, #
[A starts.]

B:  # Yeah, #
[B starts to step on A.]

A:  as I heard that day when # I was at that meeting. #
[A continues without stopping.]

B:  # I agree with you all the way #
[B comes in over A again.]
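
The markers described in Part III are regular enough that a script can pick them out of a transcribed turn. The following is a minimal sketch of that idea, not a description of any tool used at TI. The function name parse_turn, the regular expressions, and the layout of the returned dictionary are all assumptions made for illustration. It handles one turn at a time and does not check bracketed sounds against the approved list in item 7.

```python
import re

# Hypothetical patterns for the Part III markers (illustrative only).
TURN_RE = re.compile(r"^(@@|@)?([AB]):\s+(.*)$", re.DOTALL)  # optional trim mark, speaker, text
SOUND_RE = re.compile(r"\[(/?)([A-Za-z_]+)\]")    # [laughter] or [/baby_crying]
COMMENT_RE = re.compile(r"\{([^{}]*)\}")          # {very faint}
UNCLEAR_RE = re.compile(r"\(\((.*?)\)\)")         # ((take up)) or (( ))
OVERLAP_RE = re.compile(r"#\s*(.*?)\s*#")         # # Right, bye. #
PARTIAL_RE = re.compile(r"\b([A-Za-z']+)-,")      # th-, tell-,


def parse_turn(line):
    """Split one transcript turn into its speaker and annotation markers."""
    match = TURN_RE.match(line.strip())
    if match is None:
        raise ValueError("turn must start with A: or B: at the left margin")
    trim, speaker, text = match.groups()
    return {
        "speaker": speaker,
        "trim": trim,  # "@", "@@", or None
        # (name, is_end): is_end is True for a closing marker such as [/baby_crying]
        "sounds": [(name, bool(slash)) for slash, name in SOUND_RE.findall(text)],
        "comments": COMMENT_RE.findall(text),
        # an empty string stands for unintelligible speech, (( ))
        "unclear": [guess.strip() for guess in UNCLEAR_RE.findall(text)],
        "overlap": OVERLAP_RE.findall(text),
        "partial": PARTIAL_RE.findall(text),
    }


turn = parse_turn("@B:  I enjoy [laughter] Mozart {very faint} and ((reggae)), th-, though.")
print(turn["speaker"], turn["trim"], turn["sounds"], turn["unclear"])
```

Keeping one pattern per convention makes it easy to add checks later, for example that '@' and '@@' each occur at most once per conversation, or that every turn containing a pound sign contains exactly two.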