SPIDRE: A User's Manual

TABLE OF CONTENTS

1. Summary Abstract
2. Overview of directory and file structure
3. The .wav files
4. The .txt files
5. The .mrk files
6. Ancillary text files: database tables
7. How the data was selected
8. How to report errors
9. References

ATTACHMENTS

Attachment 1: Instruction manual for SPIDRE transcribers
Attachment 2: SWITCHBOARD Time Aligned Transcription Specifications
Attachment 3: Topic listing

1. Summary Abstract

The SPIDRE corpus is a subset of the much larger SWITCHBOARD corpus, and this manual is an abridged and slightly modified version of the original SWITCHBOARD manual.

SWITCHBOARD is a corpus of spontaneous conversations which addresses the growing need for large multispeaker databases of telephone-bandwidth speech. Collected at Texas Instruments with funding from DARPA, the complete set of CD-ROMs includes about 2430 conversations averaging 6 minutes in length: over 240 hours of recorded speech and about 3 million words of text, spoken by over 500 speakers of both sexes from every major dialect of American English. The SPIDRE corpus is a two-disc CD-ROM set which includes 280 conversations averaging 5 minutes in length, totaling approximately 60 hours of recorded speech over two channels.

SWITCHBOARD was collected without human intervention, under computer control. Interaction with the system was via touchtones and recorded instructions, but the two talkers, once connected, could "warm up" before recording began. From a human factors perspective, automation guards against the intrusion of experimenter bias and guarantees a degree of uniformity throughout the long period of data collection. The protocols were further intended to elicit natural and spontaneous speech by the participants. The transcribers' ratings indicate that they perceived the conversations as highly natural.

The use of T1 lines and automatic switching software made it possible to collect the digital version of the speech signals directly from the telephone network, and also to isolate the two sides of the conversations. The goal was to have real telephone speech, routed through the public network, but with no degradation due to the collection system. Isolation of the callers, within the limits of network echo cancelling performance, permits researchers to train on each speaker's voice separately, and then test on either one or both speakers in any conversation.

The speech is fully transcribed, and the transcription conventions documented. Court reporters produced most of the verbatim transcripts, following a manual prepared specifically for the project. Their work was checked for formatting errors by an awk script, then twice more by humans during quality control (QC) inspections. Each transcript is accompanied by a time alignment file, which estimates the beginning time and duration of each word in the transcript in seconds, at centisecond resolution. The time alignment was accomplished with supervised phone-based speech recognition, as described by Wheatley et al. [1]. The corpus is therefore capable of supporting not only purely text-independent approaches to speaker verification, but also those which make use of any degree of knowledge of the text, including phonetics. It should also facilitate studies of the phonetic characteristics of spontaneous speech on a scale not previously possible.

The participants' demographics, as well as the dates, times, and other pertinent information about each phone call, are recorded in relational database tables.
Except for personal information about the callers, these tables are included with the corpus. The volunteers who participated provided information relevant to studies of voice, dialect, and other aspects of speech style, including age, sex, education, current residence, and places of residence during formative years. The exact time and the area code of origin of each call are provided, as well as a means of telling which calls by the same person came from different telephones. Many callers made calls from multiple handsets in order to facilitate study of the effects of that variable on voice recognition.

2. Overview of Directory and File Structure

There are two speech discs in the SPIDRE corpus (NIST Speech Discs 18-1.1 and 18-2.1). Each disc has a "readme.doc" file and a "spidre" subdirectory at the top-level directory. The "readme.doc" file contains information concerning the directory and file structure. Unlike SWITCHBOARD, SPIDRE has both the transcriptions and the audio files on the same disc, but in different directories; the "readme.doc" file in the top-level directory explains where these files are located.

The orthographic transcription files are named "swXXXX.txt", where XXXX is a conversation number. The time-aligned marked transcripts are named "swXXXX.mrk", where XXXX is a conversation number. For each word in the .txt file, the .mrk file gives an estimated start time and duration. Both the text files and the time-marked files are exactly as they appeared in the SWITCHBOARD corpus. The following sections give examples illustrating the contents of each file type, and some information on the conventions used in writing them.

3. The .wav Files

Conversation number 4940 is used as an example here. This conversation was used in the SWITCHBOARD corpus but not in SPIDRE; all .wav files follow the same format. The information in the header of the file sw4940.wav can be read with the SPHERE utility h_read:

speaker_id1          1423
speaker_id2          1662
recording_date       920508
recording_time       2204
conversation_id      4940
database_id          SPIDRE0
data_origins         swb1,1.0,4062
channel_count        2
sample_max1          4015.500000
sample_max2          4015.500000
sample_coding        mu-law
channels_interleaved TRUE
sample_count         4798496
sample_rate          8000
sample_n_bytes       1
sample_sig_bits      8

"speaker_id1" is the number of the speaker who initiated the call. In the transcripts this speaker is called "A"; in the database tables the identification number appears under the attribute "CALLER_NO". "speaker_id2" is the number of the speaker who received the call; in the transcripts this speaker is called "B".

"recording_date" is in YYMMDD format, so the date of this conversation was May 8, 1992. "recording_time" is in HHMM format; recording of this call began at 10:04 p.m. CDT.

"sample_max1" is the maximum amplitude of the signal on speaker_id1's channel, expressed as a positive linear value; 4015.5 is full scale. "sample_max2" is the maximum amplitude of the signal on speaker_id2's channel, which was also full scale.

"sample_coding" tells how to interpret the binary data in the .wav file; these are coded as mu-law values, exactly as read from the digital telephone line.

"channels_interleaved" has the value TRUE, indicating that alternate bytes are the values from alternate channels: speaker_id1's data are the odd bytes and speaker_id2's data are the even bytes (where the first byte is byte 1), so summing successive pairs gives the entire conversation.

"sample_count" is the total number of bytes of speech data. Since there is one byte per sample, but both sides of the conversation are represented at each sample time, there are 16000 samples per second, or 960000 samples per minute. Thus a good rule of thumb is "one megabyte per minute," so 4798496 samples represents nearly five minutes of speech.

"sample_rate" is 8000 samples per second. "sample_n_bytes" is 1, the number of bytes per sample in the mu-law format. "sample_sig_bits" is the number of bits per sample value, which is 8.
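Since the header documents the byte layout completely, each caller's signal can also be recovered without the SPHERE tools. The following Python sketch assumes the common fixed 1024-byte ASCII SPHERE header (the header states its own size near its top, so verify this against your files) and uses the standard G.711 mu-law expansion; it is a minimal illustration, not a substitute for the NIST utilities.

    import numpy as np

    def mulaw_to_linear(u):
        # Standard G.711 mu-law expansion to 16-bit linear PCM.
        u = ~u                               # stored bytes are bit-inverted
        sign = u & 0x80
        exponent = (u >> 4) & 0x07
        mantissa = u & 0x0F
        magnitude = (((mantissa.astype(np.int32) << 3) + 0x84) << exponent) - 0x84
        return np.where(sign, -magnitude, magnitude).astype(np.int16)

    def read_spidre_wav(path, header_size=1024):
        # header_size=1024 is an assumption; SPHERE headers declare
        # their own size near the top of the ASCII header.
        raw = np.fromfile(path, dtype=np.uint8)[header_size:]
        chan_a = mulaw_to_linear(raw[0::2])  # odd bytes: caller A
        chan_b = mulaw_to_linear(raw[1::2])  # even bytes: caller B
        return chan_a, chan_b

    a, b = read_spidre_wav("sw4940.wav")
    print(len(a) / 8000.0, "seconds per channel")  # ~300 s for sw4940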
4. The .txt Files

The transcripts begin with a header-like section which can be ignored by skipping down to the line consisting entirely of "====". Some of this information matches the .wav header information, and was used to verify and maintain consistency between the two files when transcribers worked on the .txt files. The rest is information inserted by the original transcriber after completing the transcript, then reviewed and corrected if necessary by one or more QC transcribers. The instructions given to the transcribers for rating the difficulty, amount of echo or noise, etc. are found in Attachment 1, but are described very briefly here. A scale of 1 to 5 is used, where 1 implies good quality, easy to understand, etc., and 5 implies bad quality, more difficult to deal with, etc. The header section of conversation 4940 is reproduced here for illustration:

FILENAME: 4940_1423_1662
TOPIC#: 302
DATE: 920508
TRANSCRIBER: nk
DIFFICULTY: 2
TOPICALITY: 1
NATURALNESS: 2
ECHO_FROM_B: 1
ECHO_FROM_A: 1
STATIC_ON_A: 2
STATIC_ON_B: 1
BACKGROUND_A: 2
BACKGROUND_B: 2
REMARKS: None
============================================================

The first four lines are self-explanatory. The topic number code is listed in Attachment 3. The topic information is included but is not of any relevance to the SPIDRE corpus, as SPIDRE was designed primarily for speaker identification.

"DIFFICULTY" means the overall difficulty of transcribing this conversation compared to the rest of the SWITCHBOARD conversations this transcriber had done. It is a subjective catch-all, designed to alert the user, where no other standard category of problem is noted, that there may be a soft-spoken, mumbling, or otherwise difficult-to-understand caller. The transcriber thought conversation 4940 was not very difficult, but harder than some.

"TOPICALITY" refers to whether the callers conversed generally about what was suggested by the recorded prompt. Conversations were not rejected if callers strayed from the prompt, or even ignored it entirely. However, those who wish to group calls for vocabulary studies, language modeling, etc., may find this a useful guide. The transcriber thought that the speakers in conversation 4940 stayed right with the topic suggested by the prompt.

"NATURALNESS" is another very subjective rating, intended partly to study after the fact how well the human factors in SWITCHBOARD succeeded in eliciting natural conversational speech. The transcriber felt this was a natural-sounding conversation, but less so than some others.

"ECHO_FROM_B" estimates how loud the crosstalk from the other channel (B) was on this channel (A) at the times when A was silent and B was talking. A score of "1" means inaudible or almost so; a score of "5" means the crosstalk was almost as loud as the speech on the A channel itself. This conversation apparently had little or no crosstalk in either direction. "ECHO_FROM_A" is the same estimate, but for the B channel.
To make these ratings, of course, transcribers had to listen to each channel separately as well as the combined signal.

"STATIC_ON_A" was intended to isolate the occurrence of electrical noises often described as static, some of which were caused by the collection system, from other types of unwanted acoustic signals on channel A. It is not clear how well the transcribers understood this distinction, so there may be many "false positives" from acoustic noise in a caller's environment. But in the cases where strong digital noise was present, they did seem to note it and adjust the ratings accordingly. The transcriber heard some static in this conversation, and noted two places in the transcript where it occurred with the term [static]. "STATIC_ON_B" is the same for the other channel. None was noted in this conversation, hence a rating of 1.

"BACKGROUND_A" refers to the presence of noise, including any unwanted signal of any kind, coming from the environment of caller A. In this example, the noise of people talking, children playing, and dishes being washed caused a rating of 2 on the A channel. "BACKGROUND_B" refers to the same on channel B. In this conversation there were also voices and children on the B side, and the same rating was given.

"REMARKS" was a field for transcribers or QCers to insert unlimited free-form comments on the conversation; they were encouraged to note any unusual characteristics that might help in studying the speech, and especially any overall sources of difficulty not well identified in the ratings. For example, if one caller was eating all through the conversation, or had a head cold, this was the place to note it.

The remainder of the .txt file is the verbatim transcript of what was said, with the speakers indicated by "A:" and "B:", and a number of conventional symbols and expressions which are explained in Attachment 1. Here are the first fifty lines of the file sw4940.txt, the example used above:

FILENAME: 4940_1423_1662
TOPIC#: 302
DATE: 920508
TRANSCRIBER: nk
DIFFICULTY: 2
TOPICALITY: 1
NATURALNESS: 2
ECHO_FROM_B: 1
ECHO_FROM_A: 1
STATIC_ON_A: 2
STATIC_ON_B: 1
BACKGROUND_A: 2
BACKGROUND_B: 2
REMARKS: None
============================================================

A: Okay [children].

B: Okay Carol. So, air quality.

A: Yeah. Is it, [noise] {sounds like water running and she is doing dishes} I know in here, uh, downtown Dallas, it's, you, I mean you drive by and you can just, you can see it.

B: Uh-huh.

A: But, then again [throat_clearing] I originally was from California and, uh, there is a big difference between Texas and California. #And, uh# --

B: #Surely.#

A: -- they'd have their smog alerts and where you'd have to stay indoors for so many hours with an air conditioner. And, of course, they don't have that here in Texas so, [breathing] there's ...

B: You mean they don't have the, uh, the smog alerts?

A: No, not in, not in Te-, well not in Dallas, that is.

B: Right. I, I,

A: [throat_clearing].

B: yeah, I spent a summer i-, i-, in Tyler so I know, just east of Dallas there.

A: Yeah. We're going there tomorrow.

B: Oh, really #[laughter].#
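Since every .txt file begins with the fixed "KEY: value" header shown above, terminated by the "====" line, selecting conversations by their ratings takes only a few lines of code. Here is a minimal Python sketch; it assumes each header field fits on one line, although REMARKS comments can in practice run longer.

    def read_txt_header(path):
        # Collect "KEY: value" pairs until the "====" separator;
        # everything after it is the verbatim transcript.
        header, transcript, in_header = {}, [], True
        with open(path) as f:
            for line in f:
                line = line.rstrip("\n")
                if in_header:
                    if line.startswith("===="):
                        in_header = False
                    elif ":" in line:
                        key, _, value = line.partition(":")
                        header[key.strip()] = value.strip()
                else:
                    transcript.append(line)
        return header, transcript

    # Example: flag conversations with audible crosstalk on either channel.
    hdr, _ = read_txt_header("sw4940.txt")
    if int(hdr["ECHO_FROM_A"]) > 2 or int(hdr["ECHO_FROM_B"]) > 2:
        print(hdr["FILENAME"], "has noticeable echo")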
A "word" in the transcript is sometimes not actually a spoken word, and in these cases an asterisk is placed in the start time and duration fields. This occurs for certain punctuation marks, for bracketed expressions indicating acoustic events other than speech of the callers, for transcribers' comments in braces, etc. The same convention is used also where there is simultaneous speech--one talker's words are time marked in that case, and the other's are left with asterisks in the time fields. The first 100 lines of file sw4940.mrk are reproduced here to illustrate some of these conventions. A.1 0.04 0.42 Okay A.1 * * [children]. B.2 0.82 0.22 Okay B.2 1.06 0.34 Carol. B.2 3.58 0.34 So, B.2 3.92 0.20 air B.2 4.12 0.70 quality. A.3 5.40 0.22 Yeah. A.3 6.16 0.16 Is A.3 6.32 0.16 it, A.3 * * [noise] A.3 * * {sounds A.3 * * like A.3 * * water A.3 * * running A.3 * * and A.3 * * she A.3 * * is A.3 * * doing A.3 * * dishes} A.3 7.02 0.10 I A.3 7.12 0.22 know A.3 7.34 0.08 in A.3 7.42 0.30 here, A.3 7.80 0.22 uh, A.3 8.36 0.44 downtown A.3 8.80 0.46 Dallas, A.3 9.26 0.22 it's, A.3 9.60 0.20 you, A.3 9.82 0.10 I A.3 9.92 0.20 mean A.3 10.12 0.08 you A.3 10.20 0.28 drive A.3 10.52 0.26 by A.3 10.78 0.08 and A.3 10.86 0.08 you A.3 10.94 0.16 can A.3 11.10 0.24 just, A.3 11.96 0.10 you A.3 12.06 0.14 can A.3 12.20 0.40 see A.3 12.60 0.16 it. B.4 12.76 0.32 Uh-huh. A.5 13.78 0.38 But, A.5 14.34 0.42 then A.5 14.88 0.36 again A.5 * * [throat_clearing] A.5 15.52 0.16 I A.5 15.90 0.54 originally A.5 16.58 0.22 was A.5 16.80 0.14 from A.5 16.94 0.66 California A.5 17.72 0.26 and, A.5 17.98 0.18 uh, A.5 18.60 0.16 there A.5 18.76 0.10 is A.5 18.86 0.08 a A.5 18.94 0.30 big A.5 19.28 0.58 difference A.5 20.36 0.34 between A.5 20.70 0.48 Texas A.5 21.18 0.12 and A.5 21.30 0.72 California. A.5 * * #And, A.5 * * uh# A.5 * * -- B.6 22.56 0.34 #Surely.# A.7 * * -- A.7 22.90 0.10 they'd A.7 23.00 0.28 have A.7 23.42 0.44 their A.7 23.86 0.42 smog A.7 24.28 0.34 alerts A.7 24.62 0.22 and A.7 25.50 0.10 where A.7 25.60 0.20 you'd A.7 25.80 0.10 have A.7 25.90 0.10 to A.7 26.00 0.48 stay A.7 26.48 0.44 indoors A.7 26.92 0.10 for A.7 27.04 0.24 so A.7 27.28 0.22 many A.7 27.50 0.30 hours A.7 27.80 0.16 with A.7 27.96 0.06 an A.7 28.06 0.12 air A.7 28.18 0.60 conditioner. A.7 28.78 0.16 And, A.7 28.94 0.02 of A.7 28.96 0.30 course, A.7 29.26 0.08 they A.7 29.34 0.12 don't A.7 29.46 0.14 have A.7 29.60 0.12 that A.7 29.72 0.14 here A.7 29.86 0.08 in A.7 29.94 0.52 Texas A.7 31.54 0.42 so, A.7 * * [breathing] A.7 32.64 0.22 there's 6. Ancillary Text Files: Database Tables In the directory /spidre/tables there are tables containing information about the callers, conversations, etc. To design experiments with SPIDRE, these tables can be incorporated into a relational database management system (RDBMS) using at least the relations caller, conversation, and caller_conversation. To insure anonymity, the names of the callers are not included in the tables, and the telephone numbers have been encoded as follows. The area code and first three digits of the phone number have not been altered. For each six-digit prefix, a list was made of all phone numbers. These lists were sorted into ascending order. For the first phone number in each list, we replaced the last four digits with "0000". For the second phone number in each list, we replaced the last four digits with "0001". 
6. Ancillary Text Files: Database Tables

In the directory /spidre/tables there are tables containing information about the callers, conversations, etc. To design experiments with SPIDRE, these tables can be incorporated into a relational database management system (RDBMS) using at least the relations caller, conversation, and caller_conversation.

To ensure anonymity, the names of the callers are not included in the tables, and the telephone numbers have been encoded as follows. The area code and first three digits of the phone number have not been altered. For each six-digit prefix, a list was made of all phone numbers, and these lists were sorted into ascending order. For the first phone number in each list, the last four digits were replaced with "0000"; for the second, with "0001"; and so on for all phone numbers in the tables. It is thus still possible to tell when callers were using the same extension, but the actual phone numbers are not revealed.
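The encoding is easy to reproduce, which also shows why it preserves handset identity: two calls from the same number always map to the same encoded number. Here is a small Python sketch of the scheme described above, using hypothetical numbers; it is not the tool actually used.

    from collections import defaultdict

    def encode_phone_numbers(numbers):
        # Group 10-digit numbers by their 6-digit prefix (area code
        # plus exchange), sort each group, and replace the last four
        # digits with the number's rank in the sorted list.
        groups = defaultdict(list)
        for n in numbers:
            groups[n[:6]].append(n)
        encoded = {}
        for prefix, nums in groups.items():
            for rank, n in enumerate(sorted(set(nums))):
                encoded[n] = prefix + "%04d" % rank
        return encoded

    print(encode_phone_numbers(["2145550117", "2145559823", "9035551234"]))
    # {'2145550117': '2145550000', '2145559823': '2145550001',
    #  '9035551234': '9035550000'}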
Here are the suggested relations, and a few rows from the tables to illustrate their structure:

The CALLER relation --

SQL> describe caller
 Name                            Null?    Type
 ------------------------------- -------- ----
 CALLER_NO                       NOT NULL NUMBER(4)
 SEX                                      CHAR(6)
 BIRTH_YEAR                               NUMBER(4)
 DIALECT_AREA                             CHAR(13)
 EDUCATION                                NUMBER(1)
 REMARKS                                  CHAR(120)

SQL> select * from caller where caller_no < 1046;

CALLER SEX    BIRTH YEAR DIALECT       EDU REMARKS
------ ------ ---------- ------------- --- ------------------------
  1000 FEMALE       1954 SOUTH MIDLAND   1
  1001 MALE         1940 WESTERN         3
  1002 FEMALE       1963 SOUTHERN        2
  1003 MALE         1947 NORTH MIDLAND   2
  1004 FEMALE       1958 NORTHERN        2
  1005 FEMALE       1956 WESTERN         2
  1007 FEMALE       1965 NEW ENGLAND     2
  1008 FEMALE       1939 MIXED           1
  1010 MALE         1932 NEW ENGLAND     1
  1011 FEMALE       1964 SOUTH MIDLAND   2
  1013 FEMALE       1957 SOUTH MIDLAND   2
  1014 FEMALE       1947 MIXED           1
  1015 FEMALE       1967 NEW ENGLAND     2
  1016 FEMALE       1945 SOUTHERN        2
  1018 FEMALE       1962 SOUTH MIDLAND   3
  1019 MALE         1941 NEW ENGLAND     3
  1020 FEMALE       1956 NORTH MIDLAND   2
  1021 MALE         1957 NORTHERN        3
  1022 FEMALE       1959 SOUTH MIDLAND   2
  1023 MALE         1939 SOUTHERN        2
  1024 MALE         1964 NORTH MIDLAND   2
  1025 MALE         1953 SOUTH MIDLAND   2
  1026 FEMALE       1957 SOUTHERN        2
  1027 FEMALE       1961 NORTH MIDLAND   2
  1028 MALE         1965 NYC             3
  1031 FEMALE       1940 SOUTH MIDLAND   3
  1032 FEMALE       1943 SOUTHERN        2
  1033 FEMALE       1965 SOUTH MIDLAND   1
  1034 MALE         1961 NORTHERN        3
  1035 FEMALE       1953 NORTH MIDLAND   2
  1037 MALE         1947 WESTERN         3
  1038 FEMALE       1963 UNK             2
  1039 MALE         1943 SOUTHERN        3

The CONVERSATION relation --

SQL> describe conversation
 Name                            Null?    Type
 ------------------------------- -------- ----
 CONVERSATION_NO                 NOT NULL NUMBER(5)
 CALLER_FROM                              NUMBER(4)
 CALLER_TO                                NUMBER(4)
 IVI_NO                                   NUMBER(4)
 TALK_DAY                                 CHAR(7)
 TIME_START                               NUMBER(6)
 TIME_STOP                                NUMBER(6)
 REMARKS                                  CHAR(240)

SQL> /

CONVERSATION NO CALLER FROM CALLER TO IVI NO TALK DAY TSTART TSTOP REMARKS
--------------- ----------- --------- ------ -------- ------ ----- -------
           2030        1071      1123    334   910306   1909  1919
           2031        1151      1126    353   910306   1912  1922
           2032        1167      1093    308   910306   1929  1937
           2033        1078      1024    360   910306   2056  2106
           2034        1000      1083    356   910307   1701  1706
           2035        1176      1107    358   910307   1721  1726
           2036        1013      1063    309   910307   1751  1757
           2037        1132      1175    336   910307   1828  1838
           2038        1073      1039    346   910307   1849  1859
           2039        1152      1101    339   910307   1911  1919
           2040        1130      1119    309   910307   1951  2001
           2041        1110      1179    356   910307   2038  2048
           2042        1221      1219    310   910307   2117  2127
           2043        1169      1139    315   910307   2122  2132
           2044        1219      1005    313   910307   2134  2144
           2045        1033      1055    364   910307   1834  1840

The CALLER_CONVERSATION relation --

SQL> describe caller_conversation
 Name                            Null?    Type
 ------------------------------- -------- ----
 CONVERSATION_NO                 NOT NULL NUMBER(5)
 CALLER_NO                       NOT NULL NUMBER(4)
 PHONE_NUMBER                             CHAR(10)
 LENGTH                                   NUMBER(6)
 IVI_NO                          NOT NULL NUMBER(4)
 REMARKS                                  CHAR(240)
 ACTIVE                                   CHAR(1)

Note: IVI_NO is the number of the recorded prompt which indicates the topic.

The RATING relation --

SQL> describe rating
 Name                            Null?    Type
 ------------------------------- -------- ----
 CONVERSATION_NO                 NOT NULL NUMBER(4)
 DIFFICULTY                               NUMBER(1)
 TOPICALITY                               NUMBER(1)
 NATURALNESS                              NUMBER(1)
 ECHO_A                                   NUMBER(1)
 ECHO_B                                   NUMBER(1)
 STATIC_A                                 NUMBER(1)
 STATIC_B                                 NUMBER(1)
 BACKGROUND_A                             NUMBER(1)
 BACKGROUND_B                             NUMBER(1)
 REMARKS                                  CHAR(120)

SQL> select * from rating;

CONVERSATION_NO DIFFICULTY TOPICALITY NATURALNESS ECHO_A ECHO_B STATIC_A STATIC_B BACKGROUND_A BACKGROUND_B REMARKS
--------------- ---------- ---------- ----------- ------ ------ -------- -------- ------------ ------------ -------
           2001          1          1           2      1      3        1        1            1            1
           2002          3          1           1      1      2        4        4            1            3
           2003          4          2           1      3      2        5        5            1            1
           2004          1          1           1      1      1        2        4            1            1
           2005          4          1           2      3      3        1        1            2            2
           2006          1          1           1      1      1        1        1            1            1
           2007          1          1           1      1      4        1        2            1            1
           2008          1          1           3      3      1        1        3            1            3
           2009          1          1           1      4      2        1        1            1            1
           2010          1          1           1      3      3        1        2            1            1

7. How the Data Was Selected

The data for the SPIDRE corpus was taken directly from the SWITCHBOARD corpus.

1.) Target speakers. To qualify as a target speaker, a speaker needed to have participated in 4 calls, exactly 3 of which were made from different handsets. 45 target speakers were chosen for the SPIDRE corpus. SWITCHBOARD conversations that were listed on existing bug reports were not considered for use in SPIDRE. The target speakers were divided between the two CD-ROMs so as to balance dialect area, age group, and sex across the discs.

2.) Non-target speakers. To qualify as a non-target conversation, neither speaker involved could appear on either side of any target speaker's conversations. Each speaker also had to have at least 60 seconds of speech within the first 210 seconds of the conversation. All non-target conversations were limited to 5 minutes. Duplication of speakers across non-target conversations was limited, but could not be avoided entirely. 100 non-target conversations were chosen for the SPIDRE corpus, 50 on each disc. As with the target speakers, SWITCHBOARD conversations listed on existing bug reports were not considered.

For detailed information on the actual collection and recording process of the conversations, see chapters 8 through 12 of the "manual.doc" in the "/doc" directory of the SWITCHBOARD corpus [4].

8. How to Report Errors

SPIDRE users discovering any type of error in the corpus should report it by filling out a bug report form, which is available via ftp in "/bugs/data/doc". The completed form should be emailed to "debugger@jaguar.ncsl.nist.gov".

9. References

[1] B. Wheatley, G. Doddington, C. Hemphill, J. Godfrey, E.C. Holliman, J. McDaniel, and D. Fisher, "Robust Automatic Time Alignment of Orthographic Transcriptions with Unconstrained Speech," Proc. ICASSP-92, Vol. I, pp. 533-536, 1992.

[2] B. Wheatley and J. Picone, "Voice Across America: Toward Robust Speaker-Independent Speech Recognition for Telecommunications Applications," Digital Signal Processing 1:2, 1991.

[3] G.R. Doddington, "Phonetically Sensitive Discriminants for Improved Speech Recognition," Proc. ICASSP-89, 1989.

[4] Switchboard Corpus, collected by Texas Instruments, produced by the National Institute of Standards and Technology, and sponsored by DARPA.

========================

ATTACHMENTS

ATTACHMENT 1: SWITCHBOARD Transcription Manual, Revision 4: 17 March 1992

Part I: HEADER FORMAT AND INSTRUCTIONS
1. When the transcription is finished, fill out the template at the top of the text file as in the following example:

FILENAME: 3021_1279_1108
TOPIC#: 314
DATE: 910606
TRANSCRIBER: RDL
DIFFICULTY: 1
TOPICALITY: 1
NATURALNESS: 1
ECHO_FROM_B: 1
ECHO_FROM_A: 1
STATIC_ON_A: 1
STATIC_ON_B: 2
BACKGROUND_A: 1
BACKGROUND_B: 3
REMARKS: Conversation was dominated by Speaker A. Near the end of the conversation there was a silence of about 30 seconds while B went to answer the doorbell.
============================================================

2. The first three items are filled in from information provided on the log sheets for each conversation; the fourth is the transcriber's initials; the fifth through the thirteenth are "ratings", which are to be given by the transcriber immediately after finishing a conversation. The key to the ratings is given below in #3. The last item, "REMARKS:", is for brief comments about unusual characteristics of the conversation, if any. See #4 below for more details. If there are no comments, just type the word "None." There should be a blank line after the end of the remarks and two more blank lines after the "======" line, before the transcription itself begins.

3. Use the following key in rating each conversation; remember that 1 is good and 5 is bad.

SWITCHBOARD CONVERSATION RATING KEY

On a scale of 1 to 5, please rate the conversation according to the following characteristics:

DIFFICULTY:     The conversation was very easy (1)            1 2 3 4 5
                or very difficult (5) to transcribe.

TOPICALITY:     The conversation generally stayed on          1 2 3 4 5
                one topic (1) or strayed far from it (5).

NATURALNESS:    The conversation sounded natural (1)          1 2 3 4 5
                or artificial or forced (5).

ECHO_FROM_B:    In listening to A separately, B could         1 2 3 4 5
(Caller A's     hardly be heard (1) or was nearly as
side)           loud as A (5).

ECHO_FROM_A:    In listening to B separately, A could         1 2 3 4 5
(Caller B's     hardly be heard (1) or was nearly as
side)           loud as B (5).

STATIC_ON_A:    There was no static-like noise or             1 2 3 4 5
(Caller A's     distortion (1) or a great deal of it (5)
side)           FROM THE TELEPHONE LINE ITSELF.

STATIC_ON_B:    There was no static-like noise or             1 2 3 4 5
(Caller B's     distortion (1) or a great deal of it (5)
side)           FROM THE TELEPHONE LINE ITSELF.

BACKGROUND_A:   The conversation was mostly clear and         1 2 3 4 5
(Caller A's     intelligible (1) or distorted, muffled,
side)           or otherwise hard to understand (5)
                BECAUSE OF THE SPEAKERS' BEHAVIOR OR THE
                BACKGROUND WHERE THEY WERE CALLING FROM.

BACKGROUND_B:   The conversation was mostly clear and         1 2 3 4 5
(Caller B's     intelligible (1) or distorted, muffled,
side)           or otherwise hard to understand (5)
                BECAUSE OF THE SPEAKERS' BEHAVIOR OR THE
                BACKGROUND WHERE THEY WERE CALLING FROM.

4. In rating the conversations, remember that you are listening to an audio cassette recording of a computerized recording of a live phone conversation. Any problem caused by the taping will not be part of the database, and should NOT be noted in the transcription and the ratings, but rather in a separate note to TI. However, it can be difficult to distinguish between problems that might originate on the phone lines, in the computer recording, or on the tape recording. Perhaps the following will help: the most common problem from tape recording is a type of "dropout" caused when the computer, while playing back the speech to the cassette recorder, stops playing and then starts again.
This leaves up to several seconds of silence on the tape, but no speech is lost--that is, the recording picks up exactly where it quit, even in the middle of a syllable. Ignore this in transcribing; if it gets bad enough to affect the ability to transcribe, return the tape to TI for re-recording. Dropout can also occur on phone lines, usually on long distance calls, or even in the computer recording process. In these cases, however, some speech does get lost during the silences. If this occurs, use a descriptive comment like {dropout, part of a word lost} in the text. If it occurs often, mention this in the REMARKS. Slowing down or speeding up of speech would be caused by magnetic tape slipping or sticking, and should not be noted in the transcript. Return the tape for re-recording if the problem is serious.

In general, DO NOT REFER to tape-related problems in rating the conversation, or in the REMARKS, or in {comments} in the text (see below). If in doubt, say so in the comments and in the REMARKS section. If a tape has several such events that you cannot identify, or that make it very hard to transcribe, call the TI lab number or return the tape to TI with a note as soon as possible.

EXAMPLE of a comment in the text: {dropout, possibly on phone line?}

EXAMPLE of a REMARK in the header:

REMARKS: Several episodes of very brief dropout on A's side might have been from the telephone line rather than the tape. Too short to be sure.

Part II. GENERAL INSTRUCTIONS

1. Transcribe "verbatim", without correcting grammatical errors: "I seen him," "me and him gone to the movies," etc.

2. Do not try to imitate pronunciation; use a dictionary form: "no" will do for "naw," "nah," etc.; "oh" for "aw"; "going to" (not gonna or goin to); "you all" rather than "y'all"; "kind of" instead of "kinda"; etc. Nonstandard words which are not in the dictionary (e.g., kiddo) should be typed normally, i.e., without quotes or other special notation.

3. Follow the dictionary on hyphenating compounds in clear-cut cases. But "when in doubt, leave them out."

4. Try to avoid word abbreviations: Fort Worth, not Ft. Worth; percent, not %; dollars, cents, and so forth.

5. Contractions are allowed, but be conservative. For example, contraction of "is" (it's a boy, running's fun) is common and standard, but there'll (there will) be forms that're (that are) better left uncontracted. It is always permitted to spell out forms in full, even if the pronunciation suggests the contracted form. Thus it is O K to type he is and they are and we would even if it's he's and they're and we'd that you heard.

6. Use normal capitalization on proper names of persons, streets, restaurants, cities, states, etc., but put titles (of books, journals, movies, songs, plays, TV shows, etc.--what would properly be in italics) in ALL CAPS, i.e., uppercase letters.

7. If it is necessary to use accent marks, insert the number 3 before the letter which would receive the accent, e.g., fianc3e.

8. Punctuation: although normal punctuation rules apply, spontaneous conversational speech is full of difficult situations. Strive for simplicity and consistency, with the following specific guidelines:

-- terminate each sentence with a period unless a question mark or exclamation point is clearly justified;

-- use a comma instead of ... or -- or fancier punctuation when speakers change thoughts or grammatical structures in the middle of a sentence;

-- for more detail, and for special rules involving interruptions, etc., see below under SPECIAL CONVENTIONS.
9. Be sure to run a spell check upon completion of the transcript. Remember to watch for common spelling confusions like: its and it's; they're and there and their; by and bye; etc.

PART III. SPECIAL CONVENTIONS FOR SWITCHBOARD CONVERSATIONS

1. Speakers should be indicated by "A: " and "B: " at the left margin, with two spaces after the colon, and with a blank line between speakers (i.e., an extra carriage return before each A: or B: ). On the audio tape, A will be THE SPEAKER ON THE FIRST OF THE TWO SEPARATELY RECORDED SIDES. IT IS IMPERATIVE TO KEEP THIS DESIGNATION CORRECT AND CONSISTENT, even when the crosstalk or echo is so strong that both speakers are equally loud. The log sheet for each conversation will show the first few words by each speaker, to help you confirm the assignment.

EXAMPLE:

A: Blah blah blah blah.

B: Blah blah blah.

A: Etcetera.

2. Spell out letter and number sequences: D F W, seven forty-seven, U S A, one oh one, F B I, etc., unless the letter sequence is pronounced as a word, as in NASA, ROM, DOS. Transcribe years like 1983 as "nineteen eighty-three," with hyphens only between the tens and ones digits. When a letter sequence is used as part of an inflected word, add the inflection with a dash: T I -er, B S -ing, the Oakland A -s, a witness I D -ed him. This leads to clumsy-looking possessive forms, as in: the U S -'s policy, the T I -er's last name, all the C E O -s' votes, but it saves lots of time later on.

3. Partial words: if a speaker does not finish a word, and you think you know what the word was, you may spell out as much of the word as is pronounced, and then use a single dash followed by a comma, -,. If you cannot tell what word the speaker is trying to say, leave it out.

EXAMPLE:

A: Well, th-, that's what they kept tell-, wanted me to believe.

B: I, I, I just am not to-, totally sure, uh, about that.

4. Hesitation sounds: use "uh" for all hesitations consisting of a vowel sound (rather than trying to distinguish uh, ah, er, etc.), and "um" for all hesitations with a nasal sound (rather than uhm, hm, mm, etc.).

5. Yes/no sounds: use "uh-huh" (yes) and "huh-uh" (no) for anything remotely resembling these sounds of assent or denial; you may use "yeah," "yep," and "nope" if that is what the words sound like.

6. Punctuation: use commas instead of ... or -- or other "fancy" punctuation when speakers change thoughts or grammatical structures in the middle of a "sentence." Terminate each sentence with a period unless a question mark or exclamation point is clearly justified. Only use suspension dots ... if a speaker leaves a sentence unfinished at the end of his/her turn, and a period cannot be used, or at the end of a conversation where the speaker's turn was cut off by the computer timing out:

EXAMPLE:

A: I was going to do that, but then I ...

B: Right, me too.

Use a double dash if a speaker breaks a sentence off and picks it up at the beginning of the next turn, with another double dash where the pickup begins:

EXAMPLE:

A: I was going to do that, but then I --

B: Right, me too.

A: -- thought I better not after all.

7. Non-speech sounds during conversations: indicate these using only the following list of expressions in brackets. When making judgments, pick the closest description; [noise] will be adequate to describe most sounds that are not represented below. Note underscores (not spaces or hyphens) to connect the double word descriptions.
[TV] [baby] [baby_crying] [baby_talking] [barking] [beep] [bell]
[bird_squawk] [breathing] [buzz] [buzzer] [child] [child_crying]
[child_laughing] [child_talking] [child_whining] [child_yelling]
[children] [children_talking] [children_yelling] [chiming] [clanging]
[clanking] [click] [clicking] [clink] [clinking] [cough] [dishes]
[door] [footsteps] [gasp] [groan] [hiss] [horn] [hum] [inhaling]
[laughter] [meow] [motorcycle] [music] [noise] [nose_blowing]
[phone_ringing] [popping] [pounding] [printer] [rattling] [ringing]
[rustling] [scratching] [screeching] [sigh] [singing] [siren] [smack]
[sneezing] [sniffing] [snorting] [squawking] [squeak] [static]
[swallowing] [talking] [tapping] [throat_clearing] [thumping] [tone]
[tones] [trill] [tsk] [typewriter] [ugh] [wheezing] [whispering]
[whistling] [yawning] [yelling]

If the event being described lasts longer than a few words, then indicate the beginning in brackets [ ], and the end in brackets with a "/", [/ ].

EXAMPLES:

1. Separate multiple sounds by a space, each one in brackets:

A: Oh, that's funny. [laughter] [cough] Excuse me, I have a cold.

B: That's all right, [sneezing] so do I. [barking] [child_talking]

2. Use "/" to show the end of a continuous sound:

A: Well, it all depends, uh, on, you know, [baby_crying] how the family reacts. I mean, it can be a positive or a negative thing, you know?

B: Yeah, well, I guess so. It just seems [/baby_crying] to me that it's a very difficult, uh, difficult issue.

8. When a comment is needed to describe an event, put the comment in curly braces { }: {very faint}, {sounds like speaker is talking to someone else in the room}, {speaker imitates a woman's voice here}.

EXAMPLES:

1. Curly braces to describe the speech:

B: Yeah, yeah, I agree {very faint} right.

2. Combine curly braces and brackets if more explanation is needed to describe the word in the brackets:

A: Did it sound like this? [clicking] {sounds made with mouth}

B: No, more like [clicking] {sounds like a pencil tapping on a table} this.

9. When a word or phrase is not clear, type DOUBLE PARENTHESES (( )) around what you think you hear. If there is no way to tell what the speaker said, leave 1 blank space between the double parentheses, indicating speech has been left out because it was unintelligible.

EXAMPLE:

A: So when I finally did ((take up)) the violin, I progressed pretty quickly in the beginning.

B: Of course, that was in college which was a long time ago, so (( )) I remember.

10. Marking untopical speech for possible trimming: use an "at sign", @, and a double "at sign", @@, to designate potential "trim points" at the beginning or end of conversations. These would exclude speech that either is not part of the conversation itself, or refers directly to the protocol. For example, it sometimes happens that callers accidentally press the touchtone button that begins recording, and are being recorded during the "warmup period" and don't know it. All such speech should be marked for trimming. Other examples would be speech that:

a.) explicitly refers to the SWITCHBOARD protocols;
b.) refers to the process of making the call;
c.) uses the TITLE of the prompt (e.g., "music"); or
d.) repeats or paraphrases the PROMPT itself.

[The TITLE and the PROMPT for each topic will be found on your information sheet; they are keyed to the topic number, which is on the log sheet for each conversation.]
Marking these trim points means that EVERYTHING BEFORE '@' AND/OR EVERYTHING AFTER '@@' may be discarded without losing the main body of the conversation on the topic. These symbols may therefore only be used ONCE AT THE BEGINNING (@) AND/OR ONCE AT THE END (@@) of the conversation. They must also be used ONLY AT TURN-TAKING POINTS, i.e., at the left hand margin, before an "A:" or "B:", NOT part of the way through someone's turn. One or both may be used in a single conversation, i.e., trimming of material at the beginning is independent of trimming at the end. Social niceties and transitional talk are neutral; that is, they may be left alone, but should be trimmed if they occur next to material that definitely deserves trimming.

EXAMPLE:

A: Okay, so what am I supposed to do now? Wait, let me read,

B: I think you're supposed to push one.

A: let's see, it says here to push, okay, but I think I already, okay are you ready?

B: Yep. [Talking about protocol up to here.]

A: Here we go. Alright, now, tell me, what is your favorite kind of music? [Using topic TITLE explicitly.]

@B: I enjoy Mozart and reggae, but I really love rap. [OK]

. . .

A: I've certainly enjoyed hearing what you have to say. [Trim optional here.]

@@B: Well, if we've talked enough, do I need to push a button or anything? I guess not, we can just hang up. So long. [Talk of protocol should be trimmed.]

A: Bye. Nice talking to you.

ANOTHER EXAMPLE:

A: Hi, there, how are you doing?

B: Fine, how about you?

A: Just great, except for all this heat. [Chitchat up to here could be left alone if no reason to trim occurred.]

B: Well. Care of the elderly, huh? That's our topic? [Need to trim because it mentions the topic TITLE.]

@A: Yes. Do you have any relatives that need special care? [This is OK as part of the conversation, since only the word "care" is repeated from the prompt. It is not trimmed--initial trimming ends with the '@'.]

. . .

@@B: Well, I guess we have solved the problem of care of the elderly, and how to choose nursing homes, haven't we? [Trimmed because it contains both TITLE and a paraphrase of the prompt.]

A: Sure did. I hope your grandmother gets better. So long now, it's been fun talking to you. [Social pleasantries would not be trimmed themselves, but there is no harm in trimming them in order to get rid of the previous turn.]

11. Simultaneous talking: wherever possible, mark where both speakers talked simultaneously with TWO PAIRS of pound signs (#), ONE BEFORE AND ONE AFTER each of the segments spoken at the same time. One of these segments MUST BEGIN A TURN; in other words, if one person is an "interruptor", his interruption starts a new turn. Remember, BOTH speakers' turns must contain TWO pound signs each.

A SIMPLE EXAMPLE:

A: Okay, well, I guess that's about it.

B: Yeah.

A: Nice talking to you.

B: # Right, bye. #

A: # Bye bye. #

ANOTHER EXAMPLE:

A: I never heard such nonsense, you know,

B: # Yeah, I know. # [B interrupts while A continues.]

A: # as I heard that # day when I blah blah blah. [A continues beyond the simultaneously spoken words.]

WHICH COULD ALSO BE WRITTEN:

A: I never heard such nonsense, you know, # as I heard that #

B: # Yeah, I know. #

A: day when I blah blah blah.

ANOTHER EXAMPLE:

A: I never heard such nonsense, # you know, # [A starts.]

B: # Yeah, # [B starts to step on A.]

A: as I heard that day when # I was at that meeting. # [A continues without stopping.]

B: # I agree with you all the way. # [B comes in over A again.]
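Conventions this regular lend themselves to mechanical checking; the abstract notes that the court reporters' work was screened for formatting errors by an awk script. The Python sketch below illustrates two such checks, using an abbreviated allowed-token list (the full list appears in Part III, item 7 above); it is a guess at the kind of screening done, not the original script.

    import re

    # Abbreviated; the complete list of legal bracketed expressions
    # is given in Part III, item 7 of this manual.
    ALLOWED = {"[laughter]", "[noise]", "[static]", "[breathing]",
               "[throat_clearing]", "[cough]", "[children]", "[barking]"}

    def check_line(n, line):
        problems = []
        # Item 7: only listed bracketed expressions may appear;
        # "[/x]" closes a long-running event "x".
        for token in re.findall(r"\[/?[A-Za-z_]+\]", line):
            if token.replace("[/", "[") not in ALLOWED:
                problems.append((n, "unlisted bracket expression " + token))
        # Item 11: '#' marks around simultaneous speech come in pairs
        # (this simple check assumes a whole turn sits on one line).
        if line.count("#") % 2 != 0:
            problems.append((n, "unpaired '#'"))
        return problems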
ATTACHMENT 2: SWITCHBOARD Time Aligned Transcription Specifications

PART 1: Time Aligning the Data

From the time SWITCHBOARD was first planned, two things were very clear: first, that the value of the corpus would be greatly enhanced by some form of time alignment between the speech signal and its transcription, and second, that the most desirable forms of alignment, e.g., word-by-word markings created or verified by human skill, would be far too expensive to justify. At the time it was not thought feasible to mark stretches of several minutes of truly spontaneous conversational speech automatically, e.g., at the word level. However, experiments conducted during the early months of SWITCHBOARD collection indicated that the technique of supervised recognition would probably succeed in aligning speech and text far more accurately than the specifications required, and at less cost. Beginning in July 1991, therefore, all the conversations were processed by this method, which is described briefly here. More details can be found in [1].

Each conversation in SWITCHBOARD has an orthographic transcription and a time-marked transcription. The time-marked transcription was generated using an automatic time alignment procedure involving the following steps:

1.) Create a supervision grammar from the orthographic transcription.
2.) Generate a grammar for each word in the transcription, based on an on-line dictionary and phonological rule set.
3.) Execute supervised recognition.
4.) Extract the timing information from the recognition output and merge it with the orthographic transcription.

From the orthographic transcription, we automatically generate a finite-state grammar uniquely characterizing the observed word sequence. This grammar dictates a strict linear progression through the text except for simultaneous speech, as discussed below. Nonspeech sounds, such as breath noises and laughter, are also indicated in the transcription but are not explicitly represented in the top-level grammar; however, the grammar does have self-loops at each node, i.e., initially, finally, and between each pair of words. Acoustic models trained on the Texas Instruments Voice Across America (VAA) long-distance telephone corpus [2] are used for silence, inhalation, exhalation, and lipsmacks, while all other nonspeech sounds are accommodated through the use of a score threshold which automatically classifies as nonspeech any input frame not sufficiently close to any candidate recognition model.

Each word in a conversation generates a finite-state grammar representing one or more pronunciations, which are obtained from an on-line dictionary. A separate path through the word-level grammar is generated for each alternate pronunciation represented in the dictionary. In addition, alternate paths are added for optional variants derived by applying phonological rules, such as alveolar stop flapping. All the steps in conversation-level and word-level grammar creation are fully automated. The sole manual operation in the time-alignment procedure is adding new words to the dictionary as they occur in conversations. Initially, each conversation required the addition of 20-25 words, but this rate decayed rapidly to about 2 words per conversation, most of them proper nouns.
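As a concrete illustration of step 1, the conversation-level grammar can be pictured as a chain of nodes with one arc per word plus a nonspeech self-loop at every node. The toy Python sketch below builds such a chain; it is conceptual only, since the actual TI grammar format is not distributed with the corpus.

    def supervision_grammar(words, nonspeech="<nonspeech>"):
        # Nodes 0..len(words); arc i -> i+1 emits word i, enforcing the
        # strict linear progression through the transcription, and
        # every node carries a self-loop for silence, breath noise, etc.
        arcs = [(i, i + 1, w) for i, w in enumerate(words)]
        arcs += [(i, i, nonspeech) for i in range(len(words) + 1)]
        return arcs

    for arc in supervision_grammar(["okay", "air", "quality"]):
        print(arc)
    # (0, 1, 'okay'), (1, 2, 'air'), (2, 3, 'quality'), plus the
    # self-loops (0, 0, '<nonspeech>') ... (3, 3, '<nonspeech>')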
Word pronunciations are realized in terms of a set of context-independent phoneme models. These phoneme models are continuous-density HMMs that have been trained for speaker-independent recognition of long-distance telephone speech on 1,000 phonetically balanced sentences (based on TIMIT sentences) in the VAA corpus. Each phoneme has two variants, one trained on male speakers and one on female speakers. The sex of each speaker determines which set of phoneme variants is specified in the supervision grammar.

Each conversation is time-aligned by a hierarchical-grammar speech recognition algorithm [3], using the corresponding conversation, word, and phoneme models. The recognizer outputs the beginning time and duration for each word. Since the recognition models use 20-millisecond frames, all times are in multiples of 0.02 seconds. The recognition output is then combined with the original transcription to produce a time-marked transcription showing speaker turns.

Two interrelated issues that arose in defining this procedure are the use of the combined-channel signal versus the two single-channel signals, and the treatment of simultaneous speech. For reasons of cost and time efficiency, the combined-channel signal was used, since aligning each channel separately would require twice the processing time. In addition, alignment of the single-channel signal is vulnerable to errors associated with the "silent" portions of each signal, i.e., the times when the other participant was speaking. For example, some conversations contain considerable cross-channel echo, resulting in a relatively strong speech signal not reflected in a supervision grammar representing only one side of the conversation. This unrepresented signal tends to introduce spurious alignments, resulting in overall alignment failure.

Aligning the entire conversation with the combined-channel signal, however, requires an effective method of handling simultaneous speech segments. Stretches of simultaneous speech are labeled as such during transcription, but it is not generally feasible to specify a precise interleaving of words during simultaneous speech. Hence, a simple nonbranching supervision grammar based directly on the transcription would not yield satisfactory alignment performance. The solution was to insert alternate paths in the grammar for the duration of the simultaneous speech portion. Constrained by such a grammar, the recognizer aligns the words for one participant or the other, but not both; it automatically selects between the two paths, based on which aligns better. This method was successful in enabling the alignment procedure to handle simultaneous speech without going astray. The disadvantage is that it yields word-level timing data for only one participant during simultaneous speech segments. However, since simultaneous speech is typically rather brief, even the unaligned words are localized to a small stretch of time.

The automatic time-marking procedure proved fairly robust: out of about 2,500 files, only 12 had to be marked manually for at least some portions of the file. The primary failure mode in these files was an extremely quiet speaker; when the energy level is exceptionally low, the alignment process may fail to find expected words, resulting in overall alignment failure.

The accuracy of the automatic alignment was estimated by marking 10 randomly selected 30-second excerpts by hand and comparing the results with the automatically determined times.
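The comparison itself is simple bookkeeping once both sets of marks exist. The Python sketch below computes the begin-time, end-time, and duration differences summarized in Table 1, along with the NIST "centroid" criterion described just below; the input format is hypothetical.

    def alignment_differences(hand, auto):
        # hand, auto: parallel lists of (begin, duration) pairs, in
        # seconds, one pair per word (a hypothetical input format).
        d_begin, d_end, d_dur = [], [], []
        for (hb, hd), (ab, ad) in zip(hand, auto):
            d_begin.append(ab - hb)
            d_end.append((ab + ad) - (hb + hd))
            d_dur.append(ad - hd)
        return d_begin, d_end, d_dur

    def centroid_correct(hand_mark, auto_mark):
        # NIST criterion: the midpoint of the automatically marked
        # word must fall within the hand-assigned begin/end times.
        (hb, hd), (ab, ad) = hand_mark, auto_mark
        return hb <= ab + ad / 2 <= hb + hd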
Table 1 shows the difference between hand-marked and automatically marked word alignments, measured separately for word beginning times, word ending times, and word durations. For all data, the mean difference in beginning and ending times is approximately one frame (0.02 second). For 95% of the words, the mean difference is 0.005 second or less, with a standard deviation of approximately three frames or fewer. Independent support for this level of accuracy is provided by comparisons performed at NIST, where keywords occurring in a selected subset of the corpus were marked by hand and the times compared with the automatically generated times. About 95% of these words were marked "correctly" in the sense that the centroid of the word according to the automatic marking fell within the hand-assigned beginning and ending times.

-----------------------------------------------------------------------------
           Differences (sec) | Begin Times    | End Times      | Durations
-----------|-----------------|----------------|----------------|-------------
ALL        | Mean            | -0.019         | -0.022         | -0.003
DATA       | Std Dev         |  0.134         |  0.137         |  0.080
(N=1025)   | Range           | -1.60 to 0.51  | -1.77 to 0.54  | -0.62 to 0.42
-----------|-----------------|----------------|----------------|-------------
EXCLUDING  | Mean            | -0.005         | -0.004         | -0.001
OUTLIERS   | Std Dev         |  0.048         |  0.050         |  0.064
(N=975)    | Range           | -0.22 to 0.22  | -0.22 to 0.21  | -0.22 to 0.22
-----------------------------------------------------------------------------

As the ranges in Table 1 show, the remaining 5% of the words exhibit wider variation; in a few cases, alignment errors exceeded 1.5 seconds. Examination of these cases indicated that the failures were attributable to exceptionally prolonged words. The acoustic models used for time marking are finite-duration models, which are generally more robust than infinite-duration models for telephone-quality speech. However, such models impose a maximum duration on each word, leading to errors when the input violates the durational assumptions built into the models.

PART 2: Time Alignment Specifications

1. In .marked files, each record must have 4 fields: the first is the talker, A or B; the second is the estimated start time of the current word; the third is the duration of the word; and the fourth is the word itself.

2. The first field may contain the dummy symbol "*" when the event in field 4 is not attributed to either speaker. Examples: "[beep]" at the beginning, indicating the dtmf tone, or the "..." at the end, indicating that the conversation was cut off (timed out).

*  *    *    [Beep]
@A 1.36 0.28 Okay,
@A 1.64 0.08 I

3. Fields 2 and 3 may contain the dummy symbol "*" when the event in field 4 is not a word, but a nonspeech event, a comment, or a stand-alone string of punctuation (i.e., something which does not receive a duration from the time alignment algorithm), as well as in the cases in (2) above.

A *      *    [lipsmack]
A *      *    {pause}
A 311.02 0.62 economic

4. Fields 2 and 3 may contain the dummy symbol "*" when there is simultaneous speech by A and B. Here the time alignment algorithm is allowed to recognize the speech of EITHER talker--usually the louder one. The other talker's speech is not time-aligned during this period, but the words should be attributed to A and B correctly.

A 113.96 0.24 thing
A 114.20 0.10 is
A 114.30 0.44 still,
A *      *    {pause}
A *      *    #you
A *      *    know#
A *      *    --
B 116.40 0.20 #Your
B 116.60 0.56 education.#
A *      *    --
A 117.16 0.22 getting
A 117.38 0.10 your
A 117.48 0.60 education.
5. The symbols "@" or "@@" may occur before the A or B in field 1, to mark suggested trim points.

ATTACHMENT 3: Topic Listing

301: AIDS
302: AIR POLLUTION
303: CLOTHING AND DRESS
304: CREDIT CARD USE
305: CARE OF THE ELDERLY
306: RECIPES, FOOD, COOKING
307: FOOTBALL
308: MUSIC
309: PUERTO RICAN STATEHOOD
310: VACATION SPOTS
311: BOOKS AND LITERATURE
312: CRIME
313: WEATHER AND CLIMATE
314: GUN CONTROL
315: MIDDLE EAST
316: RESTAURANTS
317: AFFIRMATIVE ACTION
318: AUTO REPAIRS
319: BASKETBALL
320: BUYING A CAR
321: CAMPING
322: CAPITAL PUNISHMENT
323: CHILD CARE
324: CHOOSING A COLLEGE
325: COMPUTERS
326: BOATING AND SAILING
327: UNIVERSAL PUBLIC SERVICE
328: VIETNAM WAR
329: WOMEN'S ROLES
330: DIRECTIONS
331: FAMILY REUNIONS
332: HOME REPAIRS
333: VOTING
334: SOCIAL CHANGE
335: RECYCLING
336: RIGHT TO PRIVACY
337: SAVINGS AND LOAN BAILOUT
338: SOVIET UNION
339: TV PROGRAMS
340: TAXES
341: TRIAL BY JURY
342: UNIVERSAL HEALTH INSURANCE
343: HOUSES
344: IMMIGRATION
345: JOB BENEFITS
346: LATIN AMERICA
347: METRIC SYSTEM
348: MOVIES
349: NEWS MEDIA
350: PAINTING
351: PETS
352: POLITICS
353: PUBLIC EDUCATION
354: DRUG TESTING
355: ELECTIONS AND VOTING
356: EXERCISE AND FITNESS
357: FAMILY FINANCE
358: FAMILY LIFE
359: FEDERAL BUDGET
360: FISHING
361: GARDENING
362: GOLF
363: HEALTH CARE
364: HOBBIES AND CRAFTS
365: BASEBALL
366: CONSUMER GOODS
367: ETHICS IN GOVERNMENT
368: SPACE FLIGHT AND EXPLORATION
369: MAGAZINES
370: WOODWORKING