MACROPHONE FINAL REPORT
---------- ----- ------

Kelsey Taussig, Project Leader
July 13, 1994


PROJECT OVERVIEW

Macrophone was an effort that produced a large corpus of telephone
speech appropriate to the development of automatic voice-interactive
telephone services. The corpus includes over 200,000 transcribed
utterances from over 5000 speakers. All data was collected in 8-bit
mu-law digital form directly from T1 telephone channels.


SUMMARY OF WORK

This project was divided into three phases: setup, collection, and
file preparation.

Tasks for the setup phase included:

1) Procure and configure requisite collection hardware and software.

   We purchased and installed the hardware required for data
   collection, which included a two-sided printer, two 2-GByte disks,
   an Exabyte drive, and Exabyte tapes. We also updated the collection
   software and the software that generated the prompt sheets.

2) Finalize material design.

   The material design was finalized to the following specification:

   Of the 34 read utterances, we specified
      - 4 digit strings,
      - 3 natural numbers,
      - 4 dollar amounts,
      - 1 fraction,
      - 2 places,
      - 6 application words,
      - 2 spelled words,
      - 7 sentences (3 TIMIT + CSR, 2 WSJ, 2 ATIS),
      - 1 date,
      - 1 name at agency, and
      - 3 names at street addresses.

   Of the 11 spontaneous responses, we specified
      - 5 yes/no,
      - 1 major city,
      - 1 current date,
      - 1 current time,
      - 1 birth date,
      - 1 natural number, and
      - 1 comment.

   Although comments were collected as part of Macrophone, they were
   not transcribed and are not part of the 200,000 delivered
   utterances. A separate document entitled Macrophone Materials
   contains a detailed description of the material and is included
   with this report.

3) Print sheets for mailing.

   In total, 22,000 unique sheets were printed for mailing. Twenty
   thousand (20,000) were printed originally, and then an additional
   2000 were printed to compensate for a low response rate in the
   18-28 year old category.

4) Settle on a sample and schedule with the market research mailing
   firm.

   We received bids from Market Facts, NPD (formerly HTI), and NFO. We
   accepted NFO's bid of $24,000 for 20,000 mailings. We later made an
   additional mailing of 2000 targeted at 18-28 year olds at a cost of
   $3000. We specified the following respondent characteristics:
      - even gender distribution,
      - flat geographic distribution,
      - ages 10-80,
      - wide income intervals at the bottom of the scale.

   Because NFO underestimated the response rate of 10-18 year olds, we
   received many more calls from juveniles than desired. Those calls
   were selectively filtered out to obtain a more desirable age
   distribution. Because of the higher household income specification,
   we received many fewer calls from 18-28 year olds than desired. In
   an effort to smooth the age distribution, we sent an additional
   mailing of 2000 to that age group.

Tasks for the collection phase included:

1) Monitor and store 5,000 incoming calls.

   We collected 6700 calls from the original 20,000 mailings, a 33%
   response rate. The additional mailing of 2000 to 18-28 year olds
   resulted in 310 calls, of which only 201 were in the target age
   group. We suspected that the 109 callers outside the targeted age
   group were parents calling in for their children. The 10% response
   rate from the target group (201 of 2000) suggests that university
   postings or electronic bulletin boards might be better sources than
   panel houses for this particular age group.

2) Hire and train temporary workers for file verification and
   transcription.

   We hired and trained 6 half-time transcribers. Their training
   instructions were distributed to Jack Godfrey and are included at
   the end of the report.

3) Transcribe demographic information from calls; write to headers and
   archive.

   Demographic information was transcribed for all of the collected
   calls.
   Demographic information included a gender decision made by the
   transcriber as well as responses to the following questions:
      - Do you speak any language besides English at home?
      - Are you using a cordless phone?
      - What is your date of birth?

   The sheet identifier (in the form of a 10-digit telephone number)
   was also transcribed at this time. Since each of the 22,000 sheets
   contained a unique set of read material, the sheet identifier was
   used to supply the default transcriptions for the read utterances
   for a particular sheet.

   SPHERE headers were written for all files. A typical file header is
   shown below:

      NIST_1A
         1024
      birthday -s6 530317
      speaking_mode -s4 read
      caller_id -s8 11001248
      non_native_speaker -s2 no
      cordless_phone -s2 no
      gender -s4 male
      panel_number -s7 0137631
      sheet_identifier_1 -s10 4454978797
      recording_date -s6 930914
      recording_time -s6 130719
      database_id -s10 MACROPHONE
      database_version -s3 1.0
      microphone -s9 telephone
      sample_rate -i 8000
      sample_count -i 61504
      channel_count -i 1
      sample_n_bytes -i 1
      sample_sig_bits -i 8
      sample_byte_format -s6 mu-law
      prompt_text -s26 Say the credit card number
      transcription -s69 two three two four dash six six oh seven \
      dash three three three three
      response_category -s6 digits
      end_head

4) Prepare and package groups of utterance files for shipment.

   We delivered 204,160 utterances from 5005 callers. The
   transcription conventions were documented and delivered to the LDC
   and are included at the conclusion of this report. Utterances from
   an additional 292 callers were transcribed and later discarded due
   to either a low number of acceptable utterances or the age of the
   caller.

   In order to ensure the accuracy of the transcriptions, we used a
   two-step checking process. The first step was an automatic check
   that corrected spelling, typos, and other known problems. The
   second step was the manual verification of all utterances in
   categories where we expected the most transcription errors.
   A study of 28,000 utterances showed that slightly over 5% of all
   transcribed utterances contained transcription errors, of which
   0.5% were spelling errors or typos. Since the verification task was
   not bid into the original contract, and we had neither the time nor
   the resources to verify all 200,000 utterances, we concentrated our
   verification effort on the utterance types that contained the most
   transcription errors. Utterance types and transcription error rates
   are listed below. A large share of the transcription errors for
   "names", "WSJ/TIMIT", and "place names" were mispronunciations that
   the transcribers didn't catch.

      type              % of utts with errors
      ----              ---------------------
      names             4.5
      WSJ/TIMIT         2.7
      place names       2.7
      ATIS              2.3
      dates             2.2
      dollars           1.8
      panel ID          1.8
      personal          1.7
      numbers           1.7
      spelled words     1.5
      credit card #     1.5
      phone #           1.1
      city in state     0.73
      fractions         0.73
      words             0.54
      yes/no            0.44
      time              0

Tables containing information about speakers and calls were created
and distributed via ftp. A description of the tables follows.


TABLE 1: Caller Demographics
============================

This table gives the call number and the caller's sex, age, home
state, income group, and education group. Sex and age were determined
from the caller's responses to questions asked during data collection
or, if those weren't available (or couldn't be determined), from
information supplied by the panel house. The home state, income group,
and education group were all determined from panel house information,
tracked with the panel ID number the callers gave. A significant
number of callers did not give a valid panel ID number, so not all
information could be listed for them. Question marks are used in any
field where the information couldn't be determined from any source.

Table entries are comma-separated lists:

   Call #, Sex, Age, State, Income, Education

   12000058,F,47,VA,4,5
   12000060,M,26,MI,5,4
   12000061,M,48,MI,4,7
   12000062,F,41,??,?,?
   12000064,F,34,SC,1,3
   12000065,M,39,MI,5,7
   12000068,F,22,WI,3,5

Income Decoding table:

   1 - Under 12,500
   2 - 12,500 - 24,999
   3 - 25,000 - 39,999
   4 - 40,000 - 59,999
   5 - 60,000 and Over

Education Decoding table:
(We list the higher of Female Head of Household's Education and Male
Head of Household's Education)

   1 - Elementary: Less than 8 years
   2 - Elementary: 8 years (graduate)
   3 - High School: 1-3 years
   4 - High School: 4 years (graduate)
   5 - College: 1-3 years (attended college or Associate degree)
   6 - College: (graduate)
   7 - College: (postgraduate studies)
   0 - No Answer


TABLE 2: Profile of the Call
============================

This table lists the call number, date, time, incoming line number,
whether the call was made on a cordless phone, and the number of good
utterances from the call. The cordless phone field is filled in based
on the caller's response to a question at the beginning of data
collection. If the answer couldn't be determined, a '?' is listed.

Table entries are comma-separated lists:

   Call #, Date, Time, Line Number, Cordless?, # of Good Utts

   12000058,930819,062233,12,N,27
   12000060,930819,072016,12,Y,39
   12000061,930819,074140,12,N,35
   12000062,930819,080350,12,N,30
   12000063,930819,082740,12,Y,37
   12000064,930819,084126,12,Y,20
   12000065,930819,085038,12,N,34
   12000066,930819,092639,12,N,28
   12000067,930819,093919,12,N,21
   12000068,930819,095110,12,Y,40


TABLE 3: Transcription and Utterance Profile
============================================

This table lists the call number, the utterance number within the
call, the utterance type, and the transcription for the utterance (in
quotes).

Table entries are comma-separated lists:

   Call #, Utt #, Utt type, Transcription

   12000058,01,yes/no,"yes"
   12000058,02,yes/no,"no"
   12000058,04,yes/no,"no"
   12000058,05,natural_number,"one thousand five hundred twenty"
   12000058,07,time,"nine forty six a m eastern standard time"
   12000058,08,date,"september twenty fifth nineteen forty six"
   12000058,09,place,"richmond virginia"
   12000058,10,digits,"zero five five five one six eight"
   12000058,11,application_word,"divided by"
   12000058,14,place,"des moines iowa"


TABLE 4: Panel House Information
================================

This table contains the raw panel house information, and can be
decoded according to information provided in the file "panlhous.doc".
Sample entries are shown here:

   0000125421290294120359115444211113115551242347 1178111831 \
   381 07
   0000127420030295120457085555211107115551242252 097810880108832\
   362 07

Each entry is 90 characters wide and contains 50 distinct, fixed-width
data fields. There are no explicit separators between fields; space
characters represent (portions of) fields that have been left blank.


TABLE 5: Summary Utterance Inventory
====================================

This table was compiled by LDC from the Transcription and Utterance
Profile (Table 3). Like the first two tables, it contains one entry
per call, with the first field of each entry being the call number.
The second field is a 44-character string that encodes which
utterances are present and absent for the call. Utterances that are
present in the call are represented by an alphabetic character that
indicates the response type for that utterance; missing utterances are
represented by an underscore character. Following this string there
are 16 numeric fields, giving the number of utterances present for
each of the 15 response types, followed by the total number of
utterances present in the call. All fields are separated by commas.

The following sample lines from this table show a call in which all
utterances are present (none are missing), and a call in which only 26
utterances are present (18 are missing):

   09000010,yyyyndtdpowAopsnrnTtwWacadTwawoTrpfawsrAnWwy,6,5,4,4,3,3,3,\
   3,3,2,2,2,2,1,1,44
   12000106,yy_yn_t__o_Ao_sn__Ttw_a_a_Twaw_T____wsrAnW__,4,3,3,3,1,2,0,\
   0,3,2,2,2,1,0,0,26

The following list shows, for each of the response types, the number
of such responses that would be present in a complete call, and the
letter used to represent each type in the second field of the table;
the sequence of types in this list is identical to the sequence of
numeric fields in the table:

   6   w   application_word
   5   y   yes/no
   4   n   natural_number
   4   a   dollar_amount
   3   r   name_at_address
   3   o   digits
   3   p   place
   3   d   date
   3   T   TIMIT
   2   s   spelled_word
   2   t   time
   2   A   ATIS
   2   W   WSJ
   1   c   name_at_agency
   1   f   fraction

For convenience, an additional file has been provided, called
"uttsumry.hdr", that consists only of three lines containing properly
aligned headings for each column of "uttsumry.tbl". To provide labels
for each column of the table, simply prepend this ".hdr" file to the
".tbl" file.
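
As an illustration of the Table 5 layout described above, the following
Python sketch splits one entry into its fields and checks that the
44-character presence string agrees with the 16 numeric fields. It is
not an LDC or SRI tool: the names TYPE_CODES, parse_inventory_line, and
check_inventory are hypothetical, and the sketch assumes that the
trailing backslash in the sample lines is only a display wrap, so each
entry is treated as a single line.

```python
from collections import Counter

# Letter codes in the order of the numeric fields, as listed in the report.
TYPE_CODES = [
    ("w", "application_word"), ("y", "yes/no"), ("n", "natural_number"),
    ("a", "dollar_amount"), ("r", "name_at_address"), ("o", "digits"),
    ("p", "place"), ("d", "date"), ("T", "TIMIT"), ("s", "spelled_word"),
    ("t", "time"), ("A", "ATIS"), ("W", "WSJ"), ("c", "name_at_agency"),
    ("f", "fraction"),
]


def parse_inventory_line(line):
    """Hypothetical helper: split one Table 5 entry into its parts."""
    fields = line.strip().split(",")
    # Call number + presence string + 15 per-type counts + 1 total.
    if len(fields) != 18:
        raise ValueError(f"expected 18 fields, got {len(fields)}")
    call_id, pattern = fields[0], fields[1]
    numbers = [int(f) for f in fields[2:]]
    counts = {name: n for (_, name), n in zip(TYPE_CODES, numbers[:15])}
    return call_id, pattern, counts, numbers[15]


def check_inventory(pattern, counts, total):
    """Return True if the presence string agrees with the numeric fields."""
    # Underscores mark missing utterances; every other character is a type code.
    seen = Counter(ch for ch in pattern if ch != "_")
    per_type_ok = all(seen[code] == counts[name] for code, name in TYPE_CODES)
    return len(pattern) == 44 and per_type_ok and sum(seen.values()) == total


# Second sample entry from the report, joined onto one line (assumption).
SAMPLE = ("12000106,yy_yn_t__o_Ao_sn__Ttw_a_a_Twaw_T____wsrAnW__,"
          "4,3,3,3,1,2,0,0,3,2,2,2,1,0,0,26")

call_id, pattern, counts, total = parse_inventory_line(SAMPLE)
# prints: 12000106 26 18 True
print(call_id, total, pattern.count("_"), check_inventory(pattern, counts, total))
```

Keeping the letter codes in one ordered list mirrors the statement that
the list of response types and the numeric fields share the same
sequence, so a single table drives both the parsing and the check.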