Macrophone Final Report

<title> Macrophone Final Report </title>
<h2>
		 	MACROPHONE FINAL REPORT
</h2>
<P>
		     Kelsey Taussig, Project Leader
<P>
			     July 13, 1994

<P> <h3>
Project overview </h3> <P>

Macrophone was an effort that produced a large corpus of telephone 
speech appropriate to the development of automatic voice-interactive
telephone services.  The corpus includes over 200,000 transcribed 
utterances from over 5000 speakers.  All data was collected in 8-bit 
mu-law digital form directly from T1 telephone channels.
<P>
<h3>
Summary of work </h3> <P>
This project was divided into three phases: setup, collection, and
file preparation.
<P>
Tasks for the setup phase included:
<ol>
<li> Procure and configure requisite collection hardware and software.
	<p>
	We purchased and installed the hardware required for data
	collection, which included a two-sided printer, two 2-GByte 
	disks, an Exabyte drive, and Exabyte tapes.  We also updated 
	the collection software and the software that generated the 
	prompt sheets.

<li> Finalize material design.
<P>
	The material design was finalized to the following specification:
<P>
	Of the 34 read utterances, we specified <ul>
		<li> 4 digit strings,
		<li> 3 natural numbers,
		<li> 4 dollar amounts,
		<li> 1 fraction,
		<li> 2 places,
		<li> 6 application words,
		<li> 2 spelled words,
		<li> 7 sentences (3 TIMIT + CSR, 2 WSJ, 2 ATIS),
		<li> 1 date,
		<li> 1 name at agency, and
		<li> 3 names at street addresses.
	</ul> 
	Of the 11 spontaneous responses, we specified <ul>
		<li> 5 yes/no,
		<li> 1 major city,
		<li> 1 current date,
		<li> 1 current time,
		<li> 1 birth date,
		<li> 1 natural number, and
		<li> 1 comment.
	</ul>
	Although comments were collected as part of Macrophone, they were not
	transcribed and are not part of the 200,000 delivered utterances.
<P>
	A separate document entitled Macrophone Materials contains a detailed
	description of the material and is included with this report.

<li> Print sheets for mailing
<P>
	In total, 22,000 unique sheets were printed for mailing: 20,000
	originally, and an additional 2000 to compensate for a low response
	rate in the 18-28 age group.

<li> Settle on a sample and schedule with the market research mailing firm.
<P>
	We received bids from Market Facts, NPD (formerly HTI), and NFO.
	We accepted NFO's bid of $24,000 for 20,000 mailings.  We later
	made an additional mailing of 2000 targeted at 18-28 year olds
	at a cost of $3000.
<P>
	We specified the following respondent characteristics: <ul>
		<li>even gender distribution,
		<li>flat geographic distribution,
		<Li>ages 10-80,
		<li>wide income intervals at the bottom of the scale.
</ul> 
	
	Because NFO underestimated the response rate of 10-18 year olds,
	we received many more calls from juveniles than desired.  We
	selectively filtered out those calls to obtain a more balanced
	age distribution.
<P>
	Due to the higher household income specification, we received
	many fewer calls from 18-28 year olds than desired.  In an effort
	to smooth the age distribution, we sent an additional mailing of 
	2000 to that age group.
</ol> 	

Tasks for the collection phase included:
<oL>
<li> Monitor and store 5,000 incoming calls.
	<P>
	We collected 6700 calls from the original 20,000 mailings, a
	33% response rate.  
	<p>
	The additional mailing of 2000 to 18-28 year olds resulted in 310
	calls, of which only 201 were in the target age group.  We suspected
	that the 109 callers outside the target age group were parents calling
	in for their children.  The in-target response rate of 10% suggests
	that university postings or electronic bulletin boards might be better
	sources than panel houses for this particular age group.
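<P>
	The response rates quoted above follow directly from the call and
	mailing counts.  The short Python sketch below reproduces the
	arithmetic; the function name "response_rate" is illustrative and is
	not part of the collection software.

```python
def response_rate(calls, mailings):
    """Return calls received as a percentage of sheets mailed."""
    return 100.0 * calls / mailings


# Counts taken from the report text.
original_rate = response_rate(6700, 20000)   # original mailing
second_all = response_rate(310, 2000)        # every call from the second mailing
second_target = response_rate(201, 2000)     # callers aged 18-28 only

print(f"{original_rate:.2f}% {second_all:.2f}% {second_target:.2f}%")
```

<P>
	On these counts, the 10% figure covers only the 201 callers in the
	target age group; counting all 310 calls, the second mailing drew
	about 15.5%.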

<li> Hire and train temporary workers for file verification and transcription.
<P>
	We hired and trained 6 half-time transcribers.  Their training
	instructions were distributed to Jack Godfrey, and are included
	at the end of the report.

<li> Transcribe demographic information from calls; write to headers and archive.
<P>
	Demographic information was transcribed for all of the collected
	calls.
<P>
	Demographic information included a gender decision made by the
	transcriber as well as responses to the following questions: <ul>
		<li>Do you speak any language besides English at home?
		<li>Are you using a cordless phone?
		<li>What is your date of birth?
</ul>
	The sheet identifier (in the form of a 10-digit telephone number) was
	also transcribed at this time.  Since each of the 22,000 sheets
	contained a unique set of read material, the sheet identifier was used
	to supply the default transcriptions for the read utterances for a
	particular sheet.
<P>
	SPHERE headers were written for all files.  A typical file header is
	shown below:
<pre>
	NIST_1A
  	1024
	birthday -s6 530317
	speaking_mode -s4 read
	caller_id -s8 11001248
	non_native_speaker -s2 no
	cordless_phone -s2 no
	gender -s4 male
	panel_number -s7 0137631
	sheet_identifier_1 -s10 4454978797
	recording_date -s6 930914
	recording_time -s6 130719
	database_id -s10 MACROPHONE
	database_version -s3 1.0
	microphone -s9 telephone
	sample_rate -i 8000
	sample_count -i 61504
	channel_count -i 1
	sample_n_bytes -i 1
	sample_sig_bits -i 8
	sample_byte_format -s6 mu-law
	prompt_text -s26 Say the credit card number
	transcription -s69 two three two four dash six six oh seven \
		dash three three three three
	response_category -s6 digits
	end_head
</pre>
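<P>
	Because the header is plain text, its fields can be read with a few
	lines of code.  The Python sketch below is illustrative only and is
	not the SPHERE library; the function name "parse_sphere_header" is
	hypothetical.  It assumes that each field occupies a single line in
	the file (the backslash in the listing above appears to be a line
	wrap for display) and handles only the "-i", "-r", and "-sN" field
	types.

```python
def parse_sphere_header(text):
    """Parse SPHERE header text into a dict mapping field name to value."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "NIST_1A":
        raise ValueError("missing NIST_1A signature")
    fields = {}
    # lines[1] holds the header size in bytes; fields start on the next line.
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line:
            continue
        name, ftype, value = line.split(None, 2)
        if ftype == "-i":                      # integer field
            fields[name] = int(value)
        elif ftype == "-r":                    # real-valued field
            fields[name] = float(value)
        elif ftype.startswith("-s"):           # string field: -s plus length
            fields[name] = value[:int(ftype[2:])]
        else:
            raise ValueError("unknown field type: " + ftype)
    return fields


# A shortened copy of the header shown above.
sample = "\n".join([
    "NIST_1A",
    "   1024",
    "speaking_mode -s4 read",
    "gender -s4 male",
    "sample_rate -i 8000",
    "sample_count -i 61504",
    "sample_byte_format -s6 mu-law",
    "prompt_text -s26 Say the credit card number",
    "end_head",
])
header = parse_sphere_header(sample)
# Duration in seconds follows from the sample count and the sampling rate.
duration = header["sample_count"] / header["sample_rate"]
```

<P>
	For the sample header, 61504 samples at 8000 samples per second
	correspond to an utterance a little under 7.7 seconds long.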

<li> Prepare and package groups of utterance files for shipment.
<P>
	We delivered 204,160 utterances from 5005 callers.  The transcription
	conventions were documented and delivered to the LDC and are included
	at the conclusion of this report.
<P>
	Utterances from an additional 292 callers were transcribed and later
	discarded due to either a low number of acceptable utterances or the
	age of the caller.
<P>
	To ensure the accuracy of the transcriptions, we used a two-step
	checking process.  The first step was an automatic check that
	corrected spelling, typos, and other known problems.  The second step
	was the manual verification of all utterances in categories where we
	expected the most transcription errors.  A study of 28,000 utterances
	showed that slightly over 5% of all transcribed utterances contained
	transcription errors, of which 0.5% were spelling errors or typos.
	Since the verification task was not bid into the original contract,
	and we had neither the time nor resources to verify all 200,000
	utterances, we concentrated our verification effort on the utterance
	types that contained the most transcription errors.
<p>
	Utterance types and transcription error rates are listed below.  A
	large share of the transcription errors for "names", "WSJ/TIMIT", and
	"place names" were for mispronunciations that the transcribers didn't
	catch.
<pre>
		type            % of utts with errors
		----            ---------------------
		names                    4.5
		WSJ/TIMIT                2.7
		place names              2.7
		ATIS                     2.3
		dates                    2.2
		dollars                  1.8
		panel ID                 1.8
		personal                 1.7
		numbers                  1.7
		spelled words            1.5
		credit card #            1.5
		phone #                  1.1
		city in state             .73
		fractions                 .73
		words                     .54
		yes/no                    .44
		time                     0
</pre>


	Tables containing information about speakers and calls were created
	and distributed via ftp.  A description of the tables follows.
<P>
	TABLE 1: Caller Demographics
<P>
	This gives the call number and the caller's sex, age, home state,
	income group, and education group.  Sex and age were determined from
	the caller's responses to questions asked during data collection or,
	if those were unavailable or could not be determined, from
	information supplied by the panel house.  The home state, income
	group, and education group were all determined from panel house
	information, tracked by the panel ID number the callers gave.  A
	significant number of callers did not give a valid panel ID number,
	so not all information could be listed for them.  Question marks are
	used in any field where the information couldn't be determined from
	any source.
<P>
	Table entries are comma-separated lists:
<pre>
		Call #, Sex, Age, State, Income, Education

		12000058,F,47,VA,4,5
		12000060,M,26,MI,5,4
		12000061,M,48,MI,4,7
		12000062,F,41,??,?,?
		12000064,F,34,SC,1,3
		12000065,M,39,MI,5,7
		12000068,F,22,WI,3,5
</pre> 
	Income Decoding table:
<pre>
		1 - Under 12,500
		2 - 12,500 - 24,999
		3 - 25,000 - 39,999
		4 - 40,000 - 59,999
		5 - 60,000 and Over
</pre>
	Education Decoding table:
	(We list the higher of the Female Head of Household's education and
	the Male Head of Household's education.)
<pre>
	1 - Elementary: Less than 8 years
	2 - Elementary: 8 years (graduate)
	3 - High School: 1-3 years
	4 - High School: 4 years (graduate)
	5 - College: 1-3 years (attended college or Associate degree)
	6 - College: (graduate)
	7 - College: (postgraduate studies)
	0 - No Answer
</pre>
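<P>
	As an illustration of how Table 1 might be read and decoded, the
	Python sketch below applies the two decoding tables above and maps
	question-mark fields to a missing value.  The function and variable
	names are hypothetical, and the sketch assumes the heading row has
	already been removed from the input.

```python
import csv
import io

INCOME = {
    "1": "Under 12,500",
    "2": "12,500 - 24,999",
    "3": "25,000 - 39,999",
    "4": "40,000 - 59,999",
    "5": "60,000 and Over",
}
EDUCATION = {
    "0": "No Answer",
    "1": "Elementary: Less than 8 years",
    "2": "Elementary: 8 years (graduate)",
    "3": "High School: 1-3 years",
    "4": "High School: 4 years (graduate)",
    "5": "College: 1-3 years (attended college or Associate degree)",
    "6": "College: (graduate)",
    "7": "College: (postgraduate studies)",
}


def known(value):
    """Return None for a field made up only of question marks."""
    return None if value and set(value) == {"?"} else value


def parse_callers(text):
    """Decode Table 1 rows: call, sex, age, state, income, education."""
    callers = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 6:
            continue                      # skip blank or malformed lines
        call, sex, age, state, income, education = map(known, row)
        callers.append({
            "call": call,
            "sex": sex,
            "age": int(age) if age else None,
            "state": state,
            "income": INCOME.get(income),          # None when undetermined
            "education": EDUCATION.get(education),
        })
    return callers


# Two rows from the sample above, one with undetermined panel fields.
sample = "12000058,F,47,VA,4,5\n12000062,F,41,??,?,?\n"
callers = parse_callers(sample)
```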

	TABLE 2: Profile of the Call
<p>
	This lists call number, date, time, incoming line number, whether it
	was on a cordless phone, and the number of good utterances from the
	call. The cordless phone field is filled in based on the caller's
	response to a question at the beginning of data collection.  If the
	answer couldn't be determined, a '?' is listed.
<P>
	Table entries are comma-separated lists:
<pre>
	Call #, Date, Time, Line Number, Cordless?, # of Good Utts

	12000058,930819,062233,12,N,27
	12000060,930819,072016,12,Y,39
	12000061,930819,074140,12,N,35
	12000062,930819,080350,12,N,30
	12000063,930819,082740,12,Y,37
	12000064,930819,084126,12,Y,20
	12000065,930819,085038,12,N,34
	12000066,930819,092639,12,N,28
	12000067,930819,093919,12,N,21
	12000068,930819,095110,12,Y,40
</pre>
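<P>
	The sketch below shows one way Table 2 rows might be decoded in
	Python.  It assumes, from the sample values, that dates are written
	YYMMDD and times HHMMSS on a 24-hour clock; the names "parse_call"
	and "CORDLESS" are hypothetical.

```python
from datetime import datetime

# Cordless field: Y or N from the caller's answer, ? when undetermined.
CORDLESS = {"Y": True, "N": False, "?": None}


def parse_call(line):
    """Decode one Table 2 row into a dict."""
    call, date, time, line_no, cordless, good = line.strip().split(",")
    return {
        "call": call,
        # Assumed layout: YYMMDD date followed by HHMMSS time.
        "when": datetime.strptime(date + time, "%y%m%d%H%M%S"),
        "line": int(line_no),
        "cordless": CORDLESS[cordless],
        "good_utts": int(good),
    }


# The first three rows of the sample above.
rows = [
    "12000058,930819,062233,12,N,27",
    "12000060,930819,072016,12,Y,39",
    "12000061,930819,074140,12,N,35",
]
calls = [parse_call(r) for r in rows]
mean_good = sum(c["good_utts"] for c in calls) / len(calls)
```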

	TABLE 3: Transcription and Utterance Profile
<p>
	This table lists the call number, the utterance number within the
	call, the utterance type, and transcription for the utterance (in
	quotes).
<P>
	Table entries are comma-separated lists:
<prE>
	Call #, Utt #, Utt type, Transcription

	12000058,01,yes/no,"yes"
	12000058,02,yes/no,"no"
	12000058,04,yes/no,"no"
	12000058,05,natural_number,"one thousand five hundred twenty"
	12000058,07,time,"nine forty six a m eastern standard time"
	12000058,08,date,"september twenty fifth nineteen forty six"
	12000058,09,place,"richmond virginia"
	12000058,10,digits,"zero five five five one six eight"
	12000058,11,application_word,"divided by"
	12000058,14,place,"des moines iowa"

</pre>
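<P>
	Because the transcription field is quoted, a standard CSV reader can
	split these rows.  The Python sketch below is illustrative only (the
	name "parse_utterances" is hypothetical, and the heading row is
	assumed to have been removed); it tallies utterance types and lists
	the utterance numbers that do not appear for the call.

```python
import csv
import io
from collections import Counter


def parse_utterances(text):
    """Yield (call, utterance number, type, transcription) tuples."""
    for row in csv.reader(io.StringIO(text)):
        if len(row) == 4:                 # skip blank or malformed lines
            call, utt, utt_type, transcription = row
            yield call, int(utt), utt_type, transcription


# The first five rows of the sample above.
sample = '''12000058,01,yes/no,"yes"
12000058,02,yes/no,"no"
12000058,04,yes/no,"no"
12000058,05,natural_number,"one thousand five hundred twenty"
12000058,07,time,"nine forty six a m eastern standard time"
'''
utterances = list(parse_utterances(sample))
type_counts = Counter(utt_type for _, _, utt_type, _ in utterances)

# Gaps in the numbering, up to the highest utterance number seen.
present = {utt for _, utt, _, _ in utterances}
missing = sorted(set(range(1, max(present) + 1)) - present)
```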
	TABLE 4: Panel House Information
<P>
	This table contains the raw panel house information, and can be
	decoded according to information provided in the file "panlhous.doc".
	Sample entries are shown here:
<pre>
	0000125421290294120359115444211113115551242347   1178111831     \
		  381           07
	0000127420030295120457085555211107115551242252   097810880108832\
	          362           07
</pre>
	Each entry is 90 characters wide, and contains 50 distinct,
	fixed-width data fields.  There are no explicit separators between
	fields; space characters represent (portions of) fields that have been
	left blank.
<P>

	TABLE 5: Summary Utterance Inventory
<P>
	This table was compiled by LDC from the Transcription and Utterance
	Profile (Table 3).  Like the first two tables, it contains one entry
	per call, with the first field of each entry being the call number.
	The second field is a 44-character string which encodes the utterances
	that are present and absent for the call.  Utterances that are present
	in the call are represented by an alphabetic character that indicates
	the response type for that utterance; missing utterances are
	represented by an underscore character.  Following this string there
	are 16 numeric fields, representing the number of utterances present
	for each of the 15 response types, and the total number of utterances
	present in the call.  All fields are separated by commas.
<P>
	The following sample lines from this table show a call in which all
	utterances are present (none are missing), and a call in which only 26
	utterances are present (18 are missing):
<pre>
09000010,yyyyndtdpowAopsnrnTtwWacadTwawoTrpfawsrAnWwy,6,5,4,4,3,3,3,\
	3,3,2,2,2,2,1,1,44

12000106,yy_yn_t__o_Ao_sn__Ttw_a_a_Twaw_T____wsrAnW__,4,3,3,3,1,2,0,\
	0,3,2,2,2,1,0,0,26
</pre>
	The following list shows, for each of the response types, the number
	of such responses that would be present in a complete call, and the
	letter used to represent each type in the second field of the table; the
	sequence of types in this list is identical to the sequence of numeric
	fields in the table:
<pre>
		6 w application_word
		5 y yes/no
		4 n natural_number
		4 a dollar_amount
		3 r name_at_address
		3 o digits
		3 p place
		3 d date
		3 T TIMIT
		2 s spelled_word
		2 t time
		2 A ATIS
		2 W WSJ
		1 c name_at_agency
		1 f fraction
</pre>
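<P>
	The encoding can be checked mechanically.  The Python sketch below
	recounts the letters in the 44-character string and compares them
	with the 16 numeric fields; the name "check_inventory" is
	hypothetical, and each entry is assumed to occupy a single line (the
	backslashes in the sample lines above are taken to be line wraps for
	display).

```python
from collections import Counter

# Letters in the same order as the numeric fields (from the list above).
TYPE_ORDER = "wynaropdTstAWcf"


def check_inventory(line):
    """Return (call, present_count, consistent) for one Table 5 row."""
    fields = line.strip().split(",")
    call, pattern = fields[0], fields[1]
    listed = [int(f) for f in fields[2:]]
    # Count every utterance that is present (anything but an underscore).
    seen = Counter(ch for ch in pattern if ch != "_")
    derived = [seen[ch] for ch in TYPE_ORDER] + [sum(seen.values())]
    consistent = len(pattern) == 44 and listed == derived
    return call, derived[-1], consistent


# The two sample entries above, each joined onto one line.
complete = ("09000010,yyyyndtdpowAopsnrnTtwWacadTwawoTrpfawsrAnWwy,"
            "6,5,4,4,3,3,3,3,3,2,2,2,2,1,1,44")
partial = ("12000106,yy_yn_t__o_Ao_sn__Ttw_a_a_Twaw_T____wsrAnW__,"
           "4,3,3,3,1,2,0,0,3,2,2,2,1,0,0,26")
results = [check_inventory(complete), check_inventory(partial)]
```

<P>
	Both sample entries pass this check: the per-type counts and totals
	recomputed from the strings agree with the numeric fields listed.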
	For convenience, an additional file has been provided, called
	"uttsumry.hdr", that consists only of three lines containing properly
	aligned headings for each column of "uttsumry.tbl".  To provide labels
	for each column of the table, simply prepend this ".hdr" file to the
	".tbl" file.
</ol>