File: spkrinfo.doc
------------------

The file "spkrinfo.tbl" presents all the information available regarding
speaker demographics. The fields in this table are described below. (The
"caller" is the person who initiated the call, and whose voice appears on
channel A of the speech and transcripts.) Note that there is no
information available about the age, education, or geographic origin of
the callee (the speaker on channel B).

Field#  Content
-------------------
  1     Call-ID (four digits, as they appear in data file names)
  2     Gender of caller (M or F)
  3     Age of caller
  4     Years of education completed by caller
  5     Where the caller grew up (2-digit state code, or "varied")
  6     Country code or area code plus first three digits of the telephone
        number dialed (last four digits of the number are encrypted as
        three letters)

The telephone number that was dialed has been encrypted in such a way as
to provide anonymity while still preserving the identity of each number:
when the same phone number occurred twice in the collection, it has been
rendered as the same alphanumeric string. Phone numbers that begin with
"011" represent overseas calls; the others are domestic calls.
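For readers loading the table programmatically, the field list above maps naturally onto a record type. The following is a minimal Python sketch, assuming each line of spkrinfo.tbl holds the six fields in the order listed, separated by whitespace. That layout, the names `CallerInfo`, `parse_spkrinfo_line` and `load_spkrinfo`, and the choice to treat a non-numeric age or education value as missing are all assumptions for illustration, not part of the corpus distribution. The overseas test simply applies the "011" rule stated above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CallerInfo:
    call_id: str                # field 1: kept as a string to preserve leading zeros
    gender: str                 # field 2: "M" or "F"
    age: Optional[int]          # field 3: None if the value is not numeric (assumption)
    education: Optional[int]    # field 4: years of education; None if not numeric
    region: str                 # field 5: 2-digit state code, or "varied"
    phone: str                  # field 6: dialed number, last four digits encrypted

    @property
    def is_overseas(self) -> bool:
        # Numbers beginning with "011" are overseas calls; all others are domestic.
        return self.phone.startswith("011")


def _to_int(token: str) -> Optional[int]:
    return int(token) if token.isdigit() else None


def parse_spkrinfo_line(line: str) -> CallerInfo:
    """Parse one record, ASSUMING six whitespace-separated fields per line."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    call_id, gender, age, edu, region, phone = fields
    return CallerInfo(call_id, gender, _to_int(age), _to_int(edu), region, phone)


def load_spkrinfo(path: str) -> Dict[str, CallerInfo]:
    """Map Call-ID to CallerInfo for every non-blank line in the file."""
    table: Dict[str, CallerInfo] = {}
    with open(path) as f:
        for line in f:
            if line.strip():
                rec = parse_spkrinfo_line(line)
                table[rec.call_id] = rec
    return table
```

For example, `parse_spkrinfo_line("0000 F 30 16 varied 011123xyz")` (a made-up line, not a real record) yields a record whose `is_overseas` property is True. If the real file uses a different delimiter, only `parse_spkrinfo_line` needs to change.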
Despite the intention to avoid repeat speakers in this collection, the
following cases of the same voice occurring in more than one conversation
have been found (channel "A" refers to the caller, channel "B" refers to
the callee):

Call-IDs          Remarks
---------------------------------------------
0638 4092         Same voice on channel A (*)
4569 4673         Same voice on channel A
4092 4184         Same voice on channel B
4490 5278 5713    Same voice on channel B
4941 6107         Same voice on channel B
5208 6314         Same voice on channel B
5242 5532         Same voice on channel B
6045 6252         Same voice on channel B
6047 6298         Same voice on channel B
6067 6079         Same voice on channel B
6161 6625         Same voice on channel B
6447 6456         Same voice on channel B

(*) The caller in 0638 and 4092 appears in spkrinfo.tbl with different
values for age and education in these two calls, because about two years
passed between the two calls.

The cases of repeat callees (same voice on channel B) stemmed from
recruiting American citizens to make overseas calls. In some cases the
recruitment followed social networks, in which two or more people knew
the same individual (a native speaker of American English) then living in
a foreign country. Although our recruitment and initial call auditing
tried to eliminate repeat callers, we failed to detect the repeat callees
in these cases until after the calls had been transcribed and prepared
for publication.

The two cases of repeat callers arose because the foreign and domestic
calls were originally collected under separate projects, and individuals
were encouraged to participate in both projects when possible. Later,
when recruitment for overseas English calls fell short of the required
quantity, some recordings from the domestic-call project were selected
for use in CallHome English. At that stage, we mistakenly included the
two calls shown above, and the error went undetected until the point of
publication.
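When partitioning the calls (for example into training and test sets), calls that share a speaker on either channel probably belong in the same partition. The sketch below, assuming Call-IDs are handled as four-character strings, encodes the repeat-speaker table above and groups calls with a small union-find. The names `REPEAT_SPEAKERS` and `call_clusters` are hypothetical, not part of the corpus distribution. Because call 4092 appears in both a channel-A pair and a channel-B pair, calls 0638, 4092 and 4184 end up in a single cluster.

```python
from collections import defaultdict
from typing import Iterable, List

# Repeat-speaker groups transcribed from the table above: (channel, Call-IDs).
REPEAT_SPEAKERS = [
    ("A", ["0638", "4092"]),
    ("A", ["4569", "4673"]),
    ("B", ["4092", "4184"]),
    ("B", ["4490", "5278", "5713"]),
    ("B", ["4941", "6107"]),
    ("B", ["5208", "6314"]),
    ("B", ["5242", "5532"]),
    ("B", ["6045", "6252"]),
    ("B", ["6047", "6298"]),
    ("B", ["6067", "6079"]),
    ("B", ["6161", "6625"]),
    ("B", ["6447", "6456"]),
]


def call_clusters(call_ids: Iterable[str]) -> List[List[str]]:
    """Partition call_ids so that calls sharing any speaker land in one cluster."""
    ids = list(call_ids)
    parent = {c: c for c in ids}

    def find(c: str) -> str:
        # Follow parent links to the root, halving the path as we go.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for _channel, group in REPEAT_SPEAKERS:
        # Only link calls that are actually in the set being partitioned.
        present = [c for c in group if c in parent]
        for other in present[1:]:
            parent[find(other)] = find(present[0])

    clusters = defaultdict(list)
    for c in ids:
        clusters[find(c)].append(c)
    return sorted(sorted(members) for members in clusters.values())
```

For instance, `call_clusters(["0638", "4092", "4184", "1234"])`, where "1234" stands in for any call not listed in the table, returns `[["0638", "4092", "4184"], ["1234"]]`; each inner list can then be assigned to a partition as a unit.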
David Graff
Linguistic Data Consortium
December 12, 1996