OVERVIEW OF THE OGI SPELLED AND SPOKEN WORD CORPUS

1. Files and directories

All calls were numbered in the order they were received. There are
numerous gaps in the sequence due to hangups, etc. Each caller answered
several questions (see icslp92.tex or icslp92.ps in this directory).
The response to each question is stored in a file whose name identifies
the call number and the question. The question types ("response types"),
along with the associated directory names and file name prefixes, are:

Question/Prompt      Directory   File Prefix
---------------      ---------   -----------
alphabet             alphabet    alp_
apostrophe           apostrph    apo_
call_from            callfrom    cal_
capital              capital     cap_
hometown             hometown    hom_
hyphen               hyphen      hyp_
letter_A_yesno       letrA_yn    let_
results_yesno        reslt_yn    res_
say_fname            say_fnam    saf_
say_lname            say_lnam    sal_
spell_fname_pause    sp_fnamp    sfp_
spell_lname          sp_lnam     sln_
spell_lname_pause    sp_lnamp    slp_

For example, "hometown/hom_322.wav" is the wav-format sample file for
caller 322's response to the hometown question. Similarly,
"hometown/hom_322.ptl" is the OGI lola-format label file for that
response.

Wav and ptl files are kept in separate directories with a parallel
structure: "speech" for wav files, "handlabl" for ptl files. Within
these directories, the files are further subdivided by call number
div 100. So sp_lnam/0 contains responses to the spell_lname prompt for
calls up to 99, sp_lnam/1 for calls 100 through 199, etc.

2. PHONEDB log file

The files db/phonedb.lo (calls 61-2000) and db/phonedb.hi (calls
2001-4218) are the log files created when the calls were processed (the
corpus log is divided into two files due to its very large size). For
each call there are a number of entries, each of which begins with the
call number followed by a second field, e.g. "1652 say_lname". The
second field is either a response type, e.g. callfrom, or a keyword
that indicates global information (information that applies to the
whole call), e.g. age.
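
As an illustration of the entry layout just described, here is a minimal
Python sketch that splits one log entry into its call number and second
field, and marks whether that field is a global keyword. It is not part
of the corpus tools. It assumes that fields are separated by whitespace
and that the global keywords are exactly the ones named in this file
(labeler, label_date, gender, age, intelligibility, connection). The
names LogEntry and parse_entry are hypothetical, and everything after
the second field is left uninterpreted.

```python
from typing import List, NamedTuple

# Global (whole-call) keywords named in this document.
GLOBAL_KEYWORDS = {
    "labeler", "label_date", "gender", "age", "intelligibility", "connection",
}


class LogEntry(NamedTuple):
    call: int          # call number (first field)
    field: str         # response type or global keyword (second field)
    is_global: bool    # True if the second field is a global keyword
    rest: List[str]    # remaining tokens, left uninterpreted


def parse_entry(line: str) -> LogEntry:
    """Split one log entry, ASSUMING whitespace-separated fields."""
    tokens = line.split()
    if len(tokens) < 2 or not tokens[0].isdigit():
        raise ValueError(f"not a log entry: {line!r}")
    call, field = int(tokens[0]), tokens[1]
    return LogEntry(call, field, field in GLOBAL_KEYWORDS, tokens[2:])
```

For instance, parse_entry("1652 say_lname") yields call 1652 with a
response-type field, while an entry whose second field is "age" is
flagged as global.
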

The responses for each response type (listed above) were collected with
the following prompts:

say_fname:          answer to the "what is your first name" prompt
say_lname:          answer to the "what is your last name" prompt
spell_lname:        last name spelled
spell_fname_pause:  first name spelled with pauses
spell_lname_pause:  last name spelled with pauses
callfrom:           answer to the "where are you calling from" prompt
hometown:           answer to the "what town did you grow up in" prompt
letterA_yesno:      answer to "does your last name contain the letter A"
results_yesno:      answer to "would you like more information"
hyphen:             response to "please say hyphen"
capital:            response to "please say capital"
apostrophe:         response to "please say apostrophe"
alphabet:           "We will now ask you to say the alphabet. We need
                    you to pause briefly between letters, like this:
                    A B C D E F G. You may hang up when you are
                    finished. Please begin speaking now."

The global keywords and the values which follow them on the line are:

labeler
label_date
gender           <"male" or "female" or "unknown", based on listening judgement>
age              <"adult" or "child">
intelligibility  <"poor" or "typical" -- refers to how intelligible the speaker is>
connection       <"poor" or "typical" -- refers to the telephone connection>

After each response type come keywords followed by their values:

text
original mean
min_samp
max_samp
samples
comments

The "comments" keyword, if present, can have one or more of the
following blank-separated values:

ES -- there is extraneous speech in the response (beyond the answer to
      the question)
EN -- there is environmental noise in the response
BN -- there is excessive breath noise in the response
NI -- caller did not follow instructions
NC -- not complete
LN -- there is line noise in the response
FS -- response contains fluent spelling

Consistency in application of these comment codes was not as carefully
checked as was word transcription...

We find it convenient to process PHONEDB.LO/HI logs into dbm files for
fast lookup of transcriptions.
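
The dbm lookup idea can be sketched in Python with the standard dbm
module. This is only an illustration under stated assumptions, not the
processing actually used for the corpus: it assumes the transcriptions
have already been extracted from the log as (call, response type, text)
triples, and it keys each record by "<call> <response type>", mirroring
the "1652 say_lname" entry prefix. The function names build_index and
lookup are hypothetical.

```python
import dbm


def build_index(entries, db_path):
    """Store transcriptions in a dbm file keyed by "<call> <response type>".

    `entries` is an iterable of (call, response_type, text) triples that
    the caller has ALREADY extracted from the log (extraction not shown).
    """
    with dbm.open(db_path, "c") as db:   # "c": create the file if needed
        for call, response_type, text in entries:
            key = f"{call} {response_type}".encode("utf-8")
            db[key] = text.encode("utf-8")


def lookup(db_path, call, response_type):
    """Return the stored transcription, or None if there is no record."""
    with dbm.open(db_path, "r") as db:   # "r": read-only
        value = db.get(f"{call} {response_type}".encode("utf-8"))
    return None if value is None else value.decode("utf-8")
```

With an index built this way, lookup(path, 1652, "say_lname") returns
the stored text for that response without rescanning the log.
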
(See the other files in the "db" directory.)

3. Note on Gradient recordings

The Gradient Desklabs we used to record the database have 14-bit
samples which are stored in 16-bit shorts. The least significant 2 bits
are always 0. To see this, add "original mean" to every sample.

The utterances were recorded with gain set to 10, which is the limit of
what the manufacturer recommends. No highpass filter was used. A lowpass
anti-aliasing filter was used.
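
The 14-bit check described above can be sketched in Python as follows.
This is a minimal illustration, assuming the samples have already been
read into a list of integers (reading the wav files is not shown) and
that "original mean" is an integer value. The function name
low_bits_clear is hypothetical.

```python
def low_bits_clear(samples, original_mean):
    """Return True if every sample, after adding back the original mean,
    has its two least significant bits equal to 0.

    `samples` is a sequence of ints; `original_mean` is ASSUMED to be an
    integer. Python's & treats negative ints as two's complement, so the
    test also works for negative sample values.
    """
    return all(((s + original_mean) & 0b11) == 0 for s in samples)
```

A recording for which this returns False under the right "original
mean" would suggest that the samples were altered or that the mean was
applied with a different convention than assumed here.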