Yes/No Corpus
Release 1.2
Center for Spoken Language Understanding

UPDATED: 23 August 2002

This document describes the file naming conventions used for this
distribution and gives a brief description of the file formats used.

File Naming Convention
----------------------

Each filename in the /speech and /trans directories encodes the call
number, utterance type, and file type. A typical filename looks like:

    YN-1214.evermarried.wav

The "YN" prefix indicates the corpus name, i.e. Yes/No.

The number between the "-" and the first "." is the call number. Call
numbers are described below.

The string between the first and second "." is the utterance type. The
following utterance types are in this corpus:

    cellular         "Are you calling from a cellular phone?"
    ever_married     "Have you ever been married?"
    have_home_phone  "Do you have a telephone at home?"
    hispanic         "Are you of Hispanic origin?"
    letterA          "Is the letter A in your last name?"
    no               "Please say 'no'."
    results          "Would you like to hear about the results of
                     our research?"
    yes              "Please say 'yes'."
    yorn             Various prompts

The final three-letter extension indicates the file type. The following
types are in this distribution:

    wav    The speech data
    txt    The text-based transcription of the speech data

File Formats
------------

The "wav" files contain speech data and use the RIFF standard file
format. The samples are 16-bit, linearly encoded.

The "trans" file in the /docs directory is a list of all of the text
transcriptions. Each file's transcription is on a separate line. The
first value on the line, separated from the rest by a single space, is a
triplet of call number, utterance type, and transcription type. This
triplet uniquely identifies each file. The transcription type is either
txt or mtxt, for human or machine-based transcriptions, respectively.
The remaining words on the line are the transcription.

FOR THIS VERSION OF THE CORPUS, THE CONTENTS OF THE TRANS.TXT FILE HAVE
BEEN EXTRACTED INDIVIDUALLY INTO THE /TRANS DIRECTORY.
The structure of the /trans directory exactly parallels that of the
/speech directory. Each file in the /trans directory is in .txt format
and contains a line, as described above, that uniquely identifies its
corresponding sound file.
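
The naming convention and transcription line layout described above can
be sketched in Python as follows. This is an illustrative sketch only,
not part of the distribution. The names FileKey, parse_key,
parse_transcription_line, and transcription_path are hypothetical. The
code assumes that the three fields of a key are joined by "." in the
same way as in filenames, that the "YN-" prefix may or may not be
present in a transcription key, and that a /trans file sits at the same
relative location as its /speech counterpart. None of these assumptions
is confirmed by this document. The sample transcription line in the
usage note is made up for illustration.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Tuple


class FileKey(NamedTuple):
    """The three fields encoded in a corpus filename or transcription key."""
    call_number: str      # kept as a string; the numeric format is not specified
    utterance_type: str   # e.g. "yes", "no", "cellular"
    file_type: str        # "wav", "txt", or "mtxt"


def parse_key(name: str) -> FileKey:
    """Split a name such as 'YN-1214.evermarried.wav' into its fields.

    Assumption: the fields are separated by "." and the "YN-" corpus
    prefix is optional.
    """
    base = name.rsplit("/", 1)[-1]          # drop any directory part
    if base.startswith("YN-"):
        base = base[len("YN-"):]            # drop the corpus prefix
    parts = base.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected name: {name!r}")
    return FileKey(*parts)


def parse_transcription_line(line: str) -> Tuple[FileKey, str]:
    """Split one transcription line into (key, transcription text).

    The key is the first space-separated value; the remaining words are
    the transcription.
    """
    key, _, text = line.strip().partition(" ")
    return parse_key(key), text


def transcription_path(wav_path: str) -> str:
    """Map a /speech path to the parallel /trans path.

    Assumption: only the first 'speech' path component and the file
    extension differ between the two trees.
    """
    parts = list(PurePosixPath(wav_path).parts)
    if "speech" in parts:
        parts[parts.index("speech")] = "trans"
    return str(PurePosixPath(*parts).with_suffix(".txt"))
```

For example, parse_key("YN-1214.evermarried.wav") yields the call number
"1214", the utterance type "evermarried", and the file type "wav". A
hypothetical line such as "1214.yes.txt yes" would be split into the key
(1214, yes, txt) and the transcription "yes".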