Foreign Accented English Corpus
Release 1.2
Center for Spoken Language Understanding
UPDATED: 3 June 2002

This document describes the file naming convention used for this distribution and briefly describes the file formats used.

File Naming Convention
----------------------

Files are named according to the following convention:

    FBP00006.wav

The first letter ("F") is the prefix indicating the corpus to which the data belongs, the second and third letters ("BP") represent the speaker's native language, and the digits ("00006") are a unique speaker ID number.

File Formats
------------

Two file formats are used in this corpus (other than the documentation).

The .wav files use the standard RIFF file format. The audio is 16-bit linearly encoded.

The info file format is described in the overview.txt file.

Some of the files in this corpus are also included in the CSLU 22 Language Speech corpus. Those files have been verified by a native speaker of the language. A variety of information about the speaker was collected into an .inf file. Info files exist for only 1785 of the calls, because native speakers have not screened all of the calls.

As an example, these are the contents of AR00145.inf:

    145 general dialect bahrain
    145 general gender male
    145 general age adult
    145 general connection good
    145 general intelligibility good

The first field is the call number, the second is the comment category (all are general), the third names the item of information being recorded, and the final field is the value of that item. Thus this file tells us that the speaker is an adult male who speaks the Bahrain dialect of Arabic. We can also see that the connection (line) quality and the speaker's intelligibility were both rated good.
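
The naming convention and the .inf layout described above can be read mechanically. The following is a minimal Python sketch, not part of the distribution: the names parse_name, parse_inf, and CorpusName are hypothetical. It assumes that the corpus prefix letter is optional (the example AR00145.inf carries no leading "F"), that the ID is always five digits, and that each .inf record sits on its own line with four whitespace-separated fields. The overview.txt file remains the authoritative description of the info format.

```python
import re
from typing import Dict, NamedTuple, Optional

# Assumption: an optional one-letter corpus prefix, a two-letter language
# code, a five-digit speaker ID, then the file extension.
NAME_RE = re.compile(r"^([A-Z])?([A-Z]{2})(\d{5})\.([A-Za-z0-9]+)$")


class CorpusName(NamedTuple):
    prefix: Optional[str]   # "F" in FBP00006.wav, None if absent
    language: str           # "BP" in FBP00006.wav
    speaker_id: int         # 6 in FBP00006.wav
    extension: str          # "wav" or "inf"


def parse_name(filename: str) -> CorpusName:
    """Split a corpus file name into its documented parts."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    prefix, language, digits, ext = m.groups()
    return CorpusName(prefix, language, int(digits), ext)


def parse_inf(text: str) -> Dict[str, str]:
    """Map each item name to its value for one .inf file's contents."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Fields: call number, category, item, value.
        # maxsplit=3 keeps any spaces inside the value intact.
        parts = line.split(None, 3)
        if len(parts) != 4:
            raise ValueError(f"unexpected .inf record: {line!r}")
        _call, _category, item, value = parts
        info[item] = value
    return info


SAMPLE_INF = """\
145 general dialect bahrain
145 general gender male
145 general age adult
145 general connection good
145 general intelligibility good
"""

if __name__ == "__main__":
    print(parse_name("FBP00006.wav"))
    print(parse_inf(SAMPLE_INF))
```

Returning a plain dictionary keyed by item name keeps lookups such as info["dialect"] simple. It discards the call number and category, which is reasonable only because every record in a file shares the same call number and all categories are "general".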