Foreign Accented English Corpus
Release 1.2
Center for Spoken Language Understanding

UPDATED: 3 June 2002

Use of this corpus is permitted only under the conditions of the signed
license agreement. Use or redistribution of this corpus outside the
agreement is prohibited by law.


Overview
--------

The Foreign Accented English (FAE) Corpus is a subset of the CSLU 22
Language Corpus, and consists of continuous speech in English by native
speakers of 22 different languages. The FAE Corpus consists of 4925
utterances, information about each speaker's linguistic background, and
perceptual judgements of the degree of accent in each utterance. Callers
were asked to speak about themselves in English for 20 seconds.

Our goals in developing and releasing the FAE Corpus were to support the
study of the underlying characteristics of foreign accent and to enable
research, development, and evaluation of algorithms for the
identification and understanding of accented speech.


Distribution Directory Structure
--------------------------------

This is the distribution for Release 1.2 of the Foreign Accented English
Corpus. This corpus is distributed by the Center for Spoken Language
Understanding of the Oregon Graduate Institute.

Following is a description of the directory structure in this release:

    readme.txt    General information regarding the corpus.

    docs/         The documentation directory. This directory contains
                  further documentation for the Foreign Accented English
                  Corpus.

    labels/       Phonetic labeling directory. This directory is empty
                  for this corpus.

    misc/         Miscellaneous directory, possibly containing software
                  tools and scripts. This directory contains info files
                  for many of the speech files. A description of info
                  files can be found in the overview.txt file in the
                  /docs directory.

    speech/       The speech directory contains the actual .wav files.
                  There are many labeled subdirectories within the
                  speech directory.

    trans/        The transcriptions directory. For this corpus, there
                  are no transcriptions in the directory.

This corpus requires approximately 1.4GB of disk space.

Please see the /docs directory for further documentation.


Contact Information
-------------------

Further information about this corpus can be found on our web site.

Refer specific questions to:

    Alena Tkacova
    Linguistic Data Services Manager
    Center for Spoken Language Understanding
    Oregon Health & Science University

    email   : alca@asp.ogi.edu
    Phone   : 503 748-1600
    FAX     : 503 748-7038
    Address : 20000 NW Walker Road
              Beaverton, OR 97006 USA

Constructive feedback about this corpus is appreciated.