=====================================================
West Point Russian Speech
=====================================================

Developers: Col. Stephen A. LaRocca, Dr. Christine Tomei, John Morgan

Authors: Col. Stephen A. LaRocca and Dr. Christine Tomei

The Center For Technology Enhanced Language Learning
Department Of Foreign Languages
United States Military Academy
745 Brewerton Road
West Point, NY 10996

Email: gs0416@usma.edu
Phone: 845-938-5286
Fax: 845-938-3585

Introduction:

Staff and Faculty of the Department of Foreign Languages (DFL) and the
Center for Technology Enhanced Language Learning (CTELL) designed the
West Point Russian corpus to provide a set of recordings for the
training and development of speaker-independent speech recognition
systems for use by West Point cadets enrolled in the Russian language
program.

Materials:

The Collection Script:

The collection script consists of 96 sentences with a total of 528
tokens and 351 types. The prompts are labeled "s01" through "s96".

The Lexicon:

The file "dict.txt" contains 690 distinct orthographic word forms,
including all words found in the collection script. Each line of the
lexicon contains one word entry: the orthography is given first,
followed by a tab character, then the phone string for the word, with
space characters separating the individual phone symbols. All phone
strings end with the "sp" (short pause) segment.
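The lexicon layout described above (orthography, a tab, then
space-separated phones ending in "sp") can be read with a few lines of
Python. This is only a minimal sketch: the function names, the sample
entry "da", and the UTF-8 default encoding are assumptions made for
illustration, since this document does not state the character encoding
of "dict.txt".

```python
def parse_lexicon_line(line):
    """Split one lexicon line into (orthography, list of phone symbols)."""
    # The orthography and the phone string are separated by one tab.
    word, phone_string = line.rstrip("\n").split("\t", 1)
    # Individual phone symbols are separated by space characters.
    phones = phone_string.split()
    # Every phone string is documented to end with the "sp" segment.
    if not phones or phones[-1] != "sp":
        raise ValueError(f"entry for {word!r} does not end with 'sp'")
    return word, phones


def load_lexicon(path, encoding="utf-8"):
    """Read a whole lexicon file into a dict (the encoding is an assumption)."""
    lexicon = {}
    with open(path, encoding=encoding) as handle:
        for line in handle:
            if line.strip():  # ignore blank lines
                word, phones = parse_lexicon_line(line)
                lexicon[word] = phones
    return lexicon


# Hypothetical entry, written only to show the layout.
print(parse_lexicon_line("da\td a sp\n"))  # ('da', ['d', 'a', 'sp'])
```

If "dict.txt" turns out to use a different encoding, pass it to
load_lexicon through the encoding argument.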
The Phones:

@     back mid unrounded vowel
A     back low tense vowel
B     voiced bilabial palatalized stop
CH    unvoiced palato--alveolar affricate
D     voiced dental palatalized stop
E     front mid tense vowel
F     unvoiced labio--dental palatalized fricative
G     voiced palatal stop
I     front high tense vowel
M     bilabial palatalized nasal
N     dental palatalized nasal
O     mid--back round vowel
P     unvoiced bilabial palatalized stop
R     alveolar palatalized discontinuous affricate
S     unvoiced dental palatalized fricative
SH    unvoiced palato--alveolar palatalized fricative
T     unvoiced dental palatalized stop
V     voiced labio--dental palatalized fricative
Z     voiced dental palatalized fricative
a     back lax vowel
b     voiced bilabial stop
c     unvoiced dental affricate
d     voiced dental stop
e     mid--front lax vowel
f     unvoiced labio--dental fricative
g     voiced velar stop
i     front high lax vowel
j     semi--vowel consonantal glide
k     unvoiced velar stop
l     alveolar--dental liquid
L     alveolar--dental palatalized liquid
m     bilabial nasal
n     dental nasal
p     unvoiced bilabial stop
r     alveolar discontinuous affricate
s     unvoiced dental fricative
sh    unvoiced palato--alveolar fricative
sil   silence
sp    short pause
t     unvoiced dental stop
u     back high lax round vowel
v     voiced labio--dental fricative
x     velar fricative
y     back high lax unrounded vowel
z     voiced dental fricative
zh    voiced palato--alveolar fricative

The Transcriptions:

Each waveform file has a monophone-level and a word-level
transcription in HTK master label file (*.mlf) format; the *.mlf files
are presented in concatenated form in "monophone.mlf" in the labels
subdirectory. These files contain a multi-line entry for every speech
file in the corpus: the first line of each entry gives the file name,
and the phones are provided in sequence on the following lines, one
phone per line. All sentence transcripts begin and end with the "sil"
(silence) segment.
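The entry layout described above can be read with a short Python
routine. This is a sketch under stated assumptions rather than a
description of the corpus tools: it relies on the standard HTK
conventions of a "#!MLF!#" header line, a quoted file name opening each
entry, and a line holding a single period closing it. The function
names and the sample entry (file name and phones) are hypothetical.

```python
def parse_mlf(lines):
    """Return {file name: [labels]} from the lines of an HTK master label file."""
    entries = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue  # skip blank lines and the MLF header
        if line.startswith('"') and line.endswith('"'):
            # A quoted line names the file that the following labels belong to.
            current = line.strip('"')
            entries[current] = []
        elif line == ".":
            current = None  # a lone period closes the entry
        elif current is not None:
            # HTK allows optional integer start/end times before the label;
            # keep the first non-numeric field as the label.
            labels = [f for f in line.split() if not f.lstrip("-").isdigit()]
            if labels:
                entries[current].append(labels[0])
    return entries


def starts_and_ends_with_sil(labels):
    """Check the documented convention that transcripts are wrapped in 'sil'."""
    return len(labels) >= 2 and labels[0] == "sil" and labels[-1] == "sil"


# Hypothetical entry, written only to show the layout.
sample = ['#!MLF!#', '"*/s01.lab"', 'sil', 'd', 'a', 'sil', '.']
transcripts = parse_mlf(sample)
print(transcripts)  # {'*/s01.lab': ['sil', 'd', 'a', 'sil']}
```

To process the real file, pass an open file handle for
"monophone.mlf" to parse_mlf in place of the sample list.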
The Data:

Speech data was collected using laptop computers running Windows NT.
Recordings were captured as 16-bit samples at a sampling rate of
22050 Hz using a Shure SM10A microphone and a RANE Model MS1
pre-amplifier. The informant was shown the sentence on screen and
played a digital recording of the sentence as read by a native
speaker. The informant pressed the Enter key to record their
utterance. The recording was then played back for review, and the
utterance was re-recorded if necessary.

The corpus consists of 4,181 speech files: 2,290 from native
informants and 1,891 from non-native informants. The following tables
show the breakdown of corpus content in terms of male, female, native
and non-native speakers.

Table 1: number of speakers

                  male      female       total
native:             13          16          29
non-native:         16          10          26
totals:             29          26          55

Table 2: hours of data

                  male      female       total
native:            1.1         1.3         2.4
non-native:        1.1         0.8         1.9
totals:            2.2         2.1         4.3

Table 3: bytes of data

                  male      female       total
native:      177932000   211848000   389776000
non-native:  181528000   134068000   315592000
totals:      359456000   345908000   705364000

Table 4: number of speech files

                  male      female       total
native:           1027        1263        2290
non-native:       1103         788        1891
totals:           2130        2051        4181

Acknowledgments:

Planning, execution and development of the West Point Russian Speech
corpus were performed by the following members of the Center for
Technology Enhanced Language Learning: John J Morgan, Col. Stephen A
LaRocca, Charles Ruscelli, and Sherri Bellinger.

The following members of the Department Of Foreign Languages at West
Point are acknowledged: Dr. Lawrence Mansour, Maj. David Bennett.

John Morgan thanks Dr. T.V. Raman for his audio interface to emacs
called emacspeak.