README.TXT 02/05/2004

This file describes the information stored on the DVD entitled The West Point Company G3 American English Speech Data Corpus.

During the 2000-2001 academic year, cadets, staff and faculty members at the United States Military Academy volunteered to participate in a speech data collection project for American English. The goal of the project was to amass recordings from no fewer than one hundred adult speakers, fifty male and fifty female, to form a substantial corpus of high-quality read speech. The project was conducted by the Center for Technology Enhanced Language Learning, part of the U.S. Military Academy’s Department of Foreign Languages.

Many of the one hundred-plus volunteers who provided the recordings were members of the staff and faculty of the Department of Foreign Languages. Other volunteers were friends and colleagues from other organizations who worked in offices in Washington Hall. The largest group of volunteers came from Cadet Company G, Third Regiment, United States Corps of Cadets. Cadet Company G3, encouraged by its tactical officer, Major Scott Custer, adopted the speech data collection effort as a community service project. Every female cadet in Company G3 recorded her voice, as did many of the male cadets, including the cadet company commander, and so did Major Custer.

The 185 sentences comprising the data collection script were written to elicit examples of all or nearly all of the possible syllables used in spoken American English. The G3 Corpus audio data comes from 53 female and 56 male volunteers, each of whom recorded approximately 104 utterances. The recordings are sampled at 22,050 samples per second with 16-bit resolution. Recordings were made using headset microphones (Shure M10) with preamplifiers attached to the line input jack of desktop computers.

Also included on the same DVD is a smaller set of recordings made during an earlier project sponsored by Lieutenant Colonel Jim Bass of the U.S. Army Research Laboratory.
This speech data consists of military terms and simulates military message traffic.

Some of the 108 data sets were recorded at low volume. Among the males, five folders contain recordings that play back at very low volume: dsc8409, klm5138, lkm1836, sjs9379 and wgh6301. Among the females, six folders are affected: jjh4962, pbg3862, smt2413, car5177, eod9385 and gls9669. The folder lmb6606 (female) contains only 16 audio files.

The researchers and technicians who contributed the most to this effort are John Morgan, Sherri Bellinger, Charles (Chip) Ruscelli and Stephen LaRocca.

All information contained on the DVD entitled The West Point Company G3 American English Speech Data Corpus is the sole and exclusive property of the United States Military Academy. Support and assistance from the United States Army Research Laboratory is acknowledged.

Questions concerning the corpus should be addressed to the Director, Center for Technology Enhanced Language Learning, Department of Foreign Languages, U.S. Military Academy, West Point, New York 10996, (845) 938-5286.
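
Users who want to compensate for the low-volume folders listed in this file may find a small helper useful. The following Python sketch is only an illustration and is not part of the corpus or its tooling. It assumes the audio has already been decoded into signed 16-bit integer samples, which matches the 16-bit resolution stated above; this file does not name the audio file format, so no file reading is shown. The names LOW_VOLUME, needs_gain and peak_normalize are hypothetical, and the default target peak of 90 percent of full scale is an arbitrary choice.

```python
import array

# Folder IDs that this README lists as playing back at very low volume.
LOW_VOLUME_MALE = {"dsc8409", "klm5138", "lkm1836", "sjs9379", "wgh6301"}
LOW_VOLUME_FEMALE = {"jjh4962", "pbg3862", "smt2413",
                     "car5177", "eod9385", "gls9669"}
LOW_VOLUME = LOW_VOLUME_MALE | LOW_VOLUME_FEMALE

INT16_MIN = -32768  # range of a signed 16-bit sample
INT16_MAX = 32767


def needs_gain(folder_id):
    """Return True if the README flags this speaker folder as low volume."""
    return folder_id in LOW_VOLUME


def peak_normalize(samples, target_peak=0.9 * INT16_MAX):
    """Scale 16-bit samples so the loudest one reaches target_peak.

    Assumption: `samples` is an iterable of signed 16-bit integers.
    Silent input (all zeros) is returned unchanged.
    """
    samples = list(samples)
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return array.array("h", samples)
    gain = target_peak / peak
    scaled = (int(round(s * gain)) for s in samples)
    # Clamp to the 16-bit range so rounding can never overflow.
    return array.array("h", (max(INT16_MIN, min(INT16_MAX, v)) for v in scaled))
```

A caller would typically apply peak_normalize only to folders for which needs_gain returns True and leave the remaining recordings untouched. Peak normalization raises background noise along with the speech, so it may not suit every use of the data.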