West Point Company G3 American English Speech

Item Name: West Point Company G3 American English Speech
Author(s): John Morgan, Stephen LaRocca, Sherri Bellinger, Charles (Chip) Ruscelli
LDC Catalog No.: LDC2005S30
ISBN: 1-58563-349-6
ISLRN: 739-195-943-085-5
Release Date: November 29, 2005
Member Year(s): 2005
DCMI Type(s): Sound
Sample Type: pcm
Sample Rate: 22050
Data Source(s): microphone speech
Application(s): speech recognition
Language(s): English
Language ID(s): eng
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2005S30 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Morgan, John, et al. West Point Company G3 American English Speech LDC2005S30. Web Download. Philadelphia: Linguistic Data Consortium, 2005.

Introduction

During the 2000-2001 academic year, cadets, staff and faculty members at the United States Military Academy volunteered to participate in a speech data collection project for American English. The goal of the project was to collect recordings from no fewer than 100 adult speakers (50 male and 50 female) to form a substantial corpus of high-quality read speech.

The project was conducted by the Center for Technology Enhanced Language Learning, part of the U.S. Military Academy's Department of Foreign Languages. Many of the 100-plus volunteers who provided the recordings were members of the staff and faculty of the Department of Foreign Languages. Other volunteers were friends and colleagues from other organizations who worked in offices in Washington Hall.

The largest group of volunteers was from Cadet Company G, Third Regiment, United States Corps of Cadets. Cadet Company G3, encouraged by their tactical officer, Major Scott Custer, adopted the speech data collection effort as a community service project. Every female cadet in Company G3 recorded her voice, as did many of the male cadets, including the cadet company commander and Major Custer.

The 185 sentences that make up the data collection script were written to elicit examples of all, or nearly all, of the possible syllables used in spoken American English.

The G3 Corpus audio data comes from 53 female and 56 male volunteers, each of whom recorded approximately 104 utterances. The recordings are sampled at 22,050 samples per second with 16-bit resolution. They were made using headset microphones (Shure M10) with preamplifiers attached to the line input jacks of desktop computers. The corpus contains about 15 hours of speech in total.
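As a rough sanity check on these figures, the arithmetic can be sketched in a few lines of Python. This is only an illustration built from the numbers quoted above. The mono channel count is an assumption that the description does not state, and the function name is hypothetical.

```python
# Figures quoted in the corpus description.
SPEAKERS = 53 + 56            # female + male volunteers
UTTERANCES_PER_SPEAKER = 104  # "approximately 104" per speaker
TOTAL_HOURS = 15              # "about 15 hours"
SAMPLE_RATE_HZ = 22_050
BYTES_PER_SAMPLE = 2          # 16-bit PCM
CHANNELS = 1                  # assumption: mono; not stated in the description


def corpus_estimates():
    """Return (utterance count, mean utterance seconds, raw PCM bytes)."""
    total_seconds = TOTAL_HOURS * 3600
    utterances = SPEAKERS * UTTERANCES_PER_SPEAKER
    mean_seconds = total_seconds / utterances
    raw_bytes = total_seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
    return utterances, mean_seconds, raw_bytes


if __name__ == "__main__":
    n, mean_s, size = corpus_estimates()
    print(f"{n} utterances, about {mean_s:.1f} s each, "
          f"about {size / 1e9:.2f} GB of raw PCM")
```

Under these assumptions the corpus holds roughly 11,300 utterances averaging just under 5 seconds each, and about 2.4 GB of uncompressed audio. The actual distribution size may differ depending on file format and packaging.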
