Documentation for Speech in Noisy Environments 2 (SPINE2) Part 1

Introduction

This publication contains the Speech in Noisy Environments 2 (SPINE2) Part 1 Audio corpus, created for the Department of Defense (DoD) Digital Voice Processing Consortium (DDVPC) by ARCON Corp. and produced by the Linguistic Data Consortium (LDC), catalog number LDC2001S04, ISBN 1-58563-206-6. A companion corpus, Speech in Noisy Environments 2 (SPINE2) Part 1 Transcripts, was also produced by the LDC, catalog number LDC2001T05, ISBN 1-58563-207-4. These corpora support the 2001 Speech in Noisy Environments evaluation.

The 2001 Speech in Noisy Environments Evaluation 2 (SPINE2) is the second attempt to assess the state of the art and practice in speech recognition technology in noisy military environments, and to exchange information on innovative speech recognition technology in the context of fully implemented systems that perform realistic tasks. It is intended to be of interest to all university, industrial, and commercial speech system developers working on the problem of robust speech recognition. The evaluation is flexible, so participants can take part in a way suited to their development needs and abilities.

More information on the SPINE2 evaluation is available at elazar.itd.nrl.navy.mil/spine.

Technical Objective

The SPINE2 evaluation focuses on the task of transcribing speech produced in noisy environments, with an emphasis on noisy military environments. The evaluation is designed to promote research progress in this area, to give participants the opportunity to try out new ideas for developing robust speech recognition systems that are of both scientific and practical interest, and to measure the performance of this technology.

Task

The evaluation task is to transcribe speech produced in noisy environments. The training and test speech data to be used for this evaluation were generated by ARCON Corp. for the DoD Digital Voice Processing Consortium (DDVPC) under controlled conditions. The speech data consists of conversations between two communicators working on a collaborative, battleship-like task in which they seek and shoot at targets (ARCON Communicability Exercise, ACE). Participants may talk freely, but the total vocabulary used is fairly limited. Each person is seated in a sound chamber in which a previously recorded military background noise environment is accurately reproduced. The participants use the handsets and transmission channels that are native to the particular environment. The Part 1 data comprises two talker pairs (four speakers total) with 64 one- to four-minute conversations per talker pair (about 207 minutes total), which include the four scenarios described below.
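As a quick sanity check on the figures above, the short Python sketch below derives the total number of conversations and the mean conversation length. It assumes that the "about 207 minutes" figure covers both talker pairs together; the variable names are illustrative only.

```python
# Figures quoted in the Task description above.
talker_pairs = 2
conversations_per_pair = 64
total_minutes = 207  # "about 207 minutes total"; assumed to cover both pairs

total_conversations = talker_pairs * conversations_per_pair
mean_minutes = total_minutes / total_conversations

print(total_conversations)      # 128
print(round(mean_minutes, 2))   # 1.62
```

A mean of roughly 1.6 minutes per conversation is consistent with the stated range of one to four minutes.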

The speech data is viewed as a sequence of "turns," where each turn is a period of time during which one speaker is speaking. By its nature, the task induces short utterances separated by relatively long periods of silence. A speaker may take several turns in a row, i.e., successive turns do not necessarily reverse the speaking and listening roles of the conversation participants. The transcription task is to produce the correct transcription for each of the specified turns.

Please see file.tbl for the directory structure of this publication, as well as a complete list of files.

Data Format

The audio files in this corpus are 2-channel, 16 kHz, 16-bit linear SPHERE files.
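For readers unfamiliar with the format, the following Python sketch shows one way to read the plain-text header of a NIST SPHERE file and split the interleaved samples into channels. It is a minimal illustration, not a tool supplied with this corpus: the function names are hypothetical, and it assumes 16-bit samples stored as uncompressed PCM. If the header reports any other sample_coding (for example, a compressed encoding), the file would first need to be decompressed with a suitable SPHERE utility.

```python
import struct


def parse_sphere_header(data):
    """Parse the ASCII header at the start of a NIST SPHERE file.

    Returns (fields, header_size), where fields maps header names to values.
    """
    first, second, _ = data.split(b"\n", 2)
    if first.strip() != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    header_size = int(second.strip())  # header length in bytes, e.g. 1024

    fields = {}
    text = data[:header_size].decode("ascii", errors="replace")
    for raw in text.split("\n")[2:]:
        line = raw.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)  # name, type flag, value
        if len(parts) != 3:
            continue
        name, kind, value = parts
        if kind == "-i":
            fields[name] = int(value)
        elif kind == "-r":
            fields[name] = float(value)
        else:  # "-sN": string of N characters
            fields[name] = value
    return fields, header_size


def split_channels(fields, payload):
    """De-interleave 16-bit PCM samples into one tuple per channel."""
    if fields.get("sample_coding", "pcm") != "pcm":
        raise ValueError("only uncompressed PCM is handled in this sketch")
    if fields.get("sample_n_bytes", 2) != 2:
        raise ValueError("only 16-bit samples are handled in this sketch")
    # sample_byte_format "01" means little-endian, "10" means big-endian.
    order = "<" if fields.get("sample_byte_format", "01") == "01" else ">"
    count = len(payload) // 2
    samples = struct.unpack(f"{order}{count}h", payload[: count * 2])
    n_channels = fields["channel_count"]
    return [samples[c::n_channels] for c in range(n_channels)]
```

In use, one would read a file's bytes, call parse_sphere_header on them, and pass the bytes after header_size to split_channels. For the files described above, the header would be expected to report a channel_count of 2 and a sample_rate of 16000.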

The ARCON Excel file G03G04.xls lists, by speaker, the noise environments and vocoders used.

Updates

Should any additional information, updates, or bug fixes become available, they will appear in the LDC catalog entry for this corpus: LDC2001S04.