Cellular TIMIT Speech Corpus

Training and Test Data
Version 1.0alpha

February, 1996

Developed by Lockheed-Martin Sanders, Inc.
Produced on CD-ROM by the Linguistic Data Consortium (LDC)

Copyright 1996 by Lockheed-Martin Sanders, Inc.
All rights reserved

1. Introduction

The CTIMIT corpus is a cellular-bandwidth adjunct to the TIMIT Acoustic-Phonetic Continuous Speech Corpus (NIST Speech Disc CD1-1.1/NTIS PB91-505065, October 1990). The corpus was contributed by Lockheed-Martin Sanders to LDC for distribution on CD-ROM media.

Please note that Lockheed-Martin Sanders, Inc. retains full copyright on the corpus and all associated materials.

The CTIMIT read speech corpus has been designed to provide a large, phonetically labeled database for use in the design and evaluation of speech processing systems operating in diverse, often hostile, cellular telephone environments. CTIMIT was collected by members of the Voice Communication Initiative (VCI) at Lockheed-Martin Sanders' Signal Processing Center of Technology (SPCOT) as part of internal R&D efforts, with additional sponsorship from the Wireless Communications Group in the company's Advanced Engineering and Technology (AE&T) Division.

This file contains a brief description of the CTIMIT Speech Corpus. Additional information on CTIMIT may be found in the documentation described in Section 4, and in the paper "CTIMIT: A Speech Corpus for the Cellular Environment with Applications to Automatic Speech Recognition" by K. L. Brown and E. B. George, which appears in the Proceedings of the 1995 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 105-108.

2. Database Characteristics

CTIMIT shares the following characteristics with TIMIT, as described in the "readme.doc" file of the TIMIT database (NIST Speech Disc CD1-1.1, directory "/timit"): corpus speaker distribution, corpus text distribution, suggested training/test subdivision, directory/file structure, and file types. There are two primary differences. First, CTIMIT is an incomplete collection of the original TIMIT data; the missing sentences are documented (see Section 4). Second, CTIMIT is sampled at 8 kHz, compared with 16 kHz for TIMIT.

3. Collection Method

CTIMIT was generated by transmitting 3367 of the 6300 original TIMIT utterances over cellular telephone channels and redigitizing them. Calls were placed from a specially equipped van under a variety of driving and traffic conditions and through a variety of cell sites in southern New Hampshire and Massachusetts. TIMIT utterances were digitally recorded in random order on DAT tapes in twelve sessions of approximately 30 minutes each. Two continuous cellular calls of fifteen minutes each were made from the first and second halves of each tape. Recorded data were played in the van over a loudspeaker/cellular handset combination in a test stand calibrated to duplicate "close talking" cellular conditions. Calls were received by a single telephone line in the SPCOT lab.

In the lab, each received call was digitized at 8 kHz, segmented and time-aligned with the original TIMIT utterances, and placed in a file structure equivalent to the "test" and "train" directories of TIMIT. To account for the sampling rate difference between TIMIT and CTIMIT, the sample numbers in the TIMIT phonetic label files were divided by two using integer division, and the resulting labels were placed in the CTIMIT directory structure. More documentation of the collection method may be found in the files "paper.*" and "poster.ps" in the "doc" subdirectory (see Section 4).
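The label rescaling described above can be sketched as follows. This is a minimal illustration, not the processing actually used for the corpus; it assumes each line of a TIMIT phonetic label file holds a start sample, an end sample, and a label separated by whitespace, and the names `halve_label_line` and `halve_labels` are hypothetical.

```python
def halve_label_line(line: str) -> str:
    """Map one 'start end label' line from 16 kHz to 8 kHz sample indices.

    Assumes whitespace-separated fields; sample numbers are halved with
    integer (floor) division, so odd values round down.
    """
    start, end, label = line.strip().split(maxsplit=2)
    return f"{int(start) // 2} {int(end) // 2} {label}"


def halve_labels(text: str) -> str:
    """Apply halve_label_line to every non-blank line of a label file."""
    return "\n".join(
        halve_label_line(line) for line in text.splitlines() if line.strip()
    )


# Illustrative label lines (made up for this example, not copied from the corpus).
example = "0 2375 h#\n2375 3001 sh\n3001 4555 iy\n"
print(halve_labels(example))
```

Because the division truncates, a boundary at an odd 16 kHz sample such as 2375 maps to 1187, so segment boundaries can shift by at most half an 8 kHz sample period.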

4. Online Documentation

Documentation associated with the CTIMIT corpus is located in the "doc" subdirectory. As in the TIMIT corpus, files in this directory with a ".doc" extension contain freeform descriptive text, and files with a ".txt" extension contain tables of formatted text which can be searched programmatically. Lines in the ".txt" files beginning with a semicolon are comments and should be ignored when searching. Files with a ".ps" extension are PostScript documents. The following is a brief description of the contents of the "doc" subdirectory:

tapeNN.txt   - Sequential list of sentences recorded on DAT tape NN; sentence names are listed without the ".wav" extension.
present.txt  - List of sentences present in the CTIMIT database, by session.
absent.txt   - List of sentences absent from the CTIMIT database, by session.
spkrlist.txt - List of speakers present in the CTIMIT database, along with the number of sentences present for each speaker.
paper.ps     - PostScript document describing the CTIMIT database.
paper.doc    - ASCII text document describing the CTIMIT database.
poster.ps    - PostScript document of the poster presentation on CTIMIT from ICASSP 1995.
spkrinfo.txt - Demographic information about speakers (copied from TIMIT).
spkrsent.txt - Table of sentences present per speaker (adapted from TIMIT).
prompts.txt  - Complete listing of prompting sentences (copied from TIMIT).
timitdic.txt - Pronouncing lexicon covering all prompts (copied from TIMIT).
timitdic.doc - Explanation of phonemic codes in lexicon (copied from TIMIT).
phoncode.doc - Explanation of phonetic segment labels (copied from TIMIT).
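
As an illustration of searching the ".txt" tables programmatically, the sketch below skips comment lines (those beginning with a semicolon) and splits the remaining lines into fields. The column layout of each table is not described in this file, so whitespace-separated fields are an assumption, and the name `table_rows` and the sample lines are hypothetical.

```python
from typing import Iterable, Iterator, List


def table_rows(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the fields of each data line, ignoring ';' comments.

    Assumption: fields are separated by whitespace. Blank lines are
    skipped as well, since they carry no data.
    """
    for raw in lines:
        if raw.startswith(";"):
            continue  # comment line, per the ".txt" convention
        fields = raw.split()
        if fields:
            yield fields


# Made-up sample lines; real tables in "doc" may use different columns.
sample = [
    "; hypothetical comment line",
    "aaa bbb",
    "",
    ";another comment",
    "ccc ddd eee",
]
for row in table_rows(sample):
    print(row)
```

The same function could be applied to an open file, for example `table_rows(open(path))`, where `path` points at one of the tables listed above.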

5. Limitations

This experimental version of CTIMIT was primarily intended as a "concept validation" database, both to work out the considerable logistical problems associated with this type of collection and to test the effectiveness of using the resulting data in speech processing system design. Limitations of the corpus include the use of a single vehicle in the collection, a single receiving phone line, a limited number of cell phones, and no simulation of "hands-free" mode. A further limitation is the lack of specific documentation of call conditions. In general, sessions 1-5 are relatively good cellular channels, while sessions 6 and 7 are relatively bad channels characterized by significant interference and a high dropout rate. A related limitation is missing data: the "absent.txt" file lists sentences lost due to call dropouts during playback, which are numerous in sessions 6 and 7. Furthermore, data collection was not performed for sessions 8-12.
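
The session notes above can be captured in a small lookup, for instance when partitioning experiments by channel quality. This is only a sketch of the general guidance given in this section, not a per-call record of conditions; the function name `channel_quality` and the returned strings are illustrative.

```python
def channel_quality(session: int) -> str:
    """Return the general channel description for a CTIMIT session number.

    Encodes only the coarse guidance from this section: sessions 1-5 are
    relatively good, 6 and 7 relatively bad, and 8-12 were not collected.
    """
    if 1 <= session <= 5:
        return "relatively good"
    if session in (6, 7):
        return "relatively bad"
    if 8 <= session <= 12:
        return "not collected"
    raise ValueError(f"session must be in 1..12, got {session}")


for s in (1, 6, 9):
    print(s, channel_quality(s))
```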

6. Acknowledgements

The CTIMIT database collection was a true team effort at Lockheed-Martin Sanders. Dave Morgan, VCI leader at the time, and Bill Lindsay of the Wireless Group provided financial support and were instrumental in defining the focus of the work. Bryan George led the collection effort and was responsible for such diverse aspects as signal processing design, audio equipment calibration, and van driving. Kathy Brown designed and performed all validation experiments using the CTIMIT data in a phoneme recognizer. Martha Birnbaum provided the knowledge, experience, and software used to manage collecting received calls in the lab. Mike Macon monitored the collection in real time, providing valuable feedback to Bryan when calls were dropped, and enduring episodes of unfocused irritation in the process. Special mention goes to Steve Kimball and Ed Real of SPCOT for helping formulate the remarkably effective "chirp signal" alignment strategy.