TI 46-Word
Item Name: | TI 46-Word |
Author(s): | Mark Liberman, Robert Amsler, Ken Church, Ed Fox, Carole Hafner, Judy Klavans, Mitch Marcus, Bob Mercer, Jan Pedersen, Paul Roossin, Don Walker, Susan Warwick, Antonio Zampolli |
LDC Catalog No.: | LDC93S9 |
ISBN: | 1-58563-017-9 |
ISLRN: | 476-195-137-873-5 |
DOI: | https://doi.org/10.35111/zx7a-fw03 |
Member Year(s): | 1993 |
DCMI Type(s): | Sound |
Sample Type: | 1-channel 12-bit pcm |
Sample Rate: | 12500 |
Data Source(s): | microphone speech |
Application(s): | speech recognition |
Language(s): | English |
Language ID(s): | eng |
License(s): | LDC User Agreement for Non-Members |
Online Documentation: | LDC93S9 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Liberman, Mark, et al. TI 46-Word LDC93S9. Web Download. Philadelphia: Linguistic Data Consortium, 1993. |
Introduction
This release contains a corpus of over five hours of speech that was originally designed and collected at Texas Instruments, Inc. (TI) in 1980 and first used in performance assessment tests of isolated-word, speaker-dependent recognition technology. (See "Speech Recognition: Turning Theory to Practice" by G. R. Doddington and T. B. Schalk, in IEEE Spectrum, Vol. 18, No. 9, September 1981.)
The 46-word vocabulary consists of two sub-vocabularies: (1) the TI 20-word vocabulary (consisting of the digits zero through nine plus the words "enter," "erase," "go," "help," "no," "rubout," "repeat," "stop," "start," and "yes") and (2) the TI 26-word "alphabet set" (consisting of the letters "a" through "z").
Data
The corpus contains read utterances from 16 speakers (eight males and eight females), each speaking 26 utterances of every word in the 46-word vocabulary: ten tokens designated as training and 16 as test. These numbers reflect the aim of the collection; for various reasons, the full number of utterances was not reached for some speakers. See the included documentation for more information.
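As a rough illustration of the collection design described above, the sketch below spells out the two sub-vocabularies and computes the nominal token counts implied by 16 speakers and 26 utterances per word. The constant and function names are hypothetical, and the totals are design targets only, since some speakers did not reach the full number of utterances.

```python
import string

# Sub-vocabularies as listed in the Introduction.
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
COMMANDS = ["enter", "erase", "go", "help", "no",
            "rubout", "repeat", "stop", "start", "yes"]
TI20 = DIGITS + COMMANDS                   # TI 20-word vocabulary
TI_ALPHA = list(string.ascii_lowercase)    # TI 26-word "alphabet set"
VOCABULARY = TI20 + TI_ALPHA               # full 46-word vocabulary

SPEAKERS = 16              # eight male, eight female
UTTERANCES_PER_WORD = 26   # per speaker, as designed


def nominal_tokens(words, speakers=SPEAKERS, reps=UTTERANCES_PER_WORD):
    """Return the designed (not actual) number of tokens for a word list."""
    return len(words) * speakers * reps


if __name__ == "__main__":
    print("vocabulary size:", len(VOCABULARY))
    print("nominal tokens, full corpus:", nominal_tokens(VOCABULARY))
    print("nominal tokens, TI 20-word set:", nominal_tokens(TI20))
    print("nominal tokens, alphabet set:", nominal_tokens(TI_ALPHA))
```

The actual file inventory is smaller than these nominal figures, so any script that indexes the corpus should count the files present rather than assume the design totals.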
The corpus was collected at Texas Instruments in a quiet acoustic enclosure using an Electro-Voice RE-16 Dynamic Cardioid microphone at a 12.5 kHz sample rate with 12-bit quantization. The files are in NIST SPHERE format and have a ".wav" filename extension.
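Because the files carry a ".wav" extension but are in NIST SPHERE format rather than RIFF, generic WAV readers may reject them. The following is a minimal sketch of reading the ASCII SPHERE header, assuming the usual layout: a "NIST_1A" magic line, a header-size line, then "name type value" lines terminated by "end_head". The function name is hypothetical, and the header in the demo is synthetic; the fields actually present in the corpus files are not stated here and should be checked against the included documentation.

```python
def parse_sphere_header(data: bytes) -> dict:
    """Parse the ASCII header at the start of a NIST SPHERE file.

    Returns a dict mapping field names to int, float, or str values.
    """
    # Line 1 is the magic string, line 2 the total header size in bytes.
    head = data[:16].split(b"\n")
    if len(head) < 2 or head[0] != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    header_size = int(head[1].strip())

    fields = {}
    text = data[16:header_size].decode("ascii", errors="replace")
    for line in text.split("\n"):
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip malformed lines
        name, type_code, value = parts
        if type_code == "-i":
            fields[name] = int(value)
        elif type_code == "-r":
            fields[name] = float(value)
        else:
            fields[name] = value  # "-sN" marks a string of N bytes
    return fields


if __name__ == "__main__":
    # Synthetic header for illustration only; not copied from the corpus.
    demo = (b"NIST_1A\n   1024\n"
            b"sample_rate -i 12500\n"
            b"channel_count -i 1\n"
            b"end_head\n").ljust(1024, b" ")
    print(parse_sphere_header(demo))
```

The sample data begins at the byte offset given by the header-size line; how the samples are encoded (byte order, bytes per sample, any compression) should be read from the header fields rather than assumed.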
Updates
As of October 5, 2016, the documentation was updated to more closely reflect the file inventory.