|Item Name:||TI 46-Word|
|Author(s):||Mark Liberman, Robert Amsler, Ken Church, Ed Fox, Carole Hafner, Judy Klavans, Mitch Marcus, Bob Mercer, Jan Pedersen, Paul Roossin, Don Walker, Susan Warwick, Antonio Zampolli|
|LDC Catalog No.:||LDC93S9|
|Sample Type:||1-channel 12-bit pcm|
|Data Source(s):||microphone speech|
LDC User Agreement for Non-Members
|Online Documentation:||LDC93S9 Documents|
|Licensing Instructions:||Subscription & Standard Members, and Non-Members|
|Citation:||Liberman, Mark, et al. TI 46-Word LDC93S9. Web Download. Philadelphia: Linguistic Data Consortium, 1993.|
This release contains a corpus of speech which was originally designed and collected at Texas Instruments, Inc. (TI) in 1980 and used initially in performance assessment tests of isolated-word speaker-dependent technology. (See "Speech Recognition: Turning Theory to Practice" by G. R. Doddington and T. B. Schalk, in IEEE Spectrum, Vol. 18, No. 9, September 1981.)
The 46-word vocabulary consists of two sub-vocabularies: (1) the TI 20-word vocabulary (consisting of the digits zero through nine plus the words "enter," "erase," "go," "help," "no," "rubout," "repeat," "stop," "start," and "yes" as well as (2) the TI 26-word "alphabet set" (consisting of the letters "a" through "z").
The corpus contains read utterances from 16 speakers (eight males and eight females) each speaking 26 utterances of the 46-word vocabulary: 16 tokens designated as training and ten as test. Note these numbers reflect the aim of the collection and for various reasons, the full number of utterances was not reached for some speakers. See the included documentation for more information.
The corpus was collected at Texas Instruments in a quiet acoustic enclosure using an Electro-Voice RE-16 Dynamic Cardiod microphone at 12.5kHz sample rate with 12-bit quantization. The files are in NIST SPHERE format and have a ".wav" filename extension.
As of October 5, 2016 the documentation was updated to more closely reflect the file inventory.