The CMU SPHINX [1, 2] continuous speech recognition system uses context-dependent phonetic hidden Markov models (HMMs) to achieve state-of-the-art recognition performance on large-vocabulary applications. The system used to generate results for the designated October 1989 evaluation test had the following notable features:

(1) Speech input to the system was represented by three independent 8-bit codebooks built from the following features (a schematic sketch of this front-end computation appears after the references):
    - 12 bilinear-transformed LPC cepstrum coefficients, at a 10 ms frame rate.
    - 12 differenced (delta) cepstrum coefficients, computed over a 40 ms window.
    - Normalized energy and differenced energy (2 features total).

(2) Generalized triphone [2] models were trained by first training all triphone models (both within-word and between-word triphones) and then clustering these triphone models using a maximum likelihood criterion. A total of 1100 generalized triphones were trained. The generalized triphone models were interpolated with context-independent phone models.

(3) A corrective training algorithm [4] was applied to enhance discrimination given a grammar. The model parameters were modified to make those contributing to correct recognition more likely, and those contributing to incorrect or near-miss recognitions less likely.

(4) A Viterbi beam search augmented with word duration modeling was used for decoding (a schematic sketch of the beam-pruned search appears after the references).

References:

[1] Lee, K.F., "Automatic Speech Recognition: The Development of the SPHINX System", Kluwer Academic Publishers, Boston, MA, 1989.
[2] Lee, K.F., Hon, H.W., Reddy, R., "An Overview of the SPHINX Speech Recognition System", IEEE Transactions on Acoustics, Speech, and Signal Processing, January 1990.
[3] Lee, K.F., "Context-Dependent Phonetic Hidden Markov Models for Continuous Speech Recognition", IEEE Transactions on Acoustics, Speech, and Signal Processing, April 1990.
[4] Lee, K.F., Mahajan, S., "Corrective and Reinforcement Learning for Speaker-Independent Continuous Speech Recognition", Computer Speech and Language, April 1990.
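
The following is a minimal, illustrative sketch of the three parameter streams listed under feature (1); it is not the original SPHINX front end. It assumes the 12 bilinear-transformed LPC cepstrum coefficients and a per-frame log energy are already computed at a 10 ms frame rate, that the 40 ms delta window is realized as the difference between the frames 20 ms ahead and 20 ms behind the current frame, and that energy is normalized relative to the utterance maximum; all of these details are assumptions of the sketch.

    import numpy as np

    def sphinx_style_streams(cepstra, log_energy):
        """Illustrative front-end streams (not the original SPHINX code).

        cepstra:     (T, 12) array of bilinear-transformed LPC cepstrum
                     coefficients, one row per 10 ms frame (assumed given).
        log_energy:  (T,) array of per-frame log energies (assumed given).

        Returns the three parameter streams that would each be vector
        quantized into an independent 8-bit (256-entry) codebook.
        """
        T = cepstra.shape[0]

        # Delta cepstrum over a 40 ms window: difference of the frames
        # 20 ms ahead and 20 ms behind (2 frames at a 10 ms frame rate).
        # Edge frames are handled by clamping indices (an assumption here).
        idx_fwd = np.minimum(np.arange(T) + 2, T - 1)
        idx_bwd = np.maximum(np.arange(T) - 2, 0)
        delta_cepstra = cepstra[idx_fwd] - cepstra[idx_bwd]

        # Normalized energy (relative to the utterance maximum) and
        # differenced energy over the same 40 ms window.
        norm_energy = log_energy - log_energy.max()
        delta_energy = log_energy[idx_fwd] - log_energy[idx_bwd]
        energy_stream = np.stack([norm_energy, delta_energy], axis=1)  # (T, 2)

        return cepstra, delta_cepstra, energy_stream

Each of the three streams would then be quantized into its own 256-entry (8-bit) codebook, for example with a k-means-style vector quantization procedure.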
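
Below is a minimal sketch of time-synchronous Viterbi decoding with beam pruning, the search strategy named under feature (4). The data structures (`log_init`, `log_trans`, `log_obs`) and the pruning threshold are illustrative assumptions rather than the SPHINX implementation, and the word-duration term that the system adds at word exits is only indicated by a comment.

    import math

    def beam_viterbi(log_init, log_trans, log_obs, beam=20.0):
        """Time-synchronous Viterbi search with beam pruning (illustrative).

        log_init:  dict state -> initial log probability.
        log_trans: dict state -> list of (next_state, transition log prob).
        log_obs:   list over frames; log_obs[t] is a dict state -> observation
                   log likelihood (e.g. derived from the discrete output
                   distributions and the frame's three VQ codeword indices).
        beam:      hypotheses scoring more than `beam` below the best
                   hypothesis in a frame are discarded.
        """
        scores = dict(log_init)          # active hypotheses before frame 0
        backptrs = []                    # per-frame backpointers for traceback

        for obs in log_obs:
            new_scores, bp = {}, {}
            for state, score in scores.items():
                for nxt, log_p in log_trans.get(state, []):
                    # A word-duration log score would be added here on
                    # transitions that leave a word-final state.
                    cand = score + log_p + obs.get(nxt, -math.inf)
                    if cand > new_scores.get(nxt, -math.inf):
                        new_scores[nxt] = cand
                        bp[nxt] = state
            if not new_scores:
                raise ValueError("all hypotheses pruned; check the model inputs")
            # Beam pruning relative to the best hypothesis of this frame.
            best = max(new_scores.values())
            scores = {s: v for s, v in new_scores.items() if v >= best - beam}
            backptrs.append(bp)

        final = max(scores, key=scores.get)
        return final, scores[final], backptrs

The state sequence, and hence the word sequence, is recovered by walking the per-frame backpointer tables from `final` back to the first frame.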