The BBN BYBLOS continuous speech recognition system uses context-dependent phonetic hidden Markov models (HMMs) to achieve state-of-the-art recognition performance on large vocabulary applications [1]. The system used to generate results for the designated Oct 89 evaluation test had the following notable features:

(1) Speech input to the system was represented by 3 independent 8-bit codebooks made from the following features:
- 14 Mel scale warped cepstra (C1-C14), at 10 ms frame rate.
- 14 delta-cepstra estimated by a linear regression on each cepstral feature over a 50 ms window.
- Normalized energy and delta-energy (2 features total).
For decoding, each codebook was represented as a mixture of Gaussians which was shared by all states in the HMMs.

(2) Distinct models were trained for the phonemes in the dictionary as well as for all the left diphones, right diphones, and triphones (including those occurring across word boundaries) found in the training data. The parameters of the context-dependent models were interpolated together with the context-independent phoneme models. Interpolation weights (lambdas) were computed as a function of context type, state within a model, and number of occurrences of the context in training [2].

(3) A 'triphone co-occurrence' smoothing was applied to the context-dependent observation densities after training [3].

References:

[1] Schwartz, R., C. Barry, Y. Chow, A. Derr, M. Feng, O. Kimball, F. Kubala, J. Makhoul, J. Vandegrift, "The BBN BYBLOS Continuous Speech Recognition System", Proceedings of the DARPA Speech and Natural Language Workshop, Philadelphia, Pennsylvania, February 1989, pp. 94-99.

[2] Schwartz, R., Y. Chow, O. Kimball, S. Roucos, M. Krasner, J. Makhoul, "Context-Dependent Modeling for Acoustic-Phonetic Recognition of Continuous Speech", IEEE Int. Conf. Acoustics, Speech, Signal Processing, Tampa, Florida, March 1985, pp. 1205-1208.

[3] Schwartz, R., O. Kimball, F. Kubala, M. Feng, Y. Chow, C. Barry, J. Makhoul, "Robust Smoothing Methods for Discrete Hidden Markov Models", IEEE Int. Conf. Acoustics, Speech, Signal Processing, Glasgow, Scotland, May 1989, pp. 548-551.
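
As an illustration of the delta-cepstra in feature (1), the following is a minimal sketch of a least-squares slope estimate over a sliding window. It assumes that the 50 ms window corresponds to five consecutive 10 ms frames (the current frame plus two on each side) and that frames at the utterance edges are handled by repeating the first or last frame. Neither detail is stated above, and the function name delta_features and its half_window parameter are hypothetical, so this is not necessarily how BYBLOS computes the feature.

```python
from typing import List


def delta_features(frames: List[List[float]], half_window: int = 2) -> List[List[float]]:
    """Estimate per-frame slopes of each coefficient by linear regression.

    frames: one list of cepstral coefficients per 10 ms frame.
    half_window: frames on each side of the center frame; 2 gives a
        five-frame window (assumed here to match the 50 ms window).
    """
    if not frames:
        return []
    num_frames = len(frames)
    num_coeffs = len(frames[0])
    offsets = range(-half_window, half_window + 1)
    # Least-squares slope over symmetric offsets k: sum(k * c[t+k]) / sum(k^2).
    denom = float(sum(k * k for k in offsets))
    deltas = []
    for t in range(num_frames):
        row = []
        for i in range(num_coeffs):
            acc = 0.0
            for k in offsets:
                # Assumption: clamp to the first/last frame at the edges.
                idx = min(max(t + k, 0), num_frames - 1)
                acc += k * frames[idx][i]
            row.append(acc / denom)
        deltas.append(row)
    return deltas
```

With this formulation, a coefficient that changes linearly across frames yields a delta equal to its per-frame slope at interior frames, and a constant coefficient yields zero. The same routine could be applied to the normalized energy track to obtain delta-energy, again as an assumption rather than a documented detail.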