YOHO Speaker Verification

Item Name: YOHO Speaker Verification
Author(s): Joseph Campbell, Alan Higgins
LDC Catalog No.: LDC94S16
ISBN: 1-58563-042-X
ISLRN: 125-762-148-524-1
DOI: https://doi.org/10.35111/3wc3-n668
Member Year(s): 1994, 1998
DCMI Type(s): Sound
Sample Type: 1-channel pcm compressed
Sample Rate: 8000
Data Source(s): microphone speech
Application(s): speaker verification
Language(s): English
Language ID(s): eng
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC94S16 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Campbell, Joseph, and Alan Higgins. YOHO Speaker Verification LDC94S16. Web Download. Philadelphia: Linguistic Data Consortium, 1994.
The YOHO database contains a large-scale, high-quality speech corpus to support text-dependent speaker authentication research, such as is used in secure access technology. The data was collected in 1989 by ITT under a US Government contract but had not previously been available for public use. Note that certain changes have been made to the corpus, mainly to ensure the privacy of the speakers, and some data has been withheld by the government for future use in testing.

YOHO contains:

  • Combination lock phrases (e.g. 36-24-36)
  • Collected over a three-month period in a real-world office environment
  • Four enrollment sessions per subject with 24 phrases per session
  • Ten test sessions per subject with four phrases per session
  • 8 kHz sampling with 3.8 kHz analog bandwidth
  • 1.5 gigabytes of data
The number of trials is thus sufficient to permit evaluation testing at high confidence levels. In each session, a speaker was prompted with a series of phrases to be read aloud; each phrase was a sequence of three two-digit numbers (e.g. 35-72-41, pronounced thirty-five seventy-two forty-one). The first four sessions for a given speaker were enrollment sessions of 24 phrases each, and all additional sessions were verification trials of four phrases each. In all, there are 552 enrollment sessions and 1,380 trial sessions, with a nominal time interval of three days between sessions.
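
The figures above imply a speaker count and nominal phrase totals that the description does not state outright. The following Python sketch derives them and also spells out a combination-lock phrase the way it would be read aloud. The function names (`spell_two_digit`, `spell_phrase`, `corpus_totals`) are hypothetical and not part of the corpus distribution. The totals are nominal values that assume every speaker completed exactly four enrollment and ten verification sessions, and the set of two-digit numbers actually used in the prompts may be narrower than the 10 to 99 range accepted here.

```python
# Illustrative helpers only; not part of the YOHO distribution.
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

# Per-speaker session layout and corpus-wide session counts, as described above.
ENROLL_SESSIONS_PER_SPEAKER = 4
ENROLL_PHRASES_PER_SESSION = 24
TEST_SESSIONS_PER_SPEAKER = 10
TEST_PHRASES_PER_SESSION = 4
TOTAL_ENROLL_SESSIONS = 552
TOTAL_TEST_SESSIONS = 1380


def spell_two_digit(n: int) -> str:
    """Return the English reading of a two-digit number (10 to 99)."""
    if not 10 <= n <= 99:
        raise ValueError(f"expected a two-digit number, got {n}")
    if n < 20:
        return TEENS[n - 10]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def spell_phrase(phrase: str) -> str:
    """Spell out a phrase such as '35-72-41' as three two-digit numbers."""
    parts = [p.strip() for p in phrase.split("-")]
    if len(parts) != 3:
        raise ValueError(f"expected three numbers, got {len(parts)}")
    return " ".join(spell_two_digit(int(p)) for p in parts)


def corpus_totals() -> dict:
    """Derive the speaker count and nominal phrase totals from the session counts."""
    speakers, rem_enroll = divmod(TOTAL_ENROLL_SESSIONS, ENROLL_SESSIONS_PER_SPEAKER)
    speakers_test, rem_test = divmod(TOTAL_TEST_SESSIONS, TEST_SESSIONS_PER_SPEAKER)
    # Both session counts should point to the same whole number of speakers.
    if rem_enroll or rem_test or speakers != speakers_test:
        raise ValueError("session counts do not imply a consistent speaker count")
    return {
        "speakers": speakers,
        "enrollment_phrases": TOTAL_ENROLL_SESSIONS * ENROLL_PHRASES_PER_SESSION,
        "verification_phrases": TOTAL_TEST_SESSIONS * TEST_PHRASES_PER_SESSION,
    }


if __name__ == "__main__":
    print(spell_phrase("35-72-41"))
    print(corpus_totals())
```

Under these assumptions the session counts agree on 138 speakers (552 / 4 = 1,380 / 10), giving 13,248 enrollment phrases and 5,520 verification phrases.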


An update is available that corrects a bug in the original release.
