2016 NIST Speaker Recognition Evaluation Test Set

Authors: Craig Greenberg, Omid Sadjadi, Timothee Kheyrkhah (NIST)
         Karen Jones, Kevin Walker, Stephanie Strassel, David Graff (LDC)

DESCRIPTION

The 2016 NIST Speaker Recognition Evaluation Test Set was developed by LDC and NIST (National Institute of Standards and Technology). The evaluation data consists of short segments of telephone speech from the Call My Net 2015 Speech Collection, which was built by LDC. NIST SRE is part of an ongoing series of evaluations intended to be of interest to all researchers working on the general problem of text-independent speaker recognition.

DATA SOURCES

The data in this release is drawn from the Call My Net 2015 (CMN15) Corpus collected by LDC. The corpus covers four languages and comprises multiple calls made by 220 unique speakers. Native speakers of Tagalog, Cantonese, Cebuano or Mandarin made a total of 10 calls each, talking to people within their existing social networks. Speakers were encouraged to use different telephone instruments in a variety of acoustic settings, and were instructed to talk for 8-10 minutes on a topic of their own choosing. All conversations were collected outside of North America. Collected data was a-law encoded, sampled at 8 kHz, and stored in SPHERE-formatted files.

DIRECTORY STRUCTURE

/docs/README.txt        this file
/docs                   contains SRE16 evaluation plan and Interspeech paper
                        describing the corpus
/data
  /dev/R148_0_0
    /data               a-law encoded segments in the development and
                        unlabeled training sets
      /enrollment       enrollment segments, each containing approximately
                        60 seconds of speech (120 segments)
      /test             test segments, each containing approximately
                        10 - 60 seconds of speech (1207 files)
      /unlabeled        unlabeled segments (for training), each containing
                        approximately 10 - 60 seconds of speech. No
                        information is given for these segments other than
                        whether they belong to the major or minor language
                        category (2472 files)
    /docs               trial list and associated keys
    /metadata           tables containing metadata information about the
                        calls from which the enrollment and test segments
                        were derived
      /LDC_README.txt   explains the columns of the provided metadata tables
  /eval/R149_0_1
    /data               a-law encoded segments in the evaluation set
      /enrollment       enrollment segments, each containing approximately
                        60 seconds of speech (1202 files)
      /test             test segments, each containing approximately
                        10 - 60 seconds of speech (9294 files)
    /docs               model_id to segment maps, as well as a trial list
                        and associated keys
    /metadata           tables containing metadata information about the
                        calls from which the enrollment and test segments
                        were derived (using the same column format as the
                        dev metadata)

--
README Updated June 24, 2019
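
EXAMPLE: READING AN A-LAW SPHERE SEGMENT

The segments are described above as a-law encoded at 8 kHz in SPHERE files. The sketch below is not part of the official release tools; it only illustrates how such a file might be read in Python without external libraries. It assumes the common SPHERE layout (an ASCII header that begins with "NIST_1A", followed by the header size in bytes, then "name type value" lines ending at "end_head") and an uncompressed payload of one a-law byte per sample. The function names are hypothetical, and the decoder follows the usual G.711 A-law expansion.

```python
def alaw_byte_to_pcm16(code: int) -> int:
    """Expand one G.711 A-law byte to a signed 16-bit linear sample."""
    code ^= 0x55                         # undo the even-bit inversion
    magnitude = (code & 0x0F) << 4       # 4-bit mantissa
    segment = (code & 0x70) >> 4         # 3-bit segment (exponent)
    if segment == 0:
        magnitude += 8
    else:
        magnitude = (magnitude + 0x108) << (segment - 1)
    # In A-law, a set sign bit marks a positive sample.
    return magnitude if code & 0x80 else -magnitude


# Precompute all 256 values once so decoding is a table lookup.
ALAW_TABLE = [alaw_byte_to_pcm16(b) for b in range(256)]


def decode_alaw(payload: bytes) -> list[int]:
    """Decode a byte string of a-law samples to linear PCM values."""
    return [ALAW_TABLE[b] for b in payload]


def split_sphere(raw: bytes) -> tuple[dict[str, str], bytes]:
    """Split raw file contents into (header fields, audio payload).

    Assumes an uncompressed SPHERE file; field values are kept as strings.
    """
    first = raw[:16].decode("ascii").split("\n")
    if first[0] != "NIST_1A":
        raise ValueError("not a SPHERE file")
    header_size = int(first[1])          # e.g. "   1024"
    fields: dict[str, str] = {}
    body = raw[16:header_size].decode("ascii", errors="replace")
    for line in body.split("\n"):
        if line.strip() == "end_head":
            break
        parts = line.split(None, 2)      # name, type, value
        if len(parts) == 3:
            fields[parts[0]] = parts[2]
    return fields, raw[header_size:]
```

A typical use would be to read a file in binary mode, call split_sphere on its contents, check that the sample_coding and sample_rate fields match what is expected, and pass the payload to decode_alaw. Header fields that indicate compression or a different sample coding are not handled by this sketch.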