LDC2020S04 2018 NIST Speaker Recognition Evaluation Test Set

Authors: 
Craig Greenberg, Omid Sadjadi (NIST) 
Elliot Singer (MIT-LL)
Kevin Walker, Karen Jones, Jonathan Wright, Stephanie Strassel (LDC)


1.0 DESCRIPTION 

The 2018 NIST Speaker Recognition Evaluation Test Set was
developed by the Linguistic Data Consortium (LDC) and NIST
(National Institute of Standards and Technology). The evaluation
data are derived from two distinct sources: (1) Tunisian Arabic
telephone speech, including Voice over Internet Protocol (VoIP),
and (2) English speech found in a sampling of YouTube videos
(Audio from Video, or AfV).

The NIST Speaker Recognition Evaluation (SRE) is part of an
ongoing series of evaluations conducted by NIST. These evaluations
help to direct research efforts and to calibrate technical
capabilities. They are intended to be of interest to all
researchers working on the general problem of text-independent
speaker recognition.

2.0 DATA SOURCES

The speech recordings in this release were collected as part of
the Call My Net 2 (CMN2) and Video Annotation for Speech
Technology (VAST) corpora. For the CMN2 collection, recruited
participants called friends or relatives who agreed to have their
conversations recorded. These telephone conversations lasted
between 8 and 10 minutes. The telephone speech segments in this
release consist of PSTN and VoIP data collected in Tunisia and are
presented as 8-bit a-law with a sample rate of 8000 Hz (8 kHz).
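As a minimal sketch of what 8-bit a-law means in practice, the Python functions below expand standard ITU-T G.711 A-law codes to signed 16-bit linear PCM values. They assume the samples follow plain G.711 A-law and that the NIST SPHERE text header at the start of each .sph file has already been skipped. The names (alaw_to_linear, decode_alaw, split_channels) are hypothetical helpers, not part of any LDC or NIST tool, and split_channels assumes, without confirmation from this README, that multi-channel files interleave their samples.

```python
def alaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 A-law code to a signed 16-bit PCM value."""
    code ^= 0x55                   # undo A-law's even-bit inversion
    mantissa = (code & 0x0F) << 4  # 4 quantization bits, scaled
    segment = (code & 0x70) >> 4   # 3-bit segment (exponent)
    if segment == 0:
        value = mantissa + 8
    else:
        value = (mantissa + 0x108) << (segment - 1)
    return value if code & 0x80 else -value  # bit 7 set means positive


# Precompute all 256 codes once; decoding is then a table lookup.
ALAW_TABLE = [alaw_to_linear(c) for c in range(256)]


def decode_alaw(data: bytes) -> list:
    """Decode raw A-law sample bytes (SPHERE header already removed)."""
    return [ALAW_TABLE[b] for b in data]


def split_channels(samples: list, num_channels: int) -> list:
    """Split interleaved samples into one list per channel.

    Hypothetical assumption: samples are stored ch0, ch1, ch0, ch1, ...
    """
    return [samples[i::num_channels] for i in range(num_channels)]
```

Because each sample is one byte at 8000 samples per second, the duration of a file in seconds is roughly its sample-data size in bytes divided by 8000 times the number of channels.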

The VAST data consist of audio recordings originating from
YouTube and are presented as 16-bit FLAC files sampled at 44 kHz.
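To confirm the bit depth and sample rate of a VAST file without a third-party decoder, one can read the STREAMINFO metadata block that the FLAC format requires at the start of every stream. The sketch below is written against the FLAC format specification; read_flac_streaminfo is a hypothetical helper name, and it parses only the header, not the audio itself.

```python
def read_flac_streaminfo(header: bytes) -> dict:
    """Parse the STREAMINFO block from the first 42 bytes of a FLAC file.

    Layout: 4-byte "fLaC" marker, 4-byte metadata block header,
    then the 34-byte STREAMINFO block (which must come first).
    """
    if len(header) < 42 or header[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (or header too short)")
    if (header[4] & 0x7F) != 0:  # low 7 bits = block type; 0 is STREAMINFO
        raise ValueError("first metadata block is not STREAMINFO")
    info = header[8:42]
    # STREAMINFO bytes 10..17 pack, most significant bits first:
    # sample rate (20 bits), channels - 1 (3), bits per sample - 1 (5),
    # total samples per channel (36).
    packed = int.from_bytes(info[10:18], "big")
    sample_rate = packed >> 44
    channels = ((packed >> 41) & 0x7) + 1
    bits_per_sample = ((packed >> 36) & 0x1F) + 1
    total_samples = packed & ((1 << 36) - 1)
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "bits_per_sample": bits_per_sample,
        "total_samples": total_samples,
        # total_samples may be 0 if the encoder did not record the length.
        "duration_s": total_samples / sample_rate if sample_rate else None,
    }
```

In use, pass the first 42 bytes of a file opened in binary mode, e.g. read_flac_streaminfo(f.read(42)), and compare sample_rate and bits_per_sample with the values stated above.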

3.0 DIRECTORY STRUCTURE

In addition to evaluation data, this package also contains answer
keys, trial and enrollment files, development data, and
evaluation documentation. The directory structure is as follows:

/data
	/dev
		/enrollment
		/test
		/unlabeled

	/eval
		/enrollment
		/test
/docs
	/sre18_dev_enrollment_diarization.tsv
	/sre18_dev_enrollment.tsv
	/sre18_dev_segment_key.tsv
	/sre18_dev_trial_key.tsv
	/sre18_dev_trials.tsv
	/sre18_eval_enrollment_diarization.tsv
	/sre18_eval_enrollment.tsv
	/sre18_eval_plan_2018-05-31_v6.pdf
	/sre18_eval_segment_key.tsv
	/sre18_eval_trial_key.tsv
	/sre18_eval_trials.tsv
	/README.txt (this file).

3.1 Audio Data
File counts and sizes for the audio data are given below:

/dev
	/enrollment	185 files	0.25 GB	
	/test		1593 files	0.99 GB
	/unlabeled	2332 files	1.6 GB

/eval
	/enrollment	1417 files	2.05 GB
	/test		12450 files	8.34 GB
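A quick way to check a local copy against the table above is to count the files under each directory. The sketch below assumes the package has been unpacked so that dev/ and eval/ sit under a single data/ root; it counts files recursively and compares counts only, not sizes. The helper names are hypothetical.

```python
from pathlib import Path

# File counts from the table above, keyed by path relative to data/.
EXPECTED_COUNTS = {
    "dev/enrollment": 185,
    "dev/test": 1593,
    "dev/unlabeled": 2332,
    "eval/enrollment": 1417,
    "eval/test": 12450,
}


def count_files(directory: Path) -> int:
    """Count regular files anywhere under directory (recursively)."""
    return sum(1 for p in directory.rglob("*") if p.is_file())


def check_counts(data_root: Path, expected: dict = EXPECTED_COUNTS) -> dict:
    """Return {subdir: (found, expected)} for every mismatch.

    A missing directory is reported as 0 files found.
    """
    mismatches = {}
    for subdir, want in expected.items():
        target = data_root / subdir
        found = count_files(target) if target.is_dir() else 0
        if found != want:
            mismatches[subdir] = (found, want)
    return mismatches
```

An empty result from check_counts(Path("<unpacked package>/data")) (placeholder path) means every directory matches the table.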

3.2 Metadata and /docs Files

The two tab-delimited enrollment files, called
sre18_{dev|eval}_enrollment.tsv, list each segment (denoted by
segmentid) provided to build a model for each target speaker
(denoted by modelid), along with the recording channel (side).
For SRE18, there are two enrollment conditions: either 1 segment
or 3 segments are provided to build the speaker model. Example
rows from sre18_dev_enrollment.tsv are given below:

modelid		segmentid		side
1001_sre18	dlrdnskt_sre18.sph	a
1002_sre18	yeamhnnt_sre18.sph	a
1003_sre18	svrkvkai_sre18.sph	a
1004_sre18	izynpvhs_sre18.sph	a
1004_sre18	oinujqaw_sre18.sph	a
1004_sre18	rjjcffcm_sre18.sph	a
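The enrollment condition of each model (1 or 3 segments) can be recovered by grouping rows on modelid. This Python sketch assumes each .tsv file has a header row with the column names shown and single-tab delimiters (the examples in this README may be padded for readability); load_enrollment and enrollment_condition are illustrative names.

```python
import csv
import io
from collections import defaultdict


def load_enrollment(tsv_text: str) -> dict:
    """Map each modelid to its list of enrollment segmentids, in file order."""
    models = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        models[row["modelid"]].append(row["segmentid"])
    return dict(models)


def enrollment_condition(models: dict) -> dict:
    """Group modelids by their number of enrollment segments."""
    by_count = defaultdict(list)
    for modelid, segments in models.items():
        by_count[len(segments)].append(modelid)
    return dict(by_count)
```

Applied to the example rows above, model 1004_sre18 would fall in the 3-segment condition and the other models in the 1-segment condition.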


The two trial files, called sre18_{dev|eval}_trials.tsv, list one
trial per row: the target speaker model (modelid) to be compared
with a test segment (segmentid), along with channel information
(side). For example:

modelid		segmentid		side
1126_sre18	yqxltsco_sre18.sph	a
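Because a trial names only the model, a scoring pipeline typically needs to look up the enrollment segments behind each modelid. The sketch below joins a trials file with the matching enrollment file; it assumes header rows and single-tab delimiters, and read_tsv and expand_trials are hypothetical helper names.

```python
import csv
import io


def read_tsv(tsv_text: str) -> list:
    """Parse tab-delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def expand_trials(trials_text: str, enrollment_text: str) -> list:
    """Attach each trial's enrollment segments to the trial.

    Returns (modelid, test segmentid, side, [enrollment segmentids]).
    Raises KeyError if a trial names a model absent from the
    enrollment file, which would indicate mismatched files.
    """
    enrollment = {}
    for row in read_tsv(enrollment_text):
        enrollment.setdefault(row["modelid"], []).append(row["segmentid"])
    return [
        (row["modelid"], row["segmentid"], row["side"],
         enrollment[row["modelid"]])
        for row in read_tsv(trials_text)
    ]
```

Pairing dev trials with the dev enrollment file (and eval with eval) keeps the lookup consistent.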

The two diarization tab files, called
sre18_{dev|eval}_enrollment_diarization.tsv, are included in the
release because each VAST recording may contain speech from more
than one speaker. These diarization tables provide speaker time
marks for the dev and eval enrollment segments. For example:

segmentid		speaker_type	start	end
boeemuji_sre18.flac	target		5.20	13.44
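One way to use these time marks is to keep only the target speaker's portion of a VAST enrollment recording. The sketch below assumes start and end are in seconds (as the example values suggest), that the file has a header row with single-tab delimiters, and that rows with any other speaker_type should be ignored; the helper names are hypothetical. Read the actual sample rate from each file before converting times to sample indices.

```python
import csv
import io
from collections import defaultdict


def target_intervals(tsv_text: str) -> dict:
    """Map segmentid -> [(start, end), ...] for rows marked 'target'.

    Rows with any other speaker_type are skipped.
    """
    intervals = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row["speaker_type"] == "target":
            intervals[row["segmentid"]].append(
                (float(row["start"]), float(row["end"])))
    return dict(intervals)


def target_duration(intervals: list) -> float:
    """Total length of the given intervals (same unit as the file)."""
    return sum(end - start for start, end in intervals)


def to_sample_ranges(intervals: list, sample_rate: int) -> list:
    """Convert (start, end) times in seconds to sample-index ranges."""
    return [(round(s * sample_rate), round(e * sample_rate))
            for s, e in intervals]
```

For the example row above, the target speech spans 13.44 - 5.20 = 8.24 seconds.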

The two segment key tab files, called
sre18_{dev|eval}_segment_key.tsv, provide metadata about each
speech segment, including segmentid, subjectid (LDC subject ID),
gender (male/female), partition (enrollment, test, unlabeled),
phone_number (anonymized phone number), speech_duration (duration
of segment in seconds) and data_source (cmn2/vast). An example
record, shown with one field per line:

segmentid	zztnmqej_sre18.sph
subjectid	132608
gender		female
partition	enrollment
phone_number	2490gcq
speech_duration	61.8
data_source	cmn2
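The segment keys make it easy to summarize how much speech each source or partition contributes. This sketch assumes the actual file stores one segment per row under a header row (the example above is shown one field per line) with single-tab delimiters; duration_by and speakers_by_gender are illustrative names.

```python
import csv
import io
from collections import defaultdict


def duration_by(tsv_text: str, *fields: str) -> dict:
    """Sum speech_duration (seconds) grouped by the given columns."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        key = tuple(row[f] for f in fields)
        totals[key] += float(row["speech_duration"])
    return dict(totals)


def speakers_by_gender(tsv_text: str) -> dict:
    """Count distinct subjectids per gender."""
    speakers = defaultdict(set)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        speakers[row["gender"]].add(row["subjectid"])
    return {gender: len(ids) for gender, ids in speakers.items()}
```

For example, duration_by(text, "data_source", "partition") returns totals keyed by tuples such as ("cmn2", "enrollment").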

The two trial key tab files, called sre18_{dev|eval}_trial_key.tsv,
indicate for each trial whether it is a target trial (the test
segment is from the model's target speaker) or a nontarget trial.
Information about channel, the number of enrollment segments,
whether the enrollment and test phone numbers match, gender, data
source (cmn2/vast) and source type (voip, pstn, afv) is also
provided. An example record from sre18_dev_trial_key.tsv is given
below:

modelid		1001_sre18
segmentid	aadxhatk_sre18.sph
side		a
targettype	nontarget
num_enroll_segs	1
phone_num_match	N
gender		male
source_type	pstn
data_source	cmn2
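With the trial key, system scores can be checked against the ground truth. The official SRE18 performance measures are defined in the evaluation plan; the equal error rate (EER) below is used only as a simple, widely used illustration. The sketch assumes scores are keyed by (modelid, segmentid, side) with higher scores meaning "more likely the target speaker", and that the key file has a header row with single-tab delimiters; load_labels and compute_eer are hypothetical names.

```python
import csv
import io


def load_labels(tsv_text: str) -> dict:
    """Map (modelid, segmentid, side) -> True for target trials."""
    labels = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        key = (row["modelid"], row["segmentid"], row["side"])
        labels[key] = row["targettype"] == "target"
    return labels


def compute_eer(scores: dict, labels: dict) -> float:
    """Approximate the equal error rate by trying every observed score
    as a threshold and returning the mean of P_miss and P_fa where the
    two are closest. Quadratic in the number of trials; fine for small
    subsets, but a full run would sort the scores once instead.
    """
    tgt = [s for key, s in scores.items() if labels[key]]
    non = [s for key, s in scores.items() if not labels[key]]
    best_gap, eer = float("inf"), 0.0
    for threshold in sorted(set(tgt) | set(non)):
        p_miss = sum(s < threshold for s in tgt) / len(tgt)
        p_fa = sum(s >= threshold for s in non) / len(non)
        if abs(p_miss - p_fa) < best_gap:
            best_gap = abs(p_miss - p_fa)
            eer = (p_miss + p_fa) / 2
    return eer
```

The same labels can be filtered on other key columns (for example source_type) to compare conditions such as pstn, voip and afv.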

More information about the SRE18 evaluation can be found in the
NIST 2018 Speaker Recognition Evaluation Plan in
/docs/sre18_eval_plan_2018-05-31_v6.pdf.

README created on December 23, 2019 by Karen Jones