LDC2020S04
2018 NIST Speaker Recognition Evaluation Test Set

Authors: Craig Greenberg, Omid Sadjadi (NIST)
         Elliot Singer (MIT-LL)
         Kevin Walker, Karen Jones, Jonathan Wright, Stephanie Strassel (LDC)

1.0 DESCRIPTION

The 2018 NIST Speaker Recognition Evaluation (SRE18) Test Set was developed by LDC and NIST (National Institute of Standards and Technology). The evaluation data are derived from two distinct sources: (1) Tunisian Arabic telephone speech, including Voice over Internet Protocol (VoIP) calls, and (2) English speech found in a sampling of YouTube videos (Audio from Video, or AfV).

NIST SRE is part of an ongoing series of evaluations conducted by NIST. These evaluations are an important contribution to the direction of research efforts and the calibration of technical capabilities. They are intended to be of interest to all researchers working on the general problem of text-independent speaker recognition.

2.0 DATA SOURCES

The speech recordings in this release were collected as part of the Call My Net 2 (CMN2) and Video Annotation for Speech Technology (VAST) corpora.

For the CMN2 collection, recruited participants called friends or relatives who agreed to have their conversations recorded. These telephone conversations lasted 8 to 10 minutes. The telephone speech segments in this release consist of PSTN and VoIP data collected in Tunisia and are presented as 8-bit a-law audio with a sample rate of 8 kHz.

The VAST data are composed of audio recordings originating from YouTube and are presented as 16-bit FLAC files sampled at 44 kHz.

3.0 DIRECTORY STRUCTURE

In addition to the evaluation data, this package contains answer keys, trial and training (enrollment) files, development data, and evaluation documentation.
The directory structure is as follows:

/data
    /dev
        /enrollment
        /test
        /unlabeled
    /eval
        /enrollment
        /test
/docs
    /sre18_dev_enrollment_diarization.tsv
    /sre18_dev_enrollment.tsv
    /sre18_dev_segment_key.tsv
    /sre18_dev_trial_key.tsv
    /sre18_dev_trials.tsv
    /sre18_eval_enrollment_diarization.tsv
    /sre18_eval_enrollment.tsv
    /sre18_eval_plan_2018-05-31_v6.pdf
    /sre18_eval_segment_key.tsv
    /sre18_eval_trial_key.tsv
    /sre18_eval_trials.tsv
/README.txt (this file)

3.1 Audio Data

File counts and sizes of the audio data are given below:

/dev
    /enrollment      185 files    0.25 GB
    /test           1593 files    0.99 GB
    /unlabeled      2332 files    1.6 GB
/eval
    /enrollment     1417 files    2.05 GB
    /test          12450 files    8.34 GB

3.2 Metadata and /docs Files

The two enrollment tab files, sre18_{dev|eval}_enrollment.tsv, list each segment (segmentid) provided to build a model for each target speaker (modelid), along with the recording channel (side). SRE18 has two enrollment conditions: either 1 segment or 3 segments are provided to build the speaker model. Example rows from sre18_dev_enrollment.tsv are given below (model 1004_sre18 illustrates the 3-segment condition):

modelid      segmentid             side
1001_sre18   dlrdnskt_sre18.sph    a
1002_sre18   yeamhnnt_sre18.sph    a
1003_sre18   svrkvkai_sre18.sph    a
1004_sre18   izynpvhs_sre18.sph    a
1004_sre18   oinujqaw_sre18.sph    a
1004_sre18   rjjcffcm_sre18.sph    a

The two trial files, sre18_{dev|eval}_trials.tsv, pair each enrolled target speaker model (modelid) with a test segment (segmentid), along with channel information (side), e.g.:

modelid      segmentid             side
1126_sre18   yqxltsco_sre18.sph    a

The two diarization tab files, sre18_{dev|eval}_enrollment_diarization.tsv, are included in the release because each VAST recording may contain speech from more than one speaker. These diarization tables provide speaker time marks for the dev and eval enrollment segments, e.g.:
segmentid             speaker_type   start   end
boeemuji_sre18.flac   target         5.20    13.44

The two segment key tab files, sre18_{dev|eval}_segment_key.tsv, provide metadata about each speech segment: segmentid, subjectid (LDC subject ID), gender (male/female), partition (enrollment, test, unlabeled), phone_number (anonymized phone number), speech_duration (duration of the segment in seconds), and data_source (cmn2/vast), e.g.:

segmentid         zztnmqej_sre18.sph
subjectid         132608
gender            female
partition         enrollment
phone_number      2490gcq
speech_duration   61.8
data_source       cmn2

The two trial key tab files, sre18_{dev|eval}_trial_key.tsv, indicate for each trial whether it is a target trial (the test segment contains the model's speaker) or a non-target trial. They also provide channel information, the number of enrollment segments, whether there is a phone number match, gender, data source (cmn2/vast), and source type (voip, pstn, afv). An example record from sre18_dev_trial_key.tsv is given below:

modelid           1001_sre18
segmentid         aadxhatk_sre18.sph
side              a
targettype        nontarget
num_enroll_segs   1
phone_num_match   N
gender            male
source_type       pstn
data_source       cmn2

More information about the SRE18 evaluation can be found in the NIST 2018 Speaker Recognition Evaluation Plan, /docs/sre18_eval_plan_2018-05-31_v6.pdf.

README created on December 23, 2019 by Karen Jones
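The following Python sketch shows one possible way to read the tab files described in section 3.2. It is not part of the release: the helper names (read_tsv, enrollment_by_model, target_speech_seconds, count_trials) are hypothetical, and the sketch assumes each .tsv file is tab-separated with a single header row using the column names shown above. The sample rows are copied from the examples in this README.

```python
import csv
import io
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, TextIO

Row = Dict[str, str]

# Hypothetical helpers for illustration only; not part of the LDC release.


def read_tsv(handle: TextIO) -> List[Row]:
    """Read a tab-separated table into a list of dicts.

    Assumption: the first line of each SRE18 .tsv file holds the column
    names shown in section 3.2 (modelid, segmentid, side, ...).
    """
    return list(csv.DictReader(handle, delimiter="\t"))


def enrollment_by_model(rows: Iterable[Row]) -> Dict[str, List[str]]:
    """Group enrollment segments by modelid (1 or 3 segments per model)."""
    models: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        models[row["modelid"]].append(row["segmentid"])
    return dict(models)


def target_speech_seconds(rows: Iterable[Row]) -> Dict[str, float]:
    """Sum the duration (end - start) of target-speaker time marks per segment."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        if row["speaker_type"] == "target":
            totals[row["segmentid"]] += float(row["end"]) - float(row["start"])
    return dict(totals)


def count_trials(rows: Iterable[Row]) -> Counter:
    """Count trial key records by (data_source, targettype)."""
    return Counter((row["data_source"], row["targettype"]) for row in rows)


# Sample rows taken from the examples in this README.
ENROLL = (
    "modelid\tsegmentid\tside\n"
    "1001_sre18\tdlrdnskt_sre18.sph\ta\n"
    "1004_sre18\tizynpvhs_sre18.sph\ta\n"
    "1004_sre18\toinujqaw_sre18.sph\ta\n"
    "1004_sre18\trjjcffcm_sre18.sph\ta\n"
)
DIARIZATION = (
    "segmentid\tspeaker_type\tstart\tend\n"
    "boeemuji_sre18.flac\ttarget\t5.20\t13.44\n"
)
TRIAL_KEY = (
    "modelid\tsegmentid\tside\ttargettype\tnum_enroll_segs\t"
    "phone_num_match\tgender\tsource_type\tdata_source\n"
    "1001_sre18\taadxhatk_sre18.sph\ta\tnontarget\t1\tN\tmale\tpstn\tcmn2\n"
)

if __name__ == "__main__":
    models = enrollment_by_model(read_tsv(io.StringIO(ENROLL)))
    for modelid, segments in models.items():
        print(modelid, len(segments), "enrollment segment(s)")
    print(target_speech_seconds(read_tsv(io.StringIO(DIARIZATION))))
    print(count_trials(read_tsv(io.StringIO(TRIAL_KEY))))
```

To process a real file, open it with newline="" and pass the handle to read_tsv, for example open("docs/sre18_dev_trial_key.tsv", newline="") from the top of the package. The csv module is used here rather than splitting lines by hand so that line endings are handled consistently.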