README FILE FOR:  2019 NIST SRE Audio-Visual Development and Test Sets
LDC Catalog-ID:   LDC2023V01

1. Introduction

This package contains the SRE19 Audio-Visual Development and Test sets.
The directory structure is organized as follows:

  docs/
    README.txt        this file
    2019_nist_multimedia_speaker_recognition_evaluation_plan_v3.pdf
    dev/              lists, keys and annotations for each partition
    eval/             (see section 2. for details)

  data/               (see section 3.)
    dev/              video (mp4) files in the development set
      enrollment/      52 files
      test/           108 files
    eval/             video (mp4) files in the test set
      enrollment/     149 files
      test/           452 files

2. Overview of documentation

The "dev" and "eval" directories under "./docs/" each contain six
tab-delimited tables, as follows:

2.1 sre19_av_(dev|eval)_enrollment_boundingbox.tsv

  Col#  header_label    sample_value
   1    segmentid       engsptyj_sre19
   2    target_type     target
   3    face_frame_sec  10
   4    bounding_box    208,1,398,275
   5    face_covered    partface
   6    eyewear         yes
   7    facial_hair     beard

2.2 sre19_av_(dev|eval)_enrollment_diarization.tsv

  Col#  header_label    sample_value
   1    segmentid       trydllly_sre19
   2    speaker_type    target
   3    start           0.35
   4    end             6.10

2.3 sre19_av_(dev|eval)_enrollment.tsv

  Col#  header_label    sample_value
   1    modelid         1976_sre19
   2    segmentid       engsptyj_sre19
   3    side            a

2.4 sre19_av_(dev|eval)_segment_key.tsv

  Col#  header_label    sample_value
   1    segmentid       engsptyj_sre19
   2    subjectid       19
   3    gender          male
   4    partition       enrollment

2.5 sre19_av_(dev|eval)_trial_key.tsv

  Col#  header_label    sample_value
   1    modelid         1976_sre19
   2    segmentid       afyuffzi_sre19
   3    side            a
   4    targettype      nontarget

2.6 sre19_av_(dev|eval)_trials.tsv

  Col#  header_label    sample_value
   1    modelid         1976_sre19
   2    segmentid       afyuffzi_sre19
   3    side            a

3. Overview of video data

All videos are encoded/compressed as "mp4" files.  Audio tracks all
have a sample rate of 44.1 kHz with AAC encoding; most are stereo, but
some are mono.  File durations range from 17.5 seconds to nearly 13
minutes.
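Because channel count and duration vary from file to file, it can help
to inspect a video before processing it.  The following is only a
sketch, not part of the original package: it assumes that ffprobe (the
inspection tool distributed with ffmpeg) is installed, and the function
name "audio_info" and the FFPROBE override variable are hypothetical
names introduced here for illustration.

```shell
# Hypothetical helper (not part of the LDC package): print the sample rate,
# channel count and duration reported for the first audio track of a video.
# FFPROBE may be set to another executable; it defaults to "ffprobe" on PATH.
audio_info() {
    "${FFPROBE:-ffprobe}" -v error -select_streams a:0 \
        -show_entries stream=sample_rate,channels:format=duration \
        -of default=noprint_wrappers=1 "$1"
}

# Example call (commented out because the path is a placeholder):
# audio_info path/to/video.mp4
```

The function prints one key=value line per field (sample_rate, channels,
duration), which makes it easy to spot mono files or unusually long
recordings before running a batch job.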
In the commands below, names in angle brackets are placeholders to be
replaced with actual file names or values.

The audio can be extracted for processing using ffmpeg as follows:

  ffmpeg -v 8 -i <video_file> -vn -ar 16000 -ac 1 -f wav <audio_file>

Here, the audio is extracted at 16 kHz, but a different rate can be
requested with the -ar option if desired.

A frame at a specific time offset (in seconds) can be extracted from a
video using ffmpeg as follows:

  ffmpeg -v 8 -ss <time_in_seconds> -i <video_file> -vframes 1 <image_file>

The following ffmpeg command can be used to extract a frame every second:

  ffmpeg -v 8 -i <video_file> -vf fps=1 <output_prefix>_%04d.png

-------------
README originally created by Omid Sadjadi, July 19, 2019
  adapted for general release by David Graff, July 1, 2021