Mixer 4 and 5 Speech
|Item Name:||Mixer 4 and 5 Speech|
|Author(s):||Linda Brandschain, Kevin Walker, David Graff, Christopher Cieri, Abby Neely, Nikki Mirghafori, Barbara Peskin, Jack Godfrey, Stephanie Strassel, Fred Goodman, George R. Doddington, Mike King|
|LDC Catalog No.:||LDC2020S03|
|Release Date:||March 13, 2020|
|Data Source(s):||telephone conversations, microphone conversation|
|Project(s):||MIXER, NIST SRE|
|Online Documentation:||LDC2020S03 Documents|
|Licensing Instructions:||Subscription & Standard Members, and Non-Members|
|Citation:||Brandschain, Linda, et al. Mixer 4 and 5 Speech LDC2020S03. Hard Drive. Philadelphia: Linguistic Data Consortium, 2020.|
Mixer 4 and 5 Speech was developed by the Linguistic Data Consortium (LDC) and comprises approximately 14,185 hours of audio recordings of conversational telephone speech, interviews, elicitation exercises and transcript readings involving 616 distinct speakers. The material was collected in 2007 as part of the Mixer project, and recordings in this corpus were used in the 2008 NIST Speaker Recognition Evaluation (SRE).
The data in this release was collected in 2007 by LDC at its Human Subjects Data Collection Laboratories in Philadelphia and by the International Computer Science Institute (ICSI) at the University of California, Berkeley. The Mixer 4 and Mixer 5 collections were conducted simultaneously as a carefully coordinated collaboration between the two recording sites.
The telephone protocol used a robot operator to connect recruited speakers for casual conversations. In Mixer 4, 400 subjects each made ten 10-minute calls; half of those subjects also visited one of the collection sites, where they made two telephone calls while simultaneously being recorded on a cross-channel platform. In Mixer 5, 300 subjects each completed ten calls and six interview sessions at either LDC or ICSI; those sessions were conducted on a cross-channel platform and included a telephone call in one of three vocal-effort conditions: normal, high and low. Nearly all Mixer participants were native English speakers; the rest were bilingual English speakers.
Researchers who want to use the NIST 2008 SRE benchmark test sets should consult the corresponding NIST evaluation plan for guidelines on which training data is allowed for those tests. Training, evaluation and supplemental data from the 2008 SRE are available in the LDC Catalog: 2008 NIST Speaker Recognition Evaluation Training Set Part 1 (LDC2011S05), 2008 NIST Speaker Recognition Evaluation Training Set Part 2 (LDC2011S07), 2008 NIST Speaker Recognition Evaluation Test Set (LDC2011S08) and 2008 NIST Speaker Recognition Evaluation Supplemental Set (LDC2011S11).
The Mixer 4 and 5 collection contains 2,568 recordings made over the public telephone network and 2,152 sessions of multi-microphone recordings made in office-room settings. The telephone recordings are presented as 8 kHz, 2-channel NIST SPHERE files, and the microphone recordings as 16 kHz, 1-channel FLAC-compressed MS-WAV files.
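As a minimal sketch of what the SPHERE container holds, the Python below parses the plain-ASCII header at the start of a SPHERE file and computes a recording's duration from it. It assumes the standard SPHERE layout (a "NIST_1A" line, a header-size line, then one "name -type value" field per line up to "end_head") and the common field names sample_rate, channel_count and sample_count. The function names are hypothetical and this is not LDC's own tooling; check the field names against the headers of the actual release files.

```python
def parse_sphere_header(data: bytes) -> dict:
    """Parse the ASCII header at the start of a NIST SPHERE file (sketch)."""
    parts = data.split(b"\n", 2)
    if len(parts) < 3 or parts[0].strip() != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    # The second line gives the total header size in bytes (commonly 1024).
    header_size = int(parts[1].decode("ascii").strip())
    text = data[:header_size].decode("ascii", errors="replace")
    fields = {"header_size": header_size}
    for line in text.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break  # the rest of the header is padding; audio follows it
        tokens = line.split(None, 2)  # name, type tag, value
        if len(tokens) < 3:
            continue  # skip blank or malformed lines
        name, ftype, value = tokens
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        else:  # "-sN": a string value of N characters
            fields[name] = value
    return fields


def duration_seconds(fields: dict) -> float:
    # In SPHERE headers, sample_count is the number of samples per channel.
    return fields["sample_count"] / fields["sample_rate"]
```

Reading the first few kilobytes of a file is normally enough, since the header size is declared on its second line.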
When the FLAC microphone files are decompressed, they become MS-WAV (RIFF) files; FLAC compression does not presently support the SPHERE file format.
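After decompressing a microphone file (for example with the reference flac tool's -d option), a quick check with Python's standard wave module can confirm that the result is a 16 kHz, single-channel RIFF file. This is a sketch: the helper names are hypothetical, and the 2-byte (16-bit) sample width is an assumption, since the release description does not state the bit depth.

```python
import wave


def describe_wav(path: str) -> tuple:
    """Return (sample_rate_hz, channels, bytes_per_sample) for a RIFF/WAV file."""
    with wave.open(path, "rb") as w:
        return w.getframerate(), w.getnchannels(), w.getsampwidth()


def is_expected_mic_format(path: str, sample_width: int = 2) -> bool:
    # 16 kHz and 1 channel come from the release description;
    # sample_width=2 (16-bit) is an assumption, not a documented value.
    return describe_wav(path) == (16000, 1, sample_width)
```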
The telephone audio is presented in SPHERE format because this is consistent with other LDC telephone audio releases and because FLAC does not support μ-law (ulaw) sample encoding. The open-source SoX utility accepts both formats as input, and other utilities are available for FLAC and SPHERE files, such as the reference flac command-line tool and LDC's sph2pipe.
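Because the telephone channels are μ-law encoded, tools that expect linear PCM need a decoding step. SoX handles this; the sketch below shows the standard G.711 μ-law expansion in Python for illustration (the standard-library audioop module, which provided ulaw2lin, was removed in Python 3.13). It assumes one byte per sample and that the two channels are interleaved sample by sample after the header, which is the usual SPHERE layout but should be confirmed from each file's header (e.g. its sample_coding and channels_interleaved fields). The function names are hypothetical.

```python
def ulaw_to_linear(byte: int) -> int:
    """Expand one G.711 mu-law byte to a 16-bit linear PCM value."""
    u = ~byte & 0xFF            # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    # Add the bias (0x84), shift by the segment exponent, then remove the bias.
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


# Precompute all 256 values once; decoding is then a table lookup.
ULAW_TABLE = [ulaw_to_linear(b) for b in range(256)]


def decode_ulaw_stereo(payload: bytes) -> tuple:
    """Split an interleaved 2-channel mu-law payload into two PCM lists."""
    channel_a = [ULAW_TABLE[b] for b in payload[0::2]]
    channel_b = [ULAW_TABLE[b] for b in payload[1::2]]
    return channel_a, channel_b
```

The payload here is the byte string after the header, i.e. data[header_size:] for a header parsed as in a SPHERE header reader.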
Metadata about the calls and speakers is also included in this release, along with time-aligned entries for many of the component portions of the recording sessions.
Updates: None at this time.