
Mixer 4 and 5 Speech

Item Name:	Mixer 4 and 5 Speech
Author(s):	Linda Brandschain, Kevin Walker, David Graff, Christopher Cieri, Abby Neely, Nikki Mirghafori, Barbara Peskin, Jack Godfrey, Stephanie Strassel, Fred Goodman, George R. Doddington, Mike King
LDC Catalog No.:	LDC2020S03
ISBN:	1-58563-922-2
ISLRN:	102-906-715-140-9
DOI:	https://doi.org/10.35111/xq98-yj91
Release Date:	March 13, 2020
Member Year(s):	2020
DCMI Type(s):	Sound
Sample Type:	pcm
Sample Rate:	16000
Data Source(s):	microphone conversation, telephone conversations
Project(s):	MIXER, NIST SRE
Application(s):	speaker identification
Language(s):	English
Language ID(s):	eng
Online Documentation:	LDC2020S03 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Brandschain, Linda, et al. Mixer 4 and 5 Speech LDC2020S03. Web Download. Philadelphia: Linguistic Data Consortium, 2020.
Related Works:
hasOutcome:	LDC2011S05 2008 NIST Speaker Recognition Evaluation Training Set Part 1; LDC2011S07 2008 NIST Speaker Recognition Evaluation Training Set Part 2; LDC2011S08 2008 NIST Speaker Recognition Evaluation Test Set; LDC2011S11 2008 NIST Speaker Recognition Evaluation Supplemental Set
isContinuationOf:	LDC2023S02 Mixer 3 Speech
hasContinuation:	LDC2013S03 Mixer 6 Speech
relatesTo:	LDC2023S09 REMIX Telephone Collection

Description

Introduction

Mixer 4 and 5 Speech was developed by the Linguistic Data Consortium (LDC) and comprises approximately 14,185 hours of audio recordings of conversational telephone speech, interviews, elicitation exercises and transcript readings involving 616 distinct speakers. The material was collected in 2007 as part of the Mixer project, and recordings in this corpus were used in the 2008 NIST Speaker Recognition Evaluation (SRE).

The data in this release was collected in 2007 by LDC at its Human Subjects Data Collection Laboratories in Philadelphia and by the International Computer Science Institute (ICSI) at the University of California, Berkeley. The Mixer 4 and Mixer 5 collections were conducted simultaneously, as a collaborative, carefully coordinated activity at both recording sites.

The telephone protocol connected recruited speakers through a robot operator to carry on casual conversations. In Mixer 4, 400 subjects made ten 10-minute calls; half of those subjects also visited one of the collection sites, where they made two telephone calls while also being recorded on a cross-channel platform. In Mixer 5, 300 subjects each completed ten calls and six interview sessions at either LDC or ICSI; those sessions were conducted on a cross-channel platform and included a telephone call in one of three vocal-effort conditions: normal, high and low. Mixer participants were nearly all native English speakers, the rest being bilingual English speakers.

Researchers interested in applying NIST 2008 SRE benchmark test sets should consult the respective NIST Evaluation Plans for guidelines on allowable training data for those tests. Training, evaluation and supplemental data from 2008 SRE are available in the LDC Catalog: 2008 NIST Speaker Recognition Evaluation Training Set Part 1 (LDC2011S05), 2008 NIST Speaker Recognition Evaluation Training Set Part 2 (LDC2011S07), 2008 NIST Speaker Recognition Evaluation Test Set (LDC2011S08) and 2008 NIST Speaker Recognition Evaluation Supplemental Set (LDC2011S11).

Data

The Mixer 4 and 5 collection contains 2,568 recordings made via the public telephone network and 2,152 sessions of multiple microphone recordings in office-room settings. The telephone recordings are presented as 8-kHz 2-channel NIST SPHERE files, and the microphone recordings are 16-kHz 1-channel flac/ms-wav files.

When the microphone recording flac files are uncompressed, they become ms-wav/RIFF files (flac compression does not presently support the SPHERE file format).
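
Because the release mixes SPHERE, flac and (after decompression) ms-wav/RIFF files, it can be useful to check which container a given file actually uses before handing it to a tool. The following is a minimal sketch, not part of the corpus documentation. It relies only on the well-known leading bytes of each format: `NIST_1A` for SPHERE, `fLaC` for flac, and `RIFF` followed by `WAVE` at byte offset 8 for ms-wav. The function names are hypothetical, and the check assumes the file has no extra data (such as a prepended tag) before the signature.

```python
def sniff_audio_format(header: bytes) -> str:
    """Guess the container format from the first bytes of a file.

    Assumption: the signature sits at offset 0, with nothing prepended.
    """
    if header.startswith(b"NIST_1A"):
        return "sphere"   # NIST SPHERE: ASCII header begins with NIST_1A
    if header.startswith(b"fLaC"):
        return "flac"     # flac stream marker
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"      # ms-wav: RIFF container with a WAVE form type
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 12 bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return sniff_audio_format(handle.read(12))
```

A call such as `sniff_file("some_recording.sph")` (the file name is only a placeholder) would return one of `"sphere"`, `"flac"`, `"wav"` or `"unknown"`.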

The telephone audio is presented in SPHERE format because this is consistent with other LDC telephone audio releases and because flac does not support ulaw sample encoding. The open-source SoX utility is able to handle both formats as input. Other utilities are available for flac and SPHERE formats.
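
As an illustration of using SoX on both formats, the sketch below builds a SoX command line that converts an input file to 16-bit signed PCM ms-wav and optionally keeps a single channel, which may be convenient for the 2-channel telephone files. It is an assumption-laden example rather than an LDC-recommended workflow: the helper names and file names are hypothetical, and it presumes a `sox` executable on the PATH that was built with SPHERE and flac support. The SoX options used are `-e signed-integer` and `-b 16` for the output encoding, and the `remix` effect for channel selection.

```python
import subprocess
from typing import List, Optional


def sox_to_wav_command(src: str, dst: str,
                       channel: Optional[int] = None) -> List[str]:
    """Build a SoX command converting src to 16-bit signed PCM at dst.

    channel: 1-based input channel to keep, or None to keep all channels.
    """
    # Format options placed before the output file apply to the output.
    cmd = ["sox", src, "-e", "signed-integer", "-b", "16", dst]
    if channel is not None:
        # The remix effect with a single argument outputs only that channel.
        cmd += ["remix", str(channel)]
    return cmd


def convert(src: str, dst: str, channel: Optional[int] = None) -> None:
    """Run the conversion; raises CalledProcessError if SoX fails."""
    subprocess.run(sox_to_wav_command(src, dst, channel), check=True)
```

For example, `convert("call.sph", "call_side_a.wav", channel=1)` would extract the first channel of a hypothetical telephone recording, while `convert("session.flac", "session.wav")` would decompress a hypothetical microphone recording without changing its channel layout.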

Metadata about the calls and speakers is also included in this release, along with time-aligned entries for many of the component portions of the recording sessions.

Samples

Please listen to this telephone sample (SPH) and microphone sample (FLAC).

Updates

None at this time.

Additional Licensing Instructions

This 'members-only' corpus is available to current members, who can request the data at the listed reduced-license fee. Contact ldc@ldc.upenn.edu for information about becoming a member.