Mixer 7 Spanish Speech

Item Name: Mixer 7 Spanish Speech
Author(s): Linda Brandschain, Kevin Walker, David Graff
LDC Catalog No.: LDC2023S04
ISLRN: 178-136-879-934-4
DOI: https://doi.org/10.35111/rvd7-7107
Release Date: July 17, 2023
Member Year(s): 2023
DCMI Type(s): Sound
Sample Type: pcm, ulaw
Sample Rate: 16000, 8000
Data Source(s): microphone conversation, microphone speech, telephone conversations
Project(s): MIXER, NIST SRE
Application(s): speaker identification
Language(s): Spanish, English
Language ID(s): spa, eng
Online Documentation: LDC2023S04 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Brandschain, Linda, Kevin Walker, and David Graff. Mixer 7 Spanish Speech LDC2023S04. Web Download. Philadelphia: Linguistic Data Consortium, 2023.

Introduction

Mixer 7 Spanish Speech (LDC2023S04) was developed by the Linguistic Data Consortium (LDC) and contains 9,600 hours of audio recordings of interviews, transcript readings and conversational telephone speech involving 191 distinct native Spanish speakers. This material was collected by LDC in 2011 and 2012 as part of the Mixer project. The recordings in this corpus were used in the 2012 NIST Speaker Recognition Evaluation test set.

The speech data in this release was collected by LDC at its Human Subjects Data Collection Laboratories in Philadelphia. The telephone collection protocol was similar to other LDC Mixer collections: recruited speakers were connected through a robot operator to carry on casual conversations lasting up to 10 minutes, usually about a daily topic announced by the robot operator at the start of the call. The raw digital audio content for each call side was captured as a separate channel, and each full conversation is presented as a 2-channel interleaved audio file, with 8000 samples/second and u-law sample encoding. Each speaker was asked to complete 15 calls.
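As an illustration of the telephone audio layout described above, the following is a minimal sketch of how the two call sides could be separated and decoded. It assumes the file header has already been stripped, that the payload is uncompressed, and that the bytes alternate between the two channels; the function names are hypothetical and the decoder is the standard G.711 mu-law expansion, not code shipped with the corpus.

```python
def ulaw_byte_to_pcm16(code: int) -> int:
    """Decode one G.711 mu-law byte to a signed linear sample."""
    code = ~code & 0xFF              # mu-law bytes are stored bit-inverted
    sign = code & 0x80               # top bit carries the sign
    exponent = (code >> 4) & 0x07    # 3-bit segment number
    mantissa = code & 0x0F           # 4-bit step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def split_call_sides(payload: bytes) -> tuple[list[int], list[int]]:
    """Split interleaved 2-channel mu-law bytes into two decoded call sides.

    Assumption: `payload` holds only sample data (no file header),
    one byte per channel per frame, first channel first.
    """
    if len(payload) % 2:
        raise ValueError("expected an even number of bytes (two channels per frame)")
    side_a = [ulaw_byte_to_pcm16(b) for b in payload[0::2]]
    side_b = [ulaw_byte_to_pcm16(b) for b in payload[1::2]]
    return side_a, side_b


if __name__ == "__main__":
    # Synthetic two-frame payload, not taken from the corpus.
    demo = bytes([0xFF, 0x00, 0x80, 0x7F])
    print(split_call_sides(demo))
```

In practice an audio utility would normally do this conversion; the sketch only shows what "2-channel interleaved" and "u-law" mean at the byte level.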

The multi-microphone portion of the collection utilized 14 distinct microphones installed identically in two multi-channel audio recording rooms at LDC. Each session was guided by collection staff using prompting and recording software to conduct the following activities: (1) repeat questions (less than one minute); (2) informal conversation (typically 15 minutes); (3) transcript reading (15 minutes); and (4) up to three telephone calls under varying conditions (10 minutes). The 14 channels were recorded synchronously into separate single-channel files, using 16-bit PCM sample encoding at 16000 samples/second.
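To give a sense of scale for the microphone recordings, the sketch below computes the uncompressed size implied by the stated encoding (16-bit PCM at 16000 samples/second). The function name and the 15-minute example duration are illustrative assumptions; actual file sizes in the release will differ because the audio is distributed flac-compressed and session lengths vary.

```python
def pcm_bytes(seconds: float, sample_rate: int = 16000,
              bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Uncompressed PCM size in bytes for a recording of the given length."""
    return int(seconds * sample_rate) * bytes_per_sample * channels


if __name__ == "__main__":
    fifteen_minutes = 15 * 60
    # One microphone channel for a 15-minute segment.
    print(pcm_bytes(fifteen_minutes))
    # All 14 synchronously recorded channels for the same segment.
    print(pcm_bytes(fifteen_minutes, channels=14))
```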

Certain demographic information about the speakers was collected, including date of birth, level of education, native language, other language capability, place of birth, place of residence and occupation. 

Data

The collection contains 2,583 recordings made via the public telephone network and 678 sessions of multiple microphone recordings in office room settings. The telephone recordings are presented as 8-kHz 2-channel NIST SPHERE files, and the microphone recordings are 16-kHz 1-channel flac/ms-wav files.

When the flac files are uncompressed, they become ms-wav/RIFF files (flac compression does not presently support the SPHERE file format). The telephone audio is presented in SPHERE format because (a) this is consistent with other telephone audio releases from LDC, and (b) flac does not support u-law sample encoding. The current release of the open-source SoX utility is able to handle both formats as input. Other utilities are available for both flac and SPHERE formats.
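For readers who want to inspect a SPHERE file without an external utility, the following is a minimal sketch of a header reader. It relies on the general SPHERE layout (an ASCII header that begins with "NIST_1A", followed by the header size and "name -type value" lines ending at "end_head"); the function name is hypothetical, and the demo header is synthetic rather than copied from a file in this corpus.

```python
def parse_sphere_header(data: bytes) -> tuple[dict, int]:
    """Return (fields, header_size) parsed from the start of a SPHERE file."""
    if not data.startswith(b"NIST_1A\n"):
        raise ValueError("not a NIST SPHERE file")
    # The second 8-byte line holds the total header size in bytes.
    header_size = int(data[8:16].decode("ascii").strip())
    text = data[16:header_size].decode("ascii", errors="replace")
    fields = {}
    for line in text.split("\n"):
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        name, ftype, value = parts
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        else:  # "-sN": string of N characters
            fields[name] = value
    return fields, header_size


if __name__ == "__main__":
    # Synthetic header for illustration only.
    demo = (b"NIST_1A\n   1024\n"
            b"channel_count -i 2\n"
            b"sample_rate -i 8000\n"
            b"sample_coding -s4 ulaw\n"
            b"end_head\n").ljust(1024, b" ")
    print(parse_sphere_header(demo))
```

The sample data begins at the returned header size, so a caller could slice the file contents from that offset before decoding.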

Samples

SPH file

FLAC file

Updates

None at this time.

Additional Licensing Instructions

This members-only corpus is available to current members. Contact ldc@ldc.upenn.edu for information about becoming a member.
