Switchboard Cellular Part 2 Audio

Item Name:	Switchboard Cellular Part 2 Audio
Author(s):	David Graff, Kevin Walker, David Miller
LDC Catalog No.:	LDC2004S07
ISBN:	1-58563-297-x
ISLRN:	047-363-770-147-0
DOI:	https://doi.org/10.35111/mgp6-4j96
Release Date:	October 26, 2004
Member Year(s):	2004
DCMI Type(s):	Sound
Sample Type:	2-channel ulaw
Sample Rate:	8000
Data Source(s):	telephone conversations
Project(s):	EARS, GALE, NIST SRE, SID
Application(s):	language identification, speaker identification
Language(s):	English
Language ID(s):	eng
License(s):	LDC User Agreement for Non-Members
Online Documentation:	LDC2004S07 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Graff, David, Kevin Walker, and David Miller. Switchboard Cellular Part 2 Audio LDC2004S07. Web Download. Philadelphia: Linguistic Data Consortium, 2004.
Related Works:
isPartWith:
	LDC2001S13 Switchboard Cellular Part 1 Audio
	LDC2001S15 Switchboard Cellular Part 1 Transcribed Audio
	LDC2001T14 Switchboard Cellular Part 1 Transcription
hasOutcome:
	LDC2002S13 2001 HUB5 English Evaluation
	LDC2004S04 2002 NIST Speaker Recognition Evaluation
	LDC2010S03 2003 NIST Speaker Recognition Evaluation
isSimilarWith:
	LDC93S8 Switchboard Credit Card
	LDC98S75 Switchboard-2 Phase I
	LDC99S79 Switchboard-2 Phase II
	LDC2002S06 Switchboard-2 Phase III Audio

Introduction

Switchboard Cellular Part 2 Audio was developed by the Linguistic Data Consortium (LDC) and consists of approximately 200 hours of English telephone conversations collected by LDC in 2000. The Switchboard cellular collection focused primarily on cellular phone technology of all service types. The goal was to recruit 200 subjects, balanced by gender, each participating in 10 or more five- to six-minute conversations on cellular phones. The speech data was collected for research, development, and evaluation of automatic systems for speech-to-text conversion, speaker identification, language identification, and speech signal detection.

Data

During the study period, LDC collected a total of 2,020 calls, or 4,040 sides (2,950 cellular). Here is a gender breakdown of the participant pool and call sides collected:

Gender	Participants	Sides
Female	250	2,405
Male	169	1,635
Total	419	4,040
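
The figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in this section. The derived values (cellular share, average sides per participant) and all variable names are illustrative and do not come from the corpus documentation.

```python
# Figures quoted in this section.
calls = 2020
cellular_sides = 2950
participants = {"Female": 250, "Male": 169}
sides = {"Female": 2405, "Male": 1635}

total_participants = sum(participants.values())
total_sides = sum(sides.values())

# Every call has two sides, so the side count should be twice the call count.
assert total_sides == 2 * calls

# Derived values (not stated in the source): share of sides recorded on
# cellular phones, and average number of sides per participant by gender.
cellular_share = cellular_sides / total_sides
sides_per_participant = {g: sides[g] / participants[g] for g in sides}

print(f"participants: {total_participants}, sides: {total_sides}")
print(f"cellular share of sides: {cellular_share:.1%}")
for gender, avg in sides_per_participant.items():
    print(f"{gender}: {avg:.2f} sides per participant")
```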

Each speech file consists of a 1,024-byte ASCII-formatted Sphere header, followed by two-channel interleaved mu-law sample data. The mu-law samples represent the actual digital data transmission from the telephone service provider (MCI), as captured separately for each side of the telephone conversation by LDC's telephone collection platform. The header also indicates the caller_pin, callee_pin, topic_id, cellular service/handset information and speaker demographic information. The data files are not compressed.
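
As a minimal sketch of how such a file could be read, the Python below parses the 1,024-byte header, splits the interleaved payload into its two channels, and expands mu-law bytes to linear samples using the standard G.711 formula. It assumes the usual NIST SPHERE header layout (a `NIST_1A` line, a header-size line, `name -type value` fields, then `end_head`). The function names are hypothetical, and the demo data is synthetic rather than taken from the corpus.

```python
HEADER_BYTES = 1024  # header size stated in the description above


def parse_sphere_header(raw: bytes) -> dict:
    """Parse 'name -type value' fields from an ASCII SPHERE header."""
    lines = raw.decode("ascii", errors="replace").split("\n")
    if lines[0].strip() != "NIST_1A":
        raise ValueError("not a SPHERE header")
    fields = {}
    for line in lines[2:]:  # skip the magic line and the header-size line
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        name, ftype, value = parts
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        else:  # string fields are typed as -sN, N being the length
            fields[name] = value
    return fields


def ulaw_to_linear(byte: int) -> int:
    """Expand one G.711 mu-law byte to a signed linear sample."""
    u = ~byte & 0xFF
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if u & 0x80 else sample


def read_sphere(blob: bytes):
    """Return (header fields, channel 1 bytes, channel 2 bytes)."""
    header = parse_sphere_header(blob[:HEADER_BYTES])
    data = blob[HEADER_BYTES:]
    # One byte per mu-law sample, interleaved as ch1, ch2, ch1, ch2, ...
    return header, data[0::2], data[1::2]


def make_demo_blob() -> bytes:
    """Build a tiny synthetic file (not corpus data) for demonstration."""
    head = (
        "NIST_1A\n   1024\n"
        "channel_count -i 2\n"
        "sample_rate -i 8000\n"
        "sample_coding -s4 ulaw\n"
        "end_head\n"
    )
    payload = bytes([0xFF, 0x00, 0xFF, 0x80])  # two samples per channel
    return head.encode("ascii").ljust(HEADER_BYTES, b" ") + payload


header, ch1, ch2 = read_sphere(make_demo_blob())
print(header)
print([ulaw_to_linear(b) for b in ch1], [ulaw_to_linear(b) for b in ch2])
```

For a real file, the same `read_sphere` function could be given the bytes returned by `open(path, "rb").read()`. Corpus-specific fields such as caller_pin, callee_pin and topic_id would then appear in the returned dictionary under whatever names and types the actual header uses.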

This release contains speech data files with documentation describing speaker information (sex, age, education, city and state where raised), call information (date, time, call duration, Personal Identification Numbers, topic), and audit information (channel quality, background noise). The documentation also contains reports on clipped files.

Other releases in this series are listed under Related Works above.

Sample

Please examine this example audio file to review a sample of this corpus.

Updates

There are no updates available at this time.
