
Switchboard-2 Phase III Audio

Item Name:	Switchboard-2 Phase III Audio
Author(s):	David Graff, David Miller, Kevin Walker
LDC Catalog No.:	LDC2002S06
ISBN:	1-58563-222-8
ISLRN:	603-855-311-336-8
DOI:	https://doi.org/10.35111/ydsv-hw57
Release Date:	March 20, 2002
Member Year(s):	2002
DCMI Type(s):	Sound
Sample Type:	2-channel ulaw
Sample Rate:	8000
Data Source(s):	telephone speech
Project(s):	SID, GALE, EARS, NIST SRE
Application(s):	speaker identification
Language(s):	English
Language ID(s):	eng
License(s):	LDC User Agreement for Non-Members
Online Documentation:	LDC2002S06 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Graff, David, David Miller, and Kevin Walker. Switchboard-2 Phase III Audio LDC2002S06. Web Download. Philadelphia: Linguistic Data Consortium, 2002.
Related Works:
hasOutcome:	LDC99S81 1999 Speaker Recognition Benchmark; LDC2002S13 2001 HUB5 English Evaluation; LDC2004S04 2002 NIST Speaker Recognition Evaluation
isContinuationOf:	LDC98S75 Switchboard-2 Phase I; LDC99S79 Switchboard-2 Phase II
hasContinuation:	LDC2013S05 Greybeard
isSimilarWith:	LDC93S8 Switchboard Credit Card; LDC97S62 Switchboard-1 Release 2; LDC2001S13 Switchboard Cellular Part 1 Audio; LDC2001S15 Switchboard Cellular Part 1 Transcribed Audio; LDC2004S07 Switchboard Cellular Part 2 Audio; LDC2013S03 Mixer 6 Speech
relatesTo:	LDC2010S03 2003 NIST Speaker Recognition Evaluation

Introduction

The Switchboard-2 Phase III Audio corpus was produced by the Linguistic Data Consortium (LDC) under catalog number LDC2002S06 and ISBN 1-58563-222-8. This release contains speech data files ONLY, along with documentation describing speaker information (sex, age, education, city and state where raised), call information (date, time, call duration, Personal Identification Numbers, topic), and audit information (channel quality, background noise). The data files are not compressed.

The Switchboard-2 Phase III collection focused primarily on the American South. Collection began on October 21, 1997 and was completed on January 1, 1998. The project's goal was to recruit native speakers of English in the American South, balanced by gender, each of whom would take part in ten or more conversations of five to six minutes, using a variety of land-line telephone handsets.

Data

The speech data was collected for research, development, and evaluation of automatic systems for speech-to-text conversion, talker identification, language identification, and speech signal detection.

During the collection period, the LDC collected a total of 2,728 calls, or 5,456 sides, from 640 participants (292 Male, 348 Female), under varied environmental conditions.

Each speech file consists of a 1,024-byte ASCII-formatted Sphere header, followed by two-channel interleaved mu-law sample data. The mu-law samples represent the actual digital data transmission from the telephone service provider (MCI), as captured separately for each side of the telephone conversation by the LDC's telephone collection platform. The header also records the caller_pin, callee_pin, and topic_id.
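The file layout described above can be read with a few lines of Python. The following is a minimal sketch, not an LDC-provided tool: it assumes the usual Sphere header conventions (a "NIST_1A" first line, a header-size second line, then "name -type value" lines ending with "end_head"), and the function names are hypothetical. It does not assume which channel belongs to the caller and which to the callee; consult the corpus documentation for that.

```python
def parse_sphere_header(stream):
    """Read a Sphere header from a binary stream; return (header_size, fields)."""
    magic = stream.readline().strip()
    if magic != b"NIST_1A":
        raise ValueError("not a Sphere file (missing NIST_1A marker)")
    header_size = int(stream.readline().strip())  # 1024 for this corpus
    stream.seek(0)
    raw = stream.read(header_size).decode("ascii", errors="replace")

    fields = {}
    for line in raw.splitlines()[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)  # name, type flag, value
        if len(parts) != 3:
            continue
        name, ftype, value = parts
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        else:  # "-sN": string of length N
            fields[name] = value
    return header_size, fields


def split_channels(payload):
    """Split interleaved two-channel 8-bit samples into two byte strings."""
    return payload[0::2], payload[1::2]


def mulaw_to_linear(byte):
    """Decode one G.711 mu-law byte to a signed 16-bit-range linear sample."""
    u = ~byte & 0xFF
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if u & 0x80 else magnitude


def read_conversation(path):
    """Return (header fields, first-channel bytes, second-channel bytes)."""
    with open(path, "rb") as f:
        header_size, fields = parse_sphere_header(f)
        f.seek(header_size)
        first, second = split_channels(f.read())
    return fields, first, second
```

Calling read_conversation on a speech file returns the parsed header fields (including caller_pin, callee_pin, and topic_id, if present) and one byte string per conversation side, which can then be decoded sample by sample with mulaw_to_linear.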

The speech files are named according to the following pattern:

sw_NNNNN.sph

where the five-digit string "NNNNN" is the conversation ID. This string identifies each speech file and the corresponding call in the associated database tables that provide information about the calls and participants (i.e. callstat.tbl, master.tbl).
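The naming pattern above can be checked mechanically when indexing the corpus. This is a small sketch with hypothetical names; it keeps the conversation ID as a string so that any leading zeros survive, on the assumption that the ID may need to be compared as text against the database tables.

```python
import re
from pathlib import Path

# Matches "sw_" followed by exactly five digits and the ".sph" extension.
SPEECH_FILE_RE = re.compile(r"sw_(\d{5})\.sph")


def conversation_id(filename):
    """Return the five-digit conversation ID as a string, or None if no match."""
    match = SPEECH_FILE_RE.fullmatch(Path(filename).name)
    return match.group(1) if match else None
```

For example, conversation_id("sw_12345.sph") yields "12345", while a name with four digits or a different extension yields None.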

Other documentation files available on the publication are:

0readme.1st	Field information for all database tables
swb_callaudit.tbl	Audit results for each channel
swb_callaudit.txt	Document describing audit table
swb_callstats.tbl	Information about recorded calls
swb_callstats.txt	Document describing callstats table
swb_callsubjects.tbl	Demographic information
swb_callsubjects.txt	Document describing callsubjects table
topics.txt	List of proposed call topics

There are a total of 2,657 data files (approximately 222 hours of audio).
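As a rough sanity check, these totals can be related to the sample format given in the catalog entry (2-channel mu-law at 8000 Hz, 1,024-byte header, uncompressed). The sketch below assumes that the 222 hours is the summed duration of the files rather than of the individual sides, and that each mu-law sample occupies one byte; the variable names are illustrative only.

```python
NUM_FILES = 2657
TOTAL_HOURS = 222          # approximate figure from the catalog entry
SAMPLE_RATE = 8000         # samples per second, per channel
CHANNELS = 2
BYTES_PER_SAMPLE = 1       # assumption: 8-bit mu-law
HEADER_BYTES = 1024

# Average conversation length implied by the totals.
avg_minutes = TOTAL_HOURS * 60 / NUM_FILES

# Uncompressed data rate and estimated total size on disk.
bytes_per_second = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE
total_bytes = TOTAL_HOURS * 3600 * bytes_per_second + NUM_FILES * HEADER_BYTES

print(f"average file length: {avg_minutes:.2f} minutes")
print(f"data rate: {bytes_per_second} bytes/second")
print(f"estimated total size: {total_bytes / 1e9:.2f} GB")
```

Under these assumptions the average file length comes out at about five minutes, in line with the five- to six-minute conversations the collection aimed for.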

Updates

No updates are available at this time.
