Switchboard-2 Phase I

Item Name: Switchboard-2 Phase I
Author(s): David Graff, Alexandra Canavan, George Zipperlen
LDC Catalog No.: LDC98S75
ISBN: 1-58563-138-8
ISLRN: 818-666-043-021-8
DOI: https://doi.org/10.35111/c7th-nf28
Member Year(s): 1998
DCMI Type(s): Sound
Sample Type: 2-channel ulaw
Sample Rate: 8000 Hz
Data Source(s): telephone conversations
Application(s): speaker identification
Language(s): English
Language ID(s): eng
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC98S75 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Graff, David, Alexandra Canavan, and George Zipperlen. Switchboard-2 Phase I LDC98S75. Web Download. Philadelphia: Linguistic Data Consortium, 1998.


Switchboard-2 Phase I consists of 3,638 five-minute telephone conversations involving 657 participants. The corpus was collected by the Linguistic Data Consortium (LDC) in support of a project on speaker recognition sponsored by the U.S. Department of Defense. This release consists of speech files only; the calls were not transcribed.
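The metadata above lists the audio as 2-channel ulaw sampled at 8000 Hz. As a rough illustration of what that means for anyone reading the speech files, the following is a minimal sketch of standard G.711 mu-law decoding to 16-bit linear PCM. It assumes that any file header has already been stripped and that the two channels are interleaved sample by sample (one byte per side, alternating); neither detail is stated in this entry, and the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 8000   # from the catalog metadata
NUM_CHANNELS = 2        # one channel per side of the call


def ulaw_byte_to_pcm(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF                 # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def split_channels(raw: bytes) -> tuple[list[int], list[int]]:
    """Decode raw mu-law bytes and de-interleave them into two channels.

    Assumption: bytes alternate channel 0, channel 1, channel 0, ...
    """
    samples = [ulaw_byte_to_pcm(b) for b in raw]
    return samples[0::2], samples[1::2]


def duration_seconds(raw: bytes) -> float:
    """Duration of headerless 2-channel mu-law audio (1 byte per sample)."""
    return len(raw) / (NUM_CHANNELS * SAMPLE_RATE_HZ)
```

Under these assumptions, a full five-minute call would occupy about 5 × 60 × 8000 × 2 = 4,800,000 bytes of sample data.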


The LDC solicited speakers for this telephone speech collection via the internet, advertisements in publications, and personal contacts. Potential participants responded from all areas of the United States, although the majority were from the Mid-Atlantic area: PA (303), NJ (116), NY (53), DE (13), CT (12), MD (14), OH (13) and MA (8). Most of the participants in SWB-2 Phase I were college students from the following universities: Penn State University, University of Delaware, University of Pennsylvania, Drexel University and Rutgers University. Of the 657 participants, 358 were female and 299 were male. An LDC recruiter asked all participants for the following demographic information: age, sex, years of completed education, country of birth, and the city and state where they were raised.

Each recruit was asked to participate in at least ten five-minute phone calls. Ideally, each participant would receive five calls at a designated number and make five calls from phones with different telephone numbers (ANI codes). The average subject participated in 11 conversations; however, one male participant took part in 64 calls. The automated operator read out a suggested topic of discussion, although participants could chat about whatever they preferred.
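The figures quoted in this entry can be cross-checked with a little arithmetic. The sketch below assumes that every conversation has exactly two sides, each taken by a distinct participant, and that every call runs the nominal five minutes; both are simplifying assumptions, and the constant and function names are illustrative only.

```python
CONVERSATIONS = 3638       # from the corpus description
PARTICIPANTS = 657         # from the corpus description
SIDES_PER_CALL = 2         # assumption: two speakers per conversation
MINUTES_PER_CALL = 5       # nominal call length


def mean_calls_per_participant() -> float:
    """Average number of conversations each participant took part in."""
    return CONVERSATIONS * SIDES_PER_CALL / PARTICIPANTS


def nominal_hours() -> float:
    """Total conversation time if every call lasted the nominal length."""
    return CONVERSATIONS * MINUTES_PER_CALL / 60


if __name__ == "__main__":
    print(f"mean calls per participant: {mean_calls_per_participant():.2f}")
    print(f"nominal hours of conversation: {nominal_hours():.1f}")
```

Under these assumptions the mean comes to about 11.07 calls per participant, in line with the stated average of 11, and the collection amounts to roughly 303 hours of two-sided conversation.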

Each of the 657 participants placed their calls via a toll-free robot operator maintained by the LDC. Access to the robot operator was possible via a unique Personal Identification Number (PIN) issued by the recruiting staff at the LDC when the caller enrolled in the project.

Upon conclusion of the study, all calls were audited by LDC staff members. Particular attention was paid to PIN verification (matching speaker with PIN), call duration and call quality. Once the audit was complete, payment checks were issued and mailed to participants.


09/29/2011: A file list and updated readme were added to reflect the data set's release on DVD.
