Switchboard-2 Phase I


Item Name: Switchboard-2 Phase I
Authors: David Graff, Alexandra Canavan, and George Zipperlen
LDC Catalog No.: LDC98S75
ISBN: 1-58563-138-8
Data Type: speech
Sample Rate: 8000 Hz
Sampling Format: 2-channel ulaw
Data Source(s): telephone conversations
Project(s): EARS, GALE, SID
Application(s): speaker identification
Language(s): English
Language ID(s): eng
Distribution: 4 DVD
Member fee: $0 for 1998 members
Non-member Fee: US $7500.00
Reduced-License Fee: US $3750.00
Extra-Copy Fee: US $800.00
Online documentation: yes
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: David Graff, Alexandra Canavan, and George Zipperlen. 1998. Switchboard-2 Phase I. Linguistic Data Consortium, Philadelphia.

Introduction

Switchboard-2 Phase I consists of 3,638 five-minute telephone conversations involving 657 participants. The corpus was collected by the Linguistic Data Consortium (LDC) in support of a project on Speaker Recognition sponsored by the U.S. Department of Defense. This release consists of speech files only; the calls were not transcribed.
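The sampling format listed in the catalog fields (2-channel ulaw at 8000 Hz) can be illustrated with a minimal decoding sketch. It assumes that any file header has already been stripped, that each sample is one 8-bit G.711 mu-law byte, and that the two channels are interleaved sample by sample; the container format and channel layout are not described here, so treat those as assumptions. The function names ulaw_to_linear and split_channels are hypothetical and chosen only for illustration.

```python
def ulaw_to_linear(byte_value: int) -> int:
    """Decode one 8-bit G.711 mu-law code into a signed linear sample."""
    u = ~byte_value & 0xFF           # mu-law codes are stored bit-inverted
    sign = u & 0x80                  # top bit: set for negative samples
    exponent = (u >> 4) & 0x07       # 3-bit segment number
    mantissa = u & 0x0F              # 4-bit step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def split_channels(payload: bytes) -> tuple[list[int], list[int]]:
    """Deinterleave 2-channel mu-law bytes and decode each side.

    Assumption: samples alternate channel 1, channel 2, channel 1, ...
    """
    side_a = [ulaw_to_linear(b) for b in payload[0::2]]
    side_b = [ulaw_to_linear(b) for b in payload[1::2]]
    return side_a, side_b


if __name__ == "__main__":
    # Four raw bytes = two sample frames of a two-channel recording.
    a, b = split_channels(bytes([0xFF, 0x80, 0x00, 0xFF]))
    print(a, b)
```

At 8000 Hz, each second of one channel corresponds to 8000 decoded samples, so a five-minute side would hold roughly 2.4 million samples under these assumptions.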

Data

The LDC solicited speakers for this telephone speech collection via the internet, advertisements in publications, and personal contacts. Potential participants responded from all areas of the United States, although the majority of the subjects were from the Mid-Atlantic area: PA (303), NJ (116), NY (53), MD (14), DE (13), OH (13), CT (12), and MA (8). Most of the participants in SWB-2 Phase I were college students from the following universities: Penn State University, University of Delaware, University of Pennsylvania, Drexel University, and Rutgers University. Of the 657 participants, 358 were female and 299 were male. An LDC recruiter asked all participants for the following demographic information: age, sex, years of completed education, country of birth, and the city and state where they were raised.
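As a quick consistency check on the figures in the paragraph above, the state and sex counts can be tallied directly. This is only a sketch using the numbers quoted in the text; the variable names are illustrative.

```python
# Participant counts per state, as quoted in the description above.
state_counts = {
    "PA": 303, "NJ": 116, "NY": 53, "DE": 13,
    "CT": 12, "MD": 14, "OH": 13, "MA": 8,
}
total_participants = 657
female, male = 358, 299

listed = sum(state_counts.values())        # participants in the listed states
listed_share = listed / total_participants
pa_share = state_counts["PA"] / total_participants

print(f"listed states: {listed} of {total_participants} ({listed_share:.0%})")
print(f"Pennsylvania alone: {pa_share:.0%}")
print(f"female + male = {female + male}")
```

The eight listed states account for 532 of the 657 participants, and the female and male counts sum to the stated total.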

Each recruit was asked to participate in at least ten five-minute phone calls. Ideally, each participant would receive five calls at a designated number and make five calls from phones with different telephone numbers (ANI codes). The average participant took part in 11 conversations; however, one man participated in 64 calls. A suggested topic of discussion was read by the automated operator, although participants could talk about whatever they preferred.
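The stated average of 11 conversations per participant is consistent with the corpus totals given in the Introduction, assuming each conversation involves exactly two participants (an assumption; the text does not say so explicitly). A small arithmetic sketch:

```python
conversations = 3638       # total conversations in the corpus
participants = 657         # total participants
sides_per_call = 2         # assumption: two speakers per conversation

call_sides = conversations * sides_per_call
mean_per_participant = call_sides / participants

print(f"{call_sides} call sides / {participants} participants "
      f"= {mean_per_participant:.2f} conversations each")
```

Under that assumption the mean comes out just above 11, slightly over the requested minimum of ten calls per recruit.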

Each of the 657 participants placed their calls via a toll-free robot operator maintained by the LDC. Access to the robot operator required a unique Personal Identification Number (PIN), issued by the LDC recruiting staff when the caller enrolled in the project.

Upon conclusion of the study, all calls were audited by LDC staff members. Particular attention was paid to PIN verification (matching speaker with PIN), call duration, and call quality. Upon completion of this process, checks were issued and mailed to participants.

Updates

09/29/2011: Added a file list, available through the online documentation, to reflect the corpus's release on DVD. Also added an updated readme reflecting these changes.

Copyright

Portions © 1998 Trustees of the University of Pennsylvania