Description of the Switchboard-2 Phase I telephone speech corpus
_________________________________________________________________

May, 1998

Project Leader:  David Graff
Programming:     George Zipperlin
                 Zhibiao Wu
Personnel:       Alexandra Canavan
Recruiters:      Elisa Munoz-Franco
                 Liz O'Connor
                 Kara Rennert
                 Yuan Shan Tung
_________________________________________________________________

Switchboard-2 Phase I consists of 3,638 5-minute telephone conversations involving 657 participants. This corpus was collected by the Linguistic Data Consortium (LDC) in support of a project on Speaker Recognition sponsored by the U.S. Department of Defense.

Speakers were solicited by the LDC to participate in this telephone speech collection effort via the internet, publications (advertisements), and personal contacts. Potential participants responded from all areas of the United States, although the majority of the subjects were from the Mid-Atlantic area: (PA=303), (NJ=116), (NY=53), (DE=13), (CT=12), (MD=14), (OH=13), and (MA=8). Most of the participants in SWB-2 Phase I were college students from the following universities: Penn State University, University of Delaware, University of Pennsylvania, Drexel University, and Rutgers University. Of the 657 participants, 358 were female and 299 were male.

An LDC recruiter asked all participants for the following demographic information: age, sex, years of completed education, country of birth, and city and state where raised. This information is in the file "spkrinfo.tbl".

Each recruit was asked to participate in at least 10 five-minute phone calls. Ideally, each participant would receive 5 calls at a designated number and make 5 calls from phones with different telephone numbers (ANI codes). The average subject participated in 11 conversations; however, one gentleman participated in 64 calls. A suggested topic of discussion was given (read by the automated operator), although participants could chat about whatever they preferred.
The file "topics.tbl" lists the topics selected by the automated operator.

Each of the 657 participants placed their calls via a toll-free robot operator maintained by the LDC. Access to the robot operator required a unique Personal Identification Number (PIN), issued by the LDC recruiting staff when the caller enrolled in the project.

Upon conclusion of the study, all calls were audited by LDC staff members. Particular attention was paid to PIN verification (matching speaker with PIN), call duration, and call quality. Upon completion of this process, checks were issued and mailed to participants.
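
The figures quoted in this description can be cross-checked with a little arithmetic. The Python sketch below is illustrative only and is not part of the corpus distribution. It uses only numbers stated above, plus one assumption that this description does not state explicitly: that every conversation has exactly two speakers. The names (summarize, STATE_COUNTS, and so on) are hypothetical.

```python
# Figures quoted in the corpus description.
CONVERSATIONS = 3638
PARTICIPANTS = 657
FEMALE, MALE = 358, 299
STATE_COUNTS = {
    "PA": 303, "NJ": 116, "NY": 53, "DE": 13,
    "CT": 12, "MD": 14, "OH": 13, "MA": 8,
}

# ASSUMPTION (not stated in the description): two speakers per conversation.
SIDES_PER_CONVERSATION = 2


def summarize():
    """Return simple consistency figures derived from the quoted counts."""
    total_sides = CONVERSATIONS * SIDES_PER_CONVERSATION
    listed = sum(STATE_COUNTS.values())
    return {
        # Do the female and male counts add up to the participant total?
        "sex_counts_match": FEMALE + MALE == PARTICIPANTS,
        # Mean number of conversation sides per participant.
        "mean_calls_per_speaker": total_sides / PARTICIPANTS,
        # Participants accounted for by the eight listed states.
        "listed_state_total": listed,
        "listed_state_share": listed / PARTICIPANTS,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

Under the two-speaker assumption, the mean works out to roughly 11 conversation sides per participant, in line with the average quoted above. The eight listed states account for 532 of the 657 participants.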