This is the CD-ROM release of the Switchboard-2 Phase II Speech Corpus
produced by the Linguistic Data Consortium.

This release contains speech data files ONLY, along with documentation
describing speaker information (sex, age, education, city and state
where raised), call information (date, time, call duration, Personal
Identification Numbers, topic), and audit information (channel quality,
background noise).

Each speech file consists of a 1024-byte ASCII-formatted Sphere header,
followed by 2-channel interleaved mu-law sample data. The mu-law samples
represent the actual digital data transmission from the telephone
service provider (MCI), as captured separately for each side of the
telephone conversation by an InterVoice RobotOperator voice-response
system. The header also indicates the caller_pin, the callee_pin, and
the topic_id. These files are not compressed.

The speech files are named according to the following pattern:

    sw_NNNNN.sph

where the five-digit string "NNNNN" represents the conversation-id.
This string identifies each speech file, and it also identifies the
corresponding call in the associated database tables that provide
information about the calls and participants (e.g. callstat.tbl,
master.tbl).

On each Switchboard-2 Phase II CD-ROM, you will find:

    README.1st                (this file)
    master.tbl                full listing of all speech files (see below)
    disc_NN.tbl               listing of speech files on this disc
    /doc/swb2p2_all.dvd.tbl   full listing of all speech files and the
                              DVD that contains each one
    /doc/callinfo.doc         description of auditing process
    /doc/callinfo.tbl         audit results for each channel
    /doc/callstat.doc         field description for callstat.tbl
    /doc/callstat.tbl         information about each recorded call
    /doc/spkrinfo.doc         field description for spkrinfo.tbl
    /doc/spkrinfo.tbl         demographic information
    /doc/swbinfo.doc          description of recruitment and collection
    /doc/topic.tbl            suggested topic list
    /swb2                     directory containing speech files

The "master.tbl" file on each disc of this corpus lists every speech
file in the corpus, giving the CD-ROM volume where the file is stored,
the 8.3-character file name, and a longer string showing the date of
the call, the caller-ID and the callee-ID, separated by underscores.

The "disc_NN.tbl" file simply lists the speech file names found on the
current CD-ROM.

The "swb2p2_all.dvd.tbl" file lists all the speech files together with
the DVD (rather than the CD-ROM) that contains each one.
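
The file layout described above (a 1024-byte ASCII header followed by
2-channel interleaved mu-law samples) can be read with a few lines of
Python. The following is a minimal sketch, not a tool shipped with the
corpus. It assumes the header follows the usual NIST SPHERE conventions
(a "NIST_1A" first line, one "name -type value" field per line, and a
closing "end_head" line) and that each mu-law sample occupies one byte.
The function names are hypothetical, and the sketch makes no assumption
about which channel carries the caller and which the callee.

```python
import re

HEADER_SIZE = 1024  # header length in bytes, as stated in this README


def conversation_id(filename: str) -> str:
    """Return the five-digit conversation-id from a name like sw_NNNNN.sph."""
    match = re.fullmatch(r"sw_(\d{5})\.sph", filename)
    if match is None:
        raise ValueError(f"unexpected speech file name: {filename!r}")
    return match.group(1)


def parse_sphere_header(raw: bytes) -> dict:
    """Parse the ASCII header into a dict (assumes NIST SPHERE layout)."""
    lines = raw.decode("ascii", errors="replace").split("\n")
    if lines[0].strip() != "NIST_1A":
        raise ValueError("not a SPHERE header")
    fields = {}
    for line in lines[2:]:  # skip the magic line and the header-size line
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # blank or malformed line
        name, ftype, value = parts
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        else:  # "-sN": string of N characters
            fields[name] = value
    return fields


def split_channels(samples: bytes):
    """De-interleave one-byte mu-law samples into the two channels."""
    return samples[0::2], samples[1::2]


def read_speech_file(path: str):
    """Return (header fields, channel 0 bytes, channel 1 bytes)."""
    with open(path, "rb") as f:
        header = parse_sphere_header(f.read(HEADER_SIZE))
        samples = f.read()
    channel_0, channel_1 = split_channels(samples)
    return header, channel_0, channel_1
```

Calling read_speech_file() on a file from the /swb2 directory should
return a dictionary in which the caller_pin, callee_pin, and topic_id
header fields can be looked up by name. The two returned byte strings
are still mu-law encoded; converting them to linear PCM is a separate
step not shown here.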