The purpose of the SPeaker IDentification REsearch (SPIDRE) corpus is to provide a "starter kit" for research in speaker identification. The data in the SPIDRE corpus was drawn from the much larger Switchboard (SWB1) corpus to create a manageable data set for speaker identification research, and was selected to maximize its utility in this domain.

The SPIDRE corpus contains 280 conversations, 180 of which contain at least one speaker who has been deemed a "target" speaker. The remaining 100 conversations contain only "non-target" speakers. The corpus contains 45 target speakers and 287 non-target speakers. Of the 287 non-target speakers, 161 appear in a non-target conversation and the remaining 126 are speaking to a target speaker in a target conversation. The specific design and selection criteria used in forming the corpus are described below.

SPIDRE SELECTION CRITERIA:
--------------------------

Target Speakers
---------------
180 target conversations (92 on disc 1, 88 on disc 2) were selected from SWB1 according to the following criteria:

1) Speakers must have participated in at least 4 calls.
2) Speakers must have used at least 3 different handsets.
3) If the speaker participated in more than 4 calls, 4 calls were selected somewhat randomly so that exactly 3 handsets were represented. Thus, one of the handsets is represented in two of the conversations.

Target speakers were distributed across the two discs so as to balance the representation of gender, age, and dialect region evenly on both discs.

Non-Target Speakers
-------------------
100 non-target conversations (50 on each disc) were selected from SWB1 according to the following criteria:

1) Speakers must not be involved in either side of the 180 target conversations in the SPIDRE corpus.
2) Each speaker must have at least 60 seconds of speech in the first 210 seconds of the conversation (balanced conversations).
3) Total length of the waveform file must be as close to 5 minutes as possible, to ensure that the conversations were representative of both speakers and so that the corpus could be contained on 2 discs.

NOTES
-----
1) 13 of the 180 target conversations contain target speakers on both channels (8 on disc 1, 5 on disc 2); therefore the same conversation will exist under different speakers.
2) Cases where a non-target speaker participated in more than one conversation with another non-target speaker could not be eliminated. Where a non-target speaker participates in more than one conversation, the conversations were divided between the two discs in order to provide as many distinct non-target speakers per disc as possible. This was done to maximize the utility of the data in tests where the data on only 1 of the 2 discs is used.
3) In order to produce a somewhat "clean" corpus with respect to channel effects, conversations that were determined by the transcribers to contain either high static or high echo were eliminated.
4) Conversations that were listed on Switchboard's bug reports were also eliminated.

FILE FORMAT:
------------
All SPIDRE corpus files are of the form:

    sw<CONVERSATION-ID><FILETYPE>

where

    CONVERSATION-ID ::= 1000 ... 9999 (base 10)
    FILETYPE ::= .wav | .txt | .mrk

SPIDRE Filetypes
----------------
.wav - two-channel u-law encoded audio waveform files with standard NIST SPHERE headers. Each .wav file contains one conversation of not more than ten minutes. Each channel was intended to contain the audio for one speaker in the conversation (although crosstalk between channels is known to exist for some conversations). For the earlier conversations, those preceding 3170, there was generally an initial time offset between the channels, and the offset varied as the conversation proceeded. This was due to certain peculiarities in the collection process, including some random losses of data. For the later conversations this problem was corrected.
For some of the affected conversations, namely those with significant crosstalk from which the offset could be tracked, samples have been deleted from non-speech parts of the data to approximately correct the offsets. Each speech disc lists the conversations that were processed in this manner in the "readme.doc" file in the top-level directory of the disc.

.txt - text files containing interleaved transcriptions of both channels. The .txt files contain headers which describe various parameters of the conversation. See "txt_spec.doc" for more details.

.mrk - time-aligned word transcriptions. See "mrk_spec.doc" for more details.
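
As an illustration of the naming scheme in the FILE FORMAT section, the following Python sketch parses file names and groups them by conversation. It is not part of the corpus distribution. It assumes names are lowercase and consist of "sw", a four-digit conversation ID, and one of the three file types; the function names and the conversation IDs in the usage note are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional, Set

# Assumed pattern: "sw" + conversation ID (1000..9999) + ".wav" | ".txt" | ".mrk".
# Lowercase names are an assumption; adjust if the discs use another case.
_SPIDRE_NAME = re.compile(r"sw([1-9][0-9]{3})\.(wav|txt|mrk)")


class SpidreFile(NamedTuple):
    conversation_id: int
    filetype: str  # "wav", "txt", or "mrk"


def parse_spidre_name(name: str) -> Optional[SpidreFile]:
    """Return the parsed name, or None if it does not fit the assumed pattern."""
    match = _SPIDRE_NAME.fullmatch(name)
    if match is None:
        return None
    return SpidreFile(int(match.group(1)), match.group(2))


def group_by_conversation(names: Iterable[str]) -> Dict[int, Set[str]]:
    """Map each conversation ID to the set of file types found for it."""
    groups: Dict[int, Set[str]] = defaultdict(set)
    for name in names:
        parsed = parse_spidre_name(name)
        if parsed is not None:  # skip files such as readme.doc
            groups[parsed.conversation_id].add(parsed.filetype)
    return dict(groups)


def may_have_channel_offset(conversation_id: int) -> bool:
    """True for conversations preceding 3170, which the .wav notes describe
    as generally having a time offset between channels."""
    return conversation_id < 3170
```

For example, group_by_conversation(["sw2005.wav", "sw2005.txt", "readme.doc"]) keeps only the two names that fit the pattern and files them under conversation 2005. The may_have_channel_offset helper only flags conversations that might be affected; the per-disc "readme.doc" remains the authoritative list of conversations whose offsets were corrected.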