Speaker Recognition Corpus
Release 1.1
Center for Spoken Language Understanding
UPDATED: 29 March 2002

Overview
--------
The Speaker Recognition corpus (formerly known as Speaker
Verification) consists of telephone speech from about 500
participants. Each participant recorded speech in twelve sessions
over a two-year period, answering questions like "what is your eye
color" and responding to prompts like "describe a typical day in your
life." Most of the utterances in this release of the corpus have
corresponding non-time-aligned word-level transcriptions.

This release contains a /speech directory that contains all of the
recorded utterances. In the /speech directory are subdirectories for
each speaker. Within those subdirectories is *another* layer of
subdirectories, one for each recording session. The file names
themselves encode the id and session number as well. See the
formats.txt file in the /docs directory for more information.

Along with the speech files, Release 1.1 contains non-time-aligned
word-level transcriptions (which comply with the conventions in the
CSLU Labeling Guide) of nearly all of the utterances, as well as
gender and age information for each speaker. All of this information
is found in the trans.txt file in the /docs directory. In addition,
the individual transcriptions have been placed in the /trans
directory, which uses a structure that exactly parallels the
structure of the /speech directory.

Collection Method
-----------------
In most of the CSLU data collections, each participant calls a
toll-free telephone number and answers a few questions. CSLU records
the speech, transcribes it, then packages it as a released corpus.
The Speaker Recognition data collection was quite a bit more
complicated.

The goal of the data collection was to collect speech from each
participant over a two-year period. Each participant called the data
collection system twelve times over the two-year period and said the
same utterances each time.
Some of the recording sessions were only a few days apart and others
were several weeks apart.

Participants followed this calling schedule. During the first month,
they called twice in a week. No calls were made in the second and
third months. In the fourth month they made one call. No calls were
made in the fifth and sixth months. This six-month pattern repeated
three more times for a total of twelve calls per participant.

In order to balance the workload required to remind participants to
call and to avoid large data collection bursts on the system, the
participants were divided into twelve groups. Each group began the
two-year schedule in a successive month. The first group started in
September 1996, the second group started in October 1996, and so on.
Participants who failed to make the required calls in a timely manner
were dropped from the program and were no longer notified of future
calls.

Recording Conditions
--------------------
All of the data in this corpus were collected over digital telephone
lines. The digital data were recorded with the CSLU T1 digital data
collection system. These files were sampled at 8 kHz with 8-bit
samples and stored as ulaw files. The .wav files contain speech data
and use the RIFF standard file format. This file format is 16-bit
linearly encoded.

Subject Population
------------------
Every attempt was made to create a gender-balanced subject pool. As
each group started the data collection, it had an equal number of
participants of each gender. However, as participants were dropped,
the balance could not be perfectly maintained.

Annotation
----------
Nearly all of the files included in this corpus have corresponding
non-time-aligned word-level transcriptions that comply with the
conventions in the CSLU Labeling Guide. In the current release, only
some of the long spontaneous utterances have been transcribed.

Protocol
--------
Each speaker in this data collection called the system twelve times.
Each time they were asked the same set of questions and, for the most
part, in the same order. Each question, except as noted below, was
asked four times during each recording session. We asked the
participants to use the same answer each time they answered the
question, but there was some variability.

Limited vocabulary utterances
-----------------------------
Each participant answered the following questions four times during
each recording session (for a total of sixteen utterances):

    What is your mother's maiden name?
    What color are your eyes?
    In which month were you born?
    In which city and state were you born?

Numbers
-------
Each participant repeated the following number strings four times
during each recording session (for a total of 24 utterances):

    5 3 8 2 4
    6 1 oh 9 7
    4 0 7 1 3
    2 8 3 7 6
    1 9 oh 5 4
    0 5 2 3 9

Words
-----
Each participant repeated the following words four times during each
recording session (for a total of 32 utterances):

    azure
    button
    little
    offstage
    mango
    whereabouts
    choices
    decision

Phrases
-------
Each participant repeated the following phrases twice during each
recording session (for a total of sixteen utterances).

    stop each car if it's little
    play in the street up ahead
    a fifth wheel caught speeding
    it's been about two years since davey kept shotguns
    charlie did you think to measure the tree
    tina got cued to make a quicker escape
    joe books very few judges
    here i was in miami and illinois

Spontaneous Speech
------------------
Each participant was asked to speak for about 20 seconds in response
to one of the following requests twice during each recording session
(for a total of two utterances).
    tell us something about yourself
    describe a typical day in your life
    tell us what you like most about where you live
    tell us about your family
    tell us about your dream home
    tell us something about the town where you grew up
    tell us about your favorite restaurant
    tell us about your favorite sport or hobby
    tell us about your favorite movie or television show

Password
--------
During the first recording session the speaker was prompted to create
a password (or passphrase). On subsequent recording sessions the
speaker would be asked to repeat that password (or passphrase). The
password/passphrase prompt appeared four times during each session
(for a total of four utterances).

Mimic
-----
The final prompt of each session asked the speaker to listen
carefully to the prompter, then mimic her as best as possible when
saying, "If it doesn't matter who wins, why do we keep score?" (one
utterance).
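
As a cross-check of the protocol and calling schedule described in
this document, the per-session totals can be tallied with a short
Python sketch. It is illustrative only and is not part of the corpus
distribution: the names PROTOCOL, utterances_per_session, and
call_months are hypothetical, and the count of six number strings is
an assumption derived from the stated total of 24 utterances at four
repetitions each. The results are nominal counts for a speaker who
completed every prompt; the number of files actually present for a
given speaker or session may differ.

```python
# Illustrative sketch only; these names are hypothetical and do not
# come from the corpus tools or from formats.txt.

# Prompt category -> (number of distinct prompts, repetitions per session)
PROTOCOL = {
    "limited_vocabulary": (4, 4),  # four questions, asked four times
    "numbers": (6, 4),             # assumed six strings (24 / 4)
    "words": (8, 4),               # eight words, four times
    "phrases": (8, 2),             # eight phrases, twice
    "spontaneous": (1, 2),         # one request, answered twice
    "password": (1, 4),            # password prompt, four times
    "mimic": (1, 1),               # final prompt, once
}

SESSIONS_PER_SPEAKER = 12


def utterances_per_session(protocol=PROTOCOL):
    """Nominal number of utterances in one recording session."""
    return sum(items * reps for items, reps in protocol.values())


def call_months(cycles=4):
    """Return (month, calls) pairs for the months that contain calls.

    Months are numbered from 1, relative to the month in which a
    group started. Each six-month cycle has two calls in its first
    month and one call in its fourth month.
    """
    schedule = []
    for cycle in range(cycles):
        first_month = cycle * 6 + 1
        schedule.append((first_month, 2))
        schedule.append((first_month + 3, 1))
    return schedule


if __name__ == "__main__":
    per_session = utterances_per_session()
    print("utterances per session:", per_session)
    print("nominal utterances per speaker:",
          per_session * SESSIONS_PER_SPEAKER)
    for month, calls in call_months():
        print("month", month, "calls", calls)
```

Keeping the protocol as a table of (prompts, repetitions) pairs makes
it easy to compare the section totals quoted above against the sum
for a whole session.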