CSLU: Speaker Recognition Version 1.1
|Item Name:||CSLU: Speaker Recognition Version 1.1|
|LDC Catalog No.:||LDC2006S26|
|Release Date:||May 18, 2006|
|Data Source(s):||telephone speech, telephone conversations|
|Online Documentation:||LDC2006S26 Documents|
|Licensing Instructions:||Subscription & Standard Members, and Non-Members|
|Citation:||CSLU. CSLU: Speaker Recognition Version 1.1 LDC2006S26. Web Download. Philadelphia: Linguistic Data Consortium, 2006.|
This file contains documentation on the CSLU Speaker Recognition Corpus, Version 1.1, Linguistic Data Consortium (LDC) catalog number LDC2006S26 and ISBN 1-58563-382-8.
The Speaker Recognition corpus (formerly known as Speaker Verification) consists of telephone speech from 91 participants. Each participant recorded speech in twelve sessions over a two-year period, answering questions like "what is your eye color" or responding to prompts like "describe a typical day in your life." Most of the utterances in this release of the corpus have corresponding non-time-aligned word-level transcriptions.
In most of the CSLU data collections, each participant calls a toll-free telephone number and answers a few questions. CSLU records the speech, transcribes it, then packages it as a released corpus.
The Speaker Recognition data collection was considerably more complicated. Its goal was to collect speech from each participant over a two-year period. Each participant called the data collection system 12 times during that period and said the same utterances each time.
Some of the recording sessions were only a few days apart and others several weeks apart. Participants followed this calling schedule: during the first month, they called twice in one week. No calls were made in the second and third months. In the fourth month they made one call. No calls were made in the fifth and sixth months. This six-month pattern repeated three more times, for a total of 12 calls per participant.
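The arithmetic of the schedule can be checked with a short sketch. This is only an illustration of the pattern described above, not part of the corpus tooling; the names `CYCLE`, `NUM_CYCLES`, and `calls_per_month` are hypothetical.

```python
# Calls per month within one six-month cycle, as described in the text:
# month 1 -> two calls, month 4 -> one call, all other months -> none.
CYCLE = [2, 0, 0, 1, 0, 0]

# The initial cycle plus three repetitions.
NUM_CYCLES = 4


def calls_per_month():
    """Return the number of calls in each month of the two-year schedule."""
    return CYCLE * NUM_CYCLES


schedule = calls_per_month()
print(len(schedule))  # 24 months
print(sum(schedule))  # 12 calls
```

Four six-month cycles of three calls each account for both the two-year span and the 12 sessions per participant.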
To balance the workload of reminding participants to call, and to avoid large bursts of data collection on the system, the participants were divided into 12 groups. Each group began the two-year schedule one month after the previous group: the first group started in September 1996, the second in October 1996, and so on.
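As a rough sketch of the staggering, the start month of each group can be computed from its number. This assumes that "and so on" means every group started exactly one calendar month after the previous one; the function name `group_start` is hypothetical and not part of the corpus documentation.

```python
from datetime import date


def group_start(group):
    """Return the first day of the month in which a group (1-12) began.

    Assumes group 1 started in September 1996 and each later group
    started one calendar month after the previous one.
    """
    if not 1 <= group <= 12:
        raise ValueError("group must be between 1 and 12")
    # Count months from year 0; 8 is the zero-based index of September.
    month_index = 1996 * 12 + 8 + (group - 1)
    return date(month_index // 12, month_index % 12 + 1, 1)


print(group_start(1).isoformat())   # 1996-09-01
print(group_start(2).isoformat())   # 1996-10-01
print(group_start(12).isoformat())  # 1997-08-01
```

Under that assumption the last group would have begun in August 1997, so the collection as a whole would have spanned roughly three years.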