CSLU: Voices
Item Name: | CSLU: Voices |
Author(s): | Alexander Kain |
LDC Catalog No.: | LDC2006S01 |
ISBN: | 1-58563-363-1 |
ISLRN: | 960-768-408-027-3 |
DOI: | https://doi.org/10.35111/7vr2-b249 |
Release Date: | January 19, 2006 |
Member Year(s): | 2006 |
DCMI Type(s): | Sound, Text |
Sample Type: | pcm |
Sample Rate: | 22050 |
Data Source(s): | microphone speech |
Application(s): | speaker identification, speaker verification, speech recognition, speech synthesis |
Language(s): | English |
Language ID(s): | eng |
License(s): | CSLU Agreement |
Online Documentation: | LDC2006S01 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Kain, Alexander. CSLU: Voices LDC2006S01. Web Download. Philadelphia: Linguistic Data Consortium, 2006. |
Introduction
CSLU: Voices was developed by Alexander Kain. It consists of approximately two hours of read English speech with associated transcripts, laryngograph signals, pitch marks, and phonetic labels. The corpus was created for Kain's Ph.D. dissertation work on high-resolution voice transformation (VT) and contains 12 speakers reading 50 phonetically rich sentences.
VT is a technology that modifies a source speaker's utterance so that it sounds as if a target speaker had spoken it. The purpose of this corpus is to aid VT research and development by providing naturally time-aligned sentences. Because the sentences are already aligned, removing individual prosodic characteristics, such as fundamental frequency (pitch) and durations, requires very little processing and yields high-quality speech samples that differ only in their segmental properties, which are the focus of transformation. These "prosody-normalized" speech samples are used for training VT systems, as well as for evaluating their transformation performance objectively and subjectively.
Data
The recording procedure used a "mimicking" approach that resulted in a high degree of natural time-alignment between different speakers. The acoustic wave and the concurrent laryngograph signal were recorded for one "free" and two "mimicked" renditions of each sentence. Laryngograph signals, pitch marks calculated from the laryngograph signal, and time marks from a forced-alignment algorithm have been added to the corpus.
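As a rough illustration of how pitch marks relate to fundamental frequency, the sketch below estimates a pitch contour from the spacing between consecutive marks. It is not part of the corpus tooling. The on-disk format of the pitch mark files is not described here, so the code assumes the marks have already been parsed into a list of times in seconds; the function name `f0_from_pitch_marks` and the 50 to 500 Hz plausibility range are hypothetical choices made for this example.

```python
from typing import List, Tuple


def f0_from_pitch_marks(
    marks: List[float],
    min_f0: float = 50.0,   # assumed lower bound for voiced speech, in Hz
    max_f0: float = 500.0,  # assumed upper bound for voiced speech, in Hz
) -> List[Tuple[float, float]]:
    """Estimate (time, f0) pairs from pitch mark times given in seconds.

    Each pair of adjacent marks is treated as one glottal period, so
    f0 = 1 / (end - start). Intervals whose f0 falls outside
    [min_f0, max_f0] are treated as unvoiced gaps and skipped.
    """
    contour = []
    for start, end in zip(marks, marks[1:]):
        period = end - start
        if period <= 0:
            continue  # ignore duplicated or out-of-order marks
        f0 = 1.0 / period
        if min_f0 <= f0 <= max_f0:
            # Report the estimate at the midpoint of the period.
            contour.append(((start + end) / 2.0, f0))
    return contour


if __name__ == "__main__":
    # Synthetic marks: 10 ms periods with one long unvoiced gap.
    example = [0.00, 0.01, 0.02, 0.50, 0.51]
    for time, f0 in f0_from_pitch_marks(example):
        print(f"{time:.3f} s  {f0:.1f} Hz")
```

Because the mimicked renditions are closely time-aligned, contours computed this way for two speakers could be compared at matching times without a separate alignment step, although the time marks supplied with the corpus would be the more reliable guide.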
The corpus includes seven male speakers and five female speakers.
Samples
For an example of the data contained in this publication, please review the following samples.
- Concurrent laryngograph (LAR)
- Pitch marks derived from laryngograph signal (PMV)
- Transcription (TXT)
- Wave file of speech (WAV)
Updates
None at this time.