The Child Subglottal Resonances Database
Item Name: The Child Subglottal Resonances Database
Author(s): Steven M. Lulich, Abeer Alwan, Mitchell S. Sommers, Gary Yeung
LDC Catalog No.: LDC2022S02
ISBN: 1-58563-985-0
ISLRN: 550-643-277-274-6
DOI: https://doi.org/10.35111/75r1-yj93
Release Date: February 15, 2022
Member Year(s): 2022
DCMI Type(s): Sound, StillImage, Text
Data Source(s): microphone speech
Application(s): speech recognition
Language(s): English
Language ID(s): eng
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2022S02 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Lulich, Steven M., et al. The Child Subglottal Resonances Database LDC2022S02. Web Download. Philadelphia: Linguistic Data Consortium, 2022.
Introduction
The Child Subglottal Resonances Database was developed by Washington University and University of California Los Angeles and consists of 15.5 hours of simultaneous microphone and subglottal accelerometer recordings of 19 male and 9 female child speakers of American English between 7 years 6 months and 17 years 8 months of age.
The subglottal system is composed of the airways of the tracheobronchial tree and the surrounding tissues. It powers airflow through the larynx and vocal tract, allowing for the generation of most of the sound sources used in languages around the world. The subglottal resonances (SGRs) are the natural frequencies of the subglottal system. During speech, the subglottal system is acoustically coupled to the vocal tract via the larynx. SGRs can therefore be measured with an accelerometer from recordings of neck-skin vibration during phonation, much as speech formants are measured from microphone recordings.
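For illustration, the sketch below estimates resonance frequencies from a recorded waveform by linear prediction (LPC), a standard class of analysis for formant measurement that applies equally to the accelerometer channel. The file name, window placement, and LPC order are assumptions for the example, not prescriptions from the corpus documentation.

```python
# A rough sketch of resonance estimation via linear prediction (LPC); the
# same analysis applies to the microphone channel (formants) or the
# accelerometer channel (SGRs). File name, window placement, and LPC
# order are illustrative assumptions.
import numpy as np
import soundfile as sf
import librosa

signal, sr = sf.read("example.flac")           # hypothetical file name
frame = signal[int(0.20 * sr):int(0.23 * sr)]  # arbitrary 30 ms voiced window
frame = frame * np.hamming(len(frame))

a = librosa.lpc(frame, order=int(sr / 1000) + 2)    # rule-of-thumb LPC order
roots = [r for r in np.roots(a) if np.imag(r) > 0]  # one pole pair per resonance
freqs = np.sort(np.arctan2(np.imag(roots), np.real(roots)) * sr / (2 * np.pi))
print("candidate resonances (Hz):", np.round(freqs[freqs > 90]))
```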
SGRs have received attention in studies of speech production, perception, and technology. They affect voice production, divide vowels and consonants into discrete categories, affect vowel perception, and can be useful in automatic speech recognition.
Data
Speakers were recruited by Washington University's Psychology Department through its subject pool and through advertisements and flyers posted in the St. Louis, MO area.
The corpus consists of 34 monosyllables in a phonetically neutral carrier phrase (“I said a ____ again”), with a median of 6 repetitions of each word by each speaker, resulting in 5,247 individual microphone (and accelerometer) waveforms. The monosyllables comprised 14 hVd words and 20 CVb words, where C was b, d, or g, and V included all American English monophthongs and diphthongs.
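As a hypothetical illustration of working with this structure, the following sketch tallies repetitions per speaker and word. The root directory and the file-naming pattern are invented for the example; consult the corpus documentation for the actual layout.

```python
# A sketch of tallying repetitions per (speaker, word). The root directory
# and the "<speaker>_<word>_<rep>.flac" naming pattern are hypothetical,
# invented for this example.
import pathlib
from collections import Counter

counts = Counter()
for f in pathlib.Path("child_sgr").rglob("*.flac"):
    speaker, word, _rep = f.stem.split("_")
    counts[(speaker, word)] += 1

reps = sorted(counts.values())
print("median repetitions per word:", reps[len(reps) // 2])  # corpus reports 6
```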
The target vowel in each utterance was hand-labeled to indicate the start, stop, and steady-state portions of the vowel. For diphthongs, the steady-state refers to the diphthong nucleus, which occurs early in the vowel.
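A minimal sketch of excerpting the labeled steady-state region is given below. The label-file format shown (four whitespace-separated times in seconds) is a hypothetical stand-in; the actual annotation format is specified in the corpus documentation.

```python
# A minimal sketch of excerpting the labeled steady-state region. The
# label format used here (vowel start, steady-state start, steady-state
# end, vowel end, in seconds) is a hypothetical stand-in for the corpus's
# actual annotation format.
import soundfile as sf

def steady_state(wav_path, label_path):
    audio, sr = sf.read(wav_path)
    with open(label_path) as f:
        start, ss_start, ss_end, stop = map(float, f.read().split()[:4])
    return audio[int(ss_start * sr):int(ss_end * sr)], sr

segment, sr = steady_state("example.flac", "example.lab")  # illustrative names
```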
The height and age of each speaker are included in the corpus metadata.
Audio files are presented as single-channel, 16-bit, FLAC-compressed WAV files with sample rates of 48 kHz or 16 kHz. Image files are bitmap image files, and plain text files are UTF-8 encoded.
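The FLAC files can be read directly with common audio libraries. The sketch below spot-checks a local copy of the release against the stated formats; the extraction directory is an assumption.

```python
# A sketch that spot-checks a local copy of the release against the stated
# formats; the extraction directory "child_sgr" is an assumption.
import pathlib
import soundfile as sf

for path in pathlib.Path("child_sgr").rglob("*.flac"):
    info = sf.info(str(path))
    assert info.channels == 1, path
    assert info.samplerate in (16000, 48000), path
```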
Sponsorship
This work was supported in part by National Science Foundation Grant No. 0905250.
Updates
None at this time.