Emotional Prosody Speech and Transcripts

Item Name: Emotional Prosody Speech and Transcripts
Author(s): Mark Liberman, Kelly Davis, Murray Grossman, Nii Martey, John Bell
LDC Catalog No.: LDC2002S28
ISBN: 1-58563-237-6
ISLRN: 191-383-337-125-7
DOI: https://doi.org/10.35111/37ff-a902
Release Date: July 23, 2002
Member Year(s): 2002
DCMI Type(s): Sound
Sample Type: 2-channel PCM
Sample Rate: 22050 Hz
Data Source(s): microphone speech
Application(s): speech recognition, prosody, pronunciation modeling
Language(s): English
Language ID(s): eng
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2002S28 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Liberman, Mark, et al. Emotional Prosody Speech and Transcripts LDC2002S28. Web Download. Philadelphia: Linguistic Data Consortium, 2002.

Introduction

Emotional Prosody Speech and Transcripts was developed by the Linguistic Data Consortium to support research in emotional prosody. It contains audio recordings and their transcripts, collected over an eight-month period in 2000-2001. In the recordings, professional actors read a series of semantically neutral utterances (dates and numbers) in fourteen distinct emotional categories. The categories were chosen following Banse & Scherer's study of vocal emotional expression in German (Banse, R. & Scherer, K. R. 1996. Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology, 70, 614-636).

Actor participants were given descriptions of each emotional context, including situational examples adapted from those used in the original German study. Flashcards displayed a series of four-syllable dates and numbers, which the actors spoke in the appropriate emotional category.

The Prosody Recordings Project aimed to capture the aspects of speech, such as emotion and intonation, that are lost in the written form of a message. In these experiments, simple phrases are spoken in ways that reflect varied contexts. The same phrase might answer different questions, address listeners at different distances from the speaker, or express different emotional states. Actors were chosen because they are skilled at producing this kind of contextual variation in a natural and convincing way.

Data

There are 30 data files: 15 recordings in SPHERE format and their 15 corresponding transcripts.

The SPHERE files are encoded as two-channel interleaved 16-bit PCM in high-byte-first (big-endian) order. Together they total 2,912,067,980 bytes (2,777 MiB), or roughly nine hours of audio.
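These figures are mutually consistent. At 22,050 samples per second, two channels, and two bytes per sample, one second of audio occupies 88,200 bytes, so 2,912,067,980 bytes comes to about 9.2 hours. The Python sketch below checks that arithmetic and shows one way to read such a file. It assumes uncompressed NIST SPHERE files with the usual ASCII header (a `NIST_1A` line, a header-size line, then `name -type value` fields ending in `end_head`). The function names are hypothetical and not part of any LDC tool.

```python
import sys
from array import array

# Figures from the catalog entry: 22,050 Hz, 2 channels, 16-bit samples.
SAMPLE_RATE = 22050
CHANNELS = 2
BYTES_PER_SAMPLE = 2


def parse_sphere_header(data: bytes) -> tuple[dict, int]:
    """Parse a NIST SPHERE header (assumed layout) into (fields, header_size)."""
    first_nl = data.index(b"\n")
    second_nl = data.index(b"\n", first_nl + 1)
    if data[:first_nl] != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    # The second line holds the total header size in bytes, e.g. "   1024".
    header_size = int(data[first_nl + 1:second_nl].decode("ascii"))
    fields = {}
    for line in data[second_nl + 1:header_size].decode("ascii").splitlines():
        if line.strip() == "end_head":
            break
        parts = line.split(maxsplit=2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, kind, value = parts
        if kind == "-i":
            fields[name] = int(value)
        elif kind == "-r":
            fields[name] = float(value)
        else:  # "-sN" string fields are kept as text
            fields[name] = value
    return fields, header_size


def decode_samples(data: bytes, header_size: int, big_endian: bool = True) -> array:
    """Decode the interleaved signed 16-bit samples that follow the header.

    The catalog states high-byte-first order; the header's sample_byte_format
    field ("10" for high-byte-first) can be used to confirm this per file.
    """
    samples = array("h")  # signed 16-bit integers on mainstream platforms
    samples.frombytes(data[header_size:])
    if big_endian != (sys.byteorder == "big"):
        samples.byteswap()  # convert to the machine's native byte order
    return samples


def hours_of_audio(total_bytes: int) -> float:
    """Duration implied by a byte count, ignoring the small per-file headers."""
    return total_bytes / (SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE) / 3600
```

For example, `round(hours_of_audio(2_912_067_980), 2)` gives 9.17, in line with the nine hours stated above. Reading a whole file into memory keeps the sketch short; a streaming reader would be gentler on memory for long session files.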

The utterances were recorded directly into WAVES+ data files on two channels at a sampling rate of 22.05 kHz. The two microphones were a stand-mounted Shure SN94 on a boom and a Sennheiser HMD 410 headset.
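Because the two microphones were recorded as two interleaved channels, a decoded sample stream alternates between them frame by frame. The minimal sketch below splits such a stream into one array per channel. The function name is hypothetical, and this entry does not say which channel carries the boom microphone and which carries the headset, so that mapping should be checked against the corpus documentation.

```python
from array import array


def deinterleave(samples: array, channels: int = 2) -> list[array]:
    """Split interleaved frames (c0 c1 c0 c1 ...) into one array per channel."""
    if len(samples) % channels:
        raise ValueError("sample count is not a multiple of the channel count")
    # Extended slicing picks every `channels`-th sample starting at offset ch.
    return [samples[ch::channels] for ch in range(channels)]


frames = array("h", [10, -10, 20, -20, 30, -30])
first, second = deinterleave(frames)
print(list(first), list(second))
```

Slicing keeps the `array("h")` type, so each channel stays a compact 16-bit buffer rather than a list of Python integers.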

The original session recordings are provided in their entirety, including the informal chit-chat and discussion between the elicitation tasks for each emotion category. Time alignment covers only utterances within the formal elicitation tasks; all other regions are marked as miscellaneous.

Updates

There are no updates at this time.
