Emotional Prosody Speech and Transcripts

Item Name: Emotional Prosody Speech and Transcripts
Authors: Mark Liberman, Kelly Davis, Murray Grossman, Nii Martey, and John Bell
LDC Catalog No.: LDC2002S28
ISBN: 1-58563-237-6
Release Date: Jul 23, 2002
Data Type: speech
Sample Rate: 22050 Hz
Sampling Format: 2-channel pcm
Data Source(s): microphone speech
Application(s): pronunciation modeling, prosody, speech recognition
Language(s): English
Language ID(s): eng
Distribution: 1 DVD
Member fee: $0 for 2002 members
Non-member Fee: US $2500.00
Reduced-License Fee: US $1250.00
Extra-Copy Fee: US $200.00
Non-member License: yes
Online documentation: yes
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Mark Liberman, et al.
Emotional Prosody Speech and Transcripts
Linguistic Data Consortium, Philadelphia


This file contains documentation on the 2002 Emotional Prosody Speech and Transcripts, Linguistic Data Consortium (LDC) catalog number LDC2002S28 and ISBN 1-58563-237-6.

This publication contains audio recordings and corresponding transcripts, collected over an eight-month period in 2000-2001 and designed to support research in emotional prosody. The recordings consist of professional actors reading a series of semantically neutral utterances (dates and numbers) spanning fourteen distinct emotional categories, selected following Banse & Scherer's study of vocal emotional expression in German. (Banse, R. & Scherer, K. R. 1996. Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology, 70, 614-636.)

Actor participants were provided with descriptions of each emotional context, including situational examples adapted from those used in the original German study. Flashcards were used to display a series of four-syllable dates and numbers to be uttered in the appropriate emotional category.

The Prosody Recordings Project is interested in capturing the aspects of speech (emotion, intonation) that are left out of the written form of a message. In these experiments, simple phrases are expressed in ways that reflect varied contexts. The same phrase might be used to answer different questions, address listeners at different distances from the speaker, or express different emotional states. Actors were used because they are experts at producing this kind of contextual variation in a natural and convincing way.

More information about this project can be found at http://www.ldc.upenn.edu/Projects/Prosody/.


There are 30 data files: 15 recordings in sphere format and their transcripts.

The sphere files are encoded in two-channel interleaved 16-bit PCM, high-byte-first (big-endian) format, for a total of 2,912,067,980 bytes (2777 Mbytes) or nine hours of sphere data.
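As a sketch of what that interleaved layout means in practice, the snippet below de-interleaves raw big-endian 16-bit stereo PCM into per-channel sample lists. It operates on the raw sample payload only (a sphere file begins with an ASCII header that would need to be skipped first), and the function name is illustrative rather than LDC-provided code.

```python
import struct

def split_channels(pcm_bytes):
    """De-interleave big-endian 16-bit stereo PCM into (left, right) sample lists."""
    n = len(pcm_bytes) // 2                       # number of 16-bit samples
    samples = struct.unpack(">%dh" % n, pcm_bytes[: n * 2])
    # Interleaved stereo alternates channels sample by sample: L, R, L, R, ...
    return list(samples[0::2]), list(samples[1::2])

# Synthetic example: two stereo frames, (L=100, R=-200) and (L=300, R=-400)
raw = struct.pack(">4h", 100, -200, 300, -400)
left, right = split_channels(raw)
print(left, right)  # [100, 300] [-200, -400]
```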

The utterances were recorded directly into WAVES+ data files, on two channels at a sampling rate of 22.05 kHz. The two microphones used were a stand-mounted boom Shure SN94 and a Sennheiser HMD 410 headset.

The original session recordings are provided in their entirety, including informal chit-chat and discussion between the emotion category elicitation tasks. Time alignment is limited to utterances within the formal elicitation tasks, and miscellaneous regions have been marked as such.



There are no updates at this time.

Content Copyright

Portions © 2000-2002 Trustees of the University of Pennsylvania.