Voicemail Corpus Part I
Item Name: | Voicemail Corpus Part I |
Author(s): | M. Padmanabhan, G. Ramaswamy, B. Ramabhadran, P. S. Gopalakrishnan, C. Dunn |
LDC Catalog No.: | LDC98S77 |
ISBN: | 1-58563-141-8 |
ISLRN: | 074-386-777-466-6 |
DOI: | https://doi.org/10.35111/96hh-2926 |
Member Year(s): | 1998 |
DCMI Type(s): | Sound |
Sample Type: | 1-channel ulaw |
Sample Rate: | 8000 Hz |
Data Source(s): | telephone speech |
Application(s): | speech recognition |
Language(s): | English |
Language ID(s): | eng |
Online Documentation: | LDC98S77 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Padmanabhan, M, et al. Voicemail Corpus Part I LDC98S77. Web Download. Philadelphia: Linguistic Data Consortium, 1998. |
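The metadata lists the audio as single-channel μ-law (ulaw) at 8000 Hz. Here is a minimal sketch of the standard ITU-T G.711 μ-law expansion to 16-bit linear PCM in Python. It assumes you already have a message's raw 8-bit μ-law sample bytes, because this page does not describe how the files are packaged on disk. The names `ulaw_to_linear`, `decode_ulaw`, and `duration_seconds` are hypothetical helpers, not part of any LDC tool.

```python
# Hypothetical helpers (not an LDC tool): G.711 mu-law -> 16-bit linear PCM.
BIAS = 0x84  # 132, the bias added to magnitudes during mu-law encoding
SAMPLE_RATE_HZ = 8000  # from the corpus metadata


def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit mu-law code to a signed linear PCM sample."""
    u = ~code & 0xFF             # mu-law codes are stored bit-inverted
    sign = u & 0x80              # top bit: sign
    exponent = (u >> 4) & 0x07   # next 3 bits: segment number
    mantissa = u & 0x0F          # low 4 bits: step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def decode_ulaw(data: bytes) -> list[int]:
    """Decode a buffer of mu-law bytes into linear PCM samples."""
    return [ulaw_to_linear(b) for b in data]


def duration_seconds(data: bytes) -> float:
    """One byte per sample on one channel (assumed headerless), so bytes / rate."""
    return len(data) / SAMPLE_RATE_HZ


print(decode_ulaw(bytes([0xFF, 0x80, 0x00])))  # [0, 32124, -32124]
```

Older Python versions provided `audioop.ulaw2lin` for this conversion, but that module was removed in Python 3.13. This sketch therefore spells out the bit manipulation instead of relying on it.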
Introduction
This corpus was created by M. Padmanabhan, G. Ramaswamy, B. Ramabhadran, P. S. Gopalakrishnan and C. Dunn.
Data
The training set consists of 1,801 messages, collected from volunteers at various IBM sites in the United States. The development test set contains a further 42 messages. The average voicemail message is 31 seconds long and contains about 100 words. Approximately 38% of the messages are from male speakers; the remainder are from female speakers. All messages were transcribed by IBM.
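The averages above give a rough sense of the corpus size. The sketch below multiplies them out to estimate total audio, word count, and speaking rate. These are back-of-the-envelope figures derived from per-message averages, not published totals. The audio size also assumes one byte per sample (8-bit μ-law, one channel) with no file headers.

```python
# Rough estimates from the averages quoted on this page (not published totals).
TRAIN_MESSAGES = 1801
DEV_MESSAGES = 42
AVG_SECONDS = 31
AVG_WORDS = 100
SAMPLE_RATE_HZ = 8000
BYTES_PER_SAMPLE = 1  # assumption: 8-bit mu-law, single channel, no headers


def estimate(messages: int) -> dict:
    """Scale the per-message averages up to a whole set."""
    seconds = messages * AVG_SECONDS
    return {
        "hours": seconds / 3600,
        "words": messages * AVG_WORDS,
        "audio_bytes": seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE,
    }


train = estimate(TRAIN_MESSAGES)
dev = estimate(DEV_MESSAGES)
words_per_minute = AVG_WORDS / AVG_SECONDS * 60

print(f"train: ~{train['hours']:.1f} h, ~{train['words']:,} words")  # ~15.5 h, ~180,100 words
print(f"train audio: ~{train['audio_bytes'] / 1e6:.0f} MB")          # ~447 MB
print(f"dev: ~{dev['hours'] * 60:.0f} min, ~{dev['words']:,} words")  # ~22 min, ~4,200 words
print(f"speaking rate: ~{words_per_minute:.0f} words/min")           # ~194 words/min
```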
Updates
There are no updates at this time.
Additional Licensing Instructions
This 'members-only' corpus is available to current members, who can request the data at the listed reduced-license fee. Contact ldc@ldc.upenn.edu for information about becoming a member.