CALLHOME American English Lexicon (PRONLEX)

Item Name: CALLHOME American English Lexicon (PRONLEX)
Author(s): Paul Kingsbury, Stephanie Strassel, Cynthia McLemore, Robert MacIntyre
LDC Catalog No.: LDC97L20
ISBN: 1-58563-110-8
ISLRN: 119-159-358-214-6
Member Year(s): 1994, 1995, 1996, 1997
DCMI Type(s): Text
Data Source(s): telephone conversations
Project(s): EARS, Hub5-LVCSR, GALE
Application(s): speech recognition
Language(s): English
Language ID(s): eng
License(s): CALLHOME Lexicon Agreement (Commercial)
CALLHOME Lexicon Agreement (Non-Commercial)
CALLHOME Lexicon Agreement (Non-Member)
Online Documentation: LDC97L20 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Kingsbury, Paul, et al. CALLHOME American English Lexicon (PRONLEX) LDC97L20. Web Download. Philadelphia: Linguistic Data Consortium, 1997.

Introduction

The CALLHOME English collection includes a lexical component. The CALLHOME American English Lexicon was originally distributed under the name COMLEX Pronouncing Lexicon, or PRONLEX. Organizations that have already received PRONLEX will not be required to purchase the CALLHOME American English Lexicon.

Data

The latest version of PRONLEX contains 90,988 lexical entries and includes coverage of WSJ30K, WSJ64K, Switchboard and CALLHOME English. (WSJ30K and WSJ64K are word lists selected from several years of Wall Street Journal texts used in recent ARPA Continuous Speech Recognition corpora. Switchboard is a three-million-word corpus of telephone conversations on a variety of topics.)

This lexicon is available by FTP to organizations that sign a license agreement, which is also found on the LDC FTP site.

The PRONLEX documentation describes the principles observed for word transcription. Although predictable variation in pronunciation due to dialect or variable reduction has not been notated in the lexicon itself, the documentation notes systematic dialectal variants, which may be generated by rule. In addition, alternate pronunciations are given for words whose pronunciation varies by part of speech (e.g., abstrAct, Abstract), or in less systematic but salient ways (especially names). Classes of exceptions to the transcription principles, such as names, function words, and foreign words, are tagged.
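As a rough illustration of the two ideas in the paragraph above (alternate pronunciations selected by part of speech, and dialectal variants generated by rule rather than stored), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the entry layout, the part-of-speech tags, the ARPAbet-style symbols with stress digits, and the vowel-merger rule are hypothetical and are not taken from the PRONLEX files or documentation.

```python
from collections import defaultdict

# Hypothetical entries: (word, part-of-speech tag, pronunciation).
# The layout and the ARPAbet-style symbols are illustrative only;
# they are NOT the PRONLEX file format or symbol set.
ENTRIES = [
    ("abstract", "noun", "AE1 B S T R AE0 K T"),  # Abstract
    ("abstract", "verb", "AE0 B S T R AE1 K T"),  # abstrAct
    ("caught", "verb", "K AO1 T"),
]


def build_lexicon(entries):
    """Index pronunciations by word, then by part of speech."""
    lexicon = defaultdict(dict)
    for word, pos, pron in entries:
        lexicon[word][pos] = pron.split()
    return dict(lexicon)


def lookup(lexicon, word, pos=None):
    """Return every pronunciation of `word`, or only the one for `pos`."""
    variants = lexicon.get(word, {})
    if pos is None:
        return list(variants.values())
    return [variants[pos]] if pos in variants else []


def apply_vowel_merger(phones, source="AO", target="AA"):
    """Hypothetical dialect rule: rewrite one vowel as another.

    A trailing stress digit, if present, is carried over unchanged.
    """
    out = []
    for p in phones:
        base, stress = (p[:-1], p[-1]) if p[-1].isdigit() else (p, "")
        out.append((target if base == source else base) + stress)
    return out


LEXICON = build_lexicon(ENTRIES)
verb_form = lookup(LEXICON, "abstract", "verb")
merged = apply_vowel_merger(LEXICON["caught"]["verb"])
```

Storing only the base form and deriving the variant on demand mirrors the stated design of leaving predictable variation out of the lexicon itself; the actual rules are the ones described in the PRONLEX documentation, not the one shown here.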

Here is a sample page. The transcripts and documentation (LDC97T14) are available, as well as a corpus of telephone speech (LDC97S42).

Updates

There are no updates at this time.
