CALLHOME Egyptian Arabic Speech

Item Name:	CALLHOME Egyptian Arabic Speech
Author(s):	Alexandra Canavan, George Zipperlen, David Graff
LDC Catalog No.:	LDC97S45
ISBN:	1-58563-114-0
ISLRN:	102-150-894-143-2
DOI:	https://doi.org/10.35111/d8yb-9m13
Member Year(s):	1997
DCMI Type(s):	Sound
Sample Type:	2-channel ulaw
Sample Rate:	8000
Data Source(s):	telephone conversations
Project(s):	Hub5-LVCSR, GALE, EARS
Application(s):	speech recognition
Language(s):	Egyptian Arabic
Language ID(s):	arz
License(s):	LDC User Agreement for Non-Members
Online Documentation:	LDC97S45 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Canavan, Alexandra, George Zipperlen, and David Graff. CALLHOME Egyptian Arabic Speech LDC97S45. Web Download. Philadelphia: Linguistic Data Consortium, 1997.
Related Works:
hasAnnotation
	LDC97T19 CALLHOME Egyptian Arabic Transcripts
	LDC2007S10 2003 NIST Rich Transcription Evaluation Data
	LDC2012T09 Arabic-Dialect/English Parallel Text
	LDC2020T05 BOLT Egyptian Arabic-English Word Alignment -- Conversational Telephone Speech Training
hasContinuation
	LDC2002S37 CALLHOME Egyptian Arabic Speech Supplement
isSimilarWith
	LDC96S49 CALLFRIEND Egyptian Arabic
relatesTo
	LDC99L22 Egyptian Colloquial Arabic Lexicon

Introduction

The CALLHOME Egyptian Arabic corpus of telephone speech consists of 120 unscripted telephone conversations between native speakers of Egyptian Colloquial Arabic (ECA), the spoken variety of Arabic found in Egypt. The dialect of ECA that this corpus represents is Cairene Arabic.

Data

All calls, which lasted up to 30 minutes, originated in North America and were placed to locations overseas (typically Egypt). Most participants called family members or close friends.

This corpus contains speech data files ONLY, along with the minimal amount of documentation needed to describe the contents and format of the speech files and the software packages needed to uncompress the speech data. The transcripts and documentation (LDC97T19) are available separately, as is an associated lexicon (LDC99L22).

Samples

Please listen to this speech sample.

Updates

The "shorten" and "sphere" directories have been removed.

The sphere directory contained NIST "SPeech HEader REsources" (SPHERE): C-language source code libraries and utilities for manipulating NIST SPHERE-format waveform files.

The shorten directory contained files for Tony Robinson's "shorten" software for speech compression.

A more recent version of the SPHERE utilities is now available on the NIST web site; additional utilities for converting from SPHERE to other waveform file formats are also available at the LDC web site.
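
For readers who want to inspect the audio without the SPHERE tools, the following is a minimal sketch of how 2-channel, 8-bit ulaw samples can be decoded to 16-bit linear PCM. It assumes the file has already been uncompressed, that the SPHERE header has been stripped, and that the two channels are interleaved byte by byte; none of this is confirmed by the listing itself. The function names ulaw_to_pcm16 and split_channels are hypothetical and are not part of any LDC or NIST package.

```python
def ulaw_to_pcm16(byte):
    """Decode one 8-bit G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF  # mu-law bytes are stored bit-inverted
    # Rebuild the magnitude from the 4-bit mantissa and 3-bit exponent,
    # using the standard G.711 bias of 0x84.
    magnitude = (((u & 0x0F) << 3) + 0x84) << ((u & 0x70) >> 4)
    # The top bit (after inversion) marks a negative sample.
    return 0x84 - magnitude if u & 0x80 else magnitude - 0x84


def split_channels(raw):
    """Split interleaved 2-channel mu-law bytes into two lists of PCM samples.

    Assumption: `raw` holds headerless, uncompressed sample data with the
    channels interleaved (A, B, A, B, ...).
    """
    table = [ulaw_to_pcm16(b) for b in range(256)]  # decode each byte value once
    channel_a = [table[b] for b in raw[0::2]]
    channel_b = [table[b] for b in raw[1::2]]
    return channel_a, channel_b
```

At the listed sample rate of 8000, the duration of a channel in seconds would be the number of decoded samples divided by 8000.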