Arabic Broadcast News Speech

Item Name: Arabic Broadcast News Speech
Author(s): Mohamed Maamouri, David Graff, Christopher Cieri
LDC Catalog No.: LDC2006S46
ISBN: 1-58563-419-0
ISLRN: 537-141-493-555-6
Release Date: December 19, 2006
Member Year(s): 2006
DCMI Type(s): Sound
Sample Type: pcm
Sample Rate: 16000
Data Source(s): broadcast news
Application(s): machine translation, machine learning
Language(s): Standard Arabic
Language ID(s): arb
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2006S46 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Maamouri, Mohamed, David Graff, and Christopher Cieri. Arabic Broadcast News Speech LDC2006S46. DVD. Philadelphia: Linguistic Data Consortium, 2006.


Arabic Broadcast News Speech consists of 10 hours of speech recorded by the Linguistic Data Consortium (LDC) from Voice of America satellite radio news broadcasts in Arabic transmitted between June 2000 and January 2001. The corresponding transcripts are available as Arabic Broadcast News Transcripts (LDC2006T20).

This work was undertaken as part of the Networking Data Centers (NetDC) project (MLIS-5017, NSF IIS-9982201) in conjunction with the European Language Resources Association (ELRA). ELRA collected 22.5 hours of Arabic broadcast data from Radio Orient (France), which is available as NetDC Arabic BNSC (Broadcast News Speech Corpus), ELRA-S0157. The goal of the NetDC project was to improve the infrastructure for language resources by designing and implementing new modes of cooperation between LDC and ELRA.


The recordings were captured from a dedicated satellite receiver and stored as single-channel, 16-bit PCM audio sampled at 16 kHz, in NIST SPHERE format. Each recording is either 60 or 120 minutes long, depending on the VOA broadcast schedule; the date (YYYYMMDD), start time, and end time (HHMM EST) of each recording are indicated in its file name. The sample data are not compressed.
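
As a rough sanity check on the format described above, the size of the uncompressed sample data in a recording can be estimated from the stated parameters (16-bit samples, 16 kHz, one channel). The sketch below is illustrative only: the function and constant names are hypothetical, and the figures cover the PCM sample data alone, not the SPHERE header that precedes it in each file.

```python
# Audio parameters as stated in the corpus description.
SAMPLE_RATE_HZ = 16_000   # 16 kHz
BYTES_PER_SAMPLE = 2      # 16-bit PCM
CHANNELS = 1              # single-channel


def pcm_payload_bytes(minutes: int) -> int:
    """Return the uncompressed PCM payload size, in bytes, for a recording
    of the given length. The SPHERE header is not included."""
    seconds = minutes * 60
    return seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


# The two recording lengths mentioned in the description.
for minutes in (60, 120):
    size = pcm_payload_bytes(minutes)
    print(f"{minutes} min: {size:,} bytes ({size / 1e6:.1f} MB)")
```

Under these assumptions a 60-minute recording holds 115,200,000 bytes of sample data and a 120-minute recording holds 230,400,000 bytes; a file whose size differs substantially from these figures may be truncated or stored with different parameters.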


For an example of the speech in this corpus, please listen to this audio sample (wav format).
