Iraqi Arabic - English Lexical Database

Item Name: Iraqi Arabic - English Lexical Database
Author(s): Mohamed Maamouri, David Graff
LDC Catalog No.: LDC2025L01
ISLRN: 362-004-101-706-6
DOI: https://doi.org/10.35111/7fr9-g791
Release Date: January 15, 2025
Member Year(s): 2025
DCMI Type(s): Text
Data Source(s): dictionaries
Project(s): DOE/IRS2008-0256
Application(s): language teaching, machine translation, part of speech tagging, pronunciation modeling
Language(s): Mesopotamian Arabic, English
Language ID(s): acm, eng
License(s): Iraqi Arabic - English Lexical Database Agreement
Online Documentation: LDC2025L01 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Maamouri, Mohamed, and David Graff. Iraqi Arabic - English Lexical Database LDC2025L01. Web Download. Philadelphia: Linguistic Data Consortium, 2025.

Iraqi Arabic - English Lexical Database was developed by the Linguistic Data Consortium (LDC). It contains six interrelated tables presenting over 67,000 Iraqi Arabic words as orthographic forms in Arabic script and pronunciation forms in the International Phonetic Alphabet (IPA), along with more than 120,000 English tokens.

This release is the result of a collaboration with Georgetown University Press to enhance and update three dialectal Arabic dictionaries -- Iraqi, Moroccan and Syrian -- originally published in the 1960s. The Georgetown Dictionary of Iraqi Arabic was published in 2013. That work was based on, and expanded, two dictionaries: A Dictionary of Iraqi Arabic: English-Arabic (Clarity, Stowasser and Wolfe, eds., 2003) and A Dictionary of Iraqi Arabic: Arabic-English (Woodhead and Beene, eds., 2003).

LDC's enhancements to the updated dictionary and the lexical database included facilitating comparisons across Arabic dialects and Modern Standard Arabic by providing Arabic script spellings and IPA pronunciations for Iraqi words and phrases; promoting ease of use for language learners and researchers by developing reasonable orthographic conventions for applying the Arabic alphabet to the dialect; and facilitating a user's understanding of morphological and lexical relations by adding information on the linguistic structures of Iraqi Arabic.

Data

The number of entries in each table is as follows:

Roots 4,512
Lemmas 17,224
Wordforms 22,988
Multi-word Expressions 261
Definitions 23,834
Phrases 15,714

Each table is presented as a UTF-8 encoded tab-delimited file with Unix-style (line-feed only) line breaks.
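
As an illustration of the file format described above, the following is a minimal Python sketch for loading one table. It assumes nothing about column names or whether a header row is present, since those details are given only in the release documentation; the function name read_table is hypothetical and not part of the release. Quote characters are treated as ordinary text, on the assumption that the fields are plain tab-separated values rather than CSV-style quoted fields.

```python
import csv
from pathlib import Path


def read_table(path):
    """Read one tab-delimited table into a list of rows (lists of strings).

    Hypothetical helper: column meanings and any header row must be taken
    from the release documentation, not from this sketch.
    """
    # newline="" lets the csv module handle the line-feed line breaks itself.
    # QUOTE_NONE keeps quote characters in the data as literal text
    # (an assumption about the files, not a documented property).
    with Path(path).open(encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [row for row in reader]
```

A call such as read_table("lemmas.tsv") would return every row of that file as a list of strings; the file name here is a placeholder, and the actual file names should be taken from the release itself.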

The documentation accompanying this release includes instructions for combining into one database the tables in this corpus with the tables in Moroccan Arabic - English Lexical Database LDC2023L01.

Acknowledgments

This work was supported by the U.S. Department of Education International Research Studies Program (#P017A0800441) with additional support from Georgetown University Press (GUP) and LDC.

Updates

None at this time.
