Buckwalter Arabic Morphological Analyzer Version 2.0

Item Name: Buckwalter Arabic Morphological Analyzer Version 2.0
Author(s): Tim Buckwalter
LDC Catalog No.: LDC2004L02
ISBN: 1-58563-324-0
ISLRN: 694-194-540-336-4
DOI: https://doi.org/10.35111/050q-5r95
Release Date: December 15, 2004
Member Year(s): 2004
DCMI Type(s): Text
Project(s): TIDES, GALE
Application(s): natural language processing, machine translation, information retrieval
Language(s): Standard Arabic, English
Language ID(s): arb, eng
License(s): BAMA Agreement
Online Documentation: LDC2004L02 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Buckwalter, Tim. Buckwalter Arabic Morphological Analyzer Version 2.0 LDC2004L02. Web Download. Philadelphia: Linguistic Data Consortium, 2004.


This file contains documentation on the Buckwalter Arabic Morphological Analyzer Version 2.0.


The data consists primarily of three Arabic-English lexicon files: prefixes (299 entries), suffixes (618 entries), and stems (82,158 entries representing 38,600 lemmas). The lexicons are supplemented by three morphological compatibility tables that control which prefix-stem combinations (1,648 entries), stem-suffix combinations (1,285 entries), and prefix-suffix combinations (598 entries) are allowed. The code for morphological analysis and POS tagging is contained in a Perl script. The documentation consists of a readme file that describes the lexicon files, the morphological compatibility tables, and the morphological analysis algorithm, and that includes a summary of stem morphological categories and a table of the author's Arabic transliteration system.
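
To illustrate how three lexicons and three compatibility tables can work together, here is a minimal sketch in Python. It is not the distributed Perl script and may differ from it in important ways. The lexicon entries, category names (PREF_CONJ, NOUN, SUFF_POSS, and so on), glosses, and function names are hypothetical placeholders, not values from the released files. The sketch assumes that a word is split into every possible prefix + stem + suffix segmentation, and that a segmentation is accepted only when all three parts are found in their lexicons and all three category pairs appear in the corresponding tables.

```python
from itertools import product

# Toy lexicons: transliterated form -> list of (category, English gloss).
# Every entry and category name here is an illustrative placeholder.
PREFIXES = {
    "": [("PREF_NONE", "")],
    "w": [("PREF_CONJ", "and")],
    "y": [("PREF_IMPERF", "he")],
}
STEMS = {
    "ktAb": [("NOUN", "book")],
    "ktb": [("VERB_IMPERF", "write")],
}
SUFFIXES = {
    "": [("SUFF_NONE", "")],
    "h": [("SUFF_POSS", "his")],
}

# Toy compatibility tables: sets of allowed category pairs (assumed layout).
PREFIX_STEM_OK = {
    ("PREF_NONE", "NOUN"),
    ("PREF_CONJ", "NOUN"),
    ("PREF_IMPERF", "VERB_IMPERF"),
}
STEM_SUFFIX_OK = {
    ("NOUN", "SUFF_NONE"),
    ("NOUN", "SUFF_POSS"),
    ("VERB_IMPERF", "SUFF_NONE"),
}
PREFIX_SUFFIX_OK = {
    ("PREF_NONE", "SUFF_NONE"),
    ("PREF_NONE", "SUFF_POSS"),
    ("PREF_CONJ", "SUFF_NONE"),
    ("PREF_CONJ", "SUFF_POSS"),
    ("PREF_IMPERF", "SUFF_NONE"),
}


def segmentations(word):
    """Yield every (prefix, stem, suffix) split with a non-empty stem."""
    for i in range(len(word)):
        for j in range(i + 1, len(word) + 1):
            yield word[:i], word[i:j], word[j:]


def analyze(word):
    """Return (prefix, stem, suffix, gloss) for each compatible analysis."""
    results = []
    for prefix, stem, suffix in segmentations(word):
        candidates = product(
            PREFIXES.get(prefix, []),
            STEMS.get(stem, []),
            SUFFIXES.get(suffix, []),
        )
        for (p_cat, p_gloss), (s_cat, s_gloss), (x_cat, x_gloss) in candidates:
            # All three pairwise checks must pass for the analysis to survive.
            if ((p_cat, s_cat) in PREFIX_STEM_OK
                    and (s_cat, x_cat) in STEM_SUFFIX_OK
                    and (p_cat, x_cat) in PREFIX_SUFFIX_OK):
                gloss = " + ".join(g for g in (p_gloss, s_gloss, x_gloss) if g)
                results.append((prefix, stem, suffix, gloss))
    return results


if __name__ == "__main__":
    for w in ("wktAbh", "yktb", "yktAb"):
        print(w, analyze(w))
```

In this toy setup, "wktAbh" is accepted as w + ktAb + h, while "yktAb" is rejected: although "y" and "ktAb" are both in the lexicons, the pair (PREF_IMPERF, NOUN) is absent from the prefix-stem table. This is the kind of filtering the compatibility tables are described as providing.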


To see an example of the analyzer's output, please examine this sample.

Additional Licensing Instructions

This 'members-only' corpus is available to current members who can request the data at the listed reduced-license fee. Contact ldc@ldc.upenn.edu for information about becoming a member.
