Buckwalter Arabic Morphological Analyzer Version 2.0
|Item Name:||Buckwalter Arabic Morphological Analyzer Version 2.0|
|LDC Catalog No.:||LDC2004L02|
|Release Date:||December 15, 2004|
|Application(s):||natural language processing, machine translation, information retrieval|
|Language(s):||Standard Arabic, English|
|Language ID(s):||arb, eng|
|Online Documentation:||LDC2004L02 Documents|
|Licensing Instructions:||Subscription & Standard Members, and Non-Members|
|Citation:||Buckwalter, Tim. Buckwalter Arabic Morphological Analyzer Version 2.0 LDC2004L02. Web Download. Philadelphia: Linguistic Data Consortium, 2004.|
This file contains documentation on the Buckwalter Arabic Morphological Analyzer Version 2.0, Linguistic Data Consortium (LDC) catalog number LDC2004L02 and ISBN 1-58563-311-9.
The data consists primarily of three Arabic-English lexicon files: prefixes (299 entries), suffixes (618 entries), and stems (82158 entries representing 38600 lemmas). The lexicons are supplemented by three morphological compatibility tables that control prefix-stem combinations (1648 entries), stem-suffix combinations (1285 entries), and prefix-suffix combinations (598 entries). The actual code for morphological analysis and POS tagging is contained in a Perl script. The documentation consists of a readme file with a description of the lexicon files, the morphological compatibility tables, the morphological analysis algorithm, a summary of stem morphological categories, and a table with the author's Arabic transliteration system.
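The way the three lexicons and three compatibility tables fit together can be illustrated with a minimal sketch. It is written in Python rather than the Perl of the distributed script, and it is not the released code: the lexicon entries, category labels (such as PREF_CONJ), and the function name analyze are hypothetical placeholders chosen for illustration, and the exhaustive enumeration of splits is an assumption about how candidate segmentations might be generated.

```python
from typing import Dict, List, Set, Tuple

# Toy lexicons (hypothetical entries): surface form -> list of (category, gloss).
# The real files hold 299 prefixes, 618 suffixes, and 82158 stems.
Lexicon = Dict[str, List[Tuple[str, str]]]

PREFIXES: Lexicon = {
    "": [("PREF_NONE", "")],
    "w": [("PREF_CONJ", "and")],
    "Al": [("PREF_DET", "the")],
}
STEMS: Lexicon = {
    "ktAb": [("NOUN", "book")],
    "ktb": [("VERB_PERF", "write")],
}
SUFFIXES: Lexicon = {
    "": [("SUFF_NONE", "")],
    "t": [("SUFF_PERF_1S", "I")],
}

# Toy compatibility tables (hypothetical): sets of allowed category pairs.
PREFIX_STEM: Set[Tuple[str, str]] = {
    ("PREF_NONE", "NOUN"), ("PREF_CONJ", "NOUN"), ("PREF_DET", "NOUN"),
    ("PREF_NONE", "VERB_PERF"), ("PREF_CONJ", "VERB_PERF"),
}
STEM_SUFFIX: Set[Tuple[str, str]] = {
    ("NOUN", "SUFF_NONE"), ("VERB_PERF", "SUFF_PERF_1S"),
}
PREFIX_SUFFIX: Set[Tuple[str, str]] = {
    ("PREF_NONE", "SUFF_NONE"), ("PREF_CONJ", "SUFF_NONE"),
    ("PREF_DET", "SUFF_NONE"),
    ("PREF_NONE", "SUFF_PERF_1S"), ("PREF_CONJ", "SUFF_PERF_1S"),
}


def analyze(word: str) -> List[Tuple[str, str, str, str]]:
    """Return every (prefix, stem, suffix, gloss) split that passes all checks."""
    results = []
    n = len(word)
    for i in range(n):                  # prefix is word[:i], possibly empty
        for j in range(i + 1, n + 1):   # stem is word[i:j], never empty
            prefix, stem, suffix = word[:i], word[i:j], word[j:]
            for pcat, pgloss in PREFIXES.get(prefix, []):
                for scat, sgloss in STEMS.get(stem, []):
                    for xcat, xgloss in SUFFIXES.get(suffix, []):
                        # All three pairwise compatibility checks must pass.
                        if ((pcat, scat) in PREFIX_STEM
                                and (scat, xcat) in STEM_SUFFIX
                                and (pcat, xcat) in PREFIX_SUFFIX):
                            gloss = " + ".join(
                                g for g in (pgloss, sgloss, xgloss) if g)
                            results.append((prefix, stem, suffix, gloss))
    return results


if __name__ == "__main__":
    for w in ("wktAb", "ktbt", "Alktbt"):
        print(w, analyze(w))
```

In this toy setup, "Alktbt" yields no analysis even though all three pieces are found in the lexicons, because the prefix-stem table does not license a determiner before a perfect verb. That is the role the description above assigns to the compatibility tables: lexicon lookup proposes segmentations, and the tables filter out invalid combinations.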
To see an example of the analyzer's output, please examine this sample.
Additional Licensing Instructions
This 'members-only' corpus is available to current members who can request the data at the listed reduced-license fee. Contact firstname.lastname@example.org for information about becoming a member.