Buckwalter Arabic Morphological Analyzer Version 2.0
Item Name: | Buckwalter Arabic Morphological Analyzer Version 2.0 |
Author(s): | Tim Buckwalter |
LDC Catalog No.: | LDC2004L02 |
ISBN: | 1-58563-324-0 |
ISLRN: | 694-194-540-336-4 |
DOI: | https://doi.org/10.35111/050q-5r95 |
Release Date: | December 15, 2004 |
Member Year(s): | 2004 |
DCMI Type(s): | Sound |
Project(s): | TIDES, GALE |
Application(s): | natural language processing, machine translation, information retrieval |
Language(s): | Standard Arabic, English |
Language ID(s): | arb, eng |
License(s): | BAMA Agreement |
Online Documentation: | LDC2004L02 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Buckwalter, Tim. Buckwalter Arabic Morphological Analyzer Version 2.0 LDC2004L02. Web Download. Philadelphia: Linguistic Data Consortium, 2004. |
Introduction
This file contains documentation on the Buckwalter Arabic Morphological Analyzer Version 2.0.
Data
The data consists primarily of three Arabic-English lexicon files: prefixes (299 entries), suffixes (618 entries), and stems (82158 entries representing 38600 lemmas). The lexicons are supplemented by three morphological compatibility tables that control prefix-stem combinations (1648 entries), stem-suffix combinations (1285 entries), and prefix-suffix combinations (598 entries). The code for morphological analysis and POS tagging is contained in a Perl script. The documentation consists of a readme file describing the lexicon files, the morphological compatibility tables, and the morphological analysis algorithm, together with a summary of stem morphological categories and a table of the author's Arabic transliteration system.
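The interaction between the three lexicons and the three compatibility tables can be illustrated with a small sketch. It is not the distributed Perl script. It assumes a simplified scheme: split an input word (in Buckwalter transliteration) into every prefix + stem + suffix combination, look each piece up in its lexicon, and keep only analyses whose category pairs appear in all three compatibility tables. The toy lexicon entries, the category names (`Pref-Wa`, `PVSuff-t`, and so on), and the prefix/suffix length limits are hypothetical placeholders, not the release's actual data.

```python
# Hypothetical toy lexicons: unvocalized surface form -> morphological categories.
PREFIXES = {"": ["Pref-0"], "w": ["Pref-Wa"]}
STEMS = {"ktb": ["PV"], "kAtb": ["N"]}
SUFFIXES = {"": ["Suff-0"], "t": ["PVSuff-t"], "p": ["NSuff-p"]}

# Hypothetical compatibility tables: allowed category pairs.
TABLE_AB = {("Pref-0", "PV"), ("Pref-Wa", "PV"), ("Pref-0", "N"), ("Pref-Wa", "N")}  # prefix-stem
TABLE_BC = {("PV", "Suff-0"), ("PV", "PVSuff-t"), ("N", "Suff-0"), ("N", "NSuff-p")}  # stem-suffix
TABLE_AC = {(a, c) for a in ("Pref-0", "Pref-Wa")
            for c in ("Suff-0", "PVSuff-t", "NSuff-p")}  # prefix-suffix

# Assumed maximum affix lengths for this sketch.
MAX_PREFIX = 4
MAX_SUFFIX = 6


def segmentations(word):
    """Yield every (prefix, stem, suffix) split that leaves a non-empty stem."""
    n = len(word)
    for p in range(min(MAX_PREFIX, n - 1) + 1):
        for s in range(min(MAX_SUFFIX, n - p - 1) + 1):
            yield word[:p], word[p:n - s], word[n - s:]


def analyze(word):
    """Return (prefix, stem, suffix, catA, catB, catC) tuples that pass all three tables."""
    results = []
    for pre, stem, suf in segmentations(word):
        # Every segment must exist in its lexicon.
        if pre not in PREFIXES or stem not in STEMS or suf not in SUFFIXES:
            continue
        for a in PREFIXES[pre]:
            for b in STEMS[stem]:
                for c in SUFFIXES[suf]:
                    # Keep the combination only if all three pairings are licensed.
                    if (a, b) in TABLE_AB and (b, c) in TABLE_BC and (a, c) in TABLE_AC:
                        results.append((pre, stem, suf, a, b, c))
    return results


print(analyze("wktbt"))
```

With these toy tables, `analyze("wktbt")` yields a single analysis (`w` + `ktb` + `t`). `analyze("ktbp")` yields none: every segment is found in its lexicon, but the stem-suffix pair `("PV", "NSuff-p")` is absent from `TABLE_BC`. This shows how the compatibility tables prune segmentations that the lexicons alone would accept.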
Samples
To see an example of the analyzer's output, please examine this sample.
Additional Licensing Instructions
This 'members-only' corpus is available to current members who can request the data at the listed reduced-license fee. Contact ldc@ldc.upenn.edu for information about becoming a member.