Buckwalter Arabic Morphological Analyzer Version 2.0


Item Name: Buckwalter Arabic Morphological Analyzer Version 2.0
Authors: Tim Buckwalter
LDC Catalog No.: LDC2004L02
ISBN: 1-58563-324-0
Release Date: Dec 15, 2004
Data Type: lexicon
Project(s): GALE, TIDES
Application(s): information retrieval, machine translation, natural language processing
Language(s): Modern Standard Arabic
Language ID(s): arb
Distribution: 1 CD
Member fee: $0 for 2004 members
Non-member Fee: N/A (Members Only)
Reduced-License Fee: N/A
Extra-Copy Fee: US $0.00
Member License: yes
Online documentation: yes
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Tim Buckwalter. 2004. Buckwalter Arabic Morphological Analyzer Version 2.0. Linguistic Data Consortium, Philadelphia.

Introduction

This file contains documentation on the Buckwalter Arabic Morphological Analyzer Version 2.0, Linguistic Data Consortium (LDC) catalog number LDC2004L02 and ISBN 1-58563-324-0.

Note: This release, unlike Version 1, is available only to LDC members. To find out how to join, please consult our FAQ. Additional licensing terms apply. To examine the license, please follow the Member License Online link above. You will also be presented with this license upon download and asked to accept it. You must accept the terms for the download to proceed.

Data

The data consists primarily of three Arabic-English lexicon files: prefixes (548 entries), suffixes (906 entries), and stems (78,839 entries representing 40,219 lemmas). The lexicons are supplemented by three morphological compatibility tables that control which combinations are allowed: prefix-stem (2,435 entries), stem-suffix (1,612 entries), and prefix-suffix (1,138 entries). The code for morphological analysis and POS tagging is contained in a Perl script (AraMorph.pl). A sample input file (infile.txt) and the corresponding output file (outfile.xml) are provided. The documentation consists of a readme file that describes the three lexicon files, the three morphological compatibility tables, and the morphological analysis algorithm, and includes a table of the author's Arabic transliteration system.
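The way the three lexicons and three compatibility tables work together can be illustrated with a small sketch. The Python below is not taken from AraMorph.pl. It is a minimal illustration that assumes a word is split into every possible prefix + stem + suffix sequence, each piece is looked up in its lexicon, and a split is kept only if all three category pairs appear in the compatibility tables. The lexicon entries, the category names (PrefDet, SuffPoss, and so on), the function names, and the length limits of 4 and 6 characters are all assumptions made for this example.

```python
from itertools import product

# Toy lexicons: surface form -> list of (category, gloss).
# Every entry and category name here is invented for illustration;
# the real lexicon files are far larger and use their own categories.
PREFIXES = {"": [("PrefNone", "")], "Al": [("PrefDet", "the")]}
STEMS = {"ktAb": [("Noun", "book")]}
SUFFIXES = {"": [("SuffNone", "")], "h": [("SuffPoss", "his")]}

# Toy compatibility tables: sets of allowed category pairs.
PREFIX_STEM = {("PrefNone", "Noun"), ("PrefDet", "Noun")}
STEM_SUFFIX = {("Noun", "SuffNone"), ("Noun", "SuffPoss")}
# The article and a possessive suffix are deliberately not paired here.
PREFIX_SUFFIX = {
    ("PrefNone", "SuffNone"),
    ("PrefNone", "SuffPoss"),
    ("PrefDet", "SuffNone"),
}


def segmentations(word, max_prefix=4, max_suffix=6):
    """Yield every (prefix, stem, suffix) split with a non-empty stem.

    The length limits are assumptions of this sketch.
    """
    n = len(word)
    for p in range(min(max_prefix, n - 1) + 1):
        for s in range(min(max_suffix, n - p - 1) + 1):
            yield word[:p], word[p:n - s], word[n - s:]


def analyze(word):
    """Return (prefix, stem, suffix, gloss) for each compatible analysis."""
    results = []
    for prefix, stem, suffix in segmentations(word):
        candidates = product(
            PREFIXES.get(prefix, []),
            STEMS.get(stem, []),
            SUFFIXES.get(suffix, []),
        )
        for (pcat, pgloss), (scat, sgloss), (xcat, xgloss) in candidates:
            # A split survives only if all three pairings are allowed.
            if (
                (pcat, scat) in PREFIX_STEM
                and (scat, xcat) in STEM_SUFFIX
                and (pcat, xcat) in PREFIX_SUFFIX
            ):
                gloss = " + ".join(g for g in (pgloss, sgloss, xgloss) if g)
                results.append((prefix, stem, suffix, gloss))
    return results


if __name__ == "__main__":
    for w in ("AlktAb", "ktAbh", "AlktAbh"):
        print(w, analyze(w))
```

In this toy setup, "AlktAb" and "ktAbh" each receive one analysis, while "AlktAbh" receives none: its prefix-stem and stem-suffix pairs are both allowed, but the prefix-suffix table has no entry pairing the article with the possessive suffix. That third table exists to catch combinations that look valid when only adjacent pieces are checked.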

Samples

To see an example of the analyzer's output, please examine this sample.

Availability

The release is available to 2004 and 2006 members via download here. Copies may also be requested on CD for an additional fee of US$150.

Copyright

Portions © 2002-2004 QAMUS LLC (www.qamus.org), © 2002-2004 Trustees of the University of Pennsylvania