(An updated version of this page, listing known issues, will be kept at
http://projects.ldc.upenn.edu/ArabicTreebank/)

As noted in readme-files.txt, we now categorize all source tokens with a
status value of 1, 2, 3, or 4, depending on their relation to SAMA. To
repeat the information from readme-files.txt, the tokens are categorized
as:

STATUS 1:  415924  Included in SAMA
STATUS 2:     735  Limited Solution
STATUS 3:    3474  Pending SAMA Solution
STATUS 4:   12843  Excluded from check with SAMA
           ================
           432976

Thus, excluding the status 4 tokens,
415924/(415924+735+3474) = 415924/420133 = 99.0% of the tokens have
status 1.

We have examined the most frequent instances of tokens with status 3, and
have corrected all the ones that differed only mildly from the proper
solution in SAMA. A residue is left that we comment on here. Each word is
listed as it appears in the source text, preceded by the number of times
it occurs. We include here only tokens that occur 15 times or more,
although other, less frequent tokens also fall into some of the following
groups.

---------------------------
Correct solution not in SAMA
---------------------------

There are cases for which the correct solution is missing from SAMA 3.1
and a new entry is needed:

61 ldynA
41 ldyhA
27 ldyhm
26 ldyh

These appear as NOUN+PRON in SAMA 3.1, when they should be
NOUN+POSS_PRON.

---------------------------
Change required to treebank
---------------------------

84 jzylA
22 Almzyd
16 AlSdry
16 >vnA'
15 kAfp

These are cases in which some instances of these tokens in the current
segment have a morphological/POS solution that should be changed to be
consistent with a solution in SAMA (other instances of these tokens are
already consistent with SAMA, and so have status 1). In general, these
changes relate to NOUN and ADJ, and correcting them would require changes
to both the tree and the tokens. We have decided to leave this set of
cases for the next revision of this segment.

---------------------------
NOUN_PROP issues
---------------------------

41 AlErAqyp
15 AlEAlm

SAMA has a default for unknown NOUN_PROP words with the lemma DEFAULT and
no vocalization. Some such cases were given more informative annotations
in the corpus.
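
As a cross-check of the status 1 coverage figure quoted earlier in this
file, the following minimal Python sketch recomputes the total and the
percentage from the four status counts. The names STATUS_COUNTS and
status1_coverage are illustrative assumptions and are not part of any
corpus tooling.

```python
# Token counts by status, copied from the status table in this file.
# The dictionary layout is an assumption made for illustration only.
STATUS_COUNTS = {1: 415924, 2: 735, 3: 3474, 4: 12843}


def status1_coverage(counts):
    """Return the share of SAMA-checked tokens (status 1-3) with status 1.

    Status 4 tokens are excluded from the check with SAMA, so they are
    left out of the denominator.
    """
    checked = sum(n for status, n in counts.items() if status != 4)
    return counts[1] / checked


total = sum(STATUS_COUNTS.values())
coverage = status1_coverage(STATUS_COUNTS)

print(total)              # 432976
print(f"{coverage:.1%}")  # 99.0%
```

Keeping the counts in one mapping means the same function can be rerun
unchanged if the counts are revised in a later release.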