(An updated version of this page, listing known issues, will be kept at http://projects.ldc.upenn.edu/ArabicTreebank/)

As noted in readme-files.txt, we now categorize all source tokens with a status value of 1, 2, 3, or 4, depending on their relation with SAMA. To repeat the information from readme-files.txt, these tokens are categorized as:

  STATUS 1:  122772  Included in SAMA
  STATUS 2:     411  Limited Solution
  STATUS 3:    2526  Pending SAMA Solution
  STATUS 4:   18490  Excluded from check with SAMA
             ================
             144199

Thus, excluding the punctuation and numeric tokens that receive status 4, 122772/(122772+411+2526) = 122772/125709 = 97.7% of the tokens have status 1.

We have examined the most frequent instances of tokens with status 3, and have corrected all the ones that differed only mildly from the proper solution in SAMA. A residue remains, on which we comment here. Each word is listed as it appears in the source text, together with the number of times it occurs. We only include here tokens that occur 15 times or more, although other, less frequently occurring tokens fall into some of the following groups.

---------------------------
Correct solution not in SAMA
---------------------------

There are cases for which the correct solution is missing from SAMA 3.1 and a new entry is needed:

    17 ldyhA
    10 ldyhm
     8 ldynA
     5 ldyh
     3 wldyhA

These appear as NOUN+PRON in SAMA 3.1, when they should be NOUN+POSS_PRON.

    67 >yDAF
     4 AyDAF

There is a "hole" in SAMA 3.1: the solution that appears for AyDA and >yDA does not appear when the "F" is included in the input string.

    24 ynbgy

The IV-based solution for ynbgy, as present in ATB3, is missing in SAMA 3.1.
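The status-1 percentage quoted at the start of these notes can be recomputed directly from the reported counts. The following is a minimal Python sketch; the dictionary layout and variable names are our own illustration, and only the numbers come from the status table.

```python
# Token counts per SAMA status, as reported in the status table.
status_counts = {
    1: 122772,  # Included in SAMA
    2: 411,     # Limited Solution
    3: 2526,    # Pending SAMA Solution
    4: 18490,   # Excluded from check with SAMA (punctuation and numeric tokens)
}

# All source tokens, regardless of status.
total = sum(status_counts.values())

# Tokens actually checked against SAMA: everything except status 4.
checked = total - status_counts[4]

# Share of checked tokens that have status 1.
share_status1 = status_counts[1] / checked

print(total)                   # 144199
print(checked)                 # 125709
print(f"{share_status1:.1%}")  # 97.7%
```

Status 4 is subtracted from the denominator because those tokens are never compared with SAMA, so counting them would understate the coverage figure.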
---------------------------
Change required to treebank
---------------------------

    38 vlAvp
    33 wAHd
    22 wAHdp
    15 wAHdAF
    15 E$r

These are cases in which some instances of these tokens in the current segment have a morphological/POS solution that should be changed to be consistent with a solution in SAMA (other instances of these tokens are already consistent with SAMA, and so have status 1). In general, these are changes relating to NOUN and ADJ, and correcting these cases would require changes to both the tree and the tokens. We have decided to leave this set of cases for the next revision of this segment.

---------------------------
missing IVSUFF_MOOD marker
---------------------------

    23 yjry
    22 ybdw
    20 y&dy
    18 tjry
    17 y>ty
    16 ysEY

There are 962 cases in total in which the annotation for an IV is missing the usual IVSUFF_MOOD marker. These are overwhelmingly cases of verbs with a "weak" letter (y, w). The mood markers for these cases will be added in the future.

---------------------------
miscellaneous
---------------------------

37