File: 0readme.txt

Readme File for the Korean-English Parallel Treebank Corpus

This corpus consists of 33 texts originally written in Korean and
translated into English for purposes of language training in a military
setting. The texts were made available for linguistic research by the
Defense Language Institute (DLI). They were delivered on paper to the
Institute for Research in Cognitive Science (IRCS) at the University of
Pennsylvania, where they were typed into data files using the KSC 5601
character set encoding (also known as KS X 1001 Wansung).

Both the Korean and English texts are presented with complete Treebank
annotation, including syntactic constituent bracketing and
part-of-speech (POS) tagging. All annotation was done manually at IRCS.
Further documentation about the parsing and POS specifications used in
these annotations can be found here:

    http://www.cis.upenn.edu/~xtag/koreantag/

The text files mostly contain sets of question and answer sentences.
Each sentence is presented first in full, unannotated form on a single
line that begins with a semicolon character ";". The first token on
such a line (the string preceding the first space character on the
line) is a sentence-identifier tag; the same tag identifies the English
and Korean versions of the sentence. The parsed/POS-tagged annotation
of the sentence follows on subsequent lines.

There are a total of 5083 sentences in the 33 data files for each
language; the number of sentences per data file ranges from 79 to 245.
For convenience, two table files are provided that list the file names,
sentence-ID tags and sentence data: "sentence-list.eng" for the English
files, "sentence-list.kor" for the Korean.

Acknowledgements:

The Korean/English Treebank annotation at IRCS was funded by a
subcontract from CoGenTex, based on an Army Research Lab SBIR Phase II,
DAAL01-97-C-0016, for Korean/English Machine Translation of Battlefield
Messages, and also by DARPA TIDES Grant N66001-00-1-8915.

Members of the IRCS Korean/English annotation crew:

    Korean             - Chung-Hye Han, Na-Rae Han, Eon-Suk Ko, Hee-Jong Yi
    English            - Alan Lee, Chris Walker, John Duda, Nianwen Xue
    Quality Control    - Nianwen Xue
    Project Management - Martha Palmer
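
Example: reading the sentence lines

The data-file layout described above (a ";" line carrying the
sentence-ID tag and the raw sentence, followed by the annotation lines)
could be read with a short script such as the following. This is only
a sketch and is not part of the corpus distribution. The function names
read_sentences and align are hypothetical. The sketch assumes that the
semicolon is not part of the sentence-ID tag and that a blank line
carries no annotation; check both assumptions against the actual files.

```python
from typing import Iterable, List, Tuple

# (sentence_id, raw_sentence, annotation_lines)
Record = Tuple[str, str, List[str]]


def read_sentences(lines: Iterable[str]) -> List[Record]:
    """Group the lines of one data file into per-sentence records."""
    records: List[Record] = []
    current = None
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(";"):
            # Assumption: leading semicolons are markers, not part of the ID.
            header = line.lstrip(";").strip()
            # The ID is the string before the first space; the rest is the sentence.
            sent_id, _, text = header.partition(" ")
            current = (sent_id, text.strip(), [])
            records.append(current)
        elif current is not None and line.strip():
            # Non-blank lines after a header belong to that sentence's annotation.
            current[2].append(line)
    return records


def align(eng: List[Record], kor: List[Record]) -> List[Tuple[str, str, str]]:
    """Pair English and Korean raw sentences that share a sentence-ID tag."""
    kor_by_id = {sid: text for sid, text, _ in kor}
    return [(sid, text, kor_by_id[sid]) for sid, text, _ in eng if sid in kor_by_id]
```

To read the Korean files, the lines would typically come from a file
opened with Python's "euc_kr" codec, for example
open(path, encoding="euc_kr"), on the assumption that the files are
stored as EUC-KR bytes, the usual byte encoding of KS X 1001.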