Korean Treebank Annotations Version 2.0


Item Name: Korean Treebank Annotations Version 2.0
Authors: Na-Rae Han, Shijong Ryu, Sook-Hee Chae, Seung-yun Yang, Seunghun Lee, and Martha Palmer
LDC Catalog No.: LDC2006T09
ISBN: 1-58563-381-X
Release Date: Apr 17, 2006
Data Type: text
Data Source(s): newswire
Application(s): automatic content extraction, discourse analysis, information detection, information extraction, morphology learning, natural language processing, parsing, part of speech tagging, syntactic parsing
Language(s): Korean
Language ID(s): kor
Distribution: Web Download
Member Fee: $0 for 2006 members
Non-member Fee: US $500.00
Reduced-License Fee: US $250.00
Extra-Copy Fee: N/A
Non-member License: yes
Member License: yes
Online documentation: yes
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Na-Rae Han, et al. 2006. Korean Treebank Annotations Version 2.0. Linguistic Data Consortium, Philadelphia.

Introduction

The Korean Treebank Annotations Version 2.0 is an extension of the Korean English Treebank Annotations corpus (LDC2002T26, 2002). It is an electronic corpus of Korean texts annotated with morphological and syntactic information. The source texts were selected from the Korean Newswire corpus published by LDC (catalog number LDC2000T45), a collection of Korean Press Agency news articles dated June 2, 1994 through March 20, 2000. Korean Treebank 2.0 draws on the March 2000 portion of that corpus and comprises 647 articles. Among other uses, the annotated corpus can serve as training data for morphological analyzers, part-of-speech taggers and syntactic parsers.
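Treebanks of this kind are commonly distributed as Penn Treebank-style bracketed trees whose leaves carry morpheme-level analyses. Assuming that layout (this page does not specify the file format), a minimal sketch of reading one bracketed tree into nested Python lists, collecting its leaves, and splitting each leaf into (morpheme, tag) pairs might look like the following. The helpers `parse_tree`, `leaves` and `split_morphemes`, the `morph/TAG+morph/TAG` leaf notation, and the sample tree with its tags (NPN, PAU, VV, EPF, EFN) are all illustrative assumptions, not taken from the corpus files.

```python
import re
from typing import List, Tuple, Union

# A tree is either a leaf string or a list: [label, child1, child2, ...].
Tree = Union[str, list]

# Tokens are "(", ")", or any run of characters that is not whitespace or a bracket.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def parse_tree(text: str) -> Tree:
    """Parse one bracketed tree into nested lists (hypothetical helper)."""
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def parse() -> Tree:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets: input ended early")
        tok = tokens[pos]
        if tok == ")":
            raise ValueError("unexpected ')'")
        pos += 1
        if tok != "(":
            return tok  # a leaf token
        # A bare token right after "(" is the node label; otherwise the label is "".
        label = ""
        if pos < len(tokens) and tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        node: list = [label]
        while pos < len(tokens) and tokens[pos] != ")":
            node.append(parse())
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets: missing ')'")
        pos += 1  # consume the closing ")"
        return node

    tree = parse()
    if pos != len(tokens):
        raise ValueError("extra tokens after the first tree")
    return tree


def leaves(tree: Tree) -> List[str]:
    """Collect leaf tokens from left to right."""
    if isinstance(tree, str):
        return [tree]
    out: List[str] = []
    for child in tree[1:]:  # index 0 holds the node label
        out.extend(leaves(child))
    return out


def split_morphemes(token: str) -> List[Tuple[str, str]]:
    """Split an assumed 'morph/TAG+morph/TAG' leaf into (morph, tag) pairs."""
    pairs = []
    for piece in token.split("+"):
        morph, _, tag = piece.rpartition("/")
        pairs.append((morph, tag))
    return pairs


if __name__ == "__main__":
    # Illustrative tree only; not copied from the corpus.
    tree = parse_tree("(S (NP-SBJ 그/NPN+은/PAU) (VP 가/VV+었/EPF+다/EFN))")
    for tok in leaves(tree):
        print(split_morphemes(tok))
```

Nested lists keep the sketch dependency-free. Real corpus files may hold many trees per file, comment lines, or different leaf conventions, which this sketch does not attempt to handle; check the release documentation for the actual format before relying on it.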

The text is encoded in KSC-5601 (EUC-KR). Version 1.1 of the treebank is also included in this release.
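Because the files are EUC-KR rather than UTF-8, tools that assume UTF-8 will mis-decode the Korean text. A minimal sketch of reading a file with the correct codec, and optionally writing a UTF-8 copy, is shown below using Python's standard `euc_kr` codec. The helper names `read_corpus_text` and `convert_to_utf8` are hypothetical, and no particular file names from the release are assumed.

```python
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]

# "euc_kr" is Python's standard-library codec name for EUC-KR (KSC-5601).
CORPUS_ENCODING = "euc_kr"


def read_corpus_text(path: PathLike) -> str:
    """Decode one EUC-KR corpus file into a Python str (hypothetical helper)."""
    return Path(path).read_text(encoding=CORPUS_ENCODING)


def convert_to_utf8(src: PathLike, dst: PathLike) -> None:
    """Write a UTF-8 copy of an EUC-KR file, e.g. for UTF-8-only tools."""
    Path(dst).write_text(read_corpus_text(src), encoding="utf-8")
```

Decoding uses strict error handling by default, so any bytes that are not valid EUC-KR raise an error instead of being silently replaced. In EUC-KR each Hangul syllable takes two bytes, so `"한국어".encode("euc_kr")` is six bytes long.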

Samples

For an example of the data in the corpus, please review this sample.

Content Copyright

2001-2002 CoGenTex, Inc.; 2000 Korean Press Agency; 2000-2005, 2006 Trustees of the University of Pennsylvania