Prague Dependency Treebank 2.0

Item Name: Prague Dependency Treebank 2.0
Author(s): Jan Hajič, Jarmila Panevová, Eva Hajičová, Petr Sgall, Petr Pajas, Jan Štěpánek, Jiří Havelka, Marie Mikulová, Zdeněk Žabokrtský, Magda Ševčíková-Razímová, Zdeňka Urešová
LDC Catalog No.: LDC2006T01
ISBN: 1-58563-370-4
ISLRN: 942-053-729-014-3
Release Date: July 21, 2006
Member Year(s): 2006
DCMI Type(s): Text
Data Source(s): newswire, news magazine, journal articles
Application(s): parsing, language teaching, language modeling, information retrieval, information extraction, tagging
Language(s): Czech
Language ID(s): ces
License(s): Prague Dependency Treebank 2.0
Online Documentation: LDC2006T01 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Hajič, Jan, et al. Prague Dependency Treebank 2.0 LDC2006T01. CD. Philadelphia: Linguistic Data Consortium, 2006.

Introduction

The Prague Dependency Treebank 2.0 (PDT 2.0) contains a large amount of Czech text with complex, interlinked annotation at three levels: morphological (2 million words), syntactic (1.5 million words), and semantic (0.8 million words). In addition, certain properties of sentence information structure and coreference relations are annotated at the semantic level.

PDT 2.0 is based on the long-standing Praguian linguistic tradition, adapted to the needs of current computational linguistics research. The corpus itself uses the latest annotation technology. Software tools for corpus search, annotation, and language analysis are included, along with extensive documentation in English.

Samples

For an example of the data in this publication, please examine these samples.
