Prague Dependency Treebank 2.0
Item Name: | Prague Dependency Treebank 2.0 |
Author(s): | Jan Hajič, Jarmila Panevová, Eva Hajičová, Petr Sgall, Petr Pajas, Jan Štěpánek, Jiří Havelka, Marie Mikulová, Zdeněk Žabokrtský, Magda Ševčíková-Razímová, Zdeňka Urešová |
LDC Catalog No.: | LDC2006T01 |
ISBN: | 1-58563-370-4 |
ISLRN: | 942-053-729-014-3 |
DOI: | https://doi.org/10.35111/e6p0-9s32 |
Release Date: | July 21, 2006 |
Member Year(s): | 2006 |
DCMI Type(s): | Text |
Data Source(s): | newswire, news magazine, journal articles |
Application(s): | parsing, language teaching, language modeling, information retrieval, information extraction, tagging |
Language(s): | Czech |
Language ID(s): | ces |
License(s): | Prague Dependency Treebank 2.0 |
Online Documentation: | LDC2006T01 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Hajič, Jan, et al. Prague Dependency Treebank 2.0 LDC2006T01. Web Download. Philadelphia: Linguistic Data Consortium, 2006. |
Introduction
The Prague Dependency Treebank 2.0 (PDT 2.0) contains a large collection of Czech texts with interlinked morphological (2 million words), syntactic (1.5 million words), and complex semantic (0.8 million words) annotation. In addition, certain properties of sentence information structure and coreference relations are annotated at the semantic level.
PDT 2.0 is based on the long-standing Praguian linguistic tradition, adapted to the needs of current computational linguistics research. The corpus itself uses the latest annotation technology. Software tools for corpus search, annotation, and language analysis are included, along with extensive documentation in English.
Samples
For an example of the data in this publication, please examine these samples.