
2007 CoNLL Shared Task - Greek, Hungarian & Italian

Item Name:	2007 CoNLL Shared Task - Greek, Hungarian & Italian
Author(s):	Dipartimento di Informatica of the University of Pisa, ILC-CNR, Institute for Language and Speech Processing, Institute of Informatics at the University of Szeged, Institute of Linguistics at the Hungarian Academy of Sciences, Morphologic Ltd.
LDC Catalog No.:	LDC2018T07
ISBN:	1-58563-828-5
ISLRN:	270-733-242-642-3
DOI:	https://doi.org/10.35111/f18m-s394
Release Date:	January 18, 2018
Member Year(s):	2018
DCMI Type(s):	Text
Data Source(s):	web collection, newswire, news magazine
Project(s):	CoNLL
Application(s):	syntactic parsing
Language(s):	Modern Greek (1453-), Hungarian, Italian
Language ID(s):	ell, hun, ita
License(s):	2007 CoNLL Shared Task – Greek, Hungarian & Italian Agreement
Online Documentation:	LDC2018T07 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Dipartimento di Informatica of the University of Pisa, et al. 2007 CoNLL Shared Task - Greek, Hungarian & Italian LDC2018T07. Web Download. Philadelphia: Linguistic Data Consortium, 2018.
Related Works:
	isSameAs
		ELRA-W0122 http://catalog.elra.info/en-us/repository/browse/ELRA-W0122
	isAnnotationOf
		Greek Dependency Treebank http://gdt.ilsp.gr/
		ISST-CoNLL http://medialab.di.unipi.it/isst/
		The Szeged Treebank (SzTB) http://www.inf.u-szeged.hu/projectdirs/hlt/en/Szeged%20Treebank%202.0_en.html
	isContinuationOf
		LDC2018T06 2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish
	hasContinuation
		LDC2018T08 2007 CoNLL Shared Task - Arabic & English
	isSimilarWith
		LDC2009T12 2008 CoNLL Shared Task Data
		LDC2012T03 2009 CoNLL Shared Task Part 1
		LDC2012T04 2009 CoNLL Shared Task Part 2
		LDC2015T11 2006 CoNLL Shared Task - Ten Languages
		LDC2015T12 2006 CoNLL Shared Task - Arabic & Czech
		LDC2017T13 2015-2016 CoNLL Shared Task

Introduction

2007 CoNLL Shared Task - Greek, Hungarian & Italian consists of dependency treebanks in three languages used as part of the CoNLL 2007 shared task on multi-lingual dependency parsing and domain adaptation. The languages covered in this release are: Greek, Hungarian and Italian.

LDC also released the following 2006 & 2007 CoNLL Shared Task corpora:

2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish (LDC2018T06)
2007 CoNLL Shared Task - Arabic & English (LDC2018T08)
2006 CoNLL Shared Task - Ten Languages (LDC2015T11)
2006 CoNLL Shared Task - Arabic & Czech (LDC2015T12)

This corpus is cross listed and jointly released with ELRA as ELRA-W0122.

The Conference on Computational Natural Language Learning (CoNLL) is accompanied every year by a shared task intended to promote natural language processing applications and evaluate them in a standard setting. In 2006 and 2007, the shared tasks were devoted to the parsing of syntactic dependencies using corpora from up to thirteen languages. The task aimed to define and extend the then-current state of the art in dependency parsing, a technology that complemented previous tasks by producing a different kind of syntactic description of input text. The 2007 shared task added a domain adaptation track for English in addition to the multilingual track. More information about the 2007 shared task is available at the CoNLL Previous Tasks web site.

LDC has released data sets from other CoNLL shared tasks. 2008 CoNLL Shared Task Data (LDC2009T12) contains the English material used in the 2008 shared task, which employed a unified dependency-based formalism and merged the tasks of syntactic dependency parsing, identifying semantic arguments and labeling them with semantic roles. 2009 CoNLL Shared Task Data Parts 1 and 2 (LDC2012T03 and LDC2012T04) consist of the English, Catalan, Chinese, Czech, German and Spanish resources used in the 2009 task. That task included a comparison of time and space complexity based on participants' input, and a learning curve comparison for languages with large datasets. 2015-2016 CoNLL Shared Task (LDC2017T13) contains Chinese and English resources used in the 2015 and 2016 shared tasks on shallow discourse parsing.

Data

The source data in the treebanks in this release consists principally of various texts (e.g., textbooks, news, literature) annotated in dependency format. In general, dependency grammar is based on the idea that the verb is the center of the clause structure and that the other units in the sentence are connected to the verb, directly or indirectly, by directed links called dependencies. The correspondence is one-to-one: for every element in the sentence there is exactly one node in the sentence structure that corresponds to that element. In constituency or phrase structure grammars, on the other hand, clauses are divided into noun phrases and verb phrases, and one or more nodes in the sentence structure may correspond to a single element. The Penn Treebank (LDC99T42) is an example of a constituency or phrase structure approach. All of the data sets in this release are dependency treebanks.
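The one-to-one correspondence described above can be illustrated with a short Python sketch. It is not taken from the release documentation: the sentence is invented, the names Token, parse_sentence and dependents are hypothetical, and the tab-separated ten-column layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) is assumed here rather than stated on this page. Each token becomes exactly one node, and each node stores the index of its head, with 0 marking the root.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    idx: int      # 1-based position of the token in the sentence
    form: str     # surface word form
    head: int     # index of the governing token; 0 marks the root
    deprel: str   # label of the dependency relation to the head


# Invented example sentence, one token per line, ten tab-separated columns
# (assumed layout): ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL
SAMPLE = "\n".join([
    "1\tThe\tthe\tD\tDT\t_\t2\tdet\t_\t_",
    "2\tparser\tparser\tN\tNN\t_\t3\tsubj\t_\t_",
    "3\treads\tread\tV\tVBZ\t_\t0\troot\t_\t_",
    "4\ttreebanks\ttreebank\tN\tNNS\t_\t3\tobj\t_\t_",
])


def parse_sentence(block: str) -> List[Token]:
    """Turn one sentence block into a list of nodes, one per token."""
    tokens = []
    for line in block.strip().splitlines():
        cols = line.split("\t")
        tokens.append(Token(int(cols[0]), cols[1], int(cols[6]), cols[7]))
    return tokens


def dependents(tokens: List[Token], head_idx: int) -> List[str]:
    """Return the word forms directly attached to the given head index."""
    return [t.form for t in tokens if t.head == head_idx]


tokens = parse_sentence(SAMPLE)
root = [t for t in tokens if t.head == 0]

print(len(tokens))            # number of nodes equals number of tokens
print(root[0].form)           # the verb sits at the center of the clause
print(dependents(tokens, 3))  # units linked directly to the verb
```

In a constituency representation the same sentence would also need phrase-level nodes such as a noun phrase over "The parser", so the node count would exceed the token count.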

The individual data sets are:

Greek: Greek Dependency Treebank
Hungarian: The Szeged Treebank (SzTB)
Italian: ISST-CoNLL

Samples

Please view these samples:

Updates

None at this time.

Copyright

Portions © 2007 Dipartimento di Informatica of the University of Pisa, © 2007 ILC-CNR, © 2005-2007 Institute for Language and Speech Processing, © 2000-2007 Institute of Informatics at the University of Szeged, Hungary, © 2000-2007 Institute of Linguistics at the Hungarian Academy of Sciences, © 2000-2007 Morphologic Ltd. Budapest, © 2018 Trustees of the University of Pennsylvania