2008 CoNLL Shared Task Data
Item Name: | 2008 CoNLL Shared Task Data |
Author(s): | Mihai Surdeanu, Richard Johansson, Lluis Marquez, Adam Meyers, Joakim Nivre |
LDC Catalog No.: | LDC2009T12 |
ISBN: | 1-58563-505-7 |
ISLRN: | 757-340-046-619-2 |
DOI: | https://doi.org/10.35111/mad1-yd84 |
Release Date: | May 22, 2009 |
Member Year(s): | 2009 |
DCMI Type(s): | Text |
Data Source(s): | newswire, news magazine |
Application(s): | natural language processing |
Language(s): | English |
Language ID(s): | eng |
License(s): | LDC User Agreement for Non-Members |
Online Documentation: | LDC2009T12 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Surdeanu, Mihai, et al. 2008 CoNLL Shared Task Data LDC2009T12. Web Download. Philadelphia: Linguistic Data Consortium, 2009. |
Introduction
2008 CoNLL Shared Task Data, Linguistic Data Consortium (LDC) catalog number LDC2009T12 and ISBN 1-58563-505-7, contains the trial corpus, training corpus, development data and test data for the 2008 CoNLL (Conference on Computational Natural Language Learning) Shared Task Evaluation. The 2008 Shared Task developed syntactic dependency annotations that include information such as named-entity boundaries, together with semantic dependencies that model the roles of both verbal and nominal predicates. The Shared Task data consists of excerpts from the following corpora: Treebank-3 LDC99T42, BBN Pronoun Coreference and Entity Type Corpus LDC2005T33, Proposition Bank I LDC2004T14 (PropBank) and NomBank v 1.0 LDC2008T23.
The Conference on Computational Natural Language Learning (CoNLL) is accompanied every year by a shared task intended to promote natural language processing applications and evaluate them in a standard setting. The 2004 and 2005 CoNLL shared tasks were dedicated to semantic role labeling (SRL) in a monolingual setting (English). In 2006 and 2007, the shared tasks were devoted to the parsing of syntactic dependencies and used corpora from up to thirteen languages. The 2008 shared task employed a unified dependency-based formalism and merged syntactic dependency parsing with the task of identifying semantic arguments and labeling them with semantic roles.
LDC has also released the following CoNLL Shared Task data sets:
- 2006 CoNLL Shared Task - Ten Languages (LDC2015T11)
- 2006 CoNLL Shared Task - Arabic & Czech (LDC2015T12)
- 2009 CoNLL Shared Task Part 1 (LDC2012T03)
- 2009 CoNLL Shared Task Part 2 (LDC2012T04)
- 2015-2016 CoNLL Shared Task (LDC2017T13)
Data
The 2008 shared task was divided into three subtasks:
- parsing syntactic dependencies
- identification and disambiguation of semantic predicates
- identification of arguments and assignment of semantic roles for each predicate
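The three subtasks operate on a single dependency-based representation of each sentence. The Python sketch below shows one way to hold that representation in memory and read off each subtask's output. It rests on several assumptions that do not come from this catalog entry: the class names (`Token`, `Predicate`), the helper functions, the toy sentence, and the specific labels (`SBJ`, `OBJ`, `A0`, `A1`, the sense `buy.01`). All of these are hypothetical illustrations. The sketch does not reproduce the distributed file format.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory model of one sentence; NOT the official file format.

@dataclass
class Token:
    idx: int      # 1-based token position in the sentence
    form: str     # surface word form
    head: int     # index of the syntactic head (0 = artificial root)
    deprel: str   # syntactic dependency label (illustrative)

@dataclass
class Predicate:
    idx: int      # token index of the predicate (verbal or nominal)
    sense: str    # disambiguated predicate sense (illustrative)
    args: dict = field(default_factory=dict)  # argument head index -> role

def syntactic_arcs(tokens):
    """Subtask 1: the syntactic tree as (head, dependent, label) triples."""
    return [(t.head, t.idx, t.deprel) for t in tokens]

def predicate_senses(preds):
    """Subtask 2: identified predicates mapped to their chosen senses."""
    return {p.idx: p.sense for p in preds}

def semantic_arcs(preds):
    """Subtask 3: semantic dependencies as (predicate, argument head, role)."""
    return sorted((p.idx, a, r) for p in preds for a, r in p.args.items())

# Toy sentence, invented for illustration: "John bought a car ."
tokens = [
    Token(1, "John", 2, "SBJ"),
    Token(2, "bought", 0, "ROOT"),
    Token(3, "a", 4, "NMOD"),
    Token(4, "car", 2, "OBJ"),
    Token(5, ".", 2, "P"),
]
preds = [Predicate(2, "buy.01", {1: "A0", 4: "A1"})]

print(syntactic_arcs(tokens))
print(predicate_senses(preds))  # {2: 'buy.01'}
print(semantic_arcs(preds))     # [(2, 1, 'A0'), (2, 4, 'A1')]
```

Both layers are stored as arcs over the same token indices. That shared indexing is what lets a system treat syntactic and semantic dependencies jointly.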
Several objectives were addressed in this shared task:
- SRL was performed and evaluated using a dependency-based representation for both syntactic and semantic dependencies. While SRL on top of a dependency treebank has been addressed before, the approach of the 2008 Shared Task was characterized by the following novelties:
  - The constituent-to-dependency conversion strategy transformed all annotated semantic arguments in PropBank and NomBank v 1.0, not just a subset;
  - The annotations addressed propositions centered around both verbal (PropBank) and nominal (NomBank) predicates.
- Based on the observation that a richer set of syntactic dependencies improves semantic processing, the syntactic dependencies modeled are more complex than the ones used in the previous CoNLL shared tasks. For example, the corpus includes apposition links, dependencies derived from named entity (NE) structures, and better modeling of long-distance grammatical relations.
- A practical framework is provided for the joint learning of syntactic and semantic dependencies.
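In a dependency-based representation, a PropBank or NomBank argument that spans a constituent is commonly represented by a single token: the syntactic head of that span. The sketch below is a deliberately simplified illustration of that idea and is NOT the official 2008 conversion, which used more elaborate rules. It assumes that a span's head is any token whose own syntactic head lies outside the span, and it skips spans that do not have exactly one head. The functions `span_head` and `convert_arguments` and the toy head array are hypothetical.

```python
# Simplified, hypothetical span-to-head conversion; not the official 2008 rules.

def span_head(heads, start, end):
    """Return the head token(s) of the span [start, end] (1-based, inclusive).

    heads[i] is the syntactic head of token i; heads[0] is an unused
    placeholder and a head value of 0 denotes the artificial root.
    A token heads the span if its own head lies outside the span.
    """
    return [i for i in range(start, end + 1) if not (start <= heads[i] <= end)]

def convert_arguments(heads, spans):
    """Map each role's (start, end) span to its head token index.

    Simplification: spans without exactly one head are skipped.
    """
    converted = {}
    for role, (start, end) in spans.items():
        span_heads = span_head(heads, start, end)
        if len(span_heads) == 1:
            converted[role] = span_heads[0]
    return converted

# Toy sentence, invented for illustration: "John bought a red car ."
#          idx:  0  1  2  3  4  5  6
heads =        [0, 2, 0, 5, 5, 2, 2]

print(span_head(heads, 3, 5))  # "a red car" is headed by "car" -> [5]
print(convert_arguments(heads, {"A0": (1, 1), "A1": (3, 5)}))  # {'A0': 1, 'A1': 5}
```

Because every argument collapses to one token, semantic roles become labeled arcs over the same tokens as the syntactic tree.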
Due to the complexity of the 2008 shared task, only a single language, English, was used.
Samples
An example of the shared task annotations is provided below: