2008 CoNLL Shared Task Data

Item Name: 2008 CoNLL Shared Task Data
Author(s): Mihai Surdeanu, Richard Johansson, Lluis Marquez, Adam Meyers, Joakim Nivre
LDC Catalog No.: LDC2009T12
ISBN: 1-58563-505-7
ISLRN: 757-340-046-619-2
Release Date: May 22, 2009
Member Year(s): 2009
DCMI Type(s): Text
Data Source(s): newswire, news magazine
Application(s): natural language processing
Language(s): English
Language ID(s): eng
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2009T12 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Surdeanu, Mihai, et al. 2008 CoNLL Shared Task Data LDC2009T12. Web Download. Philadelphia: Linguistic Data Consortium, 2009.

2008 CoNLL Shared Task Data, Linguistic Data Consortium (LDC) catalog number LDC2009T12 and ISBN 1-58563-505-7, contains the trial corpus, the training corpus, and the development and test data for the 2008 CoNLL (Conference on Computational Natural Language Learning) Shared Task Evaluation. The 2008 Shared Task developed syntactic dependency annotations that include information such as named-entity boundaries, together with semantic dependencies that model the roles of both verbal and nominal predicates. The materials in the Shared Task data consist of excerpts from the following corpora: Treebank-3 LDC99T42, BBN Pronoun Coreference and Entity Type Corpus LDC2005T33, Proposition Bank I LDC2004T14 (PropBank) and NomBank v 1.0 LDC2008T23.

The Conference on Computational Natural Language Learning (CoNLL) is accompanied every year by a shared task intended to promote natural language processing applications and evaluate them in a standard setting. The 2004 and 2005 CoNLL shared tasks were dedicated to semantic role labeling (SRL) in a monolingual setting (English). In 2006 and 2007, the shared tasks were devoted to the parsing of syntactic dependencies and used corpora from up to thirteen languages. The 2008 shared task employed a unified dependency-based formalism and merged the task of syntactic dependency parsing and the task of identifying semantic arguments and labeling them with semantic roles.

The 2008 shared task was divided into three subtasks:

  1. parsing syntactic dependencies
  2. identification and disambiguation of semantic predicates
  3. identification of arguments and assignment of semantic roles for each predicate
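The three subtasks above each contribute one layer of annotation to the same tokens. The following is a minimal sketch of how such a joint representation could be held in memory. It is not the distribution's file format: the class `Token`, its field names, the helper functions, the example sentence, and the PropBank-style labels are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Token:
    """One token carrying syntactic and semantic layers (hypothetical layout)."""
    idx: int                          # 1-based position in the sentence
    form: str                         # surface word form
    head: int                         # subtask 1: index of the syntactic head (0 = root)
    deprel: str                       # subtask 1: syntactic dependency label
    predicate: Optional[str] = None   # subtask 2: disambiguated predicate sense, if any
    # subtask 3: maps an argument token's index to its semantic role
    args: Dict[int, str] = field(default_factory=dict)


def predicates(sentence: List[Token]) -> List[Token]:
    """Return the tokens that were identified as semantic predicates."""
    return [tok for tok in sentence if tok.predicate is not None]


def semantic_edges(sentence: List[Token]) -> List[Tuple[int, int, str]]:
    """List semantic dependencies as (predicate index, argument index, role)."""
    return [
        (tok.idx, arg_idx, role)
        for tok in predicates(sentence)
        for arg_idx, role in sorted(tok.args.items())
    ]


# Invented example sentence with illustrative labels (not taken from the corpus).
example = [
    Token(1, "Investors", head=2, deprel="SBJ"),
    Token(2, "bought", head=0, deprel="ROOT",
          predicate="buy.01", args={1: "A0", 3: "A1"}),
    Token(3, "shares", head=2, deprel="OBJ"),
]

for pred_idx, arg_idx, role in semantic_edges(example):
    print(example[pred_idx - 1].form, role, example[arg_idx - 1].form)
```

Storing both layers on the same token objects reflects the unified dependency-based formalism described above: syntactic heads and semantic arguments are both expressed as links between token indices.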

Several objectives were addressed in this shared task:

  • SRL was performed and evaluated using a dependency-based representation for both syntactic and semantic dependencies. While SRL on top of a dependency treebank has been addressed before, the approach of the 2008 Shared Task was characterized by the following novelties:
    1. The constituent-to-dependency conversion strategy transformed all annotated semantic arguments in PropBank and NomBank v 1.0, not just a subset;
    2. The annotations addressed propositions centered around both verbal (PropBank) and nominal (NomBank) predicates.
  • Based on the observation that a richer set of syntactic dependencies improves semantic processing, the syntactic dependencies modeled are more complex than the ones used in the previous CoNLL shared tasks. For example, the corpus includes apposition links, dependencies derived from named entity (NE) structures, and better modeling of long-distance grammatical relations.
  • A practical framework is provided for the joint learning of syntactic and semantic dependencies.

Due to the complexity of the 2008 shared task, only a single language, English, was used.


An example of the shared task annotations is provided below:

