BBN Pronoun Coreference and Entity Type Corpus

Item Name: BBN Pronoun Coreference and Entity Type Corpus
Author(s): Ralph Weischedel, Ada Brunstein
LDC Catalog No.: LDC2005T33
ISBN: 1-58563-362-3
ISLRN: 375-520-999-436-0
DOI: https://doi.org/10.35111/9fx9-gz10
Release Date: September 20, 2005
Member Year(s): 2005
DCMI Type(s): Text
Project(s): ACE, AQUAINT, GALE, TIDES
Language(s): English
Language ID(s): eng
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2005T33 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Weischedel, Ralph, and Ada Brunstein. BBN Pronoun Coreference and Entity Type Corpus LDC2005T33. Web Download. Philadelphia: Linguistic Data Consortium, 2005.

Introduction

BBN Pronoun Coreference and Entity Type Corpus was developed by BBN Technologies (BBN) and contains approximately 24,000 pronoun coreferences as well as entity and numeric annotation for approximately 2,300 documents.

This publication supplements the one million words of Wall Street Journal (WSJ) text in Treebank-2 (LDC95T7). The corpus contains stand-off annotation of pronoun coreference, indicated by sentence and token numbers, as well as annotation of a variety of entity and numeric types. All annotation was done by hand at BBN using proprietary annotation tools. BBN developed the corpus to support the ACE and AQUAINT programs.

Data

The corpus contains two components:

  • Pronoun coreference: Stand-off annotation of pronoun coreference for the WSJ corpus is provided in a single file. Pronouns and antecedents are indexed by sentence and token numbers.
  • Entity types: The corpus includes annotation of 12 named entity types (Person, Facility, Organization, GPE, Location, Nationality, Product, Event, Work of Art, Law, Language, and Contact-Info), nine nominal entity types (Person, Facility, Organization, GPE, Product, Plant, Animal, Substance, Disease and Game), and seven numeric types (Date, Time, Percent, Money, Quantity, Ordinal and Cardinal). Several of these types are further divided into subtypes. Annotation for a total of 64 subtypes is provided.
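Because the coreference annotation is stand-off, each pronoun and antecedent has to be resolved against the tokenized WSJ text using its sentence and token numbers. The following is a minimal sketch of that lookup, not the corpus's actual file format: the `Span` layout (sentence index, first token, last token), the 0-based inclusive indexing, the `resolve_span` helper, and the two example sentences are all assumptions made for illustration. The real file layout and index base are described in the corpus documentation.

```python
from typing import List, Tuple

# Hypothetical span layout (an assumption, not the corpus format):
# (sentence index, first token index, last token index), 0-based, inclusive.
Span = Tuple[int, int, int]


def resolve_span(sentences: List[List[str]], span: Span) -> str:
    """Return the text that a stand-off span points to."""
    sent_idx, start, end = span
    if not 0 <= sent_idx < len(sentences):
        raise IndexError(f"sentence index {sent_idx} out of range")
    tokens = sentences[sent_idx]
    if not 0 <= start <= end < len(tokens):
        raise IndexError(f"token range {start}-{end} out of range")
    return " ".join(tokens[start:end + 1])


# Made-up tokenized sentences standing in for Treebank text.
sentences = [
    ["The", "company", "reported", "a", "loss", "."],
    ["It", "blamed", "higher", "costs", "."],
]

# One hypothetical coreference link: pronoun span -> antecedent span.
pronoun: Span = (1, 0, 0)
antecedent: Span = (0, 0, 1)

print(resolve_span(sentences, pronoun), "->", resolve_span(sentences, antecedent))
```

Keeping the annotation as index triples rather than copied text means the source sentences stay unmodified, and the same lookup can serve any annotation layer that is indexed by sentence and token numbers.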

Samples

For an example of the data in this corpus, please examine the following samples:

Updates

None at this time.
