README for BBN Pronoun Coreference and Entity Type Corpus

Authors: Ralph Weischedel and Ada Brunstein

LDC2005T33

1.0 Introduction

This file contains documentation on the BBN Pronoun Coreference and Entity Type Corpus, Linguistic Data Consortium (LDC) catalog number LDC2005T33, ISBN 1-58563-362-3.

This publication supplements the 1 million word Penn Treebank corpus of Wall Street Journal texts (LDC95T7). The corpus contains stand-off annotation of pronoun coreference, indicated by sentence and token numbers, as well as annotation of a variety of entity and numeric types. All annotation was done by hand at BBN using proprietary annotation tools. This corpus was developed by BBN to support the ACE and AQUAINT programs.

2.0 Data

The corpus contains two components:

1) Pronoun coreference. Stand-off annotation of pronoun coreference of the WSJ corpus is provided in a single file. Pronouns and antecedents are indexed by sentence and token numbers.

2) Entity types. The corpus includes annotation of 12 named entity types (Person, Facility, Organization, GPE, Location, Nationality, Product, Event, Work of Art, Law, Language, and Contact-Info), nine nominal entity types (Person, Facility, Organization, GPE, Product, Plant, Animal, Substance, Disease and Game), and seven numeric types (Date, Time, Percent, Money, Quantity, Ordinal and Cardinal). Several of these types are further divided into subtypes. Annotation for a total of 64 subtypes is provided.

3.0 Structure

The /data directory contains BBN-wsj-pronouns and WSJtypes-subtypes.

The /docs directory contains BBN-Types-Subtypes.html and README.txt.

4.0 Copyright

Portions (c) 1989 Wall Street Journal