The ARRAU (Anaphora Resolution and Underspecification) Corpus of Anaphoric
Information was developed by the University
of Essex and the University of Trento.
It contains annotations of multi-genre English texts for anaphoric relations,
including information about agreement, explicit representation of multiple
antecedents for ambiguous anaphoric expressions, and discourse antecedents for
expressions that refer to abstract entities such as events, actions and plans.
The source texts in this release include task-oriented dialogues from the TRAINS-91
and TRAINS-93 corpora (the latter released through LDC as TRAINS Spoken Dialog
Corpus, LDC95S25), narratives from the English Pear Stories (a collection
of narratives by subjects who watched a film and then recounted its contents),
articles from the Wall Street Journal portions of the Penn
Treebank (Treebank-2, LDC95T7) and the RST Discourse Treebank (LDC2002T07), and
the Vieira/Poesio Corpus, which consists of training and test files from Treebank-2
and the RST Discourse Treebank.
The texts were annotated using the ARRAU guidelines, which treat all noun phrases
(NPs) as markables. Different semantic roles are recognized by distinguishing
between referring expressions (which update or refer to a discourse model) and
non-referring ones (including expletives, predicative expressions, quantifiers,
and coordination). A variety of linguistic features were also annotated, including
morphosyntactic agreement, grammatical function, semantic type (person, animate,
concrete, action, time, other abstract) and genericity. The annotation was carried
out using the MMAX2 annotation tool,
which allows text units to be marked at different levels.
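
As a rough illustration of how stand-off, multi-level annotation of this kind can be read, the following Python sketch pairs a token list with a markable layer that points into it by word ID. It is a minimal sketch under stated assumptions, not the corpus's actual layout: the sample XML is hand-written rather than taken from the release, the attribute name "reference" and the helper names span_to_ids and markable_texts are hypothetical, and real MMAX2 files typically also declare an XML namespace per annotation level, which this sketch omits.

```python
import xml.etree.ElementTree as ET

# Hand-written sample data (an assumption for illustration, not corpus content).
WORDS_XML = """<words>
  <word id="word_1">The</word>
  <word id="word_2">engine</word>
  <word id="word_3">is</word>
  <word id="word_4">at</word>
  <word id="word_5">Avon</word>
</words>"""

# One annotation level: each markable refers to tokens by ID span.
# The "reference" attribute is a hypothetical feature name.
MARKABLES_XML = """<markables>
  <markable id="markable_1" span="word_1..word_2" reference="new"/>
  <markable id="markable_2" span="word_5" reference="new"/>
</markables>"""


def span_to_ids(span):
    """Expand a span such as 'word_1..word_3,word_7' into a list of word IDs."""
    ids = []
    for part in span.split(","):
        part = part.strip()
        if ".." in part:
            start, end = part.split("..")
            prefix, first = start.rsplit("_", 1)
            last = end.rsplit("_", 1)[1]
            ids.extend(f"{prefix}_{i}" for i in range(int(first), int(last) + 1))
        else:
            ids.append(part)
    return ids


def markable_texts(words_xml, markables_xml):
    """Map each markable ID to (surface text, value of the 'reference' attribute)."""
    words = {w.get("id"): w.text for w in ET.fromstring(words_xml).iter("word")}
    result = {}
    for m in ET.fromstring(markables_xml).iter("markable"):
        tokens = [words[i] for i in span_to_ids(m.get("span"))]
        result[m.get("id")] = (" ".join(tokens), m.get("reference"))
    return result


if __name__ == "__main__":
    for mid, (text, ref) in markable_texts(WORDS_XML, MARKABLES_XML).items():
        print(mid, repr(text), ref)
```

Keeping the tokens and each annotation level in separate structures is what lets several levels mark the same text independently; additional levels would simply be further markable sets resolved against the same word IDs.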
The files in MMAX format have been organized so that they can be visualized
using the MMAX2 tool or used directly as input/output for the BART
toolkit, which performs automatic coreference resolution including all necessary
preprocessing steps.
Portions © 1987-1989 Dow Jones & Company, Inc., © 2001 Dr. Mary
S. Erbaugh, © 2013 Massimo Poesio, © 1995, 1999, 2002, 2013 Trustees
of the University of Pennsylvania