FactBank 1.0

Item Name: FactBank 1.0
Author(s): Roser Sauri, James Pustejovsky
LDC Catalog No.: LDC2009T23
ISBN: 1-58563-522-7
ISLRN: 293-001-569-603-0
DOI: https://doi.org/10.35111/tk5m-zw90
Release Date: September 15, 2009
Member Year(s): 2009
DCMI Type(s): Text
Data Source(s): newswire
Application(s): temporal analysis, question-answering
Language(s): English
Language ID(s): eng
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2009T23 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Sauri, Roser, and James Pustejovsky. FactBank 1.0 LDC2009T23. Web Download. Philadelphia: Linguistic Data Consortium, 2009.

Introduction

FactBank 1.0, Linguistic Data Consortium (LDC) catalog number LDC2009T23 and ISBN 1-58563-522-7, consists of 208 documents (over 77,000 tokens) from newswire and broadcast news reports in which event mentions are annotated with their degree of factuality, that is, the degree to which they correspond to actual situations in the world. FactBank 1.0 was built on top of TimeBank 1.2 and a fragment of the AQUAINT TimeML Corpus, both of which are annotated using the TimeML specification language. The result is a double-layered annotation of event factuality: TimeBank 1.2 and AQUAINT TimeML encode most of the basic structural elements expressing factuality information, while FactBank 1.0 represents the resulting factuality interpretation. Combining the factuality values in FactBank with the structural information in TimeML-annotated corpora facilitates the development of tools that automatically identify the factuality values of events, a fundamental component in tasks requiring some degree of text understanding, such as Textual Entailment, Question Answering, or Narrative Understanding.

FactBank annotations indicate whether the event mention describes actual situations in the world, situations that have not happened, or situations of uncertain interpretation. Event factuality is not an inherent feature of events but a matter of perspective. Different discourse participants may present divergent views about the factuality of the very same event. Consequently, in FactBank, the factuality degree of events is assigned relative to the relevant sources at play. In this way, it can adequately reflect the divergence of opinions regarding the factual status of events, as is common in news reports.

The annotation language is grounded in established linguistic analyses of the phenomenon, which facilitated the creation of a battery of discriminatory tests for distinguishing between factuality values. Furthermore, the annotation procedure was carefully designed and divided into basic, sequential annotation tasks. This made it possible for harder tasks to be built on top of simpler ones, while allowing annotators to become incrementally familiar with the complexity of the problem. As a result, FactBank annotation achieved a relatively high inter-annotator agreement (kappa = 0.81), a positive result when compared with similar annotation efforts.

Data

All FactBank markup is standoff and is represented through a set of 20 tables which can be easily loaded into a database. Each table resides in a separate text file, where fields are separated by three consecutive bars (i.e., |||). Data in fields of string type are enclosed in single quotation marks (').
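
The table format described above can be read with a few lines of Python. The following is a minimal sketch, not part of the release: the function names, the sample row, and the choice to convert unquoted numeric fields to integers are all assumptions made for illustration. It also assumes that the separator ||| never occurs inside a quoted string and that no escape sequences need to be decoded; the corpus documentation should be checked on both points.

```python
FIELD_SEP = "|||"  # field separator described in the Data section


def parse_field(raw):
    """Convert one raw field to a Python value.

    Assumption: string fields are wrapped in single quotes; any other
    field is tried as an integer and otherwise returned unchanged.
    """
    raw = raw.strip()
    if len(raw) >= 2 and raw[0] == "'" and raw[-1] == "'":
        return raw[1:-1]  # drop only the outer quotes, no unescaping
    try:
        return int(raw)
    except ValueError:
        return raw


def parse_row(line):
    """Split one table line on the separator and convert each field."""
    return [parse_field(field) for field in line.rstrip("\n").split(FIELD_SEP)]


def read_table(path, encoding="utf-8"):
    """Read a whole table file into a list of rows, skipping blank lines.

    The encoding is a guess; adjust it if the files use another one.
    """
    with open(path, encoding=encoding) as handle:
        return [parse_row(line) for line in handle if line.strip()]


# Hypothetical row, invented here to illustrate the format only.
example = parse_row("'wsj_0006.tml'|||1|||0|||'The'\n")
```

Rows parsed this way are plain lists, so they could be passed to a database driver (for example, the `executemany` method of a `sqlite3` cursor) once a schema matching the table's columns has been defined.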

Because FactBank 1.0 was built on top of TimeBank 1.2 and AQUAINT TimeML, both of which are marked up with inline XML-based annotation, this release contains the TimeBank 1.2 and AQUAINT TimeML annotation in standoff, table-based format as well.

Samples

The World is a co-production of Public Radio International and the British Broadcasting Corporation and is produced at WGBH Boston.
