Benchmarks for Open Relation Extraction

Item Name: Benchmarks for Open Relation Extraction
Author(s): Filipe Mesquita, Jordan Schmidek, Denilson Barbosa
LDC Catalog No.: LDC2014T27
ISBN: 1-58563-698-3
ISLRN: 911-510-844-212-7
Release Date: December 15, 2014
Member Year(s): 2014
DCMI Type(s): Text
Data Source(s): newswire, transcribed speech
Application(s): relation extraction
Language(s): English
Language ID(s): eng
License(s): Benchmarks for Open Relation Extraction
Online Documentation: LDC2014T27 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Mesquita, Filipe, Jordan Schmidek, and Denilson Barbosa. Benchmarks for Open Relation Extraction LDC2014T27. Web Download. Philadelphia: Linguistic Data Consortium, 2014.

Introduction

Benchmarks for Open Relation Extraction was developed by the University of Alberta and contains annotations for approximately 14,000 sentences from The New York Times Annotated Corpus (LDC2008T19) and Treebank-3 (LDC99T42). This corpus was designed to contain benchmarks for the task of open relation extraction (ORE), along with sample extractions from ORE methods and evaluation scripts for computing a method's precision and recall.

ORE attempts to extract as many of the relations described in a corpus as possible without relying on relation-specific training data. The traditional approach to relation extraction requires substantial training effort for each relation of interest, which can be impractical for massive collections such as those found on the web. Open relation extraction offers an alternative by extracting unseen relations as they are encountered. It does not require training data for any particular relation, making it suitable for applications that require a large (or even unknown) number of relations.

Results published in ORE literature are often not comparable due to the lack of reusable annotations and differences in evaluation methodology. The goal of this benchmark data set is to provide annotations that are flexible and can be used to evaluate a wide range of methods.

Data

Binary and n-ary relations were extracted from the text sources. Sentences were annotated for binary relations both manually and automatically. In the manual annotation, two entities were identified in each sentence, along with a trigger (a single token indicating a relation) for the relation between them, if one existed. A window of tokens allowed to appear in a relation was also specified; it included modifiers of the trigger and prepositions connecting triggers to their arguments. For each sentence annotated with two entities, a system must extract a string representing the relation between them. The evaluation method deemed an extraction correct if it contained the trigger and, apart from the trigger, only allowed tokens. The automatic annotator identified pairs of entities and a trigger for the relation between them; the evaluation script for that experiment deemed an extraction correct if it contained the annotated trigger.
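
The two correctness criteria for binary relations can be sketched as follows. This is a minimal illustration, not the evaluation scripts distributed with the release; the function names, the token-list representation and the example sentence are assumptions made for the sketch.

```python
def correct_manual(extraction, trigger, allowed):
    """Manual benchmark (sketch): the extraction must contain the trigger
    and, apart from it, only tokens from the allowed window."""
    permitted = set(allowed) | {trigger}
    return trigger in extraction and all(tok in permitted for tok in extraction)


def correct_automatic(extraction, trigger):
    """Automatic benchmark (sketch): the extraction only needs the trigger."""
    return trigger in extraction


# Hypothetical sentence: "Smith was born in Ohio", entities "Smith" and "Ohio".
trigger = "born"
allowed = ["was", "in"]

print(correct_manual(["was", "born", "in"], trigger, allowed))   # True
print(correct_manual(["born", "in", "Ohio"], trigger, allowed))  # False
print(correct_automatic(["born", "in", "Ohio"], trigger))        # True
```

The second extraction fails the manual criterion because "Ohio" lies outside the allowed window, yet it passes the more lenient automatic criterion because it still contains the trigger.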

For n-ary relations, sentences were annotated with one relation trigger and all of its arguments. An extracted argument was deemed correct if it was annotated in the sentence.
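
The n-ary criterion might be sketched as below. The function name and the use of exact string matching between extracted and annotated arguments are assumptions for illustration; the released scripts may match arguments differently.

```python
def score_arguments(extracted_args, annotated_args):
    """Split extracted arguments into correct and incorrect lists.
    An argument counts as correct if it appears among the annotated ones
    (exact string match is an assumption of this sketch)."""
    annotated = set(annotated_args)
    correct = [arg for arg in extracted_args if arg in annotated]
    incorrect = [arg for arg in extracted_args if arg not in annotated]
    return correct, incorrect


# Hypothetical annotation for the trigger "sold".
annotated = ["Acme", "its factory", "in 2010"]
extracted = ["Acme", "in 2010", "the report"]

correct, incorrect = score_arguments(extracted, annotated)
print(correct)    # ['Acme', 'in 2010']
print(incorrect)  # ['the report']
```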

This release also includes extractions from the following ORE methods: ReVerb, SONEX, OLLIE, PATTY, TreeKernel, SwiRL, Lund and EXEMPLAR. Evaluation scripts are also provided for computing a method's precision and recall.
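
For reference, precision and recall follow their standard definitions: precision is the fraction of a method's extractions that are correct, and recall is the fraction of annotated relations that the method recovers. The sketch below is not the released evaluation script; the function name and the choice to return 0.0 when a denominator is zero are assumptions.

```python
def precision_recall(num_correct, num_extracted, num_annotated):
    """Standard precision and recall from raw counts.
    Returns 0.0 for a measure whose denominator is zero (a convention
    chosen for this sketch)."""
    precision = num_correct / num_extracted if num_extracted else 0.0
    recall = num_correct / num_annotated if num_annotated else 0.0
    return precision, recall


# Hypothetical counts: 40 extractions, 30 of them correct, 60 annotated relations.
print(precision_recall(30, 40, 60))  # (0.75, 0.5)
```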

Samples

Please view this sample.

Updates

None at this time.
