
TAC KBP Reference Knowledge Base

Item Name:	TAC KBP Reference Knowledge Base
Author(s):	Heather Simpson, Joe Ellis, Robert Parker, Stephanie Strassel
LDC Catalog No.:	LDC2014T16
ISBN:	1-58563-685-1
ISLRN:	043-495-621-872-3
DOI:	https://doi.org/10.35111/4yac-wb16
Release Date:	August 15, 2014
Member Year(s):	2014
DCMI Type(s):	Text
Data Source(s):	web collection
Project(s):	TAC
Application(s):	information extraction, knowledge base population, knowledge representation
Language(s):	English
Language ID(s):	eng
License(s):	LDC User Agreement for Non-Members
Online Documentation:	LDC2014T16 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Simpson, Heather, et al. TAC KBP Reference Knowledge Base LDC2014T16. Web Download. Philadelphia: Linguistic Data Consortium, 2014.
Related Works:
isAnnotationOf
    LDC2017T17 TAC KBP Chinese Cross-lingual Entity Linking - Comprehensive Training and Evaluation Data 2011-2014
hasAnnotation
    LDC2016T26 TAC KBP Spanish Cross-lingual Entity Linking - Comprehensive Training and Evaluation Data 2012-2014
    LDC2018T22 TAC KBP English Regular Slot Filling - Comprehensive Training and Evaluation Data 2009-2014
    LDC2020T08 TAC KBP English Temporal Slot Filling - Comprehensive Training and Evaluation Data 2011 and 2013
    LDC2021T06 TAC KBP English Surprise Slot Filling -- Comprehensive Training and Evaluation Data 2010
    LDC2021T08 TAC KBP English Sentiment Slot Filling -- Comprehensive Training and Evaluation Data 2013-2014
hasOutcome
    LDC2018T16 TAC KBP English Entity Linking - Comprehensive Training and Evaluation Data 2009-2013
isSimilarWith
    LDC2019T02 TAC KBP Entity Discovery and Linking - Comprehensive Training and Evaluation Data 2014-2015

Introduction

TAC KBP Reference Knowledge Base was developed by the Linguistic Data Consortium (LDC) in support of the NIST-sponsored TAC-KBP evaluation series. It is a knowledge base built from English Wikipedia articles and their associated infoboxes, and it covers over 800,000 entities. LDC also released TAC KBP Spanish Cross-lingual Entity Linking - Comprehensive Training and Evaluation Data 2012-2014 (LDC2016T26).

TAC (Text Analysis Conference) is a series of workshops organized by NIST (the National Institute of Standards and Technology) to encourage research in natural language processing and related applications by providing a large test collection, common evaluation procedures, and a forum for researchers to share their results. TAC's KBP (Knowledge Base Population) track encourages the development of systems that can match entities mentioned in natural texts with those appearing in a knowledge base, extract novel information about entities from a document collection, and add that information to a new or existing knowledge base.

Consult the LDC TAC-KBP project page for further information about LDC's resource development for the TAC-KBP program.

Data

The source data (Wikipedia infoboxes and articles) was taken from an October 2008 snapshot of Wikipedia.

TAC KBP Reference Knowledge Base contains a set of entities. Each entity has a canonical name, the title of its Wikipedia page, an entity type, an automatically parsed version of the data from the infobox in the entity's Wikipedia article, and a stripped version of the text of that article. Each entity is assigned one of four types: PER (person), ORG (organization), GPE (geo-political entity), or UKN (unknown).

All data files are presented as UTF-8 encoded XML.
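
As an illustration of how entity records of this kind might be read, the following Python sketch parses a small UTF-8 XML fragment into records holding a name, Wikipedia title, entity type, infobox facts, and article text. The element and attribute names used here (knowledge_base, entity, facts, fact, wiki_text, id, name, wiki_title, type) and the sample values are assumptions made for illustration only; consult the LDC2014T16 documentation for the actual schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical sample record. Element names, attribute names, and values are
# assumptions for illustration, not the documented schema of the release.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<knowledge_base>
  <entity id="E0000001" name="Example Person" wiki_title="Example_Person" type="PER">
    <facts class="Infobox Person">
      <fact name="birthplace">Example City</fact>
    </facts>
    <wiki_text>Example Person is a placeholder entry.</wiki_text>
  </entity>
</knowledge_base>
"""

# The four entity types named in the corpus description.
ENTITY_TYPES = {"PER", "ORG", "GPE", "UKN"}


@dataclass
class Entity:
    entity_id: str
    name: str
    wiki_title: str
    entity_type: str
    facts: dict = field(default_factory=dict)
    text: str = ""


def parse_entities(xml_bytes):
    """Parse UTF-8 encoded XML bytes into a list of Entity records."""
    root = ET.fromstring(xml_bytes)
    entities = []
    for node in root.iter("entity"):
        entity_type = node.get("type", "UKN")
        if entity_type not in ENTITY_TYPES:
            raise ValueError(f"unexpected entity type: {entity_type}")
        # Flatten each infobox fact to plain text, keyed by its name.
        facts = {
            fact.get("name"): "".join(fact.itertext()).strip()
            for fact in node.iter("fact")
        }
        text = (node.findtext("wiki_text") or "").strip()
        entities.append(
            Entity(
                entity_id=node.get("id", ""),
                name=node.get("name", ""),
                wiki_title=node.get("wiki_title", ""),
                entity_type=entity_type,
                facts=facts,
                text=text,
            )
        )
    return entities


entities = parse_entities(SAMPLE.encode("utf-8"))
for entity in entities:
    print(entity.entity_id, entity.entity_type, entity.name, entity.facts)
```

For a knowledge base of over 800,000 entities, loading whole files at once may use a lot of memory; an incremental parser such as xml.etree.ElementTree.iterparse is a common alternative.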

Samples

Please view the following sample.

Updates

None at this time.
