TAC KBP Reference Knowledge Base
Item Name: | TAC KBP Reference Knowledge Base |
Author(s): | Heather Simpson, Joe Ellis, Robert Parker, Stephanie Strassel |
LDC Catalog No.: | LDC2014T16 |
ISBN: | 1-58563-685-1 |
ISLRN: | 043-495-621-872-3 |
DOI: | https://doi.org/10.35111/4yac-wb16 |
Release Date: | August 15, 2014 |
Member Year(s): | 2014 |
DCMI Type(s): | Text |
Data Source(s): | web collection |
Project(s): | TAC |
Application(s): | information extraction, knowledge base population, knowledge representation |
Language(s): | English |
Language ID(s): | eng |
License(s): | LDC User Agreement for Non-Members |
Online Documentation: | LDC2014T16 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Simpson, Heather, et al. TAC KBP Reference Knowledge Base LDC2014T16. Web Download. Philadelphia: Linguistic Data Consortium, 2014. |
Introduction
TAC KBP Reference Knowledge Base was developed by the Linguistic Data Consortium (LDC) in support of the NIST-sponsored TAC-KBP evaluation series. It is a knowledge base built from English Wikipedia articles and their associated infoboxes, and it covers over 800,000 entities. LDC also released TAC KBP Spanish Cross-lingual Entity Linking - Comprehensive Training and Evaluation Data 2012-2014 (LDC2016T26).
TAC (Text Analysis Conference) is a series of workshops organized by NIST (the National Institute of Standards and Technology) to encourage research in natural language processing and related applications by providing a large test collection, common evaluation procedures, and a forum for researchers to share their results. TAC's KBP (Knowledge Base Population) track encourages the development of systems that can do two things: match entities mentioned in natural texts with those appearing in a knowledge base, and extract novel information about entities from a document collection and add it to a new or existing knowledge base.
Consult the LDC TAC-KBP project page for further information about LDC's resource development for the TAC-KBP program.
Data
The source data (Wikipedia infoboxes and articles) was taken from an October 2008 snapshot of Wikipedia.
TAC KBP Reference Knowledge Base contains a set of entities, each with a canonical name and title for the Wikipedia page, an entity type, an automatically parsed version of the data from the infobox in the entity's Wikipedia article, and a stripped version of the text of the Wikipedia article. Each entity is assigned one of four types: PER (person), ORG (organization), GPE (geo-political entity), or UKN (unknown).
All data files are presented as UTF-8 encoded XML.
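As an illustration of how entity records of this kind might be read, the following is a minimal Python sketch. The element and attribute names (knowledge_base, entity, facts, fact, wiki_text, id, name, wiki_title, type), the Entity class, and the sample record are assumptions made for this example and are not taken from the description above; consult the corpus documentation for the actual schema.

```python
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical sample data. The tag and attribute names are assumptions
# for illustration only; the real files may be structured differently.
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<knowledge_base>
  <entity id="E0000001" name="Example Person" wiki_title="Example_Person" type="PER">
    <facts class="Infobox Person">
      <fact name="birthplace">Example City</fact>
    </facts>
    <wiki_text>Example Person is a placeholder entry.</wiki_text>
  </entity>
  <entity id="E0000002" name="Example Org" wiki_title="Example_Org" type="ORG">
    <facts class="Infobox Company"/>
    <wiki_text>Example Org is a placeholder entry.</wiki_text>
  </entity>
</knowledge_base>
""".encode("utf-8")

# The four entity types named in the description.
VALID_TYPES = {"PER", "ORG", "GPE", "UKN"}


@dataclass
class Entity:
    entity_id: str
    name: str
    wiki_title: str
    entity_type: str
    facts: dict = field(default_factory=dict)  # parsed infobox fields
    text: str = ""                             # stripped article text


def iter_entities(source):
    """Stream Entity objects from a file path or binary file-like object."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "entity":
            continue
        entity_type = elem.get("type", "UKN")
        if entity_type not in VALID_TYPES:
            raise ValueError(f"unexpected entity type: {entity_type!r}")
        facts = {f.get("name"): (f.text or "").strip() for f in elem.iter("fact")}
        yield Entity(
            entity_id=elem.get("id", ""),
            name=elem.get("name", ""),
            wiki_title=elem.get("wiki_title", ""),
            entity_type=entity_type,
            facts=facts,
            text=(elem.findtext("wiki_text") or "").strip(),
        )
        elem.clear()  # release the subtree once the record has been consumed


if __name__ == "__main__":
    for entity in iter_entities(io.BytesIO(SAMPLE_XML)):
        print(entity.entity_id, entity.entity_type, entity.name)
```

Streaming with iterparse rather than loading each file whole is a design choice suited to a knowledge base of this size (over 800,000 entities), since it keeps only one record in memory at a time.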
Samples
Please view the following sample.
Updates
None at this time.