ACE 2005 Mandarin SpatialML Annotations

Item Name:	ACE 2005 Mandarin SpatialML Annotations
Author(s):	Xiaoman Wang, Christine Doran, Janet Hitzeman, Inderjeet Mani
LDC Catalog No.:	LDC2010T09
ISBN:	1-58563-546-4
ISLRN:	951-452-048-245-8
DOI:	https://doi.org/10.35111/pkce-3b81
Release Date:	May 14, 2010
Member Year(s):	2010
DCMI Type(s):	Text
Data Source(s):	broadcast news
Project(s):	ACE
Application(s):	spatial analysis, automatic content extraction
Language(s):	Mandarin Chinese
Language ID(s):	cmn
License(s):	LDC User Agreement for Non-Members
Online Documentation:	LDC2010T09 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Wang, Xiaoman, et al. ACE 2005 Mandarin SpatialML Annotations LDC2010T09. Web Download. Philadelphia: Linguistic Data Consortium, 2010.
Related Works:	isAnnotationOf LDC2006T06 ACE 2005 Multilingual Training Corpus; isSimilarWith LDC2008T03 ACE 2005 English SpatialML Annotations, LDC2011T02 ACE 2005 English SpatialML Annotations Version 2, LDC2012T01 ModeS TimeBank 1.0, LDC2014T18 ACE 2007 Multilingual Training Corpus

Introduction

ACE 2005 Mandarin SpatialML Annotations was developed by researchers at The MITRE Corporation (MITRE). It applies SpatialML tags to a subset of the source Mandarin training data in ACE 2005 Multilingual Training Corpus (LDC2006T06). The annotations for entities, relations, and events that were included in ACE 2005 Multilingual Training Corpus are not included in this SpatialML release. For SpatialML markup of ACE 2005 English data, see ACE 2005 English SpatialML Annotations (LDC2008T03).

SpatialML is a mark-up language for representing spatial expressions in natural language documents. SpatialML's focus is on geography and culturally relevant landmarks, rather than biology, cosmology, geology, or other regions of the spatial language domain. The goal is to allow for better integration of text collections with resources such as databases that provide spatial information about a domain, including gazetteers, physical feature databases and mapping services.

The ACE (Automatic Content Extraction) Program seeks to develop extraction technology to support automatic processing of source language data (in the form of natural text, and as text derived from automatic speech recognition and optical character recognition). This includes classification, filtering, and selection based on the language content of the source data, i.e., based on the meaning conveyed by the data. Thus the ACE program requires the development of technologies that automatically detect and characterize this meaning. The annotation efforts of the ACE program support the development of automatic content extraction technology for processing human language in text form. The kinds of information recognized and extracted from text include entities, values, temporal expressions, relations and events.

The SpatialML annotation scheme is intended to emulate earlier progress on time expressions such as TIMEX2, TimeML, and the 2005 ACE guidelines. The main SpatialML tag is the PLACE tag which encodes information about location. The central goal of SpatialML is to map location information in text to data from gazetteers and other databases to the extent possible by defining attributes in the PLACE tag. Therefore, semantic attributes such as country abbreviations, country subdivision and dependent area abbreviations (e.g., US states), and geo-coordinates are used to help establish such a mapping. LINK and PATH tags express relations between places, such as inclusion relations and trajectories of various kinds. Information in the tag along with the tagged location string should be sufficient to uniquely determine the mapping, when such a mapping is possible. This also means that redundant information is not included in the tag. To the extent possible, SpatialML leverages ISO and other standards towards the goal of making the scheme compatible with existing and future corpora. The SpatialML guidelines are compatible with existing guidelines for spatial annotation and existing corpora within the ACE research program.
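
To make the description of PLACE and LINK tags more concrete, the following is a minimal Python sketch that reads a small, hand-written SpatialML-style fragment. The sample text, the wrapping DOC element, the attribute names (id, type, country, latLong, linkType, source, target) and their values are illustrative assumptions and are not excerpts from this corpus; the version 2.3 guidelines included with the release remain the authority on the actual tag and attribute inventory.

```python
import xml.etree.ElementTree as ET

# Hypothetical SpatialML-style fragment written for illustration only.
# Tag attributes and values are assumptions, not data from LDC2010T09.
SAMPLE = (
    '<DOC>'
    '<PLACE id="p1" type="PPL" country="CN" latLong="39.9 116.4">北京</PLACE>'
    '位于'
    '<PLACE id="p2" type="COUNTRY" country="CN">中国</PLACE>'
    '<LINK id="l1" linkType="IN" source="p1" target="p2"/>'
    '</DOC>'
)


def extract_places(xml_text):
    """Return a dict mapping each PLACE id to its attributes and tagged string."""
    root = ET.fromstring(xml_text)
    places = {}
    for element in root.iter("PLACE"):
        place_id = element.get("id")
        # Copy every attribute except the id, then add the tagged text itself.
        info = {k: v for k, v in element.attrib.items() if k != "id"}
        info["text"] = element.text or ""
        places[place_id] = info
    return places


def extract_links(xml_text):
    """Return (source id, link type, target id) triples for each LINK element."""
    root = ET.fromstring(xml_text)
    return [
        (link.get("source"), link.get("linkType"), link.get("target"))
        for link in root.iter("LINK")
    ]


if __name__ == "__main__":
    places = extract_places(SAMPLE)
    for source, link_type, target in extract_links(SAMPLE):
        print(places[source]["text"], link_type, places[target]["text"])
```

In this sketch, the attributes on each PLACE element together with its tagged string stand in for the information that would be used to look up a gazetteer entry, and the LINK element records an inclusion relation between two places by referring to their ids.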

Data

This corpus consists of a 298-document subset of broadcast material from the ACE 2005 Multilingual Training Corpus (LDC2006T06) that has been tagged by a native Mandarin speaker according to version 2.3 of the SpatialML annotation guidelines, which are included in the documentation for this release.

Updates

No updates have been issued at this time.

Copyright

Portions © 2000-2001 China Broadcasting System, © 2000-2001 China Central TV, © 2000-2001 China National Radio, © 2000-2001 China Television System, © 2008-2009 The MITRE Corporation, © 2005, 2006, 2010 Trustees of the University of Pennsylvania
