ModeS TimeBank 1.0


Item Name: ModeS TimeBank 1.0
Authors: Marta Guerrero Nieto, Roser Sauri
LDC Catalog No.: LDC2012T01
ISBN: 1-58563-604-5
Release Date: Feb 15, 2012
Data Type: text
Data Source(s): journal entries
Application(s): information extraction, spatial analysis, temporal analysis
Language(s): Spanish
Language ID(s): spa
Distribution: Web Download
Member fee: $0 for 2012 members
Non-member Fee: US $0.00
Reduced-License Fee: US $0.00
Extra-Copy Fee: N/A
Non-member License: yes
Online documentation: yes
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Marta Guerrero Nieto, Roser Sauri
2012
ModeS TimeBank 1.0
Linguistic Data Consortium, Philadelphia

Introduction

ModeS TimeBank 1.0, developed by researchers at the Technical University of Madrid and Barcelona Media, is a corpus of Modern Spanish (17th and 18th centuries) annotated with temporal and event information according to the TimeML mark-up scheme and with spatial information following the SpatialML scheme.

TimeML (Pustejovsky et al., 2005) is a specification language for annotating eventualities and time expressions in natural language as well as the temporal relations among them, thus facilitating the task of extraction, representation and exchange of temporal information. SpatialML (Mani et al., 2008) is a specification language for annotating and normalizing spatial expressions by means of geographic coordinates.

LDC has released the following corpora incorporating TimeML or SpatialML annotation: TimeBank 1.2 LDC2006T08, FactBank 1.0 LDC2009T23, ACE 2005 English SpatialML Annotations Version 2 LDC2011T02 and ACE 2005 Mandarin SpatialML Annotations LDC2010T09.

Data

ModeS TimeBank 1.0 contains 102 documents reporting a sea-crossing cruise by a ship called La Princesa, which took place from December 1768 to April 1769. There exist copious logbooks from that period that not only provide information about shipping routes, but also contain valuable data concerning information flows, commercial agents and social networks. The original corpus manuscript is preserved in the Archivo General de Indias (General Archive of the Indies) and is available online at the Portal de Archivos Españoles. This corpus was created within the framework of the DynCoopNet project (Dynamic Compatibility of Cooperation-Based Self-Organizing Networks in the First Global Age) which is focused on the study of trade network cooperation during the 15th-19th centuries and incorporates into its work maps, charts, databases and natural language documents.

All text is encoded in UTF-8. The data in ModeS TimeBank 1.0 has been tokenized, POS-tagged, and annotated with space, time and event information according to the TimeML and SpatialML specification schemes. More specifically, the entities annotated in the corpus are the following:

  • Events (tag EVENT, from TimeML). These include finite and non-finite verbal constructions, nominalizations, nouns, adjectives and prepositional phrases.
  • Temporal expressions (tag TIMEX3, from TimeML). These include expressions of dates, times, durations and frequencies, both precise and vague.
  • Spatial expressions (tag PLACE, from SpatialML). These are used for proper and common nouns, adjectives, adverbs or spatial coordinates.
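
As a rough illustration of how the three entity types above might be read programmatically, here is a minimal Python sketch. It assumes the annotations are stored as inline XML elements named EVENT, TIMEX3 and PLACE; the actual file layout of the release is not described here and may differ. The sample sentence, the TEXT wrapper element and all attribute names and values are invented for illustration and are not taken from the corpus.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample: the sentence, the TEXT wrapper and every attribute
# value below are invented for illustration, not copied from the corpus.
SAMPLE = (
    '<TEXT>El <TIMEX3 tid="t1" type="DATE" value="1768-12-05">'
    '5 de diciembre de 1768</TIMEX3> '
    '<EVENT eid="e1" class="OCCURRENCE">salimos</EVENT> de '
    '<PLACE id="p1" type="PPL">Cádiz</PLACE>.</TEXT>'
)

# The three entity tags named in the list above.
TAGS = ("EVENT", "TIMEX3", "PLACE")


def collect_annotations(xml_text):
    """Group annotated spans by tag name as (text, attributes) pairs."""
    root = ET.fromstring(xml_text)
    found = defaultdict(list)
    for elem in root.iter():
        if elem.tag in TAGS:
            # itertext() also picks up text inside nested elements, if any.
            text = "".join(elem.itertext()).strip()
            found[elem.tag].append((text, dict(elem.attrib)))
    return dict(found)


annotations = collect_annotations(SAMPLE)
for tag in TAGS:
    for text, attrs in annotations.get(tag, []):
        print(tag, repr(text), attrs)
```

If the release uses stand-off annotation rather than inline tags, the same grouping idea applies, but the spans would have to be resolved against character or token offsets instead.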

Samples

Please see the following links for examples of annotated and original texts.

Updates

None at this time.

Content Copyright

Portions © 2012 Marta Guerrero Nieto, Roser Sauri, © 2012 Trustees of the University of Pennsylvania