ModeS TimeBank 1.0 was developed by researchers at the Technical University of Madrid and Barcelona Media. It is a corpus of Modern Spanish (17th and 18th centuries) annotated with temporal and event information according to the TimeML mark-up language and with spatial information following the SpatialML scheme.
TimeML (Pustejovsky et al., 2005) is a specification language for annotating eventualities and time expressions in natural language as well as the temporal relations among them, thus facilitating the task of extraction, representation and exchange of temporal information. SpatialML (Mani et al., 2008) is a specification language for annotating and normalizing spatial expressions by means of geographic coordinates.
LDC has released the following corpora incorporating TimeML or SpatialML annotation: TimeBank 1.2 LDC2006T08, FactBank 1.0 LDC2009T23, ACE 2005 English SpatialML Annotations Version 2 LDC2011T02 and ACE 2005 Mandarin SpatialML Annotations LDC2010T09.
ModeS TimeBank 1.0 contains 102 documents reporting a sea-crossing cruise by a ship called La Princesa, which took place from December 1768 to April 1769. Copious logbooks survive from that period; they provide information not only about shipping routes but also valuable data on information flows, commercial agents and social networks. The original corpus manuscript is preserved in the Archivo General de Indias (General Archive of the Indies) and is available online at the Portal de Archivos Españoles. This corpus was created within the framework of the DynCoopNet project (Dynamic Compatibility of Cooperation-Based Self-Organizing Networks in the First Global Age), which studies trade network cooperation during the 15th-19th centuries and draws on maps, charts, databases and natural language documents.
All text is encoded in UTF-8. The data in ModeS TimeBank 1.0 has been tokenized, POS-tagged, and annotated with spatial, temporal and event information according to the TimeML and SpatialML specification schemes. More specifically, the following entities are annotated in the corpus:
- Events: (tag EVENT, from TimeML). These include finite and non-finite verbal constructions, nominalizations, nouns, adjectives and prepositional phrases.
- Temporal expressions (tag TIMEX3, from TimeML). These include expressions of dates, times, durations and frequencies, both precise and vague.
- Spatial expressions (tag PLACE, from SpatialML). These cover proper and common nouns, adjectives, adverbs and spatial coordinates.
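Because the three entity types above are marked as inline XML elements (EVENT, TIMEX3, PLACE), they can be counted and extracted with a standard XML parser. The following is a minimal sketch, assuming each document is a well-formed UTF-8 XML file with these elements embedded in the text. The sample sentence, the root element name, the `<s>` wrapper and the specific attribute values are hypothetical illustrations, not taken from the corpus; the actual file layout of ModeS TimeBank 1.0 may differ.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample in the style of inline TimeML/SpatialML markup.
# The real ModeS TimeBank 1.0 file layout and attribute sets may differ.
SAMPLE = """<TimeML>
<s>En <TIMEX3 tid="t1" type="DATE" value="1768-12">diciembre de 1768</TIMEX3> la Princesa
<EVENT eid="e1" class="OCCURRENCE">salió</EVENT> del <PLACE id="p1">puerto</PLACE>.</s>
</TimeML>"""


def load(path):
    """Read a document as text; the corpus is stated to be UTF-8 encoded."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def count_tags(xml_text, tags=("EVENT", "TIMEX3", "PLACE")):
    """Count annotated entities by element name in an inline-XML document."""
    root = ET.fromstring(xml_text)
    counts = Counter(el.tag for el in root.iter() if el.tag in tags)
    return {tag: counts.get(tag, 0) for tag in tags}


def extract_spans(xml_text, tag):
    """Return (attributes, covered text) pairs for every element named `tag`."""
    root = ET.fromstring(xml_text)
    return [(dict(el.attrib), "".join(el.itertext())) for el in root.iter(tag)]


if __name__ == "__main__":
    print(count_tags(SAMPLE))
    for attrs, text in extract_spans(SAMPLE, "TIMEX3"):
        # TIMEX3 normalizes the expression into its `value` attribute.
        print(attrs["value"], text)
```

On the hypothetical sample, `count_tags` finds one element of each type, and `extract_spans(SAMPLE, "TIMEX3")` pairs the normalized value `1768-12` with the surface text `diciembre de 1768`. The point of keeping the attributes alongside the covered text is that TimeML and SpatialML store their normalizations (dates, coordinates) in attributes rather than in the text itself.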
Portions © 2012 Marta Guerrero Nieto, Roser Sauri, © 2012 Trustees of the University of Pennsylvania