The SenSem (Sentence Semantics) Corpus for Spanish was created as part of the SenSem Databank. Work on the Databank started in 2004 and continues in 2014. All the projects contributing to the Databank's creation have been funded by the Spanish Government through a variety of grants.

Version 1.0 of the Corpus includes texts from journalistic sources (first phase) and literary sources (second phase) and contains over 30,000 sentences (25,075 journalistic and 5,299 literary). These sentences exemplify around one thousand different verb meanings, which correspond to the 250 most frequent verbs in Spanish. The frequency of these verbs was obtained from a quantitative analysis of around 13 million words. The sentences have been annotated with syntactic and semantic information. The result is a corpus of almost one million words, half of which are annotated.

The main source of journalistic texts is El Periódico de Catalunya, which agreed to provide its texts for research purposes when this resource was created. A smaller percentage of the journalistic sentences come from La Vanguardia. The vast majority of the sentences in the literary register were extracted from novels available on the Internet. These novels are all contemporary (20th and 21st centuries) and were written by peninsular Spanish authors. A complete list of the works used in the SenSem Corpus can be found at the end of this document. Occasionally, the literary subcorpus of the CREA corpus of the Real Academia Española has been used as a source of literary sentences.

At the phrase and sentence level, the different participants have been identified, distinguishing between arguments and adjuncts.
Although the internal structure of phrases has not been annotated, the prototypical syntactic-semantic information of subcategorization patterns has been thoroughly encoded: semantic roles, syntactic functions and phrasal categories. Each sentence has also been associated with a constructional meaning, and its formal mechanism has been described according to Goldberg’s (1995) concept of construction. This is precisely one of the novelties of the SenSem Corpus with respect to other corpora of the same or a similar type. Including constructional information is key to completing the description of subcategorization patterns and avoiding their ambiguity.

In addition, sentence aspectuality, polarity and modality have been encoded. These are valuable sources of information for natural language processing (NLP). First, the lexical aspect (or Aktionsart) of verbs and phrases helps to establish the chain of events in the discourse, which is of interest for question-answering interfaces. Differentiating between events and states is likewise useful. In automatic generation, the appropriate choice of a lexical item in some languages depends precisely on this aspectual information. Polarity and modality, in turn, are two key elements in the interpretation of factuality. In NLP, the distinction between factive and non-factive events is crucial for telling which events described in texts are real and which are not (either because they have not taken place or because it is not known whether they have).

A corpus with such diversified information, including high-level semantics, is also of great interest in another highly active NLP field: the acquisition of information for creating resources such as grammars and analyzers of different types.
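To make the annotation layers described above more concrete, the following Python sketch pictures one annotated sentence as a record. It is only an illustration: the class names, field names, label values (such as "NP" or "agent") and the example sentence are assumptions made for this sketch and do not reproduce the actual SenSem tagset or file format. The `event_presented_as_real` rule is likewise a deliberate simplification of how polarity and modality bear on factuality.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Participant:
    """One participant of the sentence (hypothetical schema)."""
    text: str
    kind: str        # "argument" or "adjunct"
    category: str    # phrasal category, e.g. "NP" (illustrative label)
    function: str    # syntactic function, e.g. "subject" (illustrative label)
    role: str        # semantic role, e.g. "agent" (illustrative label)


@dataclass
class AnnotatedSentence:
    """Sentence-level annotation layers named in the text (hypothetical schema)."""
    text: str
    verb_lemma: str
    construction: str   # constructional meaning (illustrative label)
    aspect: str         # "event" or "state"
    polarity: str       # "positive" or "negative"
    modality: str       # "none" or some modal label (illustrative)
    participants: List[Participant] = field(default_factory=list)

    def arguments(self) -> List[Participant]:
        """Return only the participants annotated as arguments."""
        return [p for p in self.participants if p.kind == "argument"]

    def event_presented_as_real(self) -> bool:
        """Simplified assumption: an event counts as real only when the
        sentence is positive and carries no modal marking."""
        return self.polarity == "positive" and self.modality == "none"


# Invented example sentence: "El gobierno aprobó la ley ayer."
example = AnnotatedSentence(
    text="El gobierno aprobó la ley ayer.",
    verb_lemma="aprobar",
    construction="active transitive",
    aspect="event",
    polarity="positive",
    modality="none",
    participants=[
        Participant("El gobierno", "argument", "NP", "subject", "agent"),
        Participant("la ley", "argument", "NP", "direct object", "theme"),
        Participant("ayer", "adjunct", "AdvP", "adjunct", "time"),
    ],
)

# Same sentence under negative polarity: the event did not take place.
negated = AnnotatedSentence(
    text="El gobierno no aprobó la ley ayer.",
    verb_lemma="aprobar",
    construction="active transitive",
    aspect="event",
    polarity="negative",
    modality="none",
    participants=list(example.participants),
)

print([p.text for p in example.arguments()])
print(example.event_presented_as_real(), negated.event_presented_as_real())
```

Keeping arguments and adjuncts in a single participant list with a `kind` field mirrors the description above, where both are identified at the sentence level and then distinguished. A real application would take its labels from the definitions published on the project website.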
The corpus can be searched online through the search engine available at http://grial.uab.es/sensem/corpus. More specific documentation can also be accessed on this website, such as the definitions of the terms used and their equivalents in other projects. All the publications related to this project can be consulted at http://grial.uab.es/publicacions.php.

Appendix: Literary works

YEAR  AUTHOR                 TITLE                                WORDS
1902  Vicente Blasco Ibáñez  Cañas y barro                       74,990
1914  Miguel de Unamuno      Niebla                              54,000
1940  Ortega y Gasset        Creer y pensar                       5,000
1977  Alonso Zamora          Sin levantar cabeza                  3,315
1995  Enrique Cerdán         Los ahorcados del cuarto menguante  13,456
1996  Arturo Pérez-Reverte   Capitán Alatriste                    5,000
2000  Rafael López Rivera    El don                              39,000