TREC Spanish


Item Name: TREC Spanish
Authors: Willie Rogers
LDC Catalog No.: LDC2000T51
ISBN: 1-58563-177-9
Data Type: text
Data Source(s): newswire
Project(s): GALE, TIDES, TREC
Application(s): information retrieval
Language(s): Spanish
Language ID(s): spa
Distribution: Web Download
Member Fee: $0 for 2000 members
Non-member Fee: US $500.00
Reduced-License Fee: US $250.00
Extra-Copy Fee: US $150.00
Non-member License: yes
Member License: yes
Online documentation: yes
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Willie Rogers. 2000. TREC Spanish. Linguistic Data Consortium, Philadelphia.

Introduction

This publication contains the TREC Spanish Corpus, produced by the Linguistic Data Consortium (LDC) under catalog number LDC2000T51, ISBN 1-58563-177-9. It is the set of documents used for the Spanish task in TRECs 3-5 and consists of approximately 250 megabytes of text from the Mexican newspaper El Norte and 300 megabytes of 1994 Agence France Presse newswire, formatted to include TREC document IDs. The El Norte documents were used for TRECs 3-4, and the Agence France Presse documents were used for TREC 5. The topics (questions) and relevance judgments (right answers) that complete the test collections can be downloaded from the Data/Non-English section of the TREC web site.

Data

See file.tbl for the directory structure of this publication and a complete list of files.

The files in the afp_text and infosel_data subdirectories are ASCII-encoded SGML files that conform to the afp_trec.dtd and infosel.dtd files found in the doc subdirectory.
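As a rough sketch of how these files might be read, the Python below splits an SGML file into documents and extracts each TREC document ID. It assumes the common TREC layout, in which each document is wrapped in a <DOC> element with its ID in a <DOCNO> element; those element names are not stated in this entry, so check them against afp_trec.dtd and infosel.dtd before relying on the code. The function names iter_docs and iter_corpus are hypothetical, and a regular-expression pass is a shortcut, not a validating SGML parser.

```python
import re
from pathlib import Path
from typing import Iterator, Tuple

# ASSUMPTION: TREC-style markup, i.e. each document is wrapped in <DOC>...</DOC>
# and carries its identifier in <DOCNO>...</DOCNO>. Verify these element names
# against afp_trec.dtd and infosel.dtd in the doc subdirectory.
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL | re.IGNORECASE)
DOCNO_RE = re.compile(r"<DOCNO>\s*(.*?)\s*</DOCNO>", re.DOTALL | re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def iter_docs(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (docno, plain_text) pairs from the contents of one SGML file."""
    for match in DOC_RE.finditer(text):
        body = match.group(1)
        docno_match = DOCNO_RE.search(body)
        docno = docno_match.group(1) if docno_match else ""
        # Remove the DOCNO element, then replace remaining tags with spaces
        # and collapse whitespace to get the running text.
        body = DOCNO_RE.sub("", body)
        plain = " ".join(TAG_RE.sub(" ", body).split())
        yield docno, plain


def iter_corpus(root: str, subdir: str) -> Iterator[Tuple[str, str]]:
    """Walk every file under root/subdir (e.g. 'afp_text' or 'infosel_data')."""
    for path in sorted(Path(root, subdir).rglob("*")):
        if path.is_file():
            # Latin-1 decodes any byte sequence and is a superset of ASCII,
            # so stray non-ASCII bytes will not raise an error.
            text = path.read_bytes().decode("latin-1")
            yield from iter_docs(text)


if __name__ == "__main__":
    # Hypothetical sample input; the IDs are placeholders, not real DOCNOs.
    sample = (
        "<DOC>\n<DOCNO> SAMPLE-0001 </DOCNO>\n<TEXT>\nHola mundo.\n</TEXT>\n</DOC>\n"
        "<DOC>\n<DOCNO>SAMPLE-0002</DOCNO>\n<TEXT>Segundo documento.</TEXT>\n</DOC>\n"
    )
    for docno, text in iter_docs(sample):
        print(docno, "->", text)
```

Run on the built-in sample, this prints "SAMPLE-0001 -> Hola mundo." and "SAMPLE-0002 -> Segundo documento.". To process the corpus itself, something like iter_corpus("/path/to/TREC_Spanish", "afp_text") would walk one subdirectory, with the root path adjusted to wherever the download was unpacked.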

Updates

There are no updates at this time.

Copyright