README FOR THE FISHER SPANISH 2006 CORPUS
Speech and Transcripts

This corpus contains 819 telephone conversations of 10 to 12 minutes
in duration, with 136 speakers participating. The collection was done
by the LDC, using a robot operator system in Philadelphia, PA. Native
speakers of Latin American Spanish were recruited from within the
domestic U.S. and Puerto Rico.

A full orthographic transcript is provided for each conversation,
along with the original digital audio, in the form of 2-channel mu-law
sample data with 8000 samples per second (as captured from the public
telephone network).

The audio files are in NIST SPHERE format (1024-byte ASCII file
headers).

The transcript files are in plain-text, tab-delimited format (tdf),
with UTF-8 character encoding. These files were created by the
LDC-developed transcription tool called "xtrans", which is available
from the LDC:

  http://www.ldc.upenn.edu/tools/XTrans/

The first line of each transcript file provides the column headings.
The next two lines are "comments" that can be ignored; they are used
by xtrans and are distinguished from non-comment lines by an initial
semicolon (";"). Actual transcript data, with time stamps, channel
number, transcript text and additional information, begins at line 4
of each transcript file.

The "doc" directory contains comma-delimited tables providing
information about the calls and the subjects (speakers). There is
also a document file (in Microsoft Word format) that explains the
transcription conventions used for this corpus.
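
The file layouts described above can be read with a few lines of Python. The following is a minimal sketch, not part of the corpus distribution. The function names read_tdf and read_sphere_header are hypothetical. read_tdf takes the column names from the first line of the transcript rather than assuming them, and skips any line with an initial semicolon. read_sphere_header assumes the usual SPHERE header layout, which this README does not spell out: a "NIST_1A" line, a header-size line, then "name -type value" lines ending with "end_head".

```python
def read_tdf(lines):
    """Parse an xtrans-style .tdf transcript from an iterable of text lines.

    Returns (columns, rows). Each row is a dict keyed by the column
    headings found on the first line; all values are kept as strings.
    """
    lines = iter(lines)
    columns = next(lines).rstrip("\r\n").split("\t")
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and xtrans comment lines (initial semicolon).
        if not line or line.startswith(";"):
            continue
        rows.append(dict(zip(columns, line.split("\t"))))
    return columns, rows


def read_sphere_header(raw):
    """Parse the ASCII header found in the first 1024 bytes of a SPHERE file.

    Assumes "name -type value" lines terminated by "end_head"; integer
    fields (type "-i") are converted, everything else stays a string.
    """
    text = raw[:1024].decode("ascii", errors="replace")
    fields = {}
    for line in text.splitlines()[2:]:  # skip the "NIST_1A" and size lines
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) == 3:
            name, kind, value = parts
            fields[name] = int(value) if kind == "-i" else value
    return fields
```

To use this on real files, open a transcript with open(path, encoding="utf-8") and pass the file object to read_tdf; open an audio file in binary mode and pass f.read(1024) to read_sphere_header. The time stamp and channel columns come back as strings, so convert them as needed once the actual column headings are known.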