Portland Cellular Corpus
Release 1.3
Center for Spoken Language Understanding

UPDATED: 23 August 2002
All phonetic transcriptions (files with the extension wrd) were moved to
the labels directory. Small modifications were made to readme.txt and to
the file structure.

This document describes the file naming conventions used for this
distribution and gives a brief description of the various file formats
used.

File Naming Convention
----------------------

File names follow this convention:

    CE-150.spelllastname.txt

The first field ("CE") is the prefix indicating the corpus to which this
data belongs, and the second field ("150") is a unique ID number for the
speaker. The third field is an identifier indicating the prompt for this
particular utterance. Please see the protocol section of overview.txt for
information on the mapping of these identifiers.

These files are subdivided into directories based on their call number
divided by 10 (integer division). So, the files for call 103 can be found
in the /10 subdirectory.

The /trans and /labels directory file structures exactly parallel the
structure of the /speech directory.

File Formats
------------

The data were captured digitally from the CSLU T1 connection and saved as
8 kHz, 8-bit mu-law (ulaw). These files have been converted to the RIFF
standard file format, which here is 16-bit linearly encoded.

Transcriptions
--------------

The text transcriptions were performed according to the non time-aligned
word-level conventions described in the CSLU Labeling Guide. This document
is available at the CSLU web site.

Phonetic transcriptions are plain text files that carry time-aligned
phonetic labels. The first two lines of the file are a header which
defines the length of a "frame" in milliseconds. Each remaining line of
the file consists of two numbers that define a frame range, and a label
that applies to that region.
For example:

    MillisecondsPerFrame: 1.000000
    END OF HEADER
    2 113 .pau
    113 191 w
    191 267 ^
    267 395 n

So, we can see here that a frame corresponds to 1 millisecond (ms) of
time, and that from 2 to 113 ms into the file there is a pause (.pau),
with the first phoneme (w) starting at 113 ms and stretching to 191 ms.

The word-level transcription files follow the same format, with word
labels in place of the phonetic labels.

The .com files that are found with the .wrd files contain information
about breathing during the speech. They are in a similar time-aligned
format.

Labels
------

The lola files are ASCII "location and label" files. They are similar to
the ".phn" files of the TIMIT database except:

1) the locations are given in a unit of time other than the sample.
2) there is a short header saying what this unit is.

Each file in this distribution has the header:

    MillisecondsPerFrame: 3.0
    END OF HEADER

After that are a series of lines, one per segment, of the form

    start end label

For example

    200 237 ah
    237 289 m

The [ah] segment extends from frame 200 to frame 236 inclusive. The end
location is written as 237 for historical reasons.

The lola files have the extension ".ptlola".
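
The time-aligned label format described above (a two-line header followed
by one "start end label" line per segment) can be read with a few lines of
Python. The following is only a minimal sketch, not a tool shipped with
this distribution: the names Segment and parse_label_text are hypothetical,
and the code assumes that every file has exactly the
"MillisecondsPerFrame:" and "END OF HEADER" header lines shown in the
examples. It treats the end location as exclusive, following the [ah]
example in the Labels section.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """One labeled region; names are hypothetical, not part of the corpus."""
    start_frame: int
    end_frame: int  # exclusive, per the [ah] example (200 237 covers 200..236)
    label: str


def parse_label_text(text: str) -> Tuple[float, List[Segment]]:
    """Parse the contents of a time-aligned label file.

    Returns (milliseconds_per_frame, segments). Assumes the header layout
    shown in this readme; other header fields are ignored.
    """
    ms_per_frame = None
    segments: List[Segment] = []
    in_header = True
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_header:
            if line == "END OF HEADER":
                in_header = False
            elif line.startswith("MillisecondsPerFrame:"):
                ms_per_frame = float(line.split(":", 1)[1])
            continue
        # Two frame numbers, then the label (which may contain spaces).
        start, end, label = line.split(None, 2)
        segments.append(Segment(int(start), int(end), label))
    if in_header:
        raise ValueError("missing END OF HEADER line")
    if ms_per_frame is None:
        raise ValueError("missing MillisecondsPerFrame header")
    return ms_per_frame, segments


# Usage with the phonetic example from this readme.
sample = """MillisecondsPerFrame: 1.000000
END OF HEADER
2 113 .pau
113 191 w
191 267 ^
267 395 n
"""
ms_per_frame, segments = parse_label_text(sample)
for seg in segments:
    # Convert frame numbers to milliseconds using the header value.
    print(seg.label, seg.start_frame * ms_per_frame, seg.end_frame * ms_per_frame)
```

To read a file from disk, pass its contents to the same function, for
example parse_label_text(open(path).read()). For the lola files, the
header value 3.0 means each frame number must be multiplied by 3 to
obtain milliseconds.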