Frank Enos
frank@cs.columbia.edu
20120311

This readme describes the audio files and associated transcriptions,
segmentations, and labelings of the Columbia Deception Corpus. For
experiment-specific terminology and definitions, please refer to my
dissertation.

See /proj/speech/corpora/CDC/sub_stats_unix.txt for subject gender and
interview length.

See /proj/speech/projects/deception/columbia/README.features.txt for
notes on existing feature sets and their manipulation.

==================
File Conventions
==================

Each subject directory contains the same set of files; in what appears
below, I have taken Subject 10B as an example.

For all subjects, the left channel, denoted by _L_ in the filename, is
the interviewer. The right channel, _R_, is the subject.

Note the following conventions for Global Lie labeling:

The Global Lie valence and the version of the pre-interview task for the
given section appear before the colon (e.g. "T/H"); the section name
appears after the colon (e.g. "INTERACTIVE").

Global Lie valence is indicated as:
+ T == Truth
+ LU == Lie Up (Subject claims better task performance than was actually achieved.)
+ LD == Lie Down (Subject claims worse task performance than was actually achieved.)

Task version is indicated as:
+ H == Hard
+ E == Easy

Sections are labeled by section name.

For example, T/H:INTERACTIVE indicates that the subject is telling the
truth based on having performed the "hard" version of the Interactive
pre-task.

================
.wav
================

S-10B_L_16k.wav
S-10B_R_16k.wav
S-10B_LR_16k.wav

Interviewer, subject, and combined stereo audio files, respectively.

================
.TextGrid
================

S-10B_L_16k.punc.TextGrid : Interviewer transcription with punctuation.
S-10B_R_16k.punc.TextGrid : Subject transcription with punctuation.
S-10B_BigLies.TextGrid : Indicates the topic and Global Lie valence of (typically long) intervals of the interview.
S-10B_BigLiesSU.TextGrid : Indicates the topic and Global Lie valence of interview Slash Units.
S-10B_BigLiePhrases.TextGrid : Indicates the topic and Global Lie valence of interview phrases (breath delimited).
S-10B_pedal_hand_corr.TextGrid : Indicates Local Lie valence of segments of the interview; based on hand-corrected alignments of subject pedal presses.

==================
.trs
==================

S-10B_L_16k.punc.trs
S-10B_R_16k.punc.trs

Interviewer and subject versions of the LDC Transcriber files used in
producing hand transcriptions (these correspond to the .punc.TextGrid
files).

==================
.ltf
==================

S-10B.ltf

"Lie Tracker File": produced by the pedal-press-reading Java program I
wrote to record subject pedal presses; used to produce the Local Lie
.TextGrid labels.

==================
.ctm / .stm
==================

(Note that these were produced by the SRI team; I never made direct use
of these files.)

S-10B_L_16k-wrd.ctm
S-10B_R_16k-wrd.ctm

Word-based segmentations.

S-10B_L_16k-wrd.stm
S-10B_R_16k-wrd.stm

Word-based segmentations explicitly indicating inter-segment gaps.

S-10B_L_16k-phn.stm
S-10B_R_16k-phn.stm

Phone-based segmentations explicitly indicating inter-segment gaps.
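
==================
Example: splitting a Global Lie label
==================

The Global Lie label convention described under File Conventions
(valence/version before the colon, section name after it) can be split
mechanically. The following is a minimal Python sketch, not part of the
corpus tooling. The names GlobalLieLabel and parse_global_lie_label are
hypothetical, and the sketch assumes every label has exactly the form
shown in the example "T/H:INTERACTIVE"; intervals in the actual
.TextGrid files may carry empty or other labels, which this sketch
simply rejects.

```python
from dataclasses import dataclass

# Codes taken from the conventions listed in this readme.
VALENCES = {"T": "Truth", "LU": "Lie Up", "LD": "Lie Down"}
TASK_VERSIONS = {"H": "Hard", "E": "Easy"}


@dataclass(frozen=True)
class GlobalLieLabel:
    """Hypothetical container for the three parts of a Global Lie label."""
    valence: str       # "T", "LU", or "LD"
    task_version: str  # "H" or "E"
    section: str       # section name, e.g. "INTERACTIVE"


def parse_global_lie_label(label: str) -> GlobalLieLabel:
    """Split a label of the assumed form VALENCE/VERSION:SECTION."""
    head, colon, section = label.strip().partition(":")
    if not colon:
        raise ValueError(f"no ':' in label: {label!r}")

    valence, slash, version = head.partition("/")
    valence, version, section = valence.strip(), version.strip(), section.strip()

    if not slash:
        raise ValueError(f"no '/' before ':' in label: {label!r}")
    if valence not in VALENCES:
        raise ValueError(f"unknown valence {valence!r} in label: {label!r}")
    if version not in TASK_VERSIONS:
        raise ValueError(f"unknown task version {version!r} in label: {label!r}")
    if not section:
        raise ValueError(f"empty section name in label: {label!r}")

    return GlobalLieLabel(valence, version, section)
```

For the example label above, parse_global_lie_label("T/H:INTERACTIVE")
returns GlobalLieLabel(valence='T', task_version='H',
section='INTERACTIVE'), and VALENCES["T"] gives the readable form
"Truth". Raising ValueError on anything unexpected is a deliberate
choice for the sketch: it surfaces labels that do not follow the
convention instead of silently guessing.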