AnnoDIFP CTS Audio and Transcripts
LDC2025S10
March 4, 2025
Linguistic Data Consortium

1. Overview
===========

AnnoDIFP (Annotated Data for the Investigation of Facets of Personality)
was created by the Linguistic Data Consortium (LDC), Florida Institute of
Technology (FIT), and University of New Haven (UNH) to support development
of algorithms for prediction of personality traits. It consists of audio
recordings from both in-person interviews and conversational telephone
speech collections, paired with scores from two self-reported personality
assessments: HEXACO Personality Inventory (Revised) (HEXACO-PI-R) and
Short Dark Triad (SD3).

This release contains audio data and transcripts from the conversational
telephone speech (CTS) collection for the AnnoDIFP project, comprising
1,179 calls from 327 participants (total call duration: 242.52 hours).

More information about the corpus design, collection, protocol,
processing, and annotation is provided in the file
"docs/annodifp_collection_doc.pdf".

2. Directory Structure
======================

- data//flac/ -- FLAC audio for all callsides that participant
  PARTICIPANT-ID was on
- data//transcripts/ -- transcripts for all callsides of participant
  PARTICIPANT-ID
- docs/annodifp_collection_doc.pdf -- detailed description of corpus
  design, collection, protocols, processing, and annotation
- docs/scores.tbl -- ground truth scores for participants
- docs/calls.tbl -- mapping between calls and callsides
- docs/file.tbl -- listing of md5 checksums, sizes, dates, and file names
- README.txt -- this file

3. File naming convention
=========================

Data files (FLAC, transcripts) are stored separately for each callside
and named according to the following convention: _