TITLE: Levantine Arabic QT Training Data Set 4 (Speech + Transcripts) [LDC2005S14]

Authors: Mohamed Maamouri (Project head), Tim Buckwalter, Hubert Jin

This release contains 901 calls, totaling 133.6 hours of telephone conversation in Levantine Arabic. Both audio and transcription files are included in this package.

This directory (docs) contains the following documents:

1) filelist
   A list of conversation IDs, each with the prefix 'fsa_'.
2) 901_wordlist.utf8
   Wordlist and mapping table.
3) speaker_info.txt
   Speaker information on origin, gender, age (group), etc., as judged by the annotators who transcribed the conversations.

Unlike the previous training data Sets 1 [LDC2004E22] and 2 [LDC2004E66], in which nearly 100% of the speakers are Jordanian, 76% of the speakers in this corpus are Lebanese, which is similar to training data Set 3 [LDC2005T03]. Here is the breakdown of speakers by dialect:

 171 JOR
1373 LEB
 229 PAL
  29 SYR

Directory structure

annotation - 901 transcription files in UTF-8 format.
audio      - 901 audio files in SPHERE format.
docs       - documentation.

Note: sph2pipe can be used to convert the SPHERE files to WAVE files. For more information, please search "sphere LDC" at google.com.

For more information, please contact

Mohamed Maamouri  maamouri@ldc.upenn.edu
Tim Buckwalter    timbuck2@ldc.upenn.edu
Hubert Jin        hubertj@ldc.upenn.edu
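
Example (illustrative only, not part of the original release): the dialect figures and the sph2pipe note above can be checked with a short Python sketch. The counts are copied from the breakdown in this README. The conversation ID 'fsa_00000', the '.sph' file extension, and the 'wav' output directory are assumptions made for illustration, so check the actual names in filelist and in the audio directory. The script only builds a sph2pipe command line and does not run it.

```python
# Speaker counts per dialect, copied from the breakdown in this README.
DIALECT_COUNTS = {"JOR": 171, "LEB": 1373, "PAL": 229, "SYR": 29}


def dialect_shares(counts):
    """Return each dialect's share of all speakers, in percent."""
    total = sum(counts.values())
    return {code: 100.0 * n / total for code, n in counts.items()}


def sph2pipe_command(conversation_id, audio_dir="audio", out_dir="wav"):
    """Build (but do not run) a sph2pipe command line for one conversation.

    Assumptions: the '.sph' extension and the 'wav' output directory are
    hypothetical; adjust them to match the files in the audio directory.
    """
    src = f"{audio_dir}/{conversation_id}.sph"
    dst = f"{out_dir}/{conversation_id}.wav"
    # '-f wav' asks sph2pipe to write a RIFF/WAVE file.
    return ["sph2pipe", "-f", "wav", src, dst]


if __name__ == "__main__":
    total = sum(DIALECT_COUNTS.values())
    print(f"total speakers: {total}")
    for code, share in dialect_shares(DIALECT_COUNTS).items():
        print(f"{code}: {share:.1f}%")
    # 'fsa_00000' is a made-up ID used only to show the command shape.
    print(" ".join(sph2pipe_command("fsa_00000")))
```

The four counts sum to 1802 speakers, which is consistent with two speakers per call across the 901 calls, and the Lebanese share rounds to the 76% quoted above.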