TITLE:    Levantine Arabic QT Training Data Set 4 (Speech + Transcripts)
		     [LDC2005S14] 


Authors: Mohamed Maamouri (Project head), Tim Buckwalter, Hubert Jin

This release contains 901 calls, totaling 133.6 hours of telephone
conversation in Levantine Arabic. Both audio and transcription files
are included in this package.

In this directory (docs), we include the following documents:

   1) filelist

        A list of conversation IDs, each with the prefix 'fsa_'.

   2) 901_wordlist.utf8

        Wordlist and mapping table

   3) speaker_info.txt

        Speaker information (origin, gender, age group, etc.) as judged
        by the annotators who transcribed the conversations.

Unlike the previous training data Sets 1 [LDC2004E22] and 2 [LDC2004E66],
which are almost entirely dominated by Jordanian speakers, 76% of the
speakers in this corpus are Lebanese, which is similar to training
data Set 3 [LDC2005T03]. Here is the breakdown of speakers by dialect:

    171 JOR
   1373 LEB
    229 PAL
     29 SYR
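
The shares implied by this breakdown can be checked with a few lines of
Python. This is only a sketch: the counts are copied from the table
above, while the names `counts` and `dialect_shares` are illustrative
and not part of the release.

```python
# Speaker counts per dialect, copied from the breakdown above.
counts = {"JOR": 171, "LEB": 1373, "PAL": 229, "SYR": 29}


def dialect_shares(counts):
    """Return each dialect's share of all speakers, in percent (one decimal)."""
    total = sum(counts.values())
    return {dialect: round(100.0 * n / total, 1) for dialect, n in counts.items()}


total_speakers = sum(counts.values())
shares = dialect_shares(counts)

for dialect, share in shares.items():
    print(f"{dialect}: {share}%")
```

The counts sum to 1802, which is twice the 901 calls and so is
consistent with two speakers per call, and the Lebanese share rounds
to the 76% quoted above.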

Directory structure

    annotation - 901 transcription files in UTF-8 format.
    audio      - 901 audio files in SPHERE format.
    docs       - documentation.

    Note: sph2pipe can be used to convert the SPHERE files to WAV files.
          For more information, search for "sphere LDC" at google.com.
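
A batch conversion along these lines might look like the following
Python sketch. It assumes that sph2pipe is installed and on the PATH,
and that the audio files carry a ".sph" extension (the extension is an
assumption; it is not stated in this document). The function names and
the output directory are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def sph2pipe_command(sph_path, wav_path):
    """Build the sph2pipe argument list that writes a WAV copy of sph_path."""
    # "-f wav" asks sph2pipe for RIFF/WAV output instead of SPHERE.
    return ["sph2pipe", "-f", "wav", str(sph_path), str(wav_path)]


def convert_all(audio_dir, out_dir):
    """Convert every *.sph file in audio_dir and return the WAV paths written."""
    if shutil.which("sph2pipe") is None:
        raise RuntimeError("sph2pipe was not found on PATH")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    converted = []
    # Assumption: SPHERE files in the audio directory end in ".sph".
    for sph in sorted(Path(audio_dir).glob("*.sph")):
        wav = out / (sph.stem + ".wav")
        subprocess.run(sph2pipe_command(sph, wav), check=True)
        converted.append(wav)
    return converted
```

For example, convert_all("audio", "wav") would write one WAV file per
SPHERE file into a new "wav" directory, leaving the originals untouched.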
    
For more information, please contact

    Mohamed Maamouri     maamouri@ldc.upenn.edu
    Tim Buckwalter       timbuck2@ldc.upenn.edu
    Hubert Jin           hubertj@ldc.upenn.edu