Levantine Arabic QT Training Data Set 3 Transcripts
		     (LDC2005T02) 

This corpus provides the transcripts for the corresponding speech
corpus (LDC2005S07) from LDC.

This training release contains 322 conversations, totaling just over
50 hours of Levantine Arabic speech.

This directory (docs) contains the following documents:

   1) filelist

        A list of conversation IDs, each prefixed with 'fsa_'.

   2) wordlist.LA-TD3.utf8.txt

        Wordlist and mapping table

   3) speaker_info.txt

        Speaker information (origin, gender, age group, etc.) as judged
        by the annotators who transcribed the conversations.

Unlike the previous training data corpora (Sets 1 and 2), which consist
almost entirely of Jordanian speakers, this corpus is mostly Lebanese (72%),
plus a mix of other Levantine speakers.

Directory structure

    annotation - 322 transcription files in UTF-8 format. 
    docs       - documentation.
    
    Note: The audio (in SPHERE format) is released in a separate
          package (LDC2005S07).
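
    The layout above can be checked against docs/filelist with a short
    script. The following is only a sketch under stated assumptions: it
    assumes the filelist holds one conversation ID per line, that IDs
    contain no '.' character, and that each transcription file in
    annotation is named with the conversation ID followed by some
    extension. None of these naming details are stated in this README,
    and the function names are hypothetical.

```python
from pathlib import Path


def read_conversation_ids(filelist_path):
    """Return the lines of the filelist that start with 'fsa_'.

    Assumption: one conversation ID per line; blank lines are skipped.
    """
    ids = []
    with open(filelist_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line.startswith("fsa_"):
                ids.append(line)
    return ids


def missing_transcripts(ids, annotation_dir):
    """Return the IDs that have no matching file in annotation_dir.

    Assumption: a file matches an ID when the part of its name before
    the first '.' equals the ID (the real extension is not documented).
    """
    stems = {
        p.name.split(".", 1)[0]
        for p in Path(annotation_dir).iterdir()
        if p.is_file()
    }
    return [cid for cid in ids if cid not in stems]


# Example usage (paths are relative to the top of the release):
#   ids = read_conversation_ids("docs/filelist")
#   print(len(ids), "conversation IDs listed")
#   print("missing:", missing_transcripts(ids, "annotation"))
```

    If the assumptions hold, the number of IDs read should equal the
    322 conversations stated above, and the list of missing
    transcripts should be empty.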

For more information, please contact

    Mohamed Maamouri     maamouri@ldc.upenn.edu
    Timbuck Water        timbuck2@ldc.upenn.edu
    Hubert Jin           hubertj@ldc.upenn.edu