This file contains documentation on the Levantine Arabic QT Training Data Set 4 (Speech + Transcripts), Linguistic Data Consortium (LDC) catalog number LDC2005S14 and ISBN 1-58563-342-9.
This release contains 901 calls, totaling 133.6 hours of telephone conversation in Levantine Arabic. Both audio and transcription files are included in this package.
The majority of speakers in this corpus are Lebanese. The data is similar to the training data in Set 3 (LDC2005S07, speech; LDC2005T03, transcripts). The dialects are distributed as follows:
- 171 JOR (Jordanian)
- 1373 LEB (Lebanese)
- 229 PAL (Palestinian)
- 29 SYR (Syrian)
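The four counts above sum to 1,802, exactly twice the 901 calls in the release. That is consistent with one dialect label per call side, but this reading is an assumption and is not stated in the documentation. A minimal Python sketch of the arithmetic follows; the variable names are illustrative only.

```python
# Dialect counts copied from the list above.
dialect_counts = {"JOR": 171, "LEB": 1373, "PAL": 229, "SYR": 29}
NUM_CALLS = 901  # number of calls stated for this release

total = sum(dialect_counts.values())

# Assumption: each count is one side (speaker) of a two-party call.
sides_per_call = total / NUM_CALLS

# Percentage share of each dialect, rounded to one decimal place.
shares = {code: round(100 * n / total, 1) for code, n in dialect_counts.items()}

for code, pct in shares.items():
    print(f"{code}: {dialect_counts[code]:>5} ({pct}%)")
print(f"total labels: {total}, labels per call: {sides_per_call}")
```

Under that assumption, Lebanese speakers account for roughly three quarters of the labeled call sides, which matches the statement that the majority of speakers are Lebanese.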
Portions © 2005 Trustees of the University of Pennsylvania