L2-KSU Native and Non-Native Arabic Speech

Item Name: L2-KSU Native and Non-Native Arabic Speech
Author(s): Norah Alrashoudi, Hend AlKhalifa, Yousef Ajami Alotaibi
LDC Catalog No.: LDC2024S11
ISLRN: 031-691-303-064-0
DOI: https://doi.org/10.35111/n3d8-t960
Release Date: September 16, 2024
Member Year(s): 2024
DCMI Type(s): Sound, Text
Sample Type: pcm
Sample Rate: 16000
Data Source(s): microphone speech
Application(s): speaker identification, speech recognition
Language(s): Standard Arabic, Arabic
Language ID(s): arb, ara
License(s): L2-KSU Native and Non-Native Arabic Speech Agreement
Online Documentation: LDC2024S11 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Alrashoudi, Norah, Hend AlKhalifa, and Yousef Alotaibi. L2-KSU Native and Non-Native Arabic Speech LDC2024S11. Web Download. Philadelphia: Linguistic Data Consortium, 2024.

Introduction

L2-KSU Native and Non-Native Arabic Speech was developed by King Saud University (KSU) and contains approximately six hours of Modern Standard Arabic read speech from 80 subjects, along with transcripts and speaker metadata.

Data

The speech data was collected in 2022 from 40 native and 40 non-native speakers. The native speakers were from Saudi Arabia, Egypt, and Palestine; they provided audio recordings through the crowdsourcing platform Khamsat. The non-native speakers were Central and West African students enrolled in KSU's Arabic Linguistics Institute; they provided speech recordings on site. All subjects read a series of ten sentences, repeating each sentence multiple times.

Audio is presented as 16-bit, 16 kHz WAV files. Transcript files in UTF-8 plain text, speaker metadata, and the Arabic sentences with transliteration, English translation, and IPA transcription are included in the documentation accompanying this release.
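As a minimal sketch of how a user might confirm that downloaded audio matches the stated format (16-bit, 16 kHz) and estimate total duration, the following uses only Python's standard `wave` module. The function names, the recursive `*.wav` search, and the directory passed to `total_hours` are assumptions made for illustration; the actual directory layout and file naming of the release are described in its documentation, not here.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Read basic header fields from one WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "channels": wav.getnchannels(),
            "duration_s": frames / rate if rate else 0.0,
        }


def matches_release_format(info, rate=16000, bits=16):
    """Check a file against the format stated above (16-bit, 16 kHz)."""
    return info["sample_rate"] == rate and info["bits_per_sample"] == bits


def total_hours(root):
    """Sum the duration of every *.wav file under root (hypothetical layout)."""
    seconds = sum(
        describe_wav(p)["duration_s"] for p in sorted(Path(root).rglob("*.wav"))
    )
    return seconds / 3600.0
```

Running `total_hours` on the unpacked audio directory should give a figure close to the roughly six hours stated in the introduction, and any file for which `matches_release_format` returns `False` is worth inspecting by hand.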


Updates

None at this time.

 
