Spoken Digits in Hindi and Indian English
|Item Name:||Spoken Digits in Hindi and Indian English|
|Author(s):||Basabdatta Sen Bhattacharya, Aiswarya Subramanian, Purbayan Chatterjee, Sounak Dey|
|LDC Catalog No.:||LDC2022S03|
|Release Date:||February 15, 2022|
|Data Source(s):||field recordings, microphone conversation, web collection|
|Application(s):||language identification, machine translation, speech recognition|
|Language ID(s):||eng, hin|
|License(s):||Spoken Digits in Hindi and Indian English Agreement|
|Online Documentation:||LDC2022S03 Documents|
|Licensing Instructions:||Subscription & Standard Members, and Non-Members|
|Citation:||Bhattacharya, Basabdatta Sen, et al. Spoken Digits in Hindi and Indian English LDC2022S03. Web Download. Philadelphia: Linguistic Data Consortium, 2022.|
Spoken Digits in Hindi and Indian English was developed by the Birla Institute of Technology and Science Pilani. It contains approximately two hours of speech consisting of spoken digits from one to ten in Hindi and English, with regional accents from across India.
The speech data was collected in three ways: in person, using a recorder app on a mobile handset; through one-to-one online communication over social apps; and from social media sites. Each audio file represents a single spoken digit in either Hindi or Indian English. Background noise was mostly retained, although some data was recorded in a noise-free environment or cleaned after recording to exclude abrupt noises such as car horns.
The audio data is organized by number, language and gender. The gender breakdown for speakers is 17% female, 27% male, and 56% unspecified.
This release also includes a Google Colab notebook that can be used for basic processing such as removing noise or unwanted spaces.
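As a rough illustration of the kind of cleanup mentioned above, the following minimal sketch trims low-amplitude samples from both ends of a clip. It is not taken from the included notebook: the function name `trim_silence`, the fixed amplitude threshold of 500, and the reading of "unwanted spaces" as leading and trailing silence are all assumptions made for this example.

```python
from typing import List, Sequence


def trim_silence(samples: Sequence[int], threshold: int = 500) -> List[int]:
    """Drop leading and trailing samples whose absolute amplitude is below threshold.

    `samples` is assumed to hold signed 16-bit PCM values for one channel.
    The threshold is a hypothetical value, not one documented for this corpus.
    """
    start = 0
    end = len(samples)
    # Advance past quiet samples at the beginning of the clip.
    while start < end and abs(samples[start]) < threshold:
        start += 1
    # Step back past quiet samples at the end of the clip.
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return list(samples[start:end])


# Toy clip: quiet edges around a louder middle section.
clip = [0, 3, -2, 1200, -900, 40, 1500, 2, 0]
trimmed = trim_silence(clip)
```

Quiet samples inside the retained region (the `40` above) are kept, since only the edges are trimmed. A clip made of real recordings with background noise would likely need an energy-based threshold rather than a fixed one.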
All audio data is presented as single-channel, 16-bit, 16 kHz FLAC-compressed linear PCM.
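A downloaded file can be checked against the stated format before processing. The sketch below assumes the third-party `soundfile` package is installed; the helper names `format_mismatches` and `check_file` are hypothetical and not part of the release.

```python
from typing import Dict, Tuple

# Format stated in the corpus description: mono, 16-bit, 16 kHz, FLAC.
EXPECTED = {"samplerate": 16000, "channels": 1, "subtype": "PCM_16", "format": "FLAC"}


def format_mismatches(samplerate: int, channels: int, subtype: str, fmt: str) -> Dict[str, Tuple]:
    """Return {field: (actual, expected)} for every field that differs from EXPECTED."""
    actual = {"samplerate": samplerate, "channels": channels, "subtype": subtype, "format": fmt}
    return {key: (actual[key], want) for key, want in EXPECTED.items() if actual[key] != want}


def check_file(path: str) -> Dict[str, Tuple]:
    """Read the header of one audio file and compare it with the stated format."""
    import soundfile as sf  # third-party dependency, assumed to be installed

    info = sf.info(path)
    return format_mismatches(info.samplerate, info.channels, info.subtype, info.format)
```

An empty dictionary from `check_file` means the file matches the description; any other result lists the fields that differ, with the actual value first.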