MASRI Synthetic
Item Name: | MASRI Synthetic |
Author(s): | Carlos Daniel Hernández Mena, Albert Gatt, Claudia Borg, Andrea DeMarco, Lonneke van der Plas |
LDC Catalog No.: | LDC2022S08 |
ISBN: | 1-58563-995-8 |
ISLRN: | 518-019-551-096-3 |
DOI: | https://doi.org/10.35111/wc8h-h752 |
Release Date: | September 15, 2022 |
Member Year(s): | 2022 |
DCMI Type(s): | Sound, Text |
Sample Type: | flac |
Sample Rate: | 16000 |
Data Source(s): | transcribed speech |
Application(s): | speech recognition |
Language(s): | Maltese |
Language ID(s): | mlt |
License(s): | MASRI Synthetic Agreement |
Online Documentation: | LDC2022S08 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Hernández Mena, Carlos Daniel, et al. MASRI Synthetic LDC2022S08. Web Download. Philadelphia: Linguistic Data Consortium, 2022. |
Introduction
MASRI (Maltese Automatic Speech Recognition I) Synthetic was developed by the MASRI team at the University of Malta and consists of approximately 99 hours of synthesized Maltese speech.
Data
Source sentences were extracted from the Maltese Language Resource Server (MLRS) corpus, which comprises written or transcribed Maltese covering various genres, including parliamentary debates, news, law, opinion, sports, culture, academic writing, literature and religious texts. The text was processed through the CrimsonWing text-to-speech system to generate speech files. Synthesized speech was created with 210 voices (105 male and 105 female).
Audio files are presented as 16 kHz, 16-bit, single-channel flac files. When decompressed, they produce PCM wav files.
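As a minimal sketch, the stated audio format can be checked on a wav file that has already been decoded from flac. The decoding step itself is not shown, since it needs an external tool, and the function names below are illustrative and not part of the release. The second helper estimates raw PCM size from the same three parameters.

```python
import wave

# Format stated in the catalog entry: 16 kHz, 16-bit, single channel.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits
EXPECTED_CHANNELS = 1


def matches_stated_format(wav_path):
    """Return True if a decoded PCM wav file has the stated parameters."""
    with wave.open(wav_path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == EXPECTED_CHANNELS
        )


def uncompressed_pcm_bytes(hours):
    """Estimate raw PCM size (wav headers excluded) for a given duration."""
    bytes_per_second = (
        EXPECTED_RATE_HZ * EXPECTED_SAMPLE_WIDTH_BYTES * EXPECTED_CHANNELS
    )
    return int(hours * 3600 * bytes_per_second)
```

For a duration of 99 hours, `uncompressed_pcm_bytes(99)` gives 11,404,800,000 bytes, or roughly 11.4 GB of raw PCM before flac compression.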
Transcripts are contained in a single plain text file encoded as UTF-8.
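The sketch below shows one way to read the transcript file as UTF-8 and confirm that letters specific to the Maltese alphabet (ċ, ġ, ħ, ż) survive decoding. The file path is supplied by the caller, and no line layout is assumed because the catalog entry does not describe one. NFC normalization is a precaution added here, not something the release documents.

```python
import unicodedata

# Letters with diacritics used in the Maltese alphabet (both cases).
MALTESE_DIACRITIC_LETTERS = set("ċġħżĊĠĦŻ")


def read_transcript_text(path):
    """Read a file strictly as UTF-8 and normalize the text to NFC.

    Raises UnicodeDecodeError if the file is not valid UTF-8.
    """
    with open(path, encoding="utf-8", errors="strict") as handle:
        return unicodedata.normalize("NFC", handle.read())


def count_maltese_letters(text):
    """Count occurrences of each Maltese-specific letter in the text."""
    counts = {}
    for char in text:
        if char in MALTESE_DIACRITIC_LETTERS:
            counts[char] = counts.get(char, 0) + 1
    return counts
```

If the counts come back empty for a file of any real length, the file was probably decoded with the wrong encoding or stripped of diacritics at some earlier step.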
Updates
None at this time.