MASRI Synthetic

Item Name:	MASRI Synthetic
Author(s):	Carlos Daniel Hernández Mena, Albert Gatt, Claudia Borg, Andrea DeMarco, Lonneke van der Plas
LDC Catalog No.:	LDC2022S08
ISBN:	1-58563-995-8
ISLRN:	518-019-551-096-3
DOI:	https://doi.org/10.35111/wc8h-h752
Release Date:	September 15, 2022
Member Year(s):	2022
DCMI Type(s):	Sound, Text
Sample Type:	flac
Sample Rate:	16000
Data Source(s):	transcribed speech
Application(s):	speech recognition
Language(s):	Maltese
Language ID(s):	mlt
License(s):	MASRI Synthetic Agreement
Online Documentation:	LDC2022S08 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Hernández Mena, Carlos Daniel, et al. MASRI Synthetic LDC2022S08. Web Download. Philadelphia: Linguistic Data Consortium, 2022.
Related Works:	isSimilarWith LDC2024S12 Samrómur Synthetic; relatesTo LDC2023S07 LDC Spoken Language Sampler - Sixth Release

Introduction

MASRI (Maltese Automatic Speech Recognition I) Synthetic was developed by the MASRI team at the University of Malta and consists of approximately 99 hours of synthesized Maltese speech.

Data

Source sentences were extracted from the Maltese Language Resource Server (MLRS) corpus, which comprises written or transcribed Maltese covering various genres, including parliamentary debates, news, law, opinion, sports, culture, academic, literature and religious texts. The text was processed with the CrimsonWing text-to-speech system to generate the speech files. Speech was synthesized with 210 voices (105 male and 105 female).

Audio files are provided as 16 kHz, 16-bit, single-channel FLAC files. When decompressed, they yield PCM WAV files.
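The stated format also fixes how much raw audio the corpus represents and what a correctly decompressed file should look like. The sketch below is illustrative, not part of the LDC release: it assumes the FLAC files have already been decoded to WAV with some external tool (for example the reference `flac` command-line decoder, `flac -d`), and the helper names `pcm_bytes` and `check_wav_format` are hypothetical. It uses only Python's standard `wave` module.

```python
import wave

# Format stated in the corpus description: 16 kHz, 16-bit, single channel.
EXPECTED_RATE = 16000
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample (16-bit)
EXPECTED_CHANNELS = 1


def pcm_bytes(hours: float, rate: int = EXPECTED_RATE,
              sample_width: int = EXPECTED_SAMPLE_WIDTH,
              channels: int = EXPECTED_CHANNELS) -> int:
    """Size in bytes of raw PCM audio (excluding WAV headers) for a duration."""
    seconds = hours * 3600
    return int(round(seconds * rate * sample_width * channels))


def check_wav_format(path: str) -> bool:
    """Return True if a decompressed WAV file is 16 kHz, 16-bit, mono."""
    with wave.open(path, "rb") as wav:
        return (wav.getframerate() == EXPECTED_RATE
                and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH
                and wav.getnchannels() == EXPECTED_CHANNELS)


if __name__ == "__main__":
    # Rough estimate only: the corpus duration is "approximately" 99 hours.
    total = pcm_bytes(99)
    print(f"~99 h of 16 kHz/16-bit mono PCM: {total:,} bytes ({total / 1e9:.1f} GB)")
```

At 16,000 samples per second and 2 bytes per sample, one hour of mono audio is 115,200,000 bytes, so roughly 99 hours decompresses to about 11.4 GB of PCM data before per-file WAV headers are added.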

Transcripts are contained in a single plain text file encoded as UTF-8.
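Because Maltese uses letters outside ASCII (ċ, ġ, ħ, ż), the transcript file should be opened with an explicit UTF-8 encoding rather than the platform default. The following is a minimal sketch under a loudly stated assumption: the page does not describe the line layout, so the code assumes each line is `<utterance_id><whitespace><transcription>`. The function name `load_transcripts` is hypothetical, and the parsing should be adjusted to the actual format given in the corpus documentation.

```python
def load_transcripts(path: str) -> dict[str, str]:
    """Map utterance IDs to transcriptions.

    ASSUMPTION (hypothetical layout): each non-blank line is
    '<utterance_id><whitespace><text>'. The real MASRI Synthetic
    transcript file may use a different column layout.
    """
    transcripts: dict[str, str] = {}
    # Decode explicitly as UTF-8 so Maltese letters such as ċ, ġ, ħ and ż
    # are read correctly regardless of the system's default encoding.
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split once on the first run of whitespace (space or tab).
            parts = line.split(maxsplit=1)
            utt_id = parts[0]
            text = parts[1] if len(parts) > 1 else ""
            transcripts[utt_id] = text
    return transcripts
```

Splitting only once keeps the spaces inside each transcription intact while tolerating either a tab or a space after the ID.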

Samples

Please view the following samples:

Updates

None at this time.
