Samrómur Synthetic

Item Name:	Samrómur Synthetic
Author(s):	Carlos Daniel Hernández Mena, Gunnar Thor Örnólfsson, Jon Gudnason
LDC Catalog No.:	LDC2024S12
ISLRN:	446-426-909-343-3
DOI:	https://doi.org/10.35111/4fam-5358
Release Date:	November 15, 2024
Member Year(s):	2024
DCMI Type(s):	Sound, Text
Sample Type:	flac
Sample Rate:	16000
Data Source(s):	transcribed speech
Application(s):	speech recognition
Language(s):	Icelandic
Language ID(s):	isl
License(s):	Samrómur Synthetic Agreement (For-Profit Member), Samrómur Synthetic Agreement (Non-Member), Samrómur Synthetic Agreement (Not-for-Profit)
Online Documentation:	LDC2024S12 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Hernández Mena, Carlos Daniel, Gunnar Örnólfsson, and Jon Gudnason. Samrómur Synthetic LDC2024S12. Web Download. Philadelphia: Linguistic Data Consortium, 2024.
Related Works:	isSimilarWith: LDC2022S08 MASRI Synthetic; relatesTo: LDC2022S05 Samrómur Icelandic Speech 1.0, LDC2022S11 Samrómur Children Icelandic Speech 1.0, LDC2023S05 Samrómur Queries Icelandic Speech 1.0

Introduction

Samrómur Synthetic was developed by the Language and Voice Lab, Reykjavik University, and contains 72 hours of Icelandic synthetic speech, transcripts, and metadata.

Data

Source sentences were extracted from the Samrómur platform, which comprises texts and transcripts covering various genres. The text was processed through a text-to-speech system developed by Reykjavik University's Language and Voice Lab to generate the speech files. Speech was synthesized with 44 voices (22 male, 22 female) at five different speed rates, for a total of 220 speakers and 62,700 utterances (285 sentences per speaker).
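As a quick sanity check on the figures quoted above, the short Python sketch below recomputes the utterance count from the speaker and sentence counts, and derives the approximate mean utterance length and audio per speaker implied by the 72-hour total. The constant names are illustrative only, and because the 72-hour figure is rounded, the derived durations are rough estimates rather than published statistics.

```python
# Figures taken from the catalog description; names are illustrative.
TOTAL_HOURS = 72            # total synthetic speech, as stated in the Introduction
SPEAKERS = 220              # synthetic speakers
SENTENCES_PER_SPEAKER = 285

# Utterance count implied by the speaker and sentence counts.
utterances = SPEAKERS * SENTENCES_PER_SPEAKER

# Rough averages implied by the rounded 72-hour total.
mean_utterance_seconds = TOTAL_HOURS * 3600 / utterances
minutes_per_speaker = TOTAL_HOURS * 60 / SPEAKERS

print(f"utterances: {utterances}")
print(f"mean utterance length: {mean_utterance_seconds:.2f} s")
print(f"audio per speaker: {minutes_per_speaker:.1f} min")
```

The product reproduces the stated 62,700 utterances, and the averages come out to roughly 4 seconds per utterance and just under 20 minutes of audio per speaker.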

Audio data is organized by speaker and presented as FLAC-compressed, single-channel, 16 kHz, 16-bit linear PCM. Transcripts and metadata are presented in .tsv format.
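The audio format stated above can be checked without third-party libraries by reading the STREAMINFO block that begins every FLAC stream. The following is a minimal sketch using only the Python standard library; the function names `read_flac_streaminfo` and `matches_corpus_spec` are hypothetical helpers, not part of the corpus distribution, and the sketch assumes the file starts directly with the `fLaC` marker (no prepended ID3 tag).

```python
def read_flac_streaminfo(data: bytes) -> dict:
    """Parse the mandatory STREAMINFO block from the first bytes of a FLAC stream.

    `data` needs at least the first 42 bytes of the file.
    """
    if len(data) < 42 or data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (missing fLaC marker)")

    # Metadata block header: 1 flag bit + 7-bit type, then a 24-bit length.
    block_type = data[4] & 0x7F
    block_len = int.from_bytes(data[5:8], "big")
    if block_type != 0 or block_len < 34:
        raise ValueError("first metadata block is not STREAMINFO")

    # Bytes 18..25 pack: 20-bit sample rate, 3-bit (channels - 1),
    # 5-bit (bits per sample - 1), 36-bit total sample count.
    packed = int.from_bytes(data[18:26], "big")
    return {
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        "total_samples": packed & ((1 << 36) - 1),
    }


def matches_corpus_spec(info: dict) -> bool:
    """True if the stream is single-channel, 16 kHz, 16-bit, as described above."""
    return (
        info["sample_rate"] == 16000
        and info["channels"] == 1
        and info["bits_per_sample"] == 16
    )
```

A typical use would be to open a file in binary mode, pass `f.read(42)` to `read_flac_streaminfo`, and flag any file for which `matches_corpus_spec` returns `False`. Dividing `total_samples` by `sample_rate` gives the duration in seconds.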
