Samrómur Icelandic Speech 1.0

Item Name:	Samrómur Icelandic Speech 1.0
Author(s):	David Mollberg, Ólafur Helgi Jónsson, Sunneva Þorsteinsdóttir, Jóhanna Vigdís Guðmundsdóttir, Steinthor Steingrimsson, Eydis Huld Magnusdottir, Judy Fong, Michal Borsky, Jon Gudnason
LDC Catalog No.:	LDC2022S05
ISBN:	1-58563-991-5
ISLRN:	643-778-441-472-4
DOI:	https://doi.org/10.35111/thx3-f170
Release Date:	May 16, 2022
Member Year(s):	2022
DCMI Type(s):	Sound, Text
Sample Type:	flac
Sample Rate:	16000 Hz
Data Source(s):	web collection
Application(s):	speaker identification, speaker verification, speech recognition
Language(s):	Icelandic
Language ID(s):	isl
License(s):	Samrómur Icelandic Speech 1.0 Agreement (For-Profit); Samrómur Icelandic Speech 1.0 Agreement (Non-Member); Samrómur Icelandic Speech 1.0 Agreement (Not-For-Profit)
Online Documentation:	LDC2022S05 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Mollberg, David, et al. Samrómur Icelandic Speech 1.0 LDC2022S05. Web Download. Philadelphia: Linguistic Data Consortium, 2022.
Related Works:	LDC2022S11 Samrómur Children Icelandic Speech 1.0; LDC2023S05 Samrómur Queries Icelandic Speech 1.0; LDC2023S07 LDC Spoken Language Sampler - Sixth Release; LDC2024S12 Samrómur Synthetic

Introduction

Samrómur Icelandic Speech 1.0 was developed by the Language and Voice Lab, Reykjavik University, in cooperation with Almannarómur, Center for Language Technology. The corpus contains 145 hours of prompted Icelandic speech, comprising 100,000 utterances from 8,392 speakers.
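As a rough illustration of what these totals imply, the short sketch below derives the mean utterance length and the mean number of utterances per speaker. It assumes the published figures (145 hours, 100,000 utterances, 8,392 speakers) are exact, which they may not be, so the results are only approximate averages and say nothing about the actual distribution.

```python
# Corpus totals as stated in the description (treated as exact for this sketch).
TOTAL_HOURS = 145
UTTERANCES = 100_000
SPEAKERS = 8_392

# Mean duration of one utterance, in seconds.
mean_utterance_seconds = TOTAL_HOURS * 3600 / UTTERANCES

# Mean number of utterances contributed by one speaker.
mean_utterances_per_speaker = UTTERANCES / SPEAKERS

print(f"mean utterance length: {mean_utterance_seconds:.2f} s")
print(f"mean utterances per speaker: {mean_utterances_per_speaker:.1f}")
```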

This version 1.0 is equivalent to "Samrómur Icelandic Speech 21.05" as used by the Language Technology Programme for Icelandic 2019-2023.

Data

Speech data was collected between October 2019 and May 2021 using the Samrómur website, which displayed prompts to participants. The prompts were drawn mainly from The Icelandic Gigaword Corpus, which includes text from novels, news, and plays, and from a list of location names in Iceland. Additional prompts were taken from the Icelandic Web of Science, and others were created by combining a name with a question or a demand. Prompts and speaker metadata are included in the corpus.

The audio data is divided into train, dev, and test sets and is provided as FLAC-compressed, single-channel, 16 kHz, 16-bit linear PCM.
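The stated audio format can be checked against a file's FLAC header. The following is a minimal sketch using only the Python standard library; it reads the STREAMINFO metadata block that the FLAC format places at the start of every stream. The function names and the example path are hypothetical and not part of the corpus, and the corpus directory layout is not assumed here.

```python
def parse_flac_streaminfo(data: bytes) -> dict:
    """Extract basic audio parameters from the first bytes of a FLAC stream."""
    if len(data) < 26:
        raise ValueError("need at least the first 26 bytes of the stream")
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    # Byte 4 is the metadata block header: 1-bit last-block flag, 7-bit type.
    if data[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    # Bytes 8-17 hold block and frame size limits; bytes 18-25 pack
    # 20 bits sample rate, 3 bits (channels - 1), 5 bits (bits per sample - 1),
    # and 36 bits total sample count.
    packed = int.from_bytes(data[18:26], "big")
    return {
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        "total_samples": packed & ((1 << 36) - 1),
    }


def matches_stated_format(info: dict) -> bool:
    """True if the header agrees with single-channel, 16 kHz, 16-bit audio."""
    return (
        info["sample_rate"] == 16000
        and info["channels"] == 1
        and info["bits_per_sample"] == 16
    )


def check_file(path: str) -> bool:
    """Read the header of one FLAC file (path is supplied by the caller)."""
    with open(path, "rb") as handle:
        return matches_stated_format(parse_flac_streaminfo(handle.read(42)))
```

Calling `check_file` on each audio file in a local copy would flag any file whose header differs from the description; dividing `total_samples` by `sample_rate` gives the duration in seconds.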
