Dialogs Re-Enacted Across Languages

Item Name:	Dialogs Re-Enacted Across Languages
Author(s):	Nigel G. Ward, Jonathan E. Avila, Emilia Rivas, Divette Marco
LDC Catalog No.:	LDC2024S08
ISLRN:	859-445-294-766-0
DOI:	https://doi.org/10.35111/2pac-j365
Release Date:	July 15, 2024
Member Year(s):	2024
DCMI Type(s):	Sound
Sample Type:	pcm
Sample Rate:	16000
Data Source(s):	microphone conversation, microphone speech
Application(s):	discourse analysis, prosody, spoken dialogue modeling
Language(s):	English, Spanish
Language ID(s):	eng, spa
License(s):	LDC User Agreement for Non-Members
Online Documentation:	LDC2024S08 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Ward, Nigel G., et al. Dialogs Re-Enacted Across Languages LDC2024S08. Web Download. Philadelphia: Linguistic Data Consortium, 2024.
Related Works:	isCreatedBy ELAN

Introduction

Dialogs Re-Enacted Across Languages was developed at the University of Texas at El Paso. It contains approximately 17 hours of conversational English and Spanish speech from 129 unique bilingual speakers. The material consists of short fragments extracted from spontaneous conversations, each paired with a close re-enactment in the other language by the original speakers, for a total of 3,816 pairs of matching utterances.

Data

Data was collected in 2022-2023. Participants were recruited from among students at the University of Texas at El Paso, which is located on the US-Mexico border. All participants were bilingual speakers of General American English and of Mexico-Texas Border Spanish. Their self-described dialect was "El Paso" for English and, in most cases, "El Paso/Juarez" for Spanish.

Each speaker pair had a ten-minute conversation in one language. Fragments of these conversations were then selected for re-enactment, and the original speakers produced equivalents in the other language. Each re-enactment was vetted for fidelity to the original and for naturalness in the target language.

After recording, fragments were mapped to the translated re-enactments using ELAN, an annotation tool for audio and video recordings.

Metadata about conversations, participants, re-enactments and utterances are included in this release.

The audio data is provided as FLAC-compressed, single-channel, 16 kHz, 16-bit linear PCM.
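One way to confirm that a downloaded file matches the stated audio format (single channel, 16 kHz, 16-bit) is to read the STREAMINFO metadata block that every FLAC stream begins with. The following is a minimal sketch that uses only the Python standard library and follows the published FLAC format layout. The helper names `parse_streaminfo`, `matches_release_format` and `check_file` are hypothetical and are not part of the release. This page does not describe the corpus directory layout, so the file path is left to the user.

```python
FLAC_MARKER = b"fLaC"     # every FLAC stream starts with these 4 bytes
STREAMINFO_TYPE = 0       # metadata block type code for STREAMINFO
STREAMINFO_LENGTH = 34    # STREAMINFO body is always 34 bytes


def parse_streaminfo(data: bytes) -> dict:
    """Parse the STREAMINFO block at the start of a FLAC stream (hypothetical helper)."""
    if data[:4] != FLAC_MARKER:
        raise ValueError("not a FLAC stream (missing 'fLaC' marker)")
    header = data[4:8]
    if len(header) < 4:
        raise ValueError("truncated metadata block header")
    block_type = header[0] & 0x7F  # low 7 bits; the high bit is the "last block" flag
    length = int.from_bytes(header[1:4], "big")
    if block_type != STREAMINFO_TYPE or length != STREAMINFO_LENGTH:
        raise ValueError("first metadata block is not a valid STREAMINFO")
    info = data[8:8 + STREAMINFO_LENGTH]
    if len(info) < STREAMINFO_LENGTH:
        raise ValueError("truncated STREAMINFO block")
    # Bytes 10..17 pack: sample rate (20 bits), channels - 1 (3 bits),
    # bits per sample - 1 (5 bits), total samples per channel (36 bits).
    packed = int.from_bytes(info[10:18], "big")
    sample_rate = packed >> 44
    channels = ((packed >> 41) & 0x7) + 1
    bits_per_sample = ((packed >> 36) & 0x1F) + 1
    total_samples = packed & ((1 << 36) - 1)  # 0 means "unknown" in FLAC
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "bits_per_sample": bits_per_sample,
        "total_samples": total_samples,
        "duration_s": total_samples / sample_rate if sample_rate else 0.0,
    }


def matches_release_format(info: dict) -> bool:
    """True if the stream is single-channel, 16 kHz, 16-bit, as described above."""
    return (info["sample_rate"], info["channels"], info["bits_per_sample"]) == (16000, 1, 16)


def check_file(path: str) -> dict:
    """Read only the first 42 bytes (marker + block header + STREAMINFO body)."""
    with open(path, "rb") as f:
        head = f.read(8 + STREAMINFO_LENGTH)
    return parse_streaminfo(head)
```

For example, `matches_release_format(check_file("path/to/utterance.flac"))` would return `True` for a file in the stated format (the path is a placeholder). Parsing the header directly avoids decoding any audio. A third-party audio library could report the same fields but would add a dependency.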
