Dialogs Re-Enacted Across Languages
Item Name: | Dialogs Re-Enacted Across Languages |
Author(s): | Nigel G. Ward, Jonathan E. Avila, Emilia Rivas, Divette Marco |
LDC Catalog No.: | LDC2024S08 |
ISLRN: | 859-445-294-766-0 |
DOI: | https://doi.org/10.35111/2pac-j365 |
Release Date: | July 15, 2024 |
Member Year(s): | 2024 |
DCMI Type(s): | Sound |
Sample Type: | pcm |
Sample Rate: | 16000 |
Data Source(s): | microphone conversation, microphone speech |
Application(s): | discourse analysis, prosody, spoken dialogue modeling |
Language(s): | English, Spanish |
Language ID(s): | eng, spa |
License(s): | LDC User Agreement for Non-Members |
Online Documentation: | LDC2024S08 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Ward, Nigel G., et al. Dialogs Re-Enacted Across Languages LDC2024S08. Web Download. Philadelphia: Linguistic Data Consortium, 2024. |
Introduction
Dialogs Re-Enacted Across Languages was developed at the University of Texas at El Paso. It contains approximately 17 hours of conversational speech in English and Spanish from 129 unique bilingual speakers. The data consists of short fragments extracted from spontaneous conversations, paired with close re-enactments in the other language by the original speakers, for a total of 3816 pairs of matching utterances.
Data
Data was collected in 2022-2023. Participants were recruited from among students at the University of Texas at El Paso, which is located on the US-Mexico border. All participants were bilingual speakers of General American English and of Mexico-Texas Border Spanish. They described their own dialects as El Paso for English and, mostly, "El Paso/Juarez" for Spanish.
Each pair of speakers had a ten-minute conversation in one language. Fragments of each conversation were then chosen for re-enactment, and the original speakers produced equivalents in the other language. Each re-enactment was vetted for fidelity to the original and for naturalness in the target language.
After recording, fragments were mapped to the translated re-enactments using ELAN, an annotation tool for audio and video recordings.
Metadata about conversations, participants, re-enactments and utterances are included in this release.
The audio data is presented as FLAC-compressed, single-channel, 16 kHz, 16-bit linear PCM.
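As a quick sanity check on the stated audio format, the following is a minimal sketch that decodes the STREAMINFO block at the start of a FLAC file using only the Python standard library, then compares it against the parameters given above. The function and class names are illustrative and not part of the release, and the sketch assumes each file begins directly with the `fLaC` marker (no prepended tags).

```python
from typing import NamedTuple


class FlacInfo(NamedTuple):
    sample_rate: int      # Hz
    channels: int
    bits_per_sample: int
    total_samples: int    # samples per channel


def parse_flac_streaminfo(header: bytes) -> FlacInfo:
    """Decode STREAMINFO from the first 42 bytes of a FLAC stream."""
    if len(header) < 42:
        raise ValueError("need at least 42 bytes")
    if header[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    # Metadata block header: 1 flag bit + 7-bit type, then a 24-bit length.
    block_type = header[4] & 0x7F
    block_len = int.from_bytes(header[5:8], "big")
    if block_type != 0 or block_len != 34:
        raise ValueError("first metadata block is not STREAMINFO")
    # Bytes 18..25 pack: 20-bit sample rate, 3-bit (channels - 1),
    # 5-bit (bits per sample - 1), 36-bit total sample count.
    packed = int.from_bytes(header[18:26], "big")
    return FlacInfo(
        sample_rate=packed >> 44,
        channels=((packed >> 41) & 0x7) + 1,
        bits_per_sample=((packed >> 36) & 0x1F) + 1,
        total_samples=packed & ((1 << 36) - 1),
    )


def matches_release_format(info: FlacInfo) -> bool:
    """True if the file is single-channel, 16 kHz, 16-bit, as documented."""
    return (info.sample_rate, info.channels, info.bits_per_sample) == (16000, 1, 16)


def duration_seconds(info: FlacInfo) -> float:
    """Duration implied by the header's sample count and sample rate."""
    return info.total_samples / info.sample_rate
```

To use it on a file, read the first 42 bytes in binary mode, for example `parse_flac_streaminfo(open(path, "rb").read(42))`, where `path` is a placeholder for any audio file in the release.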
Updates
None at this time.