Diaspora Tibetan Speech
Item Name: | Diaspora Tibetan Speech |
Author(s): | Christopher Geissler, Sarah Babinski, Jason Shaw |
LDC Catalog No.: | LDC2024S06 |
ISLRN: | 883-684-044-738-1 |
DOI: | https://doi.org/10.35111/b8wr-w485 |
Release Date: | June 17, 2024 |
Member Year(s): | 2024 |
DCMI Type(s): | Sound, Text |
Sample Type: | pcm |
Sample Rate: | 16000 |
Data Source(s): | microphone speech |
Application(s): | phonology, sociolinguistics |
Language(s): | Tibetan |
Language ID(s): | bod |
License(s): | LDC User Agreement for Non-Members |
Online Documentation: | LDC2024S06 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Geissler, Christopher, Sarah Babinski, and Jason Shaw. Diaspora Tibetan Speech LDC2024S06. Web Download. Philadelphia: Linguistic Data Consortium, 2024. |
Introduction
Diaspora Tibetan Speech was developed at Yale University. It contains approximately 28 hours of elicited Tibetan speech from 73 speakers in the diaspora Tibetan community in Kathmandu, Nepal, along with transcripts, elicitation materials, and speaker demographic information.
Data
Recordings were collected in 2016. All speakers were adults and varied both in age and in age at diaspora. A substantial number of speakers were born in Nepal.
Each speaker contributed one recording comprising a series of elicitation tasks: demographic information; a word list and numbers; sentences in isolation; a scripted story; and free speech based on "frog story"-style illustrations. All elicitation materials are included with the corpus documentation in PDF format.
The word- and number-list sections of the recordings were time-aligned at the word level, and the alignments are provided as Praat TextGrids. Five recordings were fully transcribed word for word by a native Tibetan speaker; these transcripts are presented in both Microsoft Word and PDF format to preserve the font encoding. They are not time-aligned but include general time stamps. Other transcripts are available as Excel spreadsheets giving a word-by-word correspondence between Tibetan script, phonetic transcription, and English translation.
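The corpus description does not specify the tier names or the text encoding of the TextGrid alignments. The sketch below therefore assumes they are saved in Praat's standard long ("ooTextFile") text format, and simply lists every labelled interval, with its start and end times in seconds, from each interval tier. The names `Interval`, `parse_textgrid`, and `read_textgrid` are hypothetical helpers, not part of the corpus or of any Praat tooling. Point tiers and labels that span several lines are not handled.

```python
from dataclasses import dataclass
import re


@dataclass
class Interval:
    tier: str
    xmin: float  # start time in seconds
    xmax: float  # end time in seconds
    text: str


# Matches "key = value" lines in Praat's long (ooTextFile) TextGrid format.
_KEY_VALUE = re.compile(r'^\s*(\w+)\s*=\s*(.*?)\s*$')


def parse_textgrid(source: str) -> list[Interval]:
    """Return the non-empty intervals of every IntervalTier in a long-format
    TextGrid. Minimal sketch: point tiers and multi-line labels are skipped."""
    intervals: list[Interval] = []
    tier_class = tier_name = ''
    in_interval = False
    xmin = xmax = 0.0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith('intervals ['):
            in_interval = True  # the next xmin/xmax/text lines describe one interval
            continue
        if stripped.startswith(('item [', 'points [')):
            in_interval = False  # a new tier (or a point) begins
            continue
        match = _KEY_VALUE.match(line)
        if not match:
            continue
        key, value = match.groups()
        if key == 'class':
            tier_class = value.strip('"')
        elif key == 'name':
            tier_name = value.strip('"')
        elif in_interval and key == 'xmin':
            xmin = float(value)
        elif in_interval and key == 'xmax':
            xmax = float(value)
        elif in_interval and key == 'text':
            # Labels are quoted; Praat writes an embedded quote as "".
            label = value[1:-1].replace('""', '"')
            if tier_class == 'IntervalTier' and label.strip():
                intervals.append(Interval(tier_name, xmin, xmax, label))
            in_interval = False
    return intervals


def read_textgrid(path: str) -> list[Interval]:
    """Decode a TextGrid file and parse it. Assumption: depending on its
    text-writing preferences, Praat may save UTF-16 (with a byte-order mark)
    or UTF-8, so both are tried here."""
    with open(path, 'rb') as f:
        raw = f.read()
    if raw.startswith((b'\xff\xfe', b'\xfe\xff')):
        text = raw.decode('utf-16')
    else:
        text = raw.decode('utf-8-sig')
    return parse_textgrid(text)
```

For example, `for iv in read_textgrid('speaker01.TextGrid'): print(iv.tier, iv.xmin, iv.xmax, iv.text)` would print one aligned word per line; the file name here is only a placeholder. For heavier use, a dedicated TextGrid library or Praat scripting may be more robust than this line-based parser.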
Demographic information includes age at recording, age at diaspora, and other information.
The audio data is presented as single-channel, 16 kHz, 16-bit WAV files.
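The sketch below uses Python's standard `wave` module to confirm that a file matches the stated format and to cut out a stretch of audio between two times in seconds, such as the boundaries of a word-level alignment. It assumes the files are ordinary uncompressed RIFF/WAVE PCM that `wave` can open. The functions `check_format` and `extract_clip` are hypothetical helpers, not tools distributed with the corpus.

```python
import wave
from typing import BinaryIO, Union

WavSource = Union[str, BinaryIO]  # a file path or an open binary file object


def check_format(wav: WavSource, expected_rate: int = 16000) -> float:
    """Confirm the file is mono, 16-bit PCM at expected_rate; return its duration in seconds."""
    with wave.open(wav, 'rb') as wf:
        if wf.getnchannels() != 1:
            raise ValueError(f'expected 1 channel, found {wf.getnchannels()}')
        if wf.getsampwidth() != 2:
            raise ValueError(f'expected 16-bit samples, found {8 * wf.getsampwidth()}-bit')
        if wf.getframerate() != expected_rate:
            raise ValueError(f'expected {expected_rate} Hz, found {wf.getframerate()} Hz')
        return wf.getnframes() / wf.getframerate()


def extract_clip(src: WavSource, dst: WavSource, start_s: float, end_s: float) -> int:
    """Copy the frames between start_s and end_s (seconds) into a new WAV file
    with the same format; return the number of frames written."""
    with wave.open(src, 'rb') as wf:
        rate = wf.getframerate()
        # Convert times to frame indices, clamped to the file's length.
        first = max(0, round(start_s * rate))
        last = min(wf.getnframes(), round(end_s * rate))
        if last <= first:
            raise ValueError('empty or inverted time range')
        wf.setpos(first)
        frames = wf.readframes(last - first)
        params = wf.getparams()
    with wave.open(dst, 'wb') as out:
        out.setparams(params)  # the header's frame count is corrected as frames are written
        out.writeframes(frames)
    return last - first
```

At 16,000 samples per second and 2 bytes per sample, one hour of mono audio is 16,000 × 2 × 3,600 = 115,200,000 bytes (about 115 MB). The roughly 28 hours in the corpus therefore correspond to about 3.2 GB of raw sample data, not counting file headers.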
Updates
None at this time.