Diaspora Tibetan Speech

Item Name:	Diaspora Tibetan Speech
Author(s):	Christopher Geissler, Sarah Babinski, Jason Shaw
LDC Catalog No.:	LDC2024S06
ISLRN:	883-684-044-738-1
DOI:	https://doi.org/10.35111/b8wr-w485
Release Date:	June 17, 2024
Member Year(s):	2024
DCMI Type(s):	Sound, Text
Sample Type:	pcm
Sample Rate:	16000
Data Source(s):	microphone speech
Application(s):	phonology, sociolinguistics
Language(s):	Tibetan
Language ID(s):	bod
License(s):	LDC User Agreement for Non-Members
Online Documentation:	LDC2024S06 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Geissler, Christopher, Sarah Babinski, and Jason Shaw. Diaspora Tibetan Speech LDC2024S06. Web Download. Philadelphia: Linguistic Data Consortium, 2024.

Introduction

Diaspora Tibetan Speech was developed at Yale University. It contains approximately 28 hours of elicited Tibetan speech from 73 speakers in the diaspora Tibetan community in Kathmandu, Nepal, along with transcripts, elicitation materials, and speaker demographic information.

Data

Recordings were collected in 2016. All speakers were adults; they varied in age and in age at diaspora. A substantial number of speakers were born in Nepal.

Each speaker contributed one recording comprising a series of elicitation tasks: some demographic information; a word list and numbers; some sentences in isolation; a scripted story; and free speech based on "frog story" type illustrations. All elicitation materials are included with the corpus documentation in PDF format.

The word- and number-list sections of the recordings were time-aligned at the word level as Praat TextGrids. Five recordings were fully transcribed word-for-word by a native Tibetan speaker; these transcripts are presented in both Microsoft Word and PDF format to preserve font encoding. They are not time-aligned but include general time stamps. Other transcripts are available as Excel spreadsheets with word-to-word correspondence of Tibetan script, phonetic transcription, and English translation.
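As an illustration of how word-level alignments of this kind might be read, the following is a minimal sketch in Python. It assumes the TextGrids are in Praat's long text format and have already been decoded to a string; the tier layout, file encoding, and file naming of this corpus are not described here, so those remain assumptions. The function name read_intervals is hypothetical, and the sketch pools intervals from all tiers rather than separating them.

```python
import re

# Matches one interval entry in Praat's long text format, for example:
#     intervals [1]:
#         xmin = 0
#         xmax = 1.0
#         text = "ka"
# Praat escapes a double quote inside a label by doubling it.
_INTERVAL = re.compile(
    r'intervals \[\d+\]:\s*'
    r'xmin = (?P<start>[-+0-9.eE]+)\s*'
    r'xmax = (?P<end>[-+0-9.eE]+)\s*'
    r'text = "(?P<label>(?:[^"]|"")*)"'
)


def read_intervals(textgrid_text):
    """Return (start, end, label) for every non-empty interval.

    Hypothetical helper: intervals from all tiers are pooled, and
    unlabeled (silent) intervals are skipped.
    """
    rows = []
    for match in _INTERVAL.finditer(textgrid_text):
        label = match.group("label").replace('""', '"').strip()
        if label:
            rows.append(
                (float(match.group("start")), float(match.group("end")), label)
            )
    return rows
```

Praat can save TextGrids in more than one text encoding, so the file may need to be opened with an explicit encoding before its contents are passed to the function.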

Demographic information includes age at recording, age at diaspora, and other information.

The audio data is presented as single-channel, 16 kHz, 16-bit WAV files.
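The stated audio format can be checked against a downloaded file with Python's standard wave module. The sketch below is illustrative only: the function names describe_wav and matches_stated_format are hypothetical, and the default values simply restate the format given above (one channel, 16000 Hz, 2 bytes per sample).

```python
import wave


def describe_wav(path):
    """Read basic header fields from an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": wav.getnframes() / rate,
        }


def matches_stated_format(path, channels=1, sample_rate=16000, bits=16):
    """Return True if the file is single-channel, 16 kHz, 16-bit by default."""
    info = describe_wav(path)
    return (
        info["channels"] == channels
        and info["sample_rate"] == sample_rate
        and info["bits_per_sample"] == bits
    )
```

Running the check over every recording before analysis is a cheap way to catch files that were resampled or converted after download.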

Updates

None at this time.
