TITLE: Diaspora Tibetan Speech
-------------------------------------------------
-------------------------------------------------
AUTHORS: Christopher Geissler, Sarah Babinski, Jason Shaw
Yale University
-------------------------------------------------
-------------------------------------------------
LANGUAGE: Diaspora Tibetan [bod]
-------------------------------------------------
-------------------------------------------------
OVERVIEW:
The Diaspora Tibetan Speech corpus consists of recordings and transcriptions of Diaspora Tibetan speakers collected by Christopher Geissler in 2016 in Kathmandu, Nepal. This work was funded by NSF Doctoral Dissertation Research Improvement Grant #1928750.

The collection includes recordings from 73 speakers of Tibetan, all of whom are members of the diaspora Tibetan community in Kathmandu. Speakers vary in age and in age at diaspora, and a substantial portion of speakers were born in Nepal.

Each speaker contributes one recording, which comprises a series of elicitation tasks: some demographic information; a word list and numbers; some sentences in isolation; a scripted story; and free speech based on "frog story" type illustrations. These elicitation materials are included with this deposit.

The word-list and number-list sections of these recordings have been time-aligned at the word level as Praat TextGrids. Some of the recordings have additionally been fully transcribed word-for-word by a native Tibetan speaker; these transcripts include the Tibetan orthography, a naive phonetic transcription, and English translations. These transcripts are not time-aligned, but they include general time stamps.
-------------------------------------------------
-------------------------------------------------
USE OF CORPUS:
We hope that these materials will be useful to linguists and community members alike.
This corpus provides documentation of Diaspora Tibetan in Kathmandu at a particular time in history and will be of interest to those studying Tibetan, especially its dialects and diaspora communities.
-------------------------------------------------
-------------------------------------------------
COLLECTION INFORMATION:
Recordings were collected in Kathmandu in 2016 from 73 speakers. Demographic information is available in the file speaker-demographics.xlsx, including age at recording, age at diaspora, and other information.
-------------------------------------------------
-------------------------------------------------
FILE INFORMATION:
Audio files are in WAV format and are named based on speaker codes (see speaker-demographics.xlsx for these).
Time-aligned transcripts are in Praat TextGrid format.
Other transcripts are available as Excel spreadsheets with word-to-word correspondence of Tibetan script, phonetic transcription, and English translation.
Tibetan script transcripts are also available as PDFs to preserve font encoding.
Elicitation documents are in PDF format and are written in the Tibetan script.
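
For users who want to work with the audio and the time-aligned transcripts together, the following is a minimal Python sketch of one way to pair them. It rests on an assumption that this document does not state: that each TextGrid shares its file stem with the WAV file for the same speaker code (for example, a hypothetical S01.wav alongside S01.TextGrid). The function name pair_recordings and the directory argument are illustrative only; check the actual file names in the deposit before relying on this.

```python
from pathlib import Path


def pair_recordings(corpus_dir):
    """Map each WAV file stem to (wav_path, textgrid_path_or_None).

    Assumption (not stated in this README): a TextGrid has the same
    file stem as the WAV file for the same speaker code.
    """
    root = Path(corpus_dir)

    # Index TextGrids by stem; compare suffixes case-insensitively.
    textgrids = {
        p.stem: p
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".textgrid"
    }

    pairs = {}
    for wav in sorted(root.rglob("*")):
        if wav.is_file() and wav.suffix.lower() == ".wav":
            # Speakers without a time-aligned transcript map to None.
            pairs[wav.stem] = (wav, textgrids.get(wav.stem))
    return pairs
```

Calling pair_recordings on the directory that holds the deposit returns a dictionary keyed by file stem. Under the naming assumption above, those keys would correspond to the speaker codes listed in speaker-demographics.xlsx.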