Malto Speech and Transcripts

LDC2012S04

Introduction

Malto Speech and Transcripts, Linguistic Data Consortium (LDC) catalog number LDC2012S04 and ISBN 1-58563-606-1, was developed by Masato Kobayashi, Associate Professor in Linguistics at the University of Tokyo (Japan), and Bablu Tirkey, research scholar at the Tribal and Regional Languages Department, Ranchi University (India). It contains approximately 8 hours of Malto speech data collected between 2005 and 2009 from 27 speakers (22 males, 5 females). Also included are accompanying transcripts, English translations and glosses for 6 hours of the collection. Speakers were asked to talk about themselves, their lives, rituals and folklore; elicitation interviews were then conducted. The goal of the work was to present the current state and dialectal variation of Malto.

Malto is a Dravidian language spoken in northeastern India (principally the states of Bihar, Jharkhand and West Bengal) and Bangladesh by people called the Pahariyas. Indian census data place the number of Malto speakers between 100,000 and 200,000. Most Malto speakers live in the three northeastern districts of Jharkhand, i.e., Sahebganj, Godda and Pakur; the fieldwork that resulted in this corpus was conducted in those districts. Of the Pahariyas in that area, three subtribes, the Sawriya Pahariyas, the Mal Pahariyas and the Kumarbhag Pahariyas, primarily speak Malto. (Kobayashi 3)

Pahariya villages or hamlets are located on hilly tracts and, in the lowlands, are often separated by non-Pahariya villages. As a result, Malto varies from village to village. It may be more accurate to consider Malto a continuum of dialects rather than a unitary language. The three major dialects -- Sawriya Pahariya, Mal Pahariya, and Kumarbhag Pahariya -- correspond to the principal sub-tribal communities. (Kobayashi 14)

For further reading on Malto, consult Texts and Grammar of Malto by Masato Kobayashi (Vizianagaram: Kotoba Books, 2012), sold by the book distributor Mary Martin Booksellers, 123 Third Street, Tatabad, Coimbatore 641012, India. Inquiries can be sent to info@marymartin.com or books.kotobo@gmail.com.

Data

The transcribed data accounts for 6 hours of the collection and contains 21 speakers (17 male, 4 female). The untranscribed data accounts for 2 hours of the collection and contains 10 speakers (9 male, 1 female). Four of the male speakers are present in both groups.
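The per-group counts above can be reconciled with the corpus total of 27 speakers (22 male, 5 female) by subtracting the speakers who appear in both groups. The following is a minimal sketch of that arithmetic; the variable and function names are illustrative, and the female overlap of zero is inferred from the stated totals rather than given explicitly.

```python
# Speaker counts as stated in the corpus description.
transcribed = {"male": 17, "female": 4}
untranscribed = {"male": 9, "female": 1}

# Four male speakers appear in both groups (stated above).
# A female overlap of 0 is an inference from the overall total of 5 females.
overlap = {"male": 4, "female": 0}


def unique_speakers(a, b, both):
    """Inclusion-exclusion per gender: |A or B| = |A| + |B| - |A and B|."""
    return {gender: a[gender] + b[gender] - both[gender] for gender in a}


totals = unique_speakers(transcribed, untranscribed, overlap)
print(totals, sum(totals.values()))  # {'male': 22, 'female': 5} 27
```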

All audio is presented in .wav format. Each audio file name includes a subject number, village name, speaker name and the topic discussed. The transcripts and glossary are UTF-8 text files. Because of ambiguities that occur when writing Malto in Devanagari script, the transcripts were developed using Roman script with symbols adapted from the International Phonetic Alphabet (IPA); they are not, however, considered to be phonetic transcripts.

Consult docs/readme.txt and docs/untran_speaker.txt for further information about the corpus, its collection and the speakers. The transcription and glosses are split into three text files; consult the readme to determine which audio files are covered by each transcript.

Directory Structure

Please see file.tbl for a complete file list as well as checksums for this publication.
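As a rough illustration of how a downloaded file might be checked against a checksum listed in file.tbl, the sketch below computes a digest for one file and compares it with an expected value. The layout of file.tbl and the checksum algorithm are not described here, so the default "md5" algorithm and the helper names file_digest and matches are assumptions; adjust them to whatever file.tbl actually contains.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks.

    The default algorithm is an assumption; the corpus documentation
    above does not say which checksum file.tbl uses.
    """
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large .wav files are not loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches(path, expected_hex, algorithm="md5"):
    """Compare a file's digest with an expected hex string (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

A caller would pass the path of a corpus file together with the checksum value copied from the corresponding entry in file.tbl.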

Updates

Additional information, updates and bug fixes may be available in the LDC catalog entry for this corpus, LDC2012S04.

Works Cited

Kobayashi, Masato. Texts and Grammar of Malto. Vizianagaram: Kotoba Books, 2012. Print.

Content Copyright

Portions © 2005-2012 Masato Kobayashi, © 2012 Trustees of the University of Pennsylvania