Notes on Transcription
San Duanmu and LDC
May 1998

The transcriptions mostly conform to the Transcription Conventions adopted by the Linguistic Data Consortium (LDC). Additional notes relating to the present data are listed below.

1. On the advice of the LDC staff, only the starting and ending times are indicated in monologues. These are given at the beginning of the transcription.

2. Time marks are given in seconds.

3. Each dialogue has two speakers, recorded on two tracks through two microphones. The first speaker is referred to as <> and the second as <>, regardless of the tracks they were on.

4. Since the two speakers of a dialogue were in the same room, each speaker's voice can still be heard on the other's track (at a lower volume), even though separate microphones were used. When this occurs, the transcript is marked with . For example,
   <> SPEAKER1: 就 好, SPEAKER2: 常常 有 去 做, 然後
The speaker that appears in the << >> caused the overlap. SPEAKER1 is the individual who was interrupted. SPEAKER2 is the individual in << >>. "e1" is the ending time for SPEAKER1. The text immediately below "e1" is spoken by SPEAKER2.

5. In the present data, each dialogue involves just two speakers, so only the starting time of each speaker turn is marked; the ending time of a speaker turn is approximately the same as the starting time of the next speaker turn. For example, in
   <> 我 騎去 頭份 那,
   <> 你 騎 車 到 石頭山 玩 喏?

6. Proper names (place, person, company) are transcribed by sound; the characters used may differ from the original.

7. Uppercase letters are used where they were spoken as separate letters in English, e.g. 'ABC' was spoken as [ei bi si], and 'OK' was spoken as [ow kei]. Sometimes hyphens or periods are used to avoid ambiguity, for example, 'C-O-A-T' (instead of 'COAT') for [si ow ei ti] and 'U.S.' for [yu es]. Words spoken in English are written in English, such as 'coat', instead of ''.
Words pronounced like English are also spelled with letters, such as 'So-Go' [sou gou], which is the actual name of a Japanese department store in Taipei.

8. The choice of characters for hesitation or filler words is often approximate. For example, 呵 and 嗷 are often used to represent [ho], a common Taiwanese filler word. In addition, because hesitation or filler sounds are often variable, the difference between alternative characters, such as 嗨 and 嘿, should not be taken very strictly.

9. {ts} represents a sound made with the lips, or with the tongue-tip and the teeth, while drawing air into the mouth.

10. The use of the punctuation [,] or [.] is largely based on the speed of speech instead of on grammar. For example, [,] is used at a pause position even if it is not a syntactic boundary, and no [,] or [.] is used at a sentence boundary if the speaker rushes into the next sentence without a pause or a slowdown, as commonly happens with speaker #23. The difference between [,] and [.] is not very strict.

11. (( )) is used for words not spoken clearly, as recommended by the Transcription Conventions. There are two cases. The first is words that could not be clearly determined. The second is words that are determinable but only partially pronounced, such as ((設))設計, where in ((設)) only the consonant [s] (but not the vowel) is pronounced.

12. In dialogues, only reasonably clear non-speech sounds (laughter and hesitation/filler words) by the other speaker are transcribed. Weaker non-speech sounds (such as quiet laughter) are sometimes ignored.

13. The Chinese characters were written in the traditional form (the Big5 coding system, or the 'Taipei' font of the Chinese Language Kit for the Mac computer, readable by the IBM PC). However, owing to the software used (Cihui), some simplified characters also appear (still in Big5 coding), such as 台灣 instead of 臺灣, and 什麼 instead of 甚麼. Such simplified characters are often used in Taiwan as well.

14. Some words in Taiwanese PTH are pronounced differently from Standard Mandarin (besides the more common differences such as the lack of retroflex sounds in Taiwanese PTH). For example, a number of speakers pronounced 和 as [han].
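
As an illustration of notes 5 and 11, the following Python sketch shows how a user of the transcripts might approximate turn ending times and locate (( )) spans. It is not part of the LDC tools. The function names, the tuple layout (start time, speaker label, text), the speaker labels "A" and "B", and the times in the example are all assumptions made for the illustration; the code also assumes the time marks have already been read from the transcript.

```python
import re

# Matches one (( ... )) span, the marker for unclear speech (note 11).
UNCLEAR = re.compile(r"\(\((.*?)\)\)")


def unclear_spans(text):
    """Return the contents of every (( )) span in a line of transcript."""
    return [span.strip() for span in UNCLEAR.findall(text)]


def turn_intervals(turns, final_end):
    """Attach an approximate ending time to each speaker turn (note 5).

    turns     -- hypothetical list of (start_seconds, speaker, text) tuples
                 in transcript order
    final_end -- ending time to use for the last turn, which has no successor
    """
    starts = [start for start, _, _ in turns]
    # Each turn ends roughly where the next one starts.
    ends = starts[1:] + [final_end]
    return [(start, end, speaker, text)
            for (start, speaker, text), end in zip(turns, ends)]


if __name__ == "__main__":
    # Times and speaker labels are invented for this example.
    turns = [
        (10.0, "A", "我 騎去 頭份 那,"),
        (12.5, "B", "你 騎 車 到 石頭山 玩 喏?"),
    ]
    for interval in turn_intervals(turns, final_end=15.0):
        print(interval)
    print(unclear_spans("((設))設計"))
```

Here unclear_spans("((設))設計") returns ['設'], the partially pronounced syllable. Because the ending times are only approximate (note 5), intervals computed this way should be treated as estimates, especially where the speakers overlap (note 4).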