Notes on Word Boundaries
San Duanmu
June 1996

The word boundaries, indicated with a space, are based on both syntax and phonology. In some cases the two criteria coincide; in other cases they do not. Since there is no existing convention on marking word boundaries in Chinese, the following decisions are made as a first approximation. Because some of the decisions are arbitrary, no attempt has been made to check the entire corpus for consistency.

1. For nominal expressions, including locative expressions, [1 2] and [2 1] are treated as one word, but [2 2] is mostly treated as two words (the bracketed numbers give the syllable count of each part), such as:

    [1 2]  縣里面, 電風扇, 管理上
    [2 1]  學校里, 電腦課
    [2 2]  學校 里面, 電腦 教室, 管理 上面

2. The nominal modifier marker 的 (or 之) is grouped with the preceding word, such as:

    教學的 效果
    優良的 校風
    到任的 時候
    去的 時候
    能源之 規劃

3. The manner adverbial marker 地 is grouped with the preceding word, such as:

    慢慢地 走
    努力地 使 自己

4. The verb complement marker 得 is grouped with the preceding verb, such as:

    走得 快

5. The aspect markers 著 and 過 are grouped with the preceding verb, such as:

    搭著 火車
    買過 衣服

The aspect marker 了 is grouped with the preceding verb if it directly follows one. For example:

    成立了
    起了 小霧
    起 小霧 了

If the preceding word is not a verb, 了 is treated as a separate word:

    三年多 了
    這樣 就 很好 了
    沒事 了
    回 學校 了
    住久 了

6. Hesitation words like 呢, 哦, 那, 啊, etc. are mostly treated as separate words. For example:

    可以 啊
    很難 那
    你 呢
    自己 學 嘛

However, a hesitation word is sometimes grouped with the preceding word if the two seem to form a word-like expression, such as:

    對啊
    好啊

7. The negation word 不 is grouped with the following word, such as:

    不要
    不僅
    不知道
    不一樣

8. The degree adverb 很 is usually treated as a separate word, such as:

    很 虛偽

But if the following word is monosyllabic and the expression is a frequent one, the two are grouped together, such as:

    很好
    很多

If the expression is not a frequent one, then 很 is separated from the following monosyllabic word, such as:

    很 挺
    很 直
    很 夠

9. Short idiomatic expressions are sometimes treated as one word, such as:

    持之以恆
    操之過急
    自然而然

and sometimes treated as two words, such as:

    三心 二意
    儀表 堂堂
    昂首 闊步

10. Some expressions may be analyzed grammatically as phrases but are often spoken as a single word, and so they are usually treated as words. For example:

    有人
    有空
    這是
    的話 (as in 有空 的話)
    就是說
    是不是
    再說
    對不對

11. Some expressions are unclear and may have been treated inconsistently, such as:

    有時候 or 有 時候
    也是 or 也 是
    他們家 or 他們 家
    一定要 or 一定 要
    還是說 or 還是 說

12. For fractions, a word boundary is put after 之, such as:

    五分之 一

13. For 越...越 expressions like the following, a word boundary is put after the second 越, such as:

    越做越 辛苦
    越來越 多

However, another kind of 越...越 expression is treated differently, such as:

    越多越好 or 越多 越好

14. A measure word without a numeral is sometimes grouped with the preceding verb, such as:

    找個 老師
    賺點 錢
    混口 飯 吃

When there is a numeral, the verb is usually separated from the numeral, such as:

    找 一個 老師
    找 一些 書

but:

    看一下
    有一些

15. Verbal complements like 上, 下, 出來, 進去, 在, etc. are grouped with the preceding verb in [1 2] and [2 1] structures, such as:

    講出來
    走出去
    帶上
    坐下
    住在
    買好
    調到
    做成
    沖壓成
    接下來

In [2 2] structures, a space is often put between the verb and its complement, such as:

    沖壓 出來

16. Numeral-classifier expressions are divided as follows:

    一九 七九年
    五萬 七千 三百 六十 三台
    十八點 八億
    五十 八點 八億
    六十 幾台
    八十四 年度

17. When a [1 1] or [1 2] Verb-Object unit acts as a modifier of a noun (or when it is used as a nominal itself), it is treated as a single word, such as:

    節能 技術
    省能源 活動
    用水 用電的 合理性

A [2 2] Verb-Object unit is treated as two words, such as:

    節約 能源 活動
    節約 能源的 政策
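
The purely mechanical attachment rules above (2, 3, 4, the 著/過 part of 5, and 7) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used to prepare the corpus: it assumes the input is already a list of fine-grained tokens in which each marker stands alone, and that every occurrence of 的, 之, 地, 得, 著, 過 or 不 in the input really is that marker rather than part of another word. The names `ATTACH_LEFT`, `ATTACH_RIGHT` and `regroup` are hypothetical. The marker 了 is left out because its treatment depends on whether the preceding word is a verb, which this sketch cannot determine.

```python
# Markers grouped with the preceding word (rules 2, 3, 4 and part of 5).
ATTACH_LEFT = {"的", "之", "地", "得", "著", "過"}
# Markers grouped with the following word (rule 7).
ATTACH_RIGHT = {"不"}


def regroup(tokens):
    """Merge marker tokens into neighbouring words.

    `tokens` is a list of strings, one per fine-grained token.
    Returns a new list in which left-attaching markers are appended
    to the previous word and right-attaching markers are prefixed
    to the next word.
    """
    words = []
    prefix = ""  # right-attaching markers waiting for the next word
    for tok in tokens:
        if tok in ATTACH_LEFT and words and not prefix:
            # Glue the marker onto the word that precedes it.
            words[-1] += tok
        elif tok in ATTACH_RIGHT:
            # Hold the marker until the following word arrives.
            prefix += tok
        else:
            words.append(prefix + tok)
            prefix = ""
    if prefix:
        # A trailing marker with nothing after it stays on its own.
        words.append(prefix)
    return words


if __name__ == "__main__":
    for sample in (["教學", "的", "效果"], ["慢慢", "地", "走"], ["不", "知道"]):
        print(" ".join(regroup(sample)))
```

Joining the result with spaces gives the same space-delimited format used in the examples above. The frequency-based and case-by-case decisions (rules 6, 8, 9, 10, 11 and 14) need a lexicon or manual judgment and are deliberately not modeled here.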