CZECH ACADEMIC CORPUS 1.0 GUIDE

4. Bonus material

4.1. The STYX electronic exercise book

The bonus material is aimed at advanced students in primary and high schools and at their teachers. The section labelled STYX [36] offers an electronic exercise book for practising Czech morphology and syntax. Its most noteworthy feature is the number of sentences offered: more than 11,000 sentences have been compiled along with their corresponding annotations in the PDT to facilitate effective training. In addition to this large stock of sentences, the application immediately verifies the accuracy of the user's analysis. It is important to stress that the academic notion of Czech syntax (as presented in the PDT 2.0) differs in some ways from the concepts traditionally taught in the school system. These differences are documented in detail in (Kučera, 2006). Each exercise works through an arbitrary number of sentences: every word in a sentence is analysed morphologically, and the entire sentence is parsed, which includes determining its constituents. To avoid overloading the user, only a small subset of the 11,000 sentences, 50 in total, is available on the CD-ROM (see bonus-tracks/STYX/sample.styx).

The steps for using STYX are illustrated in Figure 4.1. First, the user selects the part of speech of each word and then determines its morphological analysis and the appropriate morphological categories (upper part of the right window). At the beginning of parsing, the word nodes are placed side by side, and each node is removed once it has been successfully parsed. The next step is to determine the constituents of the sentence, including the basic clause elements (predicate and subject). Figure 4.2 demonstrates how the user's parse is evaluated. In our example, the user analysed the word předměty (E: subjects) correctly at the morphological level; the syntactic structure and the analytical functions are also correct (the top tree was constructed by the user; the lower tree serves as the reference for evaluation).
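
The evaluation step described above can be illustrated with a minimal Python sketch. It is not the STYX implementation: the Token fields, the evaluate and accuracy functions, and the sample sentence are hypothetical names and data introduced only for illustration. The sketch assumes that both the user's answer and the reference annotation give, for each word, a part of speech, the position of the governing word (0 for the root) and an analytical function label.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Token:
    """One word of a sentence with the attributes compared in this sketch."""
    form: str  # word form as it appears in the sentence
    pos: str   # part of speech chosen for the word
    head: int  # 1-based position of the governing word, 0 for the root
    afun: str  # analytical function label, e.g. "Pred", "Sb", "Obj"


def evaluate(user: List[Token], gold: List[Token]) -> List[Dict[str, object]]:
    """Compare the user's answer with the reference, word by word."""
    if len(user) != len(gold):
        raise ValueError("both analyses must cover the same words")
    report = []
    for u, g in zip(user, gold):
        report.append({
            "form": g.form,
            "pos_ok": u.pos == g.pos,
            "head_ok": u.head == g.head,
            "afun_ok": u.afun == g.afun,
        })
    return report


def accuracy(report: List[Dict[str, object]], key: str) -> float:
    """Share of words for which the given check succeeded."""
    return sum(1 for row in report if row[key]) / len(report)


# Hypothetical sample: "Žák čte knihu" (E: The pupil reads a book).
gold = [
    Token("Žák", "noun", 2, "Sb"),
    Token("čte", "verb", 0, "Pred"),
    Token("knihu", "noun", 2, "Obj"),
]
# The user attached every word correctly but mislabelled the object.
user = [
    Token("Žák", "noun", 2, "Sb"),
    Token("čte", "verb", 0, "Pred"),
    Token("knihu", "noun", 2, "Adv"),
]
report = evaluate(user, gold)
```

In this sample, the report marks the part of speech and the governing word as correct for all three words and flags only the analytical function of knihu, which mirrors the kind of per-word feedback shown in Figure 4.2.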

Figure 4.1. STYX: Exercises

Figure 4.2. STYX: Exercise evaluation

4.2. Voice control of the TrEd editor via the TrEdVoice module

The TrEd annotation editor is the essential tool used to annotate the CAC 2.0 on the analytical layer (see Chapter 3.3.3). From the very beginning, TrEd has been equipped with many complex functions and macros, and their number has grown over time. Most of the functions are assigned hotkeys, since calling each function from the menu system every time would be extremely time consuming. Nevertheless, a large set of hotkeys is itself hard for the user to remember. One way to relieve the user of this burden is voice control, which is still rarely used in application programs. That is why we developed the TrEdVoice module (Přikryl, 2007). The purpose of the module is not to provide complete voice control of all TrEd functions or to make the keyboard and mouse unnecessary; rather, it is a useful accessory that extends the original means of control (menus, hotkeys and mouse). Figure 4.3 shows the main TrEd screen with voice control enabled. Voice commands are recognised by the automatic speech recognition module (the ASR module) created by the team at the Department of Cybernetics of the University of West Bohemia in Plzen [6] (Müller, Psutka, Šmídl, 2000). The ASR module is not built into TrEdVoice; it runs independently as an ASR server and communicates with TrEdVoice over the TCP/IP network protocol. The ASR module is based on statistical methods and is speaker-independent, which means it can recognise the voice of an arbitrary speaker. For more details on voice recognition see (Psutka, Müller, Matoušek, Radová, 2006).
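
The separation between a recognition server and a client that turns recognised phrases into editor actions can be sketched in Python as follows. This is only an illustration of the architecture, not the TrEdVoice code or the actual ASR server protocol, which this guide does not describe. The sketch assumes, purely for illustration, that the server sends each recognised command as one newline-terminated UTF-8 line; the functions read_commands and dispatch and the command phrases are hypothetical.

```python
import socket
from typing import Callable, Dict, Iterator


def read_commands(sock: socket.socket) -> Iterator[str]:
    """Yield recognised phrases from a connected socket.

    Assumption: one newline-terminated UTF-8 line per phrase.
    """
    buffer = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # the peer closed the connection
            break
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            text = line.decode("utf-8").strip()
            if text:
                yield text


def dispatch(command: str, actions: Dict[str, Callable[[], None]]) -> bool:
    """Run the action bound to a phrase; return False if none is bound."""
    action = actions.get(command.lower())
    if action is None:
        return False
    action()
    return True


if __name__ == "__main__":
    # A connected socket pair stands in for the server and the client,
    # so the sketch runs without any network set-up.
    server_end, client_end = socket.socketpair()
    server_end.sendall("next tree\nsave file\n".encode("utf-8"))
    server_end.close()

    # Hypothetical phrases mapped to placeholder editor actions.
    actions = {
        "next tree": lambda: print("moving to the next tree"),
        "save file": lambda: print("saving the file"),
    }
    for phrase in read_commands(client_end):
        if not dispatch(phrase, actions):
            print("no action bound to:", phrase)
    client_end.close()
```

Keeping the recogniser in a separate process, as the paragraph above describes, means the editor side only needs a small lookup table from phrases to existing functions, and unknown phrases can simply be ignored while menus, hotkeys and the mouse remain available.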

Figure 4.3. The TrEd editor screen with the TrEdVoice module enabled
