Read README.TOOLS.txt first. These exercises are based on the very earliest tools written using NXT and have not been tested extensively with modern NXT.

======================================================
EXERCISES TO GO WITH THE SWITCHBOARD EXAMPLE DATA
======================================================

------------------------------------------------------
TRY CODING MARKABLES FOR ANIMACY:
------------------------------------------------------

Double-click on switchboard-guis.bat (for Windows) or run switchboard.sh (for Linux/Unix/Mac). This gives a window with the list of interfaces that have been registered for this corpus. Start up the animacy coder and, if it gives you a choice of dialogue (which it will if the distribution contains more than one), choose dialogue sw4633.

This will bring up an interface for the dialogue with a file menu that lets you save the coding results, open a different dialogue, or redisplay (in case the display gets messed up for whatever reason). Only one dialogue can be open at a time, and the interface will ask whether you wish to save the previous coding if you haven't already done so.

The interface allows one to code markables for one of a set of mutually exclusive and exhaustive animacy categories (they are guaranteed to be exhaustive because "unclassifiable" can be used as a bucket for anything the analyst doesn't want to deal with). The text is in one window, and a set of buttons with the markable codes on them is in another, along with a "skip" button that just moves the interface along to the next markable.

In the text window, markables are shown in parentheses; they can nest inside each other, and colour is used to show the nesting level, which makes it easier to match the parentheses visually. A dot after the left parenthesis means that the markable's animacy code is "uncoded"; these are the markables that the analyst has not yet considered.
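
As a rough model of the bracket display just described, the Python sketch below renders nested markables with a dot after the opening parenthesis of any uncoded one and records each bracket's nesting depth, which is the information the interface turns into a colour. The `(start, end, animacy)` tuples and the `render` function are hypothetical illustrations, not part of NXT, and the sketch assumes that markables nest properly (no partial overlaps).

```python
def render(words, markables):
    """Render words with nested markables in parentheses.

    words: list of word strings.
    markables: list of (start, end, animacy) tuples with inclusive word
    indices; assumed to nest properly. Returns (text, depth) pairs, where
    depth is the nesting level of a bracket (None for ordinary words).
    """
    # Document order: by start word, outer (longer) markables first.
    ordered = sorted(markables, key=lambda m: (m[0], -m[1]))
    out = []
    open_ends = []  # stack of end indices of the currently open markables
    j = 0
    for i, word in enumerate(words):
        # Open every markable that starts at this word, outermost first.
        while j < len(ordered) and ordered[j][0] == i:
            _, end, animacy = ordered[j]
            bracket = "(." if animacy == "uncoded" else "("
            out.append((bracket, len(open_ends)))
            open_ends.append(end)
            j += 1
        out.append((word, None))
        # Close every markable that ends at this word, innermost first.
        while open_ends and open_ends[-1] == i:
            open_ends.pop()
            out.append((")", len(open_ends)))
    return out


pairs = render(["my", "sister", "'s", "dog", "barked"],
               [(0, 3, "uncoded"), (0, 1, "human-indiv")])
print(" ".join(text for text, _ in pairs))  # prints "(. ( my sister ) 's dog ) barked"
```

A closing bracket is given the same depth as its opening bracket, so both ends of a markable would be drawn in the same colour.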

To select a markable, click on one of its brackets or on a word that is in that markable but not in a deeper (nested) markable. To code a markable, choose a value from the set of buttons. The users didn't want to see the codes next to the markables in the text, because they said that would make the transcript harder to read. Instead, a small red ball appears in the button window next to the selected markable's current code.

Try selecting a markable and coding it (being careful not to use the "skip" button). The highlight will move forward to the next markable. Click on the markable you just coded and note that the red ball now appears next to the correct category. Code a batch of markables, making sure to include some human-group, human-indiv, and org-human codes to set you up for the next part of the exercise. (You don't need to know what the codes mean; for our purposes it doesn't matter whether they are right.) Try the skip button; it just moves forward, leaving the markable in whatever state it was in before.

------------------------------------------------------
TRY SEARCHING THE CODED DATA:
------------------------------------------------------

One of the menus on the animacy coding interface, "search", allows one to type queries in the NXT Query Language and see the results highlighted in the text window. Try it. The window that results has two tabs: the query tab, for typing in queries, and the results tab, which shows an XML tree version of the results.

On the query tab, type the query

    ($m markable):

and hit the search button. (You can copy the query from somewhere else and paste it in if you like.) The interface moves to the results tab. Open up a result in the tree until you get to a description of an element, and left-click on it; the display for that element will be highlighted in the text window. [NB: there is currently a bug so that this only works if you haven't redisplayed or changed the font size using the options on the file menu, so start the interface again if you have.]

Try the following queries:

    ($m markable):($m@animacy == "human-group")

This means "markables where the animacy code is human-group".

    ($m markable):($m@animacy ~ /.*human.*/)

This means "markables where the animacy code has human in it somewhere". The dot (.) means any character, and the star (*) means zero or more times. This is what's called a "regular expression".

    ($m markable):($m@animacy ~ /human.*/)

This means "markables where the animacy code is human followed by any number of other characters". Note that this doesn't pick up what you coded as org-human. That's because the code has to *start* with human.

    ($m markable):($m@animacy ~ /human/)

This doesn't pick up any markables. In this query language a regular expression must match the complete attribute value, and no code is exactly human: every code that contains it (human-group, human-indiv, org-human) has other characters before or after it.

There are more complex queries one can try, for instance ones to do with the syntax of the data set. These won't highlight on the display, however, because the display doesn't show the syntax coding.

While you have the interface up, explore the file menu to see the options there. Redisplay is just in case the interface gets messed up for any reason. Now quit the interface.

------------------------------------------------------
TRY CHANGING THE ANIMACY CODES:
------------------------------------------------------

The animacy codes the interface knows about come from an enumerated list in the metadata file. Edit xml/swbd-metadata.xml and have a look at what's in there. Search for human-group. You will see the control information for a code named markable, which has an attribute named animacy with a list of values, one of which is human-group, specified like this:

    <value>human-group</value>

Add a new value, "fred", to the list:

    <value>fred</value>

Save the metadata file and start the animacy coding interface again. Fred is now available on one of the coding buttons.
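
Looking back at the search exercise, the four animacy queries differ only in how they match the attribute value. Python's `re.fullmatch`, which succeeds only when the pattern covers the whole string, mimics the complete-match behaviour described for the query language's regular expressions. The `codes` list and the two helper functions are illustrative assumptions, not NXT's own implementation.

```python
import re

# Illustrative animacy values, as if several markables had been coded.
codes = ["human-group", "human-indiv", "org-human", "unclassifiable", "uncoded"]


def query_equals(literal, values):
    """Like ($m@animacy == "literal"): exact string equality."""
    return [v for v in values if v == literal]


def query_regex(pattern, values):
    """Like ($m@animacy ~ /pattern/): the pattern must match the whole value."""
    return [v for v in values if re.fullmatch(pattern, v)]


print(query_equals("human-group", codes))  # ['human-group']
print(query_regex(".*human.*", codes))     # ['human-group', 'human-indiv', 'org-human']
print(query_regex("human.*", codes))       # ['human-group', 'human-indiv']
print(query_regex("human", codes))         # []
```

By contrast, `re.search("human", "org-human")` does find a match, because `re.search` looks for the pattern anywhere in the string; that is why the last query would return results in a tool that used partial matching.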

------------------------------------------------------
TRY USING THE SUPERVISOR'S CHECKING PROGRAM:
------------------------------------------------------

If you want to check animacy coding that's already been done, it's inconvenient that the codes aren't displayed in-line with the transcript. One of the points of the NITE XML Toolkit is that tools can be tailored to the specific task at hand. Try the animacy checker (a different option on the initial menu when you run switchboard-guis.sh or switchboard-guis.bat) and see a slightly different interface for use in the checking phase, one that makes the text harder to read but the tags easier to see and correct.

------------------------------------------------------
TRY USING THE INFORMATION STATUS AND COREFERENCE CODING PROGRAM:
------------------------------------------------------

Another set of researchers is coding the same set of markables for information status and for coreferential links between markables. Try the interface for adding this coding, which is called SwitchboardMarkables, on sw4633, for which the data is already coded.

In the interface, markables have two attributes: status (old, mediated, new, not applicable, or unclassifiable; in this interface there is no option to leave a markable uncoded) and status type (general vs. event). These buttons work as you would expect from the other interface, but the interface doesn't move forward as you code, because here there is more to do.

Look at the links menu. This gives all of the coreferential links in the data. Explore the first link. If you select the link (by left-clicking on it), the text display window shows the antecedent in pink and the anaphor in grey. In the link window, you can also select just the antecedent or just the anaphor of a link, and have only that displayed. Link types are specified not by a flat list like the animacy codes, but by a tree-shaped ontology shown in the window labelled "link type hierarchy".
To change the type of the first link, select the link and then select a different value in the ontology; notice that the label on the link changes.
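
The difference between a flat list and a tree-shaped ontology can be sketched in Python as follows. The type names in `ONTOLOGY` are placeholders rather than the actual Switchboard link types, and the dictionary representation and helper functions are assumptions made for illustration. Retyping a link only accepts a node of the tree, and the parent links make it possible to ask whether one type is a kind of another, which a flat list cannot express.

```python
# Placeholder ontology (hypothetical names): each parent maps to its children.
ONTOLOGY = {
    "link-type": ["identity", "bridging"],
    "bridging": ["part-whole", "set-member"],
}


def all_types(tree):
    """Every node in the tree, parents and children alike."""
    nodes = set(tree)
    for children in tree.values():
        nodes.update(children)
    return nodes


def path_to_root(node, tree):
    """The node followed by each of its ancestors, up to the root."""
    parents = {child: parent for parent, kids in tree.items() for child in kids}
    path = [node]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return path


def is_a(node, ancestor, tree):
    """True if node is ancestor itself or lies below it in the tree."""
    return ancestor in path_to_root(node, tree)


def retype(link, new_type, tree):
    """Set a link's type, accepting only values that appear in the ontology."""
    if new_type not in all_types(tree):
        raise ValueError(f"unknown link type: {new_type}")
    link["type"] = new_type
    return link


# Hypothetical link between two markables, identified here by made-up numbers.
link = {"antecedent": 3, "anaphor": 9, "type": "identity"}
retype(link, "part-whole", ONTOLOGY)
print(path_to_root(link["type"], ONTOLOGY))  # ['part-whole', 'bridging', 'link-type']
```

Because `is_a("part-whole", "bridging", ONTOLOGY)` holds, a search could in principle ask for all bridging links at once, whatever their more specific subtype.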