Read README.TOOLS.txt first.
These exercises are based on the very earliest tools to be written
using NXT and have not been tested extensively using modern NXT.
======================================================
EXERCISES TO GO WITH THE SWITCHBOARD EXAMPLE DATA
======================================================
------------------------------------------------------
TRY CODING MARKABLES FOR ANIMACY:
------------------------------------------------------
Double-click on switchboard-guis.bat (for Windows) or run
switchboard-guis.sh (for Linux/Unix/Mac). This gives a window with the
list of interfaces that have been registered for this corpus.
Start up the animacy coder and, if it gives you a choice of dialogue
(which it will if the distribution has more than one), then choose
dialogue sw4633. This will bring up an interface for the dialogue
with a file menu that allows one to save the coding results, open a
different dialogue, or redisplay (in case the display should get
messed up, for whatever reason). Only one dialogue can be open at a
time, and the interface will ask if you wish to save the previous
coding if you haven't done so already.
The interface allows one to code markables for one of a set of
mutually exclusive and exhaustive animacy categories (they are
guaranteed exhaustive because "unclassifiable" can be used as a bucket
for anything the analyst doesn't want to deal with). The text is in
one window, and a set of buttons with the markable codes on them is in
another, along with a "skip" button that just moves the interface
along to the next markable. In the text window, markables are shown
in parentheses; they can nest inside each other, and colour is used to
show the nesting level, to make it easier to match the parentheses
visually. A dot after the left parenthesis means that the markable's
animacy code is "uncoded"; these are the markables that the analyst
has not yet considered. To select a markable, click on one of its
brackets or a word that is in that markable but not in a deeper
markable. To code a markable, choose a value from the set of buttons.
The users didn't want the codes displayed next to the markables,
because they said that would make the transcript harder to read.
Instead, a small red ball appears next to the code for the selected
markable. Try selecting a markable and coding it (being careful not
to use the "skip" button). The highlight will move forward to the
next markable. Click on the markable you just coded and note that the
red ball appears next to the correct category. Code a bunch of
markables, making sure to put in some
human-group, human-indiv, and org-human codes to set you up for the
next part of the exercise. (You don't need to know what the codes
mean, because for our purposes it doesn't matter whether they're right.)
Try the skip button; it just moves forward, leaving the markable in
whatever state it was in before.
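The display convention described above (parentheses around markables,
nesting, and a dot after the left parenthesis for uncoded markables)
can be sketched in a few lines of Python. This is only an
illustration: the Markable class, its fields, and the exact spacing
are assumptions made for the example, not part of NXT or of the
animacy coder itself.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Markable:
    """Hypothetical stand-in for a markable: an animacy code plus content."""
    animacy: str = "uncoded"
    # Children are either plain words or nested markables.
    children: List[Union[str, "Markable"]] = field(default_factory=list)


def render(node: Union[str, Markable]) -> str:
    """Render words as-is and markables in parentheses.

    A dot after the left parenthesis marks a markable that is still
    "uncoded", mirroring the convention described in the text.
    """
    if isinstance(node, str):
        return node
    opener = "(." if node.animacy == "uncoded" else "("
    inner = " ".join(render(child) for child in node.children)
    return f"{opener}{inner})"


# An already-coded markable containing one that is still uncoded.
example = Markable(
    "human-indiv",
    ["the", "owner", "of", Markable(children=["the", "dog"])],
)
rendered = render(example)
print(rendered)
```

The outer markable has a code, so it gets a plain parenthesis; the
inner one is still uncoded, so it gets the dot.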
------------------------------------------------------
TRY SEARCHING THE CODED DATA:
------------------------------------------------------
One of the menus on the animacy coding interface, "search", allows one
to type queries in the NXT Query Language and see the results
highlighted on the text window. Try it. The window that results has
two tabs, the query tab for typing in queries, and the results tab,
which shows an XML tree version of the results. On the query tab,
type the query ($m markable): and hit the search button. (You can
copy the query from somewhere else and paste it in if you like.) The
interface moves to the results tab. Open up a result on the tree
until you get to a description of an element, and left-click on it;
the display for that element will highlight in the textual display
window. [NB: there is currently a bug so that this only works if you
haven't redisplayed or changed the font size using the options on the
file menu, so start the interface again if you have.] Try the
following queries:
($m markable):($m@animacy == "human-group")
This means "Markables where the animacy code is human-group".
($m markable):($m@animacy ~ /.*human.*/)
This means "Markables where the animacy code has human in it
somewhere." The dot (.) means any character, and the star (*) means
zero or more times. This is what's called a "regular expression".
($m markable):($m@animacy ~ /human.*/)
This means "Markables where the animacy code is human followed by any
number of other characters". Note that this doesn't pick up what you
coded as org-human. That's because the code has to *start* with human.
($m markable):($m@animacy ~ /human/)
This doesn't pick up any markables, because every code that contains
human has something before or after it, and in this query language a
regular expression must match the complete value.
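The complete-match behaviour of the last three queries can be
reproduced outside NXT. The sketch below uses Python's re.fullmatch
as a stand-in for the query language's matching; that equivalence is
an assumption that holds for these simple patterns, not a statement
about how NXT is implemented. The helper name matching_codes is made
up for the example.

```python
import re

# Three of the animacy codes used in the exercise above.
codes = ["human-group", "human-indiv", "org-human"]


def matching_codes(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c)]


contains_human = matching_codes(r".*human.*", codes)   # human anywhere
starts_with_human = matching_codes(r"human.*", codes)  # must start with human
exactly_human = matching_codes(r"human", codes)        # must be exactly "human"

print(contains_human)
print(starts_with_human)
print(exactly_human)
```

The first pattern keeps all three codes, the second drops org-human,
and the third keeps nothing, which matches what the queries above do.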
There are more complex queries one can try, for instance, to do with
the syntax of the data set. These won't highlight on the display,
however, because the display doesn't show the syntax coding.
While you have the interface up, explore the file menu to see the
options there. Redisplay is just in case the interface got messed
up for any reason. Now quit the interface.
------------------------------------------------------
TRY CHANGING THE ANIMACY CODES:
------------------------------------------------------
The animacy codes the interface knows about just come from an enumerated
list in the metadata file. Edit xml/swbd-metadata.xml and have a
look at what's in there. Search on human-group. You will see the control
information for a code with the name markable, that has an attribute with
the name animacy, with a list of values, one of which is human-group,
specified like this:
<value>human-group</value>
Add a new value, "fred", to the list:
<value>fred</value>
Save the metadata file and start the animacy coding interface again.
The new code "fred" now appears on one of the coding buttons.
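If you would rather script the change than edit by hand, something
along these lines should work. It is a sketch under assumptions: the
element and attribute names (code, attribute, value, name) and the
small SAMPLE document are guesses at the shape of the metadata, so
check them against xml/swbd-metadata.xml before relying on it. The
function add_enum_value is invented for this example.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified shape of the relevant part of the metadata file.
SAMPLE = """
<codes>
  <code name="markable">
    <attribute name="animacy" value-type="enumerated">
      <value>human-group</value>
      <value>human-indiv</value>
      <value>org-human</value>
    </attribute>
  </code>
</codes>
"""


def add_enum_value(root, code_name, attr_name, new_value):
    """Append new_value to the enumerated list unless already present.

    Returns the resulting list of values for the attribute.
    """
    for code in root.iter("code"):
        if code.get("name") != code_name:
            continue
        for attr in code.iter("attribute"):
            if attr.get("name") != attr_name:
                continue
            existing = [v.text for v in attr.findall("value")]
            if new_value not in existing:
                ET.SubElement(attr, "value").text = new_value
            return [v.text for v in attr.findall("value")]
    raise KeyError(f"no attribute {attr_name!r} on code {code_name!r}")


root = ET.fromstring(SAMPLE)
values = add_enum_value(root, "markable", "animacy", "fred")
print(values)
```

To apply this to the real file you would parse it with ET.parse,
call the function on the root element, and write the tree back out;
keep a backup copy of the metadata file first.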
------------------------------------------------------
TRY USING THE SUPERVISOR'S CHECKING PROGRAM:
------------------------------------------------------
If you want to check animacy coding that's already been done, it's
inconvenient that the codes aren't displayed in-line with the
transcript. One of the points of the NITE XML Toolkit is that tools
can be tailored to the specific task at hand. Try the animacy checker
(a different option on the initial menu when you run
switchboard-guis.sh or switchboard-guis.bat) and see a slightly
different interface for use in the checking phase, one that makes it
harder to read the text, but easier to see and correct the tags.
------------------------------------------------------
TRY USING THE INFORMATION STATUS
AND COREFERENCE CODING PROGRAM:
------------------------------------------------------
Another set of researchers is coding the same set of markables for
information status and for coreferential links between markables. Try
the interface for adding this coding, which is called
SwitchboardMarkables, on sw4633, for which the data is already coded.
In the interface, markables have two attributes: status (old, mediated,
new, not applicable, and unclassifiable - there's no option to leave
uncoded on this one) and a status type (general vs. event). These buttons
work as you would expect based on the other interface, but the interface
doesn't move forward as one codes, because here there is more to do.
Look at the links menu. This gives all of the coreferential links in
the data. Explore the first link. If you select the link (by
left-clicking on it), then the text display window shows the
antecedent in pink and the anaphor in grey. In the link window, you
can also select just the antecedent or the anaphor of a link, and have
just those displayed. Links have types specified, not by a flat list
like the animacy codes, but by a tree-shaped ontology shown in the
window labelled "link type hierarchy". To change the type of the
first link, select the link and then select a different value in the
ontology; notice the label on the link changes.