==============================================
GRAPHICAL USER INTERFACES
==============================================

This release contains a number of graphical user interfaces for the data, most of which are very old. We have not put the effort into making all of them work properly on the modern versions of the data and with modern NXT, but below, where we know of problems introduced by the passage of time, we mention them. We've released the full set because having the code may be useful for those working with the data.

On Linux, Macs, and Windows machines under cygwin, running the script switchboard-guis.sh will give you a choice of GUIs available for use on this corpus. On Windows machines under DOS, use switchboard-guis.bat. In order to use these scripts, you must first edit the one for your platform to change the paths so that they match your setup. There are instructions in the scripts themselves.

When you run the scripts, they give a choice of all programs that are registered as available in the metadata file, xml/swbd-metadata.xml, plus two choices that NXT always makes available: a generic display, and a generic search facility that works over one observation. (This generic display is too memory-intensive to use on this data, but below we explain how to get an alternative one.) All of the programs that can be run from the scripts can also be run from the command line, usually with more options.

If you have the LDC release containing the Switchboard audio files, Switchboard-1 Release 2, LDC Catalog Number LDC97S62, you can play the audio from NXT, but you first have to strip the sphere header and rename the files using NXT's signal naming convention. The easiest way is to use sox:

    sox sw02005.sph sw2005.mix.wav

The system is set up to expect these .wav files to be in a sister directory to the xml directory, called "signals".
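A minimal sketch of converting a whole directory of audio files in one go. It assumes that every sphere file follows the sw0NNNN.sph pattern of the example above, and that the renaming rule is simply to drop the zero after "sw"; that rule is an inference from the single example, not a documented convention. The function names nxt_signal_name and convert_all, and the directory arguments, are hypothetical.

```shell
#!/bin/sh
# Map a sphere file name to the NXT-style signal name,
# e.g. sw02005.sph -> sw2005.mix.wav.
# Assumption: the observation name is the base name with the "0" after "sw" removed.
nxt_signal_name() {
    base=$(basename "$1" .sph)
    obs=$(printf '%s\n' "$base" | sed 's/^sw0/sw/')
    printf '%s.mix.wav\n' "$obs"
}

# Hypothetical helper: convert_all SOURCE_DIR DEST_DIR
# Runs sox once per .sph file, exactly as in the single-file example.
convert_all() {
    src_dir=$1
    dest_dir=$2
    mkdir -p "$dest_dir"
    for f in "$src_dir"/*.sph; do
        [ -e "$f" ] || continue   # skip if the directory has no .sph files
        sox "$f" "$dest_dir/$(nxt_signal_name "$f")"
    done
}
```

For instance, after loading these functions, running convert_all with the directory holding the .sph files as the first argument and "signals" as the second, from the top level directory of this release, would place the converted files in the signals directory.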
The location of the signals directory can be configured by editing the line of the metadata file, xml/swbd-metadata.xml, that gives the path to the signals.

Finally, when NXT loads files from this data set, it issues warnings (viewable in the terminal window) about the name of the "stream" element. Here's an example:

    WARNING: Stream element "nite:terminal_stream" does not have the declared NITE stream element name "nite:root".

These warnings don't matter. (NXT now expects all stream elements for a data set to use the same tag, and that tag to be declared in the metadata file, but the corpus uses different tags for different kinds of coding.)

------------------------------------------------------
GENERIC SEARCH
------------------------------------------------------

This just gives a simple window for typing in queries and seeing a display of the results on a single observation. You can get a spreadsheet view for a set of query results in the lower half of the window by selecting parts of the result tree.

This interface works for any NXT-format data set even if no specific GUIs have been written, but as soon as there are other GUIs, most people prefer them. This is because you can get the same display from the search menu on most modern NXT tools, but in addition to what you see with this tool, the query results interact nicely with the rest of the display: for anything you select in the query results, the corresponding parts of the data display will be highlighted in orange. In this corpus, the dialogue act coder is the best choice of interface for arbitrary querying.

NXT doesn't include any GUI for searching an entire corpus at once because that's unnecessarily memory intensive for most users. Refine your queries on one dialogue at a time using the search menu on any of the tools, and then run them over the entire corpus at the command line using, for instance, CountQueryResults or FunctionQuery. The command line utilities usually have an option for searching all observations at once rather than one by one.
For all but the simplest queries and on all but the highest performing machines, this is unlikely to work for this corpus because there are so many dialogues.

------------------------------------------------------
ALTERNATE STRATEGY FOR GENERIC DISPLAY
------------------------------------------------------

NXT always includes a "generic corpus display" as one option. It is unwise to run it on this corpus using the script because by default, it displays all the annotations. This takes a lot of memory and looks very busy.

You can, however, run it from the command line, specifying which kinds of annotations you wish to see by naming them in a simple query. Set the same variables as in the script, then run it from the top level directory of this release. For example,

    java net.sourceforge.nite.gui.util.GenericDisplay -c xml/swbd-metadata.xml -o sw2012 -q '($t turn)($p parse)($w word):'

will show turns, syntax, and terminals on the observation sw2012. Complete "codings" are shown, one per window. It is sufficient to name any one element from each of the codings you wish to see.

If you have one option you particularly like, you can register it as a program in the metadata file. It will then appear in addition to the full display option currently shown.

------------------------------------------------------
DIALOGUE ACT CODER
------------------------------------------------------

The dialogue act coder shows the dialogue acts. It is the best way to see the dialogue act coding at a glance, and to hear the sound while reading the transcription. Because the tool is just one of NXT's standard interfaces configured for this data, it includes options for addressee and reflexivity coding that are not used by this data set. Although the dialogue acts were not originally created using this tool, the tool functions as an editor and can be used to modify the existing coding. Of the tools in the release, it works best.
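The GenericDisplay command line given in the ALTERNATE STRATEGY FOR GENERIC DISPLAY section needs one query variable for each coding you want to see. The following is a small sketch of building that query from a list of element names. The function names display_query and show_codings are hypothetical, the variable names $v1, $v2, ... are arbitrary, and the java call simply mirrors the example in that section, assuming the same variables have already been set as in the script.

```shell
#!/bin/sh
# Build a query that only declares variables, one per element name,
# e.g. "turn parse word" -> ($v1 turn)($v2 parse)($v3 word):
display_query() {
    query=""
    n=0
    for element in "$@"; do
        n=$((n + 1))
        query="${query}(\$v${n} ${element})"
    done
    printf '%s:\n' "$query"
}

# Hypothetical wrapper: show_codings OBSERVATION ELEMENT...
# Must be run from the top level directory of the release,
# with the classpath already set up as in switchboard-guis.sh.
show_codings() {
    observation=$1
    shift
    java net.sourceforge.nite.gui.util.GenericDisplay \
        -c xml/swbd-metadata.xml \
        -o "$observation" \
        -q "$(display_query "$@")"
}
```

With these loaded, "show_codings sw2012 turn parse word" is intended to open the same windows as the example in that section; only the variable names in the query differ.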
------------------------------------------------------
INFORMATION STATUS
------------------------------------------------------

This coder is the actual tool that was used to add information status to markables on the corpus, albeit in a very early version of NXT. The markables themselves were added automatically by running a query to find constituents with the correct syntactic properties, unlike in some corpora where they might be marked by hand.

The tool has two modes. In coding mode, the code for the current markable is shown using a coloured ball next to the button for the code. When coding, the tool moves forward automatically to the next uncoded markable. These properties are designed for quick coding. In checking mode, the cursor does not advance automatically, and the codes are shown in-line, with a coloured dot in the transcription next to uncoded markables, so that the coding supervisor can check entire pages at a glance.

In both cases, coreferential links have a separate display. Selecting a link will highlight the associated anaphor and antecedent on the transcription; when in doubt, you can tell which is which by selecting just the anaphor or antecedent in the link window. The only way to find out whether a particular markable participates in any coreferential link is to run a query to check.

In the coding scheme, markables can be old, mediated, or new, and if they are old or mediated, they can also have a "statustype" that explains how they are accessible to the conversants. The tool doesn't enforce this restriction of when statustypes apply.

Because the data representation and NXT have changed drastically since this early tool, its behaviour can be flaky. It works OK in checking mode but no longer in coding mode because it doesn't redisplay and move to the next code well.
The original version of the program, as used to add the coding in the corpus, interleaved words from the speakers for one parse at a time, not one turn, as the tool does currently.

Try it on sw2525 or some other dialogue for which kontrast coding exists. It has never been possible to play the audio in this tool. There are more details in README.EXERCISES.txt.

------------------------------------------------------
ANIMACY
------------------------------------------------------

Again, this tool is the one that was used to create the data in the first place, apart from a few dialogues where it was translated from elsewhere. It is very like the information status tool, with the same flaws, and applies codes to the same markables, but here there are no coreferential links. There are three attributes: the animacy code, a confidence code, and a code about the use of anthropomorphism.

Try it on sw2005 or some other dialogue for which animacy coding exists. There are more details in README.EXERCISES.txt.

------------------------------------------------------
KONTRAST
------------------------------------------------------

This tool is similar to the information status tool, but uses different "markables", again added before the tool is run. In this one, it's possible to listen to the sound. It suffers from the same flaky behaviour as the other older tools because of changes in NXT's selection mechanism. The original version of the program, as used to add the coding in the corpus, interleaved words from the speakers for one parse at a time, not one turn, as the tool does currently.

Try it on sw4880 or some other dialogue for which kontrast coding exists.

------------------------------------------------------
OTHER TOOLS
------------------------------------------------------

Because the code could be useful, the release contains source for two further tools that, because they are not so usable, are not presented on the GUI menu.
The code to declare them is commented out in swbd-metadata.xml.

The first, SwitchboardDisplay, is a tool that will show you the transcription displayed as syntax trees, with the part-of-speech information appended to words, and with markables interposed around their words showing animacy and information status. As with most tools, there is a related search window for running queries and highlighting the results. It is not possible to play the audio in this tool. Because of changes in both NXT and the NXT representation of the data since this early tool, its behaviour is flaky. Markables are rendered twice, once before and once after the screen representation of the non-terminal to which the markable relates. All of A's parses are rendered before B's.

The second is just the configuration option for using NXT's configurable named entity coder on the Penn Treebank syntax. It is very, very, very slow to render because NXT isn't expecting that many named entities.