==============================================
		GRAPHICAL USER INTERFACES	
==============================================

This release contains a number of graphical user interfaces for the
data, most of which are very old.  We have not put the effort into
making all of them work properly on the modern versions of the data and
with modern NXT, but below, where we know of problems introduced by
the passage of time, we mention them.  We've released the full set
because having the code may be useful for those working with the data.

On Linux, Macs, and Windows machines under cygwin, running the script
switchboard-guis.sh will give you a choice of GUIs available for use
on this corpus. On Windows machines under DOS, use
switchboard-guis.bat.
 
In order to use these scripts, you must first edit the one for your platform
to change the paths so that it matches your setup.  There are instructions
in the scripts themselves.

When you run the scripts, they give a choice of all programs that are
registered as available in the metadata file, xml/swbd-metadata.xml,
plus two choices that NXT always makes available - a generic display,
and a generic search facility that works over one observation.  (This
generic display is too memory-intensive to use on this data, but below
we explain how to get an alternative one.)  All of the programs that
can be run from the scripts can also be run from the command line,
usually with more options.

If you have the LDC release containing the Switchboard audio files,
Switchboard-1 Release 2, LDC Catalog Number LDC97S62, you can play the
audio from NXT, but you have to strip the sphere header and rename using
NXT's signal naming convention first. The easiest way is to use sox:

sox sw02005.sph sw2005.mix.wav

The system is set up to expect these .wav files to be in a sister directory
to the xml directory, called "signals".  This can be configured by editing
the following line of the metadata file, xml/swbd-metadata.xml:

  <signals path="../signals/">
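If you have many audio files to convert, the sox step above can be
wrapped in a loop.  The following is only a sketch: the function names
are hypothetical, and it assumes the LDC files have been copied into a
single directory and are all named like sw02005.sph.

```shell
#!/bin/sh
# Sketch only: function names and directory layout are assumptions,
# not part of this release.

# nxt_signal_name FILE
# Map an LDC file name such as sw02005.sph to the name used in the
# sox example above (sw2005.mix.wav).
nxt_signal_name() {
    base=$(basename "$1" .sph)   # sw02005
    num=${base#sw}               # 02005
    num=${num#0}                 # 2005 (drop one leading zero, if any)
    printf 'sw%s.mix.wav\n' "$num"
}

# convert_all SRC_DIR DEST_DIR
# Run sox on every .sph file in SRC_DIR, writing into DEST_DIR.
convert_all() {
    mkdir -p "$2"
    for f in "$1"/*.sph; do
        [ -e "$f" ] || continue  # the glob matched nothing
        sox "$f" "$2/$(nxt_signal_name "$f")"
    done
}
```

For example, "convert_all /path/to/sph signals", run from the top level
directory of this release, would fill the "signals" directory that the
metadata file points to by default.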

Finally, when NXT loads files from this data set, it issues
warnings (viewable in the terminal window) about the name of the "stream" 
element.  Here's an example:

WARNING: Stream element "nite:terminal_stream" does not have the
declared NITE stream element name "nite:root".

These warnings don't matter.  (NXT now expects all stream elements for
a data set to use the same tag, and that tag to be declared in the
metadata file, but the corpus uses different tags for different kinds of
coding.)

------------------------------------------------------ 
GENERIC SEARCH
------------------------------------------------------

This just gives a simple window for typing in queries and seeing a
display of the results on a single observation.  You can get a
spreadsheet view for a set of query results in the lower half of the
window by selecting parts of the result tree.  This interface works
for any NXT-format data set even if no specific GUIs have been
written, but as soon as there are other GUIs, most people prefer them.
This is because you can get the same display from the search menu on
most modern NXT tools, but in addition to what you see with this tool,
the query results interact nicely with the rest of the display - for
anything you select in the query results, the corresponding parts of
the data display will be highlighted in orange.  In this corpus, the
dialogue act coder is the best choice of interface for arbitrary
querying.

NXT doesn't include any GUI for searching an entire corpus at once
because that's unnecessarily memory intensive for most users.  Refine
your queries on one dialogue at a time using the search menu on any of
the tools, and then run them over the entire corpus at the command
line using, for instance, CountQueryResults or FunctionQuery.  The
command line utilities usually have an option for searching all
observations at once rather than one by one.  For all but the simplest
queries and on all but the highest performing machines, this is
unlikely to work for this corpus because there are so many dialogues.
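
One way to run a refined query over many dialogues without loading them
all at once is to call the utility once per observation.  The sketch
below makes several assumptions that you should check against the NXT
documentation: that CountQueryResults is on the classpath set up as in
switchboard-guis.sh, that it can be called by its bare class name, and
that it accepts the same -c, -o and -q flags as the GenericDisplay
command in this file.  The function name and the DRY_RUN variable are
hypothetical.

```shell
#!/bin/sh
# Sketch only.  The class name and the -c/-o/-q flags for
# CountQueryResults are assumptions; check the NXT documentation.

METADATA="xml/swbd-metadata.xml"

# count_over QUERY OBSERVATION...
# Run QUERY on each named observation in turn, so that only one
# dialogue is loaded at a time.  With DRY_RUN=1 the command lines are
# printed instead of executed.
count_over() {
    query="$1"
    shift
    for obs in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            printf "java CountQueryResults -c %s -o %s -q '%s'\n" \
                "$METADATA" "$obs" "$query"
        else
            echo "== $obs =="
            java CountQueryResults -c "$METADATA" -o "$obs" -q "$query"
        fi
    done
}
```

For example, "count_over '($w word):' sw2005 sw2012" would run the query
on those two dialogues, one after the other.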

------------------------------------------------------
	ALTERNATE STRATEGY FOR GENERIC DISPLAY
------------------------------------------------------

NXT always includes a "generic corpus display" as one option.  It is
unwise to run it on this corpus using the script because by default,
it displays all the annotations.  This takes a lot of memory and looks
very busy.  You can, however, run it from the command line, specifying
which kinds of annotations you wish to see by naming them in a simple
query.  For example, after setting the same variables as in the script,
running the following from the top level directory of this release:

java net.sourceforge.nite.gui.util.GenericDisplay -c xml/swbd-metadata.xml -o sw2012  -q '($t turn)($p parse)($w word):'

will show turns, syntax, and terminals on the observation sw2012.
Complete "codings" are shown, one per window.  It is sufficient to
name any one element from the codings you wish to see.  If you have
one option you particularly like, you can add it as a
<callable-program> in the metadata file.  This will appear in addition
to the full display option currently shown.
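
If you would rather not edit the metadata file, a small shell wrapper
can serve the same purpose.  This is a sketch, not part of the release:
the function name and the DRY_RUN variable are hypothetical, and it
assumes that the classpath has been set as in switchboard-guis.sh and
that you run it from the top level directory of this release.  It uses
only the GenericDisplay flags shown above.

```shell
#!/bin/sh
# Sketch only: a wrapper around the GenericDisplay command shown above.
# Assumes CLASSPATH is already set as in switchboard-guis.sh.

METADATA="xml/swbd-metadata.xml"
DEFAULT_QUERY='($t turn)($p parse)($w word):'

# show_codings OBSERVATION [QUERY]
# Open the generic display on one observation, restricted to the
# codings named in QUERY (turns, syntax and terminals by default).
# With DRY_RUN=1 the command line is printed instead of executed.
show_codings() {
    obs="$1"
    query="${2:-$DEFAULT_QUERY}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "java net.sourceforge.nite.gui.util.GenericDisplay -c $METADATA -o $obs -q '$query'"
    else
        java net.sourceforge.nite.gui.util.GenericDisplay \
            -c "$METADATA" -o "$obs" -q "$query"
    fi
}
```

For example, "show_codings sw2012" repeats the command given above, and
"show_codings sw2012 '($w word):'" would show only the coding that
contains the words.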


------------------------------------------------------
	DIALOGUE ACT CODER
------------------------------------------------------

The dialogue act coder shows the dialogue acts.  It is the best way to
see the dialogue act coding at a glance, and to hear the sound while
reading the transcription.  Because the tool is just one of NXT's
standard interfaces configured for this data, it includes options for
addressee and reflexivity coding that are not used by this data set.
Although the dialogue acts were not originally created using this
tool, the tool functions as an editor and can be used to modify the
existing coding.  Of the tools in the release, it works best.

------------------------------------------------------
	INFORMATION STATUS
------------------------------------------------------

This coder is the actual tool that was used to add information status
to markables on the corpus, albeit in a very early version of NXT.
The markables themselves were added automatically by running a query
to find constituents with the correct syntactic properties, unlike in
some corpora where they might be marked by hand.  The tool has two
modes. In coding mode, the code for the current markable is shown
using a coloured ball next to the button for the code.  When coding,
the tool moves forward automatically to the next uncoded markable.
These properties are designed for quick coding.  In checking mode, the
cursor does not advance automatically, and the codes are shown
in-line, with a coloured dot in the transcription next to uncoded
markables, so that the coding supervisor can check entire pages at a
glance.  In both cases, coreferential links have a separate
display. Selecting a link will highlight the associated anaphor and
antecedent on the transcription; when in doubt, you can tell which is
which by selecting just the anaphor or antecedent in the link
window. The only way to find out whether a particular markable
participates in any coreferential link is to run a query to check.

In the coding scheme, markables can be old, mediated, or new, and if
they are old or mediated, they can also have a "statustype" that
explains how they are accessible to the conversants.  The tool doesn't
enforce this restriction of when statustypes apply.

Because the data representation and NXT have changed drastically since
this early tool, its behaviour can be flaky.  It works OK in checking
mode but no longer in coding mode because it doesn't redisplay and
move to the next code well.  The original version of the program, as
used to add the coding in the corpus, interleaved words from the
speakers for one parse at a time, not one turn, as the tool does
currently. Try it on sw2525 or some other dialogue for which information
status coding exists. It has never been possible to play the audio in this
tool.

There are more details in README.EXERCISES.txt.

------------------------------------------------------
	ANIMACY
------------------------------------------------------

Again, this tool is the one that was used to create the data in the
first place, apart from a few dialogues where it was translated from
elsewhere.  It is very like the information status tool, with the same
flaws, and applies codes to the same markables, but here there are no
coreferential links.  There are three attributes, the animacy code, a
confidence code, and a code about the use of anthropomorphism.  Try it on
sw2005 or some other dialogue for which animacy coding exists.

There are more details in README.EXERCISES.txt.

------------------------------------------------------
	KONTRAST
------------------------------------------------------

This tool is similar to the information status tool, but uses
different "markables", again added before the tool is run.  In this
one, it's possible to listen to the sound.  It suffers from the same
flaky behaviour as the other older tools because of changes in NXT's
selection mechanism.  The original version of the program, as used to
add the coding in the corpus, interleaved words from the speakers for
one parse at a time, not one turn, as the tool does currently. Try it
on sw4880 or some other dialogue for which kontrast coding exists.


------------------------------------------------------
	OTHER TOOLS
------------------------------------------------------

Because the code could be useful, the release contains source for two
further tools that, because they are not so usable, are not presented
on the GUI menu.  The code to declare them is commented out in
swbd-metadata.xml.

The first, SwitchboardDisplay, is a tool that will show you the
transcription displayed as syntax trees, with the part-of-speech
information appended to words, and with markables interposed around
their words showing animacy and information status.  As for most
tools, there is a related search window for running queries and
highlighting the results.  It is not possible to play the audio in
this tool. Because of changes in both NXT and the NXT representation
of the data since this early tool, its behaviour is flaky. Markables
are rendered twice, once before and once after the screen
representation of the non-terminal to which the markable relates.  All
of A's parses are rendered before B's.

The second is just the configuration option for using NXT's
configurable named entity coder on the Penn Treebank syntax. It is
very, very, very slow to render because NXT isn't expecting that many
named entities.