	-----------------------------------------------------------
	  Description of the LDC verbal analyzer/synthesizer for 
		    spoken Japanese
	-----------------------------------------------------------

	June, 1997

	Project leader:		   Cynthia McLemore

	Consultation:		   Megumi Kobayashi
				   Sean Crist
				   M. Kaneko

	Transducer programming:	   Zhibiao Wu

  
CONTENTS

	1. Summary abstract
	2. System requirements
	3. About the transducer


-----------------------------------------------------------------------
1. Summary abstract

	The LDC verbal analyzer/synthesizer (transducer) for Japanese
was compiled primarily to support the Large Vocabulary Conversational
Speech Recognition (LVCSR) project, sponsored by the U.S. Department
of Defense.


-----------------------------------------------------------------------
2. System requirements

The transducer software presented here is intended for use on UNIX
operating systems.  It relies on the UNIX "make" utility and on
programs compiled from C source code.  The source code is available
from the LDC via anonymous ftp:

	ftp://ftp.cis.upenn.edu/pub/ldc/misc_sw/fst-0.3.tar.gz

If you are using a Sun SPARC workstation, you can use the precompiled
program files included here with the transducer data; they are
located in the "bin" directory.  Users of other systems will need to
obtain and compile the source code distribution mentioned above.


-----------------------------------------------------------------------
3. About the transducer

	The analyzer/synthesizer is a finite-state transducer.  The
transducer has a finite set of states connected by arcs
(transitions); together, these specify all possible inflectional
forms of every verb in the transducer's lexicon.
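To make the idea of stem arcs and affixal arcs concrete, here is a
minimal sketch of a finite-state transducer in Python.  The states,
arcs, and tags ("start", "ichidan", "+PAST", and so on) are
hypothetical illustrations, not the contents of Japanese.lmfst, and
the code is not the LDC implementation.  Reading the same arcs with
input and output swapped shows how one transducer can serve as both
analyzer and synthesizer.

```python
from typing import Dict, List, Tuple

# Hypothetical arcs for illustration only; NOT taken from Japanese.lmfst.
# Each arc is (surface string, analysis string, next state).
Arc = Tuple[str, str, str]

ARCS: Dict[str, List[Arc]] = {
    # Stem arcs: surface stem -> dictionary form.
    "start": [("tabe", "taberu", "ichidan"),
              ("mi", "miru", "ichidan")],
    # Affixal arcs for vowel-stem (ichidan) verbs.
    "ichidan": [("ru", "+NONPAST", "end"),
                ("ta", "+PAST", "end"),
                ("nai", "+NEG", "end")],
}
FINAL_STATES = {"end"}


def transduce(text: str, state: str = "start", reverse: bool = False) -> List[str]:
    """Return all outputs for `text` on paths from `state` to a final state.

    With reverse=False the arcs are read surface -> analysis (analysis);
    with reverse=True they are read analysis -> surface (synthesis).
    """
    results: List[str] = []
    if text == "" and state in FINAL_STATES:
        results.append("")
    for surface, analysis, next_state in ARCS.get(state, []):
        inp, out = (analysis, surface) if reverse else (surface, analysis)
        # Empty-input arcs are skipped so the recursion always terminates.
        if inp and text.startswith(inp):
            for rest in transduce(text[len(inp):], next_state, reverse):
                results.append(out + rest)
    return results


print(transduce("tabeta"))                      # ['taberu+PAST']
print(transduce("taberu+PAST", reverse=True))   # ['tabeta']
```

Every verb form is a path from the start state to a final state, so
adding one stem arc makes all of its affixal forms available at once.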

	The transducer relies on two data files:

		- Japanese.lmfst
		- Japanese.glfst

The file "Japanese.lmfst" was created manually and specifies all of
the arcs of the inflectional system of Japanese, including both stem
arcs and affixal arcs.

The file "Japanese.glfst" is the compiled form of the arcs in
"Japanese.lmfst", produced by reading that file into the transducer.
Whenever "Japanese.lmfst" is changed, "Japanese.glfst" must be
regenerated.

The file "Makefile" can be used (via the UNIX "make" command) to
create "Japanese.glfst" and "Japanese.words_glfst".
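The rebuild rule above is the usual make behavior: a target is
regenerated when it is missing or older than a file it depends on.
The Python sketch below shows only that timestamp comparison; the
function name "needs_rebuild" is hypothetical, and the sketch does
not reproduce the commands in the distributed Makefile.

```python
import os


def needs_rebuild(source: str, target: str) -> bool:
    """Return True if `target` is missing or older than `source`.

    This mirrors the timestamp check make performs for a rule in which
    Japanese.glfst depends on Japanese.lmfst; it is an illustration,
    not the distributed Makefile.
    """
    if not os.path.exists(target):
        return True
    return os.path.getmtime(source) > os.path.getmtime(target)
```

For example, needs_rebuild("Japanese.lmfst", "Japanese.glfst")
becomes True as soon as Japanese.lmfst is edited, which is the point
at which "make" should be run again.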

The directory "man" contains UNIX manual-page files that describe the
usage of the two FST programs provided in the "bin" directory.  These
manual pages may be helpful in understanding the following remarks
about the use of the transducer.

Any field can be specified as the input field (-i option); all other
fields are then produced as output.  Alternatively, a single output
field can be specified (-j option).  Currently, the fields in
"Japanese.glfst" are numbered as follows:

		- 0: morphological_tag
		- 1: romaji
		- 2: kanji
		- 3: hiragana

For example, to input romaji and output the other three fields, run
the following command:

	Lfst_trans -d Japanese.glfst -i 1 -m

To output kanji only, run the following command:

	Lfst_trans -d Japanese.glfst -i 1 -j 2 -m
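The effect of the -i and -j options can be pictured as selecting
fields from multi-field entries.  In this Python sketch the entries,
the tag notation, and the function name "lookup" are hypothetical;
only the field numbering (0 morphological_tag, 1 romaji, 2 kanji,
3 hiragana) comes from the description above.  Lfst_trans itself
works on the compiled transducer, not on a table like this one.

```python
from typing import List, Optional, Tuple, Union

# Field numbers as documented for Japanese.glfst.
MORPH_TAG, ROMAJI, KANJI, HIRAGANA = 0, 1, 2, 3

# Hypothetical entries; the tag notation is illustrative only.
ENTRIES: List[Tuple[str, str, str, str]] = [
    ("taberu+PAST", "tabeta", "食べた", "たべた"),
    ("taberu+NEG", "tabenai", "食べない", "たべない"),
]


def lookup(value: str, i: int,
           j: Optional[int] = None) -> List[Union[str, Tuple[str, ...]]]:
    """Match `value` against field i (like -i).

    If j is given (like -j), return only field j of each match;
    otherwise return all fields except i, in field order.
    """
    results: List[Union[str, Tuple[str, ...]]] = []
    for entry in ENTRIES:
        if entry[i] != value:
            continue
        if j is None:
            results.append(tuple(f for k, f in enumerate(entry) if k != i))
        else:
            results.append(entry[j])
    return results


print(lookup("tabeta", ROMAJI))          # like: -i 1
print(lookup("tabeta", ROMAJI, KANJI))   # like: -i 1 -j 2
```

Matching whole fields keeps the sketch short; a real transducer
instead consumes the input symbol by symbol along its arcs.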