VOICE ACROSS HISPANIC AMERICA
			-----------------------------

	                     Corpus Documentation

        The Voice Across Hispanic America corpus consists of 38,740 utterances
from 915 native speakers of American Spanish (570 female, 345 male). Each
speaker provided between 5 and 45 utterances. In total there are 31,066 read
utterances and 7,674 spontaneous utterances (including 3,468 yes/no responses).
Details of the corpus design, collection and development are given in the 
final report. 

        Each utterance is stored in a separate waveform file, and all the files
from a given speaker are kept in one directory. The speaker directories are
grouped under two directories, "train" and "test"; the "test" set is a
representative sample of 100 calls drawn from the overall collection.

        Each speaker is identified by a 4-digit number assigned according to the
order in which he or she called the data collection system; the numbers range
from 0001 to 1446. The numbering has gaps because some callers were discarded
from the corpus: they either hung up after one or two utterances or did not
provide valid speech. Speaker directory names are of the form:

		spk_<4_digit_speaker_number>
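As a minimal sketch, assuming each spk_NNNN directory sits directly inside the "train" or "test" directory of a local copy of the corpus (the root path and the function name list_speakers are illustrative, not part of the distribution), the speaker numbers in each partition can be collected by matching directory names against the pattern above:

```python
import re
from pathlib import Path

# Speaker directories are named spk_<4_digit_speaker_number>, e.g. spk_0053.
SPK_DIR_RE = re.compile(r"^spk_(\d{4})$")


def list_speakers(corpus_root):
    """Map "train" and "test" to sorted lists of speaker numbers.

    Assumed layout (hypothetical): <corpus_root>/<partition>/spk_NNNN/
    Entries that are not directories or do not match the pattern are ignored.
    """
    speakers = {}
    for partition in ("train", "test"):
        part_dir = Path(corpus_root) / partition
        ids = []
        if part_dir.is_dir():
            for entry in part_dir.iterdir():
                m = SPK_DIR_RE.match(entry.name)
                if m and entry.is_dir():
                    ids.append(int(m.group(1)))
        speakers[partition] = sorted(ids)
    return speakers
```

Because discarded callers leave gaps in the numbering, the returned lists are not contiguous; iterating over them avoids probing for directories that do not exist.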

	Each speaker directory contains that speaker's speech waveform files in
NIST SPHERE format. The header of each speech file contains important speaker
and utterance information, as well as information about the sample data format.
(The sample data are stored in single-channel 8-bit mu-law form, starting at
byte offset 1024 in each file.)  See the final report and the transcription
conventions document for a description of the header items; also, refer to the
documentation in the "sphere" directory for information about the file header
format. The waveform file names are of the form:
		
		<utt_num><mode><utt_type><spkr_num>.sph

where  <utt_num>  - 2-digit number from 01 through 45
       <mode>	  - 'r' for read speech, 's' for spontaneous speech
       <utt_type> - 1-letter code indicating the utterance type 
		    (explained below)
       <spkr_num> - 4-digit speaker number from 0001 through 1446

Examples:	01sy0053.sph,  15rw0972.sph, etc.
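The following Python sketch reads one waveform file. It assumes the standard NIST_1A header layout (a "NIST_1A" line, a line giving the header size in bytes, then "name -type value" lines terminated by "end_head"; check this against the documentation in the "sphere" directory) and uncompressed single-channel mu-law samples as described above. The function names are hypothetical. The mu-law expansion is the standard ITU-T G.711 decoding to 16-bit linear values, written out by hand because Python's audioop module was removed in Python 3.13.

```python
from pathlib import Path


def parse_sphere_header(data: bytes):
    """Parse a NIST_1A SPHERE header; return (fields, header_size)."""
    lines = data.split(b"\n", 2)
    if len(lines) < 3 or lines[0].strip() != b"NIST_1A":
        raise ValueError("not a NIST_1A SPHERE file")
    # The second line holds the header length in bytes (1024 in this corpus).
    header_size = int(lines[1].decode("ascii"))
    fields = {}
    for raw in data[:header_size].split(b"\n")[2:]:
        line = raw.decode("latin-1").strip()
        if line == "end_head":
            break
        if not line:
            continue
        # Field lines look like "name -type value":
        # -i integer, -r real, -sN string of N characters.
        name, ftype, value = line.split(None, 2)
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        elif ftype.startswith("-s"):
            fields[name] = value[: int(ftype[2:])]
        else:
            fields[name] = value
    return fields, header_size


def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 mu-law code to a 16-bit linear sample."""
    u = ~code & 0xFF                  # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


# Precompute the expansion of all 256 possible codes once.
ULAW_TABLE = [ulaw_to_linear(c) for c in range(256)]


def read_waveform(path):
    """Return (header_fields, linear_samples) for one mu-law SPHERE file."""
    data = Path(path).read_bytes()
    fields, header_size = parse_sphere_header(data)
    # Sample data start right after the header (byte offset 1024 here).
    samples = [ULAW_TABLE[b] for b in data[header_size:]]
    return fields, samples
```

If a file's header declares a sample coding other than plain mu-law (for example a compressed form), this sketch does not apply and the samples must be decoded with the appropriate SPHERE tools instead.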


The table below shows the order and numbering of the utterances elicited from
each speaker, together with a description of each utterance type.

Utt. #	Utt.
& Mode	Type	Description
------	----	-----------
 01  s	y	yes/no
 02  r	i	5-digit Caller Id Number (CIN)
 03  r	p	phone number
 04  r	w	application word
 05  r	r	phonetically rich sentence
 06  r	c	credit-card number
 07  r	w	application word
 08  r	r	phonetically rich sentence
 09  r	w	application word
 10  r	r	phonetically rich sentence
 11  r	m	money item (dollar amount)
 12  r	w	application word
 13  r	r	phonetically rich sentence
 14  r	q	quantity item
 15  r	w	application word
 16  r	r	phonetically rich sentence
 17  r	u	unsegmented 8-digit string
 18  r	w	application word
 19  r	r	phonetically rich sentence
 20  r	a	unsegmented 8-character alphanumeric string
 21  r	w	application word
 22  r	c	credit-card number
 23  r	w	application word
 24  r	r	phonetically rich sentence
 25  r	p	phone number
 26  r	n	"name-at-dept." phrase
 27  r	w	application word
 28  r	p	phone number
 29  r	w	application word
 30  r	r	phonetically rich sentence
 31  r	w	application word
 32  r	n	"name-at-dept." phrase	
 33  r	d	date item
 34  r	w	application word
 35  r	p	phone number
 36  r	o	spelled word
 37  r	l	list of 6 digits
 38  s	t	spontaneous time item
 39  s	p	spontaneous phone number
 40  s	y	yes/no
 41  s	o	spontaneous spelled word
 42  s	s	spontaneous speech
 43  s	s	spontaneous speech
 44  s	y	yes/no
 45  s	y	yes/no
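To connect file names to this table, the following sketch splits a name of the form <utt_num><mode><utt_type><spkr_num>.sph into its four fields and looks up the type code. The names UTT_TYPES, UttName, parse_utt_filename and describe are hypothetical helpers. The descriptions are adapted from the table; spontaneous variants share a code with their read counterparts, so only the mode field tells them apart.

```python
import re
from typing import NamedTuple

# One-letter utterance-type codes, adapted from the table above.
UTT_TYPES = {
    "y": "yes/no",
    "i": "5-digit Caller Id Number (CIN)",
    "p": "phone number",
    "w": "application word",
    "r": "phonetically rich sentence",
    "c": "credit-card number",
    "m": "money item (dollar amount)",
    "q": "quantity item",
    "u": "unsegmented 8-digit string",
    "a": "unsegmented 8-character alphanumeric string",
    "n": '"name-at-dept." phrase',
    "d": "date item",
    "o": "spelled word",
    "l": "list of 6 digits",
    "t": "time item",
    "s": "spontaneous speech",
}

# <utt_num><mode><utt_type><spkr_num>.sph, e.g. 01sy0053.sph
FILENAME_RE = re.compile(r"^(\d{2})([rs])([a-z])(\d{4})\.sph$")


class UttName(NamedTuple):
    utt_num: int   # 1..45, position in the prompting sequence
    mode: str      # 'r' = read, 's' = spontaneous
    utt_type: str  # key into UTT_TYPES
    speaker: int   # speaker number


def parse_utt_filename(name: str) -> UttName:
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a waveform file name: {name!r}")
    utt_num, mode, utt_type, speaker = m.groups()
    if not 1 <= int(utt_num) <= 45:
        raise ValueError(f"utterance number out of range: {name!r}")
    if utt_type not in UTT_TYPES:
        raise ValueError(f"unknown utterance type {utt_type!r}: {name!r}")
    return UttName(int(utt_num), mode, utt_type, int(speaker))


def describe(name: str) -> str:
    u = parse_utt_filename(name)
    kind = "read" if u.mode == "r" else "spontaneous"
    return (f"speaker {u.speaker:04d}, utterance {u.utt_num:02d} "
            f"({kind}): {UTT_TYPES[u.utt_type]}")
```

For example, describe("01sy0053.sph") returns "speaker 0053, utterance 01 (spontaneous): yes/no".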