A summary of the size and content of the corpus is given below:

number of speakers		150 speakers
	males			75
	females			75
range of speaker age		10 yrs. to 70 yrs.

number of items per speaker	323 items
	isolated digits		15
	four digit sequences	35
	city names		100
	monosyllables		110
	control words (set A)	13
	control words (set B)	24
	control words (set C)	26

number of repetitions per item	4 repetitions
total number of utterances	193,763 utterances (per channel)

sample frequency		16 kHz
sample type			16-bit linear
number of microphones		2 (dynamic and condenser)
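
The figures above can be cross-checked with a little arithmetic. The sketch
below is illustrative only: the variable names are not part of the corpus
distribution, and the nominal count assumes that every speaker produced every
item the full number of times. The reported total is slightly smaller than
that nominal figure.

```python
# Nominal corpus size derived from the summary figures above.
# All names here are illustrative; none come from the corpus distribution.

items_per_speaker = {
    "isolated digits": 15,
    "four digit sequences": 35,
    "city names": 100,
    "monosyllables": 110,
    "control words (set A)": 13,
    "control words (set B)": 24,
    "control words (set C)": 26,
}

speakers = 150
repetitions = 4
reported_utterances = 193_763  # per channel, as stated in the summary

# Items per speaker is the sum over all prompt categories.
total_items = sum(items_per_speaker.values())

# Nominal count ASSUMES no missing or discarded recordings.
nominal_utterances = speakers * total_items * repetitions
shortfall = nominal_utterances - reported_utterances

# Raw data rate for one channel of 16 kHz, 16-bit linear audio.
sample_rate_hz = 16_000
bytes_per_sample = 2
bytes_per_second = sample_rate_hz * bytes_per_sample

print("items per speaker: ", total_items)
print("nominal utterances:", nominal_utterances)
print("shortfall:         ", shortfall)
print("bytes per second:  ", bytes_per_second)
```

The per-category counts sum to the stated 323 items per speaker. The summary
does not say why the reported total falls short of the nominal count.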


--===----=====------=======--------=========--------=======------=====----===--

Distribution of nonstandard orthographic items in the isolated digit subset of the
JCSD Corpus. Most of the data is clean; approximately 12% of the items carry a
non-speech marker, an alternate pronunciation, or both. The statistics for other
segments of the corpus are comparable.

Description						Frequency
							(17,390 items)

"clean data":
	nominal pronunciation/				15,364	(88.4%)
		no non-speech markers

with a non-speech marker				2,026	(11.6%)
	and/or an alternate pronunciation

with an alternate pronunciation				1,161	(6.7%)
distribution:
	only an alternate pronunciation			1,121	(96.6%)
	both an alternate pronunciation			40	(3.4%)
		and a non-speech marker

with a non-speech marker				905	(5.2%)
distribution:
	only a non-speech marker			865	(95.6%)
	both an alternate pronunciation			40	(4.4%)
		and a non-speech marker
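
The categories above overlap, so the number of items with any nonstandard
feature follows from inclusion-exclusion rather than from a plain sum. A
minimal sketch using the counts in the table (the variable names are
illustrative):

```python
# Inclusion-exclusion over the two overlapping categories in the table.
# Counts are copied from the table; names are illustrative.

total_items = 17_390
with_alt_pron = 1_161   # items with an alternate pronunciation
with_marker = 905       # items with a non-speech marker
with_both = 40          # items counted in both categories

# Items in either category: add the two, subtract the double-counted overlap.
with_either = with_alt_pron + with_marker - with_both

only_alt_pron = with_alt_pron - with_both
only_marker = with_marker - with_both
clean = total_items - with_either

print("either feature:  ", with_either)
print("only alt. pron.: ", only_alt_pron)
print("only marker:     ", only_marker)
print("clean:           ", clean)
```

These values match the counts listed for each row of the table.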

non-speech markers (total occurrences)			979	(5.6%)
distribution:
	{mouth noise}					543	(55.5%)
	{breath noise}					344	(35.2%)
	{paper rustle}					48	(4.9%)
	{non-speech noise}				20	(2.0%)
	{throat clear}					7	(0.7%)
	{background noise}				7	(0.7%)
	{cough}						6	(0.6%)
	{sniff}						2	(0.2%)
	{mouth_noise}					2	(0.2%)
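
Non-speech markers are written in curly braces, as in the labels above. The
sketch below shows one way such markers could be tallied. It assumes that the
markers appear inline in the orthographic transcription text; the sample
strings, the romanized words in them, and the function name are hypothetical
and are not taken from the corpus files. Folding {mouth_noise} into
{mouth noise} is a choice made for this example only.

```python
import re
from collections import Counter

# Matches one marker such as "{breath noise}" and captures the label inside.
MARKER = re.compile(r"\{([^{}]+)\}")


def tally_markers(transcriptions):
    """Count non-speech markers across an iterable of transcription strings."""
    counts = Counter()
    for line in transcriptions:
        for label in MARKER.findall(line):
            # Example-only normalization: treat "_" as a space so that
            # "mouth_noise" and "mouth noise" are counted together.
            counts[label.replace("_", " ").strip()] += 1
    return counts


# HYPOTHETICAL sample data, invented for illustration.
sample = [
    "{breath noise} ichi",
    "ni {mouth_noise}",
    "{mouth noise} san {paper rustle}",
    "yon",
]

counts = tally_markers(sample)
for label, n in counts.most_common():
    print(f"{{{label}}}\t{n}")
```

Because one item can carry more than one marker, a tally of this kind counts
marker occurrences, not items.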

--===----=====------=======--------=========--------=======------=====----===--

Geographic distribution of the speaker population by region.

Geographic		Combined		Males		Females
Region			(150)			(75)		(75)

Chubu			16			10		6
Chugoku			5			3		2
Hokkaido		1			1		0
Kanto			90			40		50
Kinki			16			10		6
Kyushu			9			5		4
Tohoku			10			6		4
Unknown			1			0		1

--===----=====------=======--------=========--------=======------=====----===--

Distribution of speaker age in the JCSD Corpus.

Age			Combined		Males		Females
			(150)			(75)		(75)
10-19			1			0		1
20-29			50			25		25
30-39			40			20		20
40-49			32			15		17
50-59			22			11		11
60-69			5			4		1

--===----=====------=======--------=========--------=======------=====----===--

An overview of the prompting material used in the JCSD Corpus.

Description					Number of items

Control Words:
	Banking Services			13
	Word Processors				24
	Home Electronic Equipment		26

Digits:
	Isolated Digits				15
	Four Digit Sequences			35

City Names:					100
	a phonetically-rich subset 
	of common Japanese city names

Monosyllables:					110
	all Japanese monosyllables plus 
	several used to pronounce 
	foreign words