Credit Card Conversations
NIST Speech Disc 8-1.2
May, 1992
This CD-ROM contains training data for wordspotting on the Switchboard credit card conversations. Thirty-five conversations are included. They may be used for cross-validation and algorithm parameter determination as well as for ordinary training. A ten-conversation test set will be released later.
The following directories and files are contained in the top-level directory of this disc:
All Switchboard corpus files, other than ref files (see below), are of the form:
sw<CONVERSATION-ID><FILETYPE>
Where,
CONVERSATION-ID ::= 1000 ... 9999 (base 10)
FILETYPE ::= .wav | .txt | .mrk (see below for descriptions)
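As an illustration of this naming convention, the following Python sketch splits a corpus file name into its conversation ID and file type. The names `parse_swb_name` and `SwbFile` are hypothetical. Matching is case-insensitive on the assumption that a CD-ROM file system may present names in upper case. Ref files, which do not follow this form, are simply rejected.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for sw<CONVERSATION-ID><FILETYPE>: "sw", four decimal
# digits, then one of the three listed extensions. Case-insensitive by
# assumption (CD-ROM file systems may report names in upper case).
_SWB_NAME = re.compile(r"sw([0-9]{4})(\.wav|\.txt|\.mrk)", re.IGNORECASE)


class SwbFile(NamedTuple):
    conversation_id: int  # 1000 ... 9999
    filetype: str         # ".wav", ".txt" or ".mrk", lower-cased


def parse_swb_name(name: str) -> Optional[SwbFile]:
    """Split a corpus file name; return None if it does not fit the form."""
    match = _SWB_NAME.fullmatch(name)
    if match is None:
        return None
    conversation_id = int(match.group(1))
    # [0-9]{4} also accepts 0000-0999, which lie outside the documented range.
    if not 1000 <= conversation_id <= 9999:
        return None
    return SwbFile(conversation_id, match.group(2).lower())


if __name__ == "__main__":
    print(parse_swb_name("sw2023.mrk"))  # SwbFile(conversation_id=2023, filetype='.mrk')
    print(parse_swb_name("readme.doc"))  # None
```

Using `fullmatch` rather than `match` with a `$` anchor ensures that a trailing newline or extra characters cause the name to be rejected.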
The Switchboard documentation files are located in the documentation
directory, "swb1/doc".
The thirty-five included conversations are:
1026 1037 1038 1044 1060
1081 1083 1088 2023 2067
2163 2301 2313 2390 2399
2409 2536 2682 2710 2718
2764 2800 2883 2917 2951
2987 2999 3170 3332 3409
3439 3751 2781 3821 3855
For the earlier conversations, those preceding conversation 3170,
there was generally an initial time offset between the two
channels, and the offset varied as the conversation proceeded.
This was due to peculiarities in the collection process,
including some random losses of data. The problem was corrected
for the later conversations.
For some of these earlier conversations, namely those with enough
crosstalk for the offset to be tracked, samples have been deleted
from non-speech portions of the data to approximately correct the
offsets. Corresponding changes have been made in the marked
transcript files. For these conversations, as well as for
conversations with little crosstalk, a combined single-channel
version may be created by summing the two channels. It is
assumed, however, that the channels will normally be processed
separately.
The following conversations have been processed in this manner:
1060 2023 2163 2301 2313 2390 2409 2710 2718 2800 2883 2951 2987
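The channel-summing option described in the notes on channel offsets can be sketched in Python as follows. This is a minimal illustration, not the procedure used to prepare the disc. It assumes each channel has already been decoded to a sequence of 16-bit linear PCM integers (the sample encoding is not specified in this section), and the names `offsets_corrected` and `sum_channels` are hypothetical. `offsets_corrected` encodes only what is stated here, namely the 3170 cutoff and the list of processed conversations. Conversations with little crosstalk may also be summed, but they cannot be identified from these lists.

```python
from itertools import zip_longest
from typing import List, Sequence

# Conversations whose channel offsets were approximately corrected by
# deleting samples (taken from the list of processed conversations).
PROCESSED = {1060, 2023, 2163, 2301, 2313, 2390, 2409,
             2710, 2718, 2800, 2883, 2951, 2987}

INT16_MIN, INT16_MAX = -32768, 32767


def offsets_corrected(conversation_id: int) -> bool:
    """True for conversations from 3170 onward (no offset problem) or for
    the processed conversations (offsets approximately corrected)."""
    return conversation_id >= 3170 or conversation_id in PROCESSED


def sum_channels(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Add two channels sample by sample, clipping to the 16-bit range.

    Assumes decoded linear PCM. If the channels differ in length (for
    example after sample deletion), the shorter one is padded with zeros.
    """
    return [max(INT16_MIN, min(INT16_MAX, x + y))
            for x, y in zip_longest(a, b, fillvalue=0)]


if __name__ == "__main__":
    print(offsets_corrected(2023), offsets_corrected(1026))  # True False
    print(sum_channels([1, 2, 3], [10, 20]))                 # [11, 22, 3]
```

Clipping keeps the per-sample arithmetic simple; halving the sum instead would avoid clipping at the cost of roughly a 6 dB drop in level.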
Switchboard Filetypes
Documentation