
2000 Communicator Dialogue Act Tagged

Item Name:	2000 Communicator Dialogue Act Tagged
Author(s):	Rashmi Prasad, Marilyn Walker
LDC Catalog No.:	LDC2004T15
ISBN:	1-58563-305-4
ISLRN:	451-626-470-363-6
DOI:	https://doi.org/10.35111/sp5p-5637
Release Date:	June 15, 2004
Member Year(s):	2004
DCMI Type(s):	Text
Data Source(s):	telephone conversations
Project(s):	Communicator
Application(s):	nominal expression generation, speech recognition, spoken dialogue modeling, spoken dialogue systems, summarization, tagging, topic detection and tracking
Language(s):	English
Language ID(s):	eng
License(s):	LDC User Agreement for Non-Members
Online Documentation:	LDC2004T15 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Prasad, Rashmi, and Marilyn Walker. 2000 Communicator Dialogue Act Tagged LDC2004T15. Web Download. Philadelphia: Linguistic Data Consortium, 2004.
Related Works:	isAnnotationOf LDC2002S56 2000 Communicator Evaluation; hasContinuation LDC2004T16 2001 Communicator Dialogue Act Tagged; relatesTo LDC2003S01 2001 Communicator Evaluation

Introduction

2000 Communicator Dialogue Act Tagged was developed by the Linguistic Data Consortium (LDC) and contains approximately 314,000 words of system and user interactions with entity and dialogue act tagging.

This release is an addendum to 2000 Communicator Evaluation (LDC2002S56) developed by LDC in 2002. This addendum contains annotations on the transcriptions of the system and user utterances as taken from the log files of LDC2002S56.

Dialogue act annotations are provided for system utterances in the dialogues. The dialogue act tags follow the DATE (Dialogue Act Tagging for Evaluation) scheme. In addition, both system and user utterances are tagged for named entities. For further information on the 2000 Communicator Evaluation corpus, please refer to the catalog entry for LDC2002S56.

Data

The complete Dialogue Act annotated corpus is available as a single XML text file totalling approximately 16 MB.
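Because the corpus ships as one XML file, a first step when working with it is often to inventory which element names it contains. The following is a minimal sketch, not part of the release: it makes no assumption about the corpus schema, and the element names in the toy sample (`corpus`, `dialogue`, `turn`) are invented purely for illustration. It streams the document with the standard library's `iterparse`, so a 16 MB file does not need to be held in memory as a full tree.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_element_tags(source):
    """Stream an XML document and count how often each element name occurs.

    `source` may be a file path or a binary file-like object.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[elem.tag] += 1
        elem.clear()  # drop text and children that have already been counted
    return counts


# Toy stand-in for the corpus file. These element names are hypothetical
# and are NOT taken from the actual LDC2004T15 schema.
sample = io.BytesIO(
    b"<corpus>"
    b"<dialogue><turn/><turn/></dialogue>"
    b"<dialogue><turn/></dialogue>"
    b"</corpus>"
)
tag_counts = count_element_tags(sample)

for name, n in tag_counts.most_common():
    print(name, n)
```

To run this against the real release, pass the path of the XML file in place of `sample`; the resulting counts can then be compared with the figures reported on this page.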

Dialogue act tagging was done automatically via pattern matching with human-labeled dialogue utterances used by the nine participating Communicator systems. Named entity tagging followed the same methodology. The breakdown of dialogues and dialogue acts is as follows:

Dialogues	Dialogue Acts	Tagged Dialogue Acts	Unique Tags
648	22,752	22,701	61

Each dialogue is segmented into system and user turns. Except for one system, the log files contain no utterance segmentation within turns, so the number of utterances in the logs equals the number of turns. Utterance segmentation was carried out as part of the annotation and is reflected in the dialogue act segmentation. The distribution of turns and words is as follows:

	System	User	Total
Turns	13,013	11,715	24,728
Words	275,938	38,285	314,223
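The figures in the two tables above can be cross-checked and turned into a few derived statistics. The sketch below uses only the numbers reported on this page; the variable names are illustrative, and the derived ratios (tagging coverage, words per turn, dialogue acts per system turn) are computed here rather than quoted from the corpus documentation.

```python
# Figures copied from the tables on this page.
dialogues = 648
dialogue_acts = 22_752
tagged_dialogue_acts = 22_701
turns = {"system": 13_013, "user": 11_715}
words = {"system": 275_938, "user": 38_285}

# Consistency check: the per-speaker columns should add up to the totals.
total_turns = sum(turns.values())
total_words = sum(words.values())
assert total_turns == 24_728
assert total_words == 314_223

# Share of dialogue acts that received a tag, and how many did not.
untagged = dialogue_acts - tagged_dialogue_acts
coverage = tagged_dialogue_acts / dialogue_acts

# Average turn length per speaker type.
words_per_turn = {side: words[side] / turns[side] for side in turns}

# Dialogue acts are annotated on system utterances only, so the
# natural denominator is the number of system turns.
acts_per_system_turn = dialogue_acts / turns["system"]
acts_per_dialogue = dialogue_acts / dialogues

print(f"untagged dialogue acts: {untagged}")
print(f"tagging coverage: {coverage:.2%}")
for side, avg in words_per_turn.items():
    print(f"{side} words per turn: {avg:.1f}")
print(f"dialogue acts per system turn: {acts_per_system_turn:.2f}")
print(f"dialogue acts per dialogue: {acts_per_dialogue:.1f}")
```

The column totals match the reported 24,728 turns and 314,223 words, and the system side accounts for the large majority of the words, which is consistent with system prompts being much longer than user replies.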

The release also includes the raw transcripts from the dialogues.

Samples

For an example of the data in this corpus, please view this sample (TXT).

Sponsorship

This research was conducted using funding from the following grant number and funding agency: DARPA - contract MDA972-99-3-0003.

Updates

None at this time.
