JANA: A Human-Human Dialogues Corpus for Egyptian Dialect

Item Name: JANA: A Human-Human Dialogues Corpus for Egyptian Dialect
Author(s): AbdelRahim A. Elmadany, Sherif M. Abdou, Mervat Gheith
LDC Catalog No.: LDC2016T24
ISBN: 1-58563-777-7
ISLRN: 498-037-802-860-2
Release Date: November 15, 2016
Member Year(s): 2016
DCMI Type(s): Text
Data Source(s): telephone conversations, text chat conversations
Application(s): machine learning
Language(s): Arabic, Egyptian Arabic
Language ID(s): ara, arz
License(s): LDC User Agreement for Non-Members; JANA: A Human-Human Dialogues Corpus for Egyptian Dialect Agreement (For-profit)
Online Documentation: LDC2016T24 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Elmadany, AbdelRahim, Sherif Abdou, and Mervat Gheith. JANA: A Human-Human Dialogues Corpus for Egyptian Dialect LDC2016T24. Philadelphia: Linguistic Data Consortium, 2016.

Introduction

JANA: A Human-Human Dialogues Corpus for Egyptian Dialect was developed by researchers at Cairo University. It consists of 82 transcribed dialogues from call center inquiries annotated for dialogue acts.

Data was collected from call centers for banks, airlines and mobile network providers as follows: (1) spontaneous spoken dialogues from inquiries to banks and airlines; and (2) instant messaging (chat) dialogues from a mobile network provider's online support system.

Data

The transcribed dialogues consist of 52 telephone calls and 30 instant messaging conversations, amounting to approximately 20,311 words. The data contains roughly 3,001 conversation turns, with an average of 6.7 words per turn, and 4,725 utterances, with an average of 4.3 words per utterance. The data was transcribed using Transcriber.
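The per-turn and per-utterance averages above follow directly from the reported totals. The short sketch below recomputes them as a sanity check; the variable and function names are illustrative and not part of the corpus documentation. The first ratio comes out near 6.77, which the entry reports as 6.7.

```python
# Totals as reported in this catalog entry.
TOTAL_WORDS = 20_311
TOTAL_TURNS = 3_001
TOTAL_UTTERANCES = 4_725


def average(total: int, count: int) -> float:
    """Return the mean number of items per unit (plain division)."""
    return total / count


words_per_turn = average(TOTAL_WORDS, TOTAL_TURNS)
words_per_utterance = average(TOTAL_WORDS, TOTAL_UTTERANCES)
# Not stated in the entry, but implied by the same totals.
utterances_per_turn = average(TOTAL_UTTERANCES, TOTAL_TURNS)

print(f"words per turn:      {words_per_turn:.2f}")
print(f"words per utterance: {words_per_utterance:.2f}")
print(f"utterances per turn: {utterances_per_turn:.2f}")
```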

All data is presented as UTF-8 XML.
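Because the files are UTF-8 XML, a first look at their structure can be taken with a standard XML parser. The sketch below tallies element names in one document using Python's standard library. The element names and the sample content (`dialogue`, `turn`, `utterance`) are hypothetical placeholders, since the corpus schema is not described here; consult the corpus documentation for the actual tags.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_counts(xml_bytes: bytes) -> Counter:
    """Count how often each element name occurs in an XML document."""
    root = ET.fromstring(xml_bytes)
    # root.iter() walks the root element and all of its descendants.
    return Counter(elem.tag for elem in root.iter())


# Hypothetical sample; the real corpus files may use different element names.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<dialogue>"
    "<turn><utterance>مرحبا</utterance></turn>"
    "<turn><utterance>مرحبا</utterance><utterance>مرحبا</utterance></turn>"
    "</dialogue>"
).encode("utf-8")

counts = tag_counts(SAMPLE)
for tag, n in sorted(counts.items()):
    print(tag, n)
```

For a file on disk, the same function could be given the result of `open(path, "rb").read()`, where `path` points at one of the corpus files.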

Samples

Please view this sample.

Updates

None at this time.

Pricing

Not-for-profit organizations may license this data set for US$25.00 under the LDC Not-for-Profit Membership Agreement or under the LDC User Agreement for Non-Members for use in linguistic research, education and non-commercial technology development. For-profit organizations may license this data for US$1650 under the Commercial License Agreement for JANA: A Human-Human Dialogues Corpus for Egyptian Dialect (LDC2016T24).

Current fees in this catalog entry reflect those pertaining to a for-profit organization license. Not-for-profit organizations should contact LDC's Membership Office to license this data set.
