
Qatari Corpus of Argumentative Writing

Item Name:	Qatari Corpus of Argumentative Writing
Author(s):	Abdelhamid M. Ahmed, Debra Myhill, Esmaeel Abdollahzadeh, Lee McCallum, Wajdi Zaghouani, Lameya Rezk, Anissa Jrad, Xiao Zhang
LDC Catalog No.:	LDC2022T04
ISBN:	1-58563-992-3
ISLRN:	703-290-141-447-2
DOI:	https://doi.org/10.35111/k307-kg62
Release Date:	July 15, 2022
Member Year(s):	2022
DCMI Type(s):	Text
Data Source(s):	essays
Application(s):	automatic content extraction, discourse analysis
Language(s):	Arabic, English
Language ID(s):	ara, eng
License(s):	LDC User Agreement for Non-Members
Online Documentation:	LDC2022T04 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Ahmed, Abdelhamid M., et al. Qatari Corpus of Argumentative Writing LDC2022T04. Web Download. Philadelphia: Linguistic Data Consortium, 2022.
Related Works:	LDC2014T06 ETS Corpus of Non-Native Written English; LDC2015S10 Arabic Learner Corpus; LDC2025T03 The Xi’an Multi-Language Learner Corpus

Introduction

Qatari Corpus of Argumentative Writing was developed by Qatar University, the University of Exeter, and Hamad Bin Khalifa University. It comprises approximately 200,000 tokens of Arabic and English writing by 195 undergraduate students (159 female, 36 male), along with annotations and related metadata. The students were native Arabic speakers and fluent in English; each wrote one Arabic and one English essay in response to specific argumentative prompts. They were instructed to include a clear thesis statement supported by relevant evidence.

Data

The corpus is divided into Arabic and English parts, each containing 195 essays (one per student). Part-of-speech annotated files are included with the essay text. All text files are UTF-8 encoded.

The metadata covers the students (gender, major, first language, second language) and the essay texts (serial numbers of texts, word limits, genre, date of writing, time spent on writing, place of writing). It is provided in UTF-8 encoded CSV format.
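
As a rough illustration of working with the metadata, the sketch below tallies rows by one column using Python's standard `csv` module. The column names (`serial_number`, `gender`, and so on) and the sample rows are assumptions made up for this example; the actual headers and values in the release may differ, so check the corpus documentation first.

```python
import csv
import io
from collections import Counter


def count_by_field(csv_text: str, field: str) -> Counter:
    """Count metadata rows by the value found in one column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[field].strip() for row in reader)


# Hypothetical sample rows; real column names and values may differ.
sample = (
    "serial_number,gender,major,first_language,second_language\n"
    "1,female,Education,Arabic,English\n"
    "2,male,Engineering,Arabic,English\n"
    "3,female,Law,Arabic,English\n"
)

counts = count_by_field(sample, "gender")
print(counts["female"], counts["male"])
```

To run this against a real metadata file, read it with `open(path, encoding="utf-8", newline="")` and pass the resulting text to `count_by_field`.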
