Qatari Corpus of Argumentative Writing (QCAW)

Author(s): Abdelhamid M. Ahmed, Debra Myhill, Esmaeel Abdollahzadeh, Lee McCallum, Wajdi Zaghouani, Lameya Rezk, Anissa Jrad, Xiao Zhang
Release Date:
DCMI Type(s): Text
Data Source(s): University students’ written essays
Project(s): Writing the Future: Metadiscourse and Voice in English and Arabic Argumentative Writings of Qatari University Students (NPRP11S-1112-170006)
Application(s): automatic content extraction, discourse analysis
Language(s): Arabic, English
Language ID(s): ara, eng

Introduction

Qatari Corpus of Argumentative Writing (QCAW) is the output of the project Writing the Future: Metadiscourse and Voice in English and Arabic Argumentative Writings of Qatari University Students. QCAW was developed to support corpus studies of L1 Arabic argumentative writing and L2 English argumentative writing. The goal of the corpus release is to provide a comparable dataset for investigating how L1 Arabic university students write arguments in L1 Arabic and in L2 English.

Data

With the Ethical Approval Certificate authorized by Qatar University and the University of Exeter for data collection, we invited undergraduate students to participate in the project. Each participant signed a consent form. The participating students are bilingual, i.e., L1 Arabic speakers who are fluent in L2 English. All the data was collected in 2019, in class, under controlled conditions.

QCAW consists of an Arabic sub-corpus and an English sub-corpus. Each sub-corpus contains more texts written by females (n = 159) than by males (n = 36) because there are more female than male students at the university: the ratio of female to male students was 3 to 1 at the time of data collection.
The task instructions for the argumentative essays were as follows:
• having a clear thesis statement supported by relevant evidence,
• establishing a clear relevance of the arguments to the essay topic,
• developing critical thoughts by presenting opposing views, supporting them with evidence, and making your position clear on the issue.

Two writing prompts were given to the participants:
• Do you agree or disagree with the following statement? With the help of technology, students nowadays can learn more information and learn it more quickly. Use specific reasons and examples to support your answer.
• Do you agree or disagree with the following statement? Telephones and emails have made communication between people less personal. Use specific reasons and examples to support your opinion.

QCAW is a specialized corpus for L1 Arabic (MSA) argumentative writing and L2 English argumentative writing. The Arabic sub-corpus consists of 195 essays, 97,248 tokens in total, and the English sub-corpus consists of 195 essays, 98,379 tokens in total. Texts of fewer than 250 words and incomplete texts were excluded. Data is stored in UTF-8 encoded plain text files. Details of the corpus are presented in Table 1.

Table 1. Corpus make-up

Corpus    # texts   Average essay length   SD      Essay length range   Tokens
Arabic    195       498.71                 84.56   251-808              97,248
English   195       504.51                 94.87   263-1158             98,379

Both the Arabic and the English texts underwent amendments. Headings and titles were removed, as was any text that simply repeated the task instructions or the writing prompt. For the English texts, spelling was standardized to American English, as this was the spelling used in most of the texts. Examples of spelling changes included words such as ‘example’ (misspelled as ‘exmple’), ‘appear’ (misspelled as ‘apper’) and ‘so’ (misspelled as ‘sp’). No other changes were made to the texts with respect to grammatical tense or turn of phrase.
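The summary figures in Table 1 could be recomputed from the plain text files along the following lines. This is a minimal sketch, not the procedure used to build the corpus: it assumes whitespace tokenization, a sample standard deviation, and a hypothetical directory of `.txt` files per sub-corpus, none of which are specified in this description. The function names are illustrative only.

```python
from pathlib import Path
from statistics import mean, stdev


def count_tokens(text: str) -> int:
    """Count whitespace-separated tokens (an assumed tokenization)."""
    return len(text.split())


def summarize(lengths: list[int]) -> dict:
    """Return Table 1 style figures for a list of essay lengths."""
    if not lengths:
        raise ValueError("no essay lengths to summarize")
    return {
        "texts": len(lengths),
        "mean": round(mean(lengths), 2),
        # Sample SD is an assumption; a single text has no spread.
        "sd": round(stdev(lengths), 2) if len(lengths) > 1 else 0.0,
        "range": (min(lengths), max(lengths)),
        "tokens": sum(lengths),
    }


def summarize_directory(directory: str, min_words: int = 250) -> dict:
    """Summarize every .txt file in a (hypothetical) sub-corpus directory."""
    lengths = []
    for path in sorted(Path(directory).glob("*.txt")):
        n = count_tokens(path.read_text(encoding="utf-8"))
        if n >= min_words:  # texts of fewer than 250 words were excluded
            lengths.append(n)
    return summarize(lengths)
```

As a quick consistency check on the published figures, dividing tokens by texts gives the reported averages: 97,248 / 195 ≈ 498.71 for Arabic and 98,379 / 195 ≈ 504.51 for English. Counts produced by a different tokenizer, particularly for Arabic, may not match Table 1 exactly.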
Tagged versions of both the Arabic and the English texts are provided with the corpus. Farasa was used for Arabic annotation, and TreeTagger was used for English annotation. The metadata covers the learners (gender, major, first language, second language) and the texts (serial number, word limit, genre, date of writing, time spent on writing, place of writing). The metadata file is provided in UTF-8 encoded CSV format.

Sponsorship

The corpus is based upon work funded by the Qatar National Research Fund (QNRF).

Acknowledgement

The publication of QCAW is a joint effort of our team members from Qatar University, the University of Exeter, and Hamad Bin Khalifa University.