Boulder Lies and Truth
Item Name: | Boulder Lies and Truth |
Author(s): | Franco Salvetti |
LDC Catalog No.: | LDC2014T24 |
ISBN: | 1-58563-695-9 |
ISLRN: | 974-370-635-113-0 |
DOI: | https://doi.org/10.35111/tj47-sd65 |
Release Date: | November 15, 2014 |
Member Year(s): | 2014 |
DCMI Type(s): | Text |
Data Source(s): | reviews |
Application(s): | anomaly analysis, subjectivity analysis |
Language(s): | English |
Language ID(s): | eng |
License(s): | Boulder Lies and Truth |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Salvetti, Franco. Boulder Lies and Truth LDC2014T24. Web Download. Philadelphia: Linguistic Data Consortium, 2014. |
Introduction
Boulder Lies and Truth was developed at the University of Colorado Boulder and contains approximately 1,500 elicited English reviews of hotels and electronics for the purpose of studying deception in written language. Reviews were collected through crowdsourcing on Amazon Mechanical Turk.
Each review was required to be original and was checked for plagiarism against the web. Reviews were annotated with respect to the following three dimensions:
- Domain: Electronics (e.g., iPhone) or Hotels
- Sentiment: Positive or Negative
- Truth Value:
  - a) Truthful: a review about an object known by the writer, reflecting the real sentiment of the writer toward the object of the review
  - b) Opposition: a review about an object known by the writer, reflecting the opposite sentiment of the writer toward the object of the review (i.e., if the writer liked the object, they were asked to write a negative review; if the writer did not like the object, they were asked to write a positive review)
  - c) Deceptive (i.e., fabricated): a review, either positive or negative in sentiment, about an object not known by the writer; the objects reviewed were provided via a URL from the tasks in (a) and (b)
Data
Each review was judged a total of 30 times: (1) 10 times for its perceived quality (on a scale of 1 to 5); (2) 10 times for its perceived truthfulness (e.g., truthful or somehow deceptive, a lie or a fabrication); and (3) 10 times for its perceived sentiment (i.e., star rating).
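As an illustration of how the 30 judgments for a single review might be summarized, the following Python sketch averages the quality and sentiment judgments and takes a majority vote over the truthfulness judgments. It is not part of the corpus distribution: the function name, the label strings, and the assumption that star ratings are numeric are all hypothetical, since the file layout of the corpus is not described here.

```python
from collections import Counter
from statistics import mean


def summarize_judgments(quality, truthfulness, sentiment):
    """Summarize the three groups of 10 judgments for one review.

    quality:      10 integers on the 1-5 scale
    truthfulness: 10 labels (label strings are hypothetical)
    sentiment:    10 numeric star ratings (numeric form is an assumption)
    """
    groups = {
        "quality": quality,
        "truthfulness": truthfulness,
        "sentiment": sentiment,
    }
    for name, values in groups.items():
        if len(values) != 10:
            raise ValueError(f"expected 10 {name} judgments, got {len(values)}")
    if any(not 1 <= q <= 5 for q in quality):
        raise ValueError("quality judgments must be between 1 and 5")

    # most_common(1) returns the label with the highest count.
    majority_label, votes = Counter(truthfulness).most_common(1)[0]
    return {
        "mean_quality": mean(quality),
        "majority_truthfulness": majority_label,
        "majority_votes": votes,
        "mean_sentiment": mean(sentiment),
    }


if __name__ == "__main__":
    summary = summarize_judgments(
        quality=[4, 5, 3, 4, 4, 5, 4, 3, 4, 4],
        truthfulness=["truthful"] * 7 + ["deceptive"] * 3,
        sentiment=[5] * 6 + [4] * 4,
    )
    print(summary)
```

A majority vote is only one possible way to combine the truthfulness judgments; keeping the full distribution of labels may be preferable when the judges disagree.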
The following metadata is available for each review:
- the time the writer took to write the review
- a pair review ID coupling the two reviews (positive/negative) written about the same object by the same person, either false or truthful
- the ID of the writer who wrote the review
- the writer's disclosure as to whether the object to be reviewed was already used and/or known to the writer
- the URL identifying an instance of the object (i.e., hotel or electronic product) on the web
- a flag for plagiarized reviews
- a marker for reviews that may be removed from the corpus
- the reasons for rejecting a review
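The annotation dimensions and metadata listed above could be modeled as a simple record type. The Python sketch below is one possible representation, not the corpus's actual schema: every field name, the enum values, and the unit of the writing time are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Domain(Enum):
    ELECTRONICS = "electronics"
    HOTELS = "hotels"


class Sentiment(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"


class TruthValue(Enum):
    TRUTHFUL = "truthful"
    OPPOSITION = "opposition"
    DECEPTIVE = "deceptive"


@dataclass(frozen=True)
class Review:
    """One review with its annotations and metadata (hypothetical field names)."""
    text: str
    domain: Domain
    sentiment: Sentiment
    truth_value: TruthValue
    writing_time_seconds: float  # unit is an assumption
    pair_id: str                 # couples the positive/negative reviews of one object
    writer_id: str
    object_known: bool           # writer's disclosure about the reviewed object
    url: str                     # instance of the hotel or product on the web
    plagiarized: bool
    removable: bool              # marked as a candidate for removal
    rejection_reasons: Optional[str] = None


def pair_reviews(reviews, usable_only=True):
    """Group reviews by pair ID, optionally skipping flagged reviews."""
    pairs = defaultdict(list)
    for review in reviews:
        if usable_only and (review.plagiarized or review.removable):
            continue
        pairs[review.pair_id].append(review)
    return dict(pairs)
```

Grouping by pair ID makes it straightforward to compare the positive and negative reviews that one writer produced for the same object.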
Samples
Please view this sample.
Updates
None at this time.