Boulder Lies and Truth

Item Name: Boulder Lies and Truth
Author(s): Franco Salvetti
LDC Catalog No.: LDC2014T24
ISBN: 1-58563-695-9
ISLRN: 974-370-635-113-0
Release Date: November 15, 2014
Member Year(s): 2014
DCMI Type(s): Text
Data Source(s): reviews
Application(s): anomaly analysis, subjectivity analysis
Language(s): English
Language ID(s): eng
License(s): Boulder Lies and Truth
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Salvetti, Franco. Boulder Lies and Truth LDC2014T24. Web Download. Philadelphia: Linguistic Data Consortium, 2014.

Introduction

Boulder Lies and Truth was developed at the University of Colorado Boulder and contains approximately 1,500 elicited English reviews of hotels and electronics for the purpose of studying deception in written language. Reviews were collected by crowdsourcing through Amazon Mechanical Turk.

Each review was required to be original and was checked for plagiarism against the web. Reviews were annotated with respect to the following three dimensions:

  • Domain: Electronics (e.g., iPhone) or Hotels
  • Sentiment: Positive or Negative
  • Truth Value:
    • a) Truthful: a review about an object known by the writer, reflecting the writer's real sentiment toward the object
    • b) Opposition: a review about an object known by the writer, reflecting the opposite of the writer's real sentiment toward the object (i.e., if the writer liked the object, they were asked to write a negative review; if the writer did not like the object, they were asked to write a positive review)
    • c) Deceptive (i.e., fabricated): a review, either positive or negative in sentiment, written about an object not known by the writer; the objects reviewed were provided via a URL from the tasks in (a) and (b)

Data

Each review was judged a total of 30 times: (1) 10 times for its perceived quality (on a scale from 1 to 5); (2) 10 times for its perceived truthfulness (e.g., truthful or somehow deceptive, a lie or a fabrication); and (3) 10 times for its perceived sentiment (i.e., star rating).
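
As a rough illustration of how the 30 judgments per review might be summarized, the sketch below averages the quality and sentiment ratings and takes a majority vote over the truthfulness labels. The function name, the dictionary keys, and the label strings are hypothetical and are not taken from the corpus; the documentation shipped with the release should be consulted for the actual file layout and label set.

```python
from collections import Counter
from statistics import mean


def summarize_judgments(quality, truthfulness, sentiment):
    """Summarize the three sets of judgments collected for one review.

    All argument names and label strings are assumptions made for this
    sketch, not the corpus's actual field names.
    """
    # The description above states 10 judgments of each kind per review.
    if not (len(quality) == len(truthfulness) == len(sentiment) == 10):
        raise ValueError("expected 10 judgments of each kind")
    # Most frequent truthfulness label and the number of votes it received.
    label, votes = Counter(truthfulness).most_common(1)[0]
    return {
        "mean_quality": mean(quality),
        "mean_sentiment": mean(sentiment),
        "majority_truthfulness": label,
        "majority_share": votes / len(truthfulness),
    }


# Made-up judgments for a single review, for illustration only.
summary = summarize_judgments(
    quality=[4, 5, 3, 4, 4, 5, 4, 3, 4, 4],
    truthfulness=["truthful"] * 7 + ["deceptive"] * 3,
    sentiment=[5, 4, 5, 5, 4, 5, 5, 4, 5, 5],
)
print(summary)
```

Keeping the majority share alongside the majority label preserves some sense of how strongly the judges agreed, which a bare label would hide.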

The following metadata is available for each review:

  • time consumed by the writer to write the review
  • a pair review ID coupling the two reviews (positive/negative) written about the same object by the same person, either false or truthful
  • the ID of the writer who wrote the review
  • the writer's disclosure as to whether the object to be reviewed was already used and/or known to the writer
  • the URL identifying an instance of the object (i.e., hotel or electronic product) on the web
  • a flag for plagiarized reviews
  • a marker for reviews that may be removed from the corpus
  • the reasons for rejecting a review
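
The metadata listed above could be held in a record type such as the one sketched below, which also shows one way to drop flagged reviews and to group the remaining ones by pair ID. Every field and function name here is hypothetical, chosen only to mirror the list; the names and formats actually used in the corpus files may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReviewRecord:
    # Field names are assumptions that mirror the metadata list,
    # not the corpus's actual column names.
    review_id: str              # hypothetical identifier for the review itself
    pair_id: str                # couples the positive/negative reviews of one object
    writer_id: str
    seconds_to_write: float     # time the writer took to write the review
    object_known: bool          # writer's disclosure about knowing the object
    object_url: str
    plagiarized: bool = False
    flagged_for_removal: bool = False
    rejection_reason: Optional[str] = None


def usable_reviews(records):
    """Keep reviews that are neither plagiarized nor flagged for removal."""
    return [r for r in records if not (r.plagiarized or r.flagged_for_removal)]


def group_by_pair(records):
    """Group reviews that share a pair ID."""
    pairs = {}
    for r in records:
        pairs.setdefault(r.pair_id, []).append(r)
    return pairs


# Made-up records for illustration only.
records = [
    ReviewRecord("r1", "p1", "w1", 310.0, True, "https://example.com/hotel-1"),
    ReviewRecord("r2", "p1", "w1", 275.5, True, "https://example.com/hotel-1"),
    ReviewRecord("r3", "p2", "w2", 90.0, False, "https://example.com/phone-1",
                 plagiarized=True, rejection_reason="copied from the web"),
]
pairs = group_by_pair(usable_reviews(records))
print({pair_id: [r.review_id for r in group] for pair_id, group in pairs.items()})
```

Filtering before grouping means a pair whose reviews were all flagged disappears entirely, while a pair with only one flagged review keeps its remaining member; whether such incomplete pairs should be kept depends on the analysis.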

Samples

Please view this sample.

Updates

None at this time.
