Detecting Fake Reviews Using Algorithms

Today’s shoppers increasingly rely on online reviews when deciding what to buy, from cell phones and air conditioners to hotel and airline bookings. So it comes as no real surprise that some retailers hire writers to post fake but positive reviews of their products on various websites. Good or bad reviews on popular sites such as Amazon, Yelp or TripAdvisor can profoundly affect public perception of a company’s product and, eventually, its bottom line. For consumers who rely on online reviews before making a purchase, this is a serious problem.

It's not always easy to tell fake reviews from genuine ones, but a team of researchers at Cornell University has been developing automated methods to identify them based on analysis of the review text. The algorithm looks for certain ‘deceptive indicators’, described in the paper they published, to sniff out the fakes.

Here is an example of a fake review and a genuine one. Can you tell which is which?

I have stayed at many hotels traveling for both business and pleasure and I can honestly stay that The James is tops. The service at the hotel is first class. The rooms are modern and very comfortable. The location is perfect within walking distance to all of the great sights and restaurants. Highly recommend to both business travellers and couples.

My husband and I stayed at the James Chicago Hotel for our anniversary. This place is fantastic! We knew as soon as we arrived we made the right choice! The rooms are BEAUTIFUL and the staff very attentive and wonderful!! The area of the hotel is great, since I love to shop I couldn’t ask for more!! We will definatly be back to Chicago and we will for sure be back to the James Chicago.

The researchers mixed 400 positive genuine reviews of Chicago hotels with 400 deceptive reviews produced for the study, and asked three human judges to tell them apart. They could not.

Next, they trained their system on 80% of the 800 reviews and tested it on the remaining 20%. The algorithm got it right 90% of the time.
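To make the train-and-test procedure concrete, here is a minimal sketch in Python. It is not the researchers' system: the choice of classifier is an assumption, and a plain bag-of-words Naive Bayes stands in for whatever model they actually trained. The function names (split_80_20, train_naive_bayes, predict, accuracy) are hypothetical, and the input is assumed to be a list of (review text, label) pairs.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Deliberately crude: lowercase and split on whitespace.
    return text.lower().split()


def split_80_20(examples, seed=0):
    # Shuffle a copy, then keep 80% for training and 20% for testing.
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def train_naive_bayes(train):
    # Count how often each word appears under each label.
    word_counts = {}
    label_counts = Counter()
    for text, label in train:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    # Pick the label with the highest log-probability (add-one smoothing).
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokenize(text):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, test):
    # Fraction of held-out reviews whose predicted label matches the true one.
    correct = sum(predict(model, text) == label for text, label in test)
    return correct / len(test)
```

With 800 labeled reviews, split_80_20 would yield 640 for training and 160 for testing, and accuracy would report the share of those 160 that the model labels correctly.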

The fake reviews tended to read as narratives about the writer’s stay, heavy on superlatives but light on concrete description. Naturally: the writers had never been there. Instead, they talked about why they were in Chicago. They also used words like “I” and “me” more frequently, as if to underline their own credibility.
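The cues described above can be turned into simple numbers. The sketch below counts how often first-person words and gushing words appear in a review. Both word lists are assumptions made for illustration, not the paper's feature set: only “I” and “me” come from the findings reported here, and EFFUSIVE_WORDS is a hypothetical stand-in for “superlatives”.

```python
import re

# "i" and "me" are mentioned in the findings; the rest are assumed extras.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}

# Hypothetical stand-in list for superlative-style language.
EFFUSIVE_WORDS = {"best", "greatest", "perfect", "fantastic",
                  "wonderful", "beautiful"}


def cue_rates(review):
    # Return each cue's share of all words in the review.
    words = re.findall(r"[a-z']+", review.lower())
    if not words:
        return {"first_person": 0.0, "effusive": 0.0}
    n = len(words)
    return {
        "first_person": sum(w in FIRST_PERSON for w in words) / n,
        "effusive": sum(w in EFFUSIVE_WORDS for w in words) / n,
    }
```

Running cue_rates on each of the two sample reviews above gives a rough side-by-side comparison, though a real detector would weigh many more signals than these two.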


Of course, it is easy to train people to avoid these detection cues based on the findings of the paper. That is probably what will happen, and just like the war between spam and spam filters, review spammers are going to get better and better at beating the system.

[via NYTimes, Cnet]