Andreas Weigend | Social Data Revolution | Fall 2016
School of Information | University of California at Berkeley | INFO 290A

Contributors: Daniela Vargas, Diya Sabhanaz Rashid, Elise Nguyen, Jonathan Li

Video: Part 1 Part 2
Transcript:
Audio: sdr2016topic12.mp3

Outline

0. Intro
1. Business locations
2. Summarize reviews
3. Ads
4. Spam and Quarantine
5. Search
6. Small group discussion

Introduction

In the last session, we spoke about the power and importance of education and wondered:
  • How to assess education? How to measure its quality?
  • What should the unit of analysis be? What should we measure? We can think of anything from number of students to grades, hours, pages read, days attended, etc.

From this we learn that, at the end of the day, what really matters is asking good questions. This week's session focused on a major question: how do we overcome one of the economy's classic market failures and get customers as close to having perfect information as possible?

Yelp is looking into solving exactly that question: how can I best choose where to get dinner?


Yelp employees came to the class and discussed how the company goes about organizing these massive amounts of data, specifically when it comes to (1) correctly pinning down business locations, (2) summarizing customer reviews, (3) experimenting with ads, (4) avoiding spam, and (5) improving search.

All of us have likely used Yelp before. But we might not know its story, so here's a little useful introduction, courtesy of Wikipedia:


"Yelp is an American multinational corporation headquartered in San Francisco, California. It develops, hosts and markets Yelp.com and the Yelp mobile app, which publish crowd-sourced reviews about local businesses.
Yelp was founded in 2004 by former PayPal employees, Russel Simmons and Jeremy Stoppelman. Yelp grew quickly and raised several rounds of funding. By 2010 it had $30 million in revenues and the website had published more than 4.5 million crowd-sourced reviews. From 2009–2012, Yelp expanded throughout Europe and Asia. In March 2012, it became a public company and became profitable for the first time two years later.
As of 2016, Yelp.com has 135 million monthly visitors and 95 million reviews. The company's revenues come from businesses advertising."




1. Business Locations: "Detecting wrong business locations with machine learning" by Andrew Danks

At the core of Yelp's business is the idea of connecting people with great local businesses. As a result, the search algorithm and the richness of the content are instrumental to the Yelp team. Yelp sources its data from (a) business owners; (b) data partners, such as governments; and (c) users, when they check in or enter information about a business. When users look up a business on Yelp, they see several attributes, such as its name, contact number, whether it accepts pickup or delivery, and its address.


Some of the addresses entered can be incorrect as a result of wrong geocoding. This can happen for several reasons. For example, geocoding in Japan is challenging because addresses in Tokyo are written with different characters. Sometimes the addresses are inside "containers", such as multiple businesses inside an airport. Incorrect geocoding leads to incorrect information being shown to Yelp's users, and it can hurt users' trust in Yelp.


There are several ways Yelp uses the principles of machine learning and "hacking its own data" to correct business locations.

  1. Using photos: photos often have a location embedded as metadata, stored in EXIF data. Photo locations are, however, susceptible to noise.

  2. Feature engineering: feature engineering quantifies what a good map looks like using the distance between the listed business location and the positions of its users: the distance to the centroid of user activity will be larger for a wrong location than for a correct one. Roughly 75% of photos are taken within 25 meters of the business. Yelp also uses clustering (DBSCAN) to separate signal from noise. Training a model to decide whether a location is good or bad requires a set of locations previously labeled as good or bad (roughly 20k samples), as well as a decision threshold (see the sketch after this list).
The first iteration of the decision model is a single model applied across all locations. However, because addresses vary by region, the data architecture can be adjusted accordingly. Since a precision error carries a lower risk, the decision threshold is tuned for recall rather than precision, while keeping user satisfaction at the same level.
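A minimal sketch of this feature-engineering idea, assuming we already have the photo coordinates: cluster them with DBSCAN, take the centroid of the densest cluster, and flag the listing if it sits farther than a threshold from that centroid. The coordinates, the 25-meter threshold, and the function names below are illustrative assumptions, not Yelp's actual pipeline.

    import numpy as np
    from sklearn.cluster import DBSCAN

    EARTH_RADIUS_M = 6_371_000

    def haversine_m(lat1, lon1, lat2, lon2):
        """Great-circle distance in meters between two (lat, lon) points."""
        lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
        a = (np.sin((lat2 - lat1) / 2) ** 2
             + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
        return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

    def location_looks_wrong(listed_latlon, photo_latlons, threshold_m=25.0):
        """Flag a listing whose distance to the densest photo cluster exceeds threshold_m."""
        coords = np.radians(np.array(photo_latlons))
        # With the haversine metric, eps is in radians; this is roughly 50 m on Earth.
        labels = DBSCAN(eps=50 / EARTH_RADIUS_M, min_samples=3,
                        metric="haversine").fit_predict(coords)
        core = labels[labels != -1]
        if core.size == 0:
            return False  # all photos look like noise; not enough signal to judge
        densest = np.bincount(core).argmax()
        centroid = np.degrees(coords[labels == densest].mean(axis=0))  # rough centroid
        dist = haversine_m(listed_latlon[0], listed_latlon[1], centroid[0], centroid[1])
        return dist > threshold_m

    # Hypothetical example: the photos cluster about a kilometer from the listed point.
    photos = [(37.7750, -122.4194), (37.7751, -122.4195),
              (37.7749, -122.4193), (37.7752, -122.4196)]
    print(location_looks_wrong((37.7850, -122.4094), photos))  # True -> queue for moderation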


The data architecture of business locations at Yelp has the following loop:

  1. User checks in; data goes to Yelp Main
  2. Yelp Main sends data to human moderators to check the quality of the data
  3. Human moderators use survey questions to authenticate the data
  4. User feedback is queued for moderation
  5. All changes in business attributes are published to Kafka
  6. Quality of business data is restored
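To make step 5 concrete, here is a minimal sketch of publishing a business-attribute change event to a Kafka topic with the kafka-python client. The broker address, topic name, and event schema are assumptions for illustration; the talk only tells us that attribute changes flow through Kafka.

    import json
    from kafka import KafkaProducer  # pip install kafka-python

    # Hypothetical broker and topic; the real names are not given in the talk.
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda event: json.dumps(event).encode("utf-8"),
    )

    # A made-up attribute-change event emitted after moderation approves a fix.
    event = {
        "business_id": "example-biz-123",
        "attribute": "location",
        "old_value": {"lat": 37.7850, "lon": -122.4094},
        "new_value": {"lat": 37.7750, "lon": -122.4194},
        "source": "human_moderation",
    }

    producer.send("business_attribute_changes", value=event)
    producer.flush()  # block until the message is actually delivered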

2. "Summarizing Millions of Reviews" by Nicholas Erhardt

For many data refineries, a large reach and knowledge base is what makes the product attractive and useful for the user. In the midst of growth, how do these data refineries deal with having too much information to share with their user base? Yelp deals with this challenge in two ways: empowering users with better search capabilities and summarizing its millions of reviews and photographs for its users. These solutions work in conjunction to provide the user with meaningful information faster.

Points to consider from Yelp’s Perspective
  • For Search
    • Provide users the capability to adjust display settings (i.e. how information is sorted and presented)
    • Allow users to more precisely pinpoint the information they want out of reviews
  • For Photos
    • Selecting better images, whether that entails quality or the emotions they evoke, to represent a business for the user
    • Using proxy information, such as that found in a photograph’s metadata, to qualify a photograph and more quickly identify “good” photographs

Yelp’s Solution
  • Text Understanding
    • Process Yelp review text and photograph metadata to extract the most important bits of information
    • Use the extracted information to quantify the relevance of a review or photograph for the provided query
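One simple way to picture this kind of text understanding is to score reviews against a query with TF-IDF and cosine similarity. The snippet below is a generic sketch with made-up reviews and a made-up query, not Yelp's actual relevance model.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Hypothetical review snippets for one business.
    reviews = [
        "Great garlic noodles and fast service.",
        "Waited 45 minutes, but the roast duck was worth it.",
        "Cash only, and parking is tough on weekends.",
    ]
    query = "garlic noodles"

    vectorizer = TfidfVectorizer(stop_words="english")
    review_vectors = vectorizer.fit_transform(reviews)
    query_vector = vectorizer.transform([query])

    # Rank reviews by how well they match the query.
    scores = cosine_similarity(query_vector, review_vectors).ravel()
    for score, text in sorted(zip(scores, reviews), reverse=True):
        print(f"{score:.2f}  {text}")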

3. Ads: "A Product Manager’s Guide to experiments" by Bud Peters

Over 90% of Yelp's revenue comes from businesses advertising to users within the app. The product manager's job is to optimize the ads so that the customer (i.e., the business) gets the most impact. How do ads best fit into Yelp's platform?

  • What is A/B testing? It's comparing two versions of a web page to see which one performs better. You compare two web pages by showing the two variants (let's call them A and B) to similar visitors at the same time. The one with the better conversion rate wins! (e.g., would Yelp look better in red or purple?) A small sketch of comparing two variants follows the figure below.
[Figure: A/B testing illustration]
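A minimal sketch of judging such a test, assuming all we have is click and impression counts for each variant: a two-proportion z-test on the conversion rates. The counts below are made up for illustration.

    import numpy as np
    from scipy.stats import norm

    # Hypothetical results: (clicks, impressions) for variants A and B.
    clicks_a, n_a = 310, 12_000
    clicks_b, n_b = 365, 12_100

    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

    z = (p_b - p_a) / se
    p_value = 2 * norm.sf(abs(z))  # two-sided test

    print(f"CTR A = {p_a:.3%}, CTR B = {p_b:.3%}, z = {z:.2f}, p = {p_value:.3f}")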

  • Yelp has run extensive A/B testing to increase the impact of its ads. Here's how that's done:
    • How to increase advertiser's ad impact?
      • Advertisers will pay Yelp per "click" (Click-Through-Rate (CTR)), or per "lead" (Lead-Through Rate (LTR))
      • In order to increase its ad revenue, Yelp needs to increase ads' CTR or LTR
        • With CTR: users need to click on one of the clickable items in the ad
        • With LTR: unlike CTR, LTR entails a person performing an offline action, be it making a reservation, ordering pickup or delivery, making a phone call (the phone number shown for businesses on Yelp is a tracking number, so that Yelp can count the calls coming from the site), or checking in at the place
      • Yelp ads typically have three clickable parts: the business's name, a photo, and the “read more…” link at the end of the ad. A/B testing is carried out to determine the most attractive mix within the ad.
      • When several options exist (e.g., how to choose which review abstract to show out of all the existing reviews for a place?), it is easier to create an algorithm and frame the choice as a "multi-armed bandit" problem
        • The multi-armed bandit problem: there is always a trade-off between exploration of new possibilities and exploitation of what you already know. How can Yelp best determine the top three review abstracts to test? Program a multi-armed bandit algorithm (see the sketch after this list)!
        • Fun fact: very clickable words include “Ok”, “boyfriend”, “I don’t normally write reviews”, “awful”, and “taxes”, whereas very unclickable words include “attempts”, “blowout”, and “Berkeley”
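As a toy illustration of the exploration/exploitation trade-off, here is an epsilon-greedy bandit that picks which review abstract to show and updates its click-rate estimates from simulated feedback. The number of arms, the epsilon value, and the click probabilities are invented for the example.

    import random

    class EpsilonGreedyBandit:
        """Pick one of several review abstracts, mostly exploiting the best so far."""

        def __init__(self, n_arms, epsilon=0.1):
            self.epsilon = epsilon
            self.clicks = [0] * n_arms
            self.shows = [0] * n_arms

        def choose(self):
            if random.random() < self.epsilon:            # explore
                return random.randrange(len(self.shows))
            rates = [c / s if s else 0.0 for c, s in zip(self.clicks, self.shows)]
            return max(range(len(rates)), key=rates.__getitem__)  # exploit

        def update(self, arm, clicked):
            self.shows[arm] += 1
            self.clicks[arm] += int(clicked)

    # Simulated true click-through rates for three candidate abstracts.
    true_ctr = [0.02, 0.05, 0.03]
    bandit = EpsilonGreedyBandit(n_arms=3)

    for _ in range(10_000):
        arm = bandit.choose()
        bandit.update(arm, clicked=random.random() < true_ctr[arm])

    print("impressions per abstract:", bandit.shows)  # most traffic should go to arm 1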

4. Spam and Quarantine by Jim Blomo

A key challenge in spam prevention is that you have to deal with spammers who learn and adapt to changes in the system. In order to prevent spam, we need to build an understanding of what spam is and how it masquerades as legitimate content.

There are two main types of spam:
  • Deceptive spam: deceptive spam is designed to deceive users into a particular belief. Fake reviews are an example of deceptive spam.
  • Disruptive spam: disruptive spam is usually out of context and sent in high volume. It tends to disrupt the user experience.

Several mediums exist for spam: private messages between users, mass following, friend requests (e.g. adding every female user), and message boards.

There are two methods of dealing with spam:
  • Anti-spam: blocking every spam message.
  • Quarantine: this method blocks the spam but does not let the spammer know that the message has been blocked. The idea is that if the spammer knew, he would eventually adapt to the quarantine system. For example, when a spammer sends a message on Yelp, the system can show the spammer that the message has been sent when in reality it never got delivered. Another example is non-recommended reviews (see the figure below). In this case, the spammer may not be aware that their review has been blocked or marked "not recommended".

[Figure: reviews filtered into Yelp's "not recommended" list]
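A minimal sketch of the quarantine idea, assuming a hypothetical in-memory message store: the sender always gets a "sent" confirmation, but messages from flagged accounts are never surfaced to the recipient.

    from dataclasses import dataclass, field

    @dataclass
    class Message:
        sender: str
        recipient: str
        body: str
        quarantined: bool = False

    @dataclass
    class MessageStore:
        flagged_senders: set = field(default_factory=set)
        messages: list = field(default_factory=list)

        def send(self, sender, recipient, body):
            """Always report success; silently quarantine messages from flagged senders."""
            msg = Message(sender, recipient, body, quarantined=sender in self.flagged_senders)
            self.messages.append(msg)
            return {"status": "sent"}  # the spammer sees no difference

        def inbox(self, user):
            """Recipients only ever see non-quarantined messages."""
            return [m for m in self.messages if m.recipient == user and not m.quarantined]

    store = MessageStore(flagged_senders={"spammer42"})
    store.send("spammer42", "alice", "BUY FOLLOWERS NOW")
    store.send("bob", "alice", "Dinner at 7?")
    print([m.body for m in store.inbox("alice")])  # ['Dinner at 7?']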

Do these methods work? Oftentimes, as we try to block spam, the spammer simply has an incentive to send more spam to overwhelm the system.

Building a spam detection model requires a representative sample of spam and thinking about whether you can reuse labels across models. This is obviously a challenge when the review data is unlabeled. Furthermore, it isn't always clear that a particular review is spam and should be relegated to the "not recommended" list. Yelp has built a recommendation system to identify the reviews that are most useful to users, which has helped create trust between Yelp and its users. Solving this problem creates a positive feedback loop where good content inspires even more good content creation.

To get started on the labeling issue, outlier detection is a critical first step. For example, we were shown a review that used punctuation to draw the shape of a burger. To identify legitimate reviews, we can rely on feature engineering and base decisions on typical user behavior. For example, normal and new users tend to have a bias toward positive reviews, and an abnormally high review rate might suggest that the user is not human (a toy sketch follows below).
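As a toy version of that kind of behavioral feature, the sketch below flags users whose reviews-per-day rate is an extreme outlier relative to the rest of the population. The rates and the z-score cutoff are assumptions for illustration, not Yelp's rules.

    import numpy as np

    # Hypothetical reviews-per-day rates for a set of users; most write very little.
    rates = np.array([0.01, 0.02, 0.05, 0.03, 0.01, 0.04, 0.02, 12.0, 0.03, 0.02])
    user_ids = [f"user_{i}" for i in range(len(rates))]

    # Use median and MAD, which are robust to the very outliers we want to catch.
    median = np.median(rates)
    mad = np.median(np.abs(rates - median)) or 1e-9
    robust_z = 0.6745 * (rates - median) / mad

    CUTOFF = 6.0  # arbitrary threshold for "not plausibly human"
    for user, rate, z in zip(user_ids, rates, robust_z):
        if z > CUTOFF:
            print(f"{user}: {rate} reviews/day (robust z = {z:.1f}) -> send to review")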

Other ways to build trust on the Yelp platform include the elite program and the ability to share favorite businesses among users.

5. Search by John Hawksley

Solving the search problem for Yelp requires understanding two questions:
  • What does the user most likely want?
  • Which businesses have these things?
It is not easy to measure the outcome of search results, particularly in terms of how satisfied the user was. We can look at several criteria to rank search results using mathematical models (a toy scoring sketch follows this list):
  • Relevance
  • Quality
  • Distance
  • Content
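To make the idea of combining these criteria concrete, here is a toy linear scoring function over hand-made features. The weights, feature definitions, and business names are invented for illustration and are not Yelp's ranking model.

    from dataclasses import dataclass

    @dataclass
    class Candidate:
        name: str
        relevance: float    # how well the business matches the query text (0..1)
        quality: float      # e.g. normalized star rating (0..1)
        distance_km: float  # distance from the searcher
        content: float      # richness of reviews/photos (0..1)

    # Invented weights; in practice these would be learned from user behavior.
    WEIGHTS = {"relevance": 0.5, "quality": 0.25, "distance": 0.15, "content": 0.10}

    def score(c: Candidate) -> float:
        nearness = 1.0 / (1.0 + c.distance_km)  # closer businesses score higher
        return (WEIGHTS["relevance"] * c.relevance
                + WEIGHTS["quality"] * c.quality
                + WEIGHTS["distance"] * nearness
                + WEIGHTS["content"] * c.content)

    candidates = [
        Candidate("Noodle House", relevance=0.9, quality=0.8, distance_km=0.4, content=0.7),
        Candidate("Far Away Ramen", relevance=0.95, quality=0.9, distance_km=8.0, content=0.9),
    ]
    for c in sorted(candidates, key=score, reverse=True):
        print(f"{score(c):.3f}  {c.name}")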

6. Small Group Discussion


The series of Yelp tech talks concluded with small group discussions centered on new ideas and features that could be implemented in Yelp. Students came up with a plethora of hypothetical new additions, including ways to increase personalization and to handle subjective attributes. The discussion on personalization centered not on whether personalization was good or bad but rather on how it should be implemented for the user. One example discussed was the use of location data to push relevant, nearby points of interest to the user. The discussion on subjective attributes started with what constitutes a subjective attribute and who gets a say in it. It then delved deeper into how to generalize a subjective attribute for a particular business, how to weight user input on subjective attributes, and how to present the subjective attribute to the user.

Beyond brainstorming new features and their implementations, we discussed the importance of proper experimental methodology, such as eliminating bias and using the proper metrics, to test the effects of these new features. To facilitate a conversation on bias, we discussed the shortfalls of testing whether smiling faces positively correlate with increased business and engagement. We reached a consensus that multiple confounding variables (e.g., gender) would make it difficult to accurately attribute any differences in business and engagement to a smile. The metrics by which experimental groups are compared are also extremely important, and this was discussed in the context of enforced app installation. App installation count is a popular and simple metric; however, it would be illogical to use such a metric when testing whether enforcing app installation before showing content affects product usage.

All in all, this was a great exercise in thinking about data usage and data experimentation.