Class 1

Andreas Weigend | Social Data Revolution | Fall 2016
School of Information | University of California at Berkeley | INFO 290A

Main Contributors:
Kyle Rentschler, Rohan Phadte, Noah Jacobs

Video: Social Data Revolution: Topic 1
Transcript: sdr2016topic01.docx
Audio: sdr2016topic01.mp3

Topic 1: Social Data Revolution
What is social data? What does the term “Social Data Revolution” mean?

Let’s start with an average morning routine. What is the first thing we do in the morning? Generally, we check our phones for the emails, notifications, and messages we received overnight. As we walk to the bathroom, the GPS on our phones tracks our location to within a few feet, and the operating system can sense our movement as the strength of the Wi-Fi connection changes. In the kitchen, we put bread in the toaster or turn on the espresso machine, and the smart meter on our home immediately registers the spike in energy demand and knows that we are awake.

When we open the garage door to leave, the electricity provider registers the electrical signal and knows that we are leaving the house. The GPS signal from our phones confirms this. On the way to work, cameras at intersections take thousands of pictures of passing license plates, and private companies and governments use this data to identify us. When we park, parking meters track the presence of our car and alert a meter maid if we exceed the paid time. This is the social data we give.

The social data revolution pushes the boundary of what we share and what we as a society are comfortable sharing. We deal with two types of social data: explicit and implicit. Explicit data is data we leave on purpose, such as reviews on Amazon or posts on Facebook. Implicit data is data we do not deliberately leave, such as our Google Maps usage history or photos and videos taken by traffic cameras.

Implicit data is often a byproduct of social services. For example, we give access to our geolocation so we can get directions to a new restaurant or to a friend’s house. This gives rise to the “give to get” notion: we need to share some data in order to receive a service we want. This works in situations where the incentives of the consumer (us) and the service provider (such as Google) are aligned. In this case, Google provides us a service, and we give it access to our location. That implicit data is then used for other services, such as targeted ads based on recently visited locations.

It’s this “aligned-incentive” symbiosis that raises the question: who owns the data? One might think the data is owned by you, or by the service provider. In reality, the answer is not clear. The data is co-owned by the consumer and the provider, and if more parties are involved, ownership can be shared among all of them.

What’s the value of data?

One way to answer this question is by asking another: what would your decisions look like without that data? Let’s say you uploaded a picture of your dog to Facebook. Your dog is cute, and she means a lot to you, so the picture carries a lot of sentimental value. Friends who see your post would probably agree that your dog is adorable, but they would not attach the same sentimental value to her, and would most likely keep scrolling through their news feeds. Because the picture does not change anyone’s decisions, we consider this data (such as the pictures of your dog) not very valuable. On the other hand, suppose you write a critical Amazon review of a keyboard, claiming that many of its keys don’t work. Many customers read that review and, as a result, don’t buy the keyboard. We can then say that this data is valuable, because it influenced the decisions of many other people.

Data to track:

A growing number of cameras are being deployed across the United States to photograph license plates. The photos can be used to track the whereabouts of drivers, find missing or stolen vehicles, and catch fugitives. While these cameras seem beneficial, the databases that store the data are operated by private firms, which raises many privacy concerns, and some of the data-collection practices were even deemed illegal in most of California. One concern, for instance, is partnerships between the police (through on-road cameras) and private data firms that could sell access to the data to insurance companies and repossession firms. An insurance company might then raise rates for certain drivers based on driving patterns picked up by license plate cameras. Others warn that tracking cars over time can reveal where people live, work, and worship, and whom they associate with. Even a subtle break from the pattern could hint at breakups, affairs, or other activities too private or sensitive to share. As this data is sold to and between companies, the privacy risks only grow.

One of the most important types of data: Geolocation

As discussed above, geolocation is one of the most important, powerful, and perhaps dangerous types of data in wide use today. Location services are often on by default, and we may not realize how much data they actually collect. If you have a mobile device running iOS, here is a quick exercise to see how much of your geolocation data is being collected. Open Settings and tap Privacy. Then tap Location Services at the top, which will probably say “On”. From there, scroll to the bottom of the menu and tap System Services. Finally, tap Frequent Locations, the last option available. You will see a screen that looks something like this:
[Screenshot: the Frequent Locations list]

It lists the cities you commonly visit over some recent period (usually about the past month). Now, select one of the locations to see more details. You will see something like this:

[Screenshot: frequently visited places within the selected location]

Now you see much more information, such as the specific places you frequently visit: classrooms, stores, your house. Clearly your phone knows a lot about you. Select one of the places to see even more data. I’m going to choose the RSF, the campus gym (for some reason listed as “Hearst Ave” on my phone), which, according to my phone, I have visited 30 times since mid-July. Tapping into the details for that location, I get:

[Screenshot: visit history for the “Hearst Ave” location]

Not only does my phone know how often I visited a given place, it even knows when I was there, to the nearest 15 minutes.
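To make the “nearest 15 minutes” idea concrete, here is a minimal Python sketch of how raw visit timestamps could be rounded to quarter-hour marks and tallied per place. Apple does not document how Frequent Locations is computed, so the data layout, the `round_to_quarter_hour` and `summarize_visits` helpers, and the sample visits are all hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Hypothetical raw visit log of (place label, arrival time) pairs.
# Illustrative sample data, not real Frequent Locations output.
VISITS = [
    ("Hearst Ave", datetime(2016, 9, 1, 7, 52)),
    ("Hearst Ave", datetime(2016, 9, 3, 18, 8)),
    ("Home", datetime(2016, 9, 3, 22, 31)),
]

QUARTER = 15 * 60  # seconds in a 15-minute slot


def round_to_quarter_hour(t: datetime) -> datetime:
    """Round a timestamp to the nearest 15-minute mark (exact ties round up)."""
    base = t.replace(minute=0, second=0, microsecond=0)
    seconds_past_hour = t.minute * 60 + t.second
    slots = (seconds_past_hour + QUARTER // 2) // QUARTER
    # timedelta addition handles rollover into the next hour or day.
    return base + timedelta(seconds=slots * QUARTER)


def summarize_visits(visits):
    """Count visits per place and collect their rounded arrival times."""
    counts = Counter(place for place, _ in visits)
    times = defaultdict(list)
    for place, t in visits:
        times[place].append(round_to_quarter_hour(t))
    return counts, dict(times)


if __name__ == "__main__":
    counts, times = summarize_visits(VISITS)
    for place, n in counts.most_common():
        stamps = ", ".join(t.strftime("%b %d %H:%M") for t in times[place])
        print(f"{place}: {n} visit(s) at {stamps}")
```

Even this toy version shows the privacy point: a handful of timestamped pings, once grouped by place and time of day, already starts to sketch a daily routine.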

This is just the tip of the iceberg of the data kept about you by your mobile phone, your carrier, and your various accounts. Although this data is kept in my iCloud account, even more comprehensive logs of your location are kept if you enable location history for your Google account. The applications are nearly endless. Let’s go back to the example of my frequent gym visits. Assuming my Google account has kept similar logs, Google can use this data to decide which advertisements to show me while I browse the web, as well as infer many other things about me that I wouldn’t otherwise disclose. Beyond simply observing trends in a user’s location, Google can track changes in his or her daily routine and learn even more about the user.

The Right to be Forgotten:

Although few laws mention the term “right to be forgotten” explicitly at this time, the right can be grounded in existing principles: the protection of personal data, the right to privacy, and freedom of the press. The right to be forgotten arises from the desire of individuals to “determine the development of their life in an autonomous way, without being perpetually or periodically stigmatized as a consequence of a specific action performed in the past,” especially when a record of that action can keep resurfacing on the internet.

Big Question: Individual liberty vs Common interests

“It will be very hard for people to watch or consume something that has not in some sense been tailored for them.” - Eric Schmidt, Google

If you take all of the personalization filters and algorithms that online services apply and put them together, you get a filter bubble: your own personal, unique universe of information that you live in online. What’s in this bubble depends on who you are and what you do, but you don’t decide what gets in. More importantly, you don’t see what gets edited out.

In a broadcast TV society, there are “gatekeepers”, the editors, who control the flow of information. The internet initially seemed to sweep these gatekeepers away and let information flow organically, but that is no longer what is happening. Instead, the torch is being passed: in place of human gatekeepers on TV, we now have algorithmic ones filtering our information online. However, these algorithms lack the embedded ethics that editors once had. So if algorithms are going to curate the world for us and decide what we do and don’t get to see, we need to make sure they are not keyed only to relevance (however much money that seems to generate for online companies), but also show us what is important, uncomfortable, or challenging, including different points of view.
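The argument above contrasts feeds keyed purely to relevance with feeds that also surface other viewpoints. The toy Python sketch below makes that contrast concrete. The `Item` fields, the relevance scores, the single-letter viewpoint labels, and the greedy “show each viewpoint at least once, as early as possible” rule are all illustrative assumptions, not how any real company ranks its feed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    title: str
    relevance: float  # hypothetical predicted-engagement score
    viewpoint: str    # hypothetical label for the item's perspective


def rank_by_relevance(items):
    """Pure relevance ranking: most engaging items first."""
    return sorted(items, key=lambda it: it.relevance, reverse=True)


def rank_with_viewpoints(items):
    """Greedy re-rank: at each slot, take the most relevant item from a
    viewpoint not yet shown; once no remaining item has an unseen
    viewpoint, fall back to plain relevance order."""
    remaining = rank_by_relevance(items)
    seen = set()
    ranked = []
    while remaining:
        pick = next((it for it in remaining if it.viewpoint not in seen), remaining[0])
        ranked.append(pick)
        seen.add(pick.viewpoint)
        remaining.remove(pick)
    return ranked


# Hypothetical feed: three stories from viewpoint "A", one from "B".
FEED = [
    Item("Story A1", 0.9, "A"),
    Item("Story A2", 0.8, "A"),
    Item("Story A3", 0.7, "A"),
    Item("Story B1", 0.3, "B"),
]

if __name__ == "__main__":
    print([it.title for it in rank_by_relevance(FEED)])
    print([it.title for it in rank_with_viewpoints(FEED)])
```

With relevance alone, the only viewpoint-B story lands last; the re-rank moves it up to the second slot while otherwise keeping relevance order. A real system would need far subtler notions of “viewpoint” and “importance” than a single label.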