Class 3

Andreas Weigend | Social Data Revolution | Fall 2016
School of Information | University of California at Berkeley | INFO 290A

Video: Social Data Revolution: Topic 6
Transcript: sdr2016topic06.docx
Audio: sdr2016topic06.mp3

Wikilead: Michael Theodorides
Contributor: Caryn Tran

Topic 6: Data Literacy

Guest Speaker:
Teri Elniski, Senior Advisor, U.S. Department of Commerce & Former Marketing Executive

Additional Guest:
Gam Dias, Mo-Data, Data Strategist & Product Management VP

"Data Literacy is the ability to read, create, and communicate data as information" - Wikipedia

Topic 6: Data Literacy featured a guest visit from Teri Elniski of the Department of Commerce, who discussed Data Literacy. The topic included an in-class activity on the best uses of Department of Commerce data and concluded with a class discussion of the activity and a Q&A with Teri Elniski.

Data Literacy
What is Data Literacy? According to Wikipedia, Data Literacy is the ability to read, create, and communicate data as information (though formal definitions vary). According to Teri Elniski, Data Literacy is bridging data and the world. Data storage began as a means of saving data for compliance and reference; today data is stored and brought to "life" with data refineries and data processing algorithms. Data has become the new oil, the new gold. Data Literacy is understanding how to work with data and how to make decisions with it. Data runs everything we do, and the ability to use data is becoming analogous to the ability to read a book. Data literacy also presupposes access to data, so access has to be considered part of the equation.

Is computer science/programming necessary for data literacy?

Just as we gradually stop describing commonplace skills and knowledge as a form of literacy, one can postulate that the notion of data literacy will disappear over time. That raises a big question: what is actually necessary for data literacy? According to the Bureau of Labor Statistics Employment Projections, there are more than 500,000 open computing jobs in the US. No doubt more and more jobs are moving into the computing field, but is it necessary to learn these tools in order to get by in our data-driven world? Perhaps so, if one wants to shape the landscape of data rather than being just a data point.

The Importance of Data Equity

It is important for data to be easily accessible and usable. Hand in hand with data literacy, data equity empowers individuals and smaller entities. Big companies such as Google and Amazon have had the leading advantage in collecting and storing data, and access to that data has given those companies better opportunities for technological and economic advancement. Efforts to increase access to data have been taken up by many, with the government as a major contributor.

The Power of Open Data

The value placed on data varies from entity to entity, but one can imagine many applications. To understand the possibilities, see the following examples of data sets that are available for public use:

Case Study: The Department of Commerce
Data Literacy and the Department of Commerce. The Department of Commerce and Obama are on a quest to bridge the gap between Washington and Silicon Valley. Once solely responsible for creating jobs and ensuring economic prosperity, the Department of Commerce has become "America's Data Agency". The department collects a vast array of data, including census data (2010 Census Form), Patent and Trademark Office data, and even weather data. The government was once the biggest data organization in the country, and Washington understands that there is a need for more equality in the distribution of data. Small companies need to have America's data available, so the department's data needs to be modernized and made more accessible to enable the building of products and APIs. Today the department's APIs are difficult to find and use, and are mostly usable only by trained data scientists.

2010 Census Questions:
(As referenced by students during the in-class exercise)

  1. How many people were living or staying in this house, apartment, or mobile home on April 1, 2010?
  2. Were there any additional people staying here April 1, 2010 that you did not include in Question 1?
  3. Is this house, apartment, or mobile home: owned with mortgage, owned without mortgage, rented, or occupied without rent?
  4. What is your telephone number?
  5. Please provide information for each person living here. Start with a person here who owns or rents this house, apartment, or mobile home. If the owner or renter lives somewhere else, start with any adult living here. This will be Person 1. What is Person 1's name?
  6. What is Person 1's sex?
  7. What is Person 1's age and date of birth?
  8. Is Person 1 of Hispanic, Latino, or Spanish origin?
  9. What is Person 1's race?
  10. Does Person 1 sometimes live or stay somewhere else?

In-Class Exercise: Brainstorm Applications of Census Data
Students formed groups of eight to brainstorm and discuss the following questions:
If you had all the data the Department of Commerce has, what would you do with it? How could you leverage this data to build a company or product? How can we learn more about America?

Here is a summary of ideas suggested by student groups:
  • Matching a person's activity on the Internet to their census data, and using this to recommend products more effectively (this is especially useful for small retailers that don't have Amazon's data advantages)
  • Predicting demand for affordable housing and transportation by tracking demographics and differences among people
  • Using census data to make Strategic Infrastructure Plans (example: building more schools in a community)
  • Customer Segmentation for Advertising
  • Applications targeting specific demographics; geographical division of income for sellers to target
  • Cross-referencing census data with data from larger corporations such as Google, Facebook, Amazon, and Zillow to predict and measure inaccuracy in census data
  • Combining census data such as income, race, etc. with felony data to predict areas and communities of violence
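
One of the ideas above, dividing geography by income so that sellers can target segments, could be sketched roughly as follows. This is a minimal Python illustration and was not presented in class: the tract labels, income figures, and cutoffs are made-up placeholders rather than real census values, and `segment_by_income` is a hypothetical helper name.

```python
# Hypothetical, made-up records for illustration only; not real census figures.
tracts = [
    {"tract": "A", "median_income": 32000},
    {"tract": "B", "median_income": 58000},
    {"tract": "C", "median_income": 91000},
    {"tract": "D", "median_income": 47000},
]


def segment_by_income(records, low_cutoff, high_cutoff):
    """Group tract labels into low / middle / high income segments.

    The cutoffs are arbitrary assumptions chosen by the caller.
    """
    segments = {"low": [], "middle": [], "high": []}
    for record in records:
        income = record["median_income"]
        if income < low_cutoff:
            segments["low"].append(record["tract"])
        elif income < high_cutoff:
            segments["middle"].append(record["tract"])
        else:
            segments["high"].append(record["tract"])
    return segments


segments = segment_by_income(tracts, low_cutoff=40000, high_cutoff=75000)
print(segments)
```

A seller could then tailor offers per segment. With real data, the cutoffs and the geographic unit would need to be chosen with care.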

"The notion of Data Literacy goes away as you become more data literate; the notion of Big Data is going away as we gain data" - Andreas Weigend

Andreas Weigend presented the hypothesis that as we gain more data, the notion of data literacy will disappear, much as the "literacy" of using electricity in a house did (using electricity is not something listed on resumes today, while it was resume-worthy 100 years ago). Data Science and Computer Programming may become a ubiquitous skill, just as the ability to use an Excel spreadsheet is today. The government must market its data sets and make them easier for the average person to use. The government can't and won't change on a dime, but eventually it will catch up to the Data Revolution of Silicon Valley. Today large corporations have infrastructure and data that give them an unfair advantage over startups. The government and society must make more data available to even the playing field. The skills necessary to derive meaningful information from data should be taught from grade school to the university level.

Interesting References:
  1. Department of Commerce as America's Data Agency