
Quick Look: Big Data or Big Daddy?


The other day a friend of mine asked me if I would set up her thermostat. This seemed like a strange request until she told me that her new home came with a web-enabled Honeywell Lyric thermostat. Once I had a chance to look at the Lyric thermostat a light went on: Ahhh … the Lyric is competing with the Nest thermostat. The Nest came out a few years back and, like the Lyric, it is web-enabled. But the Nest made headlines because Google bought Nest (for a reported 3.2 billion dollars). Why would Google buy a web-enabled thermostat? Two words: Big Data.

As a part of setting up my friend’s Lyric thermostat, I had to go through several forms of connectivity: first a Bluetooth connection, then a WiFi connection, and then a connection to the Lyric servers at Honeywell. The point to keep in mind here is that many web-enabled devices send information back to data servers. As an example, when you ask Apple’s Siri a question, that request is sent to one of Apple’s server farms to be processed. That request is time-stamped and stored (often with location information). Now imagine millions of people asking Siri questions each day and you can begin to get a sense of what is known as Big Data. The new economy is a Big Data economy. But let’s go back to web-enabled thermostats for a moment.

When the Nest thermostat first came out there was concern that the data being sent back to Nest servers could be used for nefarious purposes. Because the Nest thermostat continuously sends data back to Nest servers, it would be very easy for Nest technicians to know when you were home and when you were away. If this data were to be hacked, it would be like hanging a sign on your front door that flashed “Not Home … Rip Me Off.”

OK, OK … I know what you’re thinking: Web-enabled thermostats are not that big a deal. Let’s look at another example that has been used often in the news as a way of warning against Big Data. A few years back the retail giant Target sent mailers to a young woman promoting baby goods: diapers, infant formula, cribs, etc. This woman’s parents saw these mailers and confronted their daughter: “Are you pregnant?!” The young woman was indeed pregnant. Turns out that Target runs correlation algorithms on the Big Data it collects from each transaction. Through these correlations Target is able to tease out sensitive information like pregnancy status. Martin Ford, writing in his 2015 book Rise of the Robots, tells us that a “data scientist working for [Target] found a complex set of correlations involving the purchase of about twenty-five different health and cosmetic products that were a powerful early predictor of pregnancy.”
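To make the idea of a correlation algorithm a bit more concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the product names, the eight shoppers, and the `pearson` helper are assumptions, not Target's actual method or data. It simply scores how strongly each purchase lines up with a known outcome.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy data: one entry per shopper, 1 = bought it, 0 = did not.
purchases = {
    "unscented_lotion": [1, 1, 1, 0, 0, 0, 1, 0],
    "vitamins":         [1, 1, 0, 0, 0, 1, 1, 0],
    "batteries":        [0, 1, 0, 1, 1, 0, 1, 0],
}
# Hypothetical known outcome for the same eight shoppers.
pregnant = [1, 1, 1, 0, 0, 0, 0, 0]

# Score each product by how strongly buying it tracks the outcome.
scores = {name: pearson(bought, pregnant) for name, bought in purchases.items()}

for name, r in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: r = {r:.2f}")
```

In this toy data the lotion purchases track the outcome most closely and the batteries least. A real retailer would presumably combine many such signals across millions of transactions, but the basic move is the same: look for purchases that tend to show up alongside the thing you want to predict.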

So, any company able to collect Big Data (Google, Apple, TiVo, Netflix, Amazon, Facebook, any large retail chain, institutions that issue credit cards, etc.) can run these correlation algorithms. This is why Big Data is such Big Money. It is no wonder that everyone wants to get in on the new Big Data economy. Just the other day while buying a CD I found myself in an aisle at Best Buy that I found a bit creepy. All the products in this aisle were web-enabled, from front door locks to sprinkler timers. Heck, even Comcast (which is already a huge Big Data company) is moving into the home security business. Why? Because all of your sensitive home security information is sent through their servers. The question surrounding Big Data is, “Will Big Data lead to Big Surveillance, Big Daddy?”

Is there any way to do anything about Big Data? Probably not. The best thing to do is to keep your “data exhaust” as small as possible. Use the metaphor of a hybrid car. A hybrid car greatly reduces emissions but does not eliminate them. Just be aware of when and how often you send data to server farms. Purchase goods through Amazon (as I do): you’re fertilizing that data farm. Use Comcast Home Security: you’re planting seeds at the ol’ data farm. Ask Siri questions: you’re supplying the irrigation water. One reason people were reluctant to switch from regular texts to texts sent via Apple’s iMessage was that iMessage texts go through Apple’s data servers.

As an aside, there’s an aspect of Big Data that really scares me, way beyond the surveillance aspect, which is scary in and of itself. If you know anything about correlations then you know that correlations do not establish cause and effect. That a rooster crows as the sun rises does not mean that the crowing makes the sun rise. Turning to Martin Ford again, he writes: “The insights gleaned from big data typically arise entirely from correlation and say nothing about the causes of the phenomenon being studied.” What Ford points out is that Big Data no longer cares about cause and effect. In essence, Big Data is seeing to it that the correlations derived from Big Data replace cause and effect. This is huge. Ford calls this “the big data revolution.” Here’s how he describes the Big Data Revolution: “[T]he idea that prediction based on correlations is sufficient and that a deep understanding of causation is usually both unachievable and unnecessary.”
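The rooster example can be put into numbers with a small Python sketch. The times below are invented for illustration (minutes after midnight on six hypothetical mornings), and the `pearson` helper is just a plain textbook implementation. The point it tries to show is that a correlation comes out the same whichever variable you list first, so the number alone cannot say which one is the cause.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical observations, in minutes after midnight, on six mornings.
sunrise_time = [420, 410, 400, 390, 380, 370]
first_crow   = [415, 407, 396, 388, 377, 368]

r_forward = pearson(first_crow, sunrise_time)   # "crowing predicts sunrise"
r_backward = pearson(sunrise_time, first_crow)  # "sunrise predicts crowing"

print(f"crow vs. sunrise: r = {r_forward:.3f}")
print(f"sunrise vs. crow: r = {r_backward:.3f}")
```

Both lines print the same very high value. The data are equally happy with "the rooster raises the sun" and "the sun wakes the rooster," which is exactly why a strong correlation is useful for prediction but silent about causes.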

Why is this a revolution? Well, most of reductionist science is about determining cause and effect relationships. And the Industrial Revolution came about by capitalizing on these various cause and effect relationships (e.g., the fuel source explodes and drives the piston downward). With the advent of Big Data what drives the piston downward is not fuel in the usual sense but you adjusting your web-enabled thermostat or renting a video through Netflix or updating your Facebook page. And here’s the big bugaboo: you do not get paid to be a fuel source for Big Data and Big Correlations. So, sure, it’s very convenient to be able to adjust your thermostat using a smartphone app, or rent a Netflix movie, or ask Siri questions, or arm or disarm your home security system remotely, or watch a YouTube video for free, but all of these events generate data exhaust that is being trapped and mined by Big Data companies. Your convenience is their big profits. So, just for grins, ask Siri, “What do you do with my data, Siri?” I’d be interested in the answer given that my old iPhone 3GS doesn’t have Siri. [1]

Notes:

[1] At home I have an old Mac PowerBook G3. This laptop came out in 1998. It runs Mac OS 9. It still chugs away running an old copy of Quicken, a home finance program. I still use this setup for two main reasons: 1) it still works and has the features I need, and 2) this computer and its operating system came from a time before data was regularly sent to server farms. So, one effective way to reduce your data exhaust is to use old computer technology that does not depend on a connection to the web. Now you know why I still carry around my trusty Palm Pilot IIIx. I can still sync my Palm Pilot to my Mac PowerBook and the web is none the wiser. Oh, here’s a point to keep in mind: all forms of cloud computing involve data server farms. So, another way to reduce your data exhaust is to not use cloud computing like Apple’s iCloud, which, I admit, is becoming tougher and tougher to do. The convenience that cloud computing brings to the table (i.e., easily keeping all of your devices synced up) is simply too enticing. OK, one more example of convenience. Progressive Insurance has a program called Snapshot. From the Progressive web site we learn:

Like plugging into an outlet. Snapshot fits into your car’s OBD-II port. (Most modern cars have one.) Every time you power up, Snapshot follows suit—you’ll see the lights dance and hear a beep.

Snapshot sends all of your driving information back to their data servers including routes that you have traveled. Sure, the Snapshot data could qualify you for safe driver discounts, but you are creating huge amounts of data exhaust (and actual exhaust unless you’re driving an all-electric car). I encourage you to think twice before you invite Big Daddy along for a ride.