Data Collection Is Creepy, Even When It’s Mainly Data Hoarding

“I’m Terrified of My New TV,” Michael Price writes, describing some truly creepy data collection features in his new “smart” television set. Price is right to be scared of the capabilities such devices make possible.

The amount of data this thing collects is staggering. It logs where, when, how, and for how long you use the TV. It sets tracking cookies and beacons designed to detect “when you have viewed particular content or a particular email message.” It records “the apps you use, the websites you visit, and how you interact with content.” It ignores “do-not-track” requests as a considered matter of policy.

It also has a built-in camera — with facial recognition. The purpose is to provide “gesture control” for the TV and enable you to log in to a personalized account using your face. On the upside, the images are saved on the TV instead of uploaded to a corporate server. On the downside, the Internet connection makes the whole TV vulnerable to hackers who have demonstrated the ability to take complete control of the machine.

More troubling is the microphone. The TV boasts a “voice recognition” feature that allows viewers to control the screen with voice commands. But the service comes with a rather ominous warning: “Please be aware that if your spoken words include personal or other sensitive information, that information will be among the data captured and transmitted to a third party.” Got that? Don’t say personal or sensitive stuff in front of the TV.

You may not be watching, but the telescreen is listening.

I do not doubt that this data is important to providing customized content and convenience, but it is also incredibly personal, constitutionally protected information that should not be for sale to advertisers and should require a warrant for law enforcement to access.

But here’s the thing: this creepily “smart” TV may collect all of that data, but it doesn’t seem to be any good at making any use of it. It barely seems interested in making use of it. It doesn’t show much sign of more convenient “customized content,” but neither does it show much sign of attempts to monetize this data by selling it to advertisers.

I have a Comcast “bundle” — slow and expensive American “broadband,” cable TV, and a landline (with no phone plugged into it — I don’t even know the number). In theory, Comcast could have an incredibly detailed demographic dossier on our household.

Not me: I do not own a motorcycle. Geico and Comcast have collected more than enough data to know that. And yet Comcast is still selling Geico ad time in which Geico attempts to sell me insurance for the motorcycle it knows I don’t have. Geico is still trying to sell Edward Snowden motorcycle insurance.

But when it comes to mining that data for any apparent purpose, they don’t even seem to have a grasp on the basic geography they could learn from our billing address. We get ads for political candidates in other states and for restaurant chains that don’t have any franchises in our region. We get ads for diapers and baby care products. And Comcast sends its subscribers an endless stream of ads urging them to switch to Comcast.

The ‘vixen likes to put on Bizarre Foods when she’s working around the house. If she’s working on a project, then Andrew Zimmern is probably eating something unusual or unpleasant in the background. Somewhere, a series of zeroes and ones on a server “knows” this. But no one seems interested in, or capable of, leveraging that information in any way. We still get the same ads, content and service as any Comcast cable customer who never watches the Travel Channel.

Think of the steady stream of insurance company ads begging us to trade our personal data for a potentially money-saving quote. They’re desperate to collect a fraction of the demographic data on us that Comcast already possesses and ignores. Yet, like Comcast, they seem committed to indiscriminate marketing of all of their goods and services to everyone, everywhere, without any interest in which people that data suggests might be more or less receptive to their pitch.

Geico and Progressive are competing fiercely to sell me motorcycle insurance. Comcast carries their motorcycle insurance ads and channels them into our home. All three companies already have more than enough data collected that they ought to know we don’t own a motorcycle. The local supermarket chain does a better job leveraging the data it collects through my “Bonus Card” than any of these larger companies do.

I spent 10 years as an online copy editor for America’s biggest newspaper chain. It was hugely frustrating to me that the newspaper site wasn’t anywhere near as dynamic and responsive as, say, Amazon or Netflix. Visit Amazon a dozen times and the site changes — adapting to your preferences based on those prior visits. But if you visit your newspaper site every day, going directly to the baseball scores, or the prep sports, or the crime page, that site will never learn and adjust. The Phillies fan who goes to the site every day for Phillies news will see exactly the same home page as the old guy who goes to the site every day to read the obituaries. The two readers will also see exactly the same advertisements on the website. The site records all of their patterns, but it does nothing with that information.

That’s annoying for the reader — it’s the inconvenience of uncustomized content. But it’s also a big-time money-loser for online journalism because uncustomized, untargeted advertising isn’t as lucrative as more focused, targeted advertising could be.

Consider my former employer, Gannett, which has newspapers in nearly every local market in America. Ten years ago — back when Gannett was still a semi-credible news-gathering company that hadn’t yet laid off way too many reporters — those newspapers all had education reporters producing a steady stream of news about local schools. In theory, that means Gannett had a database of parents with children — one that could be organized by zip code and by a rough estimate of the children’s age. That information would have been regarded as valuable for countless potential advertisers, but Gannett never did anything with it. Go to any local Gannett paper website and click on any story about schools or education. You won’t see ads accompanying that article that have anything to do with parents, education, children, etc. You’ll just see the same indiscriminate, untargeted ads you see all over the Internet — clickbait photo galleries of celebrity plastic surgery, weight loss scams, a lizard who wants to sell you motorcycle insurance, etc.

I appreciate Michael Price’s fears and caution. The amount of intensely personal information that is, in theory, being collected about all of us — on the Web, by our TV sets and cable boxes, etc. — really is disturbing. It’s not hard to imagine a thousand different nefarious scenarios for how such information could be put to shady, intrusive use. But such schemes would require not just greed and/or malice, but also an accompanying level of competence that Big Data hasn’t yet demonstrated. Price’s point about vulnerability to hackers is probably what scares me most, because so far data thieves have shown far more competence than data hoarders have when it comes to putting that information to use.

I’m certainly not saying we should be passive and unconcerned about the huge amount of data about us now being hoarded by a host of hosts, and I don’t mean to dismiss or diminish those concerns. Our right to privacy needs to be defended by tangible legal means, not just by the incompetence and indifference of companies like Comcast. But still, we should bear in mind that a “smart TV” is only as smart as the people exploiting it.
