2016-10-24

We Are All Data Analysts Now

Here’s a quote from a recent op-ed about Peter Thiel and the US presidential election:
My guess, based on zero data, is that, had I mentioned Peter Thiel one year ago, only a handful of readers would have recognized his name. And, had I told you that he was one of the founders of Paypal and the first investor in Facebook (he’s even portrayed briefly in The Social Network), my guess, again based on zero data, is that your opinion of him would have instantly improved. “Tell us more about this Thiel,” you would say.
Forget about Peter Thiel and Donald Trump for a moment, and just pay attention to the phrasing in that passage. The author uses the phrase “based on zero data” not once, but twice. Of course, substantiating the author’s “guess” with empirical research would be a pretty silly exercise; it’s not relevant to the thrust of the article. I choose to highlight it here, however, because it reveals something about our modern perspective. In a bygone era, an author might have chosen to say “I suspect,” or “I’d wager,” or “I’d venture to guess…” but today this author, like many people out there, chooses to stipulate that the guess is being made without having analyzed the matter empirically.

The implication here is that analyzing the data is the conceptual default. Welcome to the modern age. We are all data analysts now. The question from here is how we can expect this fact to color our perspectives.

Recently, someone on my Facebook feed made a jibe at a certain kind of person for believing a certain kind of thing about the Cold War. The jibe was that sociologists should study why that certain kind of person had reached a certain kind of conclusion, followed by a very ideologically charged epithet for the Soviet Union. Because I happen to know people who were alive during the Cold War and grew up in countries that benefited from Soviet foreign policy, and because I happen to believe that Westerners do not have the whole story regarding Soviet foreign policy, I added a comment on Facebook suggesting that the hypothetical sociologist should include the perspectives of residents of the Third World.

This comment resulted in immediate demands to know which people, which country, which events I was thinking of. Naturally, I avoided naming specifics – not because I couldn’t give them, but because I didn’t want a debate about the merits of a particular Cold War policy to distract from my real point, which is that one’s perspective on the Cold War is invariably shaped by which narrative one most identifies with. In the US, we’re accustomed to thinking of the USSR as an “Evil Empire.” In the Third World, the question really comes down to which superpower had the biggest impact on one’s country, and whether that impact was positive or negative. This is why Nelson Mandela famously chose to work with the USSR despite Western objections: the Soviet Union had helped Mandela’s cause when no one else would. You could argue that Mandela was a communist and the Soviet Union helped him only for that reason, but again, this only distracts from my point. My point is that the USSR wasn’t a villain to everyone. Anyone interested in a complete history of the Cold War ought to do a proper accounting of everything.

But my point fell on deaf ears because all anyone could do was demand to know which country, which events – empirics, empirics, empirics. Let’s see the data and analyze it. Hand it over. This speaks to the core cognitive problem we face today: we approach everything as though we are data analysts. That makes us very good at solving problems that can be solved by data analysis; it makes us horrible at solving other kinds of problems.

Similarly, I came across a recent Facebook post (a public one, so feel free to hunt it down if you’re so inclined) by Less Wrong religious leader Eliezer Yudkowsky, arguing that everyone should vote for Hillary Clinton because the downside risk of a Donald Trump presidency is World War 3. That’s not an exaggeration – that really and truly is what Yudkowsky said. Of course, the reality is that World War 3 may happen – or not – regardless of which candidate wins the election. The only reason Yudkowsky counts this as a risk of a Trump presidency, and not of a Clinton presidency, is that Yudkowsky is biased. This should be perfectly clear because, as I just said, World War 3 can happen under any set of assumptions; Yudkowsky includes that possibility only in his estimation of a Trump presidency.

But, remember, this is how Bayesian reasoning works. Yudkowsky is willing to bet on WW3 + Trump, and unwilling to bet on WW3 + Clinton; ergo, the former is more probable until he decides to “update his priors.” He thinks it, therefore it is. Now we can finally see that Bayesian reasoning, when done incorrectly, is basically magical thinking.

Bayesian reasoning done correctly, however, can be a powerful way to solve statistical and machine-learning problems. In other words, it’s good at solving data analysis problems, but bad at solving foreign policy problems. As you can see, the problem runs deep.

It gets worse: A common trope among the libertarian crowd is that voting is ineffectual on the margin, so it doesn’t matter whether or not you vote. But if everyone acted on this information simultaneously, then no one would vote and the thesis would invalidate itself. So this notion is actually a paradox: completely meaningless, utter nonsense. Voting is marginally ineffectual because voting itself is effectual. Similarly, profits are maximized when the marginal profit of the next unit is zero. It would be stupid to say that “producing gallons of milk is a waste of time” just because we’ve reached the point where the marginal benefit has fallen to zero. But when we’re stuck analyzing problems from a data analysis mindset, we undermine our ability to solve problems through other means.
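
For anyone who wants to see the milk example in numbers, here is a rough sketch. Everything in it is made up for illustration: the price of 10 per gallon, the assumption that the k-th gallon costs k to produce, and the function names. It only shows that the last gallon can be worth nothing at the margin while production as a whole is still very much worth doing.

```python
# Toy illustration only: the price and cost schedule are hypothetical.
PRICE = 10  # assumed selling price per gallon


def marginal_cost(k: int) -> int:
    """Assumed cost of producing the k-th gallon (rises by 1 per gallon)."""
    return k


def marginal_profit(k: int) -> int:
    """Profit contributed by the k-th gallon alone."""
    return PRICE - marginal_cost(k)


def total_profit(q: int) -> int:
    """Profit from producing q gallons in total."""
    return sum(marginal_profit(k) for k in range(1, q + 1))


def optimal_quantity(max_q: int = 100) -> int:
    """Largest quantity at which total profit hits its maximum."""
    best = max(total_profit(q) for q in range(max_q + 1))
    return max(q for q in range(max_q + 1) if total_profit(q) == best)


if __name__ == "__main__":
    q = optimal_quantity()
    print("optimal quantity:", q)
    print("marginal profit of that gallon:", marginal_profit(q))
    print("total profit at that quantity:", total_profit(q))
```

Under these assumptions the last gallon adds nothing, yet the total is well above zero. The zero at the margin says something about the next gallon, not about whether producing milk was a waste of time.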


And that’s just politics. Think about all the other areas of our lives that are surely suffering because we’re stuck in a data analysis paradigm.