The Value of Data Journalism
Ryan Cooper of The Week is having a good laugh at Nate Silver’s expense, calling into question the usefulness of data journalism in the process. This subject is fair game; the genre has not exactly covered itself in glory this election cycle. I can relate, as I was fairly confident that Donald Trump would not be able to accumulate the 1,237 delegates needed to capture the Republican nomination outright.
My colleague David Byler has already done a nice retrospective on the reasons behind our collective miss. But should we go further? Should this episode cause us to throw the entire field of data journalism into the circular file? I may be biased (on which, more later), but I don’t really think so. Consider a slightly off-topic example: blackjack.
There is no doubt that it is useful to know a thing or two about the statistics of gambling before sitting down to play blackjack. Perhaps most importantly, unless you know some fairly complex card-counting techniques, you're going to lose money eventually! But more to the point, you really will extend your stay at the table over time if you're willing to make some counterintuitive moves, such as hitting when you hold a hard 16 and the dealer is showing a 10. I suppose you could dismiss this as undue emphasis on what is likely to happen, but the truth is that what is likely to happen is pretty darned consequential.
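If you want to check the hard-16 claim yourself, a quick Monte Carlo sketch will do it. The model below is deliberately simplified, and the function names are my own: it assumes an infinite deck, a dealer who stands on all 17s, and no dealer-blackjack peek, so treat it as an illustration rather than a strategy table.

```python
import random

# Infinite-deck approximation: every draw is independent over the 13 ranks.
RANKS = [2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11]  # 11 stands in for an ace

def draw():
    return random.choice(RANKS)

def dealer_total(upcard):
    """Dealer draws to 17 or more, standing on all 17s."""
    total, aces = upcard, 0
    while total < 17:
        card = draw()
        total += card
        if card == 11:
            aces += 1
        while total > 21 and aces:  # demote aces from 11 to 1 as needed
            total -= 10
            aces -= 1
    return total

def ev(hit, trials=200_000):
    """Estimate average profit per unit bet holding a hard 16 vs. a dealer 10."""
    net = 0
    for _ in range(trials):
        player = 16
        if hit:
            card = draw()
            player += 1 if card == 11 else card  # an ace counts as 1 on a hard 16
            if player > 21:  # busted before the dealer even plays
                net -= 1
                continue
        dealer = dealer_total(10)
        if dealer > 21 or player > dealer:
            net += 1
        elif player < dealer:
            net -= 1  # ties push: no change to the tally
    return net / trials
```

Run it and both plays come out well underwater. Hitting is the better move, as basic strategy says, but the gap is thin enough that a casual simulation can barely resolve it, which is part of why standing never feels costly in the moment.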
I can already hear the fingertips clicking away on keyboards in response to this: “That’s just math; what folks like Silver do is far more subjective!” First, as an aside, we should be careful about assuming that even “hard” sciences are assumption-free. The metaphors we choose to interpret reality – arms races, division of labor, survival of the fittest – often smuggle in our own assumptions about how we think the world works or should be working.
Second, taking an empirical approach to elections analysis does help you place a smart bet, even if – and this caveat is important – it doesn’t mean you’ll win every bet. For example, Silver suggested in January that Bernie Sanders would have a difficult time winning the nomination due to the demographic makeup of the early primary states. He was basically correct. A familiarity with these data would help keep you from concluding that Sanders’ big run of wins in late March and early April was particularly meaningful; it was simply that the states most demographically favorable to Sanders had their turns to vote. Why not just go with a simple poll average? We certainly encourage reviewing poll averages here at RealClearPolitics, but digging deeper into the data than a simple average allows often yields a broader understanding of why an election is likely to go in a particular direction.
Likewise, the fact that most elections are predictable is consequential in and of itself, and aids interpretation. Knowledge of electoral history, and an understanding of what political professionals call election “fundamentals,” helps keep us from, say, ascribing too much significance to Ronald Reagan’s personality in explaining his 1984 landslide (incumbent presidents rarely lose in an environment of 7 percent economic growth), or to Bill Clinton’s supposed re-invention of the Democratic Party in 1992 (incumbents rarely win with sub-40 job approval when seeking a fourth consecutive term for their party). This sort of background is important in its own right, but it is also relevant after the election, when people attempt to ascribe larger meaning to the returns and use them to build (often unwarranted) narratives about mandates.
But, we should absolutely be aware that the hypotheses we test and models we build don’t spring Athena-like from our brow. Rather, they reflect our internalized experiences, and our beliefs about what is likely going on. This, in turn, makes them susceptible to error, beyond what a 95 percent confidence interval might suggest. I’ve written about this at length elsewhere, but this snippet gets at my general view:
We must also acknowledge that the creation of these models is never “neutral.” Models are generated from hypotheses about how elections work, and these hypotheses are not plucked from the ether. They are susceptible to bias. We test familiar stories about economics, demographics and wars, because those narratives are familiar to us, and have been a part of election analysis for over 100 years. But we can’t really be sure that the correct hypothesis lies within those stories. The correct relationship might be something we’ve never thought to test, and would never think to test, because it is far outside our internalized narratives about what causes a given electoral outcome to occur.
Even worse, we are often inclined to validate the stories we already hold and to challenge those that run contrary to our foundations. Cognitive dissonance and confirmation bias are real phenomena, and they consistently outfox our best efforts to avoid them. Our narratives about race, class, gender and ideology can cause us to go back and double-check a data set for an error when a result runs contrary to those narratives, and to skip that check when we are comfortable with the result. We can also fall into this trap simply by failing to fully vet a model that comports with established expectations.
So, yes, data journalism could probably benefit from a hefty dose of humility. Perhaps more importantly, data journalists ought not play the role of that annoying stranger at the table who has had too many drinks and who hectors the other players when they stay on that hard 16. After all, a player really will win a fair number of hands if she doesn’t hit on a hard 16 while the dealer shows a 10, and, well, sometimes it’s fun to break the “rules” and actually gamble.
Quantitative analysis need not be the be-all/end-all of elections journalism. To beat our blackjack metaphor into the ground, you can play to win (or lose more slowly), but it’s simply a hell of a lot of fun to spend a Saturday afternoon sitting around a table in Vegas with your friends, partaking of free drinks, watching people, and mocking your friends’ bad luck. In other words, to get the most out of the experience, there’s a lot more to do than sit around reflexively playing Basic Strategy.
Whether the things that Cooper lists are really what journalism should be about is not a debate I find particularly useful. What I will say is that “gumshoe” journalists would do better work if they became more statistically numerate, while data journalists would do well to engage more with the crucial context that traditional reporting provides.
In truth, the theme of Silver’s book, “The Signal and the Noise,” is not that far off from this synthesis. His point is that, to function well, the game of baseball needs both data analysts and classic scouts, as their respective strengths and weaknesses complement each other. The journalism world is far from achieving the sort of synthesis Silver describes, but that’s no reason to jettison one approach or the other entirely.