Last week, Ross Douthat wrote a column tying the recent nationwide rise in suicide rates to society's retreat from traditional institutions, including religion and marriage. Nate Cohn of The New Republic responded with a column in which he argued:

Contrary to what Douthat might expect, there’s no correlation – zero -- between a states’ [sic] suicide rate and religion, marriage rates, or single occupancy homes. State economic growth or unemployment don’t line up, either. … If anything correlates with suicide rates, it’s a states’ [sic] population density: In populous areas, suicide rates are low; in the sparsely populated hinterlands, suicide rates are high. … A more intriguing possibility is gun ownership, which, like suicide rates, is highest in the West and lowest in the Northeast. … Then again, the South has high levels of gun ownership and higher levels of depression than the inland West, but suicide is rarer in Alabama than Montana.

Cohn has since followed up with this article, and, to be fair, I'm not entirely certain how much of the above is still his argument. He now appears to contend, for example, that economic development actually does play a substantial role in driving suicides.

Regardless, my purpose here isn’t so much to look at the merits of the various claims and counterclaims -- I have no particular dog in this fight, because I doubt I’m going to add much to a question that has been studied and debated extensively by sociologists for decades. Rather, I want to focus on the particular example excerpted above from Cohn’s first piece, because I think there’s something valuable about the use of statistics to be learned here.

In particular, I found it intriguing that Cohn didn't discern any relationship between religiosity and suicide rates, when there is a raft of studies using international, state, and individual data sets concluding that such a relationship exists.

Indeed, at first blush, it seems that Cohn is correct: If you run simple, bivariate regressions of state suicide rates against states’ levels of religiosity, population density, and gun ownership, the latter two factors have statistically significant relationships with suicide rates, while the former does not (divorces per 1,000 males actually is significantly related here, but there are varying ways of measuring this, and I’m assuming Cohn is just using a different data set).
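A bivariate regression of this kind takes only a few lines to run. As a sketch, the numbers below are invented placeholders, not the actual state-level data, and scipy's `linregress` stands in for whatever tool was actually used:

```python
# A minimal sketch of a simple bivariate regression, using scipy.
# The data are HYPOTHETICAL stand-ins for illustration only -- they are
# not the real state suicide-rate or population-density figures.
from scipy.stats import linregress

suicide_rate = [10.2, 14.8, 22.1, 9.5, 18.3, 12.7, 20.4, 11.1]  # per 100,000
pop_density = [1021, 105, 7, 1195, 24, 231, 11, 467]            # people/sq. mi.

# Regress the suicide rate on population density alone
result = linregress(pop_density, suicide_rate)
print(f"slope:     {result.slope:.4f}")
print(f"r-squared: {result.rvalue ** 2:.3f}")
print(f"p-value:   {result.pvalue:.4f}")
```

The p-value is what decides "statistically significant" in the bivariate setup: run one such regression per candidate factor and see which clear the significance bar.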

But simple regressions have their limits, especially when explaining complex phenomena. Sometimes relationships become apparent only after you control for other factors. For example, assume we are trying to figure out something relatively simple: Which factors affect how much gas our (unfortunately hypothetical) C5 Corvette consumes.

Our initial -- perfectly reasonable -- hypothesis is that this is largely a function of how many miles we drive. So, every time we stop for gas, we write down the mileage on the odometer, as well as the number of gallons of gas we put in the tank.* At the 15th stop, we calculate the following:

*Note that the first stop was when we first wrote down our mileage, so we couldn’t calculate total miles driven on that tank.

After running the regression analysis, we find, surprisingly, that there actually isn’t a statistically significant correlation between the number of miles driven and the amount of gas consumed at the “industry standard” of 95 percent confidence. We only explain about 18 percent of the variance overall. How can this be?

Well, it turns out that the relationship between miles driven and gas consumed depends on an additional piece of data: a C5 Corvette does about 10 miles per gallon better on the Interstate than in the city.

So we would prefer a multivariate regression analysis here. While simple regression looks only at the relationship between factors “A” and “B,” multivariate analysis looks at the relationship between “A” and “B” when you hold factor “C” constant. If we’d taken note of what percent of our driving had been on the Interstate (here, generated randomly), our chart might have looked like this:

If we run our regression using both “miles driven” and “freeway” as variables, we now explain 99 percent of the variance, and both of the variables are strongly significant. (Of course, in the real world, temperature, air density, tire wear and inflation, and a host of other factors will impact our mileage.)
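The whole exercise can be simulated. The 17 mpg city / 27 mpg highway figures below are assumptions chosen to match the "about 10 miles per gallon better" gap above, and the fill-up data are randomly generated rather than taken from the table:

```python
# Simulated version of the Corvette example: fit gallons consumed on
# miles alone, then on miles plus the freeway share of those miles.
# The 17/27 mpg split is an assumption, not measured data.
import numpy as np

rng = np.random.default_rng(0)
n = 14                                    # 14 fill-ups with a computable tank
miles = rng.uniform(280, 320, n)          # miles driven per tank
freeway = rng.uniform(0, 1, n)            # fraction of miles on the Interstate

hwy_miles = miles * freeway
city_miles = miles - hwy_miles
gallons = city_miles / 17 + hwy_miles / 27 + rng.normal(0, 0.1, n)

def r_squared(predictors, y):
    """Ordinary least squares with an intercept; returns the model's R^2."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1 - resid.var() / y.var()

r2_simple = r_squared([miles], gallons)           # miles driven alone
r2_multi = r_squared([miles, freeway], gallons)   # miles + freeway share

print(f"miles alone:           R^2 = {r2_simple:.2f}")
print(f"miles + freeway share: R^2 = {r2_multi:.2f}")
```

Because each tank covers roughly similar mileage, miles alone explains little of the variance; once the freeway share is held constant, the fit becomes near-perfect, mirroring the jump described above.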

If we compare the data output for the first and second regressions, we can see what has happened. Controlling for freeway/street driving changes our understanding of the magnitude of the coefficient on miles driven, as well as the precision with which that coefficient is estimated (and thus its statistical significance). Put in lay terms, a necessary precondition to understanding the relationship between miles driven and gallons consumed is considering the *type* of miles driven.