RealClearPolitics HorseRaceBlog

By Jay Cost

Obama's Success in Central Pennsylvania

On Wednesday, I offered an initial analysis of the Pennsylvania primary. In it, I argued that Clinton did roughly as well with her core demographic groups in Pennsylvania as she did in Ohio.

Yesterday, I was corresponding with a friend who noted that Clinton's performance among certain groups slipped relative to Ohio, and that she made up the difference because her best groups made up a larger share of the Pennsylvania electorate.

The most striking instance of this was Clinton's victory among the elderly. Clinton won the elderly by 46 points in Ohio, but by just 26 in Pennsylvania. According to this hypothesis, what made up the gap is that the elderly constituted 14% of the electorate in Ohio, compared to 22% in Pennsylvania. The upshot is that if you took Clinton's Pennsylvania vote margins and applied them to Ohio's demographics, the Ohio race would have been much closer.
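The friend's point is arithmetic: a group's contribution to the statewide margin is its share of the electorate times Clinton's margin within that group. The sketch below uses only the senior-citizen figures quoted above; a full reweighting would repeat this for every demographic group and sum the results, which the sketch does not attempt. The constant names are illustrative, not from any dataset.

```python
# Senior-citizen figures quoted in the post (shares as fractions, margins in points).
OHIO_SENIOR_SHARE = 0.14
PA_SENIOR_SHARE = 0.22
OHIO_SENIOR_MARGIN = 46
PA_SENIOR_MARGIN = 26


def contribution(share: float, margin: float) -> float:
    """Points a group adds to the statewide margin.

    Statewide margin = sum over groups of (share of electorate) * (margin within group).
    """
    return share * margin


ohio_actual = contribution(OHIO_SENIOR_SHARE, OHIO_SENIOR_MARGIN)
pa_actual = contribution(PA_SENIOR_SHARE, PA_SENIOR_MARGIN)
# Counterfactual: Pennsylvania's senior margin applied to Ohio's senior share.
pa_margin_ohio_mix = contribution(OHIO_SENIOR_SHARE, PA_SENIOR_MARGIN)

print(f"Ohio seniors:          {ohio_actual:.2f} points")         # 6.44
print(f"PA seniors:            {pa_actual:.2f} points")           # 5.72
print(f"PA margin, Ohio share: {pa_margin_ohio_mix:.2f} points")  # 3.64
```

On these numbers alone, Pennsylvania's weaker senior margin would have cost Clinton about 2.8 points of her Ohio margin; Pennsylvania's larger senior share is what recovers most of it.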

However, there is a catch. Can we simply take the Pennsylvania results and apply them to Ohio's demographics? That depends on how similar the two states are. I argued in March that Ohio could give us a rough estimate of what to expect in Pennsylvania. With little data to work from, bringing Ohio into a discussion of Pennsylvania was very useful. However, as I noted at the time, this line of analysis has real limits. Pennsylvania is a very diverse state. Some places have a lot in common with Ohio. Some places do not.

So, this raises an interesting analytical question. Did geography play a role in the Pennsylvania race? More specifically, how close were the results in certain parts of the state to the results in Ohio?

An easy way to test this would be to break down the exit polls by region to create more detailed cross-tabulations. We would look not only at how Clinton did among the elderly statewide, but also among the elderly in the southeast, the southwest, and so on. Unfortunately, we do not have access to that kind of data.

We can approach this in another way, using the countywide vote results. In March, we used linear regression to build a predictive model for countywide Ohio results based on median white income, the percentage of African Americans in a county, and the percentage of residents aged 20 to 24. We can tweak this model to work for Pennsylvania. In fact, we can build a model to explain Pennsylvania and Ohio at the same time. We'll use the three variables mentioned above, plus the percentage of senior citizens among all whites in a county.
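A minimal sketch of a pooled county-level regression like the one described above, fit by ordinary least squares with NumPy. The post does not give its data, units, or coefficients, so the feature names, units, and every number below are hypothetical; the synthetic counties (88 for Ohio plus 67 for Pennsylvania) only demonstrate the mechanics.

```python
import numpy as np

# Hypothetical column order; the post names the variables but not their coding.
FEATURES = ["median_white_income", "pct_black", "pct_age_20_24", "pct_white_seniors"]


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares with an intercept; returns [intercept, *slopes]."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Predicted Clinton margin (points) for each county row in X."""
    return coef[0] + X @ coef[1:]


# Synthetic stand-in for the pooled data: Ohio's 88 counties plus Pennsylvania's 67.
rng = np.random.default_rng(0)
n = 88 + 67
X = np.column_stack([
    rng.uniform(30, 70, n),  # median white income, $ thousands (made up)
    rng.uniform(0, 30, n),   # % African American (made up)
    rng.uniform(4, 15, n),   # % aged 20 to 24 (made up)
    rng.uniform(10, 25, n),  # % seniors among whites (made up)
])
true_coef = np.array([60.0, -0.5, -1.5, -2.0, 1.0])  # invented "truth"
y = true_coef[0] + X @ true_coef[1:] + rng.normal(0, 3, n)

coef = fit_ols(X, y)
for name, b in zip(["intercept", *FEATURES], coef):
    print(f"{name:>20}: {b:7.2f}")
```

Fitting both states in one regression is what lets later terms measure how a Pennsylvania county differs from an Ohio county with the same demographics.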

Remember that our analytical question is whether voters in certain parts of Pennsylvania behaved like Ohio voters. Accordingly, we'll divide Pennsylvania into five segments: southwest, northwest, central, southeast, and northeast. Our predictive model will include a factor for each of them. The idea behind this is that if Obama did better in a given Pennsylvania region relative to Ohio - controlling for race, income, and age - it will be picked up by one of these variables.
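One way to encode the five Pennsylvania segments is a set of 0/1 indicator (dummy) variables with Ohio as the baseline: Ohio counties get zeros in every column, so each fitted coefficient is that region's shift relative to comparable Ohio counties. The sketch below assumes this coding, which the post does not spell out; the region split of the 67 Pennsylvania counties, the covariates, and the invented 12-point central effect are all hypothetical.

```python
import numpy as np

# Pennsylvania segments from the post; Ohio counties get no indicator (baseline).
REGIONS = ["southwest", "northwest", "central", "southeast", "northeast"]


def region_dummies(labels):
    """0/1 indicator columns, one per Pennsylvania region.

    Rows labeled "ohio" are all zeros, so each coefficient measures a
    region's shift in Clinton's margin relative to comparable Ohio counties.
    """
    out = np.zeros((len(labels), len(REGIONS)))
    for i, label in enumerate(labels):
        if label != "ohio":
            out[i, REGIONS.index(label)] = 1.0
    return out


# Synthetic data: 88 Ohio counties plus a made-up split of Pennsylvania's 67.
labels = ["ohio"] * 88
for region, count in zip(REGIONS, [13, 14, 14, 13, 13]):
    labels += [region] * count

rng = np.random.default_rng(1)
income = rng.uniform(30, 70, len(labels))    # median white income, $ thousands
pct_black = rng.uniform(0, 30, len(labels))  # % African American
central = np.array([label == "central" for label in labels], dtype=float)
# Invented truth: only central Pennsylvania differs from Ohio (Clinton 12 points lower).
y = 70 - 0.6 * income - 1.0 * pct_black - 12 * central + rng.normal(0, 3, len(labels))

design = np.column_stack([np.ones(len(labels)), income, pct_black, region_dummies(labels)])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
region_effects = dict(zip(REGIONS, coef[3:]))
for region, effect in region_effects.items():
    print(f"{region:>10}: {effect:6.2f}")
```

With this coding, a region that "behaved like Ohio" should show an estimate near zero, while a region where Obama ran ahead of the Ohio baseline shows a negative Clinton-margin shift.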

We might expect Obama to have improved relative to Ohio in the southeast. However, this does not appear to have been the case. When we control for race, income, and age, we get roughly the same results in Ohio and southeast Pennsylvania. The same goes for southwest Pennsylvania.

What is significant is the variable that captures counties in central Pennsylvania, which was surprising. The model indicates that, controlling for race, income, and age, Obama performed better in central Pennsylvania than he did in Ohio. The variables for the northeast and northwest segments also show modest statistical significance. However, when we use a more expansive definition of central Pennsylvania, reclassifying the counties in the northeast and northwest segments that border the central segment as part of the center, that significance disappears.

What is the upshot of this? Obama did not improve relative to Ohio in Erie, Pittsburgh, Scranton/Wilkes-Barre, or even Philadelphia. However, he did improve in the "Middle T" of the state. This improvement was not puny. If we compare a county in Ohio to one in central Pennsylvania with similar racial, income, and age demographics, we should find Clinton's margin to be 7 to 17 points smaller in the Pennsylvania county.
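The post does not say how it judged significance or arrived at the 7-to-17-point range. A common approach is classical OLS standard errors, t-statistics, and confidence intervals, sketched below on synthetic data. The function name, the made-up 12-point effect, and the use of the normal critical value 1.96 (the exact t value with about 150 residual degrees of freedom is slightly larger) are assumptions for illustration only.

```python
import numpy as np


def ols_inference(design, y, z=1.96):
    """OLS coefficients with classical standard errors, t-statistics,
    and approximate 95% intervals using the normal critical value z."""
    n, k = design.shape
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (n - k)  # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)
    se = np.sqrt(np.diag(cov))
    return coef, se, coef / se, coef - z * se, coef + z * se


# Synthetic pooled data: 88 Ohio, 14 "central" PA, 53 other PA counties (made up).
rng = np.random.default_rng(2)
n = 88 + 14 + 53
central = np.zeros(n)
central[88:102] = 1.0
income = rng.uniform(30, 70, n)
y = 70 - 0.6 * income - 12 * central + rng.normal(0, 3, n)  # invented effect of -12

design = np.column_stack([np.ones(n), income, central])
coef, se, t, lo, hi = ols_inference(design, y)
print(f"central: {coef[2]:.1f} (SE {se[2]:.1f}, t = {t[2]:.1f}, "
      f"95% CI {lo[2]:.1f} to {hi[2]:.1f})")
```

An interval of this kind expresses the plausible size of a regional effect as a range of points rather than a single number.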

Let's enliven this with a graphical illustration.

First, let's build a simple predictive model of Ohio countywide returns based upon median white income. We know, of course, that other variables are important factors. We just finished building a comprehensive model, after all. However, median white income is the best predictor, and our task here is just to illustrate the point.

This model gives us a line to graph. It looks like this.

[Figure: Ohio prediction line for Clinton's countywide margin based on median white income]

The idea here is that we plug in the value of median white income for an Ohio county, and we get a prediction for Clinton's margin of victory in the county.
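The one-variable illustration is a straight least-squares line: fit Clinton's margin against median white income across Ohio counties, then plug in a county's income to read off a predicted margin. A minimal sketch using `numpy.polyfit`; the four "Ohio counties" below are invented and happen to slope downward, and they are not the actual Ohio estimates.

```python
import numpy as np


def fit_line(income, margin):
    """Least-squares line: margin = intercept + slope * income."""
    slope, intercept = np.polyfit(income, margin, 1)  # highest degree first
    return slope, intercept


def predict_margin(slope, intercept, income):
    """Predicted Clinton margin (points) for a given median white income."""
    return intercept + slope * np.asarray(income)


# Made-up counties: median white income ($ thousands) and Clinton margin (points).
ohio_income = np.array([30.0, 40.0, 50.0, 60.0])
ohio_margin = np.array([50.0, 40.0, 30.0, 20.0])

slope, intercept = fit_line(ohio_income, ohio_margin)
prediction = float(predict_margin(slope, intercept, 45.0))
print(round(float(slope), 6), round(float(intercept), 6))  # -1.0 80.0
print(round(prediction, 6))                                 # 35.0
```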

Next, we lay a scatter plot of the counties in each segment of Pennsylvania over this graph.* What we are looking for is whether the Pennsylvania observations fall systematically above or below the line. We expect no systematic pattern for the counties of the southwest, southeast, northwest, or northeast: they should fall above or below the line at random, because each of those segments behaved roughly like Ohio. For central Pennsylvania, however, we do expect a systematic difference. In particular, the observations should fall below the line, because Clinton's margins there should be smaller.
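The visual check amounts to looking at residuals: each Pennsylvania county's actual margin minus the Ohio line's prediction, summarized by region. A region that behaves like Ohio should have a mean residual near zero with roughly half its counties below the line; a region where Clinton underperformed should have a negative mean and most counties below. The sketch applies the footnote's rule of plotting only counties that are at most 10% African American. The function name and the county tuples are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def residual_summary(slope, intercept, counties, max_black_pct=10.0):
    """Per-region (mean residual, share of counties below the Ohio line).

    counties: iterable of (region, median_white_income, pct_black, clinton_margin).
    A negative residual means Clinton's margin fell below the Ohio prediction.
    """
    residuals = defaultdict(list)
    for region, income, pct_black, margin in counties:
        if pct_black > max_black_pct:  # footnote: show only counties <= 10% African American
            continue
        residuals[region].append(margin - (intercept + slope * income))
    return {
        region: (mean(r), sum(x < 0 for x in r) / len(r))
        for region, r in residuals.items()
    }


# Made-up Ohio line and Pennsylvania counties, for illustration only.
counties = [
    ("southwest", 40, 3, 41),
    ("southwest", 50, 2, 29),
    ("central", 40, 1, 32),
    ("central", 45, 2, 27),
    ("southeast", 55, 25, 10),  # excluded: above the 10% threshold
]
summary = residual_summary(-1.0, 80.0, counties)
print(summary)  # {'southwest': (0.0, 0.5), 'central': (-8.0, 1.0)}
```

Outliers such as college-town counties can be dropped from the input list before summarizing if they would otherwise dominate a region's average.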

Let's check the northeast, northwest, and southeast first.

[Figure: Northeast, northwest, and southeast Pennsylvania counties plotted against the Ohio prediction line]

There seems to be no pattern here. The counties in these segments of Pennsylvania do not fall systematically above or below the line. Next, let's check the southwest and the center.

[Figure: Southwest and central Pennsylvania counties plotted against the Ohio prediction line]

Notice how the counties of southwestern Pennsylvania fall very tightly along the line. No part of the state mimicked Ohio more closely than southwest Pennsylvania.

Next, notice the two counties toward the bottom. One of them is Centre County, home to Penn State. The other is Union County, home to Bucknell University. So, the fact that Clinton "underperformed" here should come as no surprise.

Placing them aside, we can notice that the remaining central counties fall systematically below the Ohio prediction line. This means that Clinton's margins in central Pennsylvania were smaller than they "should" have been. This is exactly what we found above. Controlling for race, income, and age, Obama did better in central Pennsylvania than he did in Ohio. We can't say that about any other part of Pennsylvania.

This is not to imply that he did particularly well in central PA. Clinton still won the counties by an average of 25 points. The point is that, if this area were behaving like Ohio or the rest of Pennsylvania, she would have won them by something closer to 33 points.

What might explain this result? It is hard to say, though it is noteworthy that central Pennsylvania is the most Republican part of the state. We have found again and again this primary season that, outside the South, white Democrats in heavily Republican areas tend to prefer Obama more than white Democrats elsewhere. It is unclear what has caused this pattern, but the results in central Pennsylvania are consistent with it.

Finally, we should note the irony of central Pennsylvania's support for Obama. This region is home to many of the "small towns" Obama was talking about in San Francisco, and yet it tilted in his favor. In a certain sense, small-town Pennsylvanians preferred Obama more than the rest of the state did!


[*] We display only the counties where African Americans make up 10% of the population or less. The reason is that, as the African American share increases, the data points become more widely dispersed. Remember that this is only for the purposes of illustration; our linear regression model accounted for this perfectly well.

-Jay Cost