RealClearPolitics HorseRaceBlog

By Jay Cost

A Review of the Pennsylvania Primary

The Pennsylvania primary is in four and a half weeks. The conventional wisdom is that Hillary Clinton has an edge in the state. Does this intuition bear out on closer inspection? I have spent the last few days soaking and poking in the available data - and I think it reasonable to favor Clinton in Pennsylvania.

We have talked on this site before about the demographic variables that seem to be driving the election results. The two that I think are the most powerful are the number of African Americans in a state and how "upscale" white voters are.

A state's African American population has a curvilinear relationship with election results. In states with few African Americans, Obama does very well. In states with many, he also does well. Clinton does well in states with a middling amount - say 5% to 15%. Our working hypothesis on this page is that this is due to the fact that Obama is perceived differently by white voters depending upon the racial demography of the state. The "upscale" variable captures the differences in the socioeconomic status of the whites in each candidate's coalition. Clinton is winning "Mondale voters," and Obama is winning "Hart voters." We measure this via median white income.*

These variables are not comprehensive explanatory factors. Other causes are definitely influencing vote returns. However, these two can account for upwards of 60% of the vote results we have seen. Thus, they give us a good starting point to analyze Pennsylvania.

If these are the two variables we shall use to understand Pennsylvania - how shall we employ them? The best approach is to contextualize Pennsylvania in the larger mid-Atlantic region. Comparing Pennsylvania to its five neighbors that have already voted can give us a sense of which neighbor might best serve as a guide.

[Figure: Mideast Region Demographics]

It appears that Ohio is our best bet. While Pennsylvania is whiter than all its neighbors, and its whites are poorer - it seems to have most in common with its neighbor to the west. Comparing Ohio to Pennsylvania should offer us a plausible baseline expectation for what will happen next month.

Let's push the comparison a bit further. After all, it is possible that Pennsylvania and Ohio appear similar on a statewide analysis, but this similarity is belied by countywide differences. To confirm that this is not the case - let's examine these two variables for all Ohio and Pennsylvania counties. In the following graph, the horizontal axis measures the median white income per county. The vertical axis measures the percentage of African Americans per county. We'll put Ohio counties in Buckeye red and Pennsylvania counties in Nittany blue.

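[Figure: Income and African American. Median white income per county (horizontal axis) against percentage of African Americans per county (vertical axis); Ohio counties in red, Pennsylvania counties in blue.]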

The chart generally confirms the similarity we found in the statewide comparison. Pennsylvania and Ohio counties have similar distributions - they tend to have median white incomes of less than $50,000 and an African American population of less than 10%.

There are two notable exceptions. First, the blue dot in the top-left portion of the graph is Philadelphia County. As you can see, it has a higher proportion of African Americans than any county in Ohio. This favors Obama - and it is highly likely that he will carry it next month.

The second exception is that, generally speaking, Pennsylvania counties are more homogeneous than Ohio counties. Notice the cluster of blue dots in the bottom-left, and how the rest of the graph seems to be dominated by red dots. Part of this is due to the fact that Ohio has 21 more counties than Pennsylvania. However, this cannot explain the entire pattern. In fact, 67% of all Pennsylvania counties have median white incomes less than $40,000, and a white population of at least 90%. The same is true of just 45% of Ohio counties. This implies that we might see less variation in county-by-county results in Pennsylvania than we saw in Ohio.
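For readers who want to reproduce this kind of homogeneity check, here is a minimal sketch. The function name and the four counties in the example are made up for illustration; they are not the census figures behind the 67% and 45% quoted above.

```python
def share_meeting_both(counties, max_income=40_000, min_white_pct=90.0):
    """Fraction of counties with median white income below max_income
    AND a white population share of at least min_white_pct.

    `counties` is a list of (median_white_income, white_pct) pairs.
    """
    if not counties:
        raise ValueError("need at least one county")
    hits = sum(
        1
        for income, white_pct in counties
        if income < max_income and white_pct >= min_white_pct
    )
    return hits / len(counties)


# Hypothetical counties, NOT real data: (median white income, % white)
example = [(35_000, 95.0), (45_000, 95.0), (35_000, 80.0), (39_000, 90.0)]
print(share_meeting_both(example))  # 2 of the 4 made-up counties qualify
```

Running the same function over the real county tables for each state would give the two percentages being compared.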

By and large, however, it appears that Pennsylvania and Ohio have similar values for these two variables - both on a statewide and countywide basis. Using Ohio as our guidepost, we can make a rough, baseline estimate for what will happen in Pennsylvania.

First, we use ordinary least squares regression to build a simple yet powerful predictive model of the Ohio returns using the two variables listed above, plus one more. To account for Obama's strong showing in the youth vote - which can "upset" the "typical" result in a county with a large college-aged population - we include the percentage of residents aged 20 to 24.

Here's what our hypothesized, generic model looks like:

Clinton's Margin of Victory (Or Defeat) In a County = Baseline + Median White Income in County + Percentage of African Americans in County + Percentage of Residents Aged 20-24 in County + Unaccounted for "Error"

The regression method assigns specific weights to each predictive variable - thus giving us a mathematical equation.* In the end, the model accounts for 70% of all variation in countywide vote returns in Ohio. All three variables are statistically significant, which means it is very unlikely that the relationships between Clinton's margin of victory (or defeat) and these three factors are products of chance.
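To make the mechanics concrete, here is a rough sketch of how ordinary least squares assigns those weights and how the share of variation explained is computed. It is not the actual model: the helper names are my own for illustration, and the seven "counties" and their margins are invented. It solves the normal equations directly, which is fine for a handful of variables like these.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are left unmodified.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return (coefficients, r_squared); coefficients[0] is the baseline."""
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


# Invented data, NOT real Ohio figures:
# (median white income in $ thousands, % African American, % aged 20-24)
counties = [
    (34.0, 1.0, 6.0), (38.0, 3.0, 7.0), (42.0, 8.0, 6.5), (47.0, 12.0, 9.0),
    (52.0, 20.0, 8.0), (58.0, 5.0, 14.0), (36.0, 2.0, 11.0),
]
clinton_margin = [28.0, 21.0, 12.0, -3.0, -14.0, -20.0, 15.0]  # invented

weights, r_squared = fit_ols(counties, clinton_margin)
print("baseline and weights:", weights)
print("share of variation explained:", r_squared)
```

The second number printed is the analogue of the 70% figure: the fraction of county-to-county variation in the margin that the fitted equation reproduces.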

We can take this model and apply it to Pennsylvania. Here's what we do. We plug the values of percentage of African American, median white income, and percentage of young residents for every Pennsylvania county into this model. This generates a prediction of how Obama and Clinton will fare in all counties. Next, we take a weighted average of these counties. Counties with more registered Democrats (as of November 2007) are weighted proportionally more heavily than counties with fewer registered Democrats. This provides us not only with a prediction for each county, but also for the whole state.
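The weighting step is simple enough to sketch in a few lines. The function name is hypothetical and the three counties below are invented round numbers, chosen only to show how a large county can dominate the statewide figure.

```python
def statewide_margin(county_margins, registered_democrats):
    """Average county margins, weighting each by its registered Democrats."""
    if len(county_margins) != len(registered_democrats):
        raise ValueError("need one weight per county")
    total = sum(registered_democrats)
    if total <= 0:
        raise ValueError("total registration must be positive")
    weighted = sum(m * w for m, w in zip(county_margins, registered_democrats))
    return weighted / total


# Invented example: predicted Clinton margins (points) and registration counts
margins = [-30.0, 10.0, 20.0]
democrats = [600_000, 100_000, 300_000]
print(statewide_margin(margins, democrats))  # -11.0
```

Note that a simple unweighted average of these three margins would be 0, while the weighted figure is -11: the one big county outvotes the two smaller ones.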

This model predicts that Clinton should do roughly as well as she did in Ohio. Obama does well in metropolitan Philadelphia, but the model predicts Clinton to be strong through the rest of the state. Ultimately, the basic intuition of this prediction flows from the similarities between Pennsylvania and Ohio. These variables were found to be important factors in Ohio; Pennsylvania and Ohio have roughly similar distributions for these variables; and so, unsurprisingly, Pennsylvania performs similarly to Ohio.

We must not over-interpret these results! This is just meant as a rough, baseline gauge for what will happen. There are important reasons to anticipate differences between this estimate and the final result. Data limitations prevent us from accounting for these differences, and thus inhibit further refinements. Endeavors like this are inherently about doing the best we can given the data we have. Thus, it is important to understand what we have done and what we have not done. Please see this footnote.

Pennsylvania is, of course, a large and diverse state. What might we expect region-by-region? Let's review this by breaking it down by congressional districts. We'll start in the east and work our way west.

Obama should do well in PA 01 (the state's only minority-majority district) and PA 02, which together comprise most of the city of Philadelphia. There are four affluent congressional districts - PA 6, 7, 8, and 13 - that comprise most of the Philadelphia suburbs and portions of the city itself. These are the wealthiest districts in the entire commonwealth - so Obama should be relatively strong here. But how strong? Keep an eye on Bucks and Montgomery counties. Frequently, Philadelphia and its suburbs report their returns before the rest of the commonwealth. If that holds true for the April 22nd primary - the results here should give us a sense of what kind of night we are in for. If Obama scores big wins in one or both, the final results might be close. If Clinton pulls roughly even with him, or beats him outright - she should have a good night.

The fast-growing southeast corner of the state - PA 16 (Lancaster) and PA 19 (York) - should be competitive. As a point of comparison, Carroll, Harford, and Cecil counties - directly to the south in Maryland and demographically/economically similar - split their votes between Clinton and Obama last month. Of course, it is hard to see these areas having a major influence in the overall outcome of the election. Both districts are heavily Republican. As this is a closed primary, neither district should be a big factor.

Another potentially competitive area could be the Lehigh Valley (PA 15). Historically, this area has been identified with big industries, whose workers we would expect to prefer Clinton. However, in the last twenty years there has been an influx of new jobs here. This might favor Obama. One wild card will be the relatively high share of Hispanics in the district. Will they come out to vote?

As we move north and west across the state, Clinton's margins should improve. The state becomes whiter and poorer. She should do well in Scranton and Wilkes-Barre (PA 11), Harrisburg (PA 17), Altoona (PA 9), and Johnstown (PA 12). She should do well in expansive PA 10, the rural northeast section of the state. Of course, this area is heavily Republican in its presidential politics, so its effect should be limited. Ditto the even larger PA 5 in the center of the state, though Penn State will help Obama in Centre County.

The western portion of the state heavily favors Clinton. In the northwest is PA 3, which stretches from Erie to the northern edge of metropolitan Pittsburgh. Clinton should do as well here as she did in Youngstown, just over the state line. For his part, Obama should do well with some of the wealthier suburbs of Pittsburgh, like Fox Chapel and Mt. Lebanon, which help comprise PA 4 and 18. He should also do well in exurban locations like Cranberry and North Huntingdon townships.

However, greater Pittsburgh is not nearly as prosperous as greater Philadelphia. Expect Clinton to hold her own in these congressional districts, and to do well in Washington, PA in the south (PA 12). Though there are some "upscale" white communities that will aid Obama - on balance, the economic situation of white voters in greater Pittsburgh should incline them to Clinton.

The city of Pittsburgh itself - namely PA 14 - should be a study in the racial divisions we have seen in this contest. There is a large African American population here, but whites in the city outnumber African Americans 2.5 to 1. On balance, Clinton should have an edge. African American neighborhoods like East Liberty and the Hill District will probably go heavily for Obama. White neighborhoods will probably divide by income. Shadyside will probably go for Obama, while Brookline and Bloomfield will probably go for Clinton. The trouble for Obama is that the kinds of voters in Brookline and Bloomfield are more typical than those in East Liberty or Shadyside.

Another advantage for Clinton in metropolitan Pittsburgh is that it is older than the rest of the nation. People aged 65 or older comprise about 12% of the nationwide population. In the seven counties of greater Pittsburgh (Allegheny, Armstrong, Beaver, Butler, Fayette, Washington, and Westmoreland), the elderly make up 17% of the total population. All in all, greater Pittsburgh is older, poorer, and whiter. So long as Hillary Clinton does not insult Ben Roethlisberger or Sidney Crosby, she should do well.


[*] Socioeconomic status can be complicated to capture because it is composed of many, interrelated factors. We have used median white income to measure it on this site. Obviously, using a single variable to gauge a multi-variable concept like this is not always optimal. On the other hand, using many, closely related metrics to explain a relatively small amount of marginal variation isn't either. Accordingly, we use white median income. As individual variables go, it can probably capture most of the effect that socioeconomic status is having on these vote returns.

[*] Actually, we build two models. The first uses the percentage of African Americans as an independent variable. The second uses the percentage of whites. The two produce essentially the same results, although the first is slightly (but not significantly) more precise. Diagnostic tests indicate that both are "BLUE."

[*] I am not going to offer an actual number for fear that it might be misunderstood. I have employed this method only to provide a rough gauge of what to expect. I cannot stress this enough. The more I apply these kinds of quantitative methods to election results, the less confident I am that such methods can supply anything more than a basic understanding of the dynamics. The danger is that the methods themselves often seem to offer real precision. If you do not approach the results with care and caution - you can run into trouble. In particular, OLS regression is a predictive tool that can offer dangerous illusions of precision.

In this instance, there are two important problems.

First, the model's predictive power in Ohio - 70% - is at the same time impressive and insufficient. The fact that three variables can explain 70% of the changes in over 80 counties is a sign that these are crucial factors in understanding how the Buckeye State voted. However, 30% of the variation in Ohio is left unexplained. In a primary election, this can make all the difference!

Second, and as noted above, this model is a baseline. It was constructed around the results in Ohio. Thus, we have assumed that there are no statewide differences between the two states. As we allow for such differences between the two states, this estimate becomes inaccurate.

In fact, we know that there are such differences. The problem is that we cannot measure them very well. For instance, Pennsylvania's primary is closed while Ohio's was semi-open (i.e. Independents were allowed to vote). This could shift the vote margin in Pennsylvania in one direction or the other.

Another important factor that simply cannot be captured is the possibility of momentum. That's not to say that there will be a momentum effect that shifts Pennsylvania one way or the other - it is only to say that there might be. If momentum were to make an appearance, it would by definition imply a shift in the voting patterns of these demographic groups.

Another potential factor is changes in median white income. The income data we have is from the last census. This could create fuzziness for both the model itself and its predictions for Pennsylvania. Recall that the model was built via Ohio. If Ohio counties have experienced changes in median income relative to one another since the census was conducted, our income variable is not picking it up. In this case, the model does not perform as well as it would if we had current data. There could be trouble applying the model to Pennsylvania as well. If Pennsylvania counties are now systematically wealthier than Ohio counties - this would shift Pennsylvania closer to Obama. Of course, the size of the shift would probably be small. If growth in median white income in Pennsylvania has outpaced growth in Ohio by $1,000 in the last eight years (in 1999 dollars) - you would only see a statewide shift of about 1.4%.
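The arithmetic behind that last figure can be sketched as follows. The slope of 1.4 points per $1,000 is simply read back out of the sentence above; it is an assumption for illustration, not a published coefficient, and the function name is my own. Because the model is linear, the implied shift scales in proportion to the income gap.

```python
# Assumption: inferred from "a $1,000 gap gives a shift of about 1.4%".
IMPLIED_POINTS_PER_1000_DOLLARS = 1.4


def shift_toward_obama(extra_income_dollars,
                       points_per_1000=IMPLIED_POINTS_PER_1000_DOLLARS):
    """Statewide shift (in points) implied by a linear income effect."""
    return extra_income_dollars / 1000.0 * points_per_1000


for gap in (500, 1000, 2000):
    print(gap, round(shift_toward_obama(gap), 2))
```

Even a gap of a couple thousand dollars, on this reading, would move the statewide estimate by only a few points.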

Joel Kotkin suggested another potential difference at the Politico the other day:

[B]eneath the similarities (between Ohio and Pennsylvania) lie important and perhaps critical differences. Sen. Clinton's new message of old style pessimism not surprisingly played well in Ohio in large part because [of its] stronger ties to an old-line Great Lakes auto industry now in free-fall. [snip]

In contrast, Pennsylvania's three percent job growth since 2003 - admittedly below the national average - has been jackrabbit fast compared to the Buckeye State's pathetic .5 percent. Most importantly, no place in Ohio remotely corresponds to the size, scale and complexity of the greater Philadelphia region, with its large concentrations of high-end technology and business service employment.

This is an intriguing suggestion, and it could be of relevance. What might be operable here are perceptions of the statewide economy. If Pennsylvania Democrats are more bullish than Ohio Democrats about their personal prospects - Obama might be aided. Of course, median income probably captures at least some of this phenomenon. Wealthier counties probably view the economy more favorably than poorer counties, regardless of the state. Nevertheless, Kotkin's point here is intuitively plausible. If he is right, then because this model fails to account for this difference in perception, we would see a shift from the baseline toward Obama.

There may be other important statewide differences that complicate an effort to make a precise prediction. Thus, we should only use this model to enhance our baseline understanding. If we make it out to be more than this, we run the risk of having a false understanding of the dynamics of the contest.

-Jay Cost