RealClearPolitics HorseRaceBlog

By Jay Cost

On the State of the Democratic Race

The Democratic race is as tight as can be. Clinton and Obama currently split the pledged delegates as well as the popular vote. Why is this the case? What types of voters are coalescing around these candidates? Prior to Super Tuesday, I offered several essays on the matter (see here, here, here, here, and here). However, I was not able to offer conclusive statements. There simply had not been enough contests. Thanks to Super Tuesday, matters have changed. It is now possible to submit something more complete.

Since the publication of Berelson, Lazarsfeld, and McPhee's Voting in 1954, the study of groups has been a cornerstone of voting analysis. Voters frequently behave the way that others who share their relevant characteristics behave. If we know what those characteristics are, we can understand voter behavior better.

Currently, we have two types of data available for understanding group behavior in the Democratic race. We have micro-level survey data that comes to us in the form of the exit polls. We also have macro-level data that comes from the statewide results. We shall use both sets to understand group behavior. This puts us in a good position: because neither set is perfect, one can supplement the other. In particular, the exit poll data is incomplete. The media only releases select results from the exit polls - so that limits its utility. The media also tends not to poll caucus states, which have favored Obama to date. Thus, our analysis of the exit polls will "skew" somewhat toward Clinton. We should account for this when we interpret the data. We should also use the macro-level data to confirm the conclusions we draw from this micro-level set.*

Methodological considerations aside, what can the exit polls tell us about the coalitions of Clinton and Obama?

On the one hand, they confirm much of what we already knew. Consider the following chart, which reviews Obama and Clinton's share of several relevant groups weighted by their share in the population across all the voting states*:

Chart 1.jpg

At first blush, we see the same picture we have seen since New Hampshire. Clinton is winning a "traditional" Democratic voting coalition. It is centered around women and Hispanics - and includes voters with lower incomes, self-identified Democrats, union workers, and Catholics. Obama, for his part, is drawing a combination of the Jesse Jackson and Gary Hart voting coalitions - African Americans, wealthier voters, self-identified Independents, non-union workers, and white men.

However, there are some peculiar features here. In particular, Obama wins African Americans overwhelmingly, yet Clinton wins voters who make less than $50,000. This is noteworthy, given that nationwide African Americans make less than whites. Notice also that he loses white voters even though he wins Independents decisively. Again, Independents tend to be white. This implies that there might be some distinctions that the data as it is presented is simply not picking up.

Accordingly, I reexamined the same demographic groups, this time dividing the data according to southern and non-southern subsets. The results are arrayed in the following table:

Chart 2.jpg

As you can see, splitting the data according to region uncovers some intriguing divergences. Note first that in the South, Obama wins every demographic group except whites. He wins voters regardless of sex, income, party registration, or union affiliation. Why? African Americans. They have comprised about 41% of the southern vote to date - and they have broken for Obama by 69 points. In the South, we can also see the racial gap we saw in the earlier chart grow larger. Clinton wins all white voters by 13 points. She wins southern whites by 28 points.

Look outside the South - to the two columns on the right. In the non-southern states, we find a fascinating twist. The racial gap transforms into a typical gender gap. Obama wins white men by 8 points. Clinton wins white women by 19 points.

This implies that race is playing a different role depending upon the location of the contest. In the South, there is a racial divide. Clinton wins white voters. Obama wins African American voters. When African Americans make up a strong share of the vote (e.g. Alabama, Georgia, Louisiana, and South Carolina), he wins. When they do not make up a strong share (e.g. Oklahoma and Tennessee), she wins.

In the non-South, matters are more complicated. African Americans still go heavily for Obama - but whites are split. White men prefer Obama, white women prefer Clinton. By itself, this favors Clinton because white women go more strongly for Clinton than white men go for Obama; what is more, white women consistently make up a larger share of the vote. Of course, if the groups among which Obama does well are populous in a given state - he overcomes this gender gap. If Clinton's groups are strong, he doesn't.

Remember that we have excluded many caucus states that broke heavily to Obama. This implies that Obama probably does better than the above results suggest. Where he specifically does better, and how much better he does - we cannot know. However, the macro-level data can offer a way to confirm the demographic trends we have found.

Accordingly, I have run an ordinary least squares (OLS) regression analysis based upon the statewide results. OLS regression is a statistical tool that tests whether an explanatory variable accounts for variation in a dependent variable, controlling for other explanatory variables.* It thus provides a way to determine if certain demographic groups are separately influencing the results in a given state. Our dependent variable is the difference between Obama and Clinton's share of the results. We shall test several independent variables:

(1) Median income of whites per state. Obama's strength among non-southern white males, Independents, and higher income voters suggests that white voters break for Clinton or Obama along economic lines. The theory tested here is that as white voters make more money, they become more inclined to vote for Obama.

(2) Whether the state holds a primary or a caucus. Obama seems to do better in caucuses - perhaps because it takes more dedication to attend a caucus, and his voters seem to be more intense.

(3) Number of candidate visits. This is designed to measure campaign effects per state. It stands to reason that the more candidates visit a state, the more money they have poured into the state. This might influence voter turnout and therefore state results.

(4) Whether a state is "homogeneously" white. This speaks to an intuition I expounded after the South Carolina results. My theory is that white voters in states with large white majorities see Obama as an insurgent, independent-minded candidate. Meanwhile, voters in more diverse states see him differently. It was noted above that Obama picks up pieces of the coalitions won by Jesse Jackson and Gary Hart. Perhaps white voters tend to see Obama as Jackson or Hart depending upon the racial demography of their environment. Accordingly, "homogeneity" is measured via whether Hispanics and African Americans together constitute more or less than 10% of the population.

(5) Whether the state is a Southern state. There seems to be a unique effect among white voters depending upon the region. According to the exit polls, Clinton does better among southern whites than she does among northern whites. This variable will catch distinctions that show up on the macro level.

(6) Percentage of union workers per state.

(7) Percentage of Catholics per state.

(8) Percentage of African Americans per state.

(9) Percentage of Hispanics per state.

Again, these variables were used to predict the difference between Obama and Clinton's returns per state. The results are interesting. The model accounts for 69% of all state-by-state variation between the two candidates [that is the "adjusted r-squared" value]. What is more, eight of the above nine variables were found to be statistically significant at the 95% confidence level (or greater). The only exception was the percentage of Hispanics per state.* This means that we can be confident that all but one of them have influenced the state-by-state results.*
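For readers who want to see the mechanics, here is a minimal sketch of an OLS fit and the adjusted r-squared calculation. It is not the model used in this essay: the data are synthetic, only three of the nine variables are imitated, and the names (`ols`, `solve_linear_system`, `caucus`, `pct_black`, `white_income`) and the "true" coefficients in the simulation are arbitrary assumptions chosen for illustration. It also leaves out the significance tests.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so row operations apply to both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(rows, y):
    """Return (coefficients, r_squared, adjusted_r_squared), intercept first."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    n, k = len(X), len(X[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, X[i])) for i in range(n)]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    p = k - 1  # explanatory variables, not counting the intercept
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return beta, r2, adj_r2

# Synthetic "states" (NOT real data): the dependent variable imitates
# Obama's share minus Clinton's share, in points.
random.seed(0)  # fixed seed so the example is repeatable
rows, margins = [], []
for i in range(30):
    caucus = 1.0 if i % 3 == 0 else 0.0       # dummy: 1 = caucus, 0 = primary
    pct_black = random.uniform(1, 35)         # percent of population
    white_income = random.uniform(35, 70)     # thousands of dollars
    noise = random.gauss(0, 8)
    margin = -40 + 20 * caucus + 0.8 * pct_black + 0.5 * white_income + noise
    rows.append((caucus, pct_black, white_income))
    margins.append(margin)

beta, r2, adj_r2 = ols(rows, margins)
print("coefficients (intercept first):", [round(b, 2) for b in beta])
print("r-squared:", round(r2, 3), "adjusted:", round(adj_r2, 3))
```

The adjusted figure is never higher than the plain r-squared, because it penalizes the model for each explanatory variable it uses. That penalty matters when there are nine variables and only a few dozen contests.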

What exactly did we find?

(1) As the median income of white voters increases, Obama does better. This is consistent with the hypothesis offered above: wealthier whites are attracted to Obama, poorer whites are attracted to Clinton.

(2) Obama does better in caucuses than in primaries. This was the strongest predictor of all explanatory variables, which is not surprising in light of Obama's large victories in the caucus states.

(3) Clinton does better as the number of candidate visits increases. This was a bit of a surprise, but it is good news for her. Campaign effects seem to incline the electorate to her.

(4) Obama does better in states that are "homogeneously white." This is consistent with the hypothesis we offered: white voters in "homogeneous" states see Obama differently than white voters in heterogeneous states.

(5) Clinton does "better" as we move to the South. This might sound counter-intuitive. However, remember that we included this variable to account for the inclination of southern whites to go for Clinton. Obama's strength in the south is accounted for by the African American variable.

(6) As the union population increases, Clinton does better.

(7) As the Catholic population increases, Clinton does better.

(8) As the African American population increases, Obama does better.

Regression models like these have two important uses. First, they enable us to predict what will happen in the future. That was not the intention here. The point was not to offer predictions about what will happen next - and rightly so. The model's predictive power (69%) is very high from a certain perspective. From another perspective, though, its accuracy is not great enough to admit of "publishable" predictions - not when candidates are often separated by tiny margins. Tomorrow, I hope to take some tentative steps toward reviewing what to expect in the upcoming contests. The model (refined by today's results) will serve as a foundation for this analysis - but it will not be used for simple divination. It is simply not precise enough.

Another use of regression models is that they isolate and identify influential factors. This model definitely serves this purpose - it confirmed much of what the micro-level analysis showed and it elucidated some new trends. All in all, we have made some important steps - especially when we combine the macro analysis with the micro analysis. We have found that both candidates are putting together diverse voting coalitions that differ according to region. There is evidence that Obama wins Independents, African Americans, white males in the North, "upscale" white voters, and white voters in homogeneously white states. He also seems to do well in caucus states where enthusiasm is a factor. There is evidence that Clinton wins Democrats, Hispanics, white females everywhere, white males in the South, "downscale" white voters, Catholics, and white voters in heterogeneous states. She seems to do better in a state the more attention is paid to it.

As I said, it is not clear which candidate's voting coalition will be larger when all is said and done. Both of them are diverse and quite large. We will talk tomorrow about what to expect from these coalitions moving forward.


[*] Another caveat is appropriate to mention here. We have exit polling data from twenty-three states. Data from eighteen of them is used - excluding Florida and Michigan because the states were not contested. Also excluded are Illinois, New York, and Arkansas because each of them voted heavily for the "favorite son" or "favorite daughter." Our goal is to draw an inference from this data to the rest of the contests. We cannot do that if our data has major exceptions that will not recur between now and the convention.

[*] These figures were computed in the following way. A given demographic group in a state was weighted first by its contribution to that state's electorate, and second by the number of pledged delegates the state has. This method is different from a simple unweighted average of each statewide result, which would diminish the importance of bigger states and enhance the importance of smaller states.
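A minimal sketch of one way to read this weighting, assuming the two weights are simply multiplied together. The function names and the two example states are hypothetical, not figures from the exit polls.

```python
def weighted_group_share(records):
    """Average a candidate's share of a group across states.

    Each record is (candidate_share_of_group, group_share_of_electorate,
    pledged_delegates). The weight for a state is assumed to be the
    product of the last two values.
    """
    numerator = 0.0
    denominator = 0.0
    for candidate_share, group_share, delegates in records:
        weight = group_share * delegates
        numerator += weight * candidate_share
        denominator += weight
    return numerator / denominator

def simple_average(records):
    """Unweighted mean of the statewide results, for comparison."""
    shares = [candidate_share for candidate_share, _, _ in records]
    return sum(shares) / len(shares)

# Hypothetical numbers: a large state where the group is 30% of the
# electorate, and a small state where it is 10%.
example = [
    (40.0, 0.30, 300),  # candidate wins 40% of the group; 300 delegates
    (60.0, 0.10, 20),   # candidate wins 60% of the group; 20 delegates
]
weighted = weighted_group_share(example)
unweighted = simple_average(example)
print(round(weighted, 1), round(unweighted, 1))
```

In this made-up case the large state dominates the weighted figure, while the simple average treats the two states as equals.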

[*] A difficulty here is that several caucus states report state or county delegate results, not raw votes. The model counts these delegate results as though they are raw vote results. This is obviously not ideal - but the inferential damage from this choice seems minimal. It does not appear that there are any differences from caucus-to-caucus, depending upon whether they report delegates or votes. Obama tends to win a large share of both. So, it does not seem to have a particular effect on caucus states - and the difference between caucuses and primaries is captured via the corresponding dummy variable. An alternative approach would be to exclude the delegate-reporting caucus states from the analysis - but this would exclude several observations, thus reducing efficiency. There are often judgment calls like these to make when dealing with non-experimental data - choosing between running a risk of bias or a risk of inefficiency. Since the bias effect seems to be minimal, and the inefficiency seems to be more sizable - it seems best to include the delegate-reporting states.

[*] Why was our Hispanic measure found to be insignificant? One reason might be that there is not a lot of state-to-state variation in the Hispanic population. Some states have large Hispanic populations, but most of them have uniformly low populations. As an explanatory variable stops varying, it becomes harder for OLS regression to pick up on its effect.
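The point about variation can be illustrated with the textbook formula for a regression with a single explanatory variable, where the standard error of the slope is the residual standard deviation divided by the square root of the summed squared deviations of x. The sketch below assumes that one-variable case (a model with several variables adds a further adjustment for correlation among them), and the two lists of percentages are hypothetical.

```python
import math

def slope_standard_error(x, sigma):
    """Standard error of the slope in a one-variable OLS regression.

    sigma is the residual standard deviation, treated here as known.
    """
    x_bar = sum(x) / len(x)
    sum_sq_dev = sum((xi - x_bar) ** 2 for xi in x)
    return sigma / math.sqrt(sum_sq_dev)

# Hypothetical state-level percentages, same noise level in both cases.
widely_varying = [2, 10, 18, 26, 34, 42]
nearly_uniform = [2, 3, 2, 3, 2, 3]

se_varying = slope_standard_error(widely_varying, sigma=8.0)
se_uniform = slope_standard_error(nearly_uniform, sigma=8.0)
print(round(se_varying, 3), round(se_uniform, 3))
```

The less the variable moves from state to state, the larger the standard error, and the harder it is for any real effect to clear the bar for statistical significance.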

[*] One possible objection to this analysis is that previous results influence subsequent results - and therefore this kind of "cross-sectional" investigation is missing a key explanatory variable. This possibility was tested, and it was found to be unlikely. In particular, the outcome of the immediately prior statewide result was temporarily included as an explanatory variable. This factor was found not to have any influence. A test was also conducted to see whether the model does a better job predicting results prior to Super Tuesday versus results afterwards - which might be the case if the early "beauty contests" influenced the states of February 5th. The model seems to predict both types of states equally well. The upshot of this is consistent with the conventional wisdom that the Democratic race has been bereft of momentum to date.

-Jay Cost