

Assessing the Generic Ballot

By Jay Cost

In May, I authored an essay that took issue with the use that pundits make of the generic ballot. I made two points. The first was that pundits should be cautious in their use of it because, over its history, it has sported a large, sustained Democratic skew. This makes it quite possible to find a Democratic false positive - which pundits have managed to find in, by my count, seven of the last eight House elections. My second point was that - skew or no skew - the ballot has problems that should make us wary of it. It does not predict final results nearly as well as it should, so we should be skeptical about following it as far as it will take us.

As the use of the generic ballot among pundits persists, I thought it time to amplify my remarks - and strongly encourage them to back away from its use until closer to Election Day.

But first, if you are unfamiliar with the potentials and the pitfalls of the generic ballot, check out Professor Charles Franklin's recent and excellent introduction to the subject here. His blog, Political Arithmetik, is a great place to go for a scholarly perspective on media polls. It is one of my (very few) daily "must reads."

Let us start with the skew in the generic ballot.

What do I mean by "the skew?" I mean the following: the generic ballot persistently overestimates the size of Democratic support and persistently underestimates the size of Republican support. That means that pundits, insofar as they are relying upon the generic ballot, follow suit. However, we do not have to. If we have a reasonable expectation of what that skew will be - we can make a prediction about what will occur in November. All we have to do is subtract the amount of inflation from the Democratic majority.

Currently, the average June/July Gallup generic ballot of "national adults" shows the Democrats leading the GOP 51.8% to 38.4%. If we take only the people who are registering a party preference (what is known as the "two-party vote"), we can see that the Gallup generic ballot shows the Democrats leading 57.4% to 42.6% among people who prefer either the Democrats or the Republicans. That amounts to a very hefty 14.8-point lead.
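For readers who want to check the two-party arithmetic, here is a minimal Python sketch. The function name `two_party_shares` is only an illustrative label, not anything Gallup uses; the two input figures are the ones quoted above.

```python
def two_party_shares(dem_pct, rep_pct):
    """Rescale two raw poll percentages so that they sum to 100."""
    total = dem_pct + rep_pct
    if total <= 0:
        raise ValueError("combined support must be positive")
    dem = 100.0 * dem_pct / total
    return dem, 100.0 - dem


# Raw June/July averages quoted in the text.
dem, rep = two_party_shares(51.8, 38.4)
print(f"{dem:.1f}% to {rep:.1f}%")
```

Subtracting the rounded shares (57.4 and 42.6) gives the 14.8 figure quoted above; computed from the unrounded shares, the gap comes out slightly higher, at roughly 14.9 points.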

But this does not factor in the skew.

Historically speaking, when the Democrats have that kind of edge in June/July, by November their victory in the popular vote "shrinks" to a much more modest 51.75% to 48.25%.

In other words, today's Gallup generic ballot does not predict a Democratic blow-out. Not at all. It predicts another squeaker on the order of Bush v. Kerry. Bush's share of the two-party vote in 2004 was 51.2%. Kerry's was 48.8%. Michael Barone's "49-49 Nation," if you believe the generic ballot, has not actually gone anywhere. This year will be Round 3.

This is what I mean when I say that the generic ballot has a large, sustained skew about which pundits should be cautious. I was not kidding.

The next question is: should we believe this result? I mentioned earlier that if we have a reasonable expectation of the skew, we can still use the generic ballot. But, is this expectation reasonable? There are reasons to think that the answer is no. I acquired this estimate by using a statistical process known as "ordinary least squares regression." Regression produces a predictive model so that we can input the value of today's generic ballot, and receive as an output a prediction of the final result. Regression requires certain assumptions to be a valid process - and in the case of the generic ballot, two of the assumptions seem to be violated.
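For readers who have not seen a regression before, the idea can be sketched in a few lines of Python. This is only an illustration: the numbers below are invented, they are not Gallup's historical data, and the fitted line is not the model behind the estimate above. The names `fit_ols` and `predict` are likewise just labels chosen for this sketch.

```python
def fit_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# HYPOTHETICAL values, for illustration only (not real polling data).
# x: Democratic two-party share in a summer generic ballot
# y: Democratic two-party share of the November House vote
hypothetical_summer_poll = [50.0, 54.0, 58.0, 62.0]
hypothetical_november_vote = [47.0, 50.0, 53.0, 56.0]

intercept, slope = fit_ols(hypothetical_summer_poll, hypothetical_november_vote)


def predict(summer_share):
    """Return the fitted November share for a given summer poll share."""
    return intercept + slope * summer_share


# Feed in a summer reading and read off the model's prediction.
print(predict(57.4))
```

The point of the sketch is the workflow: past pairs of (summer poll, final result) go in, a line comes out, and today's poll is plugged into that line. Whether the line can be trusted depends on the assumptions discussed next.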

The validity of this model is predicated upon our expectation that the generic ballot has an average level of predictive power. Of course, since this is a model - we expect the predictive power to vary from year to year around this average. Some years the generic ballot will be more accurate than in other years. In other words - we expect the amount of "error" (i.e. the difference between what our model predicts and what actually happens) to fluctuate.

But we also have expectations of this error. One expectation is that its rate should be roughly constant. We should not have one set of observations where our model has only a little bit of error - and another set of observations where there is a great deal of error. That would mean that our model does not perform equally well across all observations. Another expectation has to do with over- and underestimation. We expect our model to sometimes overestimate the true value and to sometimes underestimate the true value. But there should be no pattern to this over- and underestimation. For instance, we should not expect it to underestimate the first ten observations and then overestimate the second ten.

The generic ballot seems to have both of these problems. Specifically, as it makes a larger prediction for the Democrats, the model's error rate tends to increase. Furthermore, as it makes a larger prediction for the Democrats, the model tends to underestimate their final result. What does that mean for the 51.75/48.25 prediction? It means that it is not valid.

Why is this the case? It gets back to the skew. As the Democratic lead in the generic ballot increases, two things happen. First, the size of the skew increases. When the Democrats have a small lead - the generic ballot overestimates the Democrats' share by an average of 9%. When they have a big lead - it overestimates their share by an average of 14%. Second, the variability of the skew increases. When the Democrats have a small lead, the skew varies between 0% and 11%. When they have a big lead, it varies between 2% and 24%. Both of these are violations of assumptions needed to use the generic ballot as a predictive tool. Thus, we cannot use it as one. Is it possible to salvage it - to "reconstruct" the data in such a way that it does not violate these assumptions, or to use another statistical test whose assumptions it does not violate? It might be - but to my knowledge nobody has, as of yet, found a way to do it. Without this "reconstructive surgery," the generic ballot remains an invalid indicator of election results.
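A rough way to look for this pattern is to split past elections into "small lead" and "big lead" groups and compare the average and the range of the skew in each. The Python sketch below does that. The observations in it are invented purely for illustration (they are not the Gallup series), and the cut-off of 10 points and the name `skew_by_bucket` are assumptions made for this example.

```python
from statistics import mean


def skew_by_bucket(observations, threshold):
    """Summarise the skew separately for small and big Democratic leads.

    `observations` is a list of (dem_lead, skew) pairs, both in points.
    Each bucket must contain at least one observation.
    """
    buckets = {
        "small lead": [skew for lead, skew in observations if lead < threshold],
        "big lead": [skew for lead, skew in observations if lead >= threshold],
    }
    return {
        label: {"mean": mean(skews), "min": min(skews), "max": max(skews)}
        for label, skews in buckets.items()
    }


# HYPOTHETICAL (dem_lead, skew) pairs, for illustration only.
hypothetical_observations = [
    (2, 1), (4, 6), (6, 8),        # years with a small summer lead
    (12, 4), (16, 12), (20, 20),   # years with a big summer lead
]

summary = skew_by_bucket(hypothetical_observations, threshold=10)
for label, stats in summary.items():
    print(label, stats)
```

If the mean rises from one bucket to the next, the overestimation is growing with the lead; if the gap between min and max widens, its variability is growing too. Either pattern is a warning sign for a simple regression.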

I think that the reason for both violations relates to the presence of non-voters who are registering a party preference. We know two things about non-voters that are relevant for this discussion - they are more inclined to the Democrats than voters are, and they know less about politics than voters do. This could make all the difference.

Suppose, for instance, that the national mood favors the Democrats. This would induce both voters and non-voters to move into the Democratic column when queried by a pollster - but if the non-voters move into the Democratic column at an exponentially higher margin, the rate of overestimation would increase. If 3 voters and 9 non-voters moved into the Democratic column in one year, and 4 voters and 16 non-voters moved into the Democratic column in another year - the overestimation of Democratic gains would not be the same between the two years. Measured against the true gain among voters, it would go from 300% to 400%. Also, suppose that the rate of non-voter support for the Democrats, even though it is higher than the rate of voter support, varies much more dramatically because non-voters pay less attention to politics. In some years, 9 non-voters support the Democrats for every 3 voters; in other years, 27 non-voters support the Democrats for every 3 voters. That would make the extent of overestimation bounce around quite a bit. The more likely non-voters are to be in the mix, the more likely we will see the overestimation vary.

In other words - the fact that non-voters are registering support for the Democrats might be what makes the generic ballot an invalid measure. Not just skewed. Not just pro-Democratic. Not just in need of a slight corrective. But invalid. As in - don't-trust-it-because-it-will-shoot-you-and-your-dog-and-leave-you-both-for-dead invalid.

Here's the kicker: this type of problem is not necessarily limited to the generic ballot. It could be in any poll of any size and scope. So long as (a) respondents are asked, implicitly or explicitly, to register a party preference, and (b) the pollster is not able to isolate and exclude non-voters from the sample - we should suspect that poll to be invalid as well.

My intuition is that a great deal of summer polling is indeed invalid in this way - because, even though pollsters try to eliminate non-voters from the sample, they are harder to spot so far from Election Day. Almost everybody is "paying attention to the campaign" right now because there is no campaign right now. Almost everybody claims that they will vote right now, but it is easy to claim you will do something when you have 90 days until you actually have to do it. If the pollsters are not spotting the non-voters, any question about party preference in November would probably be invalid in the same way.

How can we confirm that these problems are present in other polls? Unfortunately, we cannot. I was only able to isolate the problems in the Gallup generic ballot of "national adults" because I had 24 elections' worth of data to work with. Gallup is the only polling firm that has been asking the generic question for a sufficient length of time - and they only offer historical data about "national adults." What about their "registered voter" model? Who knows whether this problem is extant? The data is not available. But if that model is not getting rid of non-voters, it will still have these validity problems. Ditto for any generic ballot of any size for any type of voter by any polling firm. Insofar as a poll is not isolating and excluding the non-voters, the poll's questions about partisan preferences might not be valid indicators of election results.

Ditto also for any candidate-related polls. They might have the same problem, and we would never know. With questions that ask for partisanship in the context of candidates - e.g. "Do you support Republican Mike DeWine or Democrat Sherrod Brown?" - you only ever get one observation. The DeWine v. Brown battle is only going to happen once, not 24 times. So - those polls might also be invalid, and we would never know!

So what is the prudent response? It is skepticism. Be skeptical of summer polling. The Gallup generic ballot of national adults and all these other summer polls might very well have a lot of troubling characteristics in common with one another. Specifically, they all might have a large proportion of non-voters who have slipped through the screening process. This should make us wary.

One thing we do know by now: the summer Gallup generic of adults, absent some kind of corrective measure, is not valid for making election predictions. We should stay away. It is like Australian table wine - which, as Eric Idle once noted, "is not a wine for drinking. It is a wine for laying down and avoiding!"

© 2000-2006 RealClearPolitics.com All Rights Reserved

