RealClearPolitics HorseRaceBlog

By Jay Cost

Some Reflections on Polling in the Primaries

The polling has been bad this primary cycle. Last year's national polls were wrong - for both parties. The late polls in New Hampshire were wrong. Even in Virginia, where McCain was supposed to win by a huge margin, he won by only a modest one. What is more, pollsters have disagreed in state after state. One had California going to Romney; another had it going to McCain.

What is going on?

There are surely many answers to the question. I'd like to suggest one I have not seen discussed in depth.

Let's start with an observation that political scientists have made since the 1950s: average voters do not pay much attention to politics. This is a hard pill for political junkies to swallow, but swallow it we must. Indeed, I think most of the inferential errors of inside-the-Beltway pundits can be chalked up to their false assumption that voters pay as much attention as they do. They need to get over this. It is just false.

We move from this starting point to a question: how do voters pick their candidate despite these low levels of information? Most researchers will tell you that they make use of cognitive heuristics - mental shortcuts that help them make decisions amidst uncertainty. Uncertainty is the consequence of inattention. Voters simply do not know that much about what is going on - but they nevertheless make a vote choice. Their mental shortcuts are tried-and-true ways of making good decisions despite this uncertainty.

Think of it this way. Political junkies might know the ins and outs of each candidate's health care proposal, so they can decide which one they like best. The average voter does not have this kind of information. Yet when he gets to the voting booth, he faces the same choice the junkie does. The shortcut is his way through the uncertainty.

This leads us to another question: what serves as the shortcut? The answer for virtually the entire country is partisan identification. Upwards of 90% of the nation has some kind of party affiliation. This holds despite polls that put the independent vote much higher; the independent vote is not that large. Some portion of the country does tell pollsters it is independent, but when we look carefully at these respondents, we see that most of them lean to one party or the other.

Not only is party identification held by almost all of us - it is an incredibly precise predictor of vote choice. Republicans will almost always go 90/10 for the Republican candidate. Democrats will do the same for their candidate. Most of us have a party identification, and most of us rely on it quite heavily.

What does that mean in a general election? If partisanship is a near-universal and incredibly powerful feature, the preferences of most voters are anchored throughout the campaign, even though they are paying very little attention. They do not have to pay attention to know whom they will vote for. Accordingly, we will see the polls vary only a little throughout the campaign. Oftentimes, they will break in late October or even early November, but the break will be relatively modest. This is not to say it will be inconsequential; a swing of just a few points in either direction could make a difference in many states. The polls might move by +/- 10 points over the whole cycle, but this is paltry compared to some of the massive swings in this primary cycle.

A big reason for this stability is partisanship. As I said, it serves as an anchor. That is a good metaphor for it. Partisanship anchors preferences, keeping them from swaying, drifting, or listing wildly during the campaign.

In a primary campaign, voters must choose among candidates who are all of the same party. Partisanship therefore does not enter into their decisions. It is a non-factor. I think this might be inducing the wild swings in the polls. The polls are varying because the voters are; the voters are varying because their partisanship is not stabilizing their preferences.
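The anchoring argument can be illustrated with a toy simulation. This is a Python sketch under assumptions of my own, not a model drawn from the research discussed here: every voter has a fixed partisan lean with spread `anchor_strength`, each poll day brings a campaign-wide news shock (standard deviation 0.5) that nudges everyone the same way, and each respondent adds personal noise. In a general election the lean is large; in a primary, where all the candidates share the voter's party, it is set to zero. Every number here is illustrative.

```python
import random
import statistics

def simulate_polls(anchor_strength, n_voters=1000, n_polls=50, seed=0):
    """Toy daily polls: share of voters picking candidate A on each day."""
    rng = random.Random(seed)
    # Toy assumption: a fixed partisan lean toward A per voter; 0.0 means no anchor (a primary).
    leans = [rng.gauss(0, anchor_strength) for _ in range(n_voters)]
    shares = []
    for _ in range(n_polls):
        # Toy assumption: one campaign-wide news shock nudges every voter the same way.
        news = rng.gauss(0, 0.5)
        # Each voter picks A if lean + shock + personal noise is positive.
        picks_a = sum(1 for lean in leans if lean + news + rng.gauss(0, 1) > 0)
        shares.append(picks_a / n_voters)
    return shares

general = simulate_polls(anchor_strength=3.0)
primary = simulate_polls(anchor_strength=0.0)
print(f"general-election poll spread: {statistics.pstdev(general):.3f}")
print(f"primary poll spread:          {statistics.pstdev(primary):.3f}")
```

Run with these made-up settings, the primary series should swing far more than the general-election series, because the same kind of news shock moves voters who have no partisan anchor holding them in place.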

Of course, primary voters tend to pay more attention to politics than general election voters. This probably makes them more able to make decisions without the use of their partisanship. Nevertheless, they still pay a price for not being able to use it. To say that primary voters are better informed than general election voters is not to say that they are well informed, or that they behave as the media implicitly assume they do (e.g., carefully following every speech, parsing every sentence, and constantly updating their evaluation of the state of the race). They do not do this.

It should thus be unsurprising that candidate personalities are so influential in voters' decision-making. How else do you decide when party distinctions are non-existent? Candidates often try to create clear contrasts, but these usually amount to making mountains out of molehills. Average voters are not really paying much attention anyway, so they have to go by their personal evaluations of the candidates.

This is not to say that vote choices are random. Clearly, there has been a regular pattern to the early contests: certain types of voters obviously prefer certain types of candidates. The point is that without partisanship, personality is what makes the difference, and voters do not start to take careful note of personalities until late in the cycle. Consider that 48% of New Hampshire Democrats said they made their choice in the last week of the campaign. From a certain standpoint, that is incredible. Given all of the attention political junkies have paid to this race since last January, it is almost unbelievable that so many voters had not decided months ago. But if we put ourselves in the shoes of the average voter and try to recreate his thought process, it makes a lot of sense. His partisanship cannot serve as a quick, easy guide, so he has to take a good, long look at the candidates as people. Given his typical inattention to politics, that close look comes in the last week or so.

This might explain the wide variability of the primary polling. Because voter opinions have not been anchored by partisanship, they have been unstable for most of the cycle, up until the very end, when we are wont to see a massive break in one direction or another. The "error" in the polls might simply reflect public indecision. For that matter, Clinton's massive lead through most of last year might have its origins here as well: without their partisanship, poll respondents had little to go on except their vague sense of the media's consensus view of the race. Predictably, they claimed to support Clinton. Finally, this might account for momentum. Voters take a close look at winners at precisely the moment those winners are basking in the glow of positive media coverage. Unsurprisingly, researchers have found that more informed voters are less susceptible to momentum effects.

These considerations have two implications. The first is good news for pollsters: life will get better for you! When we move into the general election, the polls will settle down and start agreeing with one another. Note the stability in the general election head-to-head match-ups: they have barely budged an inch even as the race in both parties has been chaotic. Given that voters have their party identification to ground their general election responses to the pollsters, it makes sense that these numbers would be steady. The second is a warning to consumers of political news: continue to be wary of these primary polls. Without partisanship anchoring vote choices, they are still prone to large, dramatic shifts at the last minute. That's what happens when voters try to make decisions without their most trusted cue, their partisanship.

On the ARG Poll

Anybody who checked Drudge today will have seen that there is a "shock poll" that puts Hillary Clinton 15 points in front of Barack Obama in Iowa. The polling company that produced the poll is ARG, and this is what it had to say about its results:

Hillary Clinton leads Barack Obama among women 38% to 21%, which is unchanged from a week ago (Clinton 36%, Obama 23% among women). Obama has lost ground among men to John Edwards and Clinton. Among men, Clinton is at 28%, Edwards is at 27%, Obama is at 16%, and Joe Biden is at 11%. A week ago, Obama was at 27% among men, followed by 21% for Clinton and 19% for Edwards.

This poll might indeed signal a trend - the first sign of a swing back to Clinton among Iowa Democrats. Unfortunately, we will not know for a few days, as polling companies presumably suspended operations over Christmas. I would attach a few caveats to this poll - just some basic warnings about why we should not over-interpret these results.

***

1. ARG polled the weekend before Christmas, from 12/20 to 12/23. This might not be the best time to construct a sample of likely Iowa voters. No other poll I know of has released a sample taken on those days, a sign that most other pollsters were wary of Christmas weekend.

2. The ARG poll has Clinton up and Obama down by statistically significant amounts relative to its last poll (12/16 to 12/20). On the Republican side, it has Mike Huckabee down and Ron Paul up by statistically significant amounts. This is a lot of movement: four candidates made statistically significant moves in the course of three days. Recall the last point, and note that these are three days when respondents probably were not thinking much about politics. December 20th through the 23rd are usually filled to the gills with last-minute holiday preparations. They are not great days for reflecting on the state of the presidential campaign. Thus, this movement might be due to the sampling effects mentioned in Point 1.

3. There are other elements of the poll that just don't scan with me. For instance, it shows Fred Thompson at 3% and Alan Keyes and Duncan Hunter both at 2%. ARG has shown Thompson low over the last few weeks, so this is not a consequence of ARG's sampling being thrown off by the holiday weekend. But its last two samples also estimated Thompson's support well below the rest of the Iowa polls, and 3% just does not pass the "smell test."

4. Mark Blumenthal has noted several interesting facts about ARG. First, on the Democratic side, they draw more heavily on first-time Iowa caucus goers than any other poll. This is probably why they usually have Edwards below where he is in the RCP Iowa Democratic average. Edwards is doing relatively well among previous caucus goers, but ARG is "diluting" their influence with the first-timers. ARG's intuitions about first-time caucus goers may be correct, but ARG is an outlier on this issue. Second, they did an extremely poor job of reporting their sampling methodology when Blumenthal requested it. They would not provide any information about respondent demographics, and they would not say how many long-time caucus goers were in their Republican samples.

5. Just because differences between polls are statistically significant does not mean that they are necessarily caused by changes in the population. Clinton, Obama, Huckabee, and Paul have all made statistically significant moves - but some of these could still be statistical blips induced by the sample. This is as good a time as any to review exactly what statistical significance is.

The technical language that describes the margin of error usually reads something like this: "We are 95% confident that the true values are +/- 3%."

This is referring to Type I error, or the error of the false positive. It means that if the poll were repeated many times, about 95% of the intervals built this way would contain the real-world value. So when you take a poll and get 17%, you can be 95% confident that the real-world value is between 14% and 20%. This also means that 5% of the time (or one time out of 20), it will be outside this range. This is the poll's tolerance of Type I error. The chances are 5% that you will have a false positive - you will believe that the real value is between 14 and 20 when in fact it is not.

But suppose you have 20 different statistics you are looking at. What are the chances that the real-world value of at least one of them will be outside the margin of error simply due to sample effects? If the 20 are independent, it is 1 - 0.95^20, or about 64%!

This is something that is rarely noted when looking at poll trends - it is called the experiment-wise Type I error rate. When you look at the polls to divine trends, you are implicitly doing some form of statistical hypothesis testing. You are trying to determine whether changes are due to sampling error, or whether they are due to shifts in the population. To do this, you have to assume that sampling error will only explain so much variation in the polls. Usually (95% of the time), this assumption holds up. Occasionally (5% of the time), it does not. When it does not hold, you have committed Type I error. And the more polls you look at, the more likely it is that you have committed it.
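The arithmetic behind point 5 can be checked with a short Python sketch. It assumes a simple random sample and the textbook 95% interval (1.96 standard errors around the poll's estimate); the sample size of 600 is my own choice, picked because it yields roughly the +/- 3 points around 17% used in the example above. The 64% figure also assumes the 20 statistics are independent, which real poll items usually are not exactly.

```python
import random

def margin_of_error(p, n, z=1.96):
    """Half-width of the textbook 95% interval for a share p from n respondents."""
    return z * (p * (1 - p) / n) ** 0.5

def coverage_rate(true_p, n, trials, seed=0):
    """Fraction of simulated polls whose 95% interval contains the true value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each simulated respondent backs the candidate with probability true_p.
        p_hat = sum(rng.random() < true_p for _ in range(n)) / n
        if abs(p_hat - true_p) <= margin_of_error(p_hat, n):
            hits += 1
    return hits / trials

def familywise_error(k, alpha=0.05):
    """Chance that at least one of k independent statistics misses its interval."""
    return 1 - (1 - alpha) ** k

print(f"margin at 17% with n=600: +/-{100 * margin_of_error(0.17, 600):.1f} points")
print(f"simulated coverage: {coverage_rate(0.17, 600, trials=1000):.3f}")
print(f"chance of at least one miss among 20: {familywise_error(20):.0%}")
```

The simulated coverage should land near 95%, meaning roughly one poll in 20 misses by chance alone, and the last line reproduces the 64% experiment-wise figure.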

***

Casual readers, please take note: I am not predicting that this is a blip. Contrary to what some have assumed, I do not make predictions about the ways the polls will move. That is a fool's errand. My point here is simply that it is possible that this movement is induced by sampling effects - and we should be careful not to over-interpret these results.

Interesting Internals in the ABC/WaPo Poll

A few days back, I saw that Clinton web video called "Caucusing Is Easy." You probably saw it, too. Anytime Bill does something, it gets noticed.

At the time, I thought it was a bit strange - insofar as it did not cohere with the conventional wisdom that Clinton and Edwards were the ones winning the support of veteran caucus voters, and Obama was winning over newer "voters." It seemed to me that this kind of video would be something to expect from Obama.

That last paragraph sports some important scare quotes, because newer voters often turn out not to be voters at all. Systematic survey evidence has picked this up, and it conforms to anecdotal accounts. Remember Nader's huge rallies in 2000? How about Dean's in 2004? This is why videos like this get made: young people are unreliable voters and often need to be charmed into voting. Many pundits have speculated that Obama's young supporters might be his Achilles' heel, as they are motivated enough to come out and see him on a cold fall day, but not motivated enough to support him on caucus night. The former has some curiosity value. The latter? Not so much.

So, it surprised me that Clinton was the one creating a "hip" video about caucusing. Then I saw this in the Washington Post's write-up of its latest poll:

Overall, the poll points to some strategic gains for Obama. His support is up eight percentage points since July among voters 45 and older -- who accounted for two-thirds of Iowa caucus-goers in 2004. He also runs evenly with Clinton among women in Iowa, drawing 32 percent to her 31 percent, despite the fact that her campaign has built its effort around attracting female voters.

And despite widespread impressions that Obama is banking on unreliable first-time voters, Clinton depends on them heavily as well: About half of her supporters said they have never attended a caucus. Forty-three percent of Obama's backers and 24 percent of Edwards's would be first-time caucus-goers. Previous attendance is one of the strongest indicators of who will vote.

First off, WaPo does a good job of splashing some cold water on the statistical significance of the topline results. The margin of error on this poll is +/- 4%, so a 4-point lead for Obama is not statistically significant. Keep in mind, though, that statistical significance depends on the number of observations. The margin of error is built from the standard error, which has the square root of the number of observations in its denominator. As the number of observations decreases, the standard error increases, and so does the margin of error.

So, when you are dealing with a subsample of the whole poll - say, voters 45 and older - the margin of error increases. I'd have to run a statistical test to confirm it (and the poll's internals do not provide the data to enable one), but my intuition is that Obama's gains with older voters are not outside this increased margin of error. An 8-point difference between this poll and the July poll would be significant with the topline results, which are based on 500 observations, but probably not with a subsample of about 200 observations. WaPo was thus wrong to identify this as a "strategic gain" for Obama. It could very well be a sampling anomaly.
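Here is a minimal Python sketch of that reasoning. It assumes both polls are simple random samples of equal size and uses p = 0.5, the most conservative (widest) case; the July subsample size is not published, so treating both subsamples as about 200 is an assumption. Note that the bar for a *change* between two independent polls is wider than the margin of error on either poll alone, because their sampling errors add.

```python
def margin_of_error(p, n, z=1.96):
    """95% margin of error for a share p estimated from n respondents."""
    standard_error = (p * (1 - p) / n) ** 0.5  # shrinks with the square root of n
    return z * standard_error

def change_is_significant(change, n_before, n_after, p=0.5, z=1.96):
    """Is a shift between two independent polls larger than sampling error allows?"""
    # Variances of independent estimates add; p=0.5 gives the widest band.
    se_diff = (p * (1 - p) / n_before + p * (1 - p) / n_after) ** 0.5
    return abs(change) > z * se_diff

for n in (500, 200):
    print(f"n={n}: margin +/-{100 * margin_of_error(0.5, n):.1f} points; "
          f"8-point shift significant? {change_is_significant(0.08, n, n)}")
```

With 500 respondents per poll the margin is about 4.4 points (close to the Post's reported +/- 4%) and an 8-point shift clears the roughly 6-point bar for a change; with 200 per poll the bar rises to roughly 10 points, and the same shift does not clear it.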

That being said, there are still some interesting inferences that we can make without running an undue risk of Type I error (i.e. wrongly concluding that something is significant when in fact it is not). The poll found Clinton and Obama relying on first-time voters in about equal measure. The difference between their shares of first-time supporters is not statistically significant, so the conventional wisdom that Obama relies on new voters more than Clinton does not hold. And that explains how Hillary's campaign came to coax Bill onto a NordicTrack.

What probably does hold is the argument that Edwards is relying on new voters less than Clinton and Obama. The breakdown is basically 50/43/24. That is, (about) 50% of Clinton supporters are new, 43% of Obama supporters are new, and just 24% of Edwards supporters are new. Again, I would have to see more data than what ABC News/WaPo is providing - but my intuition is that the difference is statistically significant. Edwards is relying less on new voters.
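To show what such a test would look like, here is a pooled two-proportion z-test in Python. The 50/43/24 shares come from the Post's write-up, but the number of supporters in each camp is NOT published, so the camp sizes below are hypothetical placeholders; with different real sizes the conclusions could change.

```python
def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z statistic for the gap p1 - p2."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5
    return (p1 - p2) / se

# Share of each camp who would be first-time caucus-goers (from the Post's write-up).
# The camp sizes are HYPOTHETICAL: the poll does not publish them.
camps = {"Clinton": (0.50, 130), "Obama": (0.43, 150), "Edwards": (0.24, 110)}

for a, b in [("Clinton", "Obama"), ("Obama", "Edwards"), ("Clinton", "Edwards")]:
    z = two_proportion_z(*camps[a], *camps[b])
    print(f"{a} vs {b}: z = {z:.2f}, significant at 95%? {abs(z) > 1.96}")
```

Under these assumed sizes, the Clinton-Obama gap (z of about 1.2) falls short of the 1.96 bar, while both gaps with Edwards clear it, which matches the intuition above. Smaller real subgroups would widen the bands, so treat this as an illustration rather than a verdict.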

This conforms with what Ana Marie Cox wrote last week in Time. Most of Edwards' supporters are reliable caucus goers. This might give us some clues about what to expect on caucus night. If attendance at the caucus is greater than what it has been in years past, that might bode well for Clinton and Obama. If it is equal to or less than what it has been, that might bode well for Edwards.

Another point on Clinton v. Obama: Clinton is splitting the female respondents evenly with Obama. She pulled in 31% of women; he pulled in 32%. It is surprising to me that Clinton is not pulling in more women in this poll, given the tone of her campaign of late. These results make Mark Penn's promise to pull in Republican women seem like rhetoric designed to win over Democratic voters hungry for a victory.

For comparative purposes, I would note that the recent CBS News poll also found no statistically significant difference between Clinton and Obama among first-time and long-time caucus attendees. It also seems not to have found a statistically significant difference between them in levels of female support (Clinton does lead this category by 12 points, but the sample is so small that I suspect the lead is statistically insignificant), or in levels of support among voters aged 45-65. However, it did find what appears to be a statistically significant difference in support among voters 65 and older, though Clinton and Edwards are tied among voters of that age. [Again - I'm "spitballing" these conclusions of significance because none of these polls ever gives you the data you need to draw more assured conclusions.] So, all in all, I would say that the results of the ABC/WaPo poll roughly conform with the results of the CBS poll.

Now - it is important to remember that there are boundaries that we have to obey when drawing inferences from these Iowa polls. I wrote about this last week. Care is important because the Iowa Democratic caucus is a poor fit with the way polls are conducted. I am pretty sure that I have managed to color within these lines in this write-up - but we need to be careful.

What Moves the Polls?

Mark Mellman had an excellent column in The Hill yesterday. The topic was the validity of horse race polling.

This is what he had to say:

Regular readers have heard me rail against the inaccuracy of early polling, which often fails to presage ultimate electoral outcomes.

Yet I have also maintained that presidential elections are predictable based on the fundamentals -- incumbency, war/peace, prosperity and the like.

Recognizing the implicit contradiction, political scientists Gary King and Andrew Gelman asked, in one of academia's best-titled papers, "Why Are American Presidential Election Campaign Polls So Variable When Voters Are So Predictable?"

Their answer focuses on learning during the campaign, and while they may be right, I fear the problem runs deeper. [Snip]

Though it is heresy for a pollster to say it, the evidence also suggests people are only mediocre predictors of their own behavior. Responses to horserace questions a year out may be a special case of faulty prediction.

Hardly an original thought, it can be traced at least to Russian exile and founder of Harvard's sociology department Pitirim Sorokin, who titled his 1936 paper, "Can One Predict His Own Behavior 24 Hours In Advance?" His answer, based on a study of federal employees, was a resounding no: When asked how much time they would devote to various activities during the subsequent eight-hour workday, the average person was off by five hours.

My hat is off to Mellman - who actually relies upon the work of one of the best political scientists in the country, Gary King, to make an argument about politics. A very rare thing indeed! Usually, political scientists get no more "airtime" than the occasional self-evident quote that journalists integrate into their preconceived storylines.

Mellman's topic - the invalidity of early election polling - is one that I have discussed frequently on this blog. I'd like to extend this conversation because the problem with media polls gets at what I think are some serious failings in the way journalists and pundits analyze politics.

The best way to discuss this is simply to review the Gelman and King article - which, I should note at the outset, is an attempt to explain a problem with general election presidential polling. Their title indicates the question with which they tussle. Political scientists have developed models that do a very good job of predicting presidential elections based upon "fundamental" variables like incumbency, partisanship, and the state of the economy. All of these are available a long time before ballots are cast. Meanwhile, the polls run all over the place prior to Election Day. How to explain this?

Gelman and King offer what they call the "Enlightened Preference" Model. They assert that:
(1) Voters do not have full information throughout the campaign about the "fundamental variables" that ultimately drive vote choices.
(2) Voters do use all available information to make their decisions.
(3) Voters do not rationally account for uncertainty during the course of the campaign.

This explains how polls can vary so wildly, and yet final results can be so predictable. Voters base their election decisions on basic variables. Thus, their vote choices are quite predictable. But it is only at the end of the campaign that they have fully grasped the values of the variables. Additionally, they do not factor this lack of knowledge into their thought processes. And so, when pollsters dial them up - they rely on the data they have available, but give answers that are less certain than they realize.
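A toy version of this logic fits in a few lines of Python. It is a sketch under my own assumptions, not Gelman and King's actual specification: each voter has a fixed predisposition, the electorate shares a common misreading of the fundamentals whose spread shrinks to zero by election day, and respondents answer as if their current reading were certain. The fundamentals value of 0.2 and the 0.05-per-week spread are illustrative numbers only.

```python
import random
import statistics

def draw_leans(n_voters, seed=0):
    """Fixed voter predispositions; with the fundamentals they settle each vote."""
    rng = random.Random(seed)
    return [rng.gauss(0, 1) for _ in range(n_voters)]

def share_for_a(leans, perceived_fundamentals):
    """Share backing candidate A when everyone acts on the same perceived fundamentals."""
    return sum(1 for lean in leans if lean + perceived_fundamentals > 0) / len(leans)

def campaign_polls(leans, true_fundamentals, weeks, seed=1):
    """Weekly toy polls as voters learn the fundamentals over the campaign."""
    rng = random.Random(seed)
    polls = []
    for weeks_left in range(weeks, -1, -1):
        # Toy assumption: a shared misreading that shrinks to zero by election day.
        misreading = rng.gauss(0, 0.05 * weeks_left)
        # Voters answer as if their current reading were certain (assumption 3).
        polls.append(share_for_a(leans, true_fundamentals + misreading))
    return polls

leans = draw_leans(2000)
polls = campaign_polls(leans, true_fundamentals=0.2, weeks=20)
print(f"spread of the first ten polls: {statistics.pstdev(polls[:10]):.3f}")
print(f"spread of the last three polls: {statistics.pstdev(polls[-3:]):.3f}")
print(f"final poll {polls[-1]:.3f} vs fundamentals-only forecast {share_for_a(leans, 0.2):.3f}")
```

In this toy, early polls wander widely while the final poll matches the fundamentals-only forecast exactly: variable polls and a predictable result at the same time.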

Gelman and King write:

[W]ithout sufficient knowledge of their fundamental variables, and when asked to give an opinion anyway, most respondents act as they will in the voting booth on election day: they use the information at their disposal about their fundamental variables, and report a "likely" vote to the pollster. We believe that this report is sincere, but the survey response is still based on a different information set from that which will be available by the time of the election.

Note that this does not mean that the campaigns are useless. The campaign organizations are the agents that provide the information to the voters. If there is a rough organizational "balance" between the campaigns - then the relevant information will be communicated to the voters through them. The media can play a role here - by providing relevant information to the public so that their choices are as informed as possible. Of course, in the last several cycles, most of the media's work has been in covering the horse race.

Now, as I indicated, I think there is a lesson in this for all of us who produce and consume political news and analysis. Bear in mind, though, that this is not a certification of the Gelman/King theory. My specialties in political science are party and campaign organizations, not public opinion or political psychology, and I lack the credentials to certify this theory. My intuition is that they are pretty close to the truth, but I am not up on enough of the latest scholarly research to tell you with certainty.

However, I do know enough about these subjects to know that popular accounts of voter psychology are seriously askew because they falsely assume both too much and too little of the average voter. Gelman and King summarize what they take to be the "journalistic model" of the political campaign. I think there is a lot of validity to the characterization:

Under this model, voters base their intended votes partly on fundamental variables, but considerably more on the day-to-day events of the presidential campaign. Voters are assumed to have very short memories, relying for their decisions disproportionately on the most recent campaign events and last piece of information they ran across. Candidates are thought to be able to easily "fool" voters by changing their policy stance during the campaign or causing the opposing candidate to say or do something foolish. [Snip]

Also according to the journalists' model, voters do not take their role in the process very seriously, have very little information of the campaign and the issues, and frequently do not vote on the basis of their own self-interest.

This, Gelman and King argue, is how journalists implicitly explain the day-to-day movement of the polls. I think that this is a valid explanation of the way journalists and pundits approach politics. And, like Gelman and King, I think it is completely wrong-headed.

I would note three salient features of this false view of the voters:

(1) It assumes too much of them. Implicit in this theory is the idea that average voters pay as much attention to politics as political junkies do. We saw a great example of this early this month when pundits and journalists started talking about how the last Democratic debate "changed" everything. Give me a break! 2.5 million people watched that debate. It didn't change a thing! To think that it did requires one to accept a false premise: that average voters follow politics like the junkies.

(2) It assumes too little of them. It assumes that they can be beguiled - that, for instance, they voted for Bush over Dukakis because the latter put a dumb-looking helmet on his head. Again - give me a break! There is an implicitly condescending attitude toward citizens in media theories about vote choice.

(3) It assumes that average voters are quite like journalists. They love the gamesmanship of politics. The twists and turns of the daily political soap opera are what they find valuable. They care, for instance, that Hillary Clinton's latest wardrobe choice has a lower neckline. And they really care about the political strategy that influenced the decision.

If these three assumptions are false - then most of the media's horse race coverage is a skewed version of what is really going on. I have been arguing this point for quite a while on this blog. What the media assumes about the voters is simply not true - and therefore its analysis is based upon false premises, and thus is wrong-headed. To appreciate this - try the following experiment: take any standard issue pundit discussion that you see on the news, ask yourself whether they are falsely assuming these three things about the voters, and then ask yourself whether - if they stopped making these false assumptions - the analysis would be different.

What, then, drives general election poll numbers? Gelman and King argue that one thing surely driving them is the partial information voters have collected to date. That is, voters are in the process of collecting the data they need to make an informed choice in November. When queried in, say, July, they have acquired only some of that data, which is what they use to make their selection.

I would argue that there is something else going on in these survey responses. Sitting in the background here is John Zaller, whose theory of public opinion merges very nicely with the theory of vote choice that Gelman and King proffer. His 1992 book, The Nature and Origins of Mass Opinion, is required reading at the graduate level - and I'd wager that a lot of undergrads are forced (much to their chagrin!) to slog through what is a very technical read. Zaller argues that one of the reasons public opinion varies is that respondents "receive" informational tidbits here and there from the dialogue of political elites. If those tidbits are essentially compatible with preconceived notions, they are "accepted" and mentally stored by the respondent - to be "sampled" from when the pollster comes calling.

Receive, accept, sample: the "RAS" Model of public opinion. So, in an ironic twist, the analyzers of public opinion are actually creating some of it. It is not that the last Democratic debate had an independent effect on the polls (if there was indeed any effect at all); the effect was caused by the fact that analysts predicted it would have one. Mass opinion is influenced by the elite conversation, so when elites talk about how an event shaped public opinion, they are in fact helping to shape public opinion in that way. But this kind of self-fulfilling prophecy is just a "game" that is quite separate from the way vote choices are formed. Public opinion can be tweaked by the elite dialogue, but insofar as that dialogue is not providing information on critical variables, it is not influencing vote choices. It's just shifting the numbers temporarily.
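The receive-accept-sample sequence described above can be mimicked with a toy Python simulation. This is my own simplified sketch, not Zaller's formal model: the 30% chance of receiving any given tidbit, the 50% chance of accepting a contrary one, and the single starting consideration drawn from each respondent's lean are all assumptions. The electorate's leans never change between runs; only the tilt of the elite chatter does.

```python
import random

def ras_poll(leans, elite_messages, receive_prob=0.3, accept_contrary=0.5, seed=0):
    """Toy Receive-Accept-Sample poll: share of respondents answering for candidate A."""
    rng = random.Random(seed)
    for_a = 0
    for lean in leans:  # +1 leans toward A, -1 leans toward B
        considerations = [lean]  # assumption: one starting consideration from the lean
        for message in elite_messages:  # +1 is a pro-A tidbit, -1 is pro-B
            if rng.random() >= receive_prob:
                continue  # Receive: most tidbits never reach an inattentive respondent
            if message != lean and rng.random() >= accept_contrary:
                continue  # Accept: contrary tidbits are often rejected
            considerations.append(message)
        # Sample: answer with whatever consideration comes to mind when the pollster calls.
        if rng.choice(considerations) == 1:
            for_a += 1
    return for_a / len(leans)

leans = [1, -1] * 1000  # an evenly divided electorate whose leans stay fixed
balanced = [1, -1] * 10  # elite talk split evenly
tilted = [1] * 15 + [-1] * 5  # elites agree candidate A "won" the debate
print(f"balanced chatter: {ras_poll(leans, balanced):.2f}")
print(f"tilted chatter:   {ras_poll(leans, tilted):.2f}")
```

With tilted chatter the toy poll moves toward A even though no one's underlying lean has changed, which is the kind of temporary shift in the numbers described above.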

Now, let me reiterate that my specialty is not political psychology or public opinion. I have read and understood the "great works" in the field, but I am not up to date on the current literature. And, as I said, Gelman and King's model is meant for general-election presidential campaigns. Nevertheless, I think this explanation is probably not too far afield from reality. At the very least, I am certain that there is a disconnect between the way average voters really are and the way journalists and pundits see them - and that this disconnect induces many, many errors in the way the latter examine politics. These errors ultimately make the media dialogue irrelevant to what really matters: who wins the election.

Or, to quote Gelman and King:

"Journalists should realize that they can report the polls all they want, and continue to make incorrect causal inferences about them, but they are not helping to predict or even influence the election. Journalists play a critical role in enabling voters to make decisions based upon the equivalent of explicitly enlightened preferences. Unfortunately, by focusing more on the polls and meaningless campaign events, the media are spending more and more time on "news" that has less and less of an effect."