RealClearPolitics HorseRaceBlog

By Jay Cost

Technical Thoughts on the ARG Poll

I'd like to toss in my two cents about the ARG poll that John discussed this morning.

Polls are ways to estimate population parameters based upon the characteristics of a sample. So, if a poll says that 47% of respondents support Rudy Giuliani, we may be able to estimate that 47% of the public feels the same way.

"We may be able." Why did I say that? Because estimates such as those in a poll have two relevant characteristics. The first is efficiency. The second is bias. Both determine whether we can use a poll to infer the value of a population parameter. In other words, if a poll says 47% of respondents support Giuliani and we want to know whether we can estimate that 47% of the public does, we have to evaluate whether our estimate is efficient and unbiased.

Efficiency is an important characteristic. It describes the extent to which any given poll will diverge from the average result you would obtain if you conducted the same poll a multitude of times. Most public polls give you a snippet of information about efficiency every time they report their results. ARG says: "Margin of Error: ± 4 percentage points, 95% of the time, on questions where opinion is evenly split." What exactly does this mean? Suppose that you ran the ARG poll 1,000 times, and the average value you found was 25% support for Rudy. The technical language above means that roughly 19 of every 20 of those polls would show a result between 21% and 29%. This is a measure of efficiency: it tells us how far any given poll is likely to stray from the average poll value.
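For readers who want to see where a figure like ±4 points comes from, here is a minimal Python sketch using the textbook normal-approximation formula for the sampling error of a proportion. It assumes simple random sampling, and the sample size of 600 is a hypothetical value chosen only because it reproduces a margin of about 4 points; ARG's actual sample size and design may differ.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a 95% confidence interval (z = 1.96) for a proportion p
    estimated from a simple random sample of n respondents."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical sample size: n = 600 is assumed because it yields roughly
# +/-4 points on an evenly split question; it is not ARG's reported n.
n = 600
for p in (0.50, 0.25):
    print(f"p = {p:.2f}: +/-{100 * margin_of_error(p, n):.1f} points")
# p = 0.50: +/-4.0 points
# p = 0.25: +/-3.5 points
```

Because the quoted margin applies to an evenly split question, the margin around a 25% result is a bit narrower under these assumptions, so the 21% to 29% band is, if anything, slightly generous.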

Importantly, about 1 in every 20 polls will find a result below 21% or above 29%. This is critically important because, while the chance that any given poll falls outside the margin of error is small (5%), the chance that at least 1 poll in 20 falls outside it is 64%, treating the polls as independent! By my count, there have been at least 20 polls taken on the Republican nomination battle in the last two weeks. So there is roughly a 64% chance that at least one of them was outside the margin of error.
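The 64% figure is just the complement of "every poll lands inside its margin." A short Python sketch, assuming the 20 polls are independent and each has a 5% chance of a miss (real polls from different firms only approximately satisfy this), computes it directly and then checks it with a quick simulation:

```python
import random


def prob_at_least_one_outside(num_polls: int, p_outside: float = 0.05) -> float:
    """Chance that at least one of num_polls independent polls lands outside
    its 95% margin of error: 1 minus the chance that none of them do."""
    return 1 - (1 - p_outside) ** num_polls


print(round(prob_at_least_one_outside(20), 3))  # 0.642

# Monte Carlo check: simulate many batches of 20 polls, where each poll
# independently misses with probability 0.05, and count batches with a miss.
rng = random.Random(1)
trials = 50_000
hits = sum(
    any(rng.random() < 0.05 for _ in range(20))
    for _ in range(trials)
)
print(hits / trials)  # close to 0.64
```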

This is a consequence of the fact that there is always some inefficiency when you use a sample to measure a population. Fortunately, the RCP polling average offers a good way to enhance efficiency. Efficiency depends partly on sample size: a poll of 100 people will be much less efficient than a poll of 10,000 people, all else being equal. Averaging the polls together effectively increases the sample size, which diminishes inefficiency and therefore the variation we can expect in the averaged value. In the case of the ARG poll, the results are interesting, but note how they are "smoothed out" when they are factored into the RCP average for South Carolina.
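Here is a rough Python sketch of the sample-size point, reusing the standard margin-of-error formula and assuming simple random samples. The "five polls of 600" scenario is hypothetical; the idea is that averaging k equally sized, independent polls using the same unbiased method has the same sampling error as one poll of k × n respondents. The RCP average is a simple average of whatever polls are available, so this is an idealization.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for a proportion p from a simple random sample of n."""
    return z * math.sqrt(p * (1 - p) / n)


# Single polls of very different sizes, on an evenly split question.
for n in (100, 10_000):
    print(f"n = {n:>6}: +/-{100 * margin_of_error(0.5, n):.1f} points")
# n =    100: +/-9.8 points
# n =  10000: +/-1.0 points

# Hypothetical: averaging k independent polls of n respondents each behaves
# like one poll of k * n respondents (equal sizes, same unbiased method).
k, n = 5, 600
print(f"one poll:        +/-{100 * margin_of_error(0.5, n):.1f} points")      # +/-4.0
print(f"average of {k}:    +/-{100 * margin_of_error(0.5, k * n):.1f} points")  # +/-1.8
```

Note that this shrinks only the random scatter; if every poll in the average shares the same bias, averaging does nothing to remove it.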

Bias is an even more important consideration. Whereas there are statistical procedures to control for inefficiency, bias is much more difficult to deal with. It is also much harder to identify. Here is the correct way to think about bias. Suppose that Giuliani's actual support among all South Carolina Republicans is 25%. In other words, the population parameter is 25% support. To measure this parameter, you conduct 1 million polls, all using the same methodology. Then you average all of the polls together. If the average equals 25%, you can conclude that your methodology produces an unbiased estimate; that is, the average of all of your polls equals the population value. If, on the other hand, the average comes out to 21%, your methodology is producing biased results.
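To make the "average of many polls" definition concrete, here is a small Python simulation. It is an illustration, not any pollster's actual methodology. It runs 1,000 polls per method rather than a million to keep it quick, assumes a hypothetical sample size of 600, and models the biased method, hypothetically, as a sampling procedure that under-reaches Giuliani supporters so that respondents look like a population with 21% support.

```python
import random
import statistics


def run_poll(effective_p: float, n: int, rng: random.Random) -> float:
    """One poll of n respondents, each a supporter with probability effective_p."""
    return sum(rng.random() < effective_p for _ in range(n)) / n


rng = random.Random(3)
true_p = 0.25               # the population parameter in the example
n, num_polls = 600, 1_000   # hypothetical poll size and number of repetitions

# Unbiased method: respondents reflect the population.
unbiased = statistics.mean(run_poll(true_p, n, rng) for _ in range(num_polls))

# Hypothetical biased method: the procedure systematically under-reaches
# supporters, so its respondents behave like a 21%-support population.
biased = statistics.mean(run_poll(0.21, n, rng) for _ in range(num_polls))

print(f"unbiased method averages {unbiased:.3f}")  # about 0.250
print(f"biased method averages   {biased:.3f}")    # about 0.210
```

Running more polls tightens each method's average, but the biased method's average stays near 21%. More data reduces inefficiency, not bias.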

Obviously, bias is a much more difficult thing to estimate. The perennial debate among bloggers about whether pollsters should let the partisan makeup of their samples vary, or instead sample a set number of Republicans and a set number of Democrats, is really a debate about minimizing bias. Each side thinks that the other side's method biases the final results, i.e., the expected value of the poll is not identical to the population value of the characteristic being measured.
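A toy Python calculation shows why the party-mix question matters. Every number below is hypothetical. The overall estimate is a weighted sum of support within each party group, so the same within-group answers produce different toplines depending on whether you use the party mix the sample happened to get or a fixed target mix. Which topline is less biased depends entirely on which mix matches the electorate that actually turns out.

```python
def weighted_estimate(support_by_group: dict[str, float],
                      shares: dict[str, float]) -> float:
    """Overall support implied by within-group support and group shares."""
    return sum(support_by_group[g] * shares[g] for g in support_by_group)


# All numbers are hypothetical, for illustration only.
support = {"Rep": 0.90, "Dem": 0.08, "Ind": 0.45}        # support within each group
sample_shares = {"Rep": 0.38, "Dem": 0.34, "Ind": 0.28}  # mix the poll happened to get
fixed_shares = {"Rep": 0.33, "Dem": 0.37, "Ind": 0.30}   # mix a pollster might fix

print(f"{weighted_estimate(support, sample_shares):.3f}")  # 0.495
print(f"{weighted_estimate(support, fixed_shares):.3f}")   # 0.462
```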

Bias, then, can be a product of the methodology that you use to put together your sample. Of course, a truly random sample is expected to be unbiased. But there are a whole host of problems with the idea of a random sample when it comes to preelection polls. After all, pollsters are taking samples of the voting population, and this population does not yet exist! People have not yet voted. Registered voter polls might therefore induce bias, because the voting population's preferences might differ systematically from the preferences of registered voters. So pollsters often try to estimate who will and who will not vote. But then bias depends on whether their selection protocols are any good. If a poll's screen excludes people who will actually vote and fails to exclude people who will not, the poll results might be just as biased.
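The likely-voter problem can be sketched as a weighted mix. In this hypothetical Python example, 40% of registered Republicans actually vote in the primary; actual voters give Giuliani 25% and non-voters 35%. The function computes the expected result (no sampling noise at all), so any gap from 25% is pure bias. The screen pass rates are made-up values, not any pollster's real figures.

```python
def screened_estimate(voter_share: float, voter_support: float,
                      nonvoter_support: float, pass_voters: float,
                      pass_nonvoters: float) -> float:
    """Expected poll result after a likely-voter screen.

    voter_share: fraction of registered voters who will actually vote.
    pass_voters / pass_nonvoters: chance the screen admits an actual voter
    or an actual non-voter. All inputs here are hypothetical.
    """
    admitted_voters = voter_share * pass_voters
    admitted_nonvoters = (1 - voter_share) * pass_nonvoters
    total = admitted_voters + admitted_nonvoters
    return (admitted_voters * voter_support
            + admitted_nonvoters * nonvoter_support) / total


args = dict(voter_share=0.40, voter_support=0.25, nonvoter_support=0.35)
print(f"{screened_estimate(**args, pass_voters=1.0, pass_nonvoters=1.0):.3f}")  # 0.310 registered voters
print(f"{screened_estimate(**args, pass_voters=1.0, pass_nonvoters=0.0):.3f}")  # 0.250 perfect screen
print(f"{screened_estimate(**args, pass_voters=0.8, pass_nonvoters=0.3):.3f}")  # 0.286 leaky screen
```

Under these assumptions, a leaky screen lands between the registered-voter figure and the true value, which is exactly the kind of systematic error that no amount of extra sampling would fix.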

This is one reason why I am so skeptical of all of these nameless and faceless polling firms. I take a very pragmatic approach to the issue of bias. Polling firms that I know, or that I know to be affiliated with a major news outlet, are probably less likely to be biased than other polls. So I like Gallup. I like ABC News/Washington Post. I like NBC/Wall Street Journal. I like Cook/RT, etc. I do not have detailed data on their sampling methodologies, but from my position of limited knowledge, the fact that these outfits are tied to major news outlets with a stake in quality results makes me more comfortable. Polls like ARG make me uncomfortable simply because I know so little about them. I feel like I have less of a warranty against bias.