With one month to go before the U.S. election, political junkies’ Twitter feeds are bound to be flooded with political polls and forecast model estimates. Some of you will be checking FiveThirtyEight and Real Clear Politics compulsively. There’s no shame in that. There’s a lot that goes into polling, and reading polls is not as straightforward as checking who’s ahead and by how much. With that in mind, here at Caracas Chronicles we thought it was a good time to unleash our polling nerdom—ok, my polling nerdom—into a primer to help you better judge what you’re reading, and also help you put a lid on your emotions when numbers swing against your preferred candidate.
For the 2015 Venezuelan parliamentary election, Caracas Chronicles provided data-driven analysis of the election, and published Omar Zambrano’s forecasting model, which did pretty well. Since then, it has become impossible to do the same, because for data-driven analysis you need, you guessed it, good data. There are very few decent polling firms in Venezuela, their samples are small, and even during elections most conduct polls too infrequently for anyone to build informative polling averages or regularly updated forecasting models à la FiveThirtyEight. What’s more, their record in the last high-turnout election in the country was poor. Using past election results as the basis for forecasts doesn’t make sense after the emigration of more than five million people, of whom between 70% and 80% could be of voting age. But we political junkies have to get our fix somewhere, so for this post we’re focusing on an election where there’s plenty of data to dig into.
Think of Polls As a Range of Outcomes
Pollsters will frequently describe electoral polls as a snapshot of a race at a given point in time. While the snapshot analogy is useful to highlight the temporal limitations of polls, it fails to describe a key feature of the photo: it’s usually a very blurry photo, especially in close races.
One key issue is the margin of sampling error, or simply the margin of error. Polls have margins of error because pollsters don’t ask every voter how they’ll vote; they only poll a sample of the population because asking everyone, or even close to everyone, would be too expensive, time-consuming, and unnecessary. They poll only a small portion, chosen randomly to avoid injecting their own biases into the sample, and not every random sample will yield the same results: if you ask 1,000 people a question you will get some results, and if you ask a different group of 1,000 people you will most probably get different results.
Pollsters hope the results from the sample will be reasonably close to the preferences of the whole population: they hope the sample will be a “representative sample” of the population. The larger the sample, the closer one should expect the poll results to be to the true value, i.e. the smaller the margin of error. While we commonly use the word “population” when discussing surveys, in electoral polls we’re not interested in what the population prefers, but what voters prefer. We say that in this case our “universe of interest” is voters. Polling firms will define the universe in different ways: some report results among all adults, registered voters, or likely voters; or report all of these. Likely voter polls are usually preferred to the other two.
When reading poll results, you will usually find they report a margin of error such as “plus/minus 2% at the 95% confidence level”. Let’s say a poll reports Donald Trump has a 40% approval rating. Then, at plus/minus 2% at 95% confidence, what the poll is saying is that “If we repeat this poll 100 times, in 95 of those polls Donald Trump’s reported approval rating will be within 2% of the true value, and in the remaining five, the reported rating will be off by more than 2%”. And no one can tell for sure if any given poll is one of the 95 “correct ones” that’s close to the true value, or one of the five “wrong” ones.
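As a rough sketch of where a figure like “plus/minus 2%” comes from, the textbook formula for a simple random sample can be computed directly. The function name `margin_of_error` and the sample sizes below are illustrative assumptions; real pollsters adjust this figure for weighting and survey design, so treat the output as an approximation.

```python
import math

def margin_of_error(sample_size, share=0.5, z=1.96):
    """Approximate 95% margin of sampling error for a simple random sample.

    share=0.5 is the worst case (largest margin); z=1.96 corresponds
    to the 95% confidence level.
    """
    return z * math.sqrt(share * (1 - share) / sample_size)

# Illustrative sample sizes, not taken from any specific poll.
for n in (500, 1000, 2400, 4000):
    print(f"n={n}: plus/minus {margin_of_error(n) * 100:.1f} points")
```

Under these assumptions, a sample of about 1,000 gives roughly plus/minus 3 points, and it takes about 2,400 respondents to get down to plus/minus 2. Quadrupling the sample only halves the margin, which is one reason most polls stop at around a thousand respondents.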
A Poll Can Miss the Election Winner and Still Be ‘Right’
There’s a lot of uncertainty wrapped around electoral polls in close races. Close races, with polls where different outcomes are included within the margin of error, should be considered tossups—a statistical tie—even if a candidate is consistently ahead in every poll. The margin of error refers to the support level for each candidate, not to the difference between them. If the reported margin of error is 3%, and the difference between the candidates is 5%, it’s wrong to say that “the difference is outside the margin of error”. It’s not.
Say we get a Florida poll where Biden leads 52% to 48% over Trump, with a margin of error of plus/minus 3% at 95% confidence. That 4% lead for Biden looks solid at first sight, but what the poll actually says is: “Biden’s vote should be in the 49% to 55% range, 95 out of 100 times, and Trump’s in the 45% to 51% range”. You should notice right away that those ranges include outcomes in which Trump beats Biden (say, 51% to 49%), the 4% “lead” notwithstanding. Biden’s lead, per this poll, should be between -2% and 10%. Biden could win by 10% or lose by 2% and the poll would still “be right”.
The UK Brexit referendum provides a cautionary tale on taking poll leads at face value. The Brexit win was framed as a failure of polling, but except for a few likely outliers (more on this later), Brexit polls in the two months before the vote were generally close. Out of the last 40 polls, 21 showed Leave ahead and 19 showed Remain leading, with the percentage of undecided voters regularly hitting double figures. The polling average (more on this later, too) showed only slim leads for Remain, under 2%. The conventional wisdom was that Remain would win, but polls were pointing to a very close race with lots of undecided voters.
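The arithmetic in the Florida example can be written out in a few lines. This is only a sketch of the reasoning above; the function name `lead_range` is an assumption made for illustration.

```python
def lead_range(leader, trailer, moe):
    """Return the (lowest, highest) lead consistent with the poll.

    The margin of error applies to each candidate's share, so the
    leader may be overstated while the trailer is understated, or
    the other way around.
    """
    lowest = (leader - moe) - (trailer + moe)
    highest = (leader + moe) - (trailer - moe)
    return lowest, highest

print(lead_range(52, 48, 3))  # (-2, 10)
```

Note that the range for the lead is twice as wide as the range for either candidate’s share, which is why a lead has to be roughly double the reported margin of error before it sits clearly outside it.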
Polls Aren’t Only About Margins, But About Levels Too
At the time of writing, Biden was leading Trump in the RCP polling average for Pennsylvania, Wisconsin and Michigan by around 5%. Going into the 2016 election, Clinton led Trump by between 2% and 6% in these states; not that different from Biden. However, while margins appear similar, the levels of support are different.
Clinton’s support in these states was around 46% in the months before the election and never ventured into the +50% territory, while Trump’s support averaged around 40% before surging closer to the election. Trump ended up winning all three states by less than 1%, with Clinton getting no more than 47.5% of the vote.
Biden looks stronger than Clinton in these states because, unlike Clinton’s, his average support is between 49% and 50% and has ventured over 50% at times, and he regularly gets over 50% in individual polls. Clinton’s support of 46% left a lot of space for Trump to grow, even if he was behind, with polls reporting between 8% and 13% of undecided or third-party voters. By getting two-thirds of undecided voters, as some exit polls at the time suggested, Trump was able to beat Clinton.
If a candidate is over or very close to 50%, with few undecided voters and no significant support from third parties, their rivals have an uphill battle to catch up. Unlike 2016, in 2020 the share of undecided and third-party voters appears to be in the low single digits. If Biden’s support is around 49%, Trump could get 75% of all undecided voters (better than in 2016) and still lose.
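To see why levels matter and not just margins, here is a small sketch of how large a share of the uncommitted vote a trailing candidate needs in order to catch up. The function name and the specific support levels (46 vs. 42, and 49 vs. 44) are illustrative assumptions chosen to resemble the two situations described above, not actual polling averages, and the sketch simplifies by assuming every uncommitted voter ends up with one of the two candidates.

```python
def share_needed_to_tie(leader, trailer):
    """Share of uncommitted voters the trailer needs to draw level.

    Assumes everyone not backing either candidate (undecided or
    third party) ends up voting for one of the two.
    """
    rest = 100 - leader - trailer
    return (leader - trailer + rest) / (2 * rest)

# Hypothetical 2016-like race: leader at 46, trailer at 42, 12 up for grabs.
print(f"{share_needed_to_tie(46, 42):.0%}")

# Hypothetical 2020-like race: leader at 49, trailer at 44, 7 up for grabs.
print(f"{share_needed_to_tie(49, 44):.0%}")
```

In the first case the trailer needs about two-thirds of the remaining voters; in the second, well over 80%, so even a 75% split would not be enough.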
Beware the Noise
We should ignore news reports of Biden’s or Trump’s lead increasing or decreasing by small percentages. A 2% change is well within the margin of error of most polls. Even if voters’ preferences remain unchanged, one would expect different polls to report different results, simply by virtue of random sampling of voters; polls will sometimes catch a bit more of one group, and other times a bit more of others. In statistics parlance, such small changes are most likely noise. Look instead for trends towards a candidate over some time, and their levels of support.
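A quick simulation makes the point about noise concrete. The sketch below assumes a candidate whose true support never moves from 52% and polls 1,000 randomly chosen voters over and over; the support level, sample size, seed, and function name are all assumptions for illustration.

```python
import random

def simulate_poll(true_support, sample_size, rng):
    """Poll `sample_size` random voters; return the candidate's share in percent."""
    hits = sum(rng.random() < true_support for _ in range(sample_size))
    return 100 * hits / sample_size

rng = random.Random(2020)  # fixed seed so the run is repeatable
results = [simulate_poll(0.52, 1000, rng) for _ in range(1000)]

# Share of simulated polls that land within about 3 points of the truth.
within = sum(abs(r - 52) <= 3.1 for r in results) / len(results)

print("first five polls:", results[:5])
print("lowest and highest:", min(results), max(results))
print("share within 3.1 points of 52:", within)
```

Even though nothing about the race changes between simulated polls, consecutive results differ by a point or two, and roughly one poll in twenty lands more than three points away from the true value.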
Be Wary of Massive Changes Too
As explained above, polls, by design, will sometimes be wrong. Electoral polls are usually designed so we expect them to be wrong about 5% of the time. So, for every 20 polls, we should get on average one “outlier”: a poll that doesn’t look like the rest. For example, if a weekly poll by Reuters is regularly showing a Biden lead in the 6% to 10% range every week, and then suddenly reports a 4% Trump lead or a 20% Biden lead, we should ignore it. The press won’t ignore it, of course: that shift sounds like big news! But it likely isn’t; it’s an outlier.
There’s a caveat to this warning: large changes in polls could indeed be right when there’s a reason for them. Lacking any big news, scandals, or any explanation for a big change in the race, outliers are most likely noise. However, if something important happened (say, a candidate was caught on tape talking about grabbing women’s genitals, or the FBI announced they’re investigating a candidate), then sure, a large change in polls could be reporting factual information about the state of the race.
Polls Should Be Weighted (i.e. Ignore Twitter polls)
Polling isn’t as simple as asking people questions and tallying answers. Pollsters want a “representative sample of the voters”, a slice that on average looks like the universe of voters as a whole. If voters are split 50-50 between men and women, they hope for a random sample that’s close to 50-50 men/women, and also matches shares of other categories, such as race and income levels.
However, random sampling seldom yields samples that match the shares in the right proportion for each category. First, there’s the random element: sometimes polls will catch too much or too little of one group. Second, some people might be less willing to answer polls. For example, younger people are less inclined to spend ten minutes answering questions from a stranger. Third, some groups are harder to reach; maybe they live in areas with bad cell phone reception, while some are easy to reach at their homes (the elderly) and others are not (college students).
To correct this issue, pollsters “weight” their results, i.e. they won’t simply average all answers, but will instead give more weight to some, without changing the actual answers. If they get a 60-40 men/women split, they’ll reduce the weight of men from 60% to 50%, and increase the weight of women from 40% to 50%. Decent pollsters will do this for several categories, which could include sex, race, income, location and education. This is fairly standard; unweighted electoral polls are trash, and that is one of several reasons why you shouldn’t pay attention to electoral polls carried out on Twitter or on a website, which aren’t weighted.
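Here is a minimal sketch of the 60-40 example, weighting on a single category. The support figures for men and women are hypothetical, and real pollsters weight on several categories at once with more elaborate procedures, so this only illustrates the basic idea.

```python
def reweight(sample_shares, target_shares, support):
    """Return (unweighted, weighted, weights) for one candidate's support.

    Each respondent in a group gets weight = target share / sample share;
    the answers themselves are left untouched.
    """
    weights = {g: target_shares[g] / sample_shares[g] for g in sample_shares}
    unweighted = sum(sample_shares[g] * support[g] for g in sample_shares)
    weighted = sum(
        sample_shares[g] * weights[g] * support[g] for g in sample_shares
    )
    return unweighted, weighted, weights

sample_shares = {"men": 0.60, "women": 0.40}   # who answered the poll
target_shares = {"men": 0.50, "women": 0.50}   # who is in the electorate
support = {"men": 45.0, "women": 55.0}         # hypothetical support, in percent

unweighted, weighted, weights = reweight(sample_shares, target_shares, support)
print(f"weights: men {weights['men']:.2f}, women {weights['women']:.2f}")
print(f"unweighted {unweighted:.1f}%, weighted {weighted:.1f}%")
```

With these made-up numbers, each man counts for a bit less than one respondent and each woman for a bit more, and the candidate’s support moves from 49% to 50% without a single answer being changed.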
Don’t Pay Much Attention to ‘Crosstabs’ in Polls with Small Samples
News reports of electoral polls typically report the “crosstabs” of the polls, i.e. the support levels within specific groups, such as “80% of African American respondents support Biden”. But remember, the smaller the sample, the larger the margin of error. When reporting subgroups from the poll, the sample can be a lot smaller than the full poll, and therefore carries larger margins of error. For example, in a June poll by The New York Times, 12% of the 1,337 respondents were African American; that’s just 160 people, a sample too small to read too much into the results.
In a typical poll of 1,000 respondents with a 3% margin of error, the margin of error of answers from subgroups could be around 8% to 15%, depending on several factors.
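To get a feel for how quickly the margin grows in a crosstab, the sketch below applies the textbook simple-random-sample formula to the subgroup from the example above. The function name is an assumption, and the formula ignores weighting, which usually makes real subgroup margins larger still.

```python
import math

def subgroup_margin(total_sample, subgroup_share, z=1.96):
    """Return (subgroup size, approximate 95% margin of error in points)."""
    n = round(total_sample * subgroup_share)
    return n, 100 * z * math.sqrt(0.25 / n)

full_n, full_moe = subgroup_margin(1337, 1.0)
sub_n, sub_moe = subgroup_margin(1337, 0.12)

print(f"full sample: n={full_n}, plus/minus {full_moe:.1f} points")
print(f"subgroup:    n={sub_n}, plus/minus {sub_moe:.1f} points")
```

Under these assumptions, the full sample carries a margin of a little under 3 points, while the 160-person subgroup carries close to 8.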
Polling Firms Make Choices that Impact Poll Results
When weighting polls, polling firms make choices. Where to get the population or voters’ shares to weight polls? From the census, voter registries, or exit polls from past elections? Decisions by firms will have an impact on the poll, and while they should try to remain as objective as possible, decisions made by humans will unavoidably inject their own biases and opinions. And they can be wrong. For example, a firm might choose to weight only by sex and race in the U.S., but not education, or use proportions that make wrong assumptions about the composition of the electorate.
Before the 2016 U.S. election, there were warnings that polling firms could be using incorrect shares to weight polls, which underestimated the number of white voters and white voters without college degrees; the latter, a key source of support for Trump. In September 2016, The New York Times and four survey experts processed the same answers from a poll of Florida, each weighting it according to their preferred methodology, and arrived at five different results, ranging from Clinton ahead by 4% to Trump ahead by 1%, a 5% spread between results. The poll that gave Trump a 1% lead assumed 70% of the electorate was white, while the Clinton +4% poll assumed it was 65% white.
Choices by polling firms can result in “house effects”: polling firms that regularly report better results for one party, compared with other polls or with election results. In the U.S., Rasmussen polls typically report better results for Republican candidates, since the firm adjusts its results for “shy Republicans” or “shy Trump voters”: they believe some voters prefer not to reveal their support for Trump in live telephone polls (if Rasmussen were right, we should see differences between live polls and online polls; for the most part, there aren’t any). For the past months, Rasmussen gave Biden a small lead of around 2% over Trump (other polls average around 7%), and between February and September, it was the only firm to report a Trump lead (of 1% in July). Conversely, Quinnipiac national polls frequently show Biden leading by double digits; just as “off” as Rasmussen but in the other direction.
Single polls can be way off, both because of random sampling and choices by polling firms. One way to address these issues is to average results from different polls, to cancel out sampling errors—cancel out polls that overestimate voting intention for candidate A, with polls that underestimate their level of support. The more polls included in the average, the closer the average will be to the true value.
Two of the more popular polling averages for the U.S. election come from FiveThirtyEight and Real Clear Politics. These averages are weighted by sample size, so that a poll with a sample of 4,000 people will have more weight than a poll with a sample of 1,000. There are methodological differences between the two: the RCP average has a cutoff date (they only average the most recent polls), while the FiveThirtyEight average doesn’t; instead, it weights polls by recency and several other factors, including polling firm quality.
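As a bare-bones illustration of weighting an average by sample size, following the description above, consider two hypothetical polls. The numbers and the function name are made up for the example, and neither site’s actual procedure is being reproduced here.

```python
def sample_weighted_average(polls):
    """Average poll results, weighting each by its sample size.

    `polls` is a list of (candidate share in percent, sample size) pairs.
    """
    total_respondents = sum(size for _, size in polls)
    return sum(share * size for share, size in polls) / total_respondents

# Hypothetical polls: a large one at 50% and a small one at 46%.
polls = [(50.0, 4000), (46.0, 1000)]

simple = sum(share for share, _ in polls) / len(polls)
weighted = sample_weighted_average(polls)
print(f"simple average {simple:.1f}%, sample-weighted average {weighted:.1f}%")
```

The simple average sits halfway between the two polls at 48%, while the weighted average, at 49.2%, is pulled towards the poll with four times as many respondents.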
Polling Averages Can Be Wrong, Too
In the U.S., polling averages tend to miss the actual national election result by around 2%, which is a lot larger than would be expected if the only source of errors was sampling error. Polling averages can cancel out errors in polls due to sampling error but are not immune to systematic errors in polling—errors in the same direction found in most or all polls due to pollsters’ choices, or problems reaching some groups.
In theory, polling averages should help correct for errors caused by these pollsters’ choices if some firms made errors in one direction, and a similar number of firms erred in the opposite direction. However, most polling firms weight polls using similar methodologies or shares available from public sources, so if these are wrong, errors could be replicated in polls from most firms.
If all polls are wrong, a polling average will show the average of wrong polls. These systematic errors only become apparent once the election results are in, when it’s too late. We already mentioned that in 2016, polling firms underestimated the share of white voters, and especially white voters without college degrees (around 45% of voters, among whom Trump beat Clinton by a 2-to-1 margin), in the swing states that swung the election. It was the sort of systematic error that cannot be canceled out by polling averages.
As Nate Silver explains, in the case of the U.S., these errors might cancel out in national polls, but not in state polls. At the national level, if polls overestimated Clinton’s performance with white voters without college degrees, and underestimated her performance with Latinos, the net effect could be close to zero. But in states with lots of these white voters, and few Latinos, the net effect would not be zero, but instead would overestimate Clinton’s overall performance. And that’s what happened with polling in the Midwest in 2016.
Furthermore, within some states, there might be differences within racial groups. For example, in Florida, the “Hispanic” category fails to capture differences among different Hispanic groups: Cuban Americans tend to lean Republican, while Puerto Ricans lean Democratic, and Venezuelan Americans likely lean Trump but we can’t tell for sure if they lean Republican. A recent poll among Venezuelans in Florida showed strong support for Trump over Biden, at 66%-34%, but its method for choosing respondents wasn’t random (it was dubious at best), and the effective sample was small, as only 30% of respondents were actually registered to vote. Meanwhile, another survey showed that barely 5% of Venezuelan American voters were registered as Republicans, and 33% as Democrats. A decent poll should account for these differences within groups when weighting. In national polls, these differences within groups in specific states tend to cancel each other out; in state polls, they might not.
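The way group-level errors can wash out nationally but not in a particular state comes down to simple arithmetic. In the sketch below, the group labels, their shares of the electorate, and the sizes of the polling errors are all hypothetical, chosen only so that the two errors have opposite signs.

```python
def net_polling_error(group_shares, group_errors):
    """Overall polling error in points, given per-group errors.

    A positive error means polls overstated the candidate's support
    in that group; a negative error means they understated it.
    """
    return sum(group_shares[g] * group_errors[g] for g in group_shares)

# Hypothetical: polls are 3 points too high in group A, 3 too low in group B.
errors = {"group A": 3.0, "group B": -3.0, "everyone else": 0.0}

national = {"group A": 0.30, "group B": 0.30, "everyone else": 0.40}
one_state = {"group A": 0.55, "group B": 0.05, "everyone else": 0.40}

print(f"national error: {net_polling_error(national, errors):+.1f} points")
print(f"state error:    {net_polling_error(one_state, errors):+.1f} points")
```

With these made-up shares the two errors offset each other nationally, but in a state where group A is large and group B is small, the polls end up overstating the candidate by a point and a half.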
Election Forecast Models
Since the 2008 U.S. election, forecast models have gained in popularity, most notably FiveThirtyEight’s. This year, The Economist also came out with their model, which is also updated regularly. These models take both national and state polls, as well as other “fundamentals”—such as the state of the economy—and combine them to create thousands of simulated scenarios for the outcome of the election. While people and the press tend to read these models as explicit forecasts of the result—“FiveThirtyEight says Clinton will win for sure because they give her an 80% chance!”—these are probabilistic models, i.e. they account for the fact that polling is uncertain, and seek to report who’s favored to win, and not who’s going to win. When the FiveThirtyEight model gave Clinton a 70% chance of winning in 2016, what they were saying was “If we repeat this election 100 times, we believe Clinton would win in 70 of those elections”.
Say Germany will play Venezuela tomorrow in a football match, and a betting website gives the Germans a 95% chance of winning. That sounds about right: if these teams meet 20 times, Germany should win 19. If Venezuela then beats Germany, should we say the odds were wrong? Of course not. Football matches, even between national teams of vastly different skill levels, are events with uncertain outcomes. Weird things can happen; Goliath sometimes has a bad day. And the same goes for elections.
Of course, each specific election only happens once; we have no way to repeat the same election under the same conditions 100 times to see if the model was right. We only get one. What these models do is try to simulate the election many times on a computer, and see what happens.
FiveThirtyEight’s model reflects the U.S. Electoral College; the model simulates the presidential election in each state and then adds up the Electoral College votes to see who would win the election. An extremely simplified way of thinking about these forecasting models is to picture them as drawing numbers from 1 to 100 from a bowl to decide the winner of each state, with each candidate getting numbers assigned to them according to the polls and other factors. In a state like Alabama, the model would assign something like 97 numbers to Trump and three to Biden, and the opposite in Virginia. In Ohio, it would give about 52 to Biden and 48 to Trump. The model then “draws” numbers for all states and tallies the electoral votes to determine the winner of the simulation. It runs this simulation 40,000 times and tallies the results. If Biden wins in 28,000 simulations, then the model reports Biden has a 70% chance of winning.
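The bowl analogy translates almost directly into code. The sketch below is a toy under stated assumptions, not FiveThirtyEight’s method: it uses only the three states mentioned above, with their 2020 electoral votes and the “numbers in the bowl” from the example, and it treats each state as an independent draw.

```python
import random

# Toy map: state -> (electoral votes, Biden's numbers out of 100 in the bowl).
STATES = {
    "Alabama": (9, 3),
    "Virginia": (13, 97),
    "Ohio": (18, 52),
}

def simulate_once(states, rng):
    """Draw one number per state and return Biden's electoral votes."""
    biden_votes = 0
    for votes, biden_numbers in states.values():
        if rng.randint(1, 100) <= biden_numbers:
            biden_votes += votes
    return biden_votes

def win_probability(states, simulations, rng):
    """Share of simulations in which Biden wins a majority of the toy map."""
    total = sum(votes for votes, _ in states.values())
    wins = sum(
        simulate_once(states, rng) > total / 2 for _ in range(simulations)
    )
    return wins / simulations

rng = random.Random(538)  # fixed seed so the run is repeatable
print(f"Biden wins the toy map in {win_probability(STATES, 40_000, rng):.1%} of simulations")
```

On this three-state map Biden needs two of the three states, so his chances end up close to his chances in Ohio, the only competitive state in the toy example.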
The FiveThirtyEight model is a bit more sophisticated than that, of course. It doesn’t run 50 completely independent simulations but instead accounts for links between state results. For example, if Trump overperforms the polls in Ohio in one scenario, the model knows he’ll likely also overperform polls in neighboring states with similar demographic profiles, such as Pennsylvania. The model also tests random variations in several voting groups; for example, scenarios in which Biden does very well with white voters without college degrees, or Trump overperforms polls with African American voters.
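One simple way to picture linked state results is to give every state a shared polling error on top of its own state-specific error. The sketch below is an assumption-laden illustration rather than FiveThirtyEight’s actual approach: the leads, the error sizes, and the function name are all hypothetical.

```python
import math
import random

def both_lost_rate(leads, shared_sd, state_sd, simulations, rng):
    """How often a candidate ahead in every state's polls loses all of them.

    Each simulated result = polling lead + shared error + state-only error.
    """
    lost_all = 0
    for _ in range(simulations):
        shared = rng.gauss(0, shared_sd)  # same miss applied to every state
        results = [lead + shared + rng.gauss(0, state_sd) for lead in leads]
        if all(result < 0 for result in results):
            lost_all += 1
    return lost_all / simulations

leads = [1.0, 5.0]  # hypothetical polling leads in two similar states
rng = random.Random(7)

# Linked states: most of the error is shared between them.
linked = both_lost_rate(leads, 3.0, 2.0, 20_000, rng)
# Independent states: same total error size, but nothing shared.
independent = both_lost_rate(leads, 0.0, math.sqrt(13.0), 20_000, rng)

print(f"loses both states, linked errors:      {linked:.1%}")
print(f"loses both states, independent errors: {independent:.1%}")
```

When the errors are linked, a bad miss in one state tends to come with a bad miss in the other, so losing both states together is noticeably more likely than it would be if each state were an independent draw.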
Poll-Based Models Need Good Polls
Needless to say, models that use polls depend on poll quality. FiveThirtyEight’s model tests scenarios in which polls are wrong by random margins, too. But it can’t completely escape the impact of systematic errors in polls.
Even if the model accounts for polling errors, polls can be wrong in different directions and areas, with different impacts on the election outcome: it doesn’t only matter if polls are wrong but where they’re wrong. In 2016, Clinton overperformed the model estimates in 11 states—which, unluckily for her, were mostly solidly Democratic states—and underperformed the model in vital swing states in the Midwest. Had Clinton instead underperformed the model in Democratic states and overperformed in swing states, she would have won handily. To put it another way, it doesn’t matter much if polls in California, New York, and Alabama are way off—states where one candidate absolutely dominates—but if they’re way off in any direction in Pennsylvania, Michigan, and Florida, models can’t help us.
So We Shouldn’t Trust Polls and Models?
“Trust” is probably the wrong word to use when discussing polls. Electoral polls should be used to inform who’s favored and who’s not, and not attempt to make definitive predictions. Furthermore, timing matters, as polls can fail to capture changes in voters’ preferences late in the race, and even if they did, we might not have enough late polls to separate noise from true changes. People change their minds, and a poll could be “right” one day and be “wrong” hours later.
Sure, if a candidate is leading 70%-30% in twenty polls with large samples, their victory should be close to certain. But who cares about polls in 70-30 races? We are interested in polls in close races.
For close races, it’s best to think probabilistically about polls and models: treat them as describing the likelihood of certain outcomes, not as predicting the final outcome with certainty, since elections are events with lots of uncertainty around them.
This post was originally published on Caracas Chronicles.