Plural Vote - More accurate than polls alone, polling and search trends model is showing Biden with 343 EVs
Posted under 2020 ELECTION
Published 4 months ago

Based on polls and search trends, the Plural Vote presidential forecast model for the 2020 race gives Biden a 76.4% probability of winning the Electoral College. This is the highest probability we have recorded since the forecast launched on Friday, March 27th.

For this presidential race, our state projections depend primarily on polls to predict vote outcomes. In addition to polls, which make up two-thirds of each prediction, we incorporate a unique model based on "media partisanship". Its measurements are gathered from Google Trends to capture media polarization and thus predict vote margins. This search trends model tracks shifts in the relative frequency of searches for Fox News, Washington Post, MSNBC, New York Times, and Huffington Post. We combine it with polling to estimate how each state will vote, and the combined estimates predict past election outcomes better than polling averages alone.

For the purpose of transparency, this article details our methodology in depth. In addition, the source code for the Search Trends component of our statistical model (written in the R language) has been made available on GitHub. The source data for the comparison between polling averages and our estimates is also available on GitHub.

In 2016, polls alone had a coefficient of determination of r^2 = 0.910, whereas our model produces a higher value of r^2 = 0.941.

In 2012, polls alone had a coefficient of determination of r^2 = 0.954, whereas our model produces a higher value of r^2 = 0.964.

When our methodology is applied identically to the 2012 and 2016 elections, our state-by-state predictions call at least as many states correctly as polling averages do. Our search trends and polling model correctly predicted 46 out of 50 states in 2016 and 50 out of 50 in 2012. For reference, FiveThirtyEight correctly called 45 out of 50 states in 2016 and 50 out of 50 in 2012. As another point of comparison, raw polling averages correctly called 46 out of 50 states in 2016 and 49 out of 50 in 2012.

Across all states, the mean absolute error of our model in 2012 was 3.4 points, which is 0.7 points lower than the 4.1 points of error shown by polls alone. In 2016, the mean absolute error of our model was 4.3 points, which is 0.2 points lower than the 4.5 points of error shown by polls alone.
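
As a rough illustration of how accuracy figures of this kind can be computed, here is a minimal Python sketch. It is not the code behind the numbers above: the function names are hypothetical, the sample margins are made up, and it assumes that r^2 means the squared Pearson correlation between predicted and actual state margins, and that a state counts as "called" when the predicted and actual margins share a sign.

```python
def mean_absolute_error(predicted, actual):
    """Average of |predicted - actual| across states, in points."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def r_squared(predicted, actual):
    """Squared Pearson correlation between predicted and actual margins (assumed definition)."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov * cov / (var_p * var_a)


def states_called(predicted, actual):
    """Number of states where the predicted and actual margins have the same sign."""
    return sum(1 for p, a in zip(predicted, actual) if p * a > 0)


# Made-up margins for four hypothetical states (negative = Democratic lead).
predicted = [-10.0, -2.0, 1.0, 8.0]
actual = [-12.0, 1.0, 3.0, 9.0]

print(mean_absolute_error(predicted, actual))
print(r_squared(predicted, actual))
print(states_called(predicted, actual))
```

Running the same three functions once on polling averages and once on the combined estimates would give the kind of side-by-side comparison reported above.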

To avoid overfitting, the methodology behind our model was devised neutrally, and its retroactive predictions do not use information that would only have been available with the benefit of hindsight.

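The tables below show what our model would have predicted for each state in the 2012 and 2016 races. Margins are in points; negative values indicate a Democratic lead and positive values a Republican lead. Empty rows indicate that the state lacked polling and was solidly Republican or Democratic.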

Model’s predicted vote margins for 2012:

State | Search Trends + Polling, predicted margin (points) | Absolute Error (points)
District of Columbia | |
Hawaii | -31.89 | 10.82
Vermont | -33.35 | 2.25
New York | -28.96 | 0.78
Rhode Island | -25.50 | 1.96
Maryland | -24.46 | 1.62
Massachusetts | -21.54 | 1.60
California | -18.87 | 4.25
Delaware | |
New Jersey | -15.53 | 2.28
Connecticut | -15.30 | 2.03
Illinois | -18.24 | 1.37
Maine | -16.00 | 0.71
Washington | -12.25 | 2.62
Oregon | -10.03 | 2.06
New Mexico | -5.12 | 5.03
Michigan | -5.29 | 4.21
Minnesota | -6.48 | 1.21
Wisconsin | -2.92 | 4.02
Nevada | -7.59 | 0.91
Iowa | -4.05 | 1.76
New Hampshire | -0.50 | 5.08
Pennsylvania | -6.38 | 0.99
Colorado | -3.49 | 1.88
Virginia | -6.93 | 3.06
Ohio | -1.93 | 1.05
Florida | -1.87 | 0.99
North Carolina | 2.16 | 0.12
Georgia | 7.59 | 0.23
Arizona | 8.50 | 0.56
Missouri | 10.35 | 0.97
Indiana | 5.89 | 4.31
South Carolina | 15.49 | 5.02
Mississippi | |
Alaska | |
Montana | 11.90 | 1.75
Texas | 13.55 | 2.23
Louisiana | 16.04 | 1.17
South Dakota | 13.96 | 4.06
North Dakota | 16.27 | 3.36
Tennessee | 20.10 | 0.30
Kansas | 15.01 | 6.71
Nebraska | 12.05 | 9.73
Alabama | |
Kentucky | 12.16 | 10.53
Arkansas | 19.71 | 3.98
West Virginia | 14.71 | 12.05
Idaho | 33.03 | 1.12
Oklahoma | 24.00 | 9.54
Wyoming | |
Utah | 36.36 | 11.68

Model’s predicted vote margins for 2016:

State | Search Trends + Polling, predicted margin (points) | Absolute Error (points)
District of Columbia | |
Hawaii | |
California | -24.90 | 5.21
Massachusetts | -32.44 | 5.24
Maryland | -26.21 | 0.21
Vermont | -31.64 | 5.64
New York | -27.21 | 4.72
Illinois | -16.93 | 0.04
Washington | -16.70 | 0.47
Rhode Island | -18.08 | 2.58
New Jersey | -16.17 | 2.07
Connecticut | -17.33 | 3.69
Delaware | -26.86 | 15.49
Oregon | -14.15 | 3.17
New Mexico | -9.19 | 0.97
Virginia | -7.35 | 2.03
Colorado | -5.66 | 0.75
Maine | -8.26 | 5.30
Nevada | -2.09 | 0.33
Minnesota | -6.49 | 4.97
New Hampshire | -4.10 | 3.73
Michigan | -4.39 | 4.62
Pennsylvania | -4.20 | 4.92
Wisconsin | -4.33 | 5.10
Florida | -1.65 | 2.85
Arizona | 4.16 | 0.66
North Carolina | 0.82 | 2.84
Georgia | 6.43 | 1.34
Ohio | 3.61 | 4.52
Texas | 11.72 | 2.73
Iowa | 1.05 | 8.36
South Carolina | 10.53 | 3.74
Alaska | 8.20 | 6.53
Mississippi | 17.94 | 0.14
Utah | 17.86 | 0.22
Kansas | 12.55 | 5.87
Missouri | 10.44 | 8.07
Indiana | 10.98 | 8.03
Louisiana | 18.05 | 1.59
Montana | 19.96 | 0.46
Nebraska | 23.70 | 1.35
Tennessee | 14.96 | 11.04
Arkansas | 23.27 | 3.65
Alabama | |
South Dakota | 15.96 | 13.83
Kentucky | |
Idaho | 26.85 | 4.92
North Dakota | |
Oklahoma | |
West Virginia | 26.29 | 15.78
Wyoming | |

Our Search Trends model predicts the outcomes of an election through the following steps:

1) Retrieve Google Search Trends data for each state for five partisan-correlated media outlets (Fox News, Washington Post, MSNBC, New York Times, and Huffington Post) in the last three months of the previous election cycle.

2) Normalize the data by setting the minimum state value to 0 (the state with the highest frequency of Republican-associated searches for media outlets) and the maximum state value to 100 (the state with the highest frequency of Democratic-associated searches).

3) Fit an OLS linear regression on the previous election cycle's search trends data for the five media outlets to predict the state-by-state outcomes of the election cycle prior to the previous one.

4) Record the prediction error of each of the regression's predictions against the outcomes of the previous election cycle.

5) Retrieve Google Search Trends data for each state for the same five media outlets (Fox News, Washington Post, MSNBC, New York Times, and Huffington Post) in the last three months of the current election cycle.

6) Normalize the data by setting the minimum state value to 0 (the state with the highest frequency of Republican-associated searches for media outlets) and the maximum state value to 100 (the state with the highest frequency of Democratic-associated searches).

7) Apply the OLS linear regression fitted for the previous election cycle to the current data to create state-level predictions for the current election.

8) Subtract from each state's prediction the model's prediction error for that state in the previous cycle.

9) Center the predictions for the current election cycle by subtracting their median, so that the median of the state-level vote estimates is zero.
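
The steps above can be sketched in Python as follows. This is not our published R code, only a minimal illustration under stated assumptions: all function names are hypothetical, the normalization is assumed to be a per-outlet min-max rescaling across states, the "prediction error" is assumed to mean prediction minus outcome, and the toy inputs (four states, two outlets) are made up.

```python
from statistics import median


def min_max_scale(values):
    """Rescale one outlet's per-state values to the 0-100 range."""
    lo, hi = min(values), max(values)
    return [100.0 * (v - lo) / (hi - lo) for v in values]


def scale_columns(rows):
    """Rescale each outlet (column) across states (rows)."""
    cols = [min_max_scale(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(features, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    x = [[1.0] + list(row) for row in features]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, features):
    """Intercept plus the weighted sum of each state's scaled outlet values."""
    return [coefs[0] + sum(c * v for c, v in zip(coefs[1:], row)) for row in features]


def search_trends_forecast(prev_trends, fit_margins, prev_margins, cur_trends):
    prev_x = scale_columns(prev_trends)            # normalize previous-cycle trends
    coefs = fit_ols(prev_x, fit_margins)           # fit against the earlier cycle's outcomes
    errors = [p - a for p, a in zip(predict(coefs, prev_x), prev_margins)]
    cur_x = scale_columns(cur_trends)              # normalize current-cycle trends
    raw = predict(coefs, cur_x)                    # reuse the fitted regression
    corrected = [r - e for r, e in zip(raw, errors)]  # remove each state's past error
    mid = median(corrected)
    return [c - mid for c in corrected]            # center on a median of zero


# Made-up inputs: rows are states, columns are outlets; margins in points.
prev_trends = [[30, 60], [45, 58], [60, 35], [80, 30]]
cur_trends = [[28, 62], [50, 50], [58, 37], [85, 25]]
fit_margins = [-20.0, -5.0, 4.0, 18.0]
prev_margins = [-22.0, -3.0, 6.0, 21.0]

forecast = search_trends_forecast(prev_trends, fit_margins, prev_margins, cur_trends)
print(forecast)
```

The normal-equations solver keeps the sketch dependency-free; a production version would more likely rely on a statistics library's least-squares routine.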

The R code for the unique search trends portion of our estimates is available on GitHub. The prediction generated by this model carries one-third of the weight in our state-level forecasts, with polling making up the remaining two-thirds. We average polls using a LOESS moving regression. Past polling error informs how we model the uncertainty of our predictions. We model the probabilities using the Beta, Weibull, and Logistic distributions. The Electoral College vote and the probability of each candidate winning a majority of electors are estimated using 20,000 Monte Carlo simulations.
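
To make the weighting and the simulation step concrete, here is a minimal Python sketch. It is an illustration rather than our actual model: the function names, the error scale of 3 points, and the toy states are assumptions, only the Logistic distribution is used (not Beta or Weibull), and state errors are drawn independently, which ignores any correlation between states.

```python
import math
import random


def blend(poll_margin, trends_margin):
    """Two-thirds polling, one-third search trends."""
    return (2.0 * poll_margin + trends_margin) / 3.0


def logistic_draw(rng, scale):
    """Draw one zero-centered logistic error by inverse-CDF sampling."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() can return exactly 0.0
        u = rng.random()
    return scale * math.log(u / (1.0 - u))


def simulate(states, n_sims=20000, scale=3.0, seed=0):
    """Return (Democratic win probability, mean Democratic electoral votes).

    states: list of (margin, electoral_votes); negative margin = Democratic lead.
    scale: assumed logistic scale of the forecast error, in points.
    """
    rng = random.Random(seed)
    total_ev = sum(ev for _, ev in states)
    wins = 0
    ev_sum = 0
    for _ in range(n_sims):
        dem_ev = 0
        for margin, ev in states:
            # Independent state-level error (a simplifying assumption).
            if margin + logistic_draw(rng, scale) < 0:
                dem_ev += ev
        ev_sum += dem_ev
        if dem_ev > total_ev / 2:
            wins += 1
    return wins / n_sims, ev_sum / n_sims


# Three made-up states: (polling margin, search trends margin, electoral votes).
toy_inputs = [(-6.0, -3.0, 10), (1.0, 4.0, 6), (-1.0, -2.5, 5)]
toy_states = [(blend(poll, trends), ev) for poll, trends, ev in toy_inputs]

win_probability, mean_ev = simulate(toy_states)
print(win_probability, mean_ev)
```

With real inputs, `states` would hold the blended margin and elector count of every state, and the winning threshold would be a majority of all electors.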

Feel free to follow @plural_vote for regular updates on the current electoral state of the 2020 race. Plural Vote updates its presidential and Senate race models daily.

Written by PLURAL VOTE. This article was last updated on 7/2/2020.