Plural Vote - More accurate than polls alone, polling and search trends model is showing Biden with 343 EVs
Posted under 2020 ELECTION Published 1 year ago
Based on polls and search trends, the Plural Vote presidential forecast model for the 2020 race gives Biden a 76.4% probability of winning the Electoral College. This is the highest probability we have recorded since the forecast was launched on Friday, March 27th.
For this presidential race, our state projections depend primarily on polls to predict vote outcomes. In addition to polls, which constitute 2/3rds of our predictions, we incorporate a unique model based on "media partisanship". Its measurements are gathered from Google Trends in order to capture media polarization and thus predict vote margins. This search trends model tracks shifts in the relative frequency of searches for Fox News, Washington Post, MSNBC, New York Times, and Huffington Post. This model is combined with polling to form estimates of how each state will vote, which are more predictive of past election outcomes than polling averages alone.
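As a rough illustration of the 2/3 polling and 1/3 search trends weighting described above, the blend for one state can be sketched in Python (not the R used for the published model). The function name `blend_margin` and the example margins are hypothetical, and margins are assumed to be Democratic minus Republican, in points.

```python
def blend_margin(poll_margin, trends_margin, poll_weight=2 / 3):
    """Weighted average of two margin estimates for one state.

    Assumption: both margins are Democratic minus Republican, in points,
    and the search trends model receives the remaining weight.
    """
    return poll_weight * poll_margin + (1 - poll_weight) * trends_margin


# Hypothetical inputs: polls show D+6.0, the search trends model shows D+3.0.
print(round(blend_margin(6.0, 3.0), 2))  # 5.0
```

With these weights, a state's estimate moves one point for every three points of change in the search trends prediction, so polling remains the dominant input.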
For the purpose of transparency, in this article we detail our methodology in depth. In addition, the source code for the Search Trends component of our statistical model (programmed in the R language) has been made available on GitHub. The source data for the comparison between polling averages and our estimates is also available on GitHub.
In 2016, polls alone had a coefficient of determination of r^2 = 0.910, whereas our model produces a higher value of r^2 = 0.941.
In 2012, polls had a coefficient of determination of r^2 = 0.954, whereas our model produces a higher value of r^2 = 0.964.
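For readers who want to reproduce this kind of comparison, one plausible reading of the r^2 figures is the squared Pearson correlation between predicted and actual state margins. The Python sketch below rests on that assumption. The article does not state the exact formula used, and the data in the example are made up.

```python
def r_squared(predicted, actual):
    """Squared Pearson correlation between two equal-length lists of margins."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov ** 2 / (var_p * var_a)


# Made-up margins for three states, purely to show the call.
print(r_squared([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))  # 0.25
```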
When our methodology is applied identically to the 2012 and 2016 elections, our state-by-state predictions prove more accurate than polling. Our search trends and polling model correctly predicted 46 out of 50 states in 2016 and 50 out of 50 in 2012. For reference, FiveThirtyEight called 45 out of 50 states in 2016 and 50 out of 50 correctly in 2012. As another point of comparison, raw polling averages called 46 out of 50 states correctly in 2016 and 49 out of 50 in 2012.
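Counting "states called correctly" can be sketched as checking whether the predicted and actual margins share a sign. This is a minimal Python illustration with made-up state names and margins. How the original analysis treats exact ties is not stated, so a zero margin is counted as a miss here.

```python
def states_called_correctly(predicted, actual):
    """Count states whose predicted margin has the same sign as the actual margin.

    Both arguments map state name -> margin (Democratic minus Republican).
    Assumption: a predicted or actual margin of exactly zero counts as a miss.
    """
    return sum(1 for state, margin in predicted.items() if margin * actual[state] > 0)


# Made-up example: the model misses state "B" only.
predicted = {"A": 3.0, "B": -1.5, "C": 0.8}
actual = {"A": 5.2, "B": 0.4, "C": 2.1}
print(states_called_correctly(predicted, actual))  # 2
```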
The mean absolute error of our model in 2012 across all states was 3.4 points. This represents 0.7 points less error than polls alone, which showed 4.1 points of error. In addition, the mean absolute error of our model in 2016 across all states was 4.3 points. This represents 0.2 points less error than polls alone, which showed 4.5 points of error.
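Mean absolute error is the average of the absolute differences between predicted and actual margins across states. A minimal Python sketch with made-up numbers:

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual margins, in points."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Made-up margins for three states: errors of 3, 1 and 2 points.
print(mean_absolute_error([4.0, -2.0, 10.0], [1.0, -3.0, 8.0]))  # 2.0
```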
In order to avoid overfitting, the methodology behind our model is neutrally devised and its retroactive predictions do not incorporate information available with the benefit of hindsight.
The tables below show what our model would have predicted for each state in the 2012 and 2016 races. Empty rows indicate states that lacked polling and were solidly Republican or Democratic.
Model’s predicted vote margins for 2012:
[Table omitted. Labels: Search Trends + Polling; Mean Absolute Error; District of Columbia.]
Model’s predicted vote margins for 2016:
[Table omitted. Labels: Search Trends + Polling; Mean Absolute Error; District of Columbia.]
Our Search Trends model predicts the outcomes of an election by following these steps:
1) Retrieve Google Search Trends data for each state for five partisan-correlated media outlets (Fox News, Washington Post, MSNBC, New York Times, and Huffington Post) in the last three months of the previous election cycle.
2) Normalize the data by setting the minimum state value to 0 (the state with the highest frequency of Republican-associated searches for media outlets) and maximum state trend value to 100 (the state with the highest frequency of Democratic-associated searches).
3) Create an OLS Linear Regression to fit the search trends data of the previous election cycle for the five media outlets in order to predict the state-by-state election outcomes of the election cycle prior to the previous one.
4) Gather the prediction error of each of the regression’s predictions against the outcomes for the previous election cycle.
5) Retrieve Google Search Trends data for each state for five partisan-correlated media outlets (Fox News, Washington Post, MSNBC, New York Times, and Huffington Post) in the last three months of the current election cycle.
6) Normalize the data by setting the minimum state value to 0 (the state with the highest frequency of Republican-associated searches for media outlets) and maximum state trend value to 100 (the state with the highest frequency of Democratic-associated searches).
7) Apply the OLS Linear Regression used to predict outcomes for the previous election cycle to create state-level predictions of the current election.
8) Subtract from each state the prediction error of the model in the previous cycle.
9) Normalize the predictions for the current election cycle by subtracting the median in order to set the median of the state-level vote estimates to zero.
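The steps above can be sketched end to end in Python. This is a simplified illustration, not the published R code. It assumes the five outlets have already been collapsed into one hypothetical partisanship score per state (the article does not say how they enter the regression), so it fits a one-predictor OLS line. It also assumes "prediction error" means predicted minus actual. All function names and data are made up.

```python
from statistics import mean, median


def min_max_0_100(scores):
    """Rescale {state: score} so the lowest state is 0 and the highest is 100."""
    lo, hi = min(scores.values()), max(scores.values())
    return {s: 100.0 * (v - lo) / (hi - lo) for s, v in scores.items()}


def fit_ols(x, y):
    """Closed-form simple OLS; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def search_trends_forecast(prev_scores, prev_outcomes, curr_scores):
    """Return {state: centered margin estimate} for the current cycle."""
    states = sorted(prev_scores)
    # Normalize the previous cycle's scores and fit the regression on them.
    prev_norm = min_max_0_100(prev_scores)
    intercept, slope = fit_ols([prev_norm[s] for s in states],
                               [prev_outcomes[s] for s in states])
    # Per-state error of the fitted line (assumed sign: predicted minus actual).
    prev_error = {s: intercept + slope * prev_norm[s] - prev_outcomes[s] for s in states}
    # Normalize the current cycle's scores and apply the same regression.
    curr_norm = min_max_0_100(curr_scores)
    raw = {s: intercept + slope * curr_norm[s] for s in states}
    # Subtract last cycle's error, then center the estimates on a median of zero.
    corrected = {s: raw[s] - prev_error[s] for s in states}
    mid = median(corrected.values())
    return {s: corrected[s] - mid for s in states}


# Made-up scores and outcomes for three hypothetical states.
prev_scores = {"A": 10.0, "B": 40.0, "C": 70.0}
prev_outcomes = {"A": -20.0, "B": 2.0, "C": 15.0}
curr_scores = {"A": 12.0, "B": 45.0, "C": 66.0}
print(search_trends_forecast(prev_scores, prev_outcomes, curr_scores))
```

One property of this design is worth noting. If the current scores equal the previous ones, the error correction reproduces the previous outcomes exactly (up to the median shift), so the forecast only moves where search behavior has changed.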
The R code for the unique search trends portion of our estimates is available on GitHub. The prediction generated by this model is given a weight of 1/3 in our state-level forecasts, with the remaining 2/3 coming from polling. We average polls through a LOESS moving regression. Past polling error informs our modelling of the uncertainty of our predictions. We model the probabilities using the Beta, Weibull, and Logistic distributions. The Electoral College vote and the probability of each candidate winning a majority of electors are estimated using 20,000 Monte Carlo simulations.
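To illustrate the Monte Carlo step, the Python sketch below simulates many elections by adding random error to each state's estimated margin and tallying electoral votes. It swaps in normally distributed errors (one shared national shift plus independent state noise) because the article does not specify how the Beta, Weibull, and Logistic distributions are applied. The standard deviations, state names, and margins are hypothetical.

```python
import random


def simulate_electoral_college(margins, electoral_votes, state_sd=5.0,
                               national_sd=3.0, n_sims=20_000, seed=0):
    """Return (win probability, mean electoral votes) for the Democratic candidate.

    margins: {state: Democratic minus Republican margin, in points}.
    Assumption: errors are normal, with a shared national shift per simulation.
    """
    rng = random.Random(seed)
    total_ev = sum(electoral_votes.values())
    wins = 0
    ev_sum = 0
    for _ in range(n_sims):
        national_shift = rng.gauss(0.0, national_sd)
        # A state is won when its simulated margin is positive.
        dem_ev = sum(ev for state, ev in electoral_votes.items()
                     if margins[state] + national_shift + rng.gauss(0.0, state_sd) > 0)
        ev_sum += dem_ev
        if dem_ev > total_ev / 2:  # strict majority of electors
            wins += 1
    return wins / n_sims, ev_sum / n_sims


# Hypothetical three-state map, purely to show the call.
margins = {"A": 4.0, "B": -6.0, "C": 1.0}
electoral_votes = {"A": 10, "B": 12, "C": 7}
probability, mean_ev = simulate_electoral_college(margins, electoral_votes)
print(f"win probability: {probability:.3f}, mean electoral votes: {mean_ev:.1f}")
```

The shared national shift makes state errors correlated, which widens the spread of electoral vote totals compared with treating every state independently.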
Feel free to follow @plural_vote for regular updates on the current electoral state of the 2020 race. Plural Vote updates its presidential and Senate race models daily.
Written by PLURAL VOTE. This article was last updated on 7/2/2020.