Plural Vote - How we forecast the 2018 House midterm elections
We go in depth here.
Posted under PLURAL VOTE
Published 2 months ago
Plural Vote is forecasting the electoral results for all of the nation's 435 congressional districts in the upcoming 2018 midterm elections. Plural Vote estimates what percentage of the vote Republicans and Democrats will win nationwide based on the current generic ballot polling average. From this estimated popular vote margin, Plural Vote determines the expected vote margin of every congressional district, based on each seat's House Partisan Lean - a measure of how strongly Republican or Democratic the district leans relative to the nation as a whole. Plural Vote calculates the House Partisan Lean score of every district by weighting how the district voted in past elections in a manner predictive of future elections. Relying on our seat-by-seat estimates and the past accuracy of generic ballot polling in predicting midterm outcomes, we then calculate the probability of Democrats and Republicans winning the majority of House seats in order to determine which party is favored to capture the House majority.

Our forecast, taken as a whole, is a multilevel model (MLM). Each district has its own forecast generated based on its House Partisan Lean score, and has its uncertainty (or probability of a large enough generic ballot polling average error to flip the seat) calculated around its prediction.

Every individual district's forecast and rating is based on two factors: the predicted national popular vote margin and the seat's House Partisan Lean score. Plural Vote's measure of House Partisan Lean indicates how a congressional district will vote in the next House race relative to the overall national popular vote. To determine how Republican or Democratic each seat is expected to be, we take into account whether an incumbent is running, the past two presidential election returns, and the past House election result.

To forecast the outcome of every House race, as we do in our House ratings and in our House forecast model, Plural Vote adds the House Partisan Lean of each district to the expected national popular vote margin (which is approximated by the generic ballot polling average).

Plural Vote's calculation of the House Partisan Lean for every seat proves very predictive of House election outcomes when coupled with the election's final generic ballot polling average. When applied to the past two House elections, Plural Vote's model predicts the winner of 96.78% of seats correctly on average: 97.01% of seats were forecast correctly for 2016 (422 out of 435), and 96.55% for 2014 (420 out of 435).
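The arithmetic behind these accuracy figures can be checked in a few lines. This is only a sketch that recomputes the percentages from the seat counts quoted above; the variable names are our own for illustration.

```python
# Seats called correctly in each back-tested cycle (counts quoted in the text).
CORRECT_SEATS = {2016: 422, 2014: 420}
TOTAL_SEATS = 435

# Per-cycle accuracy as a fraction of all 435 seats.
accuracy = {year: hits / TOTAL_SEATS for year, hits in CORRECT_SEATS.items()}

# The headline figure is the simple average of the two cycles.
average_accuracy = sum(accuracy.values()) / len(accuracy)

for year, rate in sorted(accuracy.items()):
    print(f"{year}: {rate:.2%}")
print(f"average: {average_accuracy:.2%}")
```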

Because we take into account the last House vote margin of each district in addition to the last two presidential vote margins, the House Partisan Lean of every district better predicts how each one will vote than the Cook Political Report's Partisan Voting Index.

To sum up how we calculate the House Partisan Lean of every district: we used a regression to determine the formula that would best predict House results. Based on this regression, we calculate the House Partisan Lean for each seat by weighting its past results as follows:

Seats with an incumbent running:
• 1/3 (33.3%) - 2016 congressional result, relative to popular vote
• 1/3 (33.3%) - 2016 presidential result, relative to popular vote
• 1/3 (33.3%) - 2012 presidential result, relative to popular vote

Seats without an incumbent running:
• 2/8 (25%) - 2016 congressional result, relative to popular vote
• 3/8 (37.5%) - 2016 presidential result, relative to popular vote
• 3/8 (37.5%) - 2012 presidential result, relative to popular vote

Ex: The Nebraska 2nd district is assigned a House Partisan Lean score of 5.4 points in the Republican direction. This means that if Democrats were to win the national popular vote by 8 points, we would expect them to win the district by 2.6 points (8 minus 5.4). Both the district and the national popular vote shift by the same 8 points.
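The weighting and the Nebraska example can be sketched in code. This is a minimal illustration, not our production model: the function names are hypothetical, the weights are the two sets listed directly above, and the sign convention (a positive lean means more Republican than the nation) is an assumption made for the example.

```python
# Weights in the order (2016 House, 2016 presidential, 2012 presidential),
# taken from the two lists directly above.
INCUMBENT_WEIGHTS = (1 / 3, 1 / 3, 1 / 3)
OPEN_SEAT_WEIGHTS = (2 / 8, 3 / 8, 3 / 8)


def house_partisan_lean(house_2016, pres_2016, pres_2012, incumbent_running):
    """Weighted average of three margins, each relative to the national vote.

    Assumed convention: positive values are points more Republican than
    the nation, negative values are points more Democratic.
    """
    weights = INCUMBENT_WEIGHTS if incumbent_running else OPEN_SEAT_WEIGHTS
    results = (house_2016, pres_2016, pres_2012)
    return sum(w * r for w, r in zip(weights, results))


def projected_dem_margin(national_dem_margin, lean_r):
    """Expected Democratic margin in a district: national margin minus lean."""
    return national_dem_margin - lean_r


# Nebraska's 2nd district, using the lean quoted in the text.
ne02_margin = projected_dem_margin(national_dem_margin=8.0, lean_r=5.4)
print(f"NE-02 projected Democratic margin: {ne02_margin:.1f}")
```

The inputs to `house_partisan_lean` would be each district's past margins minus the corresponding national margin; no real district inputs are shown here because the text only quotes the final score.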


For each district in the model, we make an assumption of normality. This means that we assume a normally-distributed error for the probability distribution generated for each district. In practice, we model the probabilities of the prediction error with the cumulative distribution function of a logistic distribution, whose bell-shaped curve closely resembles the normal curve. The logistic distribution curve we employ looks like the one on the left. Later, we explain in more detail how we derived the variance of the distribution.

We also assume homoscedasticity: the prediction error for each district has the same variance and the same probability distribution.
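As a rough sketch of how a projected margin becomes a win probability under a logistic error curve: the snippet below uses the standard logistic cumulative distribution function with a single scale shared by every district. The scale value of 3.0 is a placeholder chosen for illustration only; it is not the variance our model actually uses.

```python
import math


def logistic_cdf(x, loc=0.0, scale=1.0):
    """Cumulative distribution function of the logistic distribution."""
    return 1.0 / (1.0 + math.exp(-(x - loc) / scale))


def dem_win_probability(projected_dem_margin, scale):
    """Probability that the actual Democratic margin ends up above zero.

    The error is assumed logistic with mean 0 and the same scale for every
    district. Because the curve is symmetric, P(margin + error > 0) equals
    the CDF evaluated at the projected margin.
    """
    return logistic_cdf(projected_dem_margin, loc=0.0, scale=scale)


PLACEHOLDER_SCALE = 3.0  # illustrative assumption, not the model's value

for margin in (-5.0, 0.0, 2.6):
    p = dem_win_probability(margin, PLACEHOLDER_SCALE)
    print(f"projected margin {margin:+.1f} -> win probability {p:.3f}")
```

A projected margin of exactly zero always maps to a 50% chance, and larger scales pull every probability toward 50%.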

We also assume that the districts' prediction errors are not independent of one another; rather, they are completely correlated, driven by the error in the overall nationwide popular vote (which we derive from generic ballot polling). We do this to ensure that the model reflects an appropriate level of uncertainty. We could have used a Monte Carlo method, which depends on running thousands of random simulations of error across every district, but the problem with this method is that it does not take into account the heavy impact the national environment has on the overall win probabilities. In 2016, models that operated on a strong assumption of independence between state-level polling errors (such as those by Sam Wang and the Huffington Post) performed worse than those with a stronger assumption of correlation between the errors (such as FiveThirtyEight). Furthermore, it would not make much sense to assume independence at the district level, as we have little polling data available yet for congressional races and are currently relying on generic ballot polling and the way districts are structured.
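One consequence of complete correlation is worth spelling out. If a single shared error moves every district by the same amount, Democrats win a majority exactly when that error does not flip their 218th-strongest seat, so the majority probability reduces to one win probability for that tipping-point seat. The sketch below illustrates this under stated assumptions: the function names, the toy margins, and the scale value are all hypothetical.

```python
import math


def logistic_cdf(x, scale):
    """Logistic cumulative distribution function centered at zero."""
    return 1.0 / (1.0 + math.exp(-x / scale))


def majority_probability(projected_dem_margins, scale, seats_needed=218):
    """Chance Democrats win at least `seats_needed` seats when every
    district shares one national error (complete correlation)."""
    if len(projected_dem_margins) < seats_needed:
        raise ValueError("not enough districts to reach seats_needed")
    # Rank districts from most to least Democratic by projected margin.
    ranked = sorted(projected_dem_margins, reverse=True)
    # The seat that delivers the majority is the tipping point.
    tipping_point_margin = ranked[seats_needed - 1]
    return logistic_cdf(tipping_point_margin, scale)


# Toy chamber with five seats where three are needed for a majority.
toy_margins = [10.0, 2.0, -1.0, -5.0, -20.0]
print(majority_probability(toy_margins, scale=3.0, seats_needed=3))
```

With independent errors the seat count would instead cluster tightly around its expected value, which is why an independence assumption tends to produce overconfident majority probabilities.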

To generate each congressional district’s prediction, or estimate, we factor in four different variables. One of them is the binary variable of whether a seat is open or not. An open seat refers to a district whose incumbent is not running for re-election. The other three variables are metrics of the district’s partisan leaning, or the strength of each party in that district, based on the last two presidential elections and the last congressional election. The first is how the district voted in the 2012 presidential election relative to the nationwide popular vote. The second is how the district voted in the 2016 presidential election relative to the nationwide popular vote. The third is how the district voted in the 2016 congressional election relative to the nationwide popular vote.

We calculate our prediction for each congressional district based on these measures of partisan lean, coupled with our estimate of the national popular vote. For each district, here is how we weight each measure of partisan strength based on whether the seat has an incumbent running or not:

Open seat:
25% - 2016 congressional partisan lean
37.5% - 2016 presidential partisan lean
37.5% - 2012 presidential partisan lean

Incumbent-challenger race:
75% - 2016 congressional partisan lean
12.5% - 2016 presidential partisan lean
12.5% - 2012 presidential partisan lean

By weighting seats with incumbents differently than open seats, we account for the incumbent's advantage or strength as a candidate. With every district factored in, the median district leans 8.9 points more Republican than the overall popular vote. The median district represents the 218th seat, which is needed to capture the majority. Thus, we assume that Democrats would have to win the popular vote by approximately 8.9 points to capture the majority.
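The link between the median district and the popular vote margin needed for a majority can be sketched as follows. The function name and the toy leans are hypothetical, and a positive lean is assumed to mean more Republican than the nation.

```python
def required_dem_popular_vote_margin(leans_r, seats_needed=None):
    """Lean of the seat that delivers a majority.

    Democrats carry a district once their national margin exceeds its
    Republican lean, so the national margin they need is the lean of the
    `seats_needed`-th most Democratic district.
    """
    if seats_needed is None:
        # Bare majority: 218 of 435, or 3 of 5 in the toy example below.
        seats_needed = len(leans_r) // 2 + 1
    ranked = sorted(leans_r)  # most Democratic-leaning district first
    return ranked[seats_needed - 1]


# Toy chamber of five seats; the third seat decides the majority.
toy_leans = [-10.0, -2.0, 1.0, 5.0, 20.0]
print(required_dem_popular_vote_margin(toy_leans))
```

With an odd number of seats the deciding seat is exactly the median, which is why the 218th of 435 districts sets the threshold.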

The generic ballot refers to a survey question that asks respondents whether they will vote for Democrats or Republicans for Congress. We use this to determine the national environment, and then calculate the forecast for every district relative to it, based on partisan leaning and incumbency. Democrats have maintained a solid lead in the generic ballot ever since Trump's election. In generic ballot polling, however, there has been a fairly persistent overestimation of Democratic support in recent election cycles. This phenomenon is often overlooked when discussing generic ballot polling, yet FiveThirtyEight has previously acknowledged it. Consequently, when projecting November House election returns, we account for the consistent overestimation of Democratic support in the March and April polling average.

Thus, we attempt to correct our national popular vote estimate in light of this bias. If we took the median of the polling error since 1998, we would be applying a bias correction of 4.8 points. However, there is some evidence that the gap has narrowed recently: the 2012 outcome was actually approximately 4 points more Democratic than March and April polling averages suggested. We therefore take the median of the last three election cycles (2012, 2014, and 2016), which represents a bias of 1.1 points.
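A small sketch of the median-based correction follows. Only two of the three error values are grounded in the text: roughly -4 points for 2012 and the 1.1-point median. The third value and the 8-point polling lead are placeholders for illustration, and the function name is hypothetical.

```python
from statistics import median


def bias_corrected_dem_margin(polling_dem_margin, past_overestimates):
    """Subtract the median historical overestimate of Democratic support.

    Each entry is (spring polling margin - November margin) for one cycle,
    so positive values mean the polls overstated Democrats.
    """
    return polling_dem_margin - median(past_overestimates)


# Errors for the last three cycles, in points.
#   -4.0 : 2012, about 4 points MORE Democratic than polls (from the text)
#    1.1 : the median quoted in the text
#    3.0 : PLACEHOLDER, not a real figure
recent_overestimates = [-4.0, 1.1, 3.0]

# PLACEHOLDER polling lead of 8 points, for illustration only.
corrected = bias_corrected_dem_margin(8.0, recent_overestimates)
print(f"bias-corrected Democratic margin: {corrected:.1f}")
```

Using the median rather than the mean keeps a single unusual cycle, such as 2012, from dominating the correction.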

All of these elements come together to form a comprehensive House forecast.
Written by PLURAL VOTE. This article was last updated on 4/23/2018.