Unskewing the Election
Not the polls but voter impulse this time
|Cropped from source|
Today, Election Day in the USA, we discuss the state of those stating the state of the election.
FiveThirtyEight has the election much closer than most of the other forecasters do. But Silver is no “nut”—last election, in 2012, he was right about the winner of all 50 states and the District of Columbia.
As of their Tuesday morning update, they gave Donald Trump almost a 30% chance of winning, against 70% for Hillary Clinton. For contrast, the Princeton Election Consortium site of Sam Wang and Julian Zelizer has had Clinton over 99% probability in both its “random drift” and “Bayesian” measures, and the Huffington Post gave her 98.2%. Nate Cohn’s New York Times Upshot model put Trump with a 16% chance, but that is still only half what FiveThirtyEight has been giving him. The next-higher numbers in forecasts compared here gave Trump 12% and 11%. Senate forecasts have had similar disparity.
This past weekend, Silver was called out by Ryan Grim in a Huffington Post article titled, “Nate Silver Is Unskewing Polls—All Of Them—In Trump’s Direction.” The term “unskewing polls” means altering assumptions about the makeup of polling samples to correct perceived bias. In 2012 the complaints of bias in the data used by Silver came mainly from the Republican side and were proved wrong by the results. This year the thunder about numbers seems all on the left.
Uncertainty and Trends
The main difference cited by Silver is the higher number of voters telling pollsters they are undecided or supporting third-party candidates compared to 2012. There is also greater uncertainty about the effects of news developments such as releases by Wikileaks, the FBI investigation into Clinton’s e-mail server, Obamacare premium hikes, and scandalous past behavior by various people.
There have also been greater movements in polls. Here is the graph of Silver’s forecasts from 2012, when FiveThirtyEight was a blog of the New York Times:
The one counter-trend came after Barack Obama’s poor performance in the first debate with Mitt Romney. There is no evidence that Hurricane Sandy had any effect at the end of October 2012. Now here is the current graph of FiveThirtyEight’s odds over the past few months:
The first sharp movement was registered the week after FBI Director James Comey’s July 5 press conference characterizing Clinton’s e-mail use as “reckless” but not indictable. That brought FiveThirtyEight’s model to parity on July 30, two days after the end of the Democratic convention, but polls completed the next week shot back toward Clinton, and the movement continued amid Trump’s unseemly tangling with Khizr and Ghazala Khan. A long trend back to parity, perhaps accelerated by Clinton’s “bad weekend” of Sept. 9–11, bounced again following the first debate on Sept. 26. The past four weeks have seen a rounding turn into a slide correlated with the Oct. 28 FBI letter re-opening the e-mail investigation of Clinton, and just in the past two days a 7-point jag. The New York Times shows similar movements, but not as sharp:
Others have similar graphs. What goes into these aggregate models is the polls, and by and large the polls have shown similar movements. Hence I think the key this time is not unskewing the polls but rather unskewing the electorate.
“Swing” and “Heave”
I’ve been musing on the possible relevance of freighted phenomena I’ve found while extending my chess model since spring. Heretofore I’ve focused on projecting the best moves; now I want to refine accurate projections for all the moves in a given position. Doing so will confer authority on statistical tests for whole categories of moves—such as captures, moves with Knights, moves that advance or retreat, and moves within a given range of inferiority.
A year ago I reported on work with my student Tamal Biswas, who is now on the University of Rochester faculty after defending his dissertation in July, on implementing a parameter for “depth of thinking.” Computer chess programs all work in rounds of increasing depth of search, and this furnishes an axis of time for human players thinking in the same positions.
Our papers linked from that post show that swings in a program’s value for a given move as the search progresses correlate mightily with the frequency of the human players choosing (or having chosen) that move. For instance, we noted that even for the world’s best players, the frequency can range from 30% to 70% depending only on a numerical measure of the swing formulated by Tamal, with the ultimate value of the move in relation to values of alternative moves being held equal. The swing measure also perfectly numerically explains a puzzling “law” which I posted about four years ago.
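The swing idea can be illustrated with a toy computation. This is a minimal sketch, assuming a simple “average absolute change between consecutive depths” formulation; the function name and formula are illustrative assumptions, not the exact measure from the Biswas–Regan papers.

```python
# Hypothetical sketch of a per-move "swing" measure: how much an
# engine's evaluation of one move fluctuates as search depth increases.
# The formula here is an assumption for illustration only.

def swing(values_by_depth):
    """Average absolute change in a move's evaluation between
    consecutive search depths (in pawn units)."""
    if len(values_by_depth) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(values_by_depth, values_by_depth[1:])]
    return sum(diffs) / len(diffs)

# A move whose value oscillates between depths swings more than one
# whose value is stable, even if both end at the same final evaluation.
stable   = [0.30, 0.31, 0.30, 0.32, 0.30]
volatile = [0.10, 0.55, 0.05, 0.60, 0.30]
print(swing(stable))    # small
print(swing(volatile))  # large
```

The point of such a measure is that the two example moves end at the same final value, yet a human weighing them depth-by-depth may experience them very differently.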
Last year’s post, however, also reported extreme difficulties with modeling a depth-of-thinking parameter directly. Hence we’re trying a simpler tack of fitting a multiplier h on the swing quantity. The ‘h’ is for “heave” by analogy with a ship riding above or below the water line. My usage is not quite “nautically correct”: a ship will heave to for stability in wavy seas, whereas my h measures the tendency to be carried away by them. But my modeling supports the following interpretation:
A value h > 1 means that the player(s) are influenced more strongly by swings in values than by the ultimate objective values themselves.
Where previously I had a term relating the difference in value between a move and the machine’s best move to my model’s “sensitivity” parameter s, now I have analogous terms involving h as well. The swing measure is formulated as an average of values over all depths of search, so I am confident that its units support the interpretation. There are further wrinkles according to whether the overall position value and/or the swing values are negative, and they are all immersed in only-halfway-better forms of the above-mentioned fitting difficulties, so anything I say now is preliminary. But what I am seeing seems consistent enough to report the following:
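As a toy illustration of how such terms might combine, here is a hedged sketch. The exp(-(x/s)^c) weighting, and the way the heave term h offsets a move’s objective inferiority, are assumptions chosen for illustration; they are not the exact terms of the model described above.

```python
import math

# Hypothetical sketch: weight a move by its final inferiority delta
# (vs. the engine's best move) and its swing, scaled by a heave
# multiplier h. Both the functional form and how h enters are
# illustrative assumptions, not the author's actual model terms.

def move_weight(delta, sw, s=0.1, c=0.5, h=1.7):
    """Down-weight a move by its objective inferiority, but let a
    positive swing (scaled by h) offset part of that penalty."""
    effective = max(delta - h * sw, 0.0)   # swing can mask inferiority
    return math.exp(-((effective / s) ** c))

# With h > 1, a swingy but objectively inferior move gets more weight
# than a quieter move of the same final value.
print(move_weight(delta=0.20, sw=0.00))  # quiet move
print(move_weight(delta=0.20, sw=0.08))  # same value, more swing
```

In this sketch, h > 1 means the swing term can dominate the objective value difference, matching the interpretation stated above.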
For chess players of all Elo ratings from novice levels 1050, 1100, 1150, 1200, … to the world championship standard of 2800, the h values are by-and-large all in the range 1.3 to 2.3, and concentrated in 1.5 to 2.0.
I can’t yet even say that I have a regular progression by rating, even though outside the levels 2000 through 2500 (which are most heavily populated among the millions of anthologized games), my training sets have all available games between players at each level (within 10-to-25 Elo points, depending on the level). These give tens to hundreds of thousands of data points for each level, all computed using the University at Buffalo Center for Computational Research (CCR).
Heave Ho and Vote
My original model has neatly linear progressions in s and in a second “consistency” parameter c. A second indication that the “high-heave” phenomenon is real is that the three-parameter fits which I obtained in August make the progression in s steeper and throw the progression in c into retrograde, as a damper. This unwelcome latter fact is a prime reason for tinkering further, besides the fitting landscape being no longer benign.
Thus I believe my model is currently being mathematically inconvenienced by people’s tendency to play moves on impulse and react to (changes in) trend. The swing measure ticks up when a move suddenly looks better at depth d+1 than it did at depth d. Results in the papers with Tamal so far support the idea that humans considering such moves experience a corresponding uptick in their estimation. From my own games I recall times I’ve played a move when it suddenly “improved,” then regretted not thinking more on whether it was really better than alternatives.
To repeat, the chess work has not yet reached the point of fully substantiating the effect of swings in value. It is however enough to make me wonder when I see things like FiveThirtyEight’s graph of the race for party control of the Senate:
Are respondents being influenced more strongly by “political weather” than by a prior valuation of their candidates? Note especially the inflection after Comey’s Oct. 28 letter.
The polls are still open in many places as we post, and we have much less idea than we thought four and eight years ago of how things will shake out. Even after all votes are counted, it may be hard to tell whether Silver was closer than the others. A strong Clinton win could be carried by the last-day upswing noted in FiveThirtyEight’s graph above; note also its absence from the Senate graph. Even setting aside the chance that the election will not be decided by tomorrow, to judge by the squeaker in 2000, it will certainly take a long time to parse and “unskew” the election results.
How will we analyze the results of this election? And of course, who will win?
Update 11/9: As it shook out, Silver was merely the least wrong. The USC Dornsife / LA Times poll was distinctive in showing Trump ahead most of the time:
Likewise the Investor’s Business Daily / TechnoMetrica Market Intelligence poll. But even these need to be squared with Clinton’s evidently winning the popular vote. Update 11/10: Silver has a new article showing the effect of a 2% swing, meaning Trump’s share down 1% and Clinton’s up 1%.