Not the polls but voter impulse this time


Nate Silver has gone out on a limb. Four years ago we posted on how the forecast of his team at FiveThirtyEight jibed with polls and forecasts by other poll aggregators. This year there is no jibe.

Today, Election Day in the USA, we discuss the state of those stating the state of the election.

FiveThirtyEight has the election much closer than most of the other forecasters do. But Silver is no “nut”—last election, in 2012, he was right about the winner of all 50 states and the District of Columbia.

As of their Tuesday morning update, they gave Donald Trump almost a 30% chance of winning, against 70% for Hillary Clinton. For contrast, the Princeton Election Consortium site of Sam Wang and Julian Zelizer has had Clinton over 99% probability in both its “random drift” and “Bayesian” measures, and the Huffington Post gave her 98.2%. Nate Cohn’s New York Times Upshot model put Trump with a 16% chance, but that is still only half what FiveThirtyEight has been giving him. The next-higher numbers in forecasts compared here gave Trump 12% and 11%. Senate forecasts have had similar disparity.

This past weekend, Silver was called out by Ryan Grim in a Huffington Post article titled, “Nate Silver Is Unskewing Polls—All Of Them—In Trump’s Direction.” The term “unskewing polls” means altering assumptions about the makeup of polling samples to correct perceived bias. In 2012 the complaints of bias in the data used by Silver came mainly from the Republican side and were proved wrong by the results. This year the thunder about numbers seems all on the left.

## Uncertainty and Trends

The main difference cited by Silver is the higher number of voters telling pollsters they are undecided or supporting third-party candidates compared to 2012. There is also greater uncertainty about the effects of news developments such as releases by Wikileaks, the FBI investigation into Clinton’s e-mail server, Obamacare premium hikes, and scandalous past behavior by various people.

There have also been greater movements in polls. Here is the graph of Silver’s forecasts from 2012, when FiveThirtyEight was a blog of the New York Times:

The one counter-trend came after Barack Obama’s poor performance in the first debate with Mitt Romney. There is no evidence that Hurricane Sandy had any effect at the end of October 2012. Now here is the current graph of FiveThirtyEight’s odds over the past few months:

The first sharp movement was registered the week after FBI Director James Comey’s July 5 press conference characterizing Clinton’s e-mail use as “reckless” but not indictable. That brought FiveThirtyEight’s model to parity on July 30, two days after the end of the Democratic convention, but polls completed the following week swung back toward Clinton and continued that way amid Trump’s unseemly tangling with Khizr and Ghazala Khan. A long trend back to parity, perhaps accelerated by Clinton’s “bad weekend” of Sept. 9–11, gave way to another bounce for Clinton following the first debate on Sept. 26. The past four weeks have seen a rounding turn into a slide correlated with the Oct. 28 FBI letter re-opening the e-mail investigation of Clinton, and just in the past two days a 7-point jag. The New York Times shows similar movements but not as sharp:

Others have similar graphs. What goes into these aggregate models is the polls, and by and large the polls have shown similar movements. Hence I think the key this time is unskewing not the polls but rather the electorate.

## “Swing” and “Heave”

I’ve been musing on the possible relevance of freighted phenomena I’ve found while extending my chess model since spring. Heretofore I’ve focused on projecting the best moves; now I want to refine accurate projections for all the moves in a given position. Doing so will confer authority on statistical tests for whole categories of moves—such as captures, moves with Knights, moves that advance or retreat, and moves within a given range of inferiority.

A year ago I reported on work with my student Tamal Biswas, who is now on the University of Rochester faculty after defending his dissertation in July, on implementing a parameter for “depth of thinking.” Computer chess programs all work in rounds of increasing depth of search, and this furnishes an axis of time for human players thinking in the same positions.

Our papers linked from that post show that swings in a program’s value for a given move as the search progresses correlate mightily with the frequency of the human players choosing (or having chosen) that move. For instance, we noted that even for the world’s best players, the frequency can range from 30% to 70% depending only on a numerical measure ${sw}$ of the swing formulated by Tamal, with the ultimate value of the move in relation to values of alternative moves being held equal. The swing measure also perfectly numerically explains a puzzling “law” which I posted about four years ago.

Last year’s post, however, also reported extreme difficulties with modeling a depth-of-thinking parameter ${d}$ directly. Hence we’re trying a simpler tack of fitting a multiplier ${h}$ on the swing quantity ${sw}$. The ‘h’ is for “heave” by analogy with a ship riding above or below the water line. My usage is not quite “nautically correct”: a ship will heave to for stability in wavy seas, whereas my ${h}$ measures the tendency to be carried away by them. But my modeling supports the following interpretation:

A value ${h > 1}$ means that the player(s) are influenced more strongly by swings in values than by the ultimate objective values themselves.

Where previously I had a term ${\delta(v_i,v_1)/s}$ relating the difference in value between a move ${m_i}$ and the machine’s best move ${m_1}$ to my model’s “sensitivity” parameter ${s}$, now I have terms like

$\displaystyle \frac{\delta(v_i,v_1) + h \cdot sw(m_i,m_1)}{s}$

involving ${h}$ as well. The measure ${sw(m_i,m_1)}$ is formulated as an average of ${\delta(\cdot,\cdot)}$ values over all depths of search, so I am confident that its units support the interpretation. There are further wrinkles according to whether the overall position value ${v_1}$ and/or the swing values are negative, and they are all immersed in only-halfway-better forms of the above-mentioned fitting difficulties, so anything I say now is preliminary. But what I am seeing seems consistent enough to report the following:
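As a concrete, entirely illustrative sketch of how the pieces fit together: here ${\delta}$ is taken as a plain value difference, ${sw}$ as the plain average of per-depth differences, and the exponential link mapping the term to a move weight is my stand-in assumption—the post does not specify the model’s actual functional form.

```python
import math

# Illustrative sketch only, not the actual model code.
# Assumed: delta is a plain value difference, sw a plain average
# over depths, and an invented exponential link to a move weight.

def delta(v_i, v_1):
    """Value difference between move m_i and the engine's best move m_1."""
    return v_1 - v_i

def swing(values_i, values_1):
    """sw(m_i, m_1): average of delta values over all depths of search."""
    return sum(delta(vi, v1) for vi, v1 in zip(values_i, values_1)) / len(values_i)

def move_weight(v_i, v_1, sw, s, h):
    """Weight for move m_i from the term (delta + h*sw)/s."""
    return math.exp(-(delta(v_i, v_1) + h * sw) / s)

# Toy data: the alternative move looked better than the engine's best
# at shallow depths but ends 0.10 worse at full depth, so sw < 0.
best_by_depth = [0.30, 0.28, 0.30]
alt_by_depth  = [0.45, 0.40, 0.20]
sw = swing(alt_by_depth, best_by_depth)   # negative: an attractive swing

w_low_h  = move_weight(0.20, 0.30, sw, s=0.10, h=0.0)
w_high_h = move_weight(0.20, 0.30, sw, s=0.10, h=1.5)
# With h > 1 the swing-attractive move gets noticeably more weight,
# matching the "carried away by the waves" interpretation.
```

With these toy numbers, raising ${h}$ from 0 to 1.5 more than doubles the inferior move’s weight, which is the sense in which a high heave makes swing-attractive moves more likely to be chosen.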

For chess players of all Elo ratings from novice levels 1050, 1100, 1150, 1200, … to the world championship standard of 2800, the h values are by-and-large all in the range 1.3 to 2.3, and concentrated in 1.5 to 2.0.

I can’t yet even say that I have a regular progression by rating, even though outside the levels 2000 through 2500 (which are most heavily populated among the millions of anthologized games), my training sets have all available games between players at each level (within 10-to-25 Elo points depending). These give tens to hundreds of thousands of data points for each level, all taken using the University at Buffalo Center for Computational Research (CCR).

## Heave Ho and Vote

My original model has neatly linear progressions in ${s}$ and in a second “consistency” parameter ${c}$. A second indication that the “high-heave” phenomenon is real is that the three-parameter fits which I obtained in August make the ${s}$ progression steeper and throw the ${c}$ progression into retrograde as a damper. This unwelcome latter fact is a prime reason for tinkering further, besides the fitting landscape being no longer benign.

Thus I believe my model is currently being mathematically inconvenienced by people’s tendency to play moves on impulse and react to (changes in) trend. The ${sw}$ measure ticks up when a move suddenly looks better at depth ${d+1}$ than it did at depth ${d}$. Results in the papers with Tamal so far support the idea that humans considering such moves experience a corresponding uptick in their estimation. From my own games I recall times I’ve played a move when it suddenly “improved,” then regretted not thinking more on whether it was really better than alternatives.
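The uptick just described can be made concrete in a toy sketch. The values-by-depth input format here is invented for illustration; an engine’s actual output would need parsing.

```python
def upticks(values_by_depth):
    """Depths d at which a move's evaluation at depth d+1 exceeds that at d."""
    return [d for d in range(len(values_by_depth) - 1)
            if values_by_depth[d + 1] > values_by_depth[d]]

# A move whose value jumps between depths 2 and 3: a human following
# the deepening search may feel the same sudden improvement.
evals = [0.10, 0.08, 0.09, 0.35, 0.33]
jumps = upticks(evals)
```

Each depth in `jumps` marks a moment where the move suddenly “improved”—exactly the moments at which, on the evidence so far, humans are most tempted to commit.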

To repeat, the chess work has not yet reached the point of fully substantiating the effect of swings in value. It is however enough to make me wonder when I see things like FiveThirtyEight’s graph of the race for party control of the Senate:

Are respondents being influenced more strongly by “political weather” than by a prior valuation of their candidates? Note especially the inflection after Comey’s Oct. 28 letter.

The polls are still open in many places as we post, and we have much less idea than we thought four and eight years ago of how things will shake out. Even after all votes are counted it may be hard to tell whether Silver was closer than the others. A strong Clinton win could be carried by the last-day upswing noted in FiveThirtyEight’s graph above, though note its absence in the Senate graph. Leaving aside that the election might not even be decided by tomorrow, to judge by the squeaker in 2000, it will certainly take a long time to parse and “unskew” the election results.

## Open Problems

How will we analyze the results of this election? And of course, who will win?

Update 11/9: As it shook out, Silver was merely the least wrong. The USC Dornsife / LA Times poll was distinctive in showing Trump ahead most of the time:

Likewise the Investor’s Business Daily / TechnoMetrica Market Intelligence poll. But even these need to be squared with Clinton’s evidently winning the popular vote. Update 11/10: Silver has a new article showing the effect of a 2% swing, meaning Trump’s share down 1% and Clinton’s up 1%.

November 8, 2016 10:55 pm

Presidential elections can be easily predicted. It all comes down to charisma. Since Kennedy-Nixon, never has a less charismatic candidate beaten a more charismatic one. Everything else is distraction.

• November 8, 2016 11:49 pm

That grim lesson is not lost on 21st century power-seekers.

November 9, 2016 12:04 pm

Democracy doesn’t promise to pick the best candidates. It promises to pick whom the majority of people want, and the majority of people always want a charismatic leader, regardless of policies. Charisma “trumps” policy.

November 9, 2016 2:09 am

Taleb (black swan) has been criticising Nate Silver’s use of probabilistic estimates and his “probability of winning” time series for a long while in this election campaign… GIGO, as we used to say…

November 16, 2016 2:36 am

Well, Taleb seemed to me like a bit of a crank, really. Who (in this case) tries to convey his message in a series of angry tweets … not the most conducive way of presenting an argument.

November 25, 2016 8:19 pm

😮 so it looks like this was written pre-Trump-outcome. inquiring minds wanna know what you think of this. any comment?

➡ ⭐ re your concerns with fairness in chess & elsewhere. it looks like there could be some skew due to pro-Trump “fake news” aka “propaganda” and just wrote up the following blog summary with lots of research/ links. wonder if some of it was actually state-sponsored, it seems quite plausible to me. it looks like social media is defn a double edged sword. was hoping it would improve democracy but maybe its just another medium that can be skewed/ corrupted, possibly even more than the now-sometimes-hated mainstream media.

December 15, 2016 5:42 pm

Was “jive” intended, perhaps tongue-in-cheek or with some meaning that I don’t know? It seems that “jibe” may be correct here.
The concept of winning the popular vote is not defined although the intent is comprehensible. One can gain a majority or plurality of the popular vote, but the election can only be won in the Electoral College. It seems that which candidate in 2016 received the most popular votes may never be known. It is unlikely that allegations of Democrat vote fraud will be resolved.
You may want to investigate the 1960 election regarding whether the Democrats stole the election (see Mayor Daley, Chicago). In fact, even if the votes putatively cast for Kennedy in Illinois are accepted as genuine, Nixon received the most popular votes because votes cast for another Democrat in Georgia were counted for Kennedy because of an arcane rule.
Almost the entire so-called Mainstream Media routinely engages in fake news and other bias, including deliberate falsification, in favor of left-wing candidates.

• December 15, 2016 10:02 pm

Ah, you are right about “jibe” vs. “jive” so I’ve changed it. I did like the sound of “there is no jive” better though.

Can you substantiate your assertions about “Democrat vote fraud” on a scale large enough to render your statement “which candidate in 2016 received the most popular votes may never be known” anything but void—in the face of the margin over 2.8 million votes tallied here?