
Chess Knightmare and Turing’s Dream

May 31, 2012


Plus answers to Sunday’s test on Turing

By permission of Mike Magnan, artist.

Viswanathan Anand retained his chess world championship title yesterday, by defeating challenger Boris Gelfand in a rapid-chess playoff after the twelve regulation games ended in a 6–6 tie. Yet the match—the most important in chess—was by most accounts not very exciting. Ten of the twelve games were draws, seven with agreement before Move 30, which is disallowed in many tournaments. Many of their moves had been prepared using computers, often up to or beyond Move 20.

Today I ask why computers playing among themselves have produced livelier games than recent matches of humans equipped with computer preparation.

Anand’s third straight match win since gaining the world title in 2008 is impressive. His play can often be exciting, though his results last year were lackluster. Gelfand earned the right to challenge by winning last year’s Candidates’ Match-Tournament in Kazan, Russia. In that event 27 of 30 regulation games were drawn and most matches were decided in similar fast-paced tiebreaks, which some deem akin to a penalty-kick shootout in soccer.

Fears of chess becoming “played out” go back even before the 1927 world championship match between Alexander Alekhine and José Raúl Capablanca, in which 32 of 34 games featured the same opening. What hypes the fear now is that we may have the technology to actually play much of chess out. This feeling was revealed by the number of people taken in by a hoax last month that the King’s Gambit had been exhaustively analyzed to a draw.

However, the computers themselves play few draws against each other, and they show a degree of human-appreciable inventiveness that Alan Turing could only have dreamed about, unless he had lived to see 85, let alone 100.

What Price High Standards?

According to my statistical model of player move choice mentioned here, this match had the highest standard in chess history. Based on computer analysis of the twelve regulation games, my model computes an “Intrinsic Performance Rating” (IPR) for Anand of 3002, and 2920 for Gelfand. Each is about 200 points higher than their current Elo ratings of 2791 and 2727, respectively. My analysis eliminates moves 1–8, moves in repeating sequences, and moves where one side is judged to have a clearly winning advantage, the equivalent of being over three pawns ahead.
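As a rough sketch of the stated cutoffs (with invented function and parameter names for illustration; this is not the model's actual code), the move filter amounts to:

```python
# Hypothetical illustration of the stated filtering rules; names are
# invented here, not taken from the model's implementation.
def analyzable(move_number: int, in_repetition: bool, eval_in_pawns: float) -> bool:
    if move_number <= 8:              # skip opening moves 1-8
        return False
    if in_repetition:                 # skip moves in repeating sequences
        return False
    if abs(eval_in_pawns) > 3.0:      # skip clearly winning positions,
        return False                  # i.e. over three pawns ahead
    return True

# (move number, repetition?, engine eval in pawns)
moves = [(9, False, 0.3), (5, False, 0.0), (20, True, 0.1), (30, False, 4.2)]
print([analyzable(*m) for m in moves])  # [True, False, False, False]
```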

To be sure, I ascribe most of this difference to their use of computer-prepared moves, in short games with relatively few “original moves.” Not only do the programs’ ratings range above 3200; home analysis can run them longer and stronger than the game-time settings used to compile those ratings. Still, the players had those moves in their heads as they sat down computer-free at the board, and if what matters is the quality of the moves made by their hands, regardless of where they came from, then this was history’s human chess pinnacle.

My work also hints that the Elo rating of perfect play may be as low as 3600. This is not far-fetched: if Anand could manage to draw a measly two games in a hundred against any perfect player, the mathematics of the rating system ensure that the latter’s rating would never rise above 3500, and if Gelfand could do it, 3400. Perfect play on both sides is almost universally believed to produce a draw, even after a few small slips. All this raises a question:

Does the higher draw tendency of recent top-level matches owe inevitably to their coming within a few hundred intrinsic rating points of perfection?
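As an aside, the rating-ceiling arithmetic above can be reproduced in a few lines, assuming the expectancy curve is approximated by a normal distribution with σ = 200√2 (an illustrative assumption—the exact ceiling depends on which expectancy curve one uses, and the function name here is invented):

```python
from math import sqrt
from statistics import NormalDist

# Approximate rating expectancy: expected score E = Phi(D / (200*sqrt(2))),
# where D is the rating difference. (The official FIDE table is piecewise
# and capped; this smooth curve is an assumption for illustration.)
SIGMA = 200 * sqrt(2)

def rating_ceiling(human_rating: float, human_score: float) -> float:
    """Rating at which a perfect player's expected score equals its
    actual score, so its rating would stop rising."""
    perfect_score = 1.0 - human_score  # e.g. 0.99 if the human draws 2 in 100
    d = NormalDist().inv_cdf(perfect_score) * SIGMA
    return human_rating + d

# Two draws in a hundred games = 0.01 points per game for the human.
print(round(rating_ceiling(2791, 0.01)))  # vs. Anand: stays under 3500
print(round(rating_ceiling(2727, 0.01)))  # vs. Gelfand: stays under 3400
```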

The fairest ones to ask are the computers, for they have now played far more games at this level than have we humans. And perhaps surprisingly, their answer seems to be No. The recent 19th World Computer Chess Championship received an IPR over 3000 from my model, yet 22 of 36 games ended in victory. The 2010 WCCC had 35 wins from 45 games, while the 2009 WCCC had 33 from 45, and this month’s International CSVN Tournament had only 7 draws in 28 games. Why?

Contempt and Contemplation

The reason may literally be that the computers have greater contempt for each other. The contempt factor is a term in a program’s evaluation function that makes it pretend to be a couple tenths of a pawn better off than it is, in situations where a drawing or drawish continuation is available.
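A minimal sketch of the idea, with illustrative values rather than any particular engine's code:

```python
# Illustrative sketch only: real engines differ in detail, but the idea
# of a contempt factor is that an available draw is scored a bit below
# the engine's true standing, so it keeps playing for a win.
CONTEMPT = 0.20  # in pawns; "a couple tenths of a pawn"

def draw_score(engine_to_move: bool) -> float:
    # From the engine's point of view a reachable draw (repetition,
    # perpetual check, etc.) is worth -CONTEMPT rather than 0.0;
    # from the opponent's point of view it is worth +CONTEMPT.
    return -CONTEMPT if engine_to_move else +CONTEMPT

# In a dead-equal position the engine scores taking the draw at -0.20,
# so any continuation evaluated above -0.20 is preferred to the draw.
print(draw_score(True))   # -0.2
print(draw_score(False))  # 0.2
```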

The computers also have no awareness of high stakes that puts “staying in the game” ahead of maximizing one’s chance of winning. This tendency was called out by observers several times as the games were played, most notably in the 12th game when Anand preferred to trade queens and be a safe though sickly Pawn ahead, rather than play his queen to be a thorn in Gelfand’s position at the risk of allowing Gelfand’s queen to gobble two of Anand’s pawns with check. Here are comments made by the now-retired Garry Kasparov and three players currently rated in the top 5:

Kasparov (after game 6): “Hopefully [the] next few days will provide more ‘fire on board’.”

Vladimir Kramnik (after Anand’s draw offer in Game 12): “What is this? Really confusing… I can only have one explanation: [Anand] just couldn’t stand the pressure of the last game… It is one of the strangest decisions I ever saw in the World Championship matches.”

Levon Aronian (on Twitter after Game 12): “Anand-Gelfand g 12 was brilliant. Anand found a great pawn sac at home, and Gelfand answered with 2 pawn sacs! Wow, can’t wait till tiebreaks!”

Hikaru Nakamura (same time on Twitter): “I must be a very bad chess player since I keep liking Anand’s positions and he keeps offering draws instead of trying to win.”

Anand himself said, “The problem with such a tight match is that every mistake has a much higher value,” while Aronian noted about his surprise early elimination from the qualifier in tiebreaks after four regulation draws, “Perhaps I didn’t quite cope with the pressure.”

Thus humans get “tight”; computers don’t. How might we level the situation?

From My Corner Square

One way is to take away time. The 4 games of the Rapid playoff, in which the players had basically one-fourth the thinking time of a standard game, were by all accounts more exciting. They were also longer, and provided 377 moves passing my analysis cutoffs from just four games, compared with 495 from all twelve regulation games. Still they hit a combined 2710, Anand 2701 and Gelfand 2720, in my model’s judgment of their intrinsic quality. This is about equal to both players’ performances in standard games over the past year. For comparison, the tiebreak in the 2006 match by which Kramnik defeated Veselin Topalov hit 2663, with Kramnik’s 2789 marginally better than his IPR for the regulation games, while Topalov’s 2530 perhaps explains his stated desire to avoid a tiebreaker with Anand in 2010.

But almost no one wishes to see Rapid chess become the standard. That the time to make 40 moves has shrunk from 150 minutes when I was active to 120 and now 90 minutes (plus 20 in increments) is considered shortening enough. Is there another way to dial humans up a notch?

Dick and I have also talked about the idea of an “aggressiveness parameter” in other applications besides chess. Of course robots in a virtual battle can be more aggressive. Can such a parameter be used in web search, say to promote pages that have many recent updates and other signs of dynamism and risk? Even in something pure like a SAT-solving engine or theorem-prover one can implement a degree of taking risks by undoing part of present progress. I am also improving my model to formulate “challenge produced” as a measure of skill, over the present emphasis on accuracy. (This is the place to note that the boldfaced IPR’s in this post are subject to change.)
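For the SAT-solver example, one standard way of “undoing part of present progress” is a randomized restart policy, which throws away the solver's current partial assignment and starts over. Many solvers schedule restarts with the Luby sequence, which can be sketched as:

```python
def luby(i: int) -> int:
    """The Luby universal restart sequence: 1,1,2,1,1,2,4,1,1,2,1,1,2,4,8,...
    Term i gives the (scaled) number of conflicts allowed before the
    solver restarts, discarding its current partial assignment."""
    # If i == 2^k - 1 for some k, the term is 2^(k-1);
    # otherwise recurse on the position within the current sub-block.
    k = 1
    while (1 << k) - 1 < i:
        k += 1
    if (1 << k) - 1 == i:
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)

print([luby(i) for i in range(1, 16)])
# [1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8]
```

The short bursts make the solver repeatedly gamble on fresh decision orders rather than grinding down one line of progress—a “risk dial” in miniature.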

My own opinion specific to chess is that by mid-century, the game will need to be tweaked to promote a longer “mixing time” before the players can simplify by trading central pawns and heavy pieces. The huge opening books and enumerated simplifying lines are what prompted Bobby Fischer to promote “FischerRandom Chess,” which is today better known as Chess960 for the 960 possible symmetrical starting configurations, from which a random choice is made. I favor combining Fischer’s ingenious generalized castling rule with an older non-random, non-symmetrical format originally proposed by David Bronstein, the former Soviet champion whose name channeled Kronsteen for James Bond. This is described here on the great Chess Variants website originated by Hans Bodlaender, whom we all know in computer science theory.

To be sure, the game of Go has virtually no “draw problem” and humans still beat computers handily at it.

§

On a separate note, Kasparov is an invited speaker for Manchester’s Alan Turing Centenary Conference. In the second half of his talk he will describe a “Turing Test” in which he was challenged to find which of five games was played by a computer. As first related here twelve years ago, he distinguished it quickly by the other games having a tangible frequency of short-term errors. My joint work with Guy Haworth and Giuseppe DiFatta on player modeling, however, has regressed this frequency against Elo rating, and can now be used to generate artificial games between players of any desired target rating that have such “human” errors. Would an expert be able to distinguish those now?

Turing Test Answers

Here are our answers to last Sunday’s “Turing Test” post.

  1. (i) Mathison
  2. (v): note that (i) and (ii) are equivalent upon considering negated formulas.
  3. (ii) Marian Rejewski—let us not forget the Poles…
  4. (v) in the 1970’s.
  5. (v) Last month.
  6. (v) DEUCE.
  7. (iv) Kurt Gödel himself told us he never met Turing.
  8. (i), according to Turing’s formal definition of “circling,” though today we read “halting” in place of its formal antonym.
  9. (ii) Rosser.
  10. (iii) Church was Turing’s PhD advisor…
  11. (iii) …at Princeton.
  12. (i) Turing never won it—we meant the answer to be (v) none, but mis-worded (i); for (iv) note the Knuth Prize.
  13. (i) Everything down to one tape.
  14. (i) Doubting it, as mentioned in Andrew Odlyzko’s talk here.
  15. (iii) Voice encryption for phones.
  16. (ii) Frances McDormand. (We cheated with Wikipedia here.)
  17. Correction: (v) since the current number is 1, with Max Newman in the JSL, 1942. So much for what we thought we knew (while at Princeton), though in light of question 5, perhaps more will emerge e.g. with I.J. Good…
  18. (ii) Distance runner.
  19. (v) All named for Turing.
  20. (v) See the last few items on this page.

Commenters are welcome to post answers to their supplementary questions, for which we thank them.

Open Problems

How can we induce the top human players to play more like computers, so that their games will be more exciting?

Can computer chess engines be programmed to explain their analysis in a vocabulary of chess strategy prose? Would they then be said to pass a “Turing Test” at chess?

[fixed answer to question 17, gratia commenters, included co-authors; fixed ChessVariants link]

31 Comments
  1. Rob van Stee permalink
    June 1, 2012 3:28 am

    About question 17: you should have checked DBLP :-) It mentions the following publication, which is available in JSTOR: M. H. A. Newman, Alan M. Turing: A Formal Theorem in Church’s Theory of Types. J. Symb. Log. 7(1): 28-33 (1942). Newman has other publications in JSTOR as well, e.g. A Discussion on Computing Machines quick view
    D. R. Hartree, M. H. A. Newman, M. V. Wilkes, F. C. Williams, J. H. Wilkinson, A. D. Booth (1948). With G.Kreisel he wrote a memoir about Brouwer in 1969 (See Google Scholar.)

    I will leave it to others to determine the Turing numbers (and Turing-Erdos-Bacon numbers…) of currently active researchers :-)

  2. Vincenzo permalink
    June 1, 2012 4:16 am

    Regarding Question 17, Max Newman and Turing seem to have a joint paper in the Journal of Symbolic Logic (1942):

    http://www.jstor.org/stable/2267552

  3. June 1, 2012 8:41 am

    Great article.

    If you could post the IPR of the recent games played by Carlsen, Aronian, Kramnik and Radjabov, it would help us to analyze whether the current Elo system is fair, or to derive our own conclusions…

    • June 2, 2012 8:23 am

      My papers include a “Compendium” where I have computed IPR’s for entire tournaments—including all events of Category 20 or higher. I can easily isolate single players, but have not had time to compile these yet. Plus while I’ve attained confidence in my model’s results for tournaments on the order of 1,500–3,000 total analyzed moves, and in the first-move-matching statistic for player-performances on the order of 150–300 moves, the IPR stat still needs tuning for the latter.

  4. June 1, 2012 10:48 am

    In tournaments you could give 1 point for a draw and 3 points for a win, but obviously this wouldn’t help in matches. I don’t know how much the prize money is, and how much it means for the players, but perhaps you could reduce the prize for winning the match, and give large prizes for each won game?

    • June 4, 2012 6:41 am

      Or simply impose penalties for draws. The prize is such and such minus $x per extra game.

      Better yet introduce more sponsorship money. If chess players dressed up like NASCAR drivers they would have just as much incentive to be interesting as to win.

  5. John Sidles permalink
    June 1, 2012 11:20 am

    Please let me join with binoj7 in saying “great article”. In particular, the following passage

    “The Elo rating of perfect play may be as low as 3600 … perfect play on both sides is almost universally believed to produce a draw, even after a few small slips”

    helped me to appreciate that ultra-strong computer algorithms can embrace “fighting-chess” strategies without substantial Elo-penalty, whereas humans training against computers are Elo-driven toward “drawing-chess” strategies … which is a beautifully counter-intuitive finding.

    Overall, this essay provided (for me) more-and-stronger insights into the complexity-theoretic properties of the game of chess, than any other single article I have ever read. Thank you, Ken! :)

    • June 2, 2012 11:40 am

      Thanks, John—Here is a current discussion on the main public Canadian chess message board about retarding effects of Elo ratings, and Marginal Revolution has picked up related aspects of this here.

  6. June 2, 2012 5:02 am

    After watching Gelfand–Anand, my friend and I have the following suggestion to make matches more interesting.
    Start with a tie-breaker under its own rules. You can ensure it has a winner by counting a draw as a win for Black, or by giving White a minute less in 5-minute blitz but the win if a game is drawn.
    After the tie-break, its winner has 0.5 points more.
    Now let the players play the match. They will surely have to play more aggressively.

    • June 2, 2012 11:21 am

      This is a very good idea which I have also seen elsewhere, perhaps in the many comments at ChessVibes.com here or here.

      I should also mention that I gave examples from other recent computer events that had only the top programs playing each other, here.

  7. June 2, 2012 11:53 am

    But wasn’t the staple complaint of human chess players that computer play shows neither innovation nor strategy?

    • John Sidles permalink
      June 2, 2012 4:35 pm

      That has changed in the past two decades. Per Garry Kasparov’s 2010 essay in the New York Review of Books, titled “The Chess Master and the Computer”, today even laptop computers play with (in Kasparov’s phrase) “godlike perfection” sufficient to “crush human grandmasters.” As Ken Regan says, the present advantage of computers is roughly 400 Elo points (which is an immense advantage).

      What will be the next human-dominated game to fall to the computers? In Kasparov’s opinion … poker.

  8. John permalink
    June 2, 2012 1:55 pm

    I really have no idea what I’m talking about, but I can’t help but wonder if the issue is that minimax (which most chess engines still use, AFAIK) isn’t very good at dealing with uncertainty in the evaluation function.

    For example, suppose that somewhere along path A the best choice for the player is scored 0.9 and nine other choices are scored -1, while somewhere along path B ten choices are each scored 0.8. These scores propagate up, so path A is valued at 0.9 and path B at 0.8. The computer will take path A, even though when it reaches that point it may discover the good choice is actually worth only 0.8, whereas on path B those formerly indistinguishable choices might resolve to a range of values from 1.0 down to 0.5, so it would then have the better option.

    The computer is inclined to take riskier strategies (more ways to screw up) than someone who understood the probabilities.
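    The scenario can be simulated: score each path by the maximum of its noisy move estimates, and path B's ten near-equal options tend to beat path A's single good one once the uncertainty resolves (illustrative numbers, not any engine's actual evaluation):

```python
import random

def realized_best(estimates, noise=0.2, trials=10000, seed=0):
    """Average over trials of max(estimate + Gaussian evaluation noise):
    what the path's best option tends to be worth once uncertainty in
    the evaluation function is resolved."""
    rng = random.Random(seed)
    return sum(max(e + rng.gauss(0, noise) for e in estimates)
               for _ in range(trials)) / trials

path_a = [0.9] + [-1.0] * 9   # one good-looking move, nine bad ones
path_b = [0.8] * 10           # ten roughly equal moves

# Plain minimax prefers A (0.9 > 0.8), but the max over B's ten noisy
# options tends to exceed A's single option when the noise resolves.
print(realized_best(path_a))  # about 0.9
print(realized_best(path_b))  # noticeably above 0.9
```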

    • June 4, 2012 6:39 am

      Umm, but computers do outperform humans at chess, i.e., they win a greater fraction of the games when playing against top-ranked players. On your model the computers would actually be choosing the worse plays and thus playing worse than the humans (unless you really believe the computers are winning purely by waiting for human error, which seems unlikely). With this in mind it seems unreasonable to claim the computer is making a worse move selection than someone who had a correct understanding of the probabilities, as such a blatant flaw would no doubt be reflected in inferior play.

      Of course it is possible that the computer’s strategy has a higher variance. I wouldn’t call this a failure to understand the probabilities just a different attitude toward risk.

      Indeed, I wouldn’t be surprised if lack of risk-aversion is a significant factor in the computers’ favor.

  9. repton permalink
    June 2, 2012 3:33 pm

    Make important matches have three or more observers. If you offer a draw, a majority of observers must vote in favour.

    Heck, the observers could even be computers :-)

  10. June 3, 2012 9:02 am

    > Can computer chess engines be programmed to explain their analysis in a vocabulary of chess strategy prose?

    I think nobody is able to explain some computer moves in the case of simple alpha-beta pruning or Monte Carlo search ;-) Human players do not use alpha-beta or Monte Carlo, and they cannot play like computers. Many years ago Mikhail Botvinnik tried to induce computers to play more like top human players, but this work of his was unsuccessful.

  11. Alex Lopez-Ortiz permalink
    June 3, 2012 7:59 pm

    Dick and I have also talked about the idea of an “aggressiveness parameter” in other applications besides chess. Can such a parameter be used in web search, say to promote pages that have many recent updates and other signs of dynamism and risk?

    Ironically for most computer assisted tasks the problem is that the program is too aggressive. According to lore, the original research version of Microsoft Word’s “clippy” intervened rarely, but when it did it was always correct, which made the interruption quite welcome.

    Closer to home, we have improved, as per user evaluation tests, several known search techniques by simply making them much less aggressive. Again, they kick in rarely, but when they do, they are almost always correct.

  12. June 4, 2012 7:53 pm

    KWRegan wrote:
    {“The huge opening books and enumerated simplifying lines are what prompted Bobby Fischer to promote “FischerRandom Chess,” which is today better known as Chess960 for the 960 possible symmetrical starting configurations, from which a random choice is made. I favor combining Fischer’s ingenious generalized castling rule with an older non-random, non-symmetrical format originally proposed by David Bronstein,”}
    .
    Best Practical Solution: Discard the “Random” from Fischer Random Chess!
    .
    Starting every game with a randomly chosen start setup from among the 960 is bad. It prevents at-home preparation, and worse it prevents even grandmasters from understanding the position. The quality of play would be low and thus uninteresting to millions of chess enthusiasts who properly enjoy smart systems of opening play, and the jousting with home preparation.
    .
    At-home preparation is not the cause of the suffocatingly high draw rate in elite chess.
    Instead, the cause is the excessive degree to which the one traditional setup has been over-repeated and over-studied.
    .
    ** SOLUTION WORTH A TRY ** Pick one sensible second setup from the other 959, then stick with it for a couple decades.
    That way players can prepare strong novel moves that occur early in the opening phase, instead of today’s novelties which occur almost exclusively in the early middle game phase.
    Plus, for a few years these early novelties will occur in positions that neither player understands as deeply as they currently understand all basic opening positions from the traditional setup; thus a player will be less likely to correctly solve an early novelty that his opponent springs on him in the early to mid opening phase; thus a significant advantage arises in the opening phase and the game might become decisive instead of drawn.
    .
    If neither grandmaster has any advantage after the opening phase, the chess game will likely be drawn. After the opening phase is over, chess960-FRC feels exactly like traditional chess. Strong prepared novelties are the best opportunity to create an opening advantage.
    .
    BTWay, the nonSymmetrical start setups idea is bad, for reasons too laborious to write out here.
    .
    Discard the “Random” from Fischer Random Chess!
    .
    Thanks, GeneM
    CastleLong.com

    • emde permalink
      June 5, 2012 5:18 am

      Or we could swap the king and queen of one side – e.g. White. This way the position would never be symmetrical.

  13. stefan permalink
    June 5, 2012 7:25 am

    I think that the chess variants link has a superfluous “}{” at the end, which causes it to end on a “page not found” page.

    • June 5, 2012 7:30 am

      Ah, thanks! Generally what happened is that the URLs from Mark Weeks’ site on the WCCC events include a $, which made the LaTeX-to-WordPress conversion script think it was going into math mode, and the rest of the HTML needed much fixing.

  14. July 3, 2012 8:21 am

    Great piece

    ‘My work also hints that the Elo rating of perfect play may be as low as 3600.’
    You can rent a cluster version of Rybka (“Our current capacity is 296 physical cores”), which by some estimates I have seen would put it near the 3600 standard you say could be the max: http://www.rybkachess.com/Cluster/rent.html

    It is interesting that the guy who runs the Rybka cluster could probe a possible 3600 hard limit now.

  15. jez permalink
    September 23, 2012 6:26 pm

    Hi, I would be interested to know the Elo of top programs without opening books or game databases, mating patterns, ending mates, etc. … just raw play. Thanks.

Trackbacks

  1. New SEC Circuit Breakers, Dodd Frank/Davis Polk update, CFTC to tighten screws, and GLL relaxing rules « Pink Iguana
  2. Why do humans play chess in such a risk-averse manner? — Marginal Revolution
  3. Why do humans play chess in such a risk-averse manner? » GharVale.com | GharVale.com
  4. The Spamlist! » Why do humans play chess in such a risk-averse manner?
  5. See how Turing accidentally invented the computer | Blowfish12
  6. Is Chess Too Boring? | Centives
  7. The Crown Game Affair « Gödel’s Lost Letter and P=NP
  8. The Problem of Catching Chess Cheaters | Gödel's Lost Letter and P=NP
