Skip to content

Playing Chess With the Devil

July 28, 2015

A new kind of ‘liar’ puzzle using Freestyle chess

By permission of Vierrae (Katerina Suvorova), source

Raymond Smullyan is probably the world’s greatest expert on the logic of lying and the logic of chess. He is still writing books well into his 10th decade. Last year he published a new textbook, A Beginner’s Guide to Mathematical Logic, and in 2013 a new puzzle book named for Kurt Gödel. His 1979 book The Chess Mysteries of Sherlock Holmes introduced retrograde analysis—taking back moves from positions to tell how they could possibly have arisen—to a wide public.

Today Ken and I wish to talk about whether we can ever play perfect chess—or at least better chess than any one chess program—by combining output from multiple programs that sometimes might “lie.”

We will start with Smullyan-style puzzles today, but they are prompted by an amazing and serious fact. Even though human players have been outclassed by computers for over a decade, humans judging between multiple programs have been able to beat those programs playing alone. This happens even when the human player is far from being a master—someone who would get crushed in minutes by programs available on smartphones. We want to know, how can this be?

By coincidence, yesterday’s New York Times Magazine has a feature on Terry Tao that likens discovering and proving theorems to “playing chess with the devil”—quoting Charles Fefferman:

The devil is vastly superior at chess, but […] you may take back as many moves as you like, and the devil may not. … If you are sufficiently wily, you will eventually discover a move that forces the devil to shift strategy; you still lose, but—aha!—you have your first clue.

On this blog we have previously likened perfect programs to “playing chess against God”—this was quoting Ken Thompson about endgames where perfect tables have been computed. Since the programs we consider here occasionally err—one can say lie—we will reserve the “devil” term in yesterday’s Times for them.

Freestyle Chess

We, that is I and my Tech colleague Dr. Kathryn Farley, just visited Ken at his home base in Buffalo and had a great time. Of course the weather up there is near perfect this time of year and his family was wonderful to us. Plus we got to visit Wegmans—our pick for the greatest supermarket chain in the world.

One afternoon I was honored to sit in on a video conference in which Ken presented some of his research on using computer programs to evaluate the play of humans. Joining him via the Internet were two experts in so-called freestyle chess where humans are allowed access to multiple chess programs during a game. One-on-one the programs totally dominate the humans—even on laptops programs such as Stockfish and Komodo have Elo ratings well above 3100 whereas the best humans struggle to reach even 2900—but the human+computer “Centaurs” had better results than the computers alone. In the audience were representatives of defense and industrial systems that involve humans and computers.

Ken got into freestyle chess not as a player but because of his work on chess cheating—see this for example. Freestyle chess says “go ahead and cheat, and let’s see what happens…” The audience was not interested in cheating but rather in how combining humans and computers changes the game. While chess programs are extremely strong players, they may have weaknesses that humans can help avoid. Thus, the whole point of freestyle chess is:

Are humans + computers > computers alone?

That is the central question. Taken out of the chess context it becomes a vital question as computers move more and more into our jobs and our control systems. The chess context attracts interest because it involves extreme performance that can can be precisely quantified, and at least until recently, the answer has been a clear “Yes.”

At the beginning of the video conference Ken spoke about the history of computer chess, giving his usual clear and precise presentation, and then reviewed his joint paper showing the good results for human-computer teams were no fluke—they really made better moves. Ken used some of the slides from the end of his TEDxBuffalo talk. Then two Freestyle experts spoke, including a winner of three tournaments who doesn’t even have a human chess rating, and an interesting discussion followed on how they actually used the computer chess programs.

I must admit at first I was a skeptic: how could weak players, humans, help strong players, computers? Part of it was that when two or more programs disagreed on the best move the human could make the choice. This kind-of means saying the programs whose move you don’t choose are wrong.

As Ken and I mulled over the idea of freestyle chess we realized it raises some interesting puzzles. I wrote a first draft, then Ken took over adding more detail and tricks to what follows. Let’s take a look at the puzzles now.

Puzzle I

Suppose Alice has one program {X} that is perfect except that there is one position {Q} in which {X} makes a error. To simplify, let’s suppose the only outcomes are win ({W}) or loss ({L}). An error means that {Q} is a winning position for the player to move—it could be Bob not Alice—but {X} chooses a move {m_1} leading to a position {R} that is losing for that player.

Let Alice start in a position {P}. She wants to play perfect chess using {X} as guide. Can she do so? The bad position {Q} might be reached in a game played from {P}, indeed might be {P} itself.

Alice of course cannot tell by herself whether a given position has value {W} or {L}, unless the position is right near the end of a game. But she has one advantage that {X} lacks. She can play {X} against itself from any position {R}. If {R} is beyond {Q}—at least if {Q} is not reachable in a correctly-played sequence of moves from {R}—then Alice will get the correct value {V(R)} of {R}.

This is like the power of a human centaur to try a program deeper on the moves it is suggesting. In symbols, Alice executes {X^*(R)}, which generates a game sequence

\displaystyle  G_R = (R,R_1,R_2,\dots,R_n)

of positions where {R_n} is checkmate for one side. The cost of this is {\Theta(n)}. You might think we could let {X} do the same thing, but {X} is not like “Satan” in Smullyan’s story, “Satan, Cantor, and Infinity.” {X} is not trying to out-psych Alice or correct itself; {X} is just given as-is and by-hook-or-by-crook makes that error in some position {Q}.

So Puzzle I is:

Can the “centaur” Alice + X play perfectly, even though neither Alice nor X plays perfectly alone? And at what cost compared to X?

Answer to Puzzle I

The answer is, yes she can. Let the legal moves in {P} be {m_1,\dots,m_\ell} going to positions {Q_1,\dots,Q_{\ell}} with {m_1} the move recommended by {X}. Her algorithm exemplifies a key idea that bridges to the more interesting puzzles.

  • Alice runs {X^*(Q_1),\dots,X^*(Q_{\ell})}.

  • If {X^*(Q_1) = W}, or if all values are {L}, then Alice plays {m_1}.

  • If {X^*(Q_1) = L} and {X^*(Q_i) = W} for some other {i} then Alice plays {m_i}.

We claim this algorithm, whose running time {O(n\ell)} we count as linear in {n}, plays perfectly in any {P}. If all values are {L} but {m_1} is wrong then some other {Q_i} is really a winning position. But then {X} is wrong both at {P} and at some position in the game from {Q_i} which is a contradiction. If {X^*(Q_1) = W} but {m_1} is wrong anyway then {X} is wrong both at {P} and somewhere at {Q_1} or beyond.

Finally if the “switch” {m_i} is wrong then {X} errs somewhere beyond {Q_i} and {X} either errs beyond {Q_1} or {X} erred by choosing {m_1} in {P} after all. (If {X^*(Q_i)} is wrong and all other {Q_j} are losing then {V_P = L} and so {m_i} wasn’t a mistake by Alice since it didn’t matter.)

There is one loophole that needs attention. {X^*(Q_1)} and {X^*(Q_i)} could both be wrong because their games go to a common position {R} in which {X} makes an error. However, that lone error cannot simultaneously flip a true {W} to {L} in {Q_1} and a true {L} to {W} in {Q_i}, because {Q_1} and {Q_i} have the same player (Bob) to move. There is also the possibility that play from {Q_i} could go through {Q_1} (perhaps via {P)} or through some other {Q_i}, which we leave you to analyze. We intend to rule out the latter, and we could also rule out the former by insisting that the “game tree” from positions {P} really be a tree not a DAG.

Boiled down, the idea is that {X^*(Q_1) = L \;\wedge\; X^*(Q_i) = W} where {X} goes to {Q_1} is an “error signature” that Alice can recognize. If {X} errs at {P} then the signature definitely happens because {X} is perfect at each {Q_i} and one of the {Q_i} must be winning. If the signature happens yet {X} did not err at {P} then {P} must be a losing position. Hence once the signature happens then Alice can trust {X} completely. The only way the error could possibly lie in her future is if she was losing at {P} but lucks into a winning position—but then the error happened on Bob’s move not hers.

We claim also that all this logic is unaffected if “draw” is a possible outcome. Indeed, we could play with the old Arabian rules that giving stalemate is a win—counting it 0.8 points say—and that occupying the center with one’s king after leaving the opponent with a bare king is a win—worth say 0.7 points. It all works with “{W}” being the true value of position {P} (aside from a total loss) and “{L}” being any inferior value.

Puzzle II

Now suppose {X} is allowed to make errors in two positions. Can Alice modify her algorithm to still play perfectly?

First suppose the errors are related. Call two positions {Q,R} related if {R} can occur in a game played from {Q} where all intervening moves are not errors. Per above we will always suppose that two positions {Q_i,Q_j} reached in one move from {P} are unrelated (else we would say the two options {m_i,m_j} are not fully distinct). Related errors are ones that occur in related positions.

If Alice knows this about {X}, then we claim she can solve Puzzle II. She plays the game {X^*(P)} through {Q_1} as before, but now she looks for the error signature at all nodes in the game. If she never finds it then she plays {m_1}. If she does, then she lets {R} be the last position at which it occurs. Then she knows that either {X} errs at {R} or {X} errs somewhere beyond {R}. Either way, she can use this knowledge to whittle down the possibilities at {P}. Or at least we think she can.

Notice, however, what has happened to Alice’s complexity. She is now running {X^*} at every node in a length-{n} game path. Her time is now quadratic in {n}. This is still not terrible, not an exponential blowup of backtracking. But in honor of what Alberto Apostolico cared about, we should care about it here. So there is really a second puzzle: can Alice play perfectly in time {O(n)}?

If the errors are unrelated then we would like Alice to carry out the same algorithm as for Puzzle I. The logic is not airtight, however, because of the case where there were unrelated errors at {Q_1} and {Q_i}. Worse, what if Alice doesn’t know whether the errors are related?

Puzzle III: Two Programs

Here comes the “Freestyle” idea of using multiple programs. Let us have two programs, {X} and {Y}. Suppose one of them can make up to {2} errors but the other is perfect. Alice does not know which is which. Now can she play perfectly—in {O(n)} time?

If the errors are related then she can localize to {P} and have the same linear time complexity as before. For simplicity let’s suppose there are just {2} legal moves, i.e., {\ell = 2}. Here is her algorithm:

  • Run both {X^*} and {Y^*} on the positions {Q_1} and {Q_2}, getting two pairs {(V^X(Q_1),V^X(Q_2))} and {(V^Y(Q_1),V^Y(Q_2))} of values.

  • If the values for {Q_1} are both {W} then one of them comes from the perfect program, so Alice plays to {Q_1}. Likewise if they are both {W} for {Q_2} then Alice goes to {Q_2.}

  • If the values are both {L} for {Q_1} then again by perfection the true value of {Q_1} is {L}, so Alice cannot err by going to {Q_2}. If they are both {L} for {Q_2} she goes to {Q_1}.

The remaining case is that one pair is {(W,L)} and the other is {(L,W)}. This cannot happen, because it means that one of the programs is making two unrelated errors.

This is the idea Dick originally had after the videoconference on Freestyle chess. It shows the advantage of using multiple programs to check on each other like the centaur players do. But what happens if the errors are unrelated? Call that Puzzle III.

Puzzles IV and V

Now let’s allow both programs {X} and {Y} to make up to {k} errors. Can Alice still play perfectly? We venture yes, but we hesitate to make this a formal claim because Puzzles II and III are already proving harder than expected.

How much does having a third program {Z} that is perfect help?—of course not knowing which of the three programs is perfect. If instead {Z} too can make up to {k} errors, how much worse is that? Even if Alice can still play perfectly in polynomial time, we wonder if the exponent of {n} will depend on {k}. Call all of this Puzzle IV.

We can add a further wrinkle that matters even for {k = 1}: we can consider related errors to be just one error. This makes sense in chess terms because an error in a position {R} that is reachable from a position {Q} can affect the search by a program {X} in {Q}. Thus the error at {R} knocks-on and makes the play at all nodes between {R} and the root unreliable. Let {E} be the set of all positions at which {X} makes errors. Then we can define {k(E)} to be the minimum {k} such that there are positions {R_1,\dots,R_k \in E} such that {E} is contained in the union of the positions from which some {R_j} is reachable. This is well-defined even when the positions form a DAG not a tree.

Thus programs could err in multiple positions but still count as having a single branch error if those positions are all on the same branch. Anything with branch errors counts as Puzzle V. This is where our error model is starting to get realistic, but as we often find in theory, there is a lot of already-challenging ground to cover before we get there. It is time to call it a day—or a post.

Open Problems

Our puzzles have some of Smullyan’s flavor. In a typical logic puzzle of his, Alice would be confronted by {X} and {Y} that have different behaviors in telling the truth to arbitrary questions. The solutions in his case rely on the ability to ask questions like:

If you are a person of type … , then what would you say to the question … ?

Our situations seem different, but perhaps there are further connections between our puzzles and Smullyan’s. What do you think?

Can you solve the puzzles of kinds II or III or higher? If you or we find a clear principle behind them then this will go into a followup post.

Update (7/31/15) The artist Vierrae, Katerina Suvorova of Russia, has graciously contributed two new portraits of Smullyan in oil. I have used her new version of Smullyan in a ‘Magus’ robe at the top. Here is her portrait of him in more formal wear, as if he were a dinner guest at an Oxford High Table.


The originals in higher resolution are viewable on her DeviantArt page . Our great thanks to her.

10 Comments leave one →
  1. July 28, 2015 7:01 pm

    I have not seen Lipton’s post for a while. What happened? Did someone prove P=NP? Or did Dr. Lipton realize something else?

    • July 28, 2015 8:01 pm

      Pip = Lipton + Regan. He starts most of the posts; depending on how much material I add besides editing it gets our joint handle or his name.

  2. E.L. Wisty permalink
    July 29, 2015 6:57 am

    Reblogged this on Pink Iguana and commented:
    The analysis has some of the same flavor that might be applied to quant model evaluation in Net Interest Margin optimization error analysis.

  3. July 29, 2015 6:01 pm

    Going to make me long time to solve those puzzles.

  4. July 31, 2015 10:28 am

    Fascinating post. Sorry I don’t follow the math in any sophisticated understanding, but is Puzzle I still true if the moves are simultaneously conducted; that is, {X^*(Q_1) = P(W | L)} ? Apologies again if this is way off-base, but no one else is commenting!

  5. July 31, 2015 11:39 am

    I think the comparison here shouldn’t be Alice + computer X vs Alice or X; I would expect the ensemble to perform better than either solution in isolation. Instead, the comparison should be Alice + computer X vs computer Y + computer X (or, Alice ensembling a set of computer solutions vs. an algorithm ensembling that set of solutions). An ensemble of multiple independent solutions is well-known to provide an advantage over single solutions, so the idea that Alice + X outperforms both alone is not surprising. (This 2015 article discusses some methods with empirical results. Various papers explore the theory, including this 2005 paper: .)

  6. spencer permalink
    August 19, 2015 4:53 am

    Problems where human+computer seem to perform better than either alone are quite interesting. One that comes to mind is predicting protein structures. This, like chess, has a deep branching structure and resists even efforts with supercomputers or custom hardware. But the Foldit game has had remarkable success in combining human intuition with short computational searches.

    As to the reasons why humans are helpful, I’ve heard it described as human being better at generating diverse (“creative”) solutions, whereas the algorithms tend to get caught in local minima of the solution space. Why that should be the case, however, rapidly devolves into philosophy.

  7. September 6, 2015 8:59 pm

    I am right now engaged in Smullyan’s kind of retrograde analysis puzle, but with reconstructing real games. Here is my favorite example, in which I am highly confident that I have found the true reconstruction. Great credit must go to tournament officials who type up often hundreds—in this case over two thousand—games from player’s handwritten scoresheets. When a move is indecipherable sometimes the official will just enter “–” for it, as here:

    [Event “Turkish Cup 2015”]
    [Site “Manavgat TUR”]
    [Date “2015.02.04”]
    [Round “7.94”]
    [White “Fazla, O.”]
    [Black “Micoogullari, Lara”]
    [Result “1/2-1/2”]
    [ECO “D85”]
    [WhiteElo “1806”]
    [BlackElo “1625”]
    [PlyCount “82”]
    [EventDate “2015.01.31”]
    [EventType “swiss”]
    [EventRounds “11”]
    [EventCountry “TUR”]
    [Source “Mark Crowther”]
    [SourceDate “2015.04.13”]

    1. d4 Nf6 2. c4 g6 3. Nc3 d5 4. cxd5 Nxd5 5. e4 Nxc3 6. bxc3 Bg7 7. Nf3 O-O 8. Be3 c5 9. Rc1 Qa5 10. Qd2 Nd7 11. c4 Qxd2+ 12. Kxd2 cxd4 13. Bxd4 e5 14. Be3 Nf6 15. Bd3 Ng4 16. Rb1 Nxe3 17. Kxe3 b6 18. h4 Bg4 19. Ng5 f6 20. f3 Bd7 21. Nh3 Rac8 22. g4 Be6 23. Rb4 Rfd8 24. g5 f5 25. Nf2 Bf8 26. Rb3 b5 27. Rxb5 Bc5+ 28. Ke2 Bxf2 29. Rxe5 Bf7 30. exf5 Bg3 31. Re7 gxf5 32. Rb1 Bd6 33. Rxa7 Bxc4 34. Bxc4+ Rxc4 35. h5 Bc5 36. Rbb7 Rc2+ 37. Ke1 Rg2 38. — Rg1+ 39. Ke2 Rg2+ 40. Ke1 Rg1+ 41. Ke2 Rg2+ 1/2-1/2

    OK my fellow gamesleuths, can you find the missing White’s 38th move? As a hint, I think one other move was entered incorrectly. Note also that the moves leading up to 40 are often “time pressure” and the players are not masters but far from beginners either. Games from tournaments are aggregated by Mark Crowther in his incredible The Week in Chess service, but he does not venture emendations.


  1. Zeroth Law Of Semiotics • Comment 1 | Inquiry Into Inquiry
  2. Raymond Smullyan, 1919–2017 | Gödel's Lost Letter and P=NP

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s