Can We Solve Chess One Day?
Computers play great chess—can they play perfect chess?
Ken Thompson is one of the co-inventors of UNIX, perhaps one of the greatest programmers who ever lived, and won the 1983 Turing Award for his important work.
Today I plan on having a guest author, Ken Regan, discuss Thompson’s work on chess playing programs. It is Ken on Ken, or rather Ken on : when Ken (Regan), played at the Westfield (NJ) chess club as a teen in matches against distant clubs, he knew Ken (Thompson) not by his famous programming handle but as the “club techie” who relayed his moves by telex.
I once visited Bell Labs and spent some time talking to Ken T. on his chess program. I asked why he was doing so well—his program was winning a lot. He said his program had, probably, fewer bugs than the competition. In those early days chess programs were often buggy, and a program with a bug is likely to be a program with a chess weakness. The reason, Ken T. explained the programs were so hard to get right was simple: All chess programs looked at a huge set of positions and then generated one move. As long as the move was legal the programmer had no idea the program had made an error—unless the move led to an unexpected loss.
The rest of this is the work of Ken R., who by the way is an international master with a rating of over 2400. With, of course, lots of help from Subruk.
Chess Programs and the Big Match
Unlike us, Thompson has done pioneering research in computer chess programming. He and Joe Condon created the chess computer “Belle,” which won the 1980 World Computer Chess Championship and numerous other tournaments. In 1983 Belle became the first machine to achieve a recognized Master rating. It took only 14 more years for an IBM computer, “Deep Blue,” to topple the world champion, Garry Kasparov.
Today you can buy a $50 program for your laptop computer that likely can beat Kasparov, at least according to published ratings. Weird to me, this meant that people following the just-concluded World Championship Match online could often instantly know more about the state of play than the human champions. Often but not always, as shown by today’s climactic game in which Viswanathan Anand of India won with Black to defeat challenger Veselin Topalov . At Move 40, Anand calculated 11 moves ahead to realize that a position after Move 50 with only a King and three Pawns left for each side would be winning for him. Many computer programs seeing only yea-far for minutes thought Anand’s move was a blunder allowing a draw, causing their owners to express consternation on numerous chat channels and blogs. Thus, sometimes the programs are wrong.
Yet Thompson’s other great chess creation removes even that chance of error, and ties chess to several problems in computer science we’ve been thinking about.
The Book of Chess
Thompson greatly extended algorithms and data structures for carrying out perfect analysis of chess endgame positions with at most five pieces on the board—three besides the kings. You could be better than him and better than Belle overall, but if you got down to 5 pieces, you might as well be “playing chess against God” as he called it. It was a chess version of Paul Erdős’ proofs from the “Book,” in giving you not only the answer but the shortest route to the win or draw. His database overturned centuries of belief that certain endgames were drawn. You can win them—if you have the patience to find over 100 perfect moves
In the past decade people have completed a perfect analysis of positions with six pieces, and are now working on seven. This discovered a position where it takes 517 moves to force a winning capture, and then a few more moves to checkmate. That position hasn’t come up in any real game, but others do all the time. In fact, one could have arisen in the nerve-wracking game where Anand had Topalov on the ropes, but missed knockouts that machines saw instantly. In this position
Anand, as White to move, lost his last chance by using his Rook on c3 to take Black’s pawn. Most players would take with the other Rook to form a line along the 3rd rank. Leaving both Rooks on the rim gave Topalov an easy draw by perpetual check. Topalov could still have drawn, but it would take only a little inattention to allow White’s Rooks to block checks and reach a position like this:
Here White can give checkmate—in 99 moves. I don’t think a human would even find the first move, let alone all 99. It’s not a Rook check; it’s not pushing the pawn which White needs to promote to win; it’s a move neither of us understands. You can see it by going to the endgame site. The only move to win is Kf2-g2, shunting aside the King.
If you keep clicking the best moves for both sides, you reach this position when it’s mate-in-56:
Now “The Book of Chess” says to play 56.Rd3-d2+ Kc2-c1 55.Rd2-d1+ Kc1-c2 54.Rd1-d3!, reaching the same position but with Black to move! This idea called Zugzwang is known to chess players in many immediate situations, but we have no idea why “The Book” uses it here.
White has 4 other winning moves, but the first two allow Black quickly to force White back to this position, so they are not independent wins. The other two might be independent, but the increase the “mating distance” from 56 to 72 and 74, respectively. Now we’re starting to use terms from graph theory, with positions as nodes and moves as edges, but these are not your familiar garden-variety graphs. These graphs typify alternating computation, which is a fundamental part of complexity theory. And these graphs are big.
Building Big Files
How big? Under Eugene Nalimov’s modification of Thompson’s original file format, the database for up to 5 pieces takes 7.05 gigabytes, compressed over 75% from the originals. The complete compressed 6-piece set fills 1.2 terabytes. That’s on the order of large data set examples given by Pankaj Agarwal in a University at Buffalo CSE Distinguished Speaker lecture last month, for NASA, Akamai, the LSS Telescope, and others. The complete 7-piece set has been estimated to be around 70TB, and its computation may finish by 2015. Nalimov’s distance-to-mate (“DTM”) files are built iteratively working backwards from checkmate positions. The newer ones (and Thompson’s originals) use distance-to-conversion (“DTC”), which means including all winning positions with fewer pieces in the ground set. Following chess endgame practice, let’s call the winning side White. The first iteration marks all positions where White can give checkmate—and/or make a winning capture—in one move. The next iteration finds all positions where Black can’t avoid the previously-marked positions (or in DTC, must make a losing capture). Note that this is only a subset of the usual graph neighborhood of the marked positions—it’s the subset of for which all out-neighbors are marked. But the next iteration (with White again to move) marks all the positions which has an edge to the previously-marked Black-to-move positions. The 517-move win was the only unmarked position left (up to symmetry) in the 1,034th iteration for those particular pieces, after months of runtime.
The Search Problem
You’d like to take a particular position and decide whether White can win, which means analyzing forward. However, it is not known how to save appreciably over the brute-force backwards process, just to decide the search problem for one position. This is related to asking how “fast-mixing” the chess endgames are. In that single 99-move winning line, both Kings go all over the place, Black’s as well as White’s, so a pretty wide swath of the whole KRRP vs. KQ (with White’s pawn on the f-file) search space is involved.
The tablebases of course can help in analyzing forward from a given position, because when the usual alpha-beta search reaches a node with 6 or fewer pieces, it will return a perfect answer. But herein lies a paradox: the hard-disk fetches slow the search down so much that competitive chess programs often play worse. The world champion “Rybka 3″ chess program works best with its tablebase-probe policy set to “Rarely” (the default) or “Never,” not “Normally” or “Frequently.”
The degree to which the chess search does not stay localized is reflected in poor cache performance by the tablebase files in practice. One remedy is putting selected endgames on a smaller but faster flash drive. Another is using so-called bitbases, which give only the win/draw information and use far less memory. Omitting the distance information, however, raises the problem of playing programs going around in cycles. Can we address the caching issue more directly, by changing the strategy of how they are encoded, without losing their full information?
Asymptotically, for chess on boards, the search problem is complete for or according to whether one imposes a “generalized fifty-move rule” to limit the length of individual games. Whether a player has 2 or more independent winning strategies is equally hard. Of course, for chess on the board we are talking about the concrete complexity of the problem.
Shorter Tablebase Descriptions?
Here’s one natural encoding shortcut that turns out to be wrong, exemplifying pitfalls of carrying intuition from garden-variety graphs to alternating graphs. Most chess programs today can solve Mate-in-5 problems in microseconds, looking ahead 9-10 ply where 2 ply means a move for each player. Suppose we store only the checkmate positions, then those with a DTM of 5 moves = 10 ply, then those with DTM 20 ply, 30, 40, etc. The idea is that if a chess program probes a position with a DTM of 29, then the program can do 9 ply of ordinary search, enough to hit the DTM-20 stored positions. We might expect to save 90% of the storage, on top of the 75% compression already achieved.
This idea works for an ordinary leveled graph, but fails in an alternating graph. This is because Black can make inferior moves that hop over the 10-ply, 20-ply, 30-ply boundaries. This creates game paths in which the minimax search never sees a marked node, until its own depth limit aborts it without result.
What we would like to do is to find short descriptions for whole hosts of positions, say all-drawn or all-winning for White, that we can express in lookup rules. The idea is to tile the whole position space by these rules, in a way that needs only a lower order of exceptional positions to require individual table entries. This raises a concept discussed before, namely descriptions of circuits that take an index as (auxiliary) input, and need to output (only) the -th bit of the function value. Is it feasible to produce that kind of circuit? The need for DTM or DTC info to find shortest paths and avoid cycling may make this idea difficult to realize, but it can certainly apply to bitbases.
The Tablebase Graphs
Can we organize the endgame files to make it feasible, given an initial position in the database, to retrieve and cache only the parts needed in the search problem from ? This relates to the expansion of the graph. Although the Anand-Topalov game example shows how progress from can roam all over the board, perhaps the expansion is only through relatively few lines of play. This raises the idea of graphs that are expanders locally on small scales, but have irregular expansion on larger scales. If one can identify lines that transit between phases with play localized to part of the board, then the remaining positions would break into isolated components that could be read piecemeal into main memory.
This reminds me of papers on distance and clustering in large social-network graphs. The tablebase graph is huge and natural to study. What general properties does it have? Which classic graph properties and theorems can be carried over? Note the following trick: One can have position with three winning moves . There can be three later positions such that the defender after move can go to or go to , after can go to or , and after can go to or . In such cases position remains winning even if any one of were to change from “win” to “draw.” By analogy with Menger’s Theorem one might expect this to imply has two independent winning strategies, but here that’s false. Notions of “flows” and “cuts” are hence also tricky to carry over.
The Book as “Deep” Oracle?
Ever since IBM’s “Deep Blue” beat Kasparov, multi-processing chess programs have prefixed “Deep” to their names. But here I mean a different technical use of the word: computational depth. A string is computationally deep if it has short descriptions, but every short description requires a long time to produce .
The Book of Chess has almost no information in classical terms—because it is described entirely by the rules of chess. These rules fit on half a page, and asking for the portion needed in the search from a given position adds just one more line to describe . If this portion really has no better computation than building basically the entire table, it is computationally deep.
Although the database entries for the positions you see when playing from the Anand-Topalov position are classically the opposite of random, they sure look random, which ones are wins or draws. Can they be used in applications where one desires pseudo-randomness, but also wishes to leverage the digested information of an -complete or -complete problem? There have been several papers in the past few years on the problem-solving power of deep strings, in general or specifically those that are witnesses for SAT. Certainly intense computational power has gone into building the files, so one feels they should provide power for doing something else.
The ultimate goal is to solve chess—does white win, does black win, or is it a forced draw? Even as computers become vastly more powerful than humans we still have no idea who wins this great game, even from some well-analyzed famous positions. Can we even prove that Black does not have a forced win? This may sound silly, but actually Bobby Fischer opined that in a particular position reached after five symmetrical opening moves, he would rather be Black!
A realistic goal is to try and extend the perfect tables of play to eight, nine, ten and more pieces. The underlying algorithms are simple, but there seem to be challenging computer science questions. How to handle simple graph search on such huge graphs? How to apply methods that work for graph connectivity, but seem not to work on the game graphs? How to distribute large files, and optimize the caching tradeoffs?