A technical tool of computer games thinks bigger

Mihai Pătraşcu and Mikkel Thorup are part of the great research tradition at AT&T Labs in New Jersey, which branched out from Bell Labs after the 1980s. Thorup’s webpage lists fifteen patents and a shared award for solving a 150-year-old problem about children’s blocks. Pătraşcu’s MIT page has links to his research blog, Informatics Olympiad activity, a musical composition, and personal information, including fine details of how to write his name in LaTeX and HTML. Both have papers, of course, but we note how all this fits together.

Today we discuss their paper “The Power of Simple Tabulation Hashing,” which was cited among notable research papers of last year.

Thorup’s paper with Mike Paterson, Yuval Peres, Peter Winkler, and Uri Zwick won the 2011 David P. Robbins Prize, given by the Mathematical Association of America for advances in discrete mathematics with an experimental component. They proved that an upper bound given earlier by Paterson and Zwick, on the problem of how much overhang a child can create by arranging ${n}$ identical rectangular blocks at the edge of a table, is asymptotically tight. The bound is ${\Theta(n^{1/3})}$. The following figure from their paper shows the advantage of block arrangements more general than stacking blocks atop each other in a harmonic progression, which achieves only ${O(\log n)}$ overhang.

Pătraşcu won the FOCS 2008 Best Student Paper award named for Michael Machtey, for a paper titled “Succincter.” This paper shows that investing an extra bit per word in certain optimally succinct encoding schemes gives greater algorithmic usefulness. This one-sentence synopsis fits just within the 140-char limit of Tiny ToCS; happily this blog need not obey Twitter limits.

## What Tabulation Hashing Is

We follow the notation in Wikipedia’s article on tabulation hashing, but use different language. Consider a set ${\Sigma}$ of ${R}$-many features. A scene ${x}$ can be an ordered or unordered collection of features ${x_1,x_2,\dots,x_t}$. (In the papers a scene is called a key.)

The scheme is initialized by building a table ${T}$ with ${R}$ rows whose entries are ${s}$-bit binary strings, for some ${s}$. In the ordered case the table ${T}$ has ${t}$ columns. Each entry ${T[a,i]}$ assigns a hashcode to feature ${a \in \Sigma}$ occurring in coordinate ${i}$, and the primary hash of the scene is obtained by exclusive-or on ${\{0,1\}^s}$ via

$\displaystyle h(x) = T[x_1,1] \oplus T[x_2,2] \oplus \cdots \oplus T[x_t,t].$

In the unordered case all columns are equal, and the hash of the scene is just the mod-2 sum of the hashes of its features. This is the case of Zobrist hashing, which has been used in computer implementations of Go, chess, and other board games since its invention by Albert Zobrist in 1969.
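A minimal sketch of both cases, with illustrative parameters of our own choosing (8-bit features, so ${R = 256}$, with ${t = 4}$ columns and ${s = 32}$-bit hashcodes):

```python
import random

# Sketch of simple tabulation hashing; parameters are illustrative.
R, t, s = 256, 4, 32
rng = random.Random(0)  # fixed seed, for reproducibility only

# One table column per coordinate; each entry is a random s-bit string.
T = [[rng.getrandbits(s) for _ in range(R)] for _ in range(t)]

def h_ordered(x):
    """Ordered case: XOR the entry for feature x[i] in column i."""
    assert len(x) == t
    code = 0
    for i, xi in enumerate(x):
        code ^= T[i][xi]
    return code

def h_unordered(features):
    """Unordered (Zobrist) case: all columns equal, so use column 0."""
    code = 0
    for f in features:
        code ^= T[0][f]
    return code
```

Initializing the table is the only expensive step; each hash afterward costs ${t}$ lookups and XORs.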

In Wikipedia’s illustration, the features are ${r}$-bit binary words, so ${R = 2^r}$. Pătraşcu and Thorup view each column ${T_i = T[\cdot,i]}$ as a separate block in fast memory, as is practical when ${R}$ is small. This is separate from the idea that we may wish to manage a big collection of ${N}$-many scenes ${x}$.

## Hashing Chess Positions

In chess, a scene is a position on the board together with applicable rules based on previous events in the game. A position feature is a piece standing on a particular square. There are 12 different pieces, 6 white and 6 black, and 64 squares, making 768 position features. There is one rule feature for whether Black is to move instead of White, and one each for whether White or Black is still allowed to castle Kingside or Queenside. In case an en-passant capture is legal on one of the 8 files, there is a feature for that file, making 13 rule features so far. Technically there should also be features for how often the board position has occurred earlier in the game, and for the “fifty-move-rule horizon,” but chess programs often stick with the basic set of 781 features.

The primary hashcode ${h(f) = T[f]}$ of a feature ${f}$ is usually a 64-bit number, so the Zobrist tables for chess are initialized with ${781 \times 64 = 49,984}$ bits, which we can think of as 50,000 bits. A position is a subset of these features that is legal by the laws of chess, so this is the unordered case. Thus the hashcode of a position ${p}$ is the bitwise exclusive-or of the 64-bit vectors of its features. For instance the hashcode of the famous Réti position with Black to move (before playing 1…h7-h5) is

$\displaystyle h(p) = h(\text{WKh8}) \oplus h(\text{WPc6}) \oplus h(\text{BKa6}) \oplus h(\text{BPh7}) \oplus h(\text{BlackToMove}).$
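This computation can be sketched concretely. The feature numbering below (piece index times 64 plus square index, rule features at the end) is our own illustrative convention, not any particular engine’s, and the table is filled from a fixed seed just for the example:

```python
import random

# Toy Zobrist table for chess: 781 features, 64-bit codes.
rng = random.Random(2011)
T = [rng.getrandbits(64) for _ in range(781)]

PIECES = ["WK","WQ","WR","WB","WN","WP","BK","BQ","BR","BB","BN","BP"]
BLACK_TO_MOVE = 768  # first rule feature, by our convention

def square(name):                  # "h8" -> 0..63
    return (ord(name[0]) - ord("a")) + 8 * (int(name[1]) - 1)

def feature(piece, sq):            # a piece on a square -> 0..767
    return PIECES.index(piece) * 64 + square(sq)

def zobrist(features):
    code = 0
    for f in features:
        code ^= T[f]
    return code

# The Reti position with Black to move:
reti = [feature("WK", "h8"), feature("WP", "c6"),
        feature("BK", "a6"), feature("BP", "h7"), BLACK_TO_MOVE]
h_p = zobrist(reti)
```

A pleasant consequence of XOR is incremental update: making a move XORs out the piece’s old square-feature and XORs in the new one, in constant time.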

## Handling Collisions—Or Not

Chess programs use the hashcodes as indices into arrays storing information about positions. If a position has already been given a value, that value is stored so that later searches encountering the position need not re-compute it. Typically each array entry is 16 bytes, so 1GB of RAM allocated to hash lets you store ${2^{26}}$ positions. The secondary hash is then obtained by taking 26 bits off either end of the 64-bit code, or by some other means.
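The arithmetic of the secondary hash can be sketched as follows; taking the low 26 bits as the index is one of the choices mentioned above, with the leftover bits available as a stored signature:

```python
# Secondary hash for a 1 GB transposition table: 16-byte entries
# give 2**26 slots, so the low 26 bits of the 64-bit code index the
# table; the remaining high bits can be kept in the entry as a
# signature to detect (most) collisions.
ENTRY_BYTES = 16
TABLE_BYTES = 1 << 30                 # 1 GB
SLOTS = TABLE_BYTES // ENTRY_BYTES    # 2**26 slots

def slot_and_signature(code64):
    index = code64 & (SLOTS - 1)      # low 26 bits
    signature = code64 >> 26          # leftover high bits
    return index, signature
```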

Since a search can hit ${2^{13}}$ positions in microseconds, collisions are frequent, but rarely does a collision propagate to the root of the minimax search tree, so chess programs usually dispense with probing, which would slow them down. One case where a collision did cause a chess program to blunder and lose a tournament game to a human master was noted here; in my reproduction of the fault, which the program’s author Stefan Meyer-Kahlen acknowledged, it occurred with the smartphone-conscious default hash-table size of only 2MB.

The initialization of the 50,000 table bits is done once and for all, in the source code of the programs whose source I know. Doing so enables pre-compiled “books” for openings and endgames to be shared in hashed rather than long form. Pătraşcu and Thorup note that while it is sensible to use fully independent random bits to initialize the tables, their results are mostly unaffected if the bits are only log-wise independent. What many chess programs actually do, however, is run pseudo-random generators from a shorter (fixed) seed to generate the bits.

For instance, the documentation of GnuChess 5.0 states that an array of 55 unsigned 32-bit integers was generated from the seed ${1}$ using Mathematica 2.0, and that this array seeds a pseudo-random generator known from Donald Knuth’s Art of Computer Programming text and also the Numerical Recipes series. Before we say more about this in an upcoming post, let us address independence and the results in Pătraşcu and Thorup’s paper.
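The scheme can be sketched in the following spirit; this is not GnuChess’s actual generator (which, as described above, is a Knuth-style construction), so Python’s built-in generator stands in, and the point is only the determinism:

```python
import random

# Sketch: fill the Zobrist tables from one short fixed seed.
# Python's Mersenne Twister stands in for the Knuth-style
# lagged-Fibonacci generator the post describes.
def make_tables(seed=1, n_features=781, bits=64):
    rng = random.Random(seed)
    return [rng.getrandbits(bits) for _ in range(n_features)]
```

Every build seeded identically gets identical tables, which is exactly what lets hashed opening and endgame books be computed once and shared.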

## Independence

The different choices of 50,000 or however-many bits for the table(s) amount to randomly choosing a hash function ${h}$ from the entire family ${\mathcal{H}}$. A family ${\mathcal{H}}$ is ${k}$-wise independent if for all distinct arguments ${x^1,\dots,x^k}$ and not-necessarily-distinct values ${y_1,\dots,y_k}$ in the range ${S}$ (which here is ${\{0,1\}^s}$),

$\displaystyle \Pr_h[h(x^1) = y_1 \wedge \dots \wedge h(x^k) = y_k] = \frac{1}{|S|^k}.$

Note that we are talking about hashcodes of whole scenes ${x = x^j}$, not just of their feature components ${x_i}$ in column ${i}$.

Unordered tabulation (Zobrist) hashing is 2-independent. For any two different scenes, there is a feature ${f}$ in one but not the other. Since ${f}$ corresponds to a table entry that is not used in the second scene, its bits are completely independent of that scene’s hash, and since every bit in the hash of the first scene includes an XOR with a bit of that entry, the two scene values are independent. However, consider a third scene comprising exactly the features on which the first two differ: its hash is always the XOR of the hashes of the other two. Thus unordered tabulation hashing is not 3-independent.
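The failure of 3-independence is easy to witness in code: for scenes ${A}$, ${B}$ and their symmetric difference ${C}$, the identity ${h(C) = h(A) \oplus h(B)}$ holds no matter how the table was filled. (Table size and seed below are arbitrary.)

```python
import random

# Zobrist hashing is not 3-independent: the symmetric difference of
# two scenes always hashes to the XOR of their hashes, for any table.
rng = random.Random(7)
T = [rng.getrandbits(16) for _ in range(8)]   # 8 features, 16-bit codes

def h(scene):
    code = 0
    for f in scene:
        code ^= T[f]
    return code

A, B = {0, 1, 2}, {2, 3}
C = A ^ B                   # symmetric difference: {0, 1, 3}
assert h(C) == h(A) ^ h(B)  # shared feature 2 cancels out
```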

The ordered case is 3-independent, however. The reason is that given three different scenes, either some column contains three distinct entries, one from each scene, or two different pairs of scenes differ in two different columns. In the former case the three table entries are each unique to their scene, so as above the hashes are independent. In the latter case, the first column with a difference has an entry unique to one scene, making that scene independent. Then the other column must have different entries for the remaining two scenes, and one of those entries is unique in its column, thus making all three scenes independent.

A counterexample to 4-independence using two columns is obtained for any four features ${a,b,c,d}$ via

$\displaystyle \begin{array}{|cc|} \hline a & c\\ \hline b & c\\ \hline a & d\\ \hline b & d\\ \hline \end{array}$

because the hash of the fourth row is always the XOR of the first three.

Note that the counterexamples to higher independence are rather special-case. Many cases of a fairly small number of scenes will lend a unique feature (in a given column) to each, and then they will all be independent. More generally, even large sets of ${N}$-many scenes will offer something like ${\log N}$-wise independence over most of their moderate-sized (such as ${\mathsf{polylog}(N)}$) subsets. This is an important insight that is exploited to great lengths in their paper.

## Some Results

The main results have the character that simple tabulation-hash schemes achieve the same or similar performance criteria as hash families that have higher independence in worst case, while permitting speedier implementations. For instance, tabulation-hash schemes when employed with linear probing or cuckoo-hashing setups satisfy Chernoff-type bounds that limit the probability of a poor-performance case, over the initialization of the tables and/or the space of (collections of) scenes. The paper itself is long, and we refer readers to it for the details.

We do, however, mention one interesting technical lemma that is completely new to the paper. This is that any collection of five scenes—in the ordered case—includes one whose hash is independent of the hashes of the others. That is,

**Lemma 1** For all distinct ${x^1,x^2,x^3,x^4,x^5}$ there is an ${i}$ (let’s permute to say ${i = 1}$) such that for all target values ${y_1,y_2,y_3,y_4,y_5}$,

$\displaystyle \Pr_h[h(x^1) = y_1 \wedge \dots \wedge h(x^5) = y_5] = \frac{1}{|S|} \Pr_h[h(x^2) = y_2 \wedge \dots \wedge h(x^5) = y_5].$

The idea of the proof is that the above counterexample to 4-independence somehow saturates the ways you can have dependence in two columns, while having 5 scenes does not afford any new way to have dependence across three columns. Under augmentations of simple tabulation hashing that provide fresh table bits for certain combinations, as in two of Thorup’s joint earlier papers, this lemma yields that schemes previously proved 4-independent are automatically 5-independent.

Note that this does not apply to the unordered case. Take any legal chess position and hash the White and Black pieces separately: each of the three scenes is then the XOR of the other two, so none of them is independent. Dividing the White and the Black pieces each into halves gives, together with the whole position, five scenes, each dependent on the others.
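A quick numerical check of the five-scene dependence, with an arbitrary small table and an arbitrary four-way split of the features:

```python
import random

# Four disjoint parts of a position plus the whole position give five
# unordered scenes whose hashes satisfy one XOR relation, so each hash
# is determined by the other four -- no scene of the five is independent.
rng = random.Random(11)
T = [rng.getrandbits(32) for _ in range(12)]

def h(scene):
    code = 0
    for f in scene:
        code ^= T[f]
    return code

parts = [{0, 1}, {2, 3}, {4, 5, 6}, {7, 8}]   # four disjoint parts
whole = set().union(*parts)
hashes = [h(p) for p in parts] + [h(whole)]

x = 0
for v in hashes[:4]:
    x ^= v
assert hashes[4] == x   # whole position = XOR of the four parts
```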

## 3-Independence in Chess

The same logic applies to positions in Go: split the stones of any legal Go position into two parts and you have two legal Go positions. Thus Zobrist hashing in Go really is not 3-independent. In chess, however, it is: the XOR case just above involves illegal positions, because one or the other King is missing.

To prove that chess hashing is 3-independent, consider any three legal chess positions, call them ${p,q,r}$. If one of them has Black’s King on a different square, then this is a unique feature making it independent of the other two. So suppose they all have Black’s King on the same square; thus ${p,q,r}$ share the same feature ${k}$, whose hash we call ${a = h(k)}$. Now every other feature of ${r}$ must belong to ${p}$ or ${q}$, else it is unique to ${r}$ and again ${r}$ is independent. Let ${v}$ stand for the XOR of the hashes of ${r}$’s features shared only with ${p}$, let ${w}$ stand for those shared only with ${q}$, and let ${z}$ stand for those besides ${k}$ shared with both. Finally observe that either one of ${p}$ or ${q}$ has a unique feature, and so is independent, or the following holds:

$\displaystyle \begin{array}{rcl} h(p) &=& a \oplus v \oplus z\\ h(q) &=& a \oplus w \oplus z\\ h(r) &=& a \oplus v \oplus w \oplus z. \end{array}$

This gives ${h(r) = a \oplus z \oplus h(p) \oplus h(q)}$. Since ${a}$ is independent of ${z}$, it follows that whatever values are fixed for ${h(p)}$ and ${h(q)}$, ${h(r)}$ takes any value with equal probability. This proves 3-independence.

The only property used here is that each position has a Black King. We wonder what other conditions might yield the same conclusion, or give even higher independence. Note that the ordered case is the same as the unordered case for an alphabet of pairs ${(c,i)}$ where ${c}$ is a character and ${i}$ is an index—figuratively, the pair is a piece on a certain square. The ordered case uses the property that two pieces cannot occupy the same square, but there is more to it.

## Open Problems

For what natural restricted collections of scenes does tabulation hashing enjoy higher independence?

Note that the results all assume the table entries themselves are independent. Since the entries are ${s}$ bits wide, and no more than five entries are used at the same critical juncture in any of the lemmas mentioned here, they hold under ${(5s)}$-wise independence of the bits. As mentioned above, Pătraşcu and Thorup treat this issue in an early section where they say log-wise independence suffices for the estimates in their main results. What more can be said?