How important is “backward compatibility” in math and CS?
David Darling source
Henri Lebesgue came the closest of anyone we know to changing the value of a mathematical quantity. Of course he did not do this—it was not like defining π to be 3. What he did was change the accepted definition of integral so that the integral from to of the characteristic function of the rational numbers became a definite . It remains even when integrating over all of .
Today we talk about changing definitions in mathematics and computer programming and ask when it is important to give up continuity with past practice.
Continuity is an example of such continuity. Bernard Bolzano’s original definition of 1817 amounts to much the same as the standard “epsilon-delta” definition by Karl Weierstrass a half-century later: A function between metric domains and is continuous if for every point and there is a such that for all points having , it follows that . The newer definition is that is continuous with respect to topologies on and on if for every open set of , the inverse image
is an open set of .
This is backward-compatible: The metrics uniquely define topologies on and whose basic open sets have the form and similarly for . For these topologies the newer definition is equivalent to the one. Of course one virtue of the new definition is that it can be used for topologies that do not give rise to metrics.
Lebesgue’s change to the meaning of integral was not backward compatible in terms of the upper limit. It changed the status in texts that referenced Bernard Riemann’s definition which used approximations by rectangles. For the characteristic function of the rational numbers in the Riemann upper sum is whereas the Lebesgue upper sums—using open coverings that need not induce partitions of —approach zero and so meet the lower sum. It is compatible whenever a function satisfies Riemann’s definition that the upper and lower approximations by rectangles meet in the limit. One can still talk about Riemann integrability, perhaps as modified by Thomas Stieltjes, but the difference is that you must include one or both names—if you don’t then you default to Dr. Lebesgue’s definition, and then the status for functions like is different. Most to our point, the “interface” for working with the Lebesgue integral differs—especially in employing the notion of open coverings.
Guitars and Pi
Compare whether an audio “recording” can still refer to a vinyl phonograph record. When I was a teenager we said “digital audio” or “digital recording” and there was much debate about whether the quality could ever equal that of a physical LP record. It didn’t take long for progress in recording density and playback to tilt the field toward those who feel CDs sound better. Now I don’t think the analog meaning is ever intended without a qualifier.
Digital opens new vistas of sound, much as Lebesgue’s definition founded a great expansion of real and complex analysis. The point we’re making is that it is not backward compatible in operation. You can play the same music as an LP—much as the Lebesgue integral gives the same value as the Riemann integral for most common functions—but you cannot slip an LP into a CD player. Whether CDs should be playable on DVDs, and DVDs on later optical drives, has been the burning issue on the design side.
Less clear is whether electric guitars are a break from acoustic guitars. The notation and interface for playing them are largely the same. It is even possible to bolt on an extension to make an acoustic guitar sound electric at the source. Thus electric guitars have not really changed the definition of “guitar.”
Also problematic is whether a change in nomenclature amounts to a change in definition. I support the position that “pi” should have been defined as 2π = 6.283185307… We’ve argued that the Indian pioneer Aryabhata had this value in mind in the 5th century CE. Doing so would change the look of equations but not the interface.
Physicists use more often than they use Max Planck’s original constant . Sometimes is called the “reduced” Planck constant or named for Paul Dirac, but even when “Planck’s constant” is used to mean in speech nothing is actually being redefined. The two notions of ‘pi’ or ‘‘ are completely convertible.
Definitions in Programming
I’ve been thinking of this recently because I’ve adopted two breaks from the standard definition of codes for chess positions, which I covered earlier this year. One fixes an admitted mistake by the definition’s second creator while the other is needed to adapt to more general forms of chess. Let’s see how they capture in microcosm some larger issues in research and everyday programming.
Consider the position after the reasonable moves 1. e4 e5 2. d4 exd4 3. Bc4 Bb4+ 4. Bd2 Bc5 5. Nf3 Nf6 6. e5 Qe7 7. Bb3 d5:
The FEN code for this position—minus the last two components which do not affect the position according to the laws of chess—is:
rnb1k2r/ppp1qppp/5n2/2bpP3/3p4/1B3N2/PPPB1PPP/RN1QK2R w KQkq d6
The first part tells where the pieces are, then ‘w’ means White is to move, and ‘KQkq’ means that both White and Black retain the right to castle both kingside and queenside. The fourth part indicates that Black’s last move 7…d5 was with a pawn that jumped over the square d6, which might enable an en-passant capture. In fact there is a White pawn on e5 in a position where it could possible make such a capture, but the move “8. exd6 e/p” is not possible because it would leave White’s king in check from Black’s queen. Note that the ‘Q’ in ‘KQkq’ does not mean White can castle queenside right now, just that it is possible in the future. Castling kingside is legal right now, and indeed 8. O-O is White’s best move. Suppose however that White plays 8. Bg5 but after Black’s 8…Bb4+ chickens out of the strong 9. c3 and meekly returns the Bishop by 9. Bd2 followed by the reply 9…Bc5. The position on the board is the same:
However, the FEN code (again minus the last two fields which do not determine the position) is now:
rnb1k2r/ppp1qppp/5n2/2bpP3/3p4/1B3N2/PPPB1PPP/RN1QK2R w KQkq –
The en-passant marker for the square d6 has disappeared, so these are different codes.
Now the 3-fold repetition rule in chess allows the side to move to claim a draw if the intended move will cause the third (or higher) occurrence of the resulting position in the game. When an en-passant move is legal on the first occurrence of the “board position” then it is a different position, so two more occurrences of the board setup do not allow the draw claim. Here, however, if White waffled again by 10. Bg5 Bb4+ 11. Bd2, Black really would be able to claim a draw with intent to play 11…Bc5. The en-passant move is not legal, so White’s immediate options were the same, and the set of options (including castling in the future) is what defines a position.
The issue is that the FEN codes do not reflect this. Computer chess programs want to detect not only 3-fold but even 2-fold repetitions to avoid time-wasting cycles in their search. It would be most convenient to tell this from identity of the FENs, without having to build the chess position and examine pins to check for legality. Unfortunately the FEN standard mandates inserts like the ‘d6’ even when there is no nearby pawn at all.
I have therefore adopted the stricter practice of Stockfish and some other chess programs by including the target square only when an en-passant capture is actually legal. Steven Edwards—the ‘E’ in ‘FEN’—advocated the same four years ago. Happily the difference does not matter to playing or analyzing chess games—so you can say it’s “compatible” for the end-user. It is however a departure at the API level. If an end user or another program submits a non-strict FEN then your program needs to convert it internally. This is painless since you are building the position from the FEN anyway. The second deviation is more consequential for users.
Bobby Plays it Forward
A mark of Bobby Fischer’s genius apart from playing games is that while many champions have mused on tweaking the rules of chess, not just one but two of Fischer’s inventions have gained wide adoption. The greater was the Fischer clock, which metes a portion of the allotted thinking time in increments with each move besides the lump amount at the start of a game session. This lessens the acuteness of time pressure and is now standard.
The second is Fischer Random chess, which is more often now called Chess960 for the 960 different possible starting positions of pieces on the back row, from which one is randomly selected at the start of a game. Variant setup rules had been proposed by many including the onetime champion José Capablanca and challenger David Bronstein, but in my mind the stroke that makes all of these “click” is Bobby’s generalized castling rule. Adding it to Bronstein’s original idea yields this.
This allows the king and rooks to start on any squares provided the king is between the rooks. The between-ness preserves the meaning of castling ‘queenside’ and ‘kingside’ and indeed the destination positions of the king and the rook used to castle are the same as for those moves in standard chess. The other conditions are the same as in standard chess: any squares between the king and the rook must be clear, neither the king nor that rook must previously have moved, and none of the squares traversed by the king may be attacked by the enemy—though this is OK for the rook. If the Chess960 position happens to be the standard starting one then the whole game rules are completely the same—this is a “conservative extension” of chess. What’s not conserved is the standard notation for castling: O-O and O-O-O in game scores, nor e1c1 or e1g1 (e8g8 and e8c8 for Black) in internal chess program code.
The first problem emerges when you think of a Chess960 position with Black’s king starting on f8 rather than e8, say with the rooks on b8 and h8, such as this one:
The notations O-O and O-O-O are still clear, but the corresponding internal notation f8g8 is ambiguous. It could be a normal King move without castling. This is solved by changing the internal notation to f8h8 in that case, figuratively “king takes own rook,” and f8b8 as the internal code for O-O-O instead. Many chess programs accept the “king takes rook” style from other programs even in standard chess, and there is no issue for the end user.
The second problem however is with the external notation when playing Chess960. It is subtle: Suppose Black’s rook on h8 moves to h6, moves later to a6 on the other side of the board, and then has occasion to retreat to its own back row on a8. The first rook move eliminated Black’s kingside castling right, so the original ‘kq’ part of the FEN would become just ‘q’. The problem is that the FEN does not preserve the game history, and at the moment of Black’s Ra8 move, it has forgotten which rook was originally resting on the queenside. If Black subsequently moves the other rook away from b8, or moves the rook on a8 yet a fourth time, how are we to tell whether the ‘q’ castling right persists?
Including the game history in the FEN is not an option—else we could also solve the repetition-count problem whose full headaches could make another post. Various “memoryless” solutions have been proposed, from which my choice is that of Stefan Meyer-Kahlen, designer of the Shredder chess program and co-creator of the standard UCI protocol for communicating with chess programs. A “Shredder-FEN” replaces the ‘KQkq’ by the files of the rooks, again using capitals for White. This eliminates the above ambiguity, as now the ‘q’ reads ‘b’ for the rook on b8.
I’ve chosen to use Shredder-FENs not just for Chess960 but also internally for standard chess, and my code—which I will say more about in a post shortly—also accepts Shredder-FENs. Using both changes, my code stores the following as the FEN for the above diagram at White’s move 8 (including now the two last fields):
rnb1k2r/ppp1qppp/5n2/2bpP3/3p4/1B3N2/PPPB1PPP/RN1QK2R w HAha – 0 8
The ‘HAha’ is no laughing matter—it also simplifies updates when playing a move, partly because unlike ‘KQkq’ the letters [A-Ha-h] do not occur elsewhere in the FEN. My program can export standard FENs—even for Chess960 notwithstanding the ambiguity—but its API committally changes the definition of a FEN.
What are your favorite changes to definitions in mathematics and computing theory? I have not even gone into the myriad definitions of universal hash functions. When is it important to make a clean break with a previous standard, without preserving backward compatibility?
[fixed introduction which confused the Riemann integral with its upper sum; added mention of Bronstein+Fischer variant.]
Oops—I hit publish while logged in on the joint ‘Pip’ handle; the ‘I’ in this post is only I, KWR.