When is a Law “Natural”?
When numbers channel minds as much as minds channel numbers
Adolphe Quetelet was a Belgian mathematician who pioneered the use of probability and statistics in the life sciences. He was a successful writer of opera librettos and poetry and literary essays while teaching mathematics at school and then college level, at the College of Ghent. There Jean-Guillaume Garnier recognized his talents as a mathematician, and invited him to do a doctorate at the just-created University of Ghent, which he completed in 1819. Then he heard the call of Interdisciplinary Research (IR), and went on to found the Royal Observatory of Belgium and the field of what he called “social physics.”
Today I (Ken) wish to talk about some statistical laws that seem to straddle the boundary between social science and “hard science.”
The term “social physics” had already been coined by the French philosopher Auguste Comte, who disagreed with the mathematical focus applied by Quetelet. Deciding to switch rather than fight, Comte re-christened his work sociology, by which name we know the field today. Comte also coined the term altruism, which did not get similarly stepped on. Quetelet’s term fell by the wayside, but his work so galvanized a field with an already-known name that he is sometimes called the “Father of Statistics.”
Quetelet always saw his work in the larger context of science, and as early as 1826 he gave courses in the history of science. He also gave popular lectures on applications of probability theory to various branches of science. We cite three of four quotations from these lectures given in his biography here:
- The more advanced the sciences have become, the more they have tended to enter the domain of mathematics…
- It seems to me that the theory of probabilities ought to serve as the basis for the study of all the sciences, and particularly of the sciences of observation.
- Since absolute certainty is impossible, and we can speak only of the probability of the fulfillment of a scientific expectation, a study of this theory should be a part of every man’s education.
His book On Man and the Development of His Faculties quantified the concept of the “Average Man” and distribution of biometric properties—this was the first application of the normal curve beyond its initial formulation to describe errors.
Quetelet also did some luring of his own, attracting Garnier’s other doctoral student away from his post-doctoral intent to collect and publish the complete works of Leonhard Euler. This was Pierre-François Verhulst, who pioneered the use of equations to model changes in populations of living things. Verhulst noted Euler’s description of exponential laws of growth, but saw the need to model constraints that would limit such growth. The simplest and most efficacious way was to modify Euler’s differential equation $\frac{dP}{dt} = aP$ (with initial condition $P(0) = P_0$) to
$\frac{dP}{dt} = aP - bP^2,$
where $a/b$ is an estimable cap on the possible size of the population. The constant $b$ can be regarded as a rate of increased competition for scarce resources as the population expands, or increased vulnerability to predation, or the rate of many factors; attaching it to $P^2$ (as opposed to some other power of $P$) seems the simplest and most salient thing to do.
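For readers who want to see the shape of such a curve, here is a minimal Python sketch of the closed-form solution of the logistic equation $dP/dt = aP - bP^2$ with $P(0) = P_0$. The names `a`, `b`, `p0` and the sample numbers are assumptions chosen for illustration only; they are not fitted to any data in this post.

```python
import math


def logistic_population(t, p0, a, b):
    """Closed-form solution of dP/dt = a*P - b*P**2 with P(0) = p0.

    The population levels off at the cap a / b.
    """
    cap = a / b
    return cap / (1.0 + (cap - p0) / p0 * math.exp(-a * t))


# Hypothetical parameters: growth rate 0.5 per year, cap 0.5 / 0.001 = 500.
for year in (0, 5, 10, 20, 40):
    print(year, round(logistic_population(year, p0=10, a=0.5, b=0.001), 1))
```

Early on the curve is nearly exponential; once the population is a sizable fraction of the cap, the $bP^2$ term takes over and growth flattens.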
Here is an example of a select population of living beings whose numbers have closely followed such a logistic curve since it came into existence in 1971.
For such small numbers of specimens, the curve is not too grainy and the agreement is striking. It is not a population of exotic snails or salmon swimming past bears upstream, but one that has included me as a member for most of this time. The full legend given in my joint paper with Bartlomiej Macieja (who compiled the figure) and Guy Haworth shows this to be the number of players rated above 2200 by the World Chess Federation (FIDE) each January since the Elo rating system was adopted in 1971. That is the floor to be called a “Master.” My rating is about 2400, and I own the title of International Master which is typical of that strength. Macieja is a Grandmaster and is rated about 2600, while today’s best players are rated a little above 2800.
Now when I competed in a big room full of Masters last July, I did not observe Masters fighting over scarce chess sets or predators carrying losers out of the playing hall for dinner. There are many more books and computerized resources for improving one’s game today than in the early 1990’s when the pace of growth was steepest. So why has my population been following such a law? Is it just because it’s salient, the simplest thing that models a cap on growth?
Natural Statistical Laws
I am told that many such equations since Verhulst’s have been formulated, and they are usually followed “messily” rather than neatly as above. Note also the comment in Wikipedia’s biography of Verhulst that the logistic-curve model was rediscovered in 1920 amid promotion of “its wide and indiscriminate use.” The variety of interdisciplinary applications favors our calling it a statistical regularity, but the lack of uniqueness backs us off calling it a natural law.
When can regularities of behavior in humans or other biota be regarded as a natural law? Ironically, one requisite I suggest is that the kind of human distinction that was the focus of Quetelet’s statistical treatise should have no role in the operation of such a law. It may require some special ability to participate in the experiment that measures the law, such as skill at chess—but it is better when the law operates just as well for players of Dick’s strength and below, as for my strength and above. This past weekend I have discovered a candidate for such a law. It even has a fixed number: 58%.
The Nature of Rybka
Rybka is the commercial chess program written by Vasik Rajlich which was considered the world’s strongest from 2006 until recently, when programs of varying rumored degrees of derivation from it have nipped ahead of it in some informal competitions. Now Rybka itself has come under a cloud for alleged uncredited derivation from a free-source program called Fruit. Fruit gave rise to Toga II, whose programmer Thomas Gaksch helped me personally until late 2008, by which time Rybka 3 came out and was rated over 150 Elo points stronger than its nearest competitors then.
As with virtually all chess programs (called “engines”), Rybka’s search progresses in rounds that add one more move of lookahead to the previous round. Each round computes for each legal move a value in the standard chess units of centipawns, that is, hundredths of a pawn. Rybka maintains the list in sorted order and uses this order to prune the search for moves recognized as inferior. In the normal Single-Line Mode (SLM) of playing, only the best move or moves are guaranteed a full search, but for game analysis one can use Multi-Line Mode (MLM), which gives full treatment to the $k$-best moves; my work sets $k = 50$, which usually covers all legal moves in any position. The sort used for the list is standardly stable, so that a move $m_j$ can jump ahead of a move $m_i$ only when its updated value in the current round of search beats that of $m_i$; if it merely stays tied then $m_i$ keeps its place in front.
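The effect of a stable sort on tied moves can be sketched in a few lines of Python. This is an illustration only, not Rybka’s actual code: the move names and centipawn values are made up, and the sketch relies on the fact that Python’s built-in `sorted` is stable.

```python
def resort_moves(ordered_moves, values):
    """Re-sort moves best-first after a search round.

    sorted() is stable, so moves with equal values keep the
    relative order they had in ordered_moves.
    """
    return sorted(ordered_moves, key=lambda move: -values[move])


# Hypothetical values (in centipawns) from three successive rounds.
rounds = [
    {"Nf3": 30, "e4": 30, "d4": 25},  # Nf3 and e4 tied on top
    {"Nf3": 28, "e4": 29, "d4": 28},  # e4 strictly beats Nf3 and jumps ahead
    {"Nf3": 20, "e4": 20, "d4": 20},  # all tied: previous order survives
]

order = ["Nf3", "e4", "d4"]
history = []
for values in rounds:
    order = resort_moves(order, values)
    history.append(list(order))
print(history)
```

After the last round all three moves show the same value, yet `e4` stays listed first because it once held a strict lead.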
My statistical model uses as input the move values obtained after the 13th round of search; those values are the only things specific to the game of chess. It has two parameters called $s$ for “sensitivity” and $c$ for “consistency,” whose pairs of values are fitted to the Elo rating scale of skill at chess. Given a setting of $(s, c)$ modeling a player and the values $v_1, v_2, \dots$ of the available moves $m_1, m_2, \dots$ in the order listed by Rybka, the model outputs estimates $p_1, p_2, \dots$, where $p_i$ stands for the probability that the player will choose move $m_i$. It is an axiom of the model that if two moves have the same values, then for all players the inferred probabilities will be the same:
$v_i = v_j \implies p_i = p_j.$
After all, two moves of the same value are interchangeable as far as the program is concerned. The sort doesn’t have to be stable, and the search-pruning would work as well if the order of equal-value moves were randomized at each round. My model uses only the values, so equal-value moves are peas-in-a-pod, no?
A sophisticated thought tells you that since values are rounded in the display but could be kept internally to higher precision, the knowledge that one move $m_i$ comes before another move $m_j$ means one can expect the true value of $m_i$ to exceed that of $m_j$ by some fraction of a centipawn, i.e. by less than 0.01 pawns, and similarly in cases where more than two moves are tied. A difference that slight does get picked up by the model’s equations, but not enough to make the axiom intolerable for fitting. Hence it was a shock for me to discover that the axiom is not only false in practice, but markedly false: false as if the difference in value were significantly more than a centipawn.
The Law of Rybka
Cases where Rybka shows two or more moves tied for equal-best are common, indeed over 13% of positions that occur in games. Hence I found out early on that human players chose the first-listed move significantly more often than the others, approaching 58% when just one other move is tied. Since the overall spectrum has humans choosing one of the first three moves listed over 85% of the time, I resorted to a “fudge factor” only for these equal-top cases. To induce a 58%-42% split my curves said to drop the value of tied moves by about 1.5 centipawns (i.e., by 0.015), but curiously I found that doubling this worked better in the fitting. So I closed my eyes and punched a 0.03 fix into my 10,000+ lines of C++ code, getting decent results though still feeling that my model was somewhat out-of-tune.
All of my analysis has been conducted on one ordinary quad-core 64-bit home-style PC. It takes typically 6-8 hours to analyze a game—since I skip turns 1–8 and games average about 40 turns this means 30–35 moves for each player. It often takes 10–15 minutes to analyze an early turn to depth 13 in 50-line mode, scaling down to under half a minute in the endgame. Overall it’s about 10 moves (5 for each player) an hour, so allowing for some pauses, each core does about 80,000 moves per year—and in 2-1/2 years of operation I’ve amassed over 750,000 moves done this way—supplemented by over 10,000,000 moves on my other 4-core PC where almost 200,000 games representing the major history of chess have been done in SLM which takes 10-15 seconds per move (except when Rybka 3 goes into an unexplained hours-long stall, a huge and frequent headache which has impeded me from doing automatic scripting and recruiting helpers). All this has gone on in background windows while I do research, write papers, copy-edit and compose blog posts, and everything else including sleep.
Only last weekend did I realize I’d amassed enough data to test cases of equally-valued moves that are not tied for best, indeed are markedly inferior moves buried way down the list. I expected a regression to 50-50, but found instead a house-of-mirrors effect:
Whenever Rybka 3 depth-13 gives the same value to two moves—even inferior moves—the move listed first is preferred by human players 58% of the time, independent of rating or whether the turn number is 9–20, 21–40, or > 40. When three moves have the same value the splits are 44%–32%–24%, whose first and second pair have the same 58% ratio, and similarly for four-or-more tied moves. There appears to be no more deviation than comes from Bernoulli trials that have these true probabilities.
For instance, my data set includes every move of every game played in tournaments of “Category 20” and higher, meaning an average Elo rating at least 2725. Out of 127,258 moves, 7,107 of them give the same value to the 3rd-listed and 4th-listed move, where that shared value is 0.25 or more pawns worse than the best move. Out of 472 cases where the player chose one of those two moves, the 3rd-listed one was chosen 284 times, giving 284/472 = 60.2%.
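The arithmetic behind these figures can be checked with a short Python sketch. It assumes, purely for illustration, that each tied move is preferred to the next-listed one in the same 58:42 ratio, and it measures the 284-of-472 sample against a plain binomial model with success probability 0.58. The function names are hypothetical and are not taken from my C++ code.

```python
import math


def tied_split(n_tied, first_share=0.58):
    """Shares for n_tied equal-valued moves, assuming each move beats
    the next-listed one by the ratio first_share : (1 - first_share)."""
    ratio = (1.0 - first_share) / first_share
    weights = [ratio ** i for i in range(n_tied)]
    total = sum(weights)
    return [w / total for w in weights]


def binomial_z(successes, trials, p=0.58):
    """Standard deviations between an observed count and its
    expectation under independent Bernoulli(p) trials."""
    expected = trials * p
    std_dev = math.sqrt(trials * p * (1.0 - p))
    return (successes - expected) / std_dev


three_way = tied_split(3)   # compare with the reported 44%-32%-24%
z = binomial_z(284, 472)    # 3rd- vs 4th-listed move, Category 20+ data
print([round(100 * share, 1) for share in three_way], round(z, 2))
```

Under these assumptions the three-way shares come out within a percentage point of the reported 44%–32%–24%, and the 284-of-472 count lies less than one standard deviation above the 58% expectation.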
I have posted most of my raw data here, with the above plus my 2700-level training set plus all recent world championship matches collected into a file TopRange.txt, games by 2200–2600 players labeled MasterRange.txt, and games mostly by 1600–2200 in MidRange.txt. In the union of those three files there are 65,306 cases with a two-move tie within the top ten moves where one of those two moves was played. The move listed first was selected 38,048 times, for 58.26%. Then I isolated cases where the move played was inferior by a full pawn or more, really a “blunder.” Such a move cannot be tied for first-best, but in every other index position I see similar results:
Super-Fishy, or Superficial?
A blunder is called a “fish move,” while ironically “Rybka” means “Little Fish” in Polish. What can possibly be causing human players to have such a uniform preference for the first-listed of two equally bad moves? When I try this on the three skill divisions of my data set, there appears to be no sensitivity to rating. Nor does it matter much when I restrict to early moves in a game—note that this basically excludes the trace of cases where promoting a pawn to Queen or Rook has equal value because the opponent will capture it regardless, but of course players will choose the Queen.
I don’t yet have a robust hypothesis on the cause, but I can offer a tentative one. I intend to extend my model from the fixed depth 13 to a probability distribution over values at all depths. That is to say, at every move, every human has a set chance of playing like a fish or playing like Fischer. For myself I know I sometimes make a move on superficial impulse, then sigh in relief that a reply I didn’t see can still be met…or not. The first-listed move may be there because of a superficial difference that the computer program picks up in its raw evaluation function as the iterative minimax search begins, with the move never dislodged owing to stable sorting, and perhaps this is what we human players pick up. The better players play like Fischer more often, but perhaps there is not much difference in our occasional “fish mode.”
My work has turned up other regularities. The vast majority of chess games nowadays impose a time control at Move 40. Right now I am following a game by my co-author Macieja via Chessdom.com’s real-time site, and it is Move 28 with both players having only minutes left to make twelve more moves. It is no surprise that the average error ramps up steadily to Move 40 then drops back like Niagara. This is an obviously human factor of procrastination and decision making under stress, and should not be confused with a natural law. In-between is my discovery that the probabilities depend on the values of moves relative to the overall value of the position, much as one should really judge price movements in stocks relative to the current price rather than absolutely. That is to say, human players follow a simple log-log law of scaling, which my program corrects for.
Not only does the 58% regularity seem more spooky and less explainable, but after I imposed it by fiat on my probabilities, and upon my also weighting moves according to the entropy of the inferred probability distribution (namely, by $-\sum_i p_i \log p_i$ over the inferred move probabilities $p_i$), suddenly my model seems to have been tuned almost perfectly. I am still examining this of course, but I can already say it fixes a previous systematic under-estimation of the intrinsic strength level of Category 20+ tournaments that you can still see in my not-yet-updated ratings-compendium draft paper. The need to allow for the phenomenon on all tied moves, not just equal-best moves, may explain why the original 0.03 patch worked better than 0.015. It would be nice to check it for an engine not suspected of being related to Rybka; that suspicion moots my previously having observed the 58% phenomenon for equal-best moves with Toga II in 2008.
My “Fidelity” site has other updated material since last week.
Can you explain the regularity? What significance does it have?
Does it show equally for any chess engine that uses iterative deepening with a stable sort? Does the property pertain to the program, to chess, to brain biology, or something information-theoretic?
Can chess masters be said to “detect” differences of one-hundredth of a pawn in the value of a move, on grounds that they are observed to select the epsilon-better move a significant 58% of the time? Is it still the slime-mold story?