Evolution: A Complexity View
The role of sex?
Christos Papadimitriou has a joint paper with Adi Livnat, Aviad Rubinstein, Gregory Valiant, and Andrew Wan that will appear soon at FOCS 2014. The conference is October 19–21 in Philadelphia, with workshops and tutorials on the 18th. Here are the accepted papers, several others of which interest me a lot. The last parallel session on Monday afternoon before my own award lecture has three of them in one room, including a paper co-authored by my recent student Danupon Nanongkai, and three on quantum—it would be nice to be in a quantum superposition and attend both sessions.
Today Ken and I want to discuss their paper, which is on complexity-theoretic aspects of evolution.
Yes, evolution. Theory’s long reach has in the past touched diverse fields of science, engineering, and many other areas of human endeavor. But evolution is now getting more and more interest from theorists. Wikipedia begins its description with the sentence:
Evolution is the change in the inherited characteristics of biological populations over successive generations.
This doesn’t sound too far different from the idea of a computation “evolving” over successive steps. What distinguishes the biological kind is Charles Darwin’s identification of natural selection as the main mechanism, specifically for adaptation after changes in the environment. The discovery that genes in DNA are describable by strings, likewise their mutations, powerfully suggests a mating of biology and the “it from bit” view from theory. Can we use theory to turn Darwin inward and explain how complexity and diversity can arise from a simple Boolean model?
Theory and Fact and Paradigm
For many reasons that we will avoid discussing, of all theories in science, evolution gets the most push-back. There are not many popular attacks on the inverse square law of gravity, even though string theory provides contexts in which it is false at short ranges. That matter is made of atoms used to be contentious, but no more, and there is no argument over how chemical bonds work. Evolution seems to get more than its share of attacks. Way more.
There is even a Wikipedia article on why this is so:
Many scientists and philosophers of science have described evolution as fact and theory, a phrase which was used as the title of an article by Stephen Jay Gould in 1981. He describes fact in science as meaning data, not absolute certainty but “confirmed to such a degree that it would be perverse to withhold provisional assent.” A scientific theory is a well-substantiated explanation of such facts. The facts of evolution come from observational evidence of current processes, from imperfections in organisms recording historical common descent, and from transitions in the fossil record. Theories of evolution provide a provisional explanation for these facts.
Ken thinks Gould’s point is better stated as evolution being not just a theory but also a paradigm. Within it, theories of particular mechanisms are evaluated; outside, the paradigm is applied via “evolutionary approaches to…” just about everything. The former give the impression that there is disagreement about the fact of evolution itself, while the latter include enough overreaches and oversimplifications to muddy the whole enterprise. The fact of evolution is also confused with questions about its rate and hill-climbing ability when “left to itself,” though experience in hospitals and our current lack of knowledge about complexity should forestall negative pronouncements on the latter.
What compounds the matter is that evolution can be enlarged to the study of how complex organized information systems arise. Indeed, several books by Richard Dawkins emphasize information content in biology and the quest for a “crane-lifting” mechanism for information in physics analogous to Darwin’s in biology. There the paradigm intersects directly with computer science:
The serious question arises of how fast evolutionary-information processes can work as measured by standard yardsticks of complexity, and how much large-scale change they can effect. This makes a context for the paper.
Their paper is titled, “Satisfiability and Evolution.” Its abstract is:
We show that, if truth assignments on n variables reproduce through recombination so that satisfaction of a particular Boolean function confers a small evolutionary advantage, then a polynomially large population over polynomially many generations (polynomial in n and the inverse of the initial satisfaction probability) will end up almost surely consisting exclusively of satisfying truth assignments. We argue that this theorem sheds light on the problem of the evolution of complex adaptations.
Ken and I really like this work. Really. One reason is that it is a positive result. Often—not always, but often—when we apply theory to areas from other parts of science we run into negative results. Telling people in another area of research that they cannot do X is not usually helpful. In many cases they need to do the work anyway, and so telling them a negative is really less than useful.
Another reason is that their work uses the evolution setup to show that a certain reasonable type of behavior happens. This is a quite pretty result about Boolean functions, independent of any connection to evolution. Further, it suggests that there are likely to be more such results in the near future. I think this paper is potentially quite important.
I am sorry that I missed the evolution workshop last winter at the Simons Institute—I got a bad cold and could not travel. I imagine that some of the work in this paper came out of that event. However, Ken heard a related invited talk at the Algorithmic Aspects in Information and Management (AAIM 2014) conference in Vancouver in July, where Ken also had a paper. So we can turn the rest of this post over to Ken.
Benefits With Benefits
I, Ken, felt like I was following Christos around in summer 2012. He gave one version of this talk at the Turing Centennial in Princeton in May, which we covered. Then I heard much the same talk given as an invited lecture at AAAI-2012 that July in Toronto, where I was battling a robot. Then I was honored to speak in the University of Bergen’s Turing Centennial series in September a couple weeks after he did. His talk there was related more to his graphic novel Logicomix, which later appeared under our Christmas tree.
So, this past July, I was curious to see how his talk (later video) had evolved. Well, that is a misuse of “evolved” unless he and his co-authors introduced processes to mutate slides and select—over successive generations—the ones with the highest research fitness. Perhaps they did. I was glad that a genetic example from the 2012 talks survived, since I had advertised—in encouraging my wife to join for the 8:20am start—that the talk would include sex.
We reproduce his first main example here. Consider two genes, with 4 and 5 alleles (mutable variants) respectively occurring in a population, and suppose the benefits of each of the 20 possible allele combinations are given by a 4-by-5 fitness matrix.

The argument is that global optimization techniques, such as simulated annealing, will tend toward the one or two combinations with the highest fitness unto themselves; these are likened to asexual propagation. However, a different row of the matrix has the best fitness expectation over a partner drawn randomly from the population, and likewise certain columns “mix best” when averaged over the alleles of the first gene. The sexual recombination algorithm draws a random partner according to the population’s distribution, and each offspring inherits either parent’s allele of each gene with 50-50 probability.

In a 2011 paper with Livnat and Marcus Feldman formalizing this process, he showed that isolated “peaks” in the fitness matrix tend to wear down in frequency, but “plateaus” such as that row and some of the columns survive. This showed why we need sex—or at least gave an algorithmic mechanism to answer the question:
Why is recombination (i.e., sexual reproduction) more successful than asexual reproduction?
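The mixability effect can be simulated in a few lines. Here is a toy sketch with a made-up 4-by-5 fitness matrix (the talk’s actual numbers are not reproduced here): entry (0,0) is an isolated high-fitness “peak,” while row 2 is a flat “plateau” that does well on average against every allele of the second gene. Under the replicator update with free recombination, the plateau takes over.

```python
import numpy as np

# Hypothetical 4x5 fitness matrix (made-up numbers, not the talk's):
# entry (0,0) is an isolated "peak"; row 2 is a flat "plateau".
W = np.array([
    [1.90, 0.50, 0.50, 0.50, 0.50],
    [0.50, 1.00, 1.00, 1.00, 1.00],
    [1.20, 1.20, 1.20, 1.20, 1.20],
    [0.50, 1.00, 1.00, 1.00, 1.00],
])

p = np.full(4, 0.25)  # allele frequencies for the first gene
q = np.full(5, 0.20)  # allele frequencies for the second gene

for _ in range(200):
    fa = W @ q            # expected fitness of each first-gene allele
    fb = W.T @ p          # ... against a randomly drawn partner (mixability)
    mean = p @ W @ q      # mean population fitness
    p = p * fa / mean     # replicator update under free recombination
    q = q * fb / mean

print(p.round(3))  # row 2, the plateau, dominates the first gene
print(q.round(3))
```

Simulated annealing on the matrix alone would head for the 1.90 peak; recombination instead rewards alleles that do well on average against the current population, which is the mixability phenomenon of the analysis.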
The Missing Link to Complexity
Despite this answer, Christos admitted that “the role of sex is to this day not understood.” His talk went immediately on to a question that is more eye-opening than eyebrow-raising:
How can this crude mechanism and model account for the miracle [of complexity that] we see around us?
As stated in the new FOCS paper:
Given the reshuffling of genomes that occurs through recombination, how can complex traits that depend on many different genes arise and spread in a population?
The “complex traits” are approached and modeled by standard tools in complexity, most notably Boolean functions. Here is a mutated version of an example in the talk: Suppose we have monotone predicates over a set of genetic factors, where satisfying the weaker predicate enables a trait to be possible and satisfying the stronger one guarantees it. Measure the factors an organism possesses by Hamming weight, and let a threshold depend numerically on the environment. Suppose the trait follows a sigmoid probability function of the weight, with high slope around that threshold.
If the environment changes in a way that both shifts the threshold in the trait’s favor and makes the trait advantageous, then not only will the incidence of the trait go up, but the above recombination algorithm will favor organisms that possess all or almost all of the factors. Then even if the environment reverts, the new population may have so many members of high Hamming weight that for them the trait is not merely possible but guaranteed. Then the trait is locked in to much of the population. This yields an interpretation of a famous experiment by Conrad Waddington which is common to the talk and paper.
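The lock-in arithmetic can be illustrated with a toy sigmoid, using made-up weights, thresholds, and steepness:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

slope = 1.5  # made-up steepness: high slope around the threshold

def trait_prob(weight, threshold):
    # probability of expressing the trait, given the Hamming weight of factors
    return sigmoid(slope * (weight - threshold))

print(trait_prob(10, 14))  # normal environment: trait rare   (~0.002)
print(trait_prob(10, 9))   # stress lowers threshold: common  (~0.82)
print(trait_prob(16, 14))  # selection has raised the weight; the environment
                           # reverts, yet the trait stays common (~0.95)
```

The third line is the lock-in: once selection has pushed weights well past the normal threshold, reverting the environment no longer turns the trait off.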
The FOCS paper uses none of the predicates, weights, or environment parameters above—these characters (some by me, some by them) die out or maybe stay latent. What it preserves is an abstraction of the counting advantage that having more factors gives in the above analysis. Let $f$ be a general Boolean function whose satisfaction confers a small advantage in the following way: Define the fitness $F = 1 + \epsilon f$, regarding $f$ as $\pm 1$-valued; that is, $F$ is $1+\epsilon$ on satisfying assignments, $1-\epsilon$ on unsatisfying ones. Let the probability distribution on assignments at time $t$ be defined by independent probabilities $p_i^t$ for each bit, and updated by the rule

$$p_i^{t+1} \;=\; p_i^t \cdot \frac{\mathbb{E}[F \mid x_i = 1]}{\mathbb{E}[F]}, \qquad (1)$$

where the expectations are over the time-$t$ distribution. We can re-cast this using the original $f$ and satisfaction probabilities rather than expectations: writing $q = \Pr[f = 1]$ and $q_i = \Pr[f = 1 \mid x_i = 1]$, rule (1) becomes

$$p_i^{t+1} \;=\; p_i^t \cdot \frac{1 + \epsilon(2q_i - 1)}{1 + \epsilon(2q - 1)}.$$
The question is, does the trait of satisfying $f$ come to dominate the population? Let $q_0$ be the initial satisfaction probability under whatever initial distribution is given, say uniform. They prove:
Theorem 1 If $f$ is monotone, then after polynomially many generations (polynomial in $n$ and $1/q_0$), the population almost surely consists exclusively of satisfying assignments.
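As a sanity check on the monotone case, here is a toy sketch of the bitwise update for $f$ the AND of all $n$ bits, with made-up parameters. The theorem concerns an actual finite population under recombination; the marginal-probability dynamics below are a simplification, but they show the same takeover by satisfying assignments.

```python
import numpy as np

n, eps = 10, 0.1           # made-up problem size and selection strength
p = np.full(n, 0.6)        # bitwise marginals of the product distribution

def sat_prob(p):
    return float(np.prod(p))   # Pr[f = 1] when f is the AND of all bits

t = 0
while sat_prob(p) < 0.99 and t < 50000:
    q = sat_prob(p)
    q_i = q / p            # Pr[f = 1 | x_i = 1] for the AND function
    # fitness F is 1+eps on satisfying assignments, 1-eps otherwise
    p = np.clip(p * (1 + eps * (2 * q_i - 1)) / (1 + eps * (2 * q - 1)), 0, 1)
    t += 1

print(t, round(sat_prob(p), 3))  # satisfaction probability climbs past 0.99
```

Since $q_i \ge q$ for a monotone function, every bit’s marginal is nondecreasing, and satisfaction probability rises to 1; the number of generations needed grows as the initial satisfaction probability shrinks, matching the $1/q_0$ dependence in the theorem.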
The bulk of the paper, however, is devoted to how far this extends when $f$ is not monotone. The process has to be modified: the numerator in equation (1) is rewritten to confer possible advantage when $x_i = 0$ too (according to $\epsilon$ and the variance of bit $i$ in the current distribution), $f$ is replaced by a multilinear extension with values in the interval $[-1,1]$, the size of the population needs to be large enough, more-complicated polynomial functions of the parameters $n$, $\epsilon$, and $q_0$ are involved, and there are interlocking bounds among them, the population size, and the success probability. As usual see the paper for details, but we can abbreviate their result drastically as follows:
Theorem 2 Taken over all $f$, sex works, but it’s complicated.
We suspect that no human relationship counselor would disagree.
The paper includes a long discussion of possible variants and improvements to the model and results. Insofar as $q_0$, namely the initial probability of a random assignment satisfying $f$, is a factor in the denominator, the results lose force when $q_0$ is tiny. Indeed the authors disclaim relevance to the task of finding satisfying assignments to a hard formula with small $q_0$. What they call “weak selection” owing to the smallness of $\epsilon$ in their main results becomes an interesting problem of how far $\epsilon$ can be increased, and maybe this informs general issues about the complexity of Boolean functions and the applicability of Fourier techniques, not just evolution.
[changed order of names in first sentence from paper’s alpha order]