# Self-Play Is Key?

*Self-play and Ramsey numbers*

[ Talking about worst case ]

Avrim Blum is the CAO of TTIC. That is, he is the Chief Academic Officer at the Toyota Technological Institute at Chicago. Avrim has made, and continues to make, key contributions to many areas of theory, including machine learning, approximation algorithms, on-line algorithms, algorithmic game theory, the theory of database privacy, and non-worst-case analysis of algorithms.

Today I want to discuss a suggestion of Avrim for research on self-play.

Self-play is the key to many recent AI results on playing games. These results include essentially solving the games Chess, Go, Shogi, forms of poker, and many others. They were solved by algorithms that start with no knowledge of the game, save the rules. The algorithm then learns the secrets of playing the game by self-play: by playing games against itself. For example, the AI chess programs did not know that “a rook is worth more than a pawn.” But they discovered that by playing the game over and over. Impressive.

For example, David Sweet on his hacker site says, referring to self-play:

This is mysterious to me. If it only played against itself, where did new information come from? How did it know if it was doing well? If I play a game of chess against myself, should I say I did well if I beat myself? But when I beat myself, I also lose to myself. And how could I ever know if I’d do well against someone else?

## Planted Clique Problem

I was at TTIC last month and over lunch we discussed self-play possibilities for theory problems. I suggested that the planted clique problem might be a potential example. Recall the planted clique problem is the task of distinguishing two types of graphs:

- Random graphs on $n$ vertices generated with $p = 1/2$, the probability that there is an edge between each pair of nodes;
- Random graphs on $n$ vertices generated with $p = 1/2$, with a clique of size $k$ added, the planted clique.

If $k$ is large, it is easy to tell these apart: just count the number of edges. If $k$ is small enough, it is open whether one can tell them apart. The largest clique in a random graph typically has size near $2\log_2 n$. This implies that there is a quasi-polynomial-time average-case algorithm: just try all subsets of size around $2\log_2 n$.
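As a toy illustration of the subset-enumeration idea, here is a minimal sketch (the function name and the adjacency-set representation are my choices, not from the post):

```python
from itertools import combinations

def has_clique_of_size(adj, s):
    """Brute-force test: does the graph (a dict mapping each vertex to its
    set of neighbors) contain a clique on s vertices?  Trying all s-subsets
    takes n^O(s) time, which is quasi-polynomial when s is about 2*log2(n)."""
    return any(all(v in adj[u] for u, v in combinations(sub, 2))
               for sub in combinations(list(adj), s))
```

Running this with $s$ slightly above $2\log_2 n$ would flag a planted instance, since a random graph almost never contains a clique that large.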

My intuition was that a program might be able to exploit self-play to solve planted clique problems. The point is that it is easy, by definition, to generate “yes” and “no” examples for this problem. Note that this is not known for SAT problems: how to generate hard instances there is unclear. This was my point. Could the AI methods somehow divine the planted clique version of “a rook is worth more than a pawn”? Could they use self-play to solve planted clique problems?
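Generating the labeled examples really is straightforward. A minimal sketch (function names are mine, and $p = 1/2$ is the standard setting assumed here):

```python
import random
from itertools import combinations

def random_graph(n, p=0.5, seed=None):
    """Erdos-Renyi random graph G(n, p) as a dict of adjacency sets."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def plant_clique(adj, k, seed=None):
    """Add all edges among a random k-subset of vertices: a 'yes' instance."""
    rng = random.Random(seed)
    clique = rng.sample(sorted(adj), k)
    for u, v in combinations(clique, 2):
        adj[u].add(v)
        adj[v].add(u)
    return adj, clique
```

A training loop could then flip a coin, produce either `random_graph(n)` or a planted instance, and ask the learner to label which kind it got.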

I wondered to the lunch group about all this. It left the group unexcited.

## Ramsey Problem

Then Avrim asked a better question. He wondered if self-play methods could be used to solve a long-standing problem concerning Ramsey numbers. Recall the Ramsey number $R(k)$ is defined as the smallest $n$ such that every red-green coloring of the edges of the complete graph $K_n$ has either a red or a green complete subgraph of size at least $k$.
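For tiny cases the definition can be checked directly. Here is a brute-force sketch (names are mine), feasible only for very small $n$ since there are $2^{\binom{n}{2}}$ colorings:

```python
from itertools import combinations

def ramsey_holds(n, k):
    """True iff every red/green coloring of the edges of K_n contains a
    monochromatic k-clique, i.e. iff R(k) <= n.  Brute force over all
    2^(n choose 2) colorings, so only usable for tiny n."""
    edges = list(combinations(range(n), 2))
    cliques = [list(combinations(c, 2)) for c in combinations(range(n), k)]
    for bits in range(2 ** len(edges)):
        color = {e: (bits >> i) & 1 for i, e in enumerate(edges)}
        # A clique is monochromatic when its edges use only one color.
        if not any(len({color[e] for e in cq}) == 1 for cq in cliques):
            return False  # this coloring avoids all monochromatic k-cliques
    return True
```

For example, `ramsey_holds(6, 3)` confirms $R(3) \leq 6$, while `ramsey_holds(5, 3)` finds a triangle-free-in-each-color coloring of $K_5$, so $R(3) = 6$.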

The exact value of $R(5)$ is unknown, although it is known to lie between $43$ and $48$. See a post by Gil Kalai on his blog for some discussions. Joel Spencer quotes Paul Erdős:

Erdős asks us to imagine an alien force, vastly more powerful than us, landing on Earth and demanding the value of $R(5)$ or they will destroy our planet. In that case, he claims, we should marshal all our computers and all our mathematicians and attempt to find the value. But suppose, instead, that they ask for $R(6)$. In that case, he believes, we should attempt to destroy the aliens.

Aliens are not attacking currently, but Avrim’s idea is that perhaps we could organize a self-play attack on the problem. The idea would be to try to build a “game” version of this question. The algorithm would try to create a strategy that finds a red/green coloring for the complete graph $K_n$ so that no $5$-clique is all red or all green.

We need to arrange the computation of the Ramsey number as the result of some type of game. The paradox to me is that the play of a game suggests $\mathsf{PSPACE}$, while the Ramsey calculation is clearly of complexity $\Sigma_2$. So how do we make the Ramsey calculation into a game? Ken and I wonder if there is a natural game so that playing the game well yields insight into the value of a Ramsey number.

Is there a game with simple rules so that playing it well yields bounds on general Ramsey numbers?

## Attacks on Ramsey Numbers

There have been several attempts to use non-standard methods to compute Ramsey numbers. See the following:

- In this article, quantum methods are studied for the future, when hopefully quantum computers will be available.
- In this project, genetic methods are used on related Ramsey numbers.
- In this approach, AI methods are used on related Ramsey numbers.

See also this survey on computational methods in general.

The asymmetry between the upper and lower bounds shapes the approaches. The lower bound of 43 was proved by finding a two-coloring of the edges of $K_{42}$ without a red or green $5$-clique. Once a single coloring is guessed, its property is easy to verify. The improvement of the upper bound from 49 to 48 two years ago needed checking two trillion cases in order to fence away all possible colorings. This has led to a common belief that 43 is the answer: if it were higher, then a coloring of $K_{43}$ would have been found by now.

Can we use self-play to turn this belief into something more concrete? The training would begin by running self-play on the known cases $K_n$ for $n \leq 42$. This should create a neural net that is highly skilled at finding colorings that are free of small monochromatic cliques. The question is how to leverage its presumed failure once we hit size $43$.
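One low-tech baseline for such a “skilled colorer” is plain local search with no learning at all: repeatedly flip an edge inside some violated clique. This is only a sketch under my own naming and parameters, not anything proposed in the post:

```python
import random
from itertools import combinations

def mono_cliques(color, n, k):
    """Edge lists of the monochromatic k-cliques under the given coloring."""
    return [cq for cq in (list(combinations(c, 2))
                          for c in combinations(range(n), k))
            if len({color[e] for e in cq}) == 1]

def search_coloring(n, k, steps=20000, seed=0):
    """Random local search for a 2-coloring of K_n with no monochromatic
    k-clique.  Success is a witness that R(k) > n; returns None on failure."""
    rng = random.Random(seed)
    color = {e: rng.randrange(2) for e in combinations(range(n), 2)}
    for _ in range(steps):
        bad = mono_cliques(color, n, k)
        if not bad:
            return color
        # Flip a random edge of a random violated clique (WalkSAT-style).
        color[rng.choice(rng.choice(bad))] ^= 1
    return None
```

This finds the good colorings of $K_5$ for triangles instantly; the interesting question above is what replaces the blind flip with a learned policy as $n$ grows.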

## Open Problems

Perhaps someone should take Avrim’s suggestion and try it out. A natural idea would be to see if this approach could compute the known smaller Ramsey numbers—getting their exact upper bounds.

Was it missed since ancient times, or is it a well-known fact:

http://vixra.org/abs/1909.0028

lol! it excites me if not your lunch group! have been working on ML approaches to theory for over ½ decade, many refs in my blog. traditionally ML approaches came from outside of theoretical CS but theres increasing overlap nowadays. think this will all change with some kind of theoretical breakthru solved by ML and think its on the near horizon, there are nearby examples. can ML build proofs? think the answer is yes but an entire new theoretical edifice has to be explored/ built up etc.

self-play is not so mysterious if you see it as exploration, which is closely tied with curiosity, a key aspect of future ML advances, and the community is finally hot on the trail. it will get even hotter soon.

Dear vznvzn:

Thanks again. Can you explain more what you are up to?

Best

Dick

“These results include essentially solving the games Chess, Go, Shogi, forms of poker, and many others.”

As far as I know, none of the games you’ve listed have really been solved, and “essentially” seems to be serving a pretty weasely role in that sentence. Yes, these algorithms seem to play well against imperfect opponents, but so do the best human players. It’s plausible that there’s some “objective” sense in which these algorithms play better than those humans, but that can only tell us so much.

Here is a fun trick with finite automata that I figured out the other day while thinking about your old (10 years old!) post on the DFA intersection problem. The trick is: it is possible to compute Chaitin’s constant for a DFA in polynomial time.

Recall that Chaitin’s constant for the language L over the alphabet {0, 1} can be defined as the sum over all strings x in L of 2^(-2|x|-1), which is exactly the probability of x under the random process described next. We can use this to turn the DFA into a Markov chain.

In any state of the DFA, either we are at the end of the input string, or there is another input symbol to read. Under our measure on strings, exactly half of input suffixes end here, while the other half continue. Of the strings that continue, the next symbol is 0 in half of the strings, and 1 in the other half. So the next step from each state is chosen as “end of input” (accepting or rejecting as appropriate for the state) with probability 1/2, “next symbol is 0” with probability 1/4, and “next symbol is 1” with probability 1/4.

Then you just compute the probability that the chain eventually stops in an accepting state (an absorption probability rather than a steady state), which can be done in polynomial time by solving a system of linear equations exactly, for instance with the Bareiss algorithm.
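Here is a sketch of that computation in code (my own naming, and using plain Gauss-Jordan elimination over exact rationals in place of Bareiss). Each state value satisfies a_q = 1/2·[q accepting] + 1/4·a_{δ(q,0)} + 1/4·a_{δ(q,1)}:

```python
from fractions import Fraction

def dfa_omega(delta, accepting, start=0):
    """delta[q] = (next state on 0, next state on 1).  Returns the exact
    probability that the random string (stop w.p. 1/2, else a uniform bit)
    is accepted, i.e. the sum over x in L of 2^(-2|x|-1)."""
    n = len(delta)
    # Linear system (I - T) a = b with T the 1/4-weighted transition matrix.
    A = [[Fraction(0)] * n for _ in range(n)]
    b = [Fraction(1, 2) if q in accepting else Fraction(0) for q in range(n)]
    for q in range(n):
        A[q][q] += 1
        A[q][delta[q][0]] -= Fraction(1, 4)
        A[q][delta[q][1]] -= Fraction(1, 4)
    # Gauss-Jordan elimination; the matrix is diagonally dominant,
    # so a nonzero pivot always exists.
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                for c in range(n):
                    A[r][c] -= f * A[col][c]
                b[r] -= f * b[col]
    return b[start] / A[start][start]
```

As sanity checks: a one-state DFA accepting everything gives total measure 1, and the two-state DFA for even-length strings gives 2/3.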

Wondering if anything useful can be done with this.