When Min-Max is Wrong
Playing games with large variance and the small support principle
John von Neumann was one of the great mathematicians of the twentieth century, and he is credited with creating the architecture of all modern computers. Besides solving many important problems, he was a genius at creating new fields such as game theory, a mathematical theory of quantum mechanics, a theory of self-reproducing machines, and a theory of fault-tolerant computation; the list goes on.
Today I will talk about a question that relates to von Neumann's seminal work on game theory, which he developed in his famous book with Oskar Morgenstern. Curiously, it is a question that their theory of how to play zero-sum games seems to get "wrong." I will explain the question, give some partial results due to Deeparnab Chakrabarty, Subhash Khot, Nisheeth Vishnoi, and myself, and relate all this to the Small Support Principle. Many open questions remain: I hope that someone is able to advance our understanding and solve some of them.
There are many interesting stories about von Neumann, especially concerning his tremendous ability to solve problems quickly. Not all great mathematicians are fast, but he had one of the quickest minds of all time. Once, while visiting Los Alamos during the Manhattan Project, he was asked for help in evaluating a certain complex formula. No one had even come close to computing its value, so they asked von Neumann if he could help them figure out an approach to computing an approximate value. He asked why, and was told that the formula was central to the atomic bomb's design. He seemed puzzled and again asked why they wanted an approximate value. The scientists explained that they could not calculate the exact value, so they were trying to find a computational method for getting an approximate one. In those days, before even crude computers existed, computing the values of certain expressions could be daunting. They were completely stumped.
John responded that he did not know the approximate value, nor did he have a good idea how to compute the approximate value, but he did know the exact value. They were shocked, for he had solved their problem–not approximately but exactly, and all in his head.
When Min-Max Goes Wrong
One of the simplest zero-sum games is "matching pennies." Two players, call them row and column, each have a penny. They secretly place their pennies down on a table, covered by their hands; then they reveal them. If the pennies are both heads or both tails, row wins; if they are different, column wins. The winner gets the other player's coin.
This is a simple example of a zero-sum game that requires mixed strategies. The best that each player can do is to select heads or tails with probability $1/2$. The expected payoff to each player is $0$: it is a "fair" game. As usual the game can be represented by a matrix of payoffs to row:

$$\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$$
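In code, the expected payoff to row for mixed strategies $p$ and $q$ against a payoff matrix $A$ is just $p^{\top} A q$. A minimal sketch in Python (using NumPy), with the matching-pennies matrix written out:

```python
import numpy as np

# Payoff matrix to the row player for matching pennies:
# rows/columns are (heads, tails); row wins on a match.
A = np.array([[ 1.0, -1.0],
              [-1.0,  1.0]])

p = np.array([0.5, 0.5])  # row's mixed strategy
q = np.array([0.5, 0.5])  # column's mixed strategy

# Expected payoff to row is p^T A q.
value = p @ A @ q
print(value)  # 0.0 -- a fair game
```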
Let's change the game to this:

$$\begin{pmatrix} 1 & -1 \\ -1 & 1 \\ 10^6 & -10^6 \\ -10^6 & 10^6 \end{pmatrix}$$
This is also a fair game, but what strategy would you use? Column has no choice, except to continue to play each column with probability $1/2$. However, row now has many choices. If row plays either of the bottom two rows, there is a chance that he will win a million dollars, but there is also an equal chance that he will lose a million dollars.
If I had to play this game as row, I would restrict my play to the top two rows; thus, I would avoid any chance that I could lose a million dollars. Classic min-max strategy theory gives no reason for doing this, yet I claim that most of us would play to avoid the huge swings. Do you agree?
Let's consider a slight variation of the last game:

$$\begin{pmatrix} 1-\epsilon & -1-\epsilon \\ -1-\epsilon & 1-\epsilon \\ 10^6 & -10^6 \\ -10^6 & 10^6 \end{pmatrix}$$

where $\epsilon$ is a small positive amount. Then, min-max theory predicts that the game is still fair, but now there is one unique optimal strategy for row: he must play each of the bottom two rows with equal probability $1/2$.
I would not follow the min-max theorem–with due respect to von Neumann. I would still play only the top two rows. I would be happy to play a slightly losing game, rather than play a scary fair game. Obviously I want to avoid huge potential swings; I would be very unhappy if I lost a million dollars. The key is that the min-max theorem only considers the expected payoff, and not the variance of the payoff.
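To make the expectation-versus-variance point concrete, here is a small computation, assuming my reconstruction of the four-row game above: the "safe" top-rows strategy and the min-max-optimal bottom-rows strategy have essentially the same expected payoff, but wildly different variances.

```python
import numpy as np

eps = 0.01
M = 1_000_000

# Assumed reconstruction of the epsilon-perturbed game: the top two
# rows are matching pennies shifted down by eps, the bottom two rows
# are matching pennies with million-dollar stakes.
A = np.array([[ 1 - eps, -1 - eps],
              [-1 - eps,  1 - eps],
              [ M, -M],
              [-M,  M]], dtype=float)

q = np.array([0.5, 0.5])  # column must mix uniformly

def payoff_stats(p):
    """Expected payoff and variance of row's payoff under (p, q)."""
    probs = np.outer(p, q)            # joint distribution over outcomes
    mean = (probs * A).sum()
    var = (probs * A**2).sum() - mean**2
    return mean, var

top = np.array([0.5, 0.5, 0.0, 0.0])     # play only the safe rows
bottom = np.array([0.0, 0.0, 0.5, 0.5])  # the min-max optimal rows

print(payoff_stats(top))     # mean is -eps, variance is 1
print(payoff_stats(bottom))  # mean is 0, variance is 10**12
```

The expectations differ by only $\epsilon$, while the variances differ by a factor of $10^{12}$; min-max theory sees only the first column of this comparison.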
In order to examine how to take variance into account, I need to explain some results about games.
Small Support Principle
Suppose that $A$ is an $n$ by $n$ matrix that represents a zero-sum game. If $n$ is huge, then even writing down a mixed strategy is impossible. Imagine a game with $2^{100}$ strategies; clearly, there is no way to even consider writing down any mixed strategy.
If $p$ is a strategy vector, then define the support of the vector as the number of indices $i$ so that $p_i > 0$. We would like to have strategies with small support, but this cannot be done in general, since there are games that require a player to play all the strategies with positive probability. However, if a player is willing to give up a small amount of the expected payoff, then there are always strategies that have small support. This is a theorem of Neal Young and myself, which was also proved independently by Ingo Althöfer. Let's call this the Small Support Principle (SSP):
Theorem: Let $A$ be an $n$ by $n$ matrix with entries in the interval $[-1,1]$. Also let $\epsilon > 0$. Then, there is a strategy for row (column) that has support at most

$$O\!\left(\frac{\log n}{\epsilon^2}\right)$$

and gets row (column) the value of the game within an additive error of $\epsilon$.
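The proof is by sampling: draw $k = O(\log n / \epsilon^2)$ pure strategies independently from an optimal mixed strategy and play the empirical (uniform) distribution on the sample; a Chernoff bound shows that every column loses at most $\epsilon$ with high probability. Here is a sketch on a game whose optimal strategy is known in advance (the cyclic game and parameters are my own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Generalized rock-paper-scissors on n strategies (n odd): strategy i
# beats the (n-1)/2 strategies cyclically "below" it.  The uniform
# mixed strategy is optimal, and the value of the game is 0.
n = 101
i, j = np.indices((n, n))
d = (i - j) % n
A = np.where(d == 0, 0, np.where(d <= (n - 1) // 2, 1, -1)).astype(float)

# SSP sketch: sample k = O(log n / eps^2) pure strategies from the
# optimal mixed strategy and play the empirical distribution.
eps = 0.25
k = int(np.ceil(np.log(n) / eps**2))
sample = rng.integers(0, n, size=k)           # draws from the uniform optimum
p_hat = np.bincount(sample, minlength=n) / k  # small-support strategy

support = np.flatnonzero(p_hat)
guaranteed = (p_hat @ A).min()  # value row secures against any column

print(len(support), k)  # support size is at most k, far below n
print(guaranteed)       # close to the true value 0
```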
The reason Neal and I proved this theorem had nothing to do with game theory. Nothing. We were working on a problem in pure complexity theory, and needed to be able to write down strategies. But that was impossible, since they were exponentially large. I remember discussing the complexity problem with Neal for a few days; then he came in one morning with essentially the above theorem all worked out. I was very impressed: Neal can be quiet, yet he is a very strong researcher.
SSP for Non-Zero Sum Games
The work with Neal on zero-sum games was done while I was at Princeton. When I got to Georgia Tech in 2000, computational game theory had become a "hot" topic–studied for its own sake. One day I suggested to two graduate students, Evangelos Markakis and Aranyak Mehta, that we try to see if non-zero sum games also had an SSP. As it turns out, the SSP holds for Nash equilibria, just as it does for zero-sum games. The proof is really close to the proof of the zero-sum case; however, initially we were stuck. Somehow we could not prove the theorem, and we almost gave up trying. Luckily we did not stop, since the result is quite nice.
Theorem: Consider any two-player non-zero sum $n$ by $n$ game with payoffs in $[0,1]$, and let $\epsilon > 0$. Then, for any Nash equilibrium, there are strategies for the players with support at most

$$O\!\left(\frac{\log n}{\epsilon^2}\right)$$

so that:

- the strategies form an $\epsilon$-equilibrium;
- the row player gets the same payoff as in the Nash equilibrium within additive error $\epsilon$;
- the column player gets the same payoff as in the Nash equilibrium within additive error $\epsilon$.
I will not define the machinery of non-zero sum games here; see our paper for the technical details.
If you are completely new to non-zero sum games, here is a short summary. The min-max theorem is no longer true for non-zero sum games. However, John Nash proved that there always are mixed strategies for row and column with the no-defection property: row will never want to change his strategy provided column does not change, and column will never want to change his strategy provided row does not change. This "I am happy, if you are" property is the reason these pairs of strategies are called an equilibrium. The surprise is that such strategies always exist–Nash won the Nobel Prize for his definition of equilibrium and the proof of its existence.
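The no-defection property is easy to check numerically: a pair of strategies is a Nash equilibrium exactly when neither player can gain by deviating unilaterally, and it suffices to check pure deviations. A sketch, where `max_regret` is my own hypothetical helper name:

```python
import numpy as np

def max_regret(R, C, p, q):
    """How much either player could gain by defecting unilaterally.

    R and C are the row and column players' payoff matrices.  (p, q)
    is an exact Nash equilibrium iff the returned value is 0, and an
    eps-equilibrium iff it is at most eps.
    """
    row_regret = (R @ q).max() - p @ R @ q  # best pure deviation for row
    col_regret = (p @ C).max() - p @ C @ q  # best pure deviation for column
    return max(row_regret, col_regret)

# Matching pennies as a bimatrix game: C = -R, so it is zero-sum.
R = np.array([[1.0, -1.0], [-1.0, 1.0]])
C = -R
uniform = np.array([0.5, 0.5])

print(max_regret(R, C, uniform, uniform))               # 0.0: an equilibrium
print(max_regret(R, C, np.array([1.0, 0.0]), uniform))  # positive: column would defect
```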
This theorem has an immediate corollary about computing approximate Nash equilibrium:
Corollary: There is a quasi-polynomial time algorithm that finds an $\epsilon$-Nash equilibrium, for any non-zero sum game.

Here quasi-polynomial time means time bounded by $n^{c \log n}$ for some constant $c$ that can depend on $\epsilon$.
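The algorithm behind the corollary is brute force over small supports: for each player, enumerate all multisets of $k = O(\log n / \epsilon^2)$ pure strategies, play uniformly on each multiset, and test the pair for the $\epsilon$-equilibrium property. There are $n^{O(\log n/\epsilon^2)}$ multisets, hence the quasi-polynomial bound. A toy sketch, feasible only for tiny games:

```python
import itertools
import numpy as np

def eps_nash_by_enumeration(R, C, k, eps):
    """Sketch of the quasi-polynomial algorithm: try every pair of
    multisets of k pure strategies, playing uniformly on each multiset,
    and return the first pair that is an eps-equilibrium."""
    n, m = R.shape

    def strategies(size, length):
        # All mixed strategies that are uniform on a multiset of
        # `length` pure strategies out of `size`.
        for ms in itertools.combinations_with_replacement(range(size), length):
            yield np.bincount(ms, minlength=size) / length

    for p in strategies(n, k):
        for q in strategies(m, k):
            row_regret = (R @ q).max() - p @ R @ q
            col_regret = (p @ C).max() - p @ C @ q
            if max(row_regret, col_regret) <= eps:
                return p, q
    return None

R = np.array([[1.0, -1.0], [-1.0, 1.0]])  # matching pennies again
C = -R
p, q = eps_nash_by_enumeration(R, C, k=2, eps=1e-9)
print(p, q)  # the uniform pair
```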
I have stated the last two results for two players, but everything holds for any number of players. The above corollary is a mixed blessing–in my opinion. On the one hand, it is neat that finding an approximate Nash equilibrium is "almost" computable in polynomial time; on the other hand, it means that proving that finding an approximate Nash equilibrium is NP-hard would have surprising consequences. I am sure some researchers wish this corollary were not true, but do not blame us.
SSP and Variance
Finally, I will talk about zero-sum games and variance. A couple of years ago, when we were trying to get Adam Kalai to join our faculty at Tech, I asked Adam about the variance issue with the min-max theorem. I basically went through the simple examples that I have already gone over today. The next day, while he was still visiting, he told me that he had talked to his father–Ehud Kalai. It turns out that his dad is an expert on classic game theory, and said that he had not heard about these variance questions.
Adam also suggested that we think about proving an SSP for zero-sum games that included both the expectation of the payoff and the variance. He even worked out a sketch of the argument. Immediately, Vishnoi started to work on the details, and soon a team of us was working on the problem. The final paper is the work of Chakrabarty, Khot, Vishnoi, and myself: Adam dropped out for a variety of practical reasons, but we owe him many thanks for suggesting this idea. Why didn't we think of trying to see if the SSP worked on variance too? Sometimes one is too close to the forest to see the trees. It may be a cliché, but it is often true.
The main theorem that is proved in our paper is:
Theorem: Given bounds

$$v_1, \dots, v_k$$

on the payoffs and

$$\sigma_1^2, \dots, \sigma_k^2$$

on the variances, if there is a Nash equilibrium satisfying these bounds, then in time $n^{O(\log n / \epsilon^2)}$ we can find an $\epsilon$-Nash equilibrium such that for every player $i$ the payoff is at least $v_i - \epsilon$ and the variance is at most $\sigma_i^2 + \epsilon$.
This theorem is at least a partial answer to the question of variance. It can be used to get, in quasi-polynomial time, good approximate strategies for the players.
We still do not understand zero-sum games with respect to variance. Given a zero-sum game, for example, how hard is it to determine if there is a strategy for row that guarantees an expected payoff of $v$ and a variance of at most $\sigma^2$? Is this in polynomial time, or is it NP-hard? This is a major open problem.
I have no idea if this problem is easy or hard. An obvious approach is to try to encode the problem as an optimization problem. This could work, since zero-sum games are reducible to Linear Programming. However, adding a constraint on the variance does not seem to fit into linear constraints–at least not in an obvious manner.
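For concreteness, here is the standard LP reduction for zero-sum games, sketched with SciPy (an assumed choice of solver; any LP solver works): maximize $v$ subject to every entry of $p^{\top} A$ being at least $v$, with $p$ a probability vector. The variance, $\sum_{i,j} p_i q_j a_{ij}^2 - (p^{\top} A q)^2$, is quadratic in the strategies, which is exactly why a variance bound does not drop into these linear constraints.

```python
import numpy as np
from scipy.optimize import linprog

def zero_sum_value(A):
    """Solve a zero-sum game by LP: maximize v subject to
    (p^T A)_j >= v for every column j, p a probability vector."""
    n, m = A.shape
    c = np.zeros(n + 1)
    c[-1] = -1.0  # linprog minimizes, so minimize -v
    # Inequalities: v - (p^T A)_j <= 0 for each column j.
    A_ub = np.hstack([-A.T, np.ones((m, 1))])
    b_ub = np.zeros(m)
    # Equality: sum_i p_i = 1 (the v variable gets coefficient 0).
    A_eq = np.ones((1, n + 1))
    A_eq[0, -1] = 0.0
    b_eq = np.ones(1)
    bounds = [(0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]

A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # matching pennies
p, v = zero_sum_value(A)
print(p, v)  # uniform strategy, value 0
```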
The next open problem is to try to obtain good approximation algorithms for this problem. We have, of course, a quasi-polynomial time approximation algorithm, but that is unlikely to be the final word. Also, can one prove lower bounds on how well the problem can be approximated?
More generally, how does adding constraints on variance to any game-economic problem change things? What happens for other types of games? For example, what happens when we repeatedly play a game? Can we add variance as an additional constraint?
There seem to be plenty of interesting open questions that arise once we start to include variance. I hope you can solve some of these in the future.