Playing games with large variance and the small support principle

John von Neumann was one of the great mathematicians of the twentieth century, and is credited with creating the architecture of all modern computers. Besides solving many important problems, he was a genius at creating new fields such as game theory, a mathematical theory of quantum mechanics, a theory of self-reproducing machines, and a theory of fault-tolerant computation; the list goes on.

Today I will talk about a question that relates to von Neumann’s seminal work on game theory, that is developed in his famous book with Oskar Morgenstern. Curiously, it is a question that their theory of how to play zero-sum games seems to get “wrong.” I will explain the question, give some partial results due to Deeparnab Chakrabarty, Subhash Khot, Nisheeth Vishnoi and myself; and relate all this to the Small Support Principle. There are many open questions that remain: I hope that someone is able to advance our understanding and solve some of them.

There are many interesting stories about von Neumann, especially concerning his tremendous ability to solve problems quickly. Not all great mathematicians are fast, but he had one of the quickest minds of all time. Once, while visiting Los Alamos during the Manhattan Project, he was asked for help in evaluating a certain complex formula, which he was told was central to the atomic bomb’s design. No one had come close to computing its value, so they asked von Neumann whether he could help them find an approach to computing an approximate value. He seemed puzzled and asked why they wanted only an approximate value. The scientists explained that they could not calculate the exact value, so they were trying to determine a computational method for approximating it. In those days, before even crude computers existed, evaluating certain expressions could be daunting. They were completely stumped.

John responded that he did not know the approximate value, nor did he have a good idea how to compute the approximate value, but he did know the exact value. They were shocked, for he had solved their problem–not approximately but exactly, and all in his head.

When Min-Max Goes Wrong

One of the simplest zero-sum games is “matching pennies.” Two players, call them row and column, each have a penny. They secretly place their pennies down on a table, covered by their hands; then they reveal them. If the pennies are both heads or both tails, row wins; if they differ, column wins. The winner gets the other player’s coin.

This is a simple example of a zero-sum game that requires mixed strategies. The best that each player can do is to select heads or tails with probability ${1/2}$. The expected payoff to each player is ${0}$: it is a “fair” game. As usual the game can be represented by a matrix:

$\displaystyle \left( \begin{array}{rr} 0.01 & - 0.01 \\ - 0.01 & 0.01 \end{array} \right)$
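As a quick sanity check (just arithmetic, not part of the theory), here is a short Python sketch computing row's expected payoff when both players mix 50/50:

```python
# Expected payoff to row in matching pennies when both players
# play heads/tails with probability 1/2 each.
M = [[0.01, -0.01],
     [-0.01, 0.01]]
p = [0.5, 0.5]  # row's mixed strategy
q = [0.5, 0.5]  # column's mixed strategy

expected = sum(p[i] * q[j] * M[i][j]
               for i in range(2) for j in range(2))
print(expected)  # 0.0: the game is fair
```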

Let’s change the game to this:

$\displaystyle \left( \begin{array}{rr} 0.01 & - 0.01 \\ - 0.01 & 0.01 \\ 10^{6} & - 10^{6} \\ - 10^{6} & 10^{6} \end{array} \right)$

This is also a fair game, but what strategy would you use? Column has no choice except to continue playing each column with probability ${1/2}$. However, row now has many choices. If row plays either of the bottom two rows, there is a chance that he will win a million dollars, but there is also an equal chance that he will lose a million dollars.

If I had to play this game as row, I would restrict my play to the top two rows; thus, I would avoid any chance that I could lose a million dollars. Classic min-max strategy theory gives no reason for doing this, yet I claim that most of us would play to avoid the huge swings. Do you agree?

Let’s consider a slight variation of the last game:

$\displaystyle \left( \begin{array}{rr} 0.01- \epsilon & - 0.01 \\ - 0.01 & 0.01- \epsilon \\ 10^{6} & - 10^{6} \\ - 10^{6} & 10^{6} \end{array} \right)$

where ${\epsilon}$ is a small positive amount. Then, min-max theory predicts that the game is still fair, but now there is one unique strategy for row. Row must play each of the bottom two rows with equal probability ${1/2}$.

I would not follow the min-max theorem–with due respect to von Neumann. I would still play only the top two rows. I would be happy to play a slightly losing game, rather than play a scary fair game. Obviously I want to avoid huge potential swings; I would be very unhappy if I lost a million dollars. The key is that the min-max theorem only considers the expected payoff, and not the variance of the payoff.
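To make the tension concrete, here is a small Python sketch (mine, not from any paper) computing the expectation and variance of row's payoff against column's uniform play, for the “safe” top-two strategy and for the unique min-max strategy:

```python
EPS = 1e-3  # a concrete choice for the small positive epsilon
M = [[0.01 - EPS, -0.01],
     [-0.01, 0.01 - EPS],
     [1e6, -1e6],
     [-1e6, 1e6]]
q = [0.5, 0.5]  # column plays each column with probability 1/2

def stats(p):
    """Expectation and variance of row's payoff under strategies (p, q)."""
    outcomes = [(p[i] * q[j], M[i][j]) for i in range(4) for j in range(2)]
    mean = sum(pr * x for pr, x in outcomes)
    var = sum(pr * (x - mean) ** 2 for pr, x in outcomes)
    return mean, var

safe = [0.5, 0.5, 0.0, 0.0]    # play only the top two rows
minmax = [0.0, 0.0, 0.5, 0.5]  # the unique min-max strategy

print(stats(safe))    # mean -EPS/2 = -0.0005, variance about 9e-05
print(stats(minmax))  # mean 0.0, variance 1e12
```

The safe strategy loses only ${\epsilon/2}$ in expectation, while the min-max strategy is fair but has astronomical variance.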

In order to examine how to take variance into account, I need to explain some results about games.

Small Support Principle

Suppose that ${M}$ is an ${n}$ by ${n}$ matrix that represents a zero-sum game. If ${n}$ is huge, then even writing down a mixed strategy is impossible. Imagine a game with ${10^{100}}$ strategies; clearly, there is no way to even consider writing down any mixed strategy.

If ${v}$ is a strategy vector, then define the support of the vector as the number of indices ${i}$ with ${v_{i} > 0.}$ We would like to have strategies with small support, but this cannot be done in general, since there are games that require a player to play all the strategies with positive probability. However, if a player is willing to give up a small amount of the expected payoff, then there are always strategies that have small support. This is a theorem of Neal Young and myself, which was also proved independently by Ingo Althöfer. Let’s call this the Small Support Principle (SSP):

Theorem: Let ${M}$ be an ${n}$ by ${n}$ matrix with entries in the interval ${[-1,1]}$. Also let ${\epsilon>0}$. Then, there is a strategy for row (column) that has support at most

$\displaystyle O\left({\frac{\ln n}{\epsilon^{2}}}\right)$

and row (column) gets the value of the game to within an ${\epsilon}$ additive error.
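One way to prove this is by sampling, and that argument is easy to sketch in code. The Python snippet below (a rough illustration with an arbitrary constant, not the exact parameters of the theorem) draws ${k = O(\ln n/\epsilon^2)}$ pure strategies i.i.d. from a given mixed strategy and plays the empirical distribution; a Chernoff bound plus a union bound over the columns shows that every column's payoff moves by at most ${\epsilon}$ with high probability:

```python
import math
import random
from collections import Counter

def small_support(p, num_cols, eps, seed=1):
    """Sample k = ceil(2*ln(num_cols)/eps^2) pure strategies i.i.d. from
    the mixed strategy p, and return the empirical distribution, whose
    support has size at most k.  (The constant 2 is illustrative.)"""
    k = math.ceil(2 * math.log(num_cols) / eps ** 2)
    rng = random.Random(seed)
    sample = rng.choices(range(len(p)), weights=p, k=k)
    return {i: c / k for i, c in Counter(sample).items()}

def col_payoff(strategy, M, j):
    """Row's expected payoff against pure column j."""
    return sum(prob * M[i][j] for i, prob in strategy.items())

# Demo: a random 100 x 10 payoff matrix with entries in [-1, 1].
rng = random.Random(0)
n, m, eps = 100, 10, 0.25
M = [[rng.uniform(-1, 1) for _ in range(m)] for _ in range(n)]
p = dict(enumerate([1 / n] * n))  # a mixed strategy with full support
q = small_support([1 / n] * n, m, eps)

drift = max(abs(col_payoff(q, M, j) - col_payoff(p, M, j)) for j in range(m))
print(len(q), drift)  # support at most 74, and every column moves very little
```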

The reason Neal and I proved this theorem had nothing to do with game theory. Nothing. We were working on a problem in pure complexity theory, and needed to be able to write down strategies. But, that was impossible, since they were exponentially large. I remember after discussing the complexity problem with Neal for a few days, he came in the following day with essentially the above theorem all worked out. I was very impressed: Neal can be quiet, yet is a very strong researcher.

SSP for Non-Zero Sum Games

The work with Neal on zero-sum games was done while I was at Princeton. When I got to Georgia Tech in 2000, computational game theory had become a “hot” topic–studied for its own sake. One day I suggested to two graduate students, Evangelos Markakis and Aranyak Mehta, that we try to see if non-zero sum games also had an SSP. As it turns out, the SSP holds for Nash equilibria, just as it does for zero-sum games. The proof is really close to the proof of the zero-sum case; however, initially we were stuck. Somehow we could not prove the theorem, and we almost gave up trying. Luckily we did not stop, since the result is quite nice.

Theorem: Consider any two-player non-zero sum ${n}$ by ${n}$ game with payoffs in ${[0,1]}$ and let ${\epsilon>0}$. Then, for any Nash equilibrium, there are strategies for the players with support at most

$\displaystyle O\left({\frac{\ln n}{\epsilon^{2}}}\right)$

so that

1. the strategies form an ${\epsilon}$-equilibrium;
2. the row player gets the same payoff as in the Nash equilibrium within additive error ${\epsilon}$;
3. the column player gets the same payoff as in the Nash equilibrium within additive error ${\epsilon}$.

I will not define the machinery of non-zero sum games here; see our paper for the technical details.

If you are completely new to non-zero sum games, here is a short summary: The min-max theorem is no longer true for non-zero sum games. However, John Nash proved that there always are mixed strategies for row and column with the no-defection property: row will never want to change his strategy provided column does not change, and column will never want to change his strategy provided row does not change. This “I am happy, if you are” is the reason these pairs of strategies are called an equilibrium. The surprise is that such strategies always exist–Nash won the Nobel Prize for his definition of equilibrium and the proof of its existence.

This theorem has an immediate corollary about computing approximate Nash equilibrium:

Corollary: There is a quasi-polynomial time algorithm that finds an ${\epsilon}$-Nash equilibrium, for any non-zero sum game.

Here quasi-polynomial time is time bounded by ${n^{c\ln n}}$ for some constant ${c}$ that can depend on ${\epsilon}$.
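To see where the quasi-polynomial bound comes from, note that by the SSP it suffices to search over strategies that are uniform over a multiset of ${k = O(\ln n/\epsilon^2)}$ pure strategies, and there are at most ${n^k}$ such multisets. Here is a brute-force sketch of that search for the zero-sum case (my simplification; the algorithm for general games enumerates small supports similarly):

```python
from itertools import combinations_with_replacement

def best_k_uniform(M, k):
    """Brute-force search over row strategies that are uniform over a
    multiset of k rows.  There are at most n^k candidates, so with
    k = O(ln n / eps^2) the search takes n^{O(ln n / eps^2)} time,
    which is the quasi-polynomial bound."""
    n, num_cols = len(M), len(M[0])
    best_value, best_multiset = float("-inf"), None
    for rows in combinations_with_replacement(range(n), k):
        # Guaranteed payoff: the worst column against this strategy.
        value = min(sum(M[i][j] for i in rows) / k for j in range(num_cols))
        if value > best_value:
            best_value, best_multiset = value, rows
    return best_value, best_multiset

# Rock-paper-scissors has value 0 and needs full support to achieve it.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]
print(best_k_uniform(RPS, 3))  # (0.0, (0, 1, 2)): the uniform strategy
print(best_k_uniform(RPS, 2))  # a 2-uniform strategy guarantees only -0.5
```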

I have stated the last two results for two players, but everything holds for any number of players. The above corollary is a mixed blessing–in my opinion. On the one hand, it is neat that finding an approximate Nash equilibrium is “almost” computable in polynomial time; on the other hand, it means that proving that finding an approximate Nash equilibrium is NP-hard would have surprising consequences. I am sure some researchers wish this corollary were not true, but do not blame us.

SSP and Variance

Adam suggested that we think about proving an SSP for zero-sum games that takes into account both the expectation of the payoff and its variance. He even worked out a sketch of the argument. Immediately, Vishnoi started to work on the details, and soon we had a team working on the problem. The final paper is the work of Chakrabarty, Khot, Vishnoi, and myself: Adam dropped out for a variety of practical reasons, but we owe him many thanks for suggesting this idea. Why didn’t we think of trying to see if the SSP worked with variance too? Sometimes one is too close to the trees to see the forest. It may be a cliche, but it is often true.

The main theorem that is proved in our paper is:

Theorem: Given bounds

$\displaystyle \Pi_1,\dots,\Pi_m$

on the payoffs and

$\displaystyle V_1,\dots,V_m$

on the variances, if there is a Nash equilibrium satisfying these bounds, then in ${n^{O\left(\frac{\ln m}{\epsilon^2}\right)}}$ time we can find an ${\epsilon}$-Nash equilibrium such that for every player ${i}$ the payoff is at least ${\Pi_i - \epsilon}$ and the variance is at most ${V_i + \epsilon}$.

This theorem is at least a partial answer to the question of variance. The theorem can be used to get, in quasi-polynomial time, approximate good strategies for the players.

Open Problems

We still do not understand zero-sum games with respect to variance. Given a zero-sum game, for example, how hard is it to determine whether there is a strategy for row that guarantees an expected payoff of ${\Pi}$ and a variance of at most ${V}$? Is this in polynomial time, or is it NP-hard? This is a major open problem.

I have no idea if this problem is easy or hard. An obvious approach is to try to encode the problem as an optimization problem. This could work, since zero-sum games are reducible to Linear Programming. However, adding a constraint on the variance does not seem to fit into linear constraints–at least not in an obvious manner.
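To make the obstruction concrete, recall the textbook LP for the row player (this formulation is standard, not specific to our paper): maximize ${v}$ subject to

$\displaystyle \sum_{i} p_i M_{ij} \geq v \ \text{ for all } j, \qquad \sum_{i} p_i = 1, \qquad p \geq 0.$

Every constraint is linear in ${(p,v)}$. But if column plays the mixed strategy ${q}$, the variance of row’s payoff is

$\displaystyle \sum_{i,j} p_i q_j M_{ij}^2 - \Big(\sum_{i,j} p_i q_j M_{ij}\Big)^{2},$

and the squared second term makes a constraint of the form “variance at most ${V}$” quadratic in ${p}$, so it does not slot into the LP.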

The next open problem is to try to obtain good approximation algorithms for this problem. We have, of course, a quasi-polynomial time approximation algorithm, but that is unlikely to be the final word. Also can one prove lower bounds on how well the problem can be approximated?

More generally, how does adding constraints on variance to any game-economic problem change things? What happens for other types of games? For example, what happens when we repeatedly play a game? Can we add variance as an additional constraint?

There seem to be plenty of interesting open questions that arise once we start to include variance. I hope you can solve some of these in the future.

June 6, 2009 9:41 am

Of course the way that economists deal with the variance issue is to work with utilities instead of dollars, and then postulate that participants want to maximize expected utility. This actually seems like a reasonable assumption.

You don’t state what your motivation is, though. The disadvantage of working with utilities is that you might not know what the utility functions are. There is lots of research on this so you might have a very good guess, but you are still pulling it out of the air. On the other hand, your variance constraint seems like it must be pulled out of the air as well, so I do not see a particular advantage.

Do you have motivating applications in mind? Perhaps when the players are corporations instead of individuals, the utility functions are more difficult to figure out, while variance constraints are more natural? A problem with variance constraints is that they are hard constraints, which I think is always unnatural except when they are imposed legally. A more satisfying answer would relate variance to certain kinds of utility functions, perhaps showing that you can efficiently construct utility functions with certain properties, including that the game play depends smoothly on the utility functions (so small errors don’t matter).

June 6, 2009 11:19 am

Great to see such an interesting introduction to what looks like a great sequence of work. Makes me think that, instead of writing compressed abstracts, we all should write two papers: the long one and the blog entry.

June 6, 2009 11:28 am

LOL
I am delighted (and dumbfounded) to observe how mathematicians are clueless about practical matters even in their own domain.
The min-max isn’t wrong in selecting a “scary game”: the risk was NOT PART of the original problem definition, and what you are suggesting is about a DIFFERENT problem, with human risk aversion somewhat included in the balance.
This reminds me of Lakatos’ Proofs and Refutations, which purports to be about mathematical discovery whereas it is really about going from murky, ill-defined words (polyhedron, polytope) to actual formalised definitions, not by “discovery” but by arbitrary refinements of the definitions.
Mathematicians are definitely blind to the most critical part of their practice, which is more about language and metamathematics than about math technicalities proper.

I refrained from commenting on your previous post but can’t stand it anymore, and I must say that it was yet another blatant case of “getting high” on the technicalities while forgetting what the supposed subject matter is really about: the Junta Problem IS NOT an Artificial Intelligence problem!
It is not an AI problem because, even if it is properly solved, it will be the researcher who solved the problem, NOT THE COMPUTER.
This will only be yet another trick in the infamous “bag of tricks” which pass for AI.
How often in real life do you look for the ins/outs of a black-box function?
(hint: only AFTER you framed the “problem” as function learning)

So please, math junkies, get high as much as you want on “nice problems,” but DON’T muddy the waters about what Artificial Intelligence is about; some of your colleagues already do a fair job of that.

June 6, 2009 12:57 pm

kevembuangga:
No theory can hope to be perfect for practical use, but researchers, be they mathematicians or others, strive to make it as close to practice as they can. As you can see, the original min-max theorem, beautiful and wonderful as it is, still fails to capture the practical notion of risk in a real-life situation.

Lipton is suggesting an alternative here, a more practical one, which takes risk into account. I don’t see how he can be accused of being “clueless” about practical matters. von Neumann’s min-max theorem itself made it much easier for mathematicians to comprehend 2-player games. Moreover, it proved to be a stepping stone to many extensions, and was crucial in the development of game theory as a research area. There might be other math areas that are less practical, but I would have to disagree with you if it is about game theory. Game theory has applications in economics, business, and social behavior, to name a few.

About the comment on juntas: if you read the post carefully, you will notice that nowhere in the post is it claimed that the junta problem is an AI problem.

June 6, 2009 11:42 am

There is another way that economists explain the “huge swings” game, using something called utility functions. We suppose that players do not try to maximize money directly, but to maximize the utility derived from money. We imagine that associated with each person is a utility function, with money on the nonnegative x-axis and utility on the nonnegative y-axis. There are no units for utility, and we cannot assign exact values to it. What matters is that the function is concave; its derivative shrinks for larger values of x. This is reasonable; a broke person derives greater marginal utility from finding a $100 bill than does a millionaire. Here is a related (1-player) game: you are told you can either walk away now with $1 million (strategy 1), or you can choose to open one of two boxes, one of which contains $2 million and the other of which is empty (strategy 2). Of course strategy 1 is better, even though it has payoff equal to the expected payoff of strategy 2. So why do you strongly prefer to take the certain $1 million instead?

Let f : R -> happiness be your utility function. Your utility for strategy 1 is f(1000000). Your expected utility for strategy 2 is (f(0) + f(2000000)) / 2. Let X be the random variable that is 0 with probability 1/2 and 2000000 with probability 1/2. Since the utility function is concave, by Jensen’s inequality we have f(E(X)) >= E(f(X)). Note that the left-hand side is the utility with strategy 1 and the right-hand side is the expected utility with strategy 2. If the function is “very concave”, then the inequality is “very strict”; i.e., your utility will be much higher for strategy 1. More generally, for any two strategies with equal expected monetary payoff, the strictly better strategy (in terms of utility) is the one with lower variance in monetary payoff.
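As a quick numerical sketch, take f = sqrt as a stand-in concave utility (any concave f works):

```python
from math import sqrt

f = sqrt  # a stand-in concave utility function

u1 = f(1_000_000)               # utility of walking away with $1 million
u2 = (f(0) + f(2_000_000)) / 2  # expected utility of opening a box

print(u1)  # 1000.0
print(u2)  # about 707.1: strictly less, as Jensen's inequality predicts
```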

Economists can use similar reasoning to show, for instance, that a corporation teetering on the edge of bankruptcy should *not* be given a loan, since it will be more reckless with the money, using it for higher-variance strategies. Bankruptcy laws, and incorporation laws protecting shareholders from owing the debts of their corporations, imply that near the origin corporations have a *convex* utility function: for negative x, f(x) = 0, but just to the right of the origin f(x) looks more like a line with positive slope. This is a formalization of the idea that someone with nothing left to lose will take big risks, which is undesirable to someone loaning the money; they still have lots left to lose.

June 6, 2009 4:20 pm

Lipton is not wrong here. Classic game theory does not include risk (it is purely expectation-based), and you can’t use utility functions to wish your way out of it. This is because adding a utility function will often kill the zero-sum property.

For example, suppose both you and I are risk averse, so we each value losing $100,000 the same as gaining $1,000,000 (a not unreasonable utility). The game where we match coins for a payoff of $1,000,000 (if you guess whether I put my coin heads or tails, I pay you $1,000,000; else you pay me) is zero-sum in dollars but not in utility. In all cases the game has negative net utility.

June 6, 2009 10:41 pm

Is the bound in your Theorem with Young tight?

June 7, 2009 7:56 am

Good question, I believe so. But I do not recall working that out… nice idea.

June 8, 2009 5:45 pm

I think so too, and if I recall correctly, the example given by Freund/Schapire in “Adaptive Game Playing Using Multiplicative Weights” to show the MW strategy can be tight really only uses the fact that the strategies are sparse.

• June 9, 2009 2:46 pm

Yes, it’s known to be tight with some caveats.

See “On the Number of Iterations for Dantzig-Wolfe Optimization and Packing-Covering Approximation Algorithms” by Phil Klein and Young (1999), and The Freund/Schapire paper mentioned above.

Roughly, the Klein/Young paper says that if you take a random n x n^2 0/1-matrix, any ε-optimal mixed strategy for the row player has to have support of size at least Omega(log(n) / ε^2). The caveat is that the lower bound only holds up to support of size O(n^(1-δ)) for any constant δ>0.

I think the Freund/Schapire lower bound is similar but a bit different in that it lower bounds the loss of an adaptive game player (but it also uses a random matrix).

Neal

June 10, 2009 9:34 am

I am not sure I understand what the caveat in the Klein/Young paper is… So are there n x n matrices that require support Ω(log(n) / ε^2) to get a row strategy within ε of optimal?

• June 10, 2009 2:45 pm

p.s. The caveat is just that ε has to be at least Ω(1/n^(1/2-δ)) for some constant δ>0.

Neal

• June 10, 2009 2:46 pm

Also, note that the result is shown not for n x n matrices, but for n x n^2 matrices.

June 10, 2009 4:53 pm

And can’t you just add n^2 - n rows of 0’s and thus get an n^2 x n^2 matrix? The log only changes by a constant factor, which the big-O absorbs in any case.

June 6, 2009 11:41 pm

jmount, after adding in the variance constraint the game is also no longer zero-sum, in the sense that you can no longer apply theorems that only apply to zero-sum games.

So we have two ways of modeling risk-aversion: either with an unnatural and nonstandard hard variance constraint introduced by computer scientists, or with a standard utility-function approach introduced by economists. When it comes to modeling game-theory problems, I trust the economists more than the computer scientists. Even as a mathematician, I find this variance constraint ugly and unnatural. If it led to beautiful theorems that could not be proved in a more general setting, then I might change my mind, but I am not seeing that here yet.

June 7, 2009 8:05 am

The point of the post was twofold. First, to raise the variance question generally; you can solve it any way that you like. Second, to highlight the SSP, which seems to be quite powerful and useful.

On variance: I do not see that utility alone “solves” the variance issue. Even scaled properly, I just do not see how that makes the variance go away.

June 7, 2009 3:41 am

I had not seen the Small Support Principle before, but its form reminds me of the Johnson-Lindenstrauss Lemma (except the errors are multiplicative rather than additive). Is my pattern-matching apparatus too sensitive, or is there a connection there that is deeper than a notion of sparsity and the common O(ln n / ε^2) bound?

June 7, 2009 8:02 am

Interesting connection. I have to think about this idea. I do not see any direct connection, but I like the connection.

June 7, 2009 8:33 am

Interesting post! The SS Principle also appears (though not explicitly as a theorem) in Freund and Schapire’s paper “Game theory, online prediction and boosting” (COLT ’96). The proof uses the weighted majority algorithm of Littlestone and Warmuth. They show how to compute a d(T)-approximate maximin strategy that has support at most T. Further, d(T) is O(log n/T), so the SS Principle follows.

June 7, 2009 8:36 am

Bug in my definition of d(T): I should have said “Further d(T) is O( \sqrt{ \log n/T }), so the SS Principle follows.”


June 8, 2009 6:14 am

I do not understand the claim by some commenters that economists are only interested in maximizing utility (rather than reducing variance).
For instance, derivatives trading is supposed to be a way of managing risk.

June 8, 2009 10:30 am

Clearly low variance has a value, and a strategy with low variance will often be quite rationally favoured over one with high variance. The utility function approach allows this to be captured, and behavioural economics studies this in detail.

My question is: what can be done if the variance is infinite or not defined? The cumulative payoff after repeated plays may then tend to a stable but non-Gaussian distribution. I think some notion other than the variance might be better in this case — one would intuitively still prefer a fixed payoff, but there might be some kind of hierarchy.