A great conjecture too

 Alternate photo by Quanta

Thomas Royen is a retired professor of statistics in Schwalbach am Taunus near Frankfurt, Germany. In July 2014 he had a one-minute insight about how to prove the famous Gaussian correlation inequality (GCI) conjecture. It took one day for him to draft a full proof of the conjecture. It has taken several years for the proof to be accepted and brought to full light.

Today Ken and I hail his achievement and discuss some of its history and context.

Royen posted his paper in August 2014 with the title, “A simple proof of the Gaussian correlation conjecture extended to multivariate gamma distributions.” He not only proved the conjecture, he recognized and proved a generalization. The “simple” means that the tools needed to solve it had been available for decades. So why did it elude some of the best mathematicians for those decades? One reason may have been that the conjecture spans geometry, probability theory, and statistics, so there were diverse ways to approach it. A conjecture that can be viewed in so many ways is perhaps all the more difficult to solve.

Even more fun is that Royen proved the conjecture after he was retired and had the key insight while brushing his teeth—as told here. Ken recalls one great bathroom insight not in his research but in chess: In the endgame stage of the famous 1999 Kasparov Versus the World match, which became a collaborative research activity later described by Michael Nielsen in his book, Reinventing Discovery, Ken had a key idea while in the shower. His idea, branching out from the game at 58…Qf5 59. Kh6 Qe6, was the Zugzwang maneuver 60. Qg1+ Kb2 61. Qf2+ Kb1 62. Qd4!, which remains the only way for White to win.

## The Germ

Although solutions often come in a flash, the ideas they resolve often germinate from partial statements whose history takes effort to trace. One thing we can say is that the GCI does not originate with Carl Gauss, nor should it be considered named for him. A Gaussian measure on ${\mathbb{R}^n}$ (centered on the origin) is defined by having the probability density

$\displaystyle \frac{1}{\sqrt{(2\pi)^n |\det{R}|}} \exp(-\frac{1}{2}x^T R^{-1} x),$

where ${R}$ is a non-singular covariance matrix and ${x^T}$ just means the transpose of ${x}$. Its projection onto any component ${x_i}$ is a usual one-variable normal distribution.

Suppose ${a_i,b_i}$ is a 90% confidence interval for ${x_i}$ and ${a_j,b_j}$ a 90% confidence interval for another variable ${x_j}$. What is the probability ${p}$ that both variables fall into their intervals? If they are independent, then it is ${0.81}$.

What if they are not independent? If they are positively correlated, then we may expect it to be higher. If they are inversely related, well…let’s also suppose the variables have mean ${0}$ and the intervals are symmetric around ${0}$: ${a_i = -b_i}$, ${a_j = -b_j}$. Do we still get ${p \geq 0.81}$? This—extended to any subset of the variables with any smattering of correlations and to other shapes besides the products of intervals—is the essence of the conjecture.

## Who Dunnett?

Charles Dunnett and Milton Sobel considered some special cases, such as when ${R}$ is an outer product ${r_{i,j} = b_i b_j}$ for some vector ${(b_1,\dots,b_n)}$, which makes it positive definite. Their 1955 paper is considered by some to be the source of GCI.

But it was Olive Dunn who first posed the above general terms in a series of papers that have had other enduring influence. The first paper in 1958 and the second in 1959 bore the like-as-lentils titles:

1. Estimation of the Means of Dependent Variables.

2. Estimation of the Medians for Dependent Variables.

These seem to have generated confusion. The former is longer and frames the confidence-interval problem and is the only one to cite Dunnett-Sobel, but it does not mention a “conjecture.” The latter does discuss at the end exactly the conjecture of extending a case she had proved for ${n=2}$ to arbitrary ${n}$, but relates a reader’s counterexample. Natalie Wolchover ascribed the 1959 paper in her article linked above, but Wikipedia and other sources reference the 1958 paper, while subsequent literature we’ve seen has instances of citing either—and never both.

Dunn became a fellow of the American Statistical Association, a fellow of the American Association for the Advancement of Science (AAAS), and a fellow of the American Public Health Association. In 1974, she was honored as the annual UCLA Woman of Science, awarded to “an outstanding woman who has made significant contributions in the field of science.” Her third paper in this series, also 1959, was titled “Confidence intervals for the means of dependent normally distributed variables.” Her fourth, in 1961, is known for the still-definitive form of the Bonferroni correction for joint variables. But in our episode of “CSI: GCI” it seems we must look later to find who framed the conjecture as we know it.

Sobel came back to the scene as part of a 1972 six-author paper, “Inequalities on the probability Content of Convex Regions for Elliptically Contoured Distributions.” They considered integrals of the form

$\displaystyle P(R) = \frac{1}{\sqrt{|\det(R)|}}\int_{-a_1}^{+a_1}\cdots\int_{-a_n}^{+a_n} f(x^T R^{-1} x),$

for general functions ${f(z)}$ besides ${e^{-z}}$ and for general positive definite ${R}$. GCI in this case then has the form ${P(R) \geq P(I)}$ where ${I}$ is the identity matrix. They call ${f}$ elliptically contoured provided ${\int_0^{\infty} z^{n-1} f(z^2) dz}$ is finite. Writing about the history, they say (we have changed a few symbols and the citation style):

Inequalities for ${P(R)}$ perhaps originate with special results of Dunnett and Sobel (1955) and of Dunn (1958), in which it is shown that ${P(R) \geq P(I)}$ for special forms of ${R}$ (with ${r_{ii} = 1}$) or for special values of ${n}$.

They mention also an inequality by David Slepian and what they termed “the most general result for the normal distribution” by Zbynek Šidák, still with special conditions on ${R}$. Their main result is “an extension of Šidák’s result to general elliptically contoured densities [plus] a stronger version dealing with a convex symmetric set.” This is where the relaxation from products of confidence intervals took hold. At last, after their main proof in section 2 and discussion in section 3, we find the magic word “conjecture”:

This suggests the conjecture: if ${x = (x_1,x_2)}$ is a random vector (with ${x_1}$ of dimension ${p}$ and ${x_2}$ of dimension ${q}$) having density ${|\det R|^{-1/2} f(x^T R^{-1} x)}$ and if ${C_1 \subseteq \mathbb{R}^p}$ and ${C_2 \subseteq \mathbb{R}^q}$ are convex symmetric sets, then

$\displaystyle P_R\{x_1 \in C_1,\;x_2 \in C_2\} \geq P_{\bar{R}}\{x_1 \in C_1,\;x_2 \in C_2\},$

where

$\displaystyle R = \begin{bmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{bmatrix},\qquad \bar{R} = \begin{bmatrix} R_{11} & 0 \\ 0 & R_{22} \end{bmatrix}.$

Clearly by iteration this implies the inequality with regard to ${\Pr_I}$. Here symmetric means just that ${-x}$ belongs whenever ${x}$ belongs. Any symmetric convex set can be decomposed into strips of the form ${\{x: |x^T u| \leq c\}}$ for fixed ${u \in \mathbb{R}^n}$ and ${c > 0}$, which their generality set them up to handle, and proving the inequality for strips suffices. This is considered the modern statement of GCI. The rest of their paper—over half of it—treats attempts to prove it and counterexamples to some further extensions.

Finally in 1977, Loren Pitt proved the case ${n = 2}$, referencing the 1972 paper and Šidák but not Dunnett-Sobel or Dunn. Wolchover interviewed Pitt for her article, and this extract is revealing:

Pitt had been trying since 1973, when he first heard about [it]. “Being an arrogant young mathematician … I was shocked that grown men who were putting themselves off as respectable math and science people didn’t know the answer to this,” he said. He locked himself in his motel room and was sure he would prove or disprove the conjecture before coming out. “Fifty years or so later I still didn’t know the answer,” he said.

So as for framing GCI, whodunit? Royen ascribes it to the 1972 paper which is probably what popularized it to Pitt, but Dunn’s orthogonal-intervals formulation spurred the intervening work, accommodates extensions noted as equivalent to GCI by Royen citing this 1998 paper, and still didn’t get solved until Royen. So we find these two sources equally “guilty.”

## The Conjecture and Solution

The 1972 form of GCI has a neatly compact statement and visualization:

For any symmetric convex sets ${K,L}$ in ${\mathbb{R}^{n}}$ and any Gaussian measure ${\mu}$ on ${\mathbb{R}^{n}}$ centered at the origin,

$\displaystyle \mu(K \cap L) \ge \mu(K)\mu(L).$

That is, imagine overlapping shapes symmetric about the origin in some Euclidean space. Throw darts that land with a Gaussian distribution around the origin. The claim is that the probability that the a dart will land on both shapes is at least the probability that it will land in one shape times the probability that it lands in the other shape.

 UK Daily Mail source

George Lowther, in his blog “Almost Sure,” has an interesting post about early attempts to solve GCI. He notes the following partial results from the above-mentioned 1998 paper:

1. There is a positive constant ${c}$ such that the conjecture is true whenever the two sets are contained in the Euclidean ball of radius ${c\sqrt{n}}$.

2. If, for every ${n}$, the conjecture is true whenever the sets are contained in the ball of radius ${\sqrt{n}}$, then it is true in general.

The first statement proves GCI in a “shrunken” sense, while the second makes that seem tantamount to solving the whole thing. Lowther explained, however:

Unfortunately, the constant in the first statement is ${{c=1/(2\sqrt{e})}}$, which is strictly less than one, so the second statement cannot be applied. Furthermore, it does not appear that the proof can be improved to increase ${{c}}$ to one. Alternatively, we could try improving the second statement to only require the sets to be contained in the ball of radius ${{c\sqrt{n}}}$ for some ${{c<1}}$ but, again, it does not seem that the proof can be extended in this way.

Royen did not use this idea—indeed, Wolchover quotes Pitt as saying, “what Royen did was kind of diametrically opposed to what I had in mind.” Instead she explains how Royen used a kind of smoothing between the original ${R}$ matrix and ${\bar{R}}$ (with off-diagonal entries zeroed out as above) as a quantity ${C}$ varies from ${1}$ to ${0}$, taking derivatives with respect to ${C}$. For this he had tools involving transforms and other tricks at hand:

“He had formulas that enabled him to pull off his magic,” Pitt said. “And I didn’t have the formulas.”

Royen’s short paper does need the background of these tricks to follow, and the fact that the same tricks enabled a further generalization of GCI makes it harder. The proof was made more self-contained in this 2015 paper by Rafał Latała and Dariusz Matlak (final version) and in a 2016 project by Tianyu Zhou and Shuyang Shen at the University of Toronto, both focusing just on GCI and cases closest to Dunn’s papers. Rather than go into proof details here, we’ll say more about the wider context.

## Why GCI Is Important

Independent events are usually the best type of events to work with. Recall if ${A}$ and ${B}$ are independent events then,

$\displaystyle P(A \cap B) \ge P(A)P(B).$

Of course actually more is true: ${P(A \cap B) = P(A)P(B)}$. But we focus on the inequality, since it can hold when ${A}$ and ${B}$ are not independent. In general without some assumption on the events ${A}$ and ${B}$ the above inequality is not true: Consider ${A}$ the event fair coin is heads and ${B}$ that it is tails. Then ${P(A \cap B) \ge P(A)P(B)}$ becomes ${0 \ge 1/2 \cdot 1/2}$.

Since independence is not always true for two events ${A,B}$ it is of great value to know when ${P(A \cap B) \ge P(A)P(B)}$ is still true. Even an approximation is of great value. Note, a simple case where it still is true is when ${A=B}$, then the inequality is trivial, ${P(A) \ge P(A)^{2}}$.

GCI reminds us of another inequality that intuitively cuts very fine and was difficult to prove: the FKG inequality. Ron Graham wrote a survey of FKG that begins with a discussion of Chebyshev’s sum inequality, named after the famous Pafnuty Chebyshev.

Chebychev’s sum inequality states that if

$\displaystyle a_{1} \ge a_{2} \ge \cdots \ge a_{n}$

and

$\displaystyle b_{1} \ge b_{2} \ge \cdots \ge b_{n},$

then

$\displaystyle \frac{1}{n}\sum_{k=1}^{n}a_{k} \cdot b_{k} \ge \left( \frac{1}{n}\sum_{k=1}^{n} a_{k} \right) \cdot \left( \frac{1}{n}\sum_{k=1}^{n} b_{k} \right).$

Wikipedia’s FKG article says how the relevance expands to other inequalities:

Informally, [FKG] says that in many random systems, increasing events are positively correlated, while an increasing and a decreasing event are negatively correlated.

An earlier version, for the special case of i.i.d. variables, … is due to Theodore Edward Harris (1960) … One generalization of the FKG inequality is the Holley inequality (1974) below, and an even further generalization is the Ahlswede-Daykin “four functions” theorem (1978). Furthermore, it has the same conclusion as the Griffiths inequalities, but the hypotheses are different.

We wonder whether the new results on GCI will spur an over-arching appreciation of all these inequalities involving correlated variables. We also wonder if in the complex case there is any connection between Royen’s smoothing technique and the process of purifying a mixed quantum state.

## Open Problems

The amazing personal fact is that a retired mathematician solved the problem and did it with a relatively simple proof. What does this say about our core conjectures in theory? I am near retirement from Georgia Tech—does that mean I will solve some major open problem? Hmmmmmmm.

Also, which of you have had key insights come in the bathroom?

[nonsingular R–>positive definite R, other tweaks]

1. May 1, 2017 1:48 am

You’re risking your reputation, but you think it’s worth it.
“All right, he was a good mathematician, but at the end of his life he lost his marbles.”

https://www.quantamagazine.org/20160303-michael-atiyahs-mathematical-dreams/

• May 1, 2017 9:26 am

Thank you very much for bringing that great article to our attention.

2. May 1, 2017 12:14 pm

I would still ask you what is really essential in quantum computing? Superposition of states? You have it in any linear system. Operation on a sum gives a sum of operations. Just use 2 orthogonal directions for each bit. Sum of these vectors is a superpositiion of 2 states. Sum of all vectors for all bits is a superposition of all possible binary values. Measurement is an application of linear operator, say, projection. The result of its aplication to the sum above is the total number of solutions. The only problem is that a universal classical gate is not possible to implement as a linear operator on the original space. So the design of the universal gate is what makes quantum computer different. The main question is what os entanglement, which is connecting the result of universal gate application to the original set of vectors. Once it is technically solved parallel computation on a same peice of hardware would be possible without quantum computers.

3. May 1, 2017 12:19 pm

Oh sorry. This was a project for retirement.

May 3, 2017 11:19 am

you think P=NP will be proved this easily?

• May 10, 2017 11:44 pm

The P=NP is about parallel computation on the same piece of hardware at the same time. If P=NP the proof should be easy! We just do not have an intuition. All linear operators that are coming in mind are loosing the identity of operand. That is what entanglement is supposed to overcome. So the question is to construct one or show that it is impossible. One candidate for the representation is a linear combination of vectors with some kind of vector product that keep the identity of operand after aplication.

• May 10, 2017 11:53 pm

The complicated way is to show that it is possible to break the space into polynomial number of pieces where it takes polynomial time to find satisfability. This approach is taken in general numerical slieve, except that it is not polynomial.