The Erdős discrepancy problem and more


Raymond Redheffer was an American mathematician. He worked for his PhD under Norman Levinson, who was famous for his work on the Riemann Hypothesis. Redheffer spent his whole career at UCLA and was a great teacher—see this for more details. Redheffer wrote the mathematical content for the IBM-sponsored 50-foot poster “Men of Modern Mathematics,” which IBM released last year as a free iPad app.

Today Ken and I wish to talk about a problem that Gil Kalai spoke about last month at Tech, and suggest an approach to solving it that draws on a matrix studied by Redheffer.

Unfortunately I was not able to hear Gil’s talk since I was at the RSA annual conference on security—see here. But the talk featured an old problem of Paul Erdős that is one of those terrible problems: easy to state, easy to think about, impossible to make progress on.

I have two things to say about this classic problem. First, it has a nice statement as a question about matrices. Second, it may be related to the Riemann Hypothesis. Erdős was known for offering monetary rewards of up to $1,000 for problems he liked, and EDP itself carried a reward of $500. The Riemann Hypothesis, however, is one of the million-dollar Clay Millennium Prize Problems.

## The Problem

The Erdős Discrepancy Problem (EDP) is ultimately about whether homogeneous arithmetic progressions form a strong enough lens for detecting random-like behavior in infinite sequences. Let ${x}$ be an infinite sequence ${x_{1},x_{2},\dots}$ where each ${x_{i}}$ is ${\pm 1}$. The EDP asks whether, for any constant ${C}$, we can always find ${n}$ and ${d}$ such that

$\displaystyle \left| \sum_{i=1}^{n} x_{id} \right| > C.$

Of course this is true for any “random” sequence with ${d=1}$: the sums up to ${n}$ will often be ${\pm}$ the standard deviation, which is of order ${\sqrt{n}}$, and some sums will go slightly higher. We can of course build a non-random sequence in which these sums stay bounded, such as

$\displaystyle +1,-1,+1,-1,+1,-1, \dots$,

but then taking ${d=2}$ creates huge discrepancies. To falsify the conjecture, one would have to create an ingenious ${\pm 1}$ sequence in which all the sums stay bounded for all ${d}$.

There do exist sequences that keep all the sums lower than what happens for random sequences. An elementary way is to define

$\displaystyle x_i = (-1)^{b_3(i)-1}$

where ${b_3(i)}$ is the lowest-order non-zero digit of ${i}$ written in base 3. This sequence is multiplicative, meaning ${x_{id} = x_i x_d}$. The EDP is open even for multiplicative sequences, for which only the case ${d=1}$ matters. For this multiplicative sequence, the sums ${\sum_{i=1}^n x_i}$ stay within ${\pm (1 + \log_3 n)}$. For a proof of this and much more information about the EDP, see this original post by Timothy Gowers, a more recent post by Gil, and this Polymath page.
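As a sanity check, here is a minimal Python sketch of this sequence, using the sign convention that ${x_i = +1}$ when the lowest non-zero base-3 digit of ${i}$ is 1. It verifies both multiplicativity and the logarithmic bound on the partial sums:

```python
import math

def lowest_base3_digit(i):
    # Strip trailing zeros in base 3, then read the lowest digit (1 or 2).
    while i % 3 == 0:
        i //= 3
    return i % 3

def x(i):
    # +1 if the lowest non-zero base-3 digit is 1, -1 if it is 2.
    return 1 if lowest_base3_digit(i) == 1 else -1

# Multiplicativity: x_{id} = x_i * x_d.
assert all(x(i * d) == x(i) * x(d)
           for i in range(1, 100) for d in range(1, 100))

# The partial sums stay within 1 + log_3(n).
s = 0
for n in range(1, 3**9):
    s += x(n)
    assert abs(s) <= 1 + math.log(n, 3) + 1e-9
print("checks passed")
```

In fact the partial sum up to ${n}$ turns out to equal the number of digits of ${n}$ equal to 1 in base 3, which explains the ${1 + \log_3 n}$ bound.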

## A Linear Algebra Approach

We offer another way to view the famous EDP, and wonder how widely it is known. Define the ${n \times n }$ 0-1 matrix ${A_{n}}$ by setting the ${(i,j)}$ entry to ${1}$ exactly when ${i}$ divides ${j}$. Here is ${A_{6}}$:

$\displaystyle \left( \begin{array}{cccccc} 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ \end{array} \right)$
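The matrix is easy to generate; here is a short Python sketch (assuming NumPy is available):

```python
import numpy as np

def divisor_matrix(n):
    # A_n[i-1, j-1] = 1 exactly when i divides j (1-based i, j).
    A = np.zeros((n, n), dtype=int)
    for i in range(1, n + 1):
        # Walk over the multiples of i up to n.
        for j in range(i, n + 1, i):
            A[i - 1, j - 1] = 1
    return A

print(divisor_matrix(6))
```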

Then we have the following observation:

Theorem 1 The EDP is true—i.e., for every ${C>0}$ and infinite vector ${x}$ of ${\pm 1}$ there are ${d}$ and ${m}$ so that

$\displaystyle \left| \sum_{i=1}^{m} x_{id} \right| > C,$

(where we call the sum ${s_{d,m}}$)—if and only if for every such ${C}$ and ${x}$, there is an ${n}$ so that

$\displaystyle ||A_{n}x^{(n)}||_{\infty} > C.$

Here ${x^{(n)}}$ is the vector of ${x_{1},\dots,x_{n}}$.

The key to the proof is quite simple. Note that the ${d^{th}}$ entry of ${A_{n}x^{(n)}}$ is

$\displaystyle \sum_{id \le n} x_{id}.$

The sup-norm, ${||\cdot||_{\infty}}$, is the maximum of the absolute values of these entries over all ${d}$.

The proof itself requires just some care with bounds and indexing. For the forward direction, given ${d}$ and ${m}$ we take ${n = md}$. Then the ${d}$th entry of the vector ${A_n x^{(n)}}$ is exactly the value ${s_{d,m}}$, no more and no less. Hence its sup-norm is at least ${C}$. In the other direction, given ${n}$ such that ${y = A_n x^{(n)}}$ has ${||y||_\infty > C}$, let ${d}$ be the entry that achieves the sup-norm, i.e., ${|y_d| > C}$. Then for ${m = \lfloor n/d \rfloor}$ we have that ${\sum_{i=1}^m x_{id}}$ includes the same terms, so ${|s_{d,m}| > C}$. This ends the proof.

Hence the norm assertion is exactly the same as the EDP. What Ken and I find interesting is that the matrix ${A_{n}}$ is upper triangular and has a number of interesting properties. The question is, can these properties be used to make progress? Let’s discuss this next.
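The equivalence is easy to check numerically. Here is a sketch that, for a fixed ${n}$ and a random ${\pm 1}$ vector, compares the definitional maximum of ${|s_{d,\lfloor n/d \rfloor}|}$ with the sup-norm ${||A_n x^{(n)}||_\infty}$ (NumPy assumed available):

```python
import numpy as np

def divisor_matrix(n):
    # A_n[i-1, j-1] = 1 exactly when i divides j.
    A = np.zeros((n, n), dtype=int)
    for i in range(1, n + 1):
        for j in range(i, n + 1, i):
            A[i - 1, j - 1] = 1
    return A

rng = np.random.default_rng(0)
n = 60
x = rng.choice([-1, 1], size=n)

# Definition: for each d, sum x_{id} over i = 1 .. floor(n/d).
by_definition = max(
    abs(sum(x[i * d - 1] for i in range(1, n // d + 1)))
    for d in range(1, n + 1)
)

# Matrix view: the d-th entry of A_n x is the same sum; take the sup-norm.
by_matrix = int(np.max(np.abs(divisor_matrix(n) @ x)))

assert by_definition == by_matrix
```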

## Some Ideas

The above connection may be well known, but it is new to me. In any event it suggests a number of possible approaches to proving EDP. Linear algebra has solved many problems in the past that did not immediately seem to be about linear algebra. This is the hope—that we may be able to exploit linear algebra to solve the EDP. Or at least make some small progress.

The matrix ${A_{n}}$ has several properties that make it attractive. The matrix is almost recursive in structure. Let ${n}$ be even, and consider the four ${n/2 \times n/2}$ blocks. The upper-left block is ${A_{n/2}}$. At lower left we have the all-zero matrix, and at lower right we have the identity matrix. This leaves the block at upper right, call it ${B_{n/2}}$. That is, we have:

$\displaystyle A_{n} = \left( \begin{array}{cc} A_{n/2} & B_{n/2} \\ 0 & I_{n/2} \end{array} \right).$

The matrices ${B_{n/2}}$ record the way the integers in the “bottom half” ${[1,\dots,n/2]}$ divide integers in the top half. Especially for ${n}$ a power of 2, we might hope this has a quantifiable, regular pattern. Understanding this seems to be all we need to understand ${A_n}$ and thus make progress on EDP.
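The block structure is a one-liner to confirm; a small Python sketch using a helper to build ${A_n}$:

```python
import numpy as np

def divisor_matrix(n):
    # A_n[i-1, j-1] = 1 exactly when i divides j.
    A = np.zeros((n, n), dtype=int)
    for i in range(1, n + 1):
        for j in range(i, n + 1, i):
            A[i - 1, j - 1] = 1
    return A

n = 16
h = n // 2
A = divisor_matrix(n)

assert (A[:h, :h] == divisor_matrix(h)).all()      # upper left: A_{n/2}
assert (A[h:, :h] == 0).all()                      # lower left: zero
assert (A[h:, h:] == np.eye(h, dtype=int)).all()   # lower right: identity

# Upper right records how integers in [1, n/2] divide integers in (n/2, n].
B = A[:h, h:]
print(B)
```

The lower-right block is the identity because an ${i > n/2}$ can divide a ${j \le n}$ only when ${j = i}$.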

## EDP And The Riemann Hypothesis

Ray Redheffer seems to have understood this, because his matrix is the same as ${A_n}$ except for the first column. That is, ${R_{n}}$ is the ${0,1}$ matrix whose entries ${r_{ij}}$ are ${1}$ if ${i}$ divides ${j}$ or if ${j = 1}$; otherwise, the entry is ${0}$.

$\displaystyle \left( \begin{array}{cccccccccccc} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ \mathbf{1} & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ \mathbf{1} & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 \\ \mathbf{1} & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\ \mathbf{1} & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ \mathbf{1} & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\ \mathbf{1} & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ \mathbf{1} & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ \mathbf{1} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ \mathbf{1} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ \mathbf{1} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ \mathbf{1} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{array} \right)$

Actually what Redheffer did, in his 1977 paper, was fill the first column with variables ${z_i}$ ranging over complex numbers in the unit disc. He showed that maximizing the absolute value of the determinant of the resulting matrix is equivalent to maximizing over ${z_i}$ taking only the real values ${-1}$, ${0}$, or ${1}$, and that the maximum is attained by setting ${z_i}$ equal to the value of the Möbius function ${\mu(i)}$. That is:

1. ${\mu(i) = 1}$ if ${i}$ is square-free and has an even number of prime factors;

2. ${\mu(i) = -1}$ if ${i}$ is square-free and has an odd number of prime factors;

3. ${\mu(i) = 0}$ if ${i}$ is divisible by a square number, i.e., has a repeated prime factor.

Two years ago, we covered the relationship between ${\mu}$ and the Mertens function

$\displaystyle M(n) = \sum_{k \le n} \mu(k).$

In particular, the determinant of the Redheffer matrix—with all ${z_i}$ set to 1—is exactly ${M(n)}$—see also this. Thus the question of the growth of ${M(n)}$ connects the matrix and the EDP to the prime number theorem and also the Riemann Hypothesis. The first connection, which is proved, is that ${M(n)}$ grows less than linearly: the prime number theorem is equivalent to ${M(n) = o(n)}$. The second, which is one of the great open problems in mathematics, is:

Does ${M(n)}$ grow as if it were a “random” sum, for instance as

$\displaystyle M(n) = O(n^{1/2 +\epsilon})$

for any ${\epsilon > 0}$?

This is equivalent to the Riemann Hypothesis. This connection is noted in terms of the related Liouville lambda function on the EDP PolyMath page at the end of this section. We wonder how far it can be widened.
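These facts are easy to test for small ${n}$; here is a Python sketch that computes ${\mu}$ by trial division and checks ${\det(R_n) = M(n)}$ (NumPy assumed available):

```python
import numpy as np

def mobius(i):
    # Möbius function via trial division.
    result = 1
    p = 2
    while p * p <= i:
        if i % p == 0:
            i //= p
            if i % p == 0:
                return 0  # repeated prime factor
            result = -result
        else:
            p += 1
    if i > 1:
        result = -result  # one remaining prime factor
    return result

def mertens(n):
    return sum(mobius(k) for k in range(1, n + 1))

def redheffer(n):
    # R[i-1, j-1] = 1 if i divides j or j = 1 (1-based i, j).
    return np.array([[1 if (j == 1 or j % i == 0) else 0
                      for j in range(1, n + 1)]
                     for i in range(1, n + 1)])

# det(R_n) = M(n); the float determinant is rounded to the nearest integer.
for n in range(1, 30):
    assert round(np.linalg.det(redheffer(n))) == mertens(n)
print("det(R_n) = M(n) verified for n < 30")
```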

## A Further Idea

Let ${R_{n}}$ be the ${n \times n}$ Redheffer matrix. We note that it has the following block structure:

$\displaystyle R_{n} = \left( \begin{array}{cc} R_{n/2} & B_{n/2} \\ C_{n/2} & I_{n/2} \end{array} \right).$

The difference from ${A_n}$ is that now ${C_{m}}$ is the ${m \times m}$ matrix with all ${1}$'s in the first column and zeroes everywhere else, ${m = n/2}$. In Redheffer's more general case, ${C_{m}}$ has variables in the first column, but in any event it is a rank-1 matrix. Now let

$\displaystyle S_{n} = \left( \begin{array}{cc} R^{-1}_{n/2} & 0 \\ 0 & I_{n/2} \end{array} \right)$

and ${T_{n} = S_{n} R_{n}}$. Then ${T_{n}}$ is equal to

$\displaystyle T_n = \left( \begin{array}{cc} I_{n/2} & R^{-1}_{n/2}B_{n/2} \\ C_{n/2} & I_{n/2} \end{array} \right).$

We note that ${\det(T_{n}) = \det(S_{n})\det(R_{n})}$ which is equal to ${\det(R_{n/2})^{-1}\det(R_{n})}$. But by a standard block rule for determinants, we get that ${\det(T_{n})}$ is equal to the determinant of

$\displaystyle I_{n/2} - C_{n/2}R^{-1}_{n/2}B_{n/2}.$

This is of the form ${I_{n/2} + W_{n/2}}$ where ${W_{n/2} = -C_{n/2}R^{-1}_{n/2}B_{n/2}}$ has rank at most one, since ${C_{n/2}}$ has rank one.

Putting this all together yields that

$\displaystyle \det(R_{n/2})\det(I_{n/2} + W_{n/2}) = \det(R_{n}).$

But this means that

$\displaystyle M(n) = M(n/2) \det(I +W_{n/2}),$

provided ${M(n/2) \neq 0}$ and ${n}$ is even.

The key question is, can we use this to get a bound on the Mertens function, or to get some congruence results? The obvious idea that ${M(n/2)}$ divides ${M(n)}$ is false—the obstacle is that ${W_{n/2}}$ is not an integer matrix. Its denominators come from ${R^{-1}_{n/2}}$ not being an integer matrix. But if it were an integer matrix, then we would get some non-trivial relationship. Does this lead anywhere?
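Under the stated proviso, the factorization is easy to verify numerically. A sketch for ${n = 24}$ (where ${M(12) = -2 \neq 0}$), which also checks the rank-one identity ${\det(I + W) = 1 + \mathrm{tr}(W)}$:

```python
import numpy as np

def mobius(i):
    # Möbius function via trial division.
    result, p = 1, 2
    while p * p <= i:
        if i % p == 0:
            i //= p
            if i % p == 0:
                return 0  # repeated prime factor
            result = -result
        else:
            p += 1
    return -result if i > 1 else result

def mertens(n):
    return sum(mobius(k) for k in range(1, n + 1))

def redheffer(n):
    # R[i-1, j-1] = 1 if i divides j or j = 1.
    return np.array([[1.0 if (j == 1 or j % i == 0) else 0.0
                      for j in range(1, n + 1)]
                     for i in range(1, n + 1)])

n, h = 24, 12
R = redheffer(n)
Rh, B, C = R[:h, :h], R[:h, h:], R[h:, :h]

# W = -C R_{n/2}^{-1} B has rank at most one, since C has rank one.
W = -C @ np.linalg.inv(Rh) @ B
assert np.linalg.matrix_rank(W) <= 1

# For a rank-one W, det(I + W) = 1 + trace(W).
d = np.linalg.det(np.eye(h) + W)
assert abs(d - (1 + np.trace(W))) < 1e-8

# M(n) = M(n/2) * det(I + W), valid here since M(12) is nonzero.
assert abs(mertens(n) - mertens(h) * d) < 1e-6
```

The rank-one identity holds because the eigenvalues of ${W}$ are ${\mathrm{tr}(W)}$ and ${0}$ (with multiplicity ${n/2 - 1}$).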

## Open Problems

Does the linear algebra approach to EDP seem useful? Also, the Redheffer matrix is “almost” what we need to understand in order to solve EDP. Does this suggest a connection between EDP and the Riemann Hypothesis? Could that explain why EDP is so hard?

March 25, 2013 4:45 pm

We used a linear algebra approach in this recent paper: http://www.tandfonline.com/doi/abs/10.1080/10586458.2013.746616 .

March 26, 2013 2:29 am

Typo: the multiplicative sequence example should be $x_i = (-1)^{b_3(i)-1}$.

• March 26, 2013 9:47 am

True for conformance to the source, but does it matter?

April 1, 2013 8:12 am

Without the -1 in the exponent the math doesn’t work for me: take x_{10}. The lowest-order digit of 10 in base 3 is 1, so according to your definition x_{10} is (-1)^1 = -1. But also according to your definition x_{5} is (-1)^2 = 1 and x_{2} is (-1)^2 = 1.

March 26, 2013 8:21 am

What you suggest is actually a linear _programming_ approach, right? Maybe it would be more eye-catching if you called it that. Is the stronger version of EDP, where m and d must be powers of 2, disproved?

4. March 26, 2013 3:27 pm

At first impression, your version of EDP seems like a question about the (right) eigenvector of $A_{\infty}$ with the largest eigenvalue. In particular, whether it can be partitioned (I mean the classical NP-complete partition problem). If it cannot, then EDP is true; otherwise it may be a bit more complicated. As usual, I can be mistaken.

5. March 26, 2013 4:30 pm

Other typo? In your first formula, don’t you mean $x_i d$ and not $x_{id}$?

• March 27, 2013 11:30 am

Not a typo, it’s $x_{id}$

6. March 26, 2013 4:38 pm

Also, Bernard Chazelle has a great online book on the discrepancy method, ~500 pp., 1999.

7. March 27, 2013 11:26 am

This is an interesting post, I was not aware of the Redheffer matrix and the connection to the Mertens function.

Some notes:

Your observation (Thm 1) is very well known in combinatorial discrepancy theory. The strongest tools we have for giving discrepancy lower bounds do rely on linear algebra. For example, an easy lower bound on discrepancy is $\sigma_{\min}(A_n)$: $\|Ax\|_\infty \geq \frac{1}{\sqrt{n}} \|Ax\|_2 \geq \frac{1}{\sqrt{n}} \|x\|_2 \sigma_{\min}(A) = \sigma_{\min}(A)$. However, this method is hopeless for EDP, because it would also work against $x$ which are zero on a constant fraction of coordinates, and there exist such $x$ that have constant discrepancy. This starts to give an intuition for why EDP is so hard!

On the other hand, a famous theorem of Klaus Roth asserts that the discrepancy for arbitrary arithmetic progressions (i.e. ones that do not need to start at 0) is $\Omega(n^{1/4})$. One proof (I think the original one, but there is a nice presentation both in Chazelle’s and Matousek’s books on discrepancy) does use the smallest singular value observation. However, when we have arbitrary APs, we can reduce the problem to evaluating the eigenvalues of circulant matrices, and that’s relatively easy with a bit of elementary Fourier analysis.

Another proof of Roth’s theorem is due to Lovasz, and it lower bounds the value of the natural SDP relaxation of discrepancy. The SDP relaxation is stronger than the singular value relaxation, and there is hope it might be strong enough to bypass the nasty issue of the existence of x that are 0 on a constant fraction of coordinates. I.e., unlike the singular value relaxation, it might have super constant value for EDP. I would conjecture that the value of the SDP relaxation for EDP is $\Omega(\sqrt{\log n})$, with very little evidence to show for it. A lot of effort in the final parts of polymath 5 went towards finding a suitable relaxation of the SDP relaxation that is amenable to analysis. In particular, there are several threads that explore the approach: look at the SDP dual, assume special properties of a dual solution, look for a dual solution with such properties. Of course, by (weak) convex duality, a dual solution certifies a lower bound.

Finally, let me get back briefly to determinants. If we strengthen discrepancy to hereditary discrepancy (for EDP this means first fix a subset S of the integers and then look at the discrepancy of arithmetic progressions intersected with S), then $|\det(A_n)|^{1/n} = M(n)^{1/n}$ lower bounds hereditary discrepancy (this is a beautiful theorem of Lovasz, Spencer, and Vesztergombi). But it seems that this is too weak again: $M(n)^{1/n}$ is basically 1. However, there is a way to cleverly take submatrices of $A_n$ and successfully use the determinant lower bound to give very strong lower bounds for hereditary discrepancy in the EDP setting: see this writeup by myself and Kunal Talwar (sorry for the shameless plug): http://paul.rutgers.edu/~anikolov/Files/EDP.pdf

March 28, 2013 8:36 am

Thanks for the great comment.

The plug is cool…I was not aware of this work and it looks quite interesting.

dick lipton

PS: The EDP can work with the Redheffer matrix and not the divisor matrix. Can we exploit this?

• April 20, 2013 5:44 pm

Thank you, Dick!

I think you are right that for EDP it’s enough to look at the Redheffer matrix.

Let me define $X = X_n$ to be the matrix that consists of the Redheffer matrices $R_1, \ldots, R_n$ stacked on top of each other (the smaller matrices are filled with zeros so that the dimensions match). I think the way to go is to give a lower bound by exhibiting a dual SDP solution for $X$. That amounts to finding positive diagonal matrices $P$ and $W$ such that $\mathsf{trace}(P) = \mathsf{trace}(W) = 1$ and $X^T P X - d W$ is a PSD matrix, where $d$ is the discrepancy lower bound. We’d like $d = \omega(1)$, but I think even $d = 3$ would be a new result.

8. March 27, 2013 1:50 pm

Regarding the Riemann hypothesis, one thing that I would be happy to see is a detailed blog post and discussion of the state of the art and various approaches, similar to Tao’s post and subsequent discussion on the Navier-Stokes equation.

9. March 28, 2013 10:44 am

maybe a long shot but this reminds me a little of the hurst coefficient