Why the recent breakthrough is important


Yitang Zhang, of the University of New Hampshire, has apparently proved a finite approximation to the famous Twin Prime Conjecture. This is a result of the first order. After ten days of progressively more detailed news, including today’s article in the New York Times, Zhang’s 56-page preprint has just been released on the Annals of Mathematics journal website. This may be a revision of the original submission, which was said in a story in Nature last week to have needed “a few minor tweaks.”

Today Ken and I want to explain important aspects of the Twin Prime Conjecture.

Recall that the Twin Prime Conjecture states that there are an infinite number of primes ${p}$ and ${q}$ that are as close as possible: ${p-q = 2}$. Well ${3}$ and ${2}$ are closer, but that can only happen once, so the best one can hope for is primes that are within two of each other.

Zhang’s beautiful result is that there are an infinite number of primes ${p}$ and ${q}$ so that ${p-q = 2N}$ and ${N}$ is bounded by an absolute constant. The constant is large—in the tens of millions—but it is a constant. Perhaps we should call these “Cousin Primes,” since they are not within two; I will leave the naming of them to the number theorists. Whatever you call them, his result is a huge improvement to what was previously proved, and is a singular advance.

The proof is long, which is not unexpected. Ken saw the news of the paper’s release earlier today on Terry Tao’s Google+ page about the work, which gives some idea of how the proof goes. There are many links and comments in a post by Peter Woit that also mentions a recently announced proof by Harald Helfgott of the “ternary Goldbach conjecture” that every odd number above 5 is the sum of three primes.

So what can we possibly add to the discussion about Zhang’s breakthrough? Nothing on the proof right now. Something, however, on why the Twin Prime Conjecture can be really useful. This is all from a personal point of view, but one that I hope you will enjoy. Let’s first take a quick look at what was known before his work, and then discuss what it may be useful for.

## Before Zhang

One measure of the density of the primes is that the summation

$\displaystyle \sum_{p} \frac{1}{p}$

does not converge. That is, the sum

$\displaystyle \sum_{p \leq x} \frac{1}{p}$

tends to infinity as ${x}$ tends to infinity. The growth is slow, but the sum does diverge. In 1915, Viggo Brun used sieve methods to prove that twin primes were rarer in a precise sense: the summation over twin primes

$\displaystyle \sum_{p\,:\,p+2 \text{ prime}} \frac{1}{p}$

converges. Indeed his result can be improved to show that the number of twin primes less than ${x}$ is bounded above by

$\displaystyle O\left(\frac{x}{(\log x)^{2}}\right).$

Using heuristic arguments, Godfrey Hardy and John Littlewood guessed that not only are there an infinite number of twin primes, but that their density is close to what a “random” model would predict. Let ${\pi_{2}(x)}$ be the number of twin primes less than ${x}$—recall ${\pi(x)}$ is used to denote the number of primes less than ${x}$—then the Hardy-Littlewood Conjecture is that

$\displaystyle \pi_{2}(x) \approx C\left(\frac{x}{(\log x)^{2}}\right),$

for the explicit constant ${C = 2\,\Pi_{2}}$, where

$\displaystyle \Pi_{2} = 0.6601618158468695739278121100145\dots$

is the twin prime constant.
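As a quick sanity check, one can count the twin primes below a modest bound and compare with the first-order Hardy-Littlewood prediction ${2\,\Pi_{2}\,x/(\log x)^{2}}$, where ${\Pi_{2} = 0.66016\dots}$ is the twin prime constant. This little sketch is mine, not from the post:

```python
from math import log

def sieve(n):
    """Boolean primality table for 0..n via the sieve of Eratosthenes."""
    is_p = [True] * (n + 1)
    is_p[0] = is_p[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if is_p[i]:
            for j in range(i * i, n + 1, i):
                is_p[j] = False
    return is_p

x = 10 ** 6
is_p = sieve(x)

# pi_2(x): primes p with p + 2 also prime, both at most x
pi2 = sum(1 for p in range(2, x - 1) if is_p[p] and is_p[p + 2])

# first-order Hardy-Littlewood prediction 2 * Pi_2 * x / (log x)^2
PI2 = 0.6601618158468696
predicted = 2 * PI2 * x / log(x) ** 2

# the crude x/(log x)^2 form undercounts; the logarithmic-integral
# version of the conjecture tracks the true count more closely
print(pi2, round(predicted))
```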

Tao is on record as saying that certain approaches based on sieve theory cannot resolve the Twin Prime Conjecture—see this for a short discussion. Mark Lewko, in a comment to a MathOverflow thread on Zhang’s paper, indicates that its mechanism alone cannot reduce the gap below ${16}$, and it does not circumvent a more general obstacle to gaps below ${6}$. However, even if Zhang’s new techniques do not overcome such general barriers, at least they push against them with a lot more oomph.

## Another Problem

Years ago I worked on a problem that had nothing directly to do with the Twin Prime Conjecture. The question is a fundamental one about the complexity of Boolean functions. It is not classic, has not been open for a very long time, and could be trivial. But like many problems about Boolean functions it turned out to fight back hard, and the best we could do was to make a small dent in the problem.

The work was joint with Mihail Kolountzakis, Evangelos Markakis, Aranyak Mehta, and Nisheeth Vishnoi. See the paper for details. Indeed while I helped start the research with Markakis, Mehta, and Vishnoi, without Kolountzakis the work would have never been completed. Our result was then greatly improved by Amir Shpilka and Avishay Tal, as we covered here.

The problem is concretely about Boolean functions ${f}$ of ${k}$ variables, and seems not to involve prime numbers at all. For any subset ${S}$ of the coordinates, the corresponding Fourier coefficient is given by

$\displaystyle \hat{f}(S) = \frac{1}{2^k} \sum_{x \in \mathbb{Z}_2^k} f(x)\chi_S(x)$

where ${\chi_S(x)}$ is ${-1}$ if ${\sum_{i \in S} x_i}$ is odd, and ${+1}$ otherwise. The problem is:

What is the smallest value ${\tau(k)}$ such that for every symmetric 0-1 function ${f(x_1,\dots,x_k)}$ on ${\mathbb{Z}_2^k}$ that is not affine linear—by symmetry, this means neither constant nor a parity function—some ${\hat{f}(S)}$ with ${1 \leq |S| \leq \tau(k)}$ does not vanish?
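The definitions above are easy to check by brute force for small ${k}$. The following sketch is my own illustration, not code from the paper; it computes ${\hat{f}(S)}$ directly from the formula and finds the smallest level with a nonvanishing coefficient:

```python
from itertools import combinations, product

def fourier_coefficient(f, S, k):
    # \hat{f}(S) = 2^{-k} * sum_x f(x) * chi_S(x),
    # where chi_S(x) = (-1)^{sum_{i in S} x_i}
    total = 0
    for x in product((0, 1), repeat=k):
        chi = (-1) ** sum(x[i] for i in S)
        total += f(x) * chi
    return total / 2 ** k

def min_nonvanishing_level(f, k):
    # smallest size 1 <= |S| <= k with \hat{f}(S) != 0 (None if all vanish)
    for size in range(1, k + 1):
        for S in combinations(range(k), size):
            if abs(fourier_coefficient(f, S, k)) > 1e-12:
                return size
    return None

# majority on 3 bits: symmetric, neither constant nor a parity function
maj3 = lambda x: 1 if sum(x) >= 2 else 0
# parity on 3 bits: affine, so excluded by the problem statement;
# its only nonzero coefficient with |S| >= 1 sits at |S| = 3
par3 = lambda x: sum(x) % 2

print(min_nonvanishing_level(maj3, 3), min_nonvanishing_level(par3, 3))
```

For majority the answer is level ${1}$, while parity shows why affine functions must be excluded: its low-level coefficients all vanish.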

## The Prime Gap Trick

A heuristic that I have used before is this: When trying to prove some theorem in number theory, assume any reasonable property that you need about primes. Either you will at least get a conditional theorem, or you might later be able to weaken the assumption you made. The latter is what happened to us. Or you might be luckier and get your theorem to follow both from the assumption and its negation, so that you don’t need it at all. I noted once how Littlewood did this with the Riemann Hypothesis—incidentally that post was about a theorem of Ravi Kannan, and I am attending a birthday workshop in his honor later this week.

In this case we reduced the problem to showing that a certain integer-valued polynomial is constant over the set ${\{0,1,\dots,k\}}$. Then we expressed the connection in the paper in these terms:

First, we show that ${P}$ is constant over the union of two small intervals ${\{0, ..., t\} \cup \{k - t, ..., k\}}$. This is obtained by looking at ${P}$ modulo carefully chosen prime numbers. One way to prove this (at least infinitely often) would be to assume the twin primes conjecture… We manage to replace [this] by choosing four different primes in a more involved manner…

Thus we actually did something else besides use twin primes, but this is how we got the idea. Moreover, Shpilka and Tal used gaps between consecutive primes in a different way, obtaining bounds in terms of the largest such gap in the interval ${1\dots k}$, which is known to be ${O(k^{0.525})}$. If we can now get our hands on enough cases where the gaps are small, maybe we can improve the estimates further. Why is it useful to have primes with small gaps?

We have covered the use of the Chinese Remainder Theorem for analytical results before. Usually for complexity-theoretic applications such as this one, we want the primes themselves to be small—and don’t mind having a bunch of primes. For logspace we can work with polynomially-many primes in a sequential manner, so long as each needs only ${O(\log n)}$ bits to store.

When we don’t need size to be so constrained, however, it can be more useful to have the gap between primes ${p,p+c}$ in the representation be small. Then if we know in advance that values ${m = P(x)}$ are below ${p^2/c}$, we know that the values ${m \bmod{p}}$ and ${m \bmod{p+c}}$ have to be close in an absolute sense. In particular, they cannot be closer than ${c}$ to each other unless ${m < p}$, making them equal—and in our case we would get them all to be zero. For higher ranges of ${m}$ one retains a lot of this control. The larger ${p}$ and the smaller ${c}$, the bigger the effective range.
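This residue control is easy to verify numerically. The sketch below is mine, using an arbitrary twin-prime pair ${p, p+c}$: for every ${m}$ below about ${p^{2}/c}$, the residues mod ${p}$ and mod ${p+c}$ coincide exactly when ${m < p}$, and otherwise differ by at least ${c}$.

```python
def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# an arbitrary prime pair with small gap c (here the twins 101 and 103)
p, c = 101, 2
assert is_prime(p) and is_prime(p + c)

# Writing m = q*p + r with q < p/c, the residue mod p+c is either
# r - q*c or r - q*c + (p + c); so the two residues agree exactly
# when q = 0 (i.e. m < p), and otherwise are at least c apart.
for m in range(p * p // c):
    r1, r2 = m % p, m % (p + c)
    if m < p:
        assert r1 == r2
    else:
        assert abs(r1 - r2) >= c

print("residue control verified for all m <", p * p // c)
```

So if the values are known to be small and their two residues are ever within ${c}$ of each other, the residues must actually be equal, which is exactly the leverage described above.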

We will hence be further interested in how dense the pairs of “cousin” primes must be, and how efficiently they can be generated. Anytime there is a breakthrough, it is time to revisit old ideas and see whether they too can profit.

## Open Problems

How far can the gap between consecutive primes, for infinitely many such pairs, be reduced? Do this and Helfgott’s result on Goldbach herald a more general breakthrough in number theory?


Can we help avoid parallel repetition of mistakes?

Irit Dinur has recently again shown a wonderful skill at re-conceptualizing an area that had seemingly been well worked out. A notable previous instance was her re-casting the proof of the PCP Theorem as a progressive amplification. Now she and David Steurer have posted a new paper whose title, “Analytical Approach to Parallel Repetition,” introduces a new framework. The subject of parallel repetition, which they call “a basic product operation for one-round two-player games,” is distinctive in having its genesis in a mistake made in a paper—a trap of automatic thinking. Lance Fortnow and Mike Sipser thought that executing multiple instances of a randomized protocol in parallel would have the same exponential reduction in the probability of overall failure as executing them independently in sequence, overlooking how the strategies in the parallel cases could be dependent and exploit this.

Today we talk about similar traps that people can fall into even at advanced graduate level. How might they be avoided?

Finally progress on this annoying problem

David Rosenbaum is right now the world expert on one of my favorite problems, group isomorphism. He is a third-year PhD student at the University of Washington in Seattle under Paul Beame, and has been visiting MIT this year to work with his other advisor, our familiar friend Aram Harrow. He presented a paper on this at SODA 2013, and recently posted a successor paper to the arXiv. He grew up in the Portland area, where he won prizes in scholastic chess tournaments.

Today I want to talk about his work, which not only advances our understanding of this problem, but also makes progress on other ones.

The discrete log and the factoring problem

Antoine Joux is a crypto expert at Versailles Saint-Quentin-en-Yvelines University. He is also one of the crypto experts at CryptoExperts, having joined this startup company last November. His work is featured in all three of the company’s current top news items, though the asymptotic breakthrough on the exponent of finding discrete logs in small-characteristic fields which we covered last month is not among them. In its place are concrete results on two fields of medium characteristic (between a million and a billion) whose elements have bit-size 1,175 and 1,425. The news release on this concludes (emphasis in original):

[We] recommend to all cryptographic users to stop using medium prime fields.

Today I want to talk about a mystery, which I find the most puzzling problem in all of complexity theory, but which Ken thinks is “only” a sign of youth of the field.

How linear algebra can make databases go really fast

Mike Stonebraker is one of the world’s leading experts on database technology. He started in academe at Berkeley, is now again in academe at MIT, and has launched a raft of successful companies. He is currently the co-founder and chief technology officer of at least three startups in the database area. One is called “Data Tamer” and is a joint venture with researchers from QCRI—Qatar Computing Research Institute—see this release.

Today I would like to talk about his presentation at the TTI Vanguard meeting on “Ginormous Systems” in DC. His talk was on “Seven Big Data Megatrends.”

By the way, the word “Ginormous” is a real word; see here for the formal definition. I initially thought the Vanguard organizers had made up the word, but it is real. It should be obvious that Ginormous means large, actually really Large. This Vanguard meeting was dedicated to Ginormous systems of all kinds: from huge data centers, to city-wide systems, to supercomputers, and much more.

In Mike’s wonderful talk he made seven points about the past, present, and future of database technology. He has a great track record, so he is likely mostly right in his guesses. One of his predictions was about a way of re-organizing databases that has several remarkable properties:

• It speeds up database operations 50x. That is to say, on typical queries—ones that companies actually do—it is fifty times faster than classical database implementations. As theorists we like speedups, especially asymptotic ones. But 50x is pretty cool. That is enough to change a query from an hour to a minute.
• It is not a new idea. But the time is finally right, and Mike predicts that future databases will use this method.
• No one seems to know who invented it. I asked Mike, I asked other experts at the conference, and all shrugged and said effectively: “I have no idea.” Curious.

Let’s look quickly at the way databases work, and then consider the trick.

## Some Background

Modern databases store records—lots of them, usually on disks. A record is a fixed-size vector of information. The vector is divided into fields, where each field stores one type of information. An example is:

$\displaystyle [name, address, home\text{-}number, cell\text{-}number, age, \dots ].$

Thus the first field is the person’s name, the next the address, and so on.

In a sense the data is really stored in a table—or an array if you wish to be mathematical—call it ${D}$ for data. The rows contain each record, and the columns store the fields.

The issue is how the array is stored on the disk. In the classic layout, the records are stored one after the other:

$\displaystyle R_{1},R_{2},\dots,R_{N}.$

Here each ${R_{i}}$ is the ${i^{th}}$ row.

This is a reasonable method: it keeps each record together and allows fast access to all of the records. Thus, a query can scan over all the records by reading the disk one track at a time. This is not a bad way to use a disk-like device.

Mike points out that all the classic database systems—well, at least most—store records in this manner. Their code, which is also huge (if not ginormous), is tuned to handle data that is stored in this manner. Let’s call it the “record ordered method” (ROM). As a mathematical idea it is just storing the array in row-major order. Not only is this a perfectly fine way to organize the data and to store the array, it respects principles that go back to COBOL in the 1950s: each data object should be conceptually and physically together.

But there is a better way.

## The Trick

The trick to the 50x speedup is based on the deep, advanced, complex operation that we in math call the transpose of a matrix. Just kidding. It is based on the simple idea that instead of storing the matrix ${D}$ we store the matrix ${D^{\intercal}}$. Recall ${D^{\intercal}}$ is just the matrix defined by

$\displaystyle D^{\intercal}(j,i) = D(i,j).$

Let’s call this the column ordered method: COM. Now the data on the disk contains

$\displaystyle C_{1},C_{2},\dots,C_{M}.$

Here each ${C_{j}}$ is the ${j^{th}}$ column.

So why is this method so much faster than the ROM? The answer lies in how the data is accessed by queries. The data is read much more than it is written, so the key is to speed up reading. But the critical insight is this:

A query is likely to use only a few columns.

For example, suppose the query is:

Select all the records with age in the range [21,31] and cell phones with area code 404.

Then the query needs to look at only two columns. All the other fields are completely unneeded.

Now suppose the records have a hundred fields. Since the query looks at only two of them, it reads roughly ${2/100}$ of the data, which is where the 50x comes from. In the COM the database algorithm reads only the data it needs to answer the query. In the ROM method it reads all the data, which tremendously slows down the query. Note, things can be even worse, since the sizes of fields can vary widely. So the true speedup depends on the ratio of

$\displaystyle \frac{\text{number of bits used in query}}{\text{number of bits in the whole record}}.$

Clearly if a record has even one large field that is not used in the query, the speedup could be very large.
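A toy sketch makes the ratio concrete. The schema and data below are made up for illustration (this is not Mike’s system or any real database engine); it counts the bytes a scan must touch under ROM versus COM for the two-field query above:

```python
FIELDS = ["name", "address", "home", "cell", "age"]

# hypothetical records: names, addresses, and phone numbers are invented
records = [
    {"name": f"person{i}",
     "address": f"{i} Peachtree St, Atlanta GA",
     "home": f"770-555-{i % 10000:04d}",
     "cell": f"{'404' if i % 3 == 0 else '212'}-555-{i % 10000:04d}",
     "age": 18 + (i * 7) % 60}
    for i in range(1000)
]

# ROM: rows stored contiguously; a scan reads every field of every record
rom_bytes = sum(len(str(r[f])) for r in records for f in FIELDS)

# COM: each column stored contiguously; this query reads only "age" and "cell"
com = {f: [r[f] for r in records] for f in FIELDS}
com_bytes = sum(len(str(v)) for f in ("age", "cell") for v in com[f])

hits = [i for i in range(len(records))
        if 21 <= com["age"][i] <= 31 and com["cell"][i].startswith("404")]

print(f"ROM scans {rom_bytes} bytes, COM scans {com_bytes} bytes "
      f"(about {rom_bytes / com_bytes:.0f}x less), {len(hits)} matches")
```

With this made-up schema the two needed columns are only a few times smaller than the whole record; with a hundred fields, or one large text field, the gap widens accordingly.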

How did people not realize this simple idea: replace the table ${D}$ by its transpose ${D^{\intercal}}$? Well they did not actually miss it, but its power was not realized until relatively recently.

## Whose Trick?

As I stated earlier, no one seems to be able to say who exactly discovered the COM. Maybe as a default we could call it the Gauss Database Method, since most things are named for him. I did track down a system called TAXIR that was essentially a COM storage system, with a focus on information retrieval in biology, in 1969. The paper describing it is by George Estabrook and Robert Brill. Maybe they invented it. Perhaps their focus on biology made it hard for those in databases to notice their work? Especially years ago, before powerful on-line search engines. Perhaps.

Ken adds that in a textbook used years ago for Buffalo’s course on programming-language concepts, the COM idea was called “parallel arrays” and was frowned upon. The main reason given was that this structure was hard to maintain, as a single off-by-one indexing error in one array could damage the entire set of records. However, a high-level system can maintain the data in-sync, while modern machine architectures increase the reward for keeping just the data you need in caches.

## Open Problems

Okay, maybe the trick is not worth a trillion dollars. But the total amount invested yearly in data systems suggests that the column idea could over the next few years be worth quite a few dollars.

A simple thought: Is the column method the best way to store records? Can we in theory prove it is best in some sense, or is there an even better method? So forget the million-dollar Clay prizes and go after the real money.

Wang on Gödel, and Gödel on Wang


Hao Wang was a logician who made many important contributions to mathematics and especially logic. His creation of the now famous tiling problem was of great importance. He did seminal work on mechanical mathematics, receiving in 1983 the first Milestone Prize for Automated Theorem-Proving. Perhaps one of his best “results” was being the PhD advisor to Stephen Cook, Shimon Even, Stål Aanderaa, Joyce Friedman, and a recent invention-prize co-winner whom we mention below.

Today Ken and I wish to point out that it is Kurt Gödel’s birthday.

Okay, no sex, but a discussion about quantum computers.

Steven Soderbergh directed the famous movie Sex, Lies, and Videotape. The 1989 film won the Palme d’Or at that year’s Cannes Film Festival, and is now in the United States Library of Congress’ National Film Registry for being “culturally, historically, or aesthetically significant.”

Today I want to explore claims about quantum computation.