Skip to content

How Hard, Really, is SAT?

September 4, 2016

A new longest computer proof makes us wonder about things from security to the Exponential Time Hypothesis


Marijn Heule, Oliver Kullmann, and Victor Marek are experts in practical SAT solvers. They recently used this ability to solve a longstanding problem popularized by Ron Graham.

Today Ken and I want to discuss their work and ask a question about its ramifications.

The paper by them—we will call it and them HKM—is titled, “Solving and Verifying the boolean Pythagorean Triples problem via Cube-and-Conquer.” The triples problem is a “Ramsey”-like question raised years ago by Graham. Cube-and-Conquer is a method for solving large and complex SAT problems. Sandwiched in between is a clever new tuning of resolution SAT methods called “DRAT” which we discuss in some detail.

The Pythagorean Triples Problem

Ron was interested in a problem that generalizes Schur’s Theorem, due to Issai Schur. Suppose we color the numbers {\{1,\dots,N\}} red and green. Can we always find {a,b,c} three distinct numbers of the same color so that

\displaystyle  a + b = c?

Schur’s theorem says that provided {N} is large enough this is true. Note that another way of putting this is that with {S = \{a,b\}}, all elements of the set {\mathsf{nes}(S)} of nonempty sums from {S} are the same color. Several mathematicians independently proved the extension that there are arbitrarily large sets {S} with this property—indeed for any number of colors:

\displaystyle  (\forall k)(\forall m)(\exists C)(\forall \chi\!:\!\mathbb{N}\!\rightarrow\![k])(\exists S\!\subseteq\![C], |S|\!=\!m)(\exists j\!\in\![k])(\forall s\!\in\! \mathsf{nes}(S))\chi(s)\!=\!j.

All the sums of course are linear. What happens if we go to higher powers {p}?

If we simply look at {p}-th powers of sums from {S} then we tie into the same theorem via {\chi'(x) = \chi(x^p)} for all {x}. Taking sums of {p}-th powers such as {a^p + b^p} is different. We can map that case into the simple sums problem with {S' = \{a^p,b^p\}} in place of {S = \{a,b\}}, but it is not clear how to argue similarly with mapped colorings {\chi'}. Sets of the form {S'} are special.

We can make them even more special by requiring {a^p + b^p} to be a perfect {p}-th power too. OK, for {p \geq 3} we are kidding, but the case {p = 2} and {m = 2} is the famous one of Pythagorean triples. Suppose we color the numbers {\{1,\dots,N\}} red and green. Can we always find three distinct numbers {a,b,c} of the same color so that

\displaystyle  a^{2} + b^{2} = c^{2}?

This is the question Ron asked. In the spirit of Paul Erdős, he offered $100 for a solution.

The answer from HKM is that this extension is true. Perhaps that is not too surprising, since many problems can be generalized from from linear to non-linear cases. But what is really perhaps the most interesting part is that HKM found a proof via using SAT solvers.

The Theorem

The exact theorem HKM prove is:

Theorem 1 The set {\{1,\dots,7824\}} can be partitioned into two parts, such that no part contains a Pythagorean triple, while this is impossible for {\{1,\dots,7825\}}.

Note, this shows that Schur’s theorem does extend from {a+b=c} to {a^{2}+b^{2}=c^{2}}.

What is special about {7,\!825}? According to this list of triples up to {c = 10,\!000}, which is linked under “Integer Lists” from Douglas Butler’s TSM Resources site, there are seven Pythagorean triples involving {7,\!825}:

\displaystyle  \begin{array}{rcl}  625^2 + 7800^2 &=& 7825^2\\ 1584^2 + 7663^2 &=& 7825^2\\ 2191^2 + 7512^2 &=& 7825^2\\ 2784^2 + 7313^2 &=& 7825^2\\ 4180^2 + 6615^2 &=& 7825^2\\ 4695^2 + 6260^2 &=& 7825^2\\ 5180^2 + 5865^2 &=& 7825^2 \end{array}

There are five distinct entries for {c = 7,\!820}, various others before it, and quite a few for {7,\!830} after it. Nothing, however, shouts why {7,\!825} is a barrier. It seems better to think of it as a tipping point.

The Proof

There are {2^{7,\!825}} colorings for the numbers up to {7,\!825}. This immediately stops any simple brute-force approach. What must be done is to break the immense number of cases down to a more manageable number. HKM did this by clever use of known SAT methods with at the addition of heuristics that are tailored to this question.

The previous best positive result had been a 2-coloring of {[1 .. 7,\!664]} with no monochromatic triple, so HKM had a good idea of how large an {N} to try. The SAT encoding is simple: use variables {x_i} for {2 \leq i \leq N}, and for every {(a,b,c)} such that {a^2 + b^2 = c^2}, include the clauses

\displaystyle  (x_a \vee x_b \vee x_c) \wedge (\bar{x}_a \vee \bar{x}_b \vee \bar{x}_c).

If we give {x_i = 1} the meaning that the number {i} is colored green, then this says that for every Pythagorean triple, at least one member must be green and at least one must be red. 3SAT remains NP-complete for clauses of all-equal sign, as follows by tweaking the proof at the end of this post.

Now with {N = 7,\!825}, what HKM needed to do was to prove the formula unsatisfiable. Proving satisfiability is easy when you know or guess a satisfying assignment—in this case, a coloring. The following graphic from the Nature article on their work shows a coloring for {7,\!824} in which the white squares are “don’t-cares”—they can be either color:

HKMsolution-7824 copy

The top row goes 24 squares; the cell after them is the sticking point. How to prove there is no consistent way to color it? Given the formula {F = C_1 \wedge C_2 \wedge \cdots \wedge C_m}, it may be hard to recognize that it entails a contradiction. The general idea, roughly speaking, is to add more clauses {C} to make a formula {F'} so that:

  1. Unsatisfiability of {F'} implies unsatisfiability of {F}.

  2. {F'} is “stuffed enough” with clauses (and pared down in other regards) so that a contradiction can be derived after feasible effort.


Besides good guesses for {N}, HKM were armed with the latest knowledge on well-performing heuristics. A 2012 paper by Matti Järvisalo with Heule and Armin Biere includes an overview of resolution-related properties involving a big array of acronyms such as AT and HBC and RHT. The AT stands for “asymmetric tautology” and the ‘R’ prefix applied to a formula property {P} enlarges {P} by adding cases where a certain kind of resolution yields a formula with {P}. Combining these two yields the following definition—we paraphrase the newer paper’s version informally:

Definition 2 Given a formula {F} and a clause {C} not in {F}, say {C} has RAT via a literal {z} in {C} if for all clauses {D} of {F} containing {\bar{z}} the following happens: when you make the other literals in {C} and {D} false, remove {D}, and simplify, you get an immediate contradiction.

We should say more about “simplify”: Suppose {z_1,\dots,z_k} are those other literals. Making them false is the same as making the formula

\displaystyle {F_0 = F \setminus \{D\} \wedge \bar{z}_1 \wedge \cdots \wedge\bar{z}_k,}

which has their negations as unit clauses. We simplify by removing for each {i} all clauses with {z_i} (those were satisfied) and deleting {\bar{z}_i} from other clauses. After doing this, there may be other unit clauses, whereupon we repeat. If we get both {y} and {\bar{y}} for some variable {y}, that’s the immediate contradiction we seek. What’s important is that this unit resolution process, while “logically” inferior to full resolution, stops in polynomial time.

Now suppose {F} is satisfiable and {C} has RAT via {z}. If there is a satisfying assignment {a} of {F} that sets {z} or one of the other literals in {C} true, then {F' = F \wedge C} is also satisfied. So suppose {a} sets them all false. Now there must exist a clause {D} in {F} containing {\bar{z}} such that {a} sets the other literals in that clause false—if none then we could have set {z} true after all. Then {F_0} above is satisfied by {a}, but this is a contradiction of the (immediate) contradiction.

Note also that if {F' = F \wedge C} is satisfiable then of course so is {F}. This isn’t important to the unsatisfiability proof but is good to know: RAT clauses can be added freely. The trick is to find them. Definition 2 was crafted to make it polynomial-time recognizable that {C} is RAT when you have it, but you still have to find it. A particularly adept choice of {C} may allow simplifications that delete other clauses, yielding a technique called DRAT for “Deletion Resolution Asymmetric Tautology” proofs.

This is where the other ingenious heuristics—tailored for the triples problem but following a general paradigm called “cube-and-conquer”—come in. We’ll refer those details to the paper and its references, but this breakthrough should make one excited to read more about the state of the art.

Dart the Art?

The problem took “only” two days of computing on a supercomputer—the Texas Advanced Computing Center. The computation generated 200 terabytes of raw text output. It is not clear to us whether even more intermediate text was generated on-the-fly as unsuccessful moves were backtracked-out, or how much. HKM say in their abstract:

…Due to the general interest in this mathematical problem, our result requires a formal proof. Exploiting recent progress in unsatisfiability proofs of SAT solvers, we produced and verified a proof in the DRAT format, which is almost 200 terabytes in size. From this we extracted and made available a compressed certificate of 68 gigabytes, that allows anyone to reconstruct the DRAT proof for checking.

As with all computer proofs we still would like a human-readable proof. It is not that we do not trust the validity of the current proof, but rather that we would like to “understand” if possible why Ron’s problem is answered. Can we possibly extract from the certificate a dart of reasoning that yields a shorter explanation? It might be a numerical potential function whose values {v} in this case are guessable and verifiable, such that {v > t} for some threshold {t} analytically implies unsatisfiability.

We also wonder why the size-{N} formulas treated here should be any more difficult than ones you can get for factoring {N}-bit numbers. As we noted above, the all-signs-equal condition on the literals comes without loss of generality. So the degree of ease that allowed solving on a university center in two days must come from how the Pythagorean pattern gave a leg up to “cube-and-conquer.” For factoring there might be other legs—and the “{N}” from current security standards might yield even smaller formulas.

Last, as noted in the papers we’ve linked, the DART condition has universality properties with regard to resolution in general, yet builds on steps that are in polynomial-time {p(n)}. It was an incremental liberalization of previously used steps, and this makes us wonder whether it can be enhanced further while still yielding proofs that take up {g(n)p(n)} length and time. Perhaps we can get the {g(n)} part to be {2^{o(n)}}? That would refute some forms of the “Exponential Time Hypothesis,” which we last discussed here.

Open Problems

The most immediate questions raised by this wonderful work are: what about other equations, and what about allowing more colors? Does having three colors zoom the problem beyond any hope of attack by today’s computers, or will the practical breakthroughs continue a virtuous cycle with advances in theory that bring more cases into the realm of feasibility? Is there an asymptotic analysis that might guide our ability to forecast this?

[fixed typo in SAT encoding and struck “not” between “does” and “extend”]

11 Comments leave one →
  1. September 5, 2016 3:48 pm

    Ah, pardon—I deleted a typo-query comment as requested, but that also deleted the reply. There was no typo of the kind queried; in another section after the theorem statement I did delete (a minute after posting) a “not” which I’d incorrectly inserted between “does” and “extend” while doing last-minute edits.

  2. September 6, 2016 10:17 am

    I wonder are generalizations like {a^{2}+b^{2}+c^{2}=d^{2}} interesting?

    For example it looks like {105} is unsat for this, and {37} is unsat for
    the sums of four squares, then
    {23, 18, 20, 20, 15, 16, 20, 23, 17, 21, 26, 17, 23, …} (diverge?)

    And cubes etc?

  3. September 7, 2016 8:09 pm

    💡❗⭐😮😎😀❤ havent analyzed this in detail yet but its clearly breakthru level work & think it heralds a deeper reality with more general techniques whereas nearly nobody else seems to be on to this yet. maybe the tip of a future iceberg. (need to blog on this asap!) the key issue is how finite instances of SAT can lead to a sort of machine-based induction. also think machine learning techniques will play a big role here. theres also the task of attempting to simplify/ streamline the proofs which will always involve further human conceptual/ aesthetic work. this post from 2014 describes "SAT induction"

  4. September 7, 2016 8:12 pm

    oh yeah, just remembered, coincidentally revisted the topic recently & was trying for a proof of bertrands postulate via SAT induction & got the beginnings but needs more work. any takers? think it surely cant be that hard & could point the way toward deeper theory. also one wonders if this might work on other notorious unresolved number theory problems eg Collatz

  5. September 8, 2016 11:54 pm

    Typo: where you introduce the SAT encoding you say $a^2+b^2+c^2$, but it should be $a^2+b^2=c^2$.

  6. Curious permalink
    September 10, 2016 2:41 pm

    So is SAT easy?


  1. math celebration 2016: pythagorean triples empirical attack conquest | Turing Machine
  2. A Proof Of The Halting Theorem | Gödel's Lost Letter and P=NP

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s