
Arturs Backurs and Piotr Indyk are student and advisor. The latter, Piotr, is one of the leading algorithms and complexity theorists in the world—what an honor it must be for Arturs to work with him as his advisor.

Today Ken and I want to talk about their paper on edit distance and an aspect that we find puzzling.

The paper in question is, “Edit Distance Cannot Be Computed in Strongly Subquadratic Time (unless SETH is false).” It appears in the upcoming STOC 2015.

What we find puzzling is that the beautiful connection it makes between two old problems operates between two different levels of scaling, “${n}$” and “${2^n.}$” This messes up our intuition, at least mine.

I, Dick, have thought long and hard, over many years, about both the edit distance problem and about algorithms for satisfiability. I always felt that both problems should have algorithms much better than the “obvious” ones. However, I was much more positive about our ability to make a breakthrough on computing the edit distance than to do the same for satisfiability.

The way of linking the two problems is to me quite puzzling. Quoting my favorite band, the Talking Heads:

… Now don’t you wanna get right with me?
(puzzling evidence)
I hope you get ev’rything you need
(puzzling evidence)

Puzzling evidence
Puzzling evidence
Puzzling evidence …

## The Problems

The edit distance between two strings is defined as the minimum number of insertions, deletions, or substitutions of symbols needed to transform one string into another. Thus CAT requires three substitutions to become ATE, but it can also be done by one insertion and one deletion: pop the C to make AT and then append E to make ATE. Thus the edit distance between these two words is 2. The problem of computing the edit distance occurs in so many fields of science that it is hard to figure out who invented what first. The case of strings of length ${N}$ is easily seen to be computable in time quadratic, ${O(N^{2})}$, by a dynamic programming algorithm that builds up edit distances between initial substrings.
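The quadratic dynamic program is short enough to give in full. Here is a sketch in Python (the function name is ours):

```python
def edit_distance(s, t):
    """Classic dynamic program: D[i][j] is the edit distance
    between the first i symbols of s and the first j of t."""
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i          # delete all of s[:i]
    for j in range(m + 1):
        D[0][j] = j          # insert all of t[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if s[i - 1] == t[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,        # delete s[i-1]
                          D[i][j - 1] + 1,        # insert t[j-1]
                          D[i - 1][j - 1] + sub)  # substitute or match
    return D[n][m]

print(edit_distance("CAT", "ATE"))  # 2
```

Each of the ${(N+1)^2}$ table entries takes constant time, giving the ${O(N^2)}$ bound.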

Chak-Kuen Wong and Ashok Chandra proved this is optimal in the restricted model where one can only compare characters to each other. There are algorithms that beat quadratic by logarithmic factors—they essentially treat blocks of characters as one. But it remains open after much research whether there is an algorithm that runs in time of order ${N^{1.999998}}$, for example.

The ${\mathsf{SAT}}$ problem is the usual question of testing Boolean clauses to see if they can all be satisfied at the same time by the same assignment to the Boolean variables. ${\mathsf{CNF}\text{-}\mathsf{SAT}}$ restricts to formulas in conjunctive normal form, and ${k\text{-}\mathsf{SAT}}$ restricts to clauses with at most ${k}$ literals per clause.

## The Connection

Backurs and Indyk prove that if there exists ${\delta > 0}$ such that edit distance can be decided in time ${O(N^{2 - \delta})}$, then there exists ${\epsilon > 0}$ such that ${\mathsf{CNF}\text{-}\mathsf{SAT}}$ for formulas with ${n}$ variables and ${m}$ clauses can be solved in time ${m^{O(1)}2^{(1-\epsilon)n}}$. They build on a connection to SETH shown ten years ago by Ryan Williams in part of a wider-ranging paper.

The basic idea is how ${\mathsf{CNF}\text{-}\mathsf{SAT}}$ behaves with regard to partitions of the ${n}$ variables of a formula ${\phi = C_1 \wedge\cdots \wedge C_m}$ into two size-${n/2}$ subsets, call them ${X_1}$ and ${X_2}$. Let ${A}$ be the set of assignments to ${X_1}$ and ${B}$ to ${X_2}$. For every assignment ${a \in A}$, let ${S_a}$ be the set of clauses it satisfies, and ${T_a}$ the remaining clauses which it does not satisfy. Similarly define ${S_b}$ and ${T_b}$ for ${b \in B}$. Then

$\displaystyle \phi \in \mathsf{SAT} \iff (\exists a,b)[|S_a \cup S_b| = m] \iff (\exists a,b)[T_a \cap T_b = \emptyset].$

Now let us identify ${a}$ with ${T_a}$ regarded as an ${m}$-bit vector and similarly ${b}$ with ${T_b}$, re-labeling ${A}$ to be the set of ${N = 2^{n/2}}$-many vectors ${T_a}$ and ${B}$ likewise for the ${T_b}$. Then as Williams observed, ${\phi}$ is satisfiable if and only if we have a yes-instance of the following problem:

Orthogonal Vectors Problem (${\mathsf{OVP}}$): Given two sets ${A,B}$ of ${N}$-many length-${m}$ binary vectors, are there ${a \in A}$ and ${b \in B}$ such that ${\sum_{i=1}^m a_i b_i = 0}$?
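To make Williams’s observation concrete, here is a small Python sketch (our own encoding, not from either paper) that builds the sets of vectors ${T_a}$ and ${T_b}$ and then runs the obvious all-pairs orthogonality check:

```python
from itertools import product

def sat_via_ovp(clauses, n):
    """Decide CNF-SAT via the split-and-OVP trick (a sketch).
    Clauses are lists of nonzero ints: +v / -v is a positive/negative
    literal on variable v in 1..n; n is assumed even."""
    half = n // 2

    def unsat_vector(bits, offset):
        # Entry i is 1 iff this half-assignment (to variables
        # offset+1 .. offset+len(bits)) fails to satisfy clause i;
        # this is the vector T_a (or T_b) from the text.
        vec = []
        for clause in clauses:
            sat = any(offset < abs(lit) <= offset + len(bits)
                      and (lit > 0) == bits[abs(lit) - offset - 1]
                      for lit in clause)
            vec.append(0 if sat else 1)
        return vec

    # A and B each hold N = 2^(n/2) vectors of length m = len(clauses).
    A = [unsat_vector(bits, 0) for bits in product((False, True), repeat=half)]
    B = [unsat_vector(bits, half) for bits in product((False, True), repeat=half)]

    # OVP step: phi is satisfiable iff some T_a and T_b are orthogonal,
    # i.e. no clause is missed by both halves.
    return any(sum(x * y for x, y in zip(a, b)) == 0
               for a in A for b in B)

# (x1 or x2) and (not-x1 or not-x2) is satisfiable, e.g. x1=True, x2=False
print(sat_via_ovp([[1, 2], [-1, -2]], 2))  # True
```

Of course this brute-forces the ${\mathsf{OVP}}$ step in time ${O(N^2 m)}$ with ${N = 2^{n/2}}$; the whole point is that any ${N^{2-\epsilon}}$-time replacement for the final check would break SETH.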

Obviously ${\mathsf{OVP}}$ can be solved in time ${O(N^2 m)}$ by trying all pairs. The nub is what happens if we can achieve anything even slightly better than quadratic in the exponent, say time ${N^{2-\epsilon} m^{O(1)}}$. Then with ${\epsilon' = \epsilon/2}$ we get time

$\displaystyle \left(2^{n/2}\right)^{2-\epsilon} m^{O(1)} = 2^{n - \epsilon' n} m^{O(1)}$

for ${\mathsf{CNF}\text{-}\mathsf{SAT}}$, which contradicts SETH.

What’s puzzling is that the evidence against doing better than quadratic comes when ${N}$ is already exponential, ${N = 2^{n/2}}$. Moreover, the ${\mathsf{OVP}}$ instances ${(A,B)}$ involved are ridiculously large, exponentially sized, and we don’t even care that they have a succinct genesis in terms of ${\phi}$. (Note that we have swapped the letters ${n}$ and ${N}$ from their paper—we find it helpful to keep “${N}$” the larger one.)

Backurs and Indyk itemize several problems to which this connection has been extended since 2010, but we agree that Edit Distance (${\mathsf{ED}}$) is the most striking addition to the list. Their new result is a kind of “SETH-reduction” from ${\mathsf{OVP}}$ to ${\mathsf{ED}.}$ Can we capture its essence without referencing ${\mathsf{CNF}\text{-}\mathsf{SAT}}$ each time?

## The Reductions

The results here and before all use an unusual type of reduction. Ken and I think it would be useful to formalize this reduction, and try to understand its properties. It is not totally correct to call it simply a quasi-linear time reduction because multiple parameters are involved—we can call them ${N}$ and ${m}$ quite generally.

In the above case with ${\mathsf{CNF}\text{-}\mathsf{SAT}}$ and ${\mathsf{OVP}}$, if the clause size is fixed then we have ${m = n^{O(1)}}$ so ${m = \mathsf{polylog}(N)}$. It hence suffices to have a reduction from ${\mathsf{OVP}}$ to ${\mathsf{ED}}$ that is computable in quasi-linear time, here meaning time ${N\mathsf{polylog}(N)}$. Indeed, we can allow time ${N\cdot g(N)}$ for any function ${g(N) = N^{o(1)}}$.

When talking just about the problems ${\mathsf{OVP}}$ and ${\mathsf{ED}}$, however—without reference to ${\mathsf{CNF}\text{-}\mathsf{SAT}}$—${m}$ and ${N}$ are separate parameters with no relation specified. It suffices to say that the reduction is polynomial in ${m}$ and quasi-linear in ${N}$. This is essentially what Backurs and Indyk do. Their “${m}$” is called “${d}$”; then they define ${\ell_0 = 1000d}$, ${\ell_1 = \ell_0^2}$, and ${\ell_2 = \ell_0^3}$; then they multiply ${\ell_0 d}$, and so on. The details in their paper are considerable, involving an initial reduction from ${\mathsf{OVP}}$ to a problem they call ${\mathsf{PATTERN}}$, and this is one reason we’d like to streamline the reduction concept.

If we assume ${m = \Omega(\log N)}$, then “quasi-linear in ${N}$ and polynomial in ${m}$” is the same as “linear in ${N}$ and polynomial in ${m}$.” Perhaps the latter phrase is the simplest and best way to define the reduction? However, we do not need to specify “polynomial in ${m}$” either. It is enough to have a suitable sub-exponential time in ${m}$. For instances with ${m = O(n)}$ we would need ${2^{o(m)}}$ not ${2^{m - \epsilon m}}$, while for ${m = O(n^3)}$ we would need ${2^{o(m^{1/3})}}$.
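To spell out the arithmetic behind those bounds (our restatement): if the reduction itself takes time ${2^{g(m)}}$ on instances with ${m = \Theta(n^c)}$, then keeping the whole pipeline below ${2^{(1-\epsilon')n}}$ forces ${g(m) = o(n)}$, and

$\displaystyle m = \Theta(n^c) \implies n = \Theta(m^{1/c}) \implies g(m) = o(m^{1/c}),$

which gives ${2^{o(m)}}$ for ${c = 1}$ and ${2^{o(m^{1/3})}}$ for ${c = 3}$.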

Parameterized complexity defines reductions with two parameters, but the simplest ones are not exactly suitable. Finally, we wonder whether it would help to stipulate any of the specific structure that comes from ${\mathsf{SAT}}$ including that the instances ${(A,B)}$ are succinct. Note that we once covered succinctness and a hypothesis roughly related to SETH (see this comment for a circuit form). This paper highlighted by Backurs and Indyk works from ${\mathsf{CNF}\text{-}\mathsf{SAT}}$ but says it could have used ${\mathsf{OVP}}$, while still not formalizing the reduction concept. Likewise their other major references; some work from ${\mathsf{CNF}\text{-}\mathsf{SAT}}$ and others not. The latest of them, by Karl Bringmann and Marvin Künnemann who show “SETH-hardness” for ${\mathsf{ED}}$ on binary strings, defines a framework for the gadgets used in all these reductions.

## Power and Puzzlement

The remarks just above about the reduction time in “${m}$” make us recall the three most common levels of exponential hardness. The power index of a language ${L}$ was coined by Richard Stearns and Harry Hunt in a 1990 paper. It is the infimum of ${\epsilon > 0}$ such that ${L}$ belongs to time ${2^{O(n^\epsilon)}}$. Their “Satisfiability Hypothesis” (SH) about the power index of satisfiability is much weaker than SETH, though not as weak as conjecturing merely a lower bound of ${2^{n^{\Omega(1)}}}$.

• SH: ${\mathsf{CNF}\text{-}\mathsf{SAT}}$ has power index 1.

• ETH: All algorithms for ${\mathsf{CNF}\text{-}\mathsf{SAT}}$ require time ${2^{\Omega(n)}}$.

• SETH: All algorithms for ${\mathsf{CNF}\text{-}\mathsf{SAT}}$ require time ${2^{n - o(n)}}$.

The latter two come in slightly different versions bringing ${m}$ and/or a fixed ${k}$ in ${k\text{-}\mathsf{SAT}}$ into the picture, and of course all these versions might be false. SETH is distinguished as the closest to the upper bound. There are randomized algorithms for ${k\text{-}\mathsf{SAT}}$ that run in time ${2^{n - (\epsilon/k)n}}$ for some ${\epsilon > 0}$ and all ${k}$, and ${k}$ can be replaced by ${\log(m/n)}$ in general.

Stearns and Hunt emphasized the effect of reductions on SH and the power index in general. The same can be done for ETH. But we remain equally puzzled about the issue of the size of the problems used in the reductions. We start with a SAT problem that uses ${b}$ bits in its description. This is then eventually viewed as an edit distance problem that uses exponentially many (in ${b}$) bits. The ${N}$ in the edit problem is extremely large. Of course this is just fine, since we only claim to get an exponential time algorithm.

The point is that our intuition about the edit distance problem is all on problems that have modest size ${n}$. I, Dick, actually had a student build a hardware machine that solved modest-sized instances many times faster than any computer. So all my intuition was—is—about small-size edit problems. When the size of ${n}$ becomes astronomical, my intuition may fall apart. Could this explain why the result here seems puzzling?

## Open Problems

So what is the complexity of computing the edit distance? Can we really not do better than the obvious algorithms? This seems hard, no, puzzling, to us; but it may indeed be the case.

The 3SUM problem, which we recently covered, is attacked by some of these papers but has not as yet been brought within the scope of the reductions from ${\mathsf{OVP}}$. The ${\tilde{O}(n^{3/2})}$ decision tree upper bound has not yet yielded an algorithm that actually runs in time ${n^{2-\epsilon}.}$ Yet perhaps the above kinds of reductions also generally preserve decision-tree complexity? This would make the decision-tree result a real obstacle to showing “SETH-hardness” in that manner.
