A circuit based model of descriptive complexity

Andrey Kolmogorov was one of the greatest mathematicians of the last century. He created whole new areas of mathematics, he created new directions in existing areas, and he solved long standing open problems.

Today I wish to talk about a problem, that is related to his notion of complexity of a string—Kolmogorov Complexity. The problem I am interested is a natural problem, but would have strong consequences if we could solve it.

Kolmogorov is famous for his model of the complexity of a string as the length of the shortest program that can generate the string. Thus, the string of length ${n}$

$\displaystyle 0101010101010101010 \dots 01$

has complexity ${\log n + O(1)}$, while a “random” string

$\displaystyle 0101001110100101000 \dots 11$

has complexity close to ${n}$.

One of Kolmogorov’s great results was the partial solution of Hilbert’s Thirteenth problem, which was completed by his great student Vladimir Arnold. The final result was again done by Kolmogorov. He is reported to have said that proving this theorem was one of the most technically challenging proofs of his entire career.

This remarkable result it shows that conventional wisdom can be wrong, since David Hilbert appeared to believe that his Thirteenth problem would go the other way. There is some issue of exactly what Hilbert intended as the statement—so it is unclear if conventional wisdom was or was not overturned.

The final theorem that was proved by Kolmogorov states:

Theorem: For each ${n \ge 2}$, there exists ${\phi_{pq}:[0,1] \rightarrow \mathbb{R}}$ continuous functions such that every continuous function ${f:[0,1]^{n} \rightarrow \mathbb{R}}$ is represented as

$\displaystyle f(x_{1},\dots,x_{n}) = \sum_{q=1}^{2n+1} g_{q} \left( \sum_{p=1}^{n} \phi_{pq}(x_{p}) \right),$

where the ${g_{p}: \mathbb{R} \rightarrow \mathbb{R}}$ are continuous functions.

The import of the theorem is that for continuous functions there is no function of many variables. Note, the theorem shows that any function of ${n}$ variables can be expressed using only unary functions and addition. The only multi-variable function needed is addition, which seems surprising.

Let’s turn to talking about the complexity of strings from a slightly different point of view.

A Classic Problem

Let me introduce our complexity notion by looking at the problem of solving a linear system of equations,

$\displaystyle Ax = b \ \ \ \ \ (1)$

where ${A}$ is a ${n}$ by ${n}$ integer matrix and ${b}$ is an integer vector. This is, of course, a classic problem: find an ${x}$ so that (1) is true, if one exists.

For this classic problem much is known, yet some questions remain open:

1. The problem is in polynomial time—use Gaussian elimination, with some care. See my earlier post on the bits required for solving linear systems.
2. The problem is in ${\mathsf{NC}^{2}}$. That is a linear system can be solved with polynomial processors in ${O(\log^{2} n)}$ time.
3. The problem is also easily seen to be in ${\mathsf{DET}}$, the class of problems that are reducible to the determinant.
4. It is not known if the problem lies in lower complexity classes such as ${\mathsf{NC}^{1}}$.

How Do We Describe A String?

Instead of Kolmogorov’s general notion of the complexity of a string, we will use a particular notion based on circuits. Suppose that ${C}$ is a circuit and ${x=x_{1},\dots,x_{n}}$ is a string. Say that ${C}$ describes ${x}$ if for all ${i=1,\dots,n}$,

$\displaystyle C(i)=x_{i}.$

Let us use ${C \rightarrow x}$ to denote this. We will use the size of the circuit ${C}$ to measure how complex the string ${x}$ is. Denote the size of the smallest circuit ${C}$ that describes ${x}$ as ${{\cal K}_c(x)}$.

This is a natural generalization of the notion of sparse (we can also say succinct). Clearly, if a string is sparse, then there is a small circuit ${C}$ so that ${C \rightarrow x}$. More precisely,

Lemma: Suppose ${x}$ is a string of length ${n}$ with at most ${k}$ 1’s. Then, there is a circuit ${C}$ so that ${C \rightarrow x}$ and the size of ${C}$ is ${O(k \log n)}$. Thus,

$\displaystyle {\cal K}_c(x) = O(k \log n).$

For general circuits ${C}$, this notion was addressed by Eric Allender, Harry Buhrman, Michal Koucký, Dieter van Melkebeek, and Detlef Ronneburger in their paper Power from Random Strings, where they proved it polynomially equivalent to their modification of Leonid Levin’s variation of Kolmogorov’s original idea. See also this recent paper. However, we are interested when ${C}$ has additional properties, such as being a constant depth circuit of threshold gates.

Describing a Solution

Consider the linear system again,

$\displaystyle Ax = b \ \ \ \ \ (2)$

where ${A}$ is an ${n}$ by ${n}$ invertible integer matrix and ${b}$ is an integer vector. The twist is that now we limit ${A}$ and ${b}$ to have ${{\cal K}_c}$ description complexity of at most ${m}$.

The key open question is simple: is there a solution ${x}$ to the linear equation so

$\displaystyle {\cal K}_c(x) = \mathsf{poly}(m,\log n).$

This does not ask for an algorithm to find the solution ${x}$, but asks how complex it is to describe a solution.

By Kolmogorov’s original notion, there is one solution that has a particularly short description. It is the unique ${x_0}$ in the solution space that minimizes the Euclidean distance to ${0}$. The description of ${x_0}$ can take our size-${m}$ circuit describing ${A}$ and ${b}$, but then it appears we need to do iterative calculations to compute ${x_0}$ exactly or to a desired precision. The iterations are not so easy to describe—not by a constant-depth circuit. So can we describe this ${x_0}$, or some other solution ${x_1}$, another way?

The issue resembles that of computational depth. We have a very short description, but it is computationally substantial to “unpack,” hence deep. We want a description that is literally less deep to unpack. Where we are being “fair” is we only ask this for equations ${Ax = b}$ that are themselves not deep to specify. Those equations come from translating nondeterministic logspace Turing machines and their counting functions into linear algebra.

Describing an Approximate Solution

Consider the same linear system again. Given an ${\varepsilon>0}$ can we describe a solution ${x}$ so that

$\displaystyle {\cal K}_c(x) = \mathsf{poly}(m,\log n, \log 1/\varepsilon)$

where

$\displaystyle \lVert Ax - b \rVert \le \varepsilon.$

I can weaken this question even further, but let’s leave that for another time.

Open Problems

If one can show that the last question can be answered in the affirmative, with variously restricted circuits describing the solution, then Ken Regan and I believe that we can prove new complexity class separation theorems. More on this in the future.

[fixed some typos]

1. December 14, 2009 1:07 pm

The link to your previous post on the bit complexity of Gaussian elimination is broken.

December 14, 2009 1:37 pm

Thanks. I fixed it.

December 14, 2009 1:54 pm

There is a small typo. I believe $NC^2$ means “polynomial processors and $log^2(n)$ (parallel) time”.

3. December 14, 2009 9:33 pm

Right after making the post, we learned from a colleague that Ray Solomonoff passed away two weeks ago. He, Kolmogorov, Gregory Chaitin, Jorma Rissanen, and Leonid Levin all deserve to be mentioned in the same breath for their concepts of description complexity and algorithmic probability. Solomonoff’s Wikipedia page has a historical description, but more detail is at Scholarpedia’s article Algorithmic Probability. The Solomonoff-Levin distribution is also called the “Universal Prior”, and I wonder if using it in place of continuous distributions would help resolve the cosmological weirdness described here. Our condolences to his family.

Here is a CS-theory open problem roughly analogous to Hilbert’s 13th Problem. The analogy is made best by these sentences from the Wikipedia entry on Hilbert’s 13th, which is linked in the main post above: “Hilbert considered the general seventh-degree equation…and asked whether its solution x, a function of the three variables a, b and c, can be expressed using a finite number of two-variable functions. A more general question is: can every continuous function of three variables be expressed as a composition of finitely many continuous functions of two variables?”

The open problem is: Can every three-tape finite-state transduction be written as a composition of finitely many two-tape finite transductions?

To explain this a little, one-tape finite transductions are commonly encountered in CS-theory courses as Mealy machines or Moore machines that compute functions. To generalize to functions f(x,y,z), put x,y,z on separate read-only input tapes, provide a single write-only output tape, and stipulate that each tape’s head may either advance one cell or stay stationary in a given step. No head may move left. One may assume that x,y,z have endmarkers ^ and \$ around them. Such a machine can compute f(x,y,z) = xyz, but this can easily be written as g(g(x,y),z) where g(x,y) = xy. The shuffle function s3(x,y,z) = x_1 y_1 z_1 x_2 y_2 z_2… (padding with # characters if desired) looks harder, but can be managed most simply by allowing a binary shuffle s2(x,y) to have a larger output alphabet that combines x_1,y_1 into one character. Or not—we can shuffle x_1 y_1 x_2 y_2… with z by running the tape with z at half-speed. I included this last April in a departmental seminar I gave while visiting the University of Montreal on sabbatical, and it is open as far as anyone knew.

Finally, some of the typos were mine, while NC^2 escaped notice since we weren’t trying to give more than an informal definition. To be formal we should specify bounded fan-in Boolean circuits of polynomial size and O(log^2 n) depth—and also specify a uniformity condition. The strongest common one, called DLOGTIME uniformity (for paths not just connections), is applicable here. For machines with processors running in parallel, while the CRCW PRAM captures the AC^k classes precisely as CRCW O(log^k n) time with poly-many processors, the NC^k classes aren’t handled as neatly.

4. December 16, 2009 6:46 pm

“Thus, the string of length n 0101010101…01 has complexity log n + O(1)”

Is this right? Surely this is only true for complex values of n; for simple values of n the string would have lower complexity.