The Surprising Power of Rational Functions

2009 October 7


Newman’s theorem on rational approximations and complexity theory

images

Donald Newman was not a theorist, but was a mathematician who worked on many topics during his career. One of his results is a lovely theorem that shows that the approximation of continuous functions by rational functions can be very different from the approximation by polynomials.

Today I want to talk about this famous theorem, and discuss some applications to theory. If you do not know this theorem you may enjoy seeing it. Even if you do know it, you may enjoy seeing some thoughts I have on a potential application of this theorem.

I never had the honor of meeting Newman, but I just read a nice tribute to him, while preparing this, which contained a number of interesting stories. One explained how he once introduced the great Paul Erdös to a seminar audience in a “novel” way—you will have to read it yourself. Another was on Newman’s performance on the Putnam mathematics competition—he was clearly a very fast and powerful problem solver. I quote:

“Ordinarily, six or seven solutions are sufficient for a student to win the competition. In his freshman year at CCNY, Newman won the Putnam with nine correct solutions. As a sophomore, he again won the competition with an unprecedented perfect 12 solutions, a fete (sic) he repeated in his junior year. He didn’t bother taking the exam as a senior.”

I recall taking the Putnam exam as a freshman—I did okay—nothing like Newman. I did worse as a sophomore, even worse as a junior, and finally was thrown off the team as a senior. For me, the more mathematics I learned the worse I got. Oh well.

I will not discuss whether or not the “world is digital or not.” I think the last post generated a lot of fun comments—many at my expense. I make mistakes and perhaps that post was one. If so I apologize to you all; I hope that the many comments, while most disagreed with me, were interesting to you. I somehow did not get my point across. So I will fold up my tent on that problem for a bit—no pun intended—and move on to Newman’s classic theorem. Wait, I now see how to explain my point {\dots} stop, I better not go there today.

Newman’s Theorem

It has long been known that any continuous function on the interval {[-1,1]}, or any compact interval will do, can be approximated arbitrarily well by a polynomial. There are many proofs of this fundamental fact, many refinements, and many generalizations.

The function {|x|}, the absolute value of {x}, while continuous on the interval {[-1,1]} is not easy to approximate by a polynomial. To get to an error bound of roughly {1/t} requires a polynomial of degree {\Omega(t)}. The reason, at least the intuition, is that the function {|x|} has a sharp corner: at zero its derivative is undefined. This behavior makes it hard for a low degree polynomial to “fit” the absolute value function.

What Newman did in 1962—the paper was published in 1964—is to prove that one can do exponentially better with rational functions instead of polynomials in approximating the absolute value function. I am not an expert in approximation theory, but I believe that this result came as a surprise. See Herbert Stahl’s paper, for instance. Another surprise in mathematics.

Let’s now turn to Newman’s Theorem. We need some notation first. Let {a = \exp(-1/{\sqrt n})} and {p(x) = \Pi_{k=0}^{n-1}(a^{k}+x)}. Then,

Theorem: For all {x \in [-1,1]} and {n \ge 5},

\displaystyle  \mid |x| - r_{n}(x) \mid \le 3\exp(-{\sqrt n})

where {r_{n}(x)} is the rational function,

\displaystyle  x \frac{p(x)-p(-x)}{p(x)+p(-x)}.

The theorem is remarkable in a number of respects: First, it shows that the degree of the approximating rational function can be much smaller than the degree of any approximating polynomial. Previously, to get an error bound of order {1/t} a polynomial had to have degree order {t}; now the degrees of the numerator and denominator are only order {\log^{2} t}. This is a huge decrease. Second, the rational function is not just constructive, but is expressible in a simple closed form. Very neat.

Both of these are wonderful for applications—as we will discuss in the next section.

Applications

I knew the Newman theorem back when I was a graduate student, yet I never could find an application of it to some theory problem. The theorem seemed to be perfect: great error bound and constructive. Yet I never found any “use” of his theorem.

It took a great insight from Richard Beigel, Nick Reingold, and Daniel Spielman to realize that Newman’s theorem could be used to solve an important open problem from complexity theory. They showed in a brilliant paper that the complexity class PP is closed under intersection and union. This is one of the great proofs in complexity theory. It is very sad that Reingold passed away about a year ago.

Later Mitsunori Ogihara used similar ideas to prove that the PL “hierarchy” collapses. There are now other uses of Newman’s theorem: for example, it is used in parts of learning theory. However, I will not attempt an exhaustive survey of its applications.

The PP result solved a well known open problem that had been raised as soon as PP was defined as a complexity class. Recall PP is the complexity class that contains sets that are accepted by a polynomial-time bounded probabilistic Turing machine that accepts with probability {>0.5}. It trivial that this class is closed under complements, but closure under intersection/union is not obvious at all.

The PP Proof

Here is a high level view of their proof. They need to detect the sign of an integer: the sign of {-11} is {-1} and the sign of {123} is {1}. The problem is that they need to do this with a low degree polynomial, which is not obvious how to do it. Newman’s theorem comes to the rescue.

Suppose that there is a rational function {r(x)} that approximates {|x|} well, which of course there is by Newman’s theorem. Then, for {x} not too near {0}, {r(x)/x} must well approximate the sign of {x}. The problem is that their Turing machine can “compute” polynomials, but not rational functions—they cannot divide in their model. So they use the following cool, but trivial, trick:

\displaystyle  a/b > 0 \text{ if and only if } ab > 0.

There is much more to their proof, but this is the key use of Newman’s theorem. They do need to modify his function to one that works over the integers in an exponential size range. Here are the polynomials they use; note, as is often the case in computer science not only degree but the size of coefficients is also important. In particular they use the following polynomials:

\displaystyle  \begin{array}{rcl}  	P_{n}(x) &=& (x-1)\displaystyle{\prod_{i=1}^{n}(x-2^{i})^{2}}, \\ 	S_{n}(x) &=& \frac{P_{n}(-x) - P_{n}(x)}{P_{n}(-x) + P_{n}(x)}, \\ 	A_{n}(x,y) &=& S_{n}(x) + S_{n}(y) -1. \end{array}

Some of the key properties are:

  1. The degree of {A_{n}(x,y)} is {O(n)};
  2. Each coefficient of {A_{n}(x,y)} is {2^{O(n^{2})}};
  3. If {x,y} are integers in {[1,2^{n}]}, then {A_{n}(x,y) > 0};
  4. If {x,y} are integers in {[1,2^{n}]}, then {A_{n}(-x,y) < 0}, {A_{n}(x,-y) < 0}, and {A_{n}(-x,-y) < 0}.

Matrices

I have several open problems that concern the behavior of matrices. I noticed that the following simple theorem can be proved using Newman’s theorem. This is likely to be known, but I will present the statement here with an outline of the proof.

Lemma: Suppose that {\mid f(x) -g(x) \mid \le \delta } for all {x} in the interval {[-1,1]}, where {f(x)} and {g(x)} are polynomials. Suppose also that {A} is an {n} by {n} matrix with all its eigenvalues in {[-1,1]}. Then,

\displaystyle  \mid \mathsf{trace}(f(A)) - \mathsf{trace}(g(A)) \mid \le n\delta.

Recall, {\mathsf{trace}(A)} is the sum of the diagonal entries of the matrix {A}.

What is lemma says is: if an inequality holds for an interval that contains the eigenvalues of a matrix, then a related inequality holds for the traces of the matrices.

The proof in the case that {A} is diagonalizable is fairly straightforward. Suppose that {A = S^{-1}DS} where {D} is a diagonal matrix. Then, the key insight is that

\displaystyle  \begin{array}{rcl}  	\mathsf{trace}(f(A)) &=& \mathsf{trace}(f(S^{-1}DS)) \\ 							&=& \mathsf{trace}(S^{-1}f(D)S) \\ 							&=& \mathsf{trace}(f(D)). \end{array}

The last equality follows from a basic property of the trace of a matrix.

\displaystyle  \mathsf{trace}(f(A)) - \mathsf{trace}(g(A)) = \mathsf{trace}(f(D)) - \mathsf{trace}(g(D)).

The lemma then follows by applying the inequality on {f(x)} and {g(x)} to each entry of the diagonal matrix {D}; the bound is clearly {n} times {\delta}.

The proof in the case that {A} is not diagonalizable follows by a trick that I believe is due to Richard Bellman. His insight is that given any {A} there is a {A^{(k)}} that is diagonalizable and is within {1/k} of {A}. Then, one lets {k} tend to {\infty} and concludes the result by continuity. This trick, for example, gives a very short proof of the fact that any matrix satisfies its own characteristic polynomial. First prove this for diagonalizable matrices, which is simple, and then use the Bellman trick.

Theorem: Suppose that {A} is an {n} by {n} matrix with all its eigenvalues in the interval {[-1,1]}, and with {n \ge 5}. Let {r_{n}(x)} be the rational function from Newman’s theorem. Then,

\displaystyle  \mid \mathsf{trace}(r_{n}(A)) - S \mid \le 3n\exp(-{\sqrt n})

where {S} is the sum of the absolute values of the eigenvalues of {A}.

The proof follows from the lemma and Newman’s theorem using {f = r_n} and the absolute value function for {g}.

Open Problems

An obvious open problem is to find other applications of Newman’s theorem. Can you use the matrix version? I had an idea to use the matrix version on a learning problem, but that has not worked out yet. So I will get back to that in the future.

10 Responses leave one →
  1. 2009 October 8
    Ryan O'Donnell permalink

    On this topic, I can never resist pointing out the following:

    The result Beigel-Reingold-Spielman needed for PP, namely a good rational approximation to the function sgn(x) on the range [-1, -epsilon] union [epsilon, 1] was solved *optimally* by Yegor Zolotarev in 1877 (!!). By optimally, I mean for each degree d and each epsilon, he explicitly gave *the* best degree-d rational function for approximating sgn(x) on [-1, -epsilon] union [epsilon, 1], in L_infinity distance! (Sadly, he was run over by a train the next year at age 31.)

    I guess people weren’t much for studying asymptotics back then, so Zolotarev didn’t work them out. However Naum Akhiezer worked them out in 1929 (“On a problem of E. I. Zolotarev”). In particular, as Dick mentions it’s easy to deduce results for approximating |x| from results on approximating sgn(x), and indeed Akhiezer’s paper explicitly includes Newman’s Theorem, that the best degree-d approximation to |x| on [-1,1] has error exp(-Theta(sqrt(d))).

    Newman was apparently unaware of Akhiezer’s paper.

  2. 2009 October 8
    Paul Beame permalink

    The first time I saw an application of this approximation was in the dissertation work of Jim Hoover on feasible computation over the reals (journal version in [SICOMP vol 19, No. 1]) which predated BRS.

    Ko and Freedman had defined a notion of feasible real computation by modifying Turing’s definition of computable real number as given by a TM that on input n spits out a rational approximation of the number within 2-n. Function computation is given by an oracle TM that takes a TM giving a real-number and produces an n-bit approximation. Ko and Freedman said that for feasibility the time to spit out an n-bit approximation had to be polynomial in n. They had some neat results such as the fact that there are very spiky feasible real functions whose integral computation is #P-hard to compute.

    A problem with this definition is that the model doesn’t conform to what numerical analysts do for real computation. They just add, subtract, multiply, and divide (rarely) their inputs and test for approximate equality (never exact equality). Part of this was the motivation for Blum, Shub, and Smale to produce their real computation model. However, a major unrealistic aspect of the BSS model is that they included exact equality tests, something that numerical analysts never do. (In fact, in Turing’s model, tests for equality are uncomputable and a similar prohibition holds for the Ko and Freedman feasible version.)

    Independent of BSS and around the same time, Jim Hoover came up with a model for real computation that really seems to capture a lot of how numerical analysts compute. The notion is that of a uniform polynomial-size arithmetic circuit family (+,-,*,/), one for each interval of the form [-2n,2n] that has one other restriction, namely that the value of any internal gate is such a circuit can be at most exp(nO(1)). It correctly computes f if the n-th circuit computes an n-bit approximation to f on the interval. He called these feasible-size-magnitude arithmetic circuits. (Note that these circuits might compute rational functions of exponential degree.)

    Surprisingly, this turned out to be completely equivalent to the Ko-Freedman definition. Moreover, one could remove all division operations without loss of generality despite the potential for exponential degree – a case not covered by Strassen’s result. A key to the proof of this equivalence is the surprising quality of the rational approximation of absolute value which allows a feasible-size-magnitude arithmetic circuit to do a good job at approximating the extraction of bits from its input.

    • 2009 October 8
      Andrew Hunter permalink

      Prof. Beame: that’s a very interesting model! I should read that paper. However, that circuit model doesn’t in fact consider approximate equality tests (which numerical people do use, as you said…) Is there an obvious argument that allowing single-step equality up to any fixed constant (say, a rational constant with polynomial bits?) doesn’t add power?

      • 2009 October 9
        Paul Beame permalink

        Is there an obvious argument that allowing single-step equality up to any fixed constant (say, a rational constant with polynomial bits?) doesn’t add power?

        It is a corollary of the equivalence with the Ko-Friedman model. Beyond that I am not sure.

  3. 2009 October 8
    Paul Beame permalink

    WordPress seems to have stripped out my “sup” tags so my exponents don’t show up as such. They should be [-2^n,2^n] and exp(n^O(1)).

  4. 2009 October 8
    rjlipton permalink

    There is a great quote that cannot find to the effect:

    The credit goes to the person that discovers something the LAST.

    I think that is what is going on here…dick

  5. 2009 October 9

    This reminds me a lot of the problem presented in the Factoring and Factorials post. I looked into the approximation I made more, and it looked like it needed n/logn terms of the Taylor series to get to n bits of precision, which is too many operations to factor in polynomial time. It turns out you can get the Taylor series by exponentiating the Stirling series, but it’s not much use if it takes n/logn terms.

    However, if there’s a way to get n bits of precision with only (logn)^c terms in the numerator and denominator of a rational function, that’d solve the problem, and factoring could be done in polynomial time. I wonder how one would go about looking for such a formulation…

    • 2009 October 9
      rjlipton permalink

      Interesting connection. I have no idea if will work, but like the idea.

  6. 2009 October 9
    none permalink

    You probably know Scott Aaronson found a very cute proof of the Beigel-Reingold-Spielman theorem using quantum postselection: http://scottaaronson.com/papers/pp.pdf

Leave a Reply

Note: You can use basic XHTML in your comments. Your email address will never be published.

Subscribe to this comment feed via RSS