The Surprising Power of Rational Functions
Newman’s theorem on rational approximations and complexity theory
Donald Newman was not a theorist, but a mathematician who worked on many topics during his career. One of his results is a lovely theorem showing that the approximation of continuous functions by rational functions can be very different from the approximation by polynomials.
Today I want to talk about this famous theorem, and discuss some applications to theory. If you do not know this theorem you may enjoy seeing it. Even if you do know it, you may enjoy seeing some thoughts I have on a potential application of this theorem.
I never had the honor of meeting Newman, but I just read a nice tribute to him while preparing this, which contained a number of interesting stories. One explained how he once introduced the great Paul Erdős to a seminar audience in a “novel” way—you will have to read it yourself. Another was on Newman’s performance on the Putnam mathematics competition—he was clearly a very fast and powerful problem solver. I quote:
“Ordinarily, six or seven solutions are sufficient for a student to win the competition. In his freshman year at CCNY, Newman won the Putnam with nine correct solutions. As a sophomore, he again won the competition with an unprecedented perfect 12 solutions, a fete (sic) he repeated in his junior year. He didn’t bother taking the exam as a senior.”
I recall taking the Putnam exam as a freshman—I did okay—nothing like Newman. I did worse as a sophomore, even worse as a junior, and finally was thrown off the team as a senior. For me, the more mathematics I learned the worse I got. Oh well.
I will not discuss whether or not the “world is digital.” I think the last post generated a lot of fun comments—many at my expense. I make mistakes, and perhaps that post was one. If so I apologize to you all; I hope that the many comments, while most disagreed with me, were interesting to you. I somehow did not get my point across. So I will fold up my tent on that problem for a bit—no pun intended—and move on to Newman’s classic theorem. Wait, I now see how to explain my point. Stop: I had better not go there today.
It has long been known that any continuous function on the interval $[-1,1]$, or any compact interval will do, can be approximated arbitrarily well by a polynomial: this is the classic Weierstrass approximation theorem. There are many proofs of this fundamental fact, many refinements, and many generalizations.
The function $|x|$, the absolute value of $x$, while continuous on the interval $[-1,1]$, is not easy to approximate by a polynomial. To get to an error bound of roughly $\epsilon$ requires a polynomial of degree roughly $1/\epsilon$; a classical result of Sergei Bernstein shows that the best uniform error achievable by a polynomial of degree $n$ is of order $1/n$. The reason, at least the intuition, is that the function has a sharp corner: at zero its derivative is undefined. This behavior makes it hard for a low degree polynomial to “fit” the absolute value function.
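One can see this slow convergence numerically. The following small experiment (my own sketch, using NumPy's Chebyshev interpolation as a stand-in for near-best polynomial approximation) measures the uniform error of degree-$n$ polynomial approximants to $|x|$:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Degree-n Chebyshev interpolants of |x| on [-1, 1]: the error decays
# only polynomially in n, reflecting the corner at zero.
x = np.linspace(-1, 1, 4001)
errs = {}
for n in (8, 32, 128):
    coeffs = C.chebinterpolate(np.abs, n)  # interpolate |x| at Chebyshev points
    errs[n] = float(np.max(np.abs(np.abs(x) - C.chebval(x, coeffs))))
    print(n, errs[n])
```

Even at degree 128 the error is still above $10^{-4}$, consistent with the $1/n$ rate.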
What Newman did in 1962—the paper was published in 1964—is to prove that one can do exponentially better at approximating the absolute value function with rational functions instead of polynomials. I am not an expert in approximation theory, but I believe that this result came as a surprise. See Herbert Stahl’s paper, for instance. Another surprise in mathematics.
Let’s now turn to Newman’s Theorem. We need some notation first. Let $\xi = e^{-1/\sqrt{n}}$ and $p(x) = \prod_{k=0}^{n-1} (x + \xi^k)$. Then,

Theorem: For all $x \in [-1,1]$ and all $n \ge 5$,
$$\big|\, |x| - r(x) \,\big| \le 3e^{-\sqrt{n}},$$
where $r(x)$ is the rational function
$$r(x) = x \cdot \frac{p(x) - p(-x)}{p(x) + p(-x)}.$$
The theorem is remarkable in a number of respects: First, it shows that the degree of the approximating rational function can be much smaller than the degree of any approximating polynomial. Previously, to get an error bound of order $\epsilon$ a polynomial had to have degree of order $1/\epsilon$; now the degrees of the numerator and denominator are only of order $\log^2(1/\epsilon)$. This is a huge decrease. Second, the rational function is not just constructive, but is expressible in a simple closed form. Very neat.
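The approximant is explicit enough to check numerically. Here is a small sketch (my own, not from Newman's paper) that evaluates $r(x)$ on a fine grid and compares the worst-case error with the bound $3e^{-\sqrt{n}}$:

```python
import math

def newman_error(n, grid=20001):
    """Max error of Newman's rational approximation to |x| on [-1, 1]."""
    xi = math.exp(-1.0 / math.sqrt(n))
    def p(x):
        # p(x) = prod_{k=0}^{n-1} (x + xi^k)
        prod = 1.0
        for k in range(n):
            prod *= (x + xi**k)
        return prod
    worst = 0.0
    for i in range(grid):
        x = -1.0 + 2.0 * i / (grid - 1)
        # denominator p(x) + p(-x) is positive on [-1,1]: it is twice the
        # even part of p, which has all positive coefficients
        r = x * (p(x) - p(-x)) / (p(x) + p(-x))
        worst = max(worst, abs(abs(x) - r))
    return worst

for n in (9, 16, 25, 36):
    print(n, newman_error(n), 3 * math.exp(-math.sqrt(n)))
```

The measured error sits comfortably below the theorem's bound, and visibly shrinks like $e^{-\sqrt{n}}$ rather than $1/n$.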
Both of these are wonderful for applications—as we will discuss in the next section.
I knew the Newman theorem back when I was a graduate student, yet I never could find an application of it to some theory problem. The theorem seemed to be perfect: great error bound and constructive. Yet I never found any “use” of his theorem.
It took a great insight from Richard Beigel, Nick Reingold, and Daniel Spielman to realize that Newman’s theorem could be used to solve an important open problem from complexity theory. They showed in a brilliant paper that the complexity class PP is closed under intersection and union. This is one of the great proofs in complexity theory. It is very sad that Reingold passed away about a year ago.
Later Mitsunori Ogihara used similar ideas to prove that the PL “hierarchy” collapses. There are now other uses of Newman’s theorem: for example, it is used in parts of learning theory. However, I will not attempt an exhaustive survey of its applications.
The PP result solved a well known open problem that had been raised as soon as PP was defined as a complexity class. Recall PP is the complexity class that contains the sets accepted by a polynomial-time bounded probabilistic Turing machine that accepts with probability greater than $1/2$. It is trivial that this class is closed under complements, but closure under intersection/union is not obvious at all.
The PP Proof
Here is a high level view of their proof. They need to detect the sign of an integer: the sign of $x > 0$ is $1$ and the sign of $x < 0$ is $-1$. The problem is that they need to do this with a low degree polynomial, and it is not obvious how to do that. Newman’s theorem comes to the rescue.
Suppose that there is a rational function $r(x) = p(x)/q(x)$ that approximates $\mathrm{sgn}(x)$ well, which of course there is by Newman’s theorem. Then, for $x$ not too near $0$, $r(x)$ must well approximate the sign of $x$. The problem is that their Turing machine can “compute” polynomials, but not rational functions—they cannot divide in their model. So they use the following cool, but trivial, trick:
$$\mathrm{sgn}\!\left(\frac{p(x)}{q(x)}\right) = \mathrm{sgn}\big(p(x)\,q(x)\big), \quad \text{provided } q(x) \neq 0.$$
Thus the sign of a rational function can be read off from the sign of a single polynomial.
There is much more to their proof, but this is the key use of Newman’s theorem. They do need to modify his function to one that works over the integers in an exponential size range. Note, as is often the case in computer science, not only the degree but also the size of the coefficients is important. In the spirit of their construction, one can take (I am streamlining; see their paper for the exact polynomials)
$$P_{n,m}(x) = \left( \prod_{k=0}^{n} (x + 2^k) \right)^{\!m}, \qquad S_{n,m}(x) = \frac{P_{n,m}(x) - P_{n,m}(-x)}{P_{n,m}(x) + P_{n,m}(-x)},$$
with $m$ an even parameter. Some of the key properties are:
- The degree of $P_{n,m}$ is $O(nm)$;
- Each coefficient of $P_{n,m}$ is an integer with $O(n^2 m)$ bits;
- If $x$ is an integer with $1 \le x \le 2^n$, then $0 \le P_{n,m}(-x) \le 3^{-m} P_{n,m}(x)$;
- If $x$ is an integer with $1 \le |x| \le 2^n$, then $|S_{n,m}(x)| \le 1$, $S_{n,m}(-x) = -S_{n,m}(x)$, and $|S_{n,m}(x) - \mathrm{sgn}(x)| \le 2 \cdot 3^{-m}$.
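A Newman-style construction over the integers can be sanity-checked with exact arithmetic. The sketch below is mine, not the paper's: it takes $P(x) = \big(\prod_{k=0}^{n}(x + 2^k)\big)^m$ with $m$ even, forms the rational sign approximation $S(x) = (P(x)-P(-x))/(P(x)+P(-x))$, and verifies the error bound $2 \cdot 3^{-m}$ over the whole range $[1, 2^n]$:

```python
from fractions import Fraction
from math import prod

def P(x, n, m):
    # (prod_{k=0}^{n} (x + 2^k))^m -- exact integer arithmetic
    return prod(x + 2**k for k in range(n + 1)) ** m

def S(x, n, m):
    # Newman-style rational sign approximation over the integers
    return Fraction(P(x, n, m) - P(-x, n, m), P(x, n, m) + P(-x, n, m))

n, m = 8, 6   # range [1, 2^8]; m even, so P(-x) >= 0 when x > 0
worst = max(abs(S(x, n, m) - 1) for x in range(1, 2**n + 1))
print(worst <= Fraction(2, 3**m), S(-7, n, m) == -S(7, n, m))
```

The point of the geometric nodes $2^k$ is that every integer in the range is within a constant factor of some node, which makes one factor of the product small; raising to the $m$-th power then drives the error down exponentially in $m$.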
I have several open problems that concern the behavior of matrices. I noticed that the following simple theorem can be proved using Newman’s theorem. This is likely to be known, but I will present the statement here with an outline of the proof.
Lemma: Suppose that $|f(x) - g(x)| \le \epsilon$ for all $x$ in the interval $[-1,1]$, where $f$ and $g$ are polynomials. Suppose also that $A$ is an $n$ by $n$ matrix with all its eigenvalues in $[-1,1]$. Then,
$$\big|\, \mathrm{trace}(f(A)) - \mathrm{trace}(g(A)) \,\big| \le n\epsilon.$$
Recall, $\mathrm{trace}(A)$ is the sum of the diagonal entries of the matrix $A$.
What this lemma says is: if an inequality holds on an interval that contains the eigenvalues of a matrix, then a related inequality holds for the traces of the corresponding matrices.
The proof in the case that $A$ is diagonalizable is fairly straightforward. Suppose that $A = S D S^{-1}$ where $D$ is a diagonal matrix. Then, the key insight is that
$$\mathrm{trace}(f(A)) = \mathrm{trace}(S f(D) S^{-1}) = \mathrm{trace}(f(D)).$$
The last equality follows from a basic property of the trace of a matrix: $\mathrm{trace}(XY) = \mathrm{trace}(YX)$, so the trace is invariant under similarity.
The lemma then follows by applying the inequality on $f$ and $g$ to each entry of the diagonal matrix $D$; the bound is clearly $n$ times $\epsilon$.
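The diagonalizable case is easy to check numerically. The sketch below (names are mine) builds $A = SDS^{-1}$ from a chosen set of eigenvalues and compares $\mathrm{trace}(f(A))$ with $\sum_i f(\lambda_i)$ for a sample polynomial $f$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
lam = rng.uniform(-1, 1, n)          # eigenvalues in [-1, 1]
S = rng.normal(size=(n, n))          # a generic, hence invertible, matrix
A = S @ np.diag(lam) @ np.linalg.inv(S)

# f(x) = 2x^3 - x + 1/2, applied to the matrix and to the eigenvalues
fA = 2 * np.linalg.matrix_power(A, 3) - A + 0.5 * np.eye(n)
flam = 2 * lam**3 - lam + 0.5

print(np.trace(fA), flam.sum())      # the two agree up to rounding
```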
The proof in the case that $A$ is not diagonalizable follows by a trick that I believe is due to Richard Bellman. His insight is that given any $\epsilon > 0$ there is a matrix $B$ that is diagonalizable and within $\epsilon$ of $A$. Then, one lets $\epsilon$ tend to $0$ and concludes the result by continuity. This trick, for example, gives a very short proof of the Cayley–Hamilton theorem, the fact that any matrix satisfies its own characteristic polynomial. First prove this for diagonalizable matrices, which is simple, and then use the Bellman trick.
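As a small illustration of what the limiting trick buys you, here is a numeric check of the Cayley–Hamilton fact just mentioned, for a random (not necessarily diagonalizable) matrix, using `numpy.poly` for the characteristic polynomial:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
c = np.poly(A)                 # coefficients of det(xI - A), leading 1 first
M = np.zeros_like(A)
for coef in c:                 # Horner: M ends as the char. poly. evaluated at A
    M = M @ A + coef * np.eye(4)
print(np.max(np.abs(M)))       # essentially zero
```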
Theorem: Suppose that $A$ is an $n$ by $n$ matrix with all its eigenvalues $\lambda_1, \dots, \lambda_n$ in the interval $[-1,1]$, and with $m \ge 5$. Let $r(x)$ be the degree-$m$ rational function from Newman’s theorem. Then,
$$\Big|\, \mathrm{trace}(r(A)) - \sum_{i=1}^{n} |\lambda_i| \,\Big| \le 3n e^{-\sqrt{m}},$$
where $\sum_i |\lambda_i|$ is the sum of the absolute values of the eigenvalues of $A$.
The proof follows from the lemma and Newman’s theorem, using $f(x) = r(x)$ and the absolute value function for $g$. Note that the lemma extends to this case: its proof only uses the values of $f$ and $g$ at the eigenvalues, and the denominator of $r$ does not vanish on $[-1,1]$.
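This too can be tested numerically. The sketch below (my own; the parameter names are mine) builds a symmetric matrix with prescribed eigenvalues, forms the matrix version of Newman's $r$, and compares $\mathrm{trace}(r(A))$ with the sum of the absolute eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
n, deg = 6, 9                      # matrix size and Newman parameter (my choice)
lam = rng.uniform(-1, 1, n)
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
A = Q @ np.diag(lam) @ Q.T         # symmetric, with eigenvalues lam

xi = np.exp(-1.0 / np.sqrt(deg))
def p(M):
    # matrix version of p(x) = prod_{k < deg} (x + xi^k)
    P = np.eye(n)
    for k in range(deg):
        P = P @ (M + xi**k * np.eye(n))
    return P

# r(A) = A (p(A) - p(-A)) (p(A) + p(-A))^{-1}
rA = A @ (p(A) - p(-A)) @ np.linalg.inv(p(A) + p(-A))

err = abs(np.trace(rA) - np.abs(lam).sum())
print(err, 3 * n * np.exp(-np.sqrt(deg)))
```

Since all the factors are rational functions of $A$ they commute, so the order of the matrix products does not matter; keeping the Newman parameter modest also keeps the denominator matrix well conditioned.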
An obvious open problem is to find other applications of Newman’s theorem. Can you use the matrix version? I had an idea to use the matrix version on a learning problem, but that has not worked out yet. So I will get back to that in the future.