Correcting an erratum in our quantum algorithms textbook

 Cropped from source

Paul Bachmann was the first person to use ${O}$-notation. This was on page 401 of volume 2 of his mammoth four-part text Analytic Number Theory, which was published in Germany in 1894. We are unsure, however, whether he defined it correctly.

Today we admit that we got something wrong about ${O}$-notation in an exercise in our recent textbook, and we ask: what is the best way to fix it?

Bachmann was simply plugging in Leonhard Euler’s estimate of the harmonic sum in

$\displaystyle \tau(n) = n\sum_{k=1}^n \frac{1}{k} - r,$

where ${r \leq n}$. He stated, “we find that

$\displaystyle \tau(n) = n\log n + O(n),$

if we use the expression ${O(n)}$ to represent a quantity whose order in regard to ${n}$ does not overstep the order of ${n}$; whether it really has components of order ${n}$ inside it is left undetermined in the above derivation.”

The big ${O}$ clearly stands for orderOrdnung in German—but where does he define Ordnung? According to the search at the University of Michigan Historical Math Collection page for Bachmann (click the third item, which is mis-labeled), the word Ordnung appears on eleven previous pages of volume 2 in several senses. However, the closest I find to a definition is a place on pages 355356 where he discusses a formula by Adrien-Marie Legendre:

If one says that a quantity ${A}$ is of order ${\frac{x}{\log^n x}}$ when for unboundedly growing ${x}$ the ratio of ${A}$ to ${\frac{x}{\log^m x}}$ for ${m > n}$ is unboundedly large, whereas for ${m < n}$ it is unboundedly small, then one can show that Legendre’s expression is not precise enough up to quantities of the order ${\frac{x}{\log^m x}}$ inclusive, rather that among all functions of the form ${\frac{x}{A\log x + B}}$ this is only the function ${\frac{x}{\log x - 1}}$.

I will not claim that my translation has enhanced the clarity but I don’t think it has diminished it much either—I think it is ${\Theta}$-of the actual clarity. Edmund Landau, who also introduced little-${o}$ and did much to rigorize and popularize both notations, only stated that he found the notation in Bachmann’s book, not the definition.

## Erratum Erratum

My dictionary defines erratum as an error in writing or printing, but Wikipedia defines it as a correction of such an error. The latter is clearly meant when one publishes an erratum, while an errata page means to indicate both the errors and the fixes. My subtitle “fixing an erratum…” may seem the former—a tony way of saying “fixing an error”—but it’s not. We need to fix the first erratum on our own errata page.

The original error appears in the exercises to Chapter 2 of our recent textbook, Quantum Algorithms via Linear Algebra. In the book the exercise reads,

Show that a function ${f: \mathbb{N} \rightarrow \mathbb{N}}$ is bounded by a polynomial in ${n}$, written ${n^{O(1)}}$, if and only if there is a constant ${C}$ such that for all sufficiently large ${n}$, ${f(2n) \leq Cf(n)}$.

When I wrote the exercise I had in mind this proof of the ${\Longleftarrow}$ direction: Pretend ${f}$ is a real function with that property and consider the integral of ${f(x)}$ from ${x = 0}$ to ${n}$. We can rewrite this as the integral from ${0}$ to ${\frac{n}{2}}$ of ${f(2n)}$. The assumption ${f(2n) \leq Cf(n)}$ bounds the latter by ${C}$ times the integral of ${f(n)}$ from ${0}$ to ${\frac{n}{2}}$. Iterate this ${k}$ times in all until ${\frac{n}{2^k}}$ is less than the fixed and finite value ${n_0}$ underlying the phrase “sufficiently large ${n}$.” This gives the following bound on the integral:

$\displaystyle C^{\log_2 n}V = Vn^{\log_2 C} = n^{O(1)},$

where ${V}$ is the maximum value of ${f(x)}$ on ${[0,n_0]}$. Jumping back to ${f}$ being originally defined on integers avoids any potential nastiness about ${V}$, and the bound on the integral clearly applies to ${f(n)}$. I figured the ${\Longrightarrow}$ direction was immediate since ${(2n)^c = Cn^c}$ with ${C = 2^c}$.

I had intended that time functions ${f(n)}$ are monotone, that is ${m \leq n \implies f(m) \leq f(n)}$, as holds automatically if ${f(n)}$ means the worst-case running time on inputs of length at most ${n}$. I received one e-mail showing this false when ${f}$ could decrease, and thought adding the word “monotone” fixed the problem. However it does not.

## The Counterexample

A clever counterexample was sent to us last weekend by Marcelo Arenas and Pedro Bahamondes Walters of the Pontifical Catholic University of Chile in Santiago. They defined a function that stays bounded by ${n^2}$ but alternates between behaving linearly and suddenly jumping up to meet the parabola again:

$\displaystyle f(n) = \begin{cases} 0 & \text{~if~} n = 0;\\ n^2 & \text{~if~} (\exists i \in \mathbb{N})\; n = 2^{2^i};\\ f(n-1) + 1 & \text{~otherwise.} \end{cases}$

We can convey in words the essence of the rigorous proof they provided, technically defining a slightly different function ${f}$: Consider any point ${(x,x^2)}$ on the parabola. Make a line go northeast at 45${\circ}$ through the points ${(x+i, x^2 + i)}$ for ${i}$ up to some ${k}$. The line ends at the point ${P = (x+k, x^2 + k)}$; put ${n = x+k}$. Now continue the line on a steeper slope back up to meet the parabola at the point

$\displaystyle Q = (2n,(2n)^2) = (2(x+k), 4(x+k)^2).$

The slope of the segment from ${P}$ to ${Q}$ is the ratio ${f(2n)/f(n)}$, and it is

$\displaystyle \frac{4(x+k)^2 - (x^2 + k)}{(x + k)} = \frac{3x^2 + 8xk + 4k^2 - k}{x + k} > 3x + 3k.$

This shows that the ratio ${f(2n)/f(n)}$ is nowhere near staying bounded by a constant ${C}$, either as ${x}$ or ${k}$ is made as large as desired, even though ${f(n) \leq n^2}$. So the ${\implies}$ direction is wrong.

## How to Fix and Keep the Motivation?

The motivation that we want to convey is that a polynomial bound is really a linear bound as the size of the data doubles. Dick recalls Bob Sedgewick emphasizing this point in his teaching. An old post discussed Bob’s idea of determining running times ${f(n)}$ of real polynomial algorithms empirically by sampling cases ${f(y)/f(x)}$ where ${|y| = 2|x|}$ and fitting ${an^b}$.

We could try to fix things by using ${\Theta}$ in place of ${O}$. The bad function ${f}$ above is excluded because it is not ${\Theta(n^c)}$ for any ${c}$. Now the forward direction holds because:

$\displaystyle \begin{array}{rcl} f(n) = \Theta(n^c) &\implies& (\exists n_0,a,b)(\forall n \geq n_0)\; an^c \leq f(n) \leq bn^c\\ &\implies& (a2^c)n^c \leq f(2n) \leq (b2^c)n^c\\ &\implies& (\frac{a2^c}{b})f(n) \leq f(2n) \leq (\frac{b2^c}{a})f(n)\\ &\implies& f(2n) = \Theta(f(n)). \end{array}$

However, now we have lost the converse direction, because ${f(2n) = \Theta(f(n))}$ still allows ${f(n)}$ to wobble enough that it is not Theta-of any one polynomial. We could try to demand that ${f(2n)}$ be asymptotic to ${Cf(n)}$ for some fixed constant ${C}$ (that is, so that ${f(2n)/f(n) \rightarrow C}$ as ${n \rightarrow \infty}$), but then we lose the forward direction again.

Besides, once we try to be more specific about the asymptotics, we lose the simple motivation a-la Sedgewick. What is the simplest way to preserve it?

## Restrictions

Dick recalled a post by Terence Tao on Mikhail Gromov’s theorem characterizing finitely-generated groups of polynomial growth as those having nilpotent subgroups of finite index. The growth is of neighborhoods ${B_S(n)}$ of radius ${n}$ in the Cayley graphs of the groups with generators ${S}$. Tao describes a new proof of Gromov’s theorem by Bruce Kleiner, and for simplicity uses the stronger condition ${|B(2r)| \leq C|B(r)|}$ for some constant ${C}$ and all ${r \geq 0}$. He writes (emphasis his):

In general, polynomial growth does not obviously imply bounded doubling at all scales, but there is a simple pigeonhole argument that gives bounded doubling on most scales, and this turns out to be enough to run the argument below. But in order not to deal with the (minor) technicalities arising from exceptional scales in which bounded doubling fails, I will assume bounded doubling at all scales.

Unfortunately this insight is specific to the geometry of the Cayley graphs and does not carry over simply to growth rates—functions of the form ${f}$ above can be made to violate bounded doubling for many long intervals. We wonder instead about restricting ${f(n)}$ to some “natural” class of time functions. Donald Knuth’s seminal 1976 paper on asymptotic notation reminds us of this theorem by Godfrey Hardy:

Theorem 1 If ${f(n)}$ and ${g(n)}$ are any functions built up recursively from the ordinary arithmetic operations and the exp and log functions, we have exactly one of the three relations ${f(n) = o(g(n))}$, ${f(n) \sim Cg(n)}$ for some ${C > 0}$, or ${f(n) = \omega(g(n))}$

The equivalence in our exercise seems to hold for Hardy’s class of functions—at least the mechanism of the counterexample is excluded. Is the forward direction now easy to prove?

Even so, Hardy’s class might be felt too restrictive to stand for running times of all natural algorithms. For one reason, it excludes the ${\log^*(n)}$ function, which figures in the analysis of algorithms for numerous problems mentioned here. For another, we can readily imagine that natural algorithms could exhibit the kind of stepwise behavior of the counterexample ${f}$ above, when exceptional inputs requiring extra attention are sparsely distributed.

## Open Problems

Is there a simple and natural extra condition that makes bounded doubling equivalent to polynomial growth?

More generally, how does one express something that is “morally true” but technically false?

[missing constant C before Tao quote, earlier f “could decrease”, Marcelo Arena -> Arenas]

1. September 12, 2015 12:56 pm

Corrigendum …

2. September 12, 2015 1:19 pm

Missing the C in |B(2r)| < |B(r)| in the beginning of the section titled "Restrictions"

• September 12, 2015 1:51 pm

Ah, thanks!

September 12, 2015 2:30 pm

“I received one e-mail showing this false when {f} has zero or negative values”: isn’t the negative “bad case” discarded in the statement of the exercise, since $f$ takes non-negative values?

4. September 12, 2015 11:38 pm

Just post this on http://www.mathoverflow.net

5. September 14, 2015 2:37 am

How about assuming the function is everywhere convex or concave?
Tough I guess “staircase like” functions are not too uncommon, for example $l \lfloor 2^n \rfloor$.