How Joe Traub Beat the Street
An insight into the computation of financial information
Joseph Traub passed away just a week ago, on August 24th. He is best known for his computer science leadership positions at CMU, Columbia, CSTB, the Journal of Complexity—they all start with “C.” CSTB is the Computer Science and Telecommunications Board of the National Academies of Sciences, Engineering, and Medicine. At each of these he was the head, and for all except Carnegie-Mellon he was the first head—the founder.
Today Ken and I wish to highlight one technical result by Traub and his co-workers that you may not know about.
I knew Joe for many years, and he will be missed. He had a sharp mind and corresponding sharp wit. There was a slight twinkle in his eye that would have called to mind a leprechaun had he been Irish rather than German—from a family that left after the Nazis seized his father’s bank in 1938 when he was six. He was a delight to interact with on almost any topic.
Joe created an entire area of theory: information-based complexity theory. To quote its website:
Information-based complexity (IBC) is the branch of computational complexity that studies problems for which the information is partial, contaminated, and priced. [Functions over the unit cube] must be replaced by … finite sets (by, for example, evaluating the functions at a finite number of points). Therefore, we have only partial information about the functions. Furthermore, the function values may be contaminated by round-off error. Finally, evaluating the functions can be expensive, and so computing these values has a price.
Traub was helped greatly by a brilliant colleague, Henryk Woźniakowski, who developed key ideas alongside him in the 1970s that were distilled in their 1980 monograph, A General Theory of Optimal Algorithms. A shorter source is Joe’s seminal paper, “An Introduction to Information-Based Complexity.” A statement on Woźniakowski’s Columbia webpage shows his sustaining interest in IBC:
I am mostly interested in computational complexity of continuous problems. For most continuous problems, we have only partial information. For problems defined on spaces of functions, this partial information is usually given by a finite number of function values at some points which we can choose. One of the central issues is to determine how many pieces of information, or function values, are needed to solve the computational problem to within a prescribed accuracy.
IBC Under Attack
I must admit that years ago I opened my Bulletin of the AMS, in 1992 to be exact, and was shocked to see an article titled: “Some Basic Information On Information-Based Complexity Theory” written by the eminent numerical analyst Beresford Parlett. The abstract got right to the point:
…Why then do most numerical analysts turn a cold shoulder to IBCT? Close analysis of two representative papers reveals a mixture of nice new observations, error bounds repackaged in new language, misdirected examples, and misleading theorems.
What is interesting about this paper is that the attack on IBCT (Parlett’s abbreviation for IBC) is on the idiosyncrasies of its framework. The claim is not that there are errors—technical errors—in the IBC papers, but rather that the model, the framework, is uninteresting and misleading. You should take a look at the very readable paper of Parlett to decide for yourself.
Traub of course disagreed and had a chance to fire back an answer. I believe that Parlett’s attack actually had little effect on IBC, except to make it better known in the mathematical community.
What was the tiff about? We can try to boil it down even more than Parlett’s article does.
Integrals and Algorithms
Integrals are fundamental to most of mathematics. And in many practical cases the integrals cannot be exactly computed. This leads to a huge area of research on how to approximate an integral within some error. Even simple ones like

$$\int_0^1 f(x)\,dx$$

can be challenging for nasty functions $f$. Worse still are multi-dimensional integrals

$$\int_D f(x)\,dx,$$

where $D$ is a $d$-dimensional region. Speaking very roughly, let’s say $D = [0,1]^d$ and what matters to $f$ is whether components $x_i$ are “low” (toward $0$) or “high” (toward $1$) in each dimension. If the gradient of $f$ in each dimension behaves differently for each setting of other components being “low” or “high,” then we might need $2^d$ samples just to get a basic outline of the function’s behavior. The IBC website fingers this exponential issue as the “curse of dimensionality.”
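To make the counting concrete, here is a minimal sketch—our illustration, not anything from the IBC literature—of a hypothetical integrand whose sign depends on the parity of its “low”/“high” coordinate pattern, so that even a coarse outline of its behavior forces one probe per pattern:

```python
from itertools import product

# Hypothetical integrand: its sign flips with the parity of how many
# coordinates are "high" (>= 0.5), so no probe at one low/high pattern
# tells us anything about the value at another pattern.
def f(x):
    sign = (-1) ** sum(xi >= 0.5 for xi in x)
    return sign * sum(x)

d = 10
# One probe per low/high pattern: the centers of the 2^d sub-cubes.
probes = list(product([0.25, 0.75], repeat=d))
print(len(probes))  # 2**10 = 1024 probes just for a 10-dimensional outline
```

At $d = 360$ the same count is $2^{360}$, which is where the “curse” bites.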
Now as ordinary complexity theorists, our first instinct would be to define properties intrinsic to the function and try to prove they cause high complexity for any algorithm. Making a continuous analogy to concepts in discrete Boolean complexity, drawing on papers like this by Noam Nisan and Mario Szegedy, we would try to tailor an effective measure of “sensitivity.” We would talk about functions that resemble the $d$-ary parity function in respect of sensitivity but don’t have a simple known integral. Notions of being “isotropic” could cut both ways—they could make the sensitivity pervasive but could enable a good global estimate of the integral.
IBC, however, focuses on properties of algorithms and restrictions on the kind of inputs they are given. Parlett’s general objection is that doing so begs the question of a proper complexity theory and reverts to the standard—and hallowed enough—domain of ordinary numerical analysis of algorithms.
Ken and I find ourselves in the middle. Pure complexity theory as described by Parlett has run into barriers even more since his article. We wrote a pair of posts on bounds against algorithms that are “progressive.” The IBC website talks about settings for algorithms. This seems to us a fair compromise notion but it could use more definition.
One classic setting for approximating these high-dimensional integrals is the Monte-Carlo method. The common feature of this setting is that one samples points $x$ from $D$ and takes the average of the values $f(x)$. This method yields a fair approximation, but the convergence is slow: the error after $n$ samples shrinks only like $1/\sqrt{n}$, so each additional digit of accuracy multiplies the number of samples by a factor of $100$. For example, the dimension is $d = 360$: the number of months in a 30 year collateralized mortgage obligation (CMO). To get error $\epsilon$ requires on the order of $1/\epsilon^2$ evaluations of the 360-dimensional function—clearly a completely hopeless task at the accuracies needed. Financial firms on Wall Street that needed to estimate fair market values for CMOs were using Monte Carlo methods as best they could.
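A toy sketch of the method—purely illustrative, not Paskov’s actual CMO code—shows both the simplicity and the slow convergence:

```python
import random

def mc_integrate(f, d, n, seed=1):
    """Monte Carlo: average f at n uniform random points of [0,1]^d.
    The error shrinks like sigma/sqrt(n), independent of the dimension d."""
    rng = random.Random(seed)
    return sum(f([rng.random() for _ in range(d)]) for _ in range(n)) / n

# Example integrand: the average coordinate over [0,1]^360.
# Its exact integral is 1/2.
est = mc_integrate(lambda x: sum(x) / len(x), d=360, n=5000)
print(abs(est - 0.5))  # small, but improving only like 1/sqrt(n)
```

Note that the cost of each sample already involves a full 360-dimensional function evaluation, which for a real CMO model is far more expensive than this toy.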
IBC Strikes Back
Enter Traub with his Columbia IBC research group in the early 1990s. They—in particular his student Spassimir Paskov in late 1993 and early 1994—discovered and verified that a non-random method would beat Monte-Carlo by many orders of magnitude. This was based on an old idea that uses low-discrepancy sequences. These are sequences $S$ of points inside a region that have good coverage of the space. For the unit cube $D$ as above, one may specify a class $\mathcal{A}$ of natural measurable subsets of $D$; then low discrepancy means that for every $A \in \mathcal{A}$ the proportion of points in $S$ that belong to $A$ is close to the measure of $A$—and moreover this happens for every long enough consecutive subsequence of $S$.
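One standard construction of such a sequence is the van der Corput/Halton family, sketched below for illustration; the post does not say which particular sequence Paskov’s group used.

```python
def van_der_corput(n, base):
    """n-th term of the base-b van der Corput sequence: reflect the
    base-b digits of n about the radix point."""
    q, denom = 0.0, 1
    while n:
        n, r = divmod(n, base)
        denom *= base
        q += r / denom
    return q

def halton(n, primes=(2, 3)):
    """First n points of the Halton sequence in [0,1]^len(primes):
    one van der Corput sequence per coordinate, in coprime bases."""
    return [[van_der_corput(i, p) for p in primes] for i in range(1, n + 1)]

pts = halton(8)
# first coordinates: 1/2, 1/4, 3/4, 1/8, 5/8, 3/8, 7/8, 1/16 --
# each successive block fills the gaps left by the earlier points
```

The filling-in-the-gaps behavior is exactly the “good coverage” that low discrepancy formalizes.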
In the worst case the resulting Quasi-Monte Carlo (QMC) methods would perform much worse than Monte-Carlo methods. However, the surprise was that for actual financial calculations they worked well—really well.
As with any change in basic methodology, the work and claims of Traub’s group were met with great initial skepticism by the financial firms. However, the performance was and remains so good that many financial calculations still use QMC. This raised a further beautiful problem that loops back to the initial mission of IBC:
Why do the discrepancy-based methods perform so well?
That is, what is it about the particular high-dimensional integrals that occurred with CMOs and other applications that made non-random methods work so well?
There are some ideas of why this is true in the actual financial calculations, but the basic question remains open. Wikipedia’s IBC article picks up the story in a way that connects to Parlett’s point though neither he nor his article is mentioned:
These results are empirical; where does computational complexity come in? QMC is not a panacea for all high-dimensional integrals. What is special about financial derivatives?
Here’s a possible explanation. The 360 dimensions in the CMO represent monthly future times. Due to the discounted value of money, variables representing times far in the future are less important than the variables representing nearby times. Thus the integrals are non-isotropic.
Working with Ian Sloan, Woźniakowski introduced a notion of weighted spaces that leverages the above observation. Their 1998 paper laid down conditions on the integrals under which QMC renders them tractable, often with worst-case guarantees in settings where Monte Carlo gives only average-case ones. How far this positive result can be extended—for which other classes of integrals does QMC beat MC—is a concrete form of the wonderful problem.
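A hypothetical weighted integrand—our illustration of the discounting observation, with a made-up discount factor—makes the “effective dimension” point concrete: if coordinate $j$ enters with weight $\gamma^j$, the late months barely matter.

```python
# Illustrative non-isotropic integrand: f(x) = sum_j gamma^j * x_j
# over [0,1]^360, with gamma = 0.9 a hypothetical monthly discount factor.
gamma, d = 0.9, 360

def integral_up_to(k):
    # Exact integral of sum_{j<k} gamma^j * x_j over the unit cube:
    # each x_j integrates to 1/2, so the value is (1/2) * sum_{j<k} gamma^j.
    return 0.5 * (1 - gamma ** k) / (1 - gamma)

full = integral_up_to(d)        # uses all 360 coordinates
truncated = integral_up_to(50)  # ignores everything after month 50
print((full - truncated) / full)  # under 1%: early coordinates carry the mass
```

In the Sloan–Woźniakowski framework, sufficiently fast decay of such weights is exactly what breaks the curse of dimensionality for QMC.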
Our condolences to the Traub family and to our colleagues at Columbia and Santa Fe. The Santa Fe Institute’s memorial includes a nice statement by our friend Joshua Grochow relating to workshops led by Joe and a joint paper on limits to knowledge in science.