An insight into the computation of financial information

 Columbia memorial source

Joseph Traub passed away just a week ago, on August 24th. He is best known for his computer science leadership positions at CMU, Columbia, CSTB, the Journal of Complexity—they all start with “C.” CSTB is the Computer Science and Telecommunications Board of the National Academies of Science, Engineering, and Medicine. At each of these he was the head and for all except Carnegie-Mellon he was the first head—the founder.

Today Ken and I wish to highlight one technical result by Traub and his co-workers that you may not know about.

I knew Joe for many years, and he will be missed. He had a sharp mind and corresponding sharp wit. There was a slight twinkle in his eye that would have called to mind a leprechaun had he been Irish rather than German—from a family that left after the Nazis seized his father’s bank in 1938 when he was six. He was a delight to interact with on almost any topic.

Joe created an entire area of theory: information-based complexity theory. To quote its website:

Information-based complexity (IBC) is the branch of computational complexity that studies problems for which the information is partial, contaminated, and priced. [Functions over the unit cube] must be replaced by … finite sets (by, for example, evaluating the functions at a finite number of points). Therefore, we have only partial information about the functions. Furthermore, the function values may be contaminated by round-off error. Finally, evaluating the functions can be expensive, and so computing these values has a price.

Traub was helped greatly by a brilliant colleague, Henryk Woźniakowski, who developed key ideas alongside him in the 1970s that were distilled in their 1980 monograph, A General Theory of Optimal Algorithms. A shorter source is Joe’s seminal paper, “An Introduction to Information-Based Complexity.” A statement on Woźniakowski’s Columbia webpage shows his sustaining interest in IBC:

I am mostly interested in computational complexity of continuous problems. For most continuous problems, we have only partial information. For problems defined on spaces of functions, this partial information is usually given by a finite number of function values at some points which we can choose. One of the central issues is to determine how many pieces of information, or function values, are needed to solve the computational problem to within a prescribed accuracy.

IBC Under Attack

I must admit that years ago I opened my Bulletin of the AMS, in 1992 to be exact, and was shocked to see an article titled: “Some Basic Information On Information-Based Complexity Theory” written by the eminent numerical analyst Beresford Parlett. The abstract got right to the point:

…Why then do most numerical analysts turn a cold shoulder to IBCT? Close analysis of two representative papers reveals a mixture of nice new observations, error bounds repackaged in new language, misdirected examples, and misleading theorems.

What is interesting about this paper is that the attack on IBCT (Parlett's abbreviation for IBC) is aimed at the idiosyncrasies of its framework. The claim is not that there are technical errors in the IBC papers, but rather that the model, the framework, is uninteresting and misleading. You should take a look at Parlett's very readable paper and decide for yourself.

Traub of course disagreed and had a chance to fire back an answer. I believe that Parlett’s attack actually had little effect on IBC, except to make it better known in the mathematical community.

What was the tiff about? We can try to boil it down even more than Parlett’s article does.

Integrals and Algorithms

Integrals are fundamental to most of mathematics. And in many practical cases the integrals cannot be exactly computed. This leads to a huge area of research on how to approximate an integral within some ${\epsilon>0}$ error. Even simple ones like

$\displaystyle \int_{a}^{b} f(x) dx$

can be challenging for nasty functions ${f(x)}$. Worse still are multi-dimensional integrals

$\displaystyle \int \int \cdots \int_{S} f(x_{1},\dots,x_{d}) dx_{1}dx_{2}\dots dx_{d},$

where ${S}$ is a ${d}$-dimensional region. Speaking very roughly, let’s say ${S = [0,1]^d}$ and what matters to ${f(x)}$ is whether components ${x_i}$ are “low” (toward ${0}$) or “high” (toward ${1}$) in each dimension. If the gradient of ${f}$ in each dimension ${i}$ behaves differently for each setting of the other components being “low” or “high,” then we might need ${2^d}$ samples just to get a basic outline of the function’s behavior. The IBC website fingers this exponential blowup as the “curse of dimensionality.”
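The ${2^d}$ count can be made concrete in a few lines of Python. This is a toy illustration of the curse of dimensionality, and the helper name is ours:

```python
from itertools import product

def corner_count(d):
    # Each coordinate is either "low" (0) or "high" (1), so the cube
    # [0,1]^d has 2**d distinct low/high corner patterns to probe.
    return 2 ** d

# For small d we can enumerate the corners explicitly...
corners = list(product([0.0, 1.0], repeat=3))
assert len(corners) == corner_count(3)  # 8 corners in 3 dimensions

# ...but at d = 360 (the CMO dimension later in the post) the count
# is astronomical:
print(f"{corner_count(360):.2e}")  # about 2.35e+108
```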

Now as ordinary complexity theorists, our first instinct would be to define properties intrinsic to the function ${f}$ and try to prove they cause high complexity for any algorithm. Making a continuous analogy to concepts in discrete Boolean complexity, drawing on papers like this by Noam Nisan and Mario Szegedy, we would try to tailor an effective measure of “sensitivity.” We would talk about functions ${f}$ that resemble the ${n}$-ary parity function in respect of sensitivity but don’t have a simple known integral. Notions of ${f}$ being “isotropic” could cut both ways—they could make the sensitivity pervasive but could enable a good global estimate of the integral.

IBC, however, focuses on properties of algorithms and restrictions on the kind of inputs they are given. Parlett’s general objection is that doing so begs the question of a proper complexity theory and reverts to the standard—and hallowed enough—domain of ordinary numerical analysis of algorithms.

Ken and I find ourselves in the middle. Pure complexity theory as described by Parlett has run into barriers even more since his article. We wrote a pair of posts on bounds against algorithms that are “progressive.” The IBC website talks about settings for algorithms. This seems to us a fair compromise notion but it could use more definition.

One classic setting for approximating these high-dimensional integrals is the Monte Carlo method: one samples points ${x=(x_{1},\dots,x_{d})}$ uniformly from ${S}$ and takes the average of the values of ${f(x)}$. For example, take dimension ${d = 360 = 30 \times 12}$: the number of months in a 30-year collateralized mortgage obligation (CMO). A classical grid (product) rule with ${10^{2}}$ points per coordinate would require ${(10^{2})^{360} = 10^{720}}$ evaluations of the function to target error ${\epsilon=10^{-2}}$—clearly a completely hopeless task. Monte Carlo escapes this blowup: its expected error shrinks like ${O(1/\sqrt{n})}$ independent of ${d}$, so roughly ${10^{4}}$ samples suffice for ${\epsilon=10^{-2}}$. But that square-root rate makes convergence slow when higher accuracy is needed. Financial firms on Wall Street that needed to estimate fair market values for CMOs were using Monte Carlo methods as best they could.
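The Monte Carlo setting can be sketched in a few lines of Python. This is a minimal illustration, not production code; the integrand and sample count are ours:

```python
import random

def mc_integrate(f, d, n, seed=0):
    """Plain Monte Carlo: average f at n uniform random points in [0,1]^d."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = [rng.random() for _ in range(d)]
        total += f(x)
    return total / n

# Toy example: the integral of x_1 + ... + x_d over [0,1]^d is exactly d/2.
d = 360  # the CMO dimension from the text
est = mc_integrate(lambda x: sum(x), d, n=10_000)
print(est)  # close to 180; the error shrinks like O(1/sqrt(n)), independent of d
```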

IBC Strikes Back

Enter Traub with his Columbia IBC research group in the early 1990s. They—in particular his student Spassimir Paskov in late 1993 and early 1994—discovered and verified that a non-random method would beat Monte Carlo by many orders of magnitude. This was based on an old idea that uses low-discrepancy sequences. These are sequences ${X}$ of points inside a region ${S}$ that have good coverage of the space. For the unit cube ${S}$ as above, one may specify a class ${{\cal B}}$ of natural measurable subsets of ${S}$; then low discrepancy means that for every ${B \in {\cal B}}$ the proportion of points in ${X}$ that belong to ${B}$ is close to the measure of ${B}$—and moreover this happens for every long enough consecutive subsequence of ${X}$.
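For concreteness, here is a sketch (ours, not the Paskov–Traub implementation, which used Sobol points) of one classical low-discrepancy construction, the Halton sequence, built from van der Corput sequences in coprime bases:

```python
def van_der_corput(n, base):
    """n-th term of the van der Corput sequence in the given base:
    reflect the base-b digits of n about the radix point."""
    q, denom = 0.0, 1.0
    while n:
        n, digit = divmod(n, base)
        denom *= base
        q += digit / denom
    return q

def halton(n, primes=(2, 3)):
    """First n points of the Halton sequence in len(primes) dimensions."""
    return [[van_der_corput(i, p) for p in primes] for i in range(1, n + 1)]

# Discrepancy check: the fraction of points landing in the box [0,0.5)^2
# should be close to its measure, 0.25.
pts = halton(1000)
frac = sum(1 for x, y in pts if x < 0.5 and y < 0.5) / len(pts)
print(frac)  # near 0.25
```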

In the worst case the resulting Quasi-Monte Carlo (QMC) methods can perform much worse than Monte Carlo methods. However, the surprise was that for actual financial calculations they worked well—really well.

As with any change in basic methodology, the work and claims of Traub’s group were met with great initial skepticism by the financial firms. However, the performance was and remains so good that many financial calculations still use QMC. This raised a further beautiful problem that loops back to the initial mission of IBC:

Why do the discrepancy-based methods perform so well?

That is, what properties of the particular high-dimensional integrals that occurred with CMOs and other applications made non-random methods work so well?

There are some ideas about why this is so in the actual financial calculations, but the basic question remains open. Wikipedia’s IBC article picks up the story in a way that connects to Parlett’s point, though neither he nor his article is mentioned:

These results are empirical; where does computational complexity come in? QMC is not a panacea for all high-dimensional integrals. What is special about financial derivatives?

It continues:

Here’s a possible explanation. The 360 dimensions in the CMO represent monthly future times. Due to the discounted value of money, variables representing times far in the future are less important than the variables representing nearby times. Thus the integrals are non-isotropic.

Working with Ian Sloan, Woźniakowski introduced a notion of weighted spaces that leverages the above observation. Their 1998 paper laid down conditions on the integrals under which they become tractable in the QMC setting, often with worst-case guarantees where Monte Carlo gives only average-case ones. How far this positive result can be extended—for which other classes of integrals does QMC beat MC—is a concrete form of the wonderful problem.
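A back-of-the-envelope sketch of the non-isotropy (our toy numbers, not the Sloan–Woźniakowski construction): if coordinate ${j}$ enters the integrand with a weight decaying by an assumed monthly discount factor of 0.95, then most of the integrand’s mass lives in the early coordinates, so the “effective dimension” of the 360-dimensional problem is far smaller than 360:

```python
gamma = 0.95  # assumed monthly discount factor, purely for illustration
d = 360       # months in the 30-year CMO

def weight(j):
    # Geometric weight of coordinate j under monthly discounting.
    return gamma ** j

total = sum(weight(j) for j in range(d))
head = sum(weight(j) for j in range(90))  # first 90 of the 360 months
print(head / total)  # about 0.99: a quarter of the coordinates carry ~99% of the weight
```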

Open Problems

Our condolences to the Traub family and to our colleagues at Columbia and Santa Fe. The Santa Fe Institute’s memorial includes a nice statement by our friend Joshua Grochow relating to workshops led by Joe and a joint paper on limits to knowledge in science.

[modified 1990s dates on QMC for CMOs following this; see also this comment.]

August 31, 2015 12:15 pm

Really sad and surprising to hear about Joe’s passing. I was talking at Columbia last Fall and it was a great pleasure seeing Joe was still extremely engaged in research despite being in his eighties.

August 31, 2015 3:06 pm

Daft question here: why is *anything* in mathematical finance modeled using integrals over real functions? Nobody has pi dollars or executes infinite trades in an epsilon time interval. It’s factually incorrect to have models like that, but it’s very convenient to scribble on a piece of paper I suppose. If they can’t compute those wicked integrals cheaply enough, then they’re just paying the price of that earlier convenience.

September 1, 2015 4:04 pm

I don’t think it’s correct that quasi-Monte Carlo was met with “great initial skepticism by the financial firms”. I was personally involved in developing such methods well before Traub et al., and they caught on very quickly once they were well understood within the larger practitioner community. Working on the CMO valuation problem in 1990, I visited UCLA to talk to Russell Caflisch, whose just-finished student William Morokoff had done a dissertation on QMC, and I learned of the technique there. (Morokoff later went to Goldman Sachs.) I implemented a version in software then, and continued to work on it over the next few years, but plenty of other people were doing likewise. You can find a general discussion here: http://www.cs.hunter.cuny.edu/~felisav/StochasticProcesses/Course_Material_files/BBG_jedc96.pdf

It’s quite possible that Traub et al.’s efforts to sell the technique to Wall St. firms were met with dismissal, but that likely has to do with the fact that it was already widely known rather than skepticism as to its value.

[A particular irritant in relation to the work of Traub et al. was that Columbia’s IP office got involved and obtained a patent on the technique long after it was in general use. Even a misbegotten patent like this can be an irritant to work around when earlier implementations are mostly hidden away in proprietary code and not described anywhere in the same level of detail.]

The challenge with QMC methods in this context was dimensional reduction, which you allude to. QMC offers no or even negative benefit (since convergence can be tricky) if implemented naively. The trick with CMOs is not so much that future cashflows are discounted (though that helps), but rather that the integrands vary most strongly at low frequency and are nearly constant at high frequency. A weighted Fourier transform construction of the stochastic process (or almost equivalently, the Brownian bridge construction, as suggested by Caflisch) then leads to an effective solution. This was very much a problem of details. (I have no idea how this relates to the philosophical arguments about IBC.)

Koray – no one was trying to solve infinite dimensional integrals. As the OP describes, these were at most 360 dimensional problems. But from a computational standpoint, 360 is close to infinity. That’s why dimensional reduction was so useful (when combined with QMC).

• September 1, 2015 10:33 pm

Hi, Oren—thanks for detailed and nice reply, and greetings from a fellow Princeton ’81-er!

Of course your first comment applies also to two major sources for this note—the Santa Fe memorial and Traub’s Wikipedia bio—which include statements with “contradicted long-held practices” and “Wall Street firms to considerable initial skepticism.” There’s also Wikipedia’s article specific to QMC in finance, which dates the start of Traub-Paskov’s activity to “early 1992” (though we went with the 1994 presentation date) and actually seems to give primacy for the suggestion and “new interest in QMC” to Woźniakowski in 1991. Looking at the paper you reference, perhaps the real bottom line was that the Sobol-based method of Paskov-Traub gave the least error and hence the best performance. Indeed, I see we should move “1994” to the clause with Paskov himself, and I am doing so.