Should theory do more with real world problems?

Judith Norback is not a theorist, but is an expert on communication skills—especially oral presentations. She is a faculty member in the ISyE department at Georgia Tech, where she runs the communications laboratory.

Today I want to talk about about the role that theory can and should play in practical problems.

I was recently invited as her guest to attend ISyE’s “best presentations,” which raised a question about the role of theory in my mind. I should also add, for full-disclosure—that Judith is my wife and I am her husband.

The 3% Solution

The name ISyE stands for Industrial and Systems Engineering. This department is currently the best, of its kind, in the nation and has been best for over a decade, which is impressive since the competition includes departments from most of the top engineering schools.

ISyE, as most departments do, has a “capstone” course. In this course teams of seniors work with companies, applying their industrial engineering knowledge to solve real problems. This year, one team, for example, worked on improving the routing of auto parts for Honda; another on prediction of the future demand for wind turbines for GE.

The other night I went to the final presentations of the three best teams: its an American Idol type show. Each team makes a presentation of their work, there is a panel of judges, and a winner is picked. The winning team does not get a recording contract, but the sense of competition was still high. The teams take their presentations seriously, very seriously, and it showed. The presentations were clear, interesting, and professional. I wish my presentations were half as good.

What struck me about the presentations is they made an implicit point that we in theory usually avoid: Constants matter—sometimes a great deal.

Each of the teams presented work that went like this:

Our company has a real problem X that is ${\dots}$ We collected data on the problem, studied what the company does now to solve X, and determined the cost of their current solution. Then, we applied our industrial engineering methods and found a new solution S to their problem. We realized that we could use randomization, local 2-opt search, and tabu methods to make our solution S more efficient. The result was a cost savings of ${\dots}$ dollars over the previous methods.

The surprise to me was that their solutions S were often slight improvements to the current practice. In one case S was better by 3%, and in another case, by 1%. These are relatively small improvements. The cool point is the 3% was of about 190 million dollars. Thus, the savings to their company was potentially more than 6 million dollars. This is a huge amount of money, since it goes right to the bottom line of the company’s profit. I do not have an MBA, but to get 6 million dollars to the bottom line, would require 100s of millions of dollars in new revenue.

Theory’s Role

As theorists, in our thinking, in our classes, in our papers, and at our conferences, we all work in O-notation. We prove ${\log n}$ approximations, we decrease a bound from one polynomial to a smaller one, and so on. But, what I realized the other night is that there is something missing from this model. We need to have papers that show even small percent increases—provided the percent is of a large enough amount.

Thus, the question that I have been thinking about is: how can we do what the ISyE teams did? Can we apply our algorithmic tricks to get small percent increases that really matter? Would papers on this type of work be accepted at our top conferences? Would NSF fund this type of research? Are there models of how we can help in this way?

Open Problems

Should we work on such results? How can we help apply some of our theory “tricks” to make small percent improvements, that really matter?

December 19, 2009 12:46 pm

One way of looking at this: you have a case where, say, company C can save ~6 million in costs by having a bunch of students (undergraduates?) spend less than a year applying algorithmic knowledge that is sophisticated but probably isn’t really “cutting-edge”.

Given the relative cost of engaging in such research — let’s be generous and say 150k/annum per researcher all-in (office space / benefits / salary / etc.) — it seems like at some point most companies over a certain size threshold would benefit from having an in-house research department (which would model operations as algorithms and then optimize said algorithms).

So perhaps there’s a future role for CS theorists doing something like this? To some extent the same space has already been colonized by the operations research folks, who so far as I know would on average have more statistical and econometric sophistication than someone with the equivalent level of CS degree but would have less (not none, but less) algorithmic sophistication.

The ideal strategy might be to pair up with the OR folks and have some kind of joint degree in algorithms-and-or (if no one’s done that yet), as you can then piggyback on the existing industry-academia bridges the OR community has built up. If you don’t do that it’s a bit of a marketing issue: businesses have algorithmic issues but don’t know they do, so how do you convince them they have such issues?

A secondary point is that if it doesn’t already exist it might be very useful to start an online repository of problems-and-algorithms, wherein eg SORT is characterized, theoretical bounds described (comparison sort is at best ~nlog(n)), and then specific algorithms are included with more-realistic formula (eg some particular sort is ~ 10k_one * n + 12k_two * k_one * n * log(n)).

The important point is some relatively-authoritative handbook listing the theoretical bounds and the performance contours of the known algos; without such an authoritative reference it’s hard to tell at a glance if a few % improvement is actually an improvement.

2. December 19, 2009 1:09 pm

But if constants are introduced in studying complexity isn’t there a problem
on agreeing on the underlying model of computation? That is how to avoid that
the trick for saving time is just using a different kind of processor that is slightly

December 19, 2009 2:15 pm

One of the early attempts to make a theory that included constants was work of John Savage. I should do a whole discussion on his work in the future. He had models that included not just time and space, but also included other measures.

3. December 19, 2009 2:08 pm

Right now it’s still very difficult to justify anything like a 3% improvement in graphics, even in the high performance real-time rendering community. (which I only mention out of more familiarity) it seems to me like there are a number of reasons we (ie all computer scientists, not just theorists) are loathe to work on the 3% problems.

First, Moore’s law made us all very lazy. Second, we often don’t get those kinds of 3%->\$3 million conversions that make this kind of detail work cost effective. And third, most computer programs are supposed to work over a diverse enough set of applications and scenarios that it becomes much harder to show that a proposed 3% improvement isn’t sometimes also a 3% decrease.

That said, It does seem like teaming up with OR or other, more big industry focused disciplines might be productive.

December 19, 2009 2:10 pm

Great point on Moore’s Law. Since it could be changing perhaps we will have to adjust our models.

December 19, 2009 4:19 pm

Moore’s Law “always” exists (I’ll leave that argument aside for now) and, for the sake of argument, let’s say companies reasonably invest in the latest computing technology to take advantage of Moore’s Law. The point is that, on top of improvements due to Moore’s Law, we can get another 3% improvement.

One might argue that 3% improvement in running time on top of a 100% improvement in computer speed is essentially nothing. But when the 3% is in an approximation ratio instead of a running time, it can mean a lot. A computer that’s twice as fast might get an answer twice as fast, but will still get the same answer unless the approximation algorithm itself is improved.

In terms of models, this has long been an issue in the theory community. I believe it was Mihai who once said something like: in the real world, no one cares about computational models; with the exception of super-computers, all standard computers are word-RAMs. So when it comes to constants for many standard problems, working in the word-RAM model seems like a reasonable thing to do.

December 19, 2009 9:31 pm

For most of the “real-world” problems OR cares about, the underlying problems are already NP-hard and are either inapproximable or require solutions better than the best possible approximation. Most of the time, there aren’t computational guarantees that method X will work better than method Y (and give the 3% increase), but there is empirical evidence that on historical data or simulations that method X tends to work better. The problems aren’t “clean”, and models are nearly as important as optimization strategies. Getting a 3% average increase in speed or computable size on standard test sets for max-clique or graph isomorphism or 3-SAT or “find an optimal strategy for Go” is great and may lead to new directions in theory, but getting a 3% increase on the “make more money for company Z” problem seems like a very different beast.

• December 20, 2009 3:41 pm

You’re correct that real-world business problems are usually much less homogeneous than theoretical NP-hard problems, though there are sometimes variations that apply. Techniques for the simplified version may be at least partly applicable to the variant. It simply requires more investigation.

For example, supply chain management (e.g. when should we buy how many loads from whom to ensure that our shelves are stocked as well as possible?) is often expressed as a variation on integer programming. It’s a huge issue for grocery stores and big department stores. Another example, though I don’t know if it’s actually used, is team selection for special projects could be expressed as a variation on graph colouring or k-clique.

6. December 19, 2009 9:45 pm

Most professors, but more troubling, most software developers, that I talk to about performance optimization are convinced of two items that baffle me:
1) Optimizing a piece of software can only achieve up to 5% speedup on average.
2) A 2x speedup is useless.

The second point is a particular example of the main subject of your post, but the first is also important. Even if people think 1% improvement would be amazing, if they think that a 1% improvement is impossible, they won’t even look at trying to make such an improvement. Moreover, if someone finds a way to achieve a 10% improvement, people brush it aside as nonsense and won’t examine it.

In optimizing AQUA@Home, a distributed quantum computation simulator that was already multi-threaded, we managed to achieve a 10x speedup by some changes that, although they required experience to implement, were quite fast and easy to make:
– Simplify the main data structures to improve caching and eliminate branches.
– Approximate the exponential function instead of using the exact function.
– Explicitly vectorize as much as possible to do 4 things at once.
– Balance the work of the threads better.

All of that was implemented and debugged in a couple of weeks, saving us months of computation time so far on each of thousands of volunteers’ computers, and allowing us to get better quality results. Not including the explicitly vectorized version, the code is significantly simpler than it was, and the vectorized version is only more complicated because C/C++ compilers don’t have any readable means of vectorizing.

Software developers nonetheless seem to pretend that this is just impossible whenever I describe the several cases like this of which I’ve been a part. Perhaps they’re just so stuck on maintaining the status quo and not having to learn new skills that they’ll just blindly ignore anything that disrupts their view of software.

December 20, 2009 11:08 am

You won’t find many places where 3% improvement will be something that is greater than the cost of changing things. Even in the examples given, I highly doubt that unless there are simple direct changes that could be made any of that 3% could be realized.

This is the “Six Sigma” discipline issue. There comes along this neat quantitative quality control scheme (Six Sigma, which I actually do think is decent) that has great tools for analyzing processes and technologies. It works on some manufacturing problems very well and saves millions. It doesn’t work that well on other things because of what Andrew Gelman calls the “All other things being equal” fallacy. Everything else changes around a change, and the costs of this are often way higher than that costs of the particular process change itself. Constants do matter, and they often matter in a negative way – they show that even if something is cool it may not save any money, time or aggravation.

I have been one of the system designers that have told people with double time speedups or more that we won’t implement. Why? Because it didn’t actually double speed at peak hours and peak hours dominated the cost.

So why implement change (which costs) when we get nothing for it that helps any pain point and adds complexity. Every software change has overhead.

December 20, 2009 11:31 am

my impression is that in comp architecture they go through incredible pains to get from e.g. 99.4% branches predicted correctly to 99.5% it actually translates to a big speed-up.
Also for what it’s worth in crypto it is nowadays well-recognized that constants are very important, at least for practical schemes. Often researchers try to avoid the use of asymptotics in this type of work, although some use of Big-Oh seems unavoidable due to the details of the model of computation as one commenter mentioned. Would be interested in hearing more about Savage’s work…

December 20, 2009 1:52 pm

Richard,

There’s a great deal in this post, and many similar themes have been considered in various posts in my blog. But my high-order comment is that we who work in “algorithms” (broadly construed) do a poor job in teaching our students about heuristic algorithms and the kinds of techniques that can eke out small (and sometimes large) improvements in real-life problems that can have significant economic implications. This is understandable — in theory we are all about proofs, so we’d rather teach a 3-approximation or even an O(log n) approximation algorithm for an NP-hard problem that nobody would think to implement in practice in favor of a heuristic technique that we can’t prove anything about that, in practice, generally gets within 2-3% of the optimal solution.

I, personally, think that’s the wrong choice. I devote 1-2 lectures (and one programming assignment) in my algorithms class to heuristic methods — and often think it should be more. Perhaps in AI classes people are still able to learn about tabu search, “go-with-the-winners” style parallel search, and other useful methods, but I worry too many people don’t know basic, useful optimization approaches.

If you’re looking for a simple heuristic method to teach that fits in well with an algorithms class, I always point people to my paper on what we called BubbleSearch (easily found online, or at my home page). The idea is simple. Suppose you have a greedy algorithm — like bin-packing by placing items in some way in order of decreasing size. You can run it once to get a good solution, but then you’re done. What if you have more time? You can’t run it again — or can you? We propose perturbing the greedy ordering in a natural way — instead of picking the best item at each step, pick the best with probability say 1/2, the next best with probability 1/4, the next with probability 1/8, and so on. This preserves the intuition behind your greedy heuristic (packing big things first is best!), but introduces some flexibility leading to exponentially many orders. Very quickly you tend to find somewhat better solutions, and you can stop at any time. This is a simple and natural way to generalize pretty much any basic greedy algorithm that proves very effective. I believe we should be teaching our students more heuristic “algorithmic tools” like this.

December 20, 2009 4:22 pm

I think that not everyone here is talking about the same thing: a 3% speedup in an algorithm is very different from a 3% reduction in the cost of production, unless 100% of your company’s costs are in CPU time. The practical improvements of interest are more likely to relate to improving the approximation ratio achieved on typical instances.

• December 20, 2009 6:21 pm

That’s true.

To work through a translation to cost: in the case of AQUA@Home, it’s taken several months of computation to get the results we have so far, whereas before it would’ve taken several years to get the same results. Assuming the 3 of us working on it continued to be employed through those years, the total amount of time spent managing the project is much less, even though some of the several years time would be spent on other projects. Maybe it wouldn’t be the full 10x less, but probably 4x less, and since labour is the only significant cost involved in the project (thanks to the many volunteers donating their spare CPU time), that’s a pretty big cost savings.

In other cases, the costs/revenues aren’t impacted directly by performance, but quality of the product or service usually is. The benefits of improved product quality are usually much more difficult to predict and measure, though can also be significant. For example, Windows Vista and Fedora Core 10 can take 5 to 10 minutes to reboot, so I can either repeatedly spend that time, or always leave the computers on, spending power. If they only took 20 seconds to reboot, like Windows 95 did on much slower hardware, I’d just shut them down when I don’t need to leave them running, but at the moment, I opt for leaving everything on.

December 21, 2009 9:28 am

Does it still take that long to re-boot? I never understood why it takes that long. I mean I understand, but do not.

December 30, 2009 12:42 am

I have a few mundane observations from having been involved in working on (hacking at) software in industry for 20 years.

I mention these because there seems to be some comment thread conflation between algorithmic improvements which result in the quality of operations routing/scheduling/etc or economic resource allocation that are produced by software — where 3% improvements can indeed result in significant economic gains, and perhaps even in improvement in the quality of life here on earth — and algorithmic improvements to the run-time of the software itself — where my experience is that there are generally so many egregious problems that single-digit percent improvements are simply never even worth thinking about in any fashion.

* Algorithmic improvements in software — of the type that CS professionals investigate — are rarely of any value.
This is because the quality of software is so wretched (and I’ve been dispatched deep into the “crown jewel” disasters of a number of “industry leaders”‘ code ) that bumming out constant factors (the stuff hidden in the O() notation) is nearly always possible and nearly always easy for somebody with some engineering discipline and experience.

* Beyond that, the quality of software is so wretched that inspection of the code by somebody with some engineering discipline and experience can reveal scores of bugs, many of which are long standing, and some of which have huge commercial implications (system uptime guarantees, real-time responsiveness, redundant equipment needs, corporate reputation and sales.)

* As somebody mentioned above, Moore’s Law does make considerations of software improvements that yield 3% improvements uninteresting. I imagine there are exceptions for very high quality, well-optimized, fully-vectorized, extremely well-tested and absolutely CPU-cycle-limited code (I’m sure QFD, QFT, etc, and financial code has this sort of thing), but my experience in large swathes of real industrial software is that this will just be lost in the noise.

* The point about factors of 2x not being worth considering (my engineering rule of thumb is close to that number) is not because such speed-ups aren’t desirable or possible, but because undisciplined programmers are wont to speed up (or, more accurately, “improve”, with full emphasis on the “scare quotation marks”) parts of the code that end up being irrelevant to overall _system_ performance and reliability. So it’s not that such speed-ups aren’t worth thinking about, but rather that there are nearly always lower hanging fruit, and premature optimization mostly ends up being a waste of effort — both of the person who self-assigns him/herself this task, and of the people who have to clean up after the “improvement”. A lot of stuff, no matter how sub-optimal, should simply be _left alone_ because it is _irrelevant_, even to worst-case performance cases.

* 80/20 (or 90/10, or 98/2) rules, suggesting that 90% of the time is spent in 10% of the code, generally apply in spades, especially once other engineering stupidities are dealt with. Most software run-time improvement comes down to organizing things so that a critical and simple inner loop or two dominates performance foe nearly all imput cases, and then optimizing the hell out of that loop.

* Very few programmers have any _practically useful_ conception of the orders of magnitude of cycle delay involved in the register/L1-/L2-/L3-cache/DRAM/Flash/Disk/backplane/NAS/etc hierarchies.
Because of this, improving data and instruction locality (ie improving cache performance, which can yield radical, factor of 100+ performance boosts) nearly always beats any algorithmic improvements, inside the O() or outside.

* Very few programmers have any practically useful conception of how CPU instruction pipelines work, and in particular of the performance implications of code with lots of conditional tests and branches. Reconfiguring code from having a branch-filled inner loop to having multiple branch-free loops — even at the apparent cost of extra code — can radically improve performance. Combine this with warmed-up caches (pre-fetching data long enough before it is touched, avoiding cache misses that can costs thousands of CPU cycles) and we’re starting to talk seriously faster performance … none of which has much to do with any algorithmic complexity.

* Beyond the above:; tight, branch-free, data-prefetched inner loops can benefit significantly (another factor of 2+) by explicit coding to take advantage of the mutiple parallel functional units (ALUs, etc) present in even the most pedestrian consumer CPU these days.

Anyway, that’s my perspective. Beyond being nearly uniformly intellectually unambitious and unlikely to be aware of or care about cutting edge academic algorithmic advances, most software writing professionals are so unskilled in their craft that orders of magnitude improvement in software performance are readily attainable without any attention to any algorithmic issues.

And finally, yes, when it comes down to it, its the constants, not the log log Ns, that dominate. For many problems in many industries, the Ns are on the order of a few thousands or tens of thousands, and anything that another year of CPU speedup and more memory or code optimizations such as I’ve mentioned don’t address, will be more strongly influenced by the constant factors buried in the O() notation than they are by what most of us think of algorithmic complexity.

In summary: an embittered report from the trenches of intellectual despair of the software industry.

12. January 4, 2010 1:31 am

A 625x speedup from one day of making algorithmic changes, or the difference between “This isn’t feasible” and “This’ll be done by tomorrow morning on my old laptop”

There’s also the >30,000x speedup from not printing things out, but that should be obvious, even though so many people have told me that logging couldn’t possibly have a significant impact on performance. I could also do “inner-loop” optimization, such as by throwing it into assembly language, heck, I already had something similar in assembly language, but it’s thankfully not necessary now. 🙂