What Is Big? What Is Small?
How does the growth in computer memory change applications?
Andrei Broder is one of the top experts in computational theory, and is especially famous for his work on algorithms. A recurrent theme that makes his work so wonderful is that some of his best theoretical work is also extremely practical. No galactic algorithms for Andrei.
Today I want to talk about computer memory, and how applications' need for memory can shape them.
Broder's work with Yossi Azar, Anna Karlin, and Eli Upfal on the "power of two choices" phenomenon is some of the best algorithmic work ever. Mathematicians and theorists had been studying the throwing of balls into bins for decades. It was a huge surprise that, after such intensive study, Broder and his colleagues could discover a method that exponentially improved the previous best results. When n balls are thrown at random into n bins, the maximum number of balls in a bin grows like log n / log log n; they showed that letting each ball pick the less loaded of two random bins reduces this growth to roughly log log n. A beautiful result—take a look at their paper or the many versions of it on the Web. It is a classic.
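The improvement is easy to see empirically. Here is a small simulation of the two strategies (my own sketch, not code from their paper): each ball either goes into one random bin, or into the less loaded of two random bins.

```python
import random

def max_load(n, choices, seed=0):
    """Throw n balls into n bins; each ball goes to the least loaded
    of `choices` uniformly random bins. Return the maximum bin load."""
    rng = random.Random(seed)
    bins = [0] * n
    for _ in range(n):
        candidates = [rng.randrange(n) for _ in range(choices)]
        best = min(candidates, key=lambda i: bins[i])
        bins[best] += 1
    return max(bins)

n = 100_000
print("one choice:  max load =", max_load(n, 1))   # grows like log n / log log n
print("two choices: max load =", max_load(n, 2))   # grows like log log n
```

Even at modest n the gap is visible: one choice typically gives a fullest bin with eight or more balls, while two choices keep it around four.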
Broder has also worked on the question of large data sets, at least indirectly. He was a central figure in one of the best early search engines, AltaVista, where for a while he was the chief scientist. While Google is now the dominant search engine, AltaVista introduced many important features into searching, features that we now take for granted, and it was a top search engine for several years. He is now at Yahoo, where he works on search and computational advertising.
Andrei once told me that when AltaVista was first implemented, it was available only internally to DEC employees. In order to update the data structures that stored the search information, they re-made the system each night. This meant that the search engine was unavailable for about 30 minutes late every night.
Andrei said that they quickly began to get heated complaints that said
“what’s up, I need the search engine now.”
And far worse complaints. Broder said that the AltaVista team realized that DEC employees all over the world were using the engine. There was no good time to stop the search engine so they could update the tables. None. Late at night on the US east coast was a perfectly fine working time somewhere else in the world.
So the team re-implemented the entire AltaVista engine. They had to rebuild it completely so that the updates could happen while the engine was running. Andrei told me this made a big impression on him—successful tools like AltaVista had to run 24/7. Users expected it to always be available: every day, every hour, every minute.
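The general pattern for this kind of rebuild is worth making concrete: instead of taking the service down to update its tables, build the new version alongside the old one and swap a single reference when it is ready. Here is a minimal sketch of the idea (my illustration of the pattern, not AltaVista's actual design):

```python
import threading

class LiveIndex:
    """Serve queries from a read-only index while a replacement
    is built in the background, then swap the reference atomically."""

    def __init__(self, documents):
        self._index = self._build(documents)
        self._lock = threading.Lock()

    @staticmethod
    def _build(documents):
        # A toy inverted index: word -> set of document ids.
        index = {}
        for doc_id, text in documents.items():
            for word in text.lower().split():
                index.setdefault(word, set()).add(doc_id)
        return index

    def search(self, word):
        # Readers never wait for a rebuild; they simply use
        # whichever index reference is current.
        return self._index.get(word.lower(), set())

    def refresh(self, documents):
        new_index = self._build(documents)   # the slow part, done off to the side
        with self._lock:                     # the swap itself is instant
            self._index = new_index

idx = LiveIndex({1: "alta vista search", 2: "digital equipment"})
print(idx.search("search"))   # served from the old index -> {1}
idx.refresh({1: "alta vista search engine", 3: "new crawl"})
print(idx.search("engine"))   # served from the new index -> {1}
```

The point is that queries keep flowing during the rebuild; only the final pointer swap needs any coordination, and it takes a moment rather than 30 minutes.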
Three Points About Memory
Let me make three points (you did not expect four?). They are about the growth of, and need for, computer memory.
- The need for memory will continue to grow forever.
- The ability to deliver cheap memory will continue to grow for many years.
- Some memory requirements will plateau.
(1) The memory needed will continue to grow. I doubt this will be a shock to anyone. In the history of computing the one quantity that we have consistently underestimated is memory—or space as we say in complexity theory. There will always be applications that need more memory than is currently available.
(2) The memory available will continue to grow, but there may be limits. The key word is may. As I have said many times, it is very hard to prove physical limits. Perhaps twenty years from now people will be shocked that we use electrons to store information; perhaps they will use some exotic quantum effect. Who knows.
Think of it this way: the need for memory is going to continue to grow toward a potentially larger "infinity" than the amount of memory available. Of course this is only an approximate notion, and quite an abuse of the idea, since there is no genuinely different type of infinity involved here. But it is fun to pretend.
The cost of memory is still going down rapidly. Moore's Law on the decrease in the cost of compute cycles seems to be slowing, but there are indications that the cost of flash memory is still decreasing at a phenomenal rate: at least 50 percent per year.
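It is worth seeing what a 50 percent annual decline compounds to. A quick back-of-envelope calculation (the rate is the figure above; the time horizons are my own choice):

```python
def cost_after(years, annual_decline=0.5):
    """Relative cost of a byte after `years` of compounding price decline."""
    return (1 - annual_decline) ** years

for years in (1, 5, 10):
    print(f"after {years:2d} years: {cost_after(years):.4f} of today's cost")
```

At that rate a byte costs about 3 percent of today's price after five years, and roughly a thousandth after ten.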
(3) The memory needed for some applications will plateau. Let me discuss that next.
Anyone who bets against the need for more memory in the future is probably wrong. Anyone who bets against the industry's ability to continue to make more and cheaper memory available is also probably wrong. The latter point is of course a bit of a paradox: there must be some physical limit on the number of bits that we will be able to buy cheaply. There must be some limit. But seeing where that limit will be, and when we will hit it, seems very difficult.
However, I do believe that the memory requirements of some applications will plateau. Let me be clear: I think there will continue to be new applications that require more and more memory. Years ago no one would have guessed that we would all have videos on our machines, or even just audio. In the near future I could imagine that we may start to have immersive 3-D environments. These could require astronomical amounts of memory, amounts that are way beyond anything available today.
So what is the point? I think the key insight is that some applications' memory requirements will stop growing, or at least not grow as fast as the memory available. They will change from being at the edge of what can be stored, needing huge amounts of memory, to being something that can routinely be stored. The reason is that some memory requirements are based on social limits, while others are based on physical limits.
Call an application small if its data set is easily stored in memory today, and large if it cannot. Here is an example of a data set that changed from large to small:
The Dictionary: There was a time when storing any dictionary on your computer, let alone the OED, was out of the question. Now you get realtime spell correction, which I could not possibly do without. The reason this application's memory needs reached a plateau is that the number of words we use is not increasing fast, if at all.
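Realtime spell correction is exactly the kind of thing an in-memory dictionary makes cheap. Here is a minimal corrector in the style of the well-known edit-distance approach (the tiny word list is a stand-in for a real dictionary, and the whole thing is an illustration, not production code):

```python
import string

# Stand-in for a full dictionary held in memory.
DICTIONARY = {"memory", "search", "engine", "dictionary", "paper"}

def edits1(word):
    """All strings at edit distance one from `word`."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    inserts = [a + c + b for a, b in splits for c in letters]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    return set(deletes + inserts + replaces + swaps)

def correct(word):
    """Return the word itself if known, else a known word one edit away."""
    if word in DICTIONARY:
        return word
    candidates = edits1(word) & DICTIONARY
    return min(candidates) if candidates else word

print(correct("serch"))    # -> search
print(correct("memroy"))   # -> memory
```

The key step, intersecting the candidate set with the dictionary, is only fast because the entire dictionary sits in memory; that is precisely the large-to-small flip described above.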
Here are two examples of data sets that will flip, in my opinion, to be small in the near future.
Technical Papers: Today there are services like Citeseer that contain about one million papers. The total storage for these is beyond what most of us can hold on our laptops. But this should change in the near future. The number of papers will continue to grow, but it is unlikely to grow as fast as memory does. If so, one implication is that in the future we could have all the technical papers from an area on our own devices. Just as realtime spelling correction is useful, realtime access to technical papers seems to be a potentially exciting development.
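The flip happens whenever the corpus grows more slowly than storage. A toy model of the crossover (every rate and size here is my own assumption, chosen only to illustrate the mechanism):

```python
def years_until_small(corpus_gb=1000.0, device_gb=100.0,
                      corpus_growth=0.05, device_growth=0.40):
    """Years until a device holds the whole corpus, assuming the corpus
    grows `corpus_growth` per year and device capacity `device_growth`."""
    years = 0
    while device_gb < corpus_gb:
        corpus_gb *= 1 + corpus_growth
        device_gb *= 1 + device_growth
        years += 1
    return years

# e.g. a ~1 TB paper corpus versus a 100 GB laptop:
print(years_until_small())   # -> 9, with these made-up numbers
```

The exact numbers do not matter; the point is that any positive gap between the two growth rates makes the crossover inevitable, and compounding makes it arrive surprisingly soon.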
This prediction could be wrong in at least two ways that I can think of. Papers could start to contain much richer data types: videos, simulations, or animations. I might argue that they would then not be papers, but a new type of object, with new applications.
Another way I could be wrong: what if papers start to be written by automatic agents? This could lead to a huge explosion in the number of papers. Imagine a world where "robot" authors create papers that summarize other papers, or re-write papers for a less expert audience, or who knows what. Perhaps.
Books: Right now there are too many books, even restricted to a subfield like mathematics, to store all of them on a single cheap device. But this large, indeed huge, amount of memory could easily become a small one in the future.
The reason I think this could happen is again related to the point above about papers. As long as books are written by people, there is an inherent limit to how many there can be. Even if every mathematician wrote a book a month, the total number of bytes would not remain large for long. At some point it would become small.
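Even the extreme scenario stays bounded. Suppose (with numbers entirely made up for illustration) a hundred thousand mathematicians each wrote a one-megabyte book every month:

```python
mathematicians = 100_000   # assumed population
books_per_year = 12        # one book a month each
mb_per_book = 1.0          # plain text, roughly

tb_per_year = mathematicians * books_per_year * mb_per_book / 1_000_000
print(f"{tb_per_year:.1f} TB of new books per year")   # -> 1.2 TB
```

About a terabyte a year, under an absurdly generous assumption; if storage keeps compounding as above, that output cannot stay "large" for long.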
In this future we could have a device that already holds every book on your favorite topic. Then you would not need a connection to search, browse, and read them. There might be interesting questions about costs. How would authors be paid in this world: would they get money as their book is read, or when it is included in a device? There seem to be a number of interesting questions here.
I think that there will be fundamental shifts in many applications as their memory requirements change from large to small. What do you think will happen? Are there interesting research questions that we should be working on?