April 17, 2011

Comparing searching libraries and searching the Internet

Carl Gardner, Billy Guy, Bobby Nunn, and Leon Hughes formed the group called the “Coasters.” In 1957, when even I was pretty young, they recorded the famous song Searchin’ which was written for them. The group was inducted into the Rock and Roll Hall of Fame in 1987, at least partially based on this number-one hit.

Today I want to talk about searching—for technical material—and how the rise of the Internet has changed the search for information.

The theme of their song is about their search for love:

Well, I’m searching

I’m gonna find her

The refrain is:

Gonna find her, yeah ah, gonna find her

Go here to see them be introduced by Steve Allen and then perform their song.

Since GLL has not really been purchased by Google, we are able to say whatever we want about that company. Google is a great search machine, but there are things that I liked back before the Internet that I think are missing today.

Before Internet Searchin’

One of the skills I had when I was a young researcher was the ability to find things in libraries. When I was an assistant professor at Yale the math library was conveniently right next door to the CS building. I often went over to the library and browsed for hours, then would check out a ton of books.

One of the coolest things about being a faculty member at Yale was the fine rule: there were no fines possible for faculty. We could not be charged anything at all for late return of books—this actually was in the Yale faculty handbook. I really liked this, since as an undergraduate and as a graduate student I had often run up serious fines for late books. Late fines were later used by the video rental company Blockbuster, then replaced by a 30-day grace period ended by a full-cost replacement charge, which caused other problems as their fortunes declined. (They have recently been bought out by the Dish Network.)

The Yale librarian did not like me taking books out at all, and especially did not like me taking out large numbers of them. I think there were two issues. First, I was not a real faculty member, that is, I was not on the faculty of the math department. The other was that the library was in a perfect state when all the books were on the shelf in order, not when many books were sitting in my office or sitting somewhere else. Sometimes after I took out a bunch of books the librarian would call (there was no email then) and ask me to return book X. I would always comply immediately, since perhaps someone really needed to look something up. But I usually went back in a day or two and took the book out again. It was a simple game we played; I think after a few years we were able to co-exist in peace, even though we didn’t become friends.

Enough about librarians, let’s get back to searching for information. There were a few tricks that I used back then to help me search the library:

  • Location: I really could remember information based on its physical location in the Yale math library. If someone asked for a reference on something, I often could find the book by recalling that it was on a certain shelf. Of course now and then the library re-structured the whole place and I was disoriented for a while, but I would get back in sync pretty quickly.

  • Shape, Color, Size: I could also recall a book by its physical characteristics. Paul Erdős and Joel Spencer’s little blue book was the first book on the probabilistic method. I could find it or help you find it by recalling its size and color. It was a thin book too.

  • Linear Search: No matter how big the Yale library was, I sometimes spent hours just reading through all books in an area of the library. When I was a graduate student at CMU I used to pick a journal, say the journal on X, pull out all the volumes, and scan through all the articles. One by one: from the first to the latest. Of course I could not read them all, nor could I even do more than just scan them. But the ability to scan all of them in this way allowed me to remember that there was some article on some topic. This often was instrumental in finding information that I really needed later.

One concrete example of this type of brute force search is a long story that I will go over another time. It is about a result on the cover time of a graph, joint with Romas Aleliunas, Dick Karp, Laszlo Lovasz, and Charles Rackoff. One day Dick Karp and I had an outline of a proof that the cover time was polynomial, but we needed a simple lemma. Karp had class, so while he was teaching, I went to the Berkeley library, and after about an hour of brute force search found the lemma we needed in some unrelated article.

  • Browse: I browsed through the library all the time. I would look up something that I thought I needed, but once there among the shelves I would look at all the books nearby. This branching search often uncovered great nuggets of information.

  • Tomography: If I needed to learn something about a topic, especially if I thought this topic could help me prove some theorem, then I did a kind of “tomography.” Suppose the topic were finite group theory. I would go to the library and take out 10-20 books on group theory (sometimes it would be all the books they had on the topic). I would not read them all; that was too hard. I would look at the exact topic I needed and see many different views of the same topic. These multiple views would give me a much better insight into the question I needed. Different authors had their own views, even of the same exact theorem. Some would give examples, some had different motivation, some different proofs, some different applications, but all together would give a fuller picture than any one alone.
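
The Browse habit in the list above (look something up, then scan everything shelved nearby) could be approximated online if a catalog were available in shelf order. Here is a minimal sketch in Python under that assumption. The `shelf_neighbors` function, the shelf list, and the call-number strings are all hypothetical, and real call numbers generally need a proper sort key rather than plain string comparison.

```python
import bisect


def shelf_neighbors(shelf, call_number, radius=2):
    """Return the entries shelved around `call_number`.

    `shelf` is a list of (call_number, title) pairs already sorted by
    call number. Plain string ordering is an assumption of this sketch.
    """
    keys = [key for key, _ in shelf]
    # Position where the requested call number sits (or would sit).
    i = bisect.bisect_left(keys, call_number)
    lo = max(0, i - radius)
    hi = min(len(shelf), i + radius + 1)
    return shelf[lo:hi]


# Hypothetical shelf: made-up call numbers and placeholder titles.
SHELF = [
    ("QA100.1", "Title A"),
    ("QA100.2", "Title B"),
    ("QA100.3", "Title C"),
    ("QA100.4", "Title D"),
    ("QA100.5", "Title E"),
    ("QA100.6", "Title F"),
]
```

Calling `shelf_neighbors(SHELF, "QA100.3", radius=1)` returns the requested entry together with the one on either side of it, which is roughly what a glance along the shelf gives you.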


All of these techniques seem to be harder to do on the Internet. The search engines, like Google, are of course terrific at finding things, but the techniques that I used to employ are much harder to do today.

  • Location: I think this means nothing anymore.

  • Shape, Color, Size: The Internet could allow you to look for a book by specifying: It’s yellow, thin, and on exponential sums. But I do not believe that it does.

  • Linear Search: This is very hard today because there is so much stuff. I believe that the volume of material, especially since much of it is repetitive, makes finding the golden nugget more difficult.

  • Browse: Hard to do today, because there is less locality. You can branch and search, but not quite the same as before.

  • Tomography: I think this is really difficult today for two reasons. We cannot get books; the Internet is best for articles. Moreover, getting all the books or articles on some subject X is nearly impossible. The number would probably be close to infinite; in any event it would be overwhelming.
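
The “yellow, thin, and on exponential sums” query from the list above is easy to express once a catalog records physical attributes. The following is only a sketch in Python of what such a query might look like. The `Book` record, its fields, the thickness values, and the sample entries are hypothetical placeholders, not data from any real catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    title: str
    subject: str
    color: str          # hypothetical physical attribute
    thickness_mm: int   # hypothetical physical attribute


# Made-up catalog entries for illustration only.
CATALOG = [
    Book("Book A", "exponential sums", "yellow", 12),
    Book("Book B", "exponential sums", "red", 40),
    Book("Book C", "group theory", "yellow", 10),
    Book("Book D", "probabilistic method", "blue", 9),
]


def find_by_look(catalog, subject=None, color=None, max_thickness_mm=None):
    """Return the books that satisfy every criterion that was supplied."""
    hits = []
    for book in catalog:
        if subject is not None and book.subject != subject:
            continue
        if color is not None and book.color != color:
            continue
        if max_thickness_mm is not None and book.thickness_mm > max_thickness_mm:
            continue
        hits.append(book)
    return hits
```

With this toy catalog, `find_by_look(CATALOG, subject="exponential sums", color="yellow", max_thickness_mm=15)` picks out the single thin yellow book on that subject. The hard part in practice is not the filter but getting anyone to record color and thickness in the first place.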

E-Search Not Research?

Ken notes the following: The main advantages of online search are that with minimal skill you can frame criteria to specify what you are looking for, and the results often give you a high-valence tree of links to follow. If you spot a desired association among the first 10 or 20 hits, you can often follow the link to find more associations and make a better search. Thus in place of a linear search you are following paths in a tree.
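
To make the contrast concrete, here is a small sketch in Python of the two styles: a linear scan that examines items in order, and a best-first walk that always follows the most promising link next. It is an illustration under stated assumptions, not a description of how any search engine works. The names `linear_scan` and `follow_links`, the `links` mapping, and the `relevance` scoring function are all hypothetical.

```python
import heapq


def linear_scan(items, is_match):
    """Library-style linear search: examine items one by one, in order.

    Returns (match or None, number of items examined).
    """
    for examined, item in enumerate(items, start=1):
        if is_match(item):
            return item, examined
    return None, len(items)


def follow_links(start_hits, links, relevance, is_match, max_visits=50):
    """Best-first walk over a tree of links, most promising page first.

    `links` maps a page to the pages it links to, and `relevance` scores
    how promising a page looks. Returns (match or None, pages visited).
    """
    # heapq is a min-heap, so relevance is negated to pop the best page first.
    frontier = [(-relevance(page), page) for page in start_hits]
    heapq.heapify(frontier)
    seen = set(start_hits)
    visits = 0
    while frontier and visits < max_visits:
        _, page = heapq.heappop(frontier)
        visits += 1
        if is_match(page):
            return page, visits
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-relevance(nxt), nxt))
    return None, visits
```

The tree walk only pays off when the relevance guess is good. With a poor guess it can wander just as long as the linear scan, which is one way to read the “keep on searchin’” trap discussed in this section.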

The issue comes up if you find something Y that resembles, but is not as good as, the X you need. There are basically three choices:

  • Take Y and (facing down at a desk rather than forward to a terminal) try to work out how to make X out of it.

  • Keep on searchin’ trying to find X.

  • Be happy with Y, which you found so easily, and change what you’re doing, instead.

With books, the Y you find is usually at the same level of expertise as what you need, and this plus being at a desk or in a chair promotes the first kind of effort, which is the most valuable for research. Whereas online, there is more inducement not to think, or not to try harder. We have noticed this effect in our own research. Keep on searchin’ is good for not overlooking prior citations, but masks valuable thinking time with the feeling that you’re still being productive click-click-clicking.

Worse IMHO is the third case, whereby the Net can con you into “going with the flow” and thinking about something else rather than the problem at hand. This is a general issue faced first at middle-school and high-school level. Is “e-search”—not research in libraries—being used to produce papers that are broader but shallower? Does a sense of entitlement brought on by having answers come easy keep us from aiming for more? Amid a general discussion of the kind, “Does Google Make Us Stupid?”, we at the above-PhD end can be a valuable test of the answer.

Open Problems

Can we make today’s search for technical material better? Does e-search depress research? Or are the things that I am discussing just old and silly? I guess we all will keep on searchin’.

19 Comments
  1. Andy J
    April 17, 2011 4:17 pm

    Although it doesn’t fix all your problems, the internet can do better than Google when it comes to tomography and linear search. Specifically, Library Genesis


    have >500,000 texts between them (though it’s probably significantly smaller if you discount different editions of the same work), and so are great for exactly that sort of thing.

  2. April 17, 2011 5:04 pm

    Honestly, the QA100 section of our main library is one of my favorite places to sit on the floor, undisturbed, looking through old good books. I’ve been in better QA100’s, but this one is still really very good. There’s some nice stuff in QA200 as well, which I find myself in more and more these days.

    The books in these areas smell funny, and I’m convinced it’s the sum of the following two properties: old books were made with a certain kind of binding glue and the ink on the pages was pretty serious ink. Much of that smell will go away for a book that’s been opened lots and lots of times, but based upon the checkout history for these books (some of them still have the public-library style glue-in leaf with checkout stamps on them), they haven’t been getting much usage in the last 20-50 years.

    It feels pretty awesome to know that you’re the 10th person, say, who has *ever* cracked this thing open in your tiny town. Especially when it’s a slim book filled with amazing theorems.

    There’s nothing that sexy online.


  3. April 17, 2011 7:07 pm

    A great advantage of Web search is access to broader material. Alas, there are always the paywalls, but in my short research history I’ve managed to procure articles that would otherwise have required me to send requests to other libraries. I can access (most of) the Yale Library too, although I live in debt-ridden Greece with zero research funds for professors, let alone an undergraduate.

    Also, quite recently Google released a feature called “Reading levels”. I am sure all researchers have run into the problem of searching online for a term, usually specific to your field, that is overshadowed by a popular notion that uses the same or similar terms. Or you are looking for information on a recent paper on an open problem and you get a storm of newspaper articles and information that you already know. Google reading levels allows you to select the reading level of the results and I have found it quite useful. You can see how to use it here:

    As I have said before online, especially when I explore a new notion or even a new subfield, I use the Wikipedia page as a launch point. From there I can find the major papers, researchers and journals that will provide me the higher quality results that I look for.

  4. Cfp
    April 17, 2011 8:01 pm

    I absolutely remember things on the internet by location. I subscribe to a lot of RSS feeds and mailing lists (including journal ones, working paper series, academic blogs etc.) and although I have a terrible memory for names and titles of papers, I can almost always find the paper I’m thinking of by remembering which email list/RSS feed I saw it in.

  5. Mugizi Rwebangira
    April 17, 2011 9:37 pm

    I spent many hours browsing in the Engineering and Science library as a grad student and I always found interesting stuff. I could easily “scan” 4-6 shelves in one session – which is at least 500 books – just reading the titles for most of them of course and flipping through a few of them and checking out 5 or 6 at a time. And then of course I’d come back in 2-4 weeks and repeat and get different ones. It seems hard to do something similar online.

    Online it seems you have to rely on blogs and Twitter to “push” interesting stuff at you – it’s not as easy to do a “linear search” (as Lipton calls it) of all books on a certain topic…

    Actually I just realized a library type interface could easily be implemented online – a “netflix for books” if you will – but I don’t think publishers would ever agree to that.

    Same thing with book stores – I often find very good books that I would never have heard about through other channels. The problem is that going to the book store is far less convenient than downloading onto my iPad – I don’t think I’ve deliberately gone to a book store in over a year…

    So the internet is great of course if you know exactly what you want – but still not so great for discovering new great stuff (especially other than through recommendations).

  6. Stefan
    April 18, 2011 2:25 am

    This post reminds me a little of Clifford Stoll’s outdated Silicon Snake Oil.

    Hopefully this post will, at some point, become equally outdated.

  7. April 18, 2011 2:41 am

    I find that (okay, unintended pun already) one of the most difficult things to find is material to help a nonexpert on a topic understand material written by an expert.

    As one example of many, I was recently trying to read up on “the cavity method”. On one end, I can find the Wikipedia article, which has only a handwavy description that doesn’t really say anything significant about it. On the other end, I can find plenty of papers referring to or using it, some deriving complex equations about it, etc. The thing I really need is material to get me from point A to point B, but as usual, it seems impossible to find that type of in-between material online, perhaps because I don’t know enough to know the right words to search for, or perhaps because nobody had the motivation to post it.

  8. Gilbert Bernstein
    April 18, 2011 3:01 am

    Back when I was doing math research in undergrad, (only a few years ago) I spent quite a bit of time in the math library, and my library experience was remarkably similar to what you’ve mentioned here. I learned a great deal about what different types of geometry there are, about the history of various topics, and many random tidbits in the process of looking for things I was interested in. At one point I looked through books on oriented matroid theory to try to get some insight to a problem involving halfspaces. Later on, when working on knotted graphs, some of the basic concepts I had picked up from matroid theory turned out to be extremely useful there.

    I don’t think these deficiencies in e-search are trivial or irrelevant in the slightest. A recent study of academic e-reader use here at UW concluded that one of the major drawbacks of e-readers (as used for papers/textbooks) was a loss of “mapping” between physical place in the book and location in the text. Your comments about shape, color, size and location seem eerily similar to me.

  9. April 18, 2011 4:19 am

    I too find that although I come across a lot of interesting things by chance on the internet, the experience of browsing in a physical library is (or rather was, since I hardly ever go to a library any more) different. This feels like a soluble problem, though I don’t have a proposal for a solution: basically there are two different metrics on the space of all mathematical (and of course more general) knowledge, and it’s not clear that there isn’t some mechanism for getting the internet to learn a more libraryish metric on all its data. It would be very interesting to work out what it was about library browsing that was different in a good way — I can’t put my finger on it.

    • serge
      April 18, 2011 12:07 pm

      The main difference IMHO between library browsing and internet browsing is that the latter is an automated search while the former is a manual one. You memorize things much better when your brain is active. Therefore I think one should find ways of making search engines a bit more “manual” than they are as of today. This would be an interesting challenge.

  10. eqnets
    April 18, 2011 7:27 am

    Library browsing actually led directly to my first real job and in a separate instance to a big success in that job. I did a lot of that as an undergrad in the mid-nineties, when the arxiv was still new and there wasn’t much else easily accessible material useful for research on the web. I did more as a grad student, but less and less over the years. Libraries taught me a lot about fields that I wanted to learn about. Browsing journals taught me about fields that I knew something about.

    But now journals and books are easier to browse online. I will literally use Google’s book search to find a passage in a book that is no more than a few steps away from me–because it’s quicker. I will use citation searching to find precisely the (first) reference that speaks to my problem. And I will pay for a book (or get an article from a library) that I discover while searching for specific material online.

    The need to physically wander through stacks these days is (I think) limited to the student looking to find good references on a field they are not familiar with and want or need to learn, or to look up older journal articles that aren’t digitized. And even these are declining.

    The demise of the printed page is not likely to come soon. It is a great technology that requires no batteries or network connection and has pretty good storage space for notes and annotations (unless you are Fermat). But the physical stacks of libraries are bound to become resources of last resort.

  11. April 18, 2011 7:40 am

    I like how Amazon recommends books related to the current book being viewed. Sounds like “Browse” to me.

  12. April 18, 2011 1:59 pm

    Location, Shape, Color, Size – agreed for a paper library. Yes, for the Internet, location means nothing anymore, but if I save a file on my computer then its location is very important to me. In classic Mac OS 7.5 I could change the color of a file icon; unfortunately I cannot do that in Windows XP, but I can sort files by size in any e-books folder. Linear Search – unfortunately this method is not useful now, and yet we are forced to use it when going through Google results. Browse – very useful. Tomography – “see many different views of the same topic” – yes, very, very useful (BTW, there is a good tradition at Moscow State University: students have to use many different textbooks at the same time, and the list for just one short course may have 20 titles, where the books have the same context). “We cannot get books; the Internet is best for articles” – disagreed. I have thousands of paper books, but I have many more e-books. For example, I have unlimited access to books and journals from Springer-Verlag. Very important: I can find any word in an e-book very fast, in contrast with a paper book, where I can find only words that are listed in the index. So I prefer e-books. Open Problems – I’m sure Google and others can make today’s search for technical material much better! Too many ads and too much spam depress research.

  13. Aubrey da Cunha
    April 19, 2011 12:03 pm

    One quality that I make use of in a physical library (along with color and shape) is age. With most articles that I find online, unless I specifically look for the date, all I can tell is pre-TeX or post-TeX. That makes it more difficult to build a mental chronology of a subject area, rather than, say, remembering that I found this theorem in a beat-up old tome or a brand-new, never-been-opened library acquisition.

    • April 19, 2011 9:45 pm

      I recently read some articles from the 60’s on the time hierarchy theorem. Bibtex or a Google (Scholar?) search with the article’s name allows you to find its publication information, including when it was published.

  14. Janos
    April 20, 2011 4:23 pm

    I also spent many hours sitting on library floors. Still do some of it to learn about non-CS/Math stuff.
    I still use the physical appearance cue for CDs: sometimes I do not know the name of a piece of music, its composer, etc, but I remember the CD cover and some other info. Amazon gets me the picture of the CD cover….

    Our library lets you access the catalog as a shelf-order list, which gives you at least the titles of the books, as if you were in the stacks.

    On the other hand a big advantage of the web is that you can get definitions and theorem statements very fast. Forgot what an ear decomposition is? Just type it into Google.

  15. P.A.S.
    April 23, 2011 4:45 pm

    You are certainly right in that there is no experience like going to the library.

    I live in a different reality, where libraries are not very comfortable or attractive or “culturally-valued” places, so I only found out the pleasure of using them too late. And they are certainly better than the internet in many senses. In particular, I feel more focused and more concentrated reading the books themselves.

    However, the internet seems to have been a fantastic hub to link research and researchers from almost every place and from almost every area!

    Maybe, in every research, we should consider going (each time) to one of these sources of equal (?): internet and libraries. Using only one of them alone is being “old and silly”.


