
Mathematical Search

July 18, 2020


A flying start from nearby Rochester

Anurag Agarwal and Richard Zanibbi are tenured faculty in Mathematics and Computer Science, respectively, at RIT. They partner with Clyde Lee Giles of Penn State and Douglas Oard of U.Md. on the MathSeer project. If the name reminds you of CiteSeerX, no surprise: Giles co-originated that and still directs it.

Today we note last month’s release of a major piece of MathSeer called MathDeck and show how to have fun with it.

Agarwal is a PhD graduate from Buffalo. He did his thesis in the Mathematics Department under Thomas Cusick on cryptography, but often visited Computer Science. He took part in seminars on lattice-based cryptography led by Jin-Yi Cai when he was in Buffalo and in one of mine on related topics. I also knew him socially as a housemate of Pavan Aduri, whose joint work we’ve mentioned here.

A long time ago, Dick wrote a post on Detexify, which does optical character recognition (OCR) for mathematical symbols and finds the corresponding (La)TeX commands. MathDeck does OCR as well, but what it is really trying to recognize is the formula you are trying to write. If it is a famous formula, or one you have already saved in your “deck,” it will find and complete it for you. It also accepts LaTeX input. MathDeck was created by Gavin Nishizawa, Jennifer Liu, Yancarlos Diaz, Abishai Dmello, and Wei Zhong along with Zanibbi, who are credited on a brief paper and an accompanying video.

Trying MathDeck

On first visit, the site shows a very brief tutorial, which can be dismissed via a sometimes-invisible X in the upper-right corner. The first formula I thought to write was Leonhard Euler’s “mystic equation” {e^{i\pi} + 1 = 0}. Its OCR sprang into action as I drew, converting my attempts at a curly {i} and then {\pi} into {7n} with an extra crossbar. (The website graphics are much sharper than our screenshots here.)



Nevertheless, its default deck of “WikiCards” recognizes the attempt and includes “Euler’s Identity” as an option. Selecting the card changes the display to an immaculately typeset version. I then decided to change it to the form {e^{i\pi} = -1}. MathDeck does not have a pixel eraser like Microsoft Paint does but allows you to delete a region after selecting it:



Selecting “trash” did not close up the space to the equals sign. I drew a minus bar at far right, but my attempts to follow it with {1} kept being interpreted as “{y}” or something worse, and the alternate form of Euler’s identity did not come up below. Finally, I restored the LaTeX-input box on the right and edited the source to read e^{i\pi} = -1:



The artifact of my hand-drawn minus sign was still at far right. I could not select and trash it even after refreshing the page. What fixed the issue was drawing something else over the squiggle and having the OCR interpret the tandem into something else, which I could then select and delete.

Superposing Forms and Associations

I next wondered how the system would react to my trying to write Schrödinger’s equation. One challenge is that it has many forms. I chose the form given uppermost in Wikipedia’s article:

\displaystyle  i\hbar \frac{d |\Psi(t)\rangle}{d t} = \hat{H}|\Psi(t)\rangle

To arrive at the challenge gradually, I first omitted the quantum ket notation, the hat on {H}, and the dependence on {t}. I wrote partial derivatives and lowercase {\psi}. Thus what I first tried to handwrite was:

\displaystyle  i\hbar \frac{\partial \psi}{\partial t} = H\psi

The system jumped on my handwriting right away, and I won’t report the results except to say they looked like Dada art with math symbols. One can, however, make a drawing in Paint or a similar app and upload it. So I drew



and obtained



The ten cards returned (one is below the snip) have some wild but inspired associations. Handwriting Wikipedia’s form as given did not produce better results. However, I realized I could snip what appears on Wikipedia and upload that:



The result was a ghostly evocation of the equation, with LaTeX to boot:



The LaTeX output reminded me that I could enter Schrödinger’s equation in LaTeX and remove all doubt. I did the short form first. Would the system recognize it?



The name Schrödinger popped up in the eighth card at bottom center, but not the form I had typed. It has an integral and no equals sign. Of greater note, the center card called up another giant of quantum mechanics and began with exactly the left-hand side I had typed. To its left came Wolfgang Pauli with another occurrence of {i\hbar}. None of the output had the time variable {t}, however, so I put it in:



Bingo: the card labeled simply “Schrödinger Equation” appears at upper center. Unlike Wikipedia’s version, it includes {r} standing for the spatial coordinates and hence properly uses partial derivatives. Otherwise it is exactly what my search intended to summon.

I felt the real reward came in the other cards. I did not know that Hubble’s law had such a simple statement. I knew quantum mechanics takes glory in symmetries, of course, but did not know what equation would bear the definitive moniker “Symmetry in Quantum Mechanics.” I remember from childhood the fun of unstructured time in libraries, where the physical card catalog was sorted by theme and one could browse adjacent ideas, just as one could on the shelves.

Now I put in the ket notation, the hat to make {\hat{H}}, and the uppercase {\Psi}. The latter two changes evidently moved the Schrödinger Equation card to the top of the deck, with the Pauli Equation card moving up behind it, but the other cards changed completely:



I had not known the term Einselection. Finally, I decided to wipe the canvas and simply enter {H\Psi = E\Psi} as the minimalist form of Schrödinger’s equation. I’ll leave you to see what the system comes up with on your own fresh canvas.
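The post does not describe how MathDeck ranks its cards, so the following is only a toy for intuition about why adding or removing a single symbol such as {t}, {E}, or a hat can reshuffle a deck. It is a swapped-in technique, Jaccard similarity over bags of LaTeX tokens, and the three-card deck and every name in it (`tokens`, `jaccard`, `rank`, `CARDS`) are hypothetical. A serious formula search engine would also need to account for the structure of an expression, not just which symbols occur.

```python
import re

# Tokens: LaTeX commands such as \hbar, single letters or digits, or any other
# non-space symbol except the grouping braces { and }.
TOKEN = re.compile(r"\\[A-Za-z]+|[A-Za-z0-9]|[^\sA-Za-z0-9{}]")


def tokens(latex: str) -> set[str]:
    """Return the set of tokens in a LaTeX string (order and repeats ignored)."""
    return set(TOKEN.findall(latex))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank(query: str, cards: dict[str, str]) -> list[tuple[str, float]]:
    """Score every card against the query and return (title, score), best first."""
    q = tokens(query)
    scored = [(title, jaccard(q, tokens(tex))) for title, tex in cards.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# A made-up three-card deck for illustration; these are not MathDeck's WikiCards.
CARDS = {
    "time-dependent form": r"i\hbar \frac{\partial \Psi}{\partial t} = H \Psi",
    "time-independent form": r"H \Psi = E \Psi",
    "Hubble's law": r"v = H_0 D",
}

if __name__ == "__main__":
    for query in (r"H\Psi = E\Psi", r"i\hbar \frac{\partial \Psi}{\partial t} = H\Psi"):
        print(query)
        for title, score in rank(query, CARDS):
            print(f"  {score:.3f}  {title}")
```

In this toy, the minimalist query puts the time-independent card first, while the full form with {\partial t} puts the time-dependent card first; the only thing driving the change is which tokens the two strings share.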

Search and Research

The goal is to support search over equations as well as text. For instance, right now I’d like to find sources that use a formula like

\displaystyle  \sum_{r=0}^{\infty}\frac{r^2}{e^{ar}}

in the context of statistical tests for distinguishing between distributions. I’ve had success with Google on smaller pieces of TeX, but this chunk yields nothing sensible. Changing the variable “{r}” to “{x}” changes some of the results but comes no closer; nor does changing it to {i} or editing the sum to begin with {1} rather than {0}. The search should somehow recognize that {a} is a constant but is also the main parameter.
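As a reference point for what such a search ought to treat as one object, the sum has a standard closed form. With {x = e^{-a}} and {a > 0}, so that {0 < x < 1}, apply the operator {x\,\frac{d}{dx}} twice to the geometric series:

\displaystyle  \sum_{r=0}^{\infty} x^r = \frac{1}{1-x}, \qquad \sum_{r=0}^{\infty} r\,x^r = x\frac{d}{dx}\frac{1}{1-x} = \frac{x}{(1-x)^2},

\displaystyle  \sum_{r=0}^{\infty} r^2 x^r = x\frac{d}{dx}\frac{x}{(1-x)^2} = \frac{x(1+x)}{(1-x)^3}, \qquad\text{so}\qquad \sum_{r=0}^{\infty}\frac{r^2}{e^{ar}} = \frac{e^{a}(e^{a}+1)}{(e^{a}-1)^3}.

The last step multiplies numerator and denominator by {e^{3a}}. Every one of these expressions names the same quantity, which is exactly the kind of equivalence a text-based search over TeX source cannot see.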

The fact that I’ve typed a particular LaTeX form might be an impediment. I could have written {r^2 e^{-ar}} in the body of the sum without using a fraction. The MathDeck documentation focuses on enabling mathematical search for non-LaTeX users, but independence from syntax for all users is a commensurate goal. The idea is to make formulas “chunks” in their own right, chunks governed by semantics more than syntax, and promote saving and recombining them. For instance, I could save the body and replace the sum by an integral.
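To make the point about syntax concrete, here is a small standard-library Python sketch that sums the fraction form {r^2/e^{ar}} and the product form {r^2 e^{-ar}} of the summand and compares both with the standard closed form {x(1+x)/(1-x)^3}, where {x = e^{-a}}. The function names, the sample values of {a}, and the truncation rule are illustrative choices, not anything from MathDeck; a real semantic search would need symbolic rather than numeric equivalence.

```python
import math


def term_as_fraction(r: int, a: float) -> float:
    """The summand written as r^2 / e^{a r}, as in the search query."""
    return r**2 / math.exp(a * r)


def term_as_product(r: int, a: float) -> float:
    """The same summand written as r^2 e^{-a r}, with no fraction."""
    return r**2 * math.exp(-a * r)


def truncated_sum(term, a: float) -> float:
    """Sum term(r, a) for r = 0, 1, ..., stopping while e^{a r} still fits in a float.

    For moderate a > 0 the omitted tail is far below double precision.
    """
    n_terms = int(700 / a) + 1  # math.exp overflows for arguments just above 709
    return math.fsum(term(r, a) for r in range(n_terms))


def closed_form(a: float) -> float:
    """x (1 + x) / (1 - x)^3 with x = e^{-a}, valid for a > 0."""
    x = math.exp(-a)
    return x * (1 + x) / (1 - x) ** 3


if __name__ == "__main__":
    for a in (0.1, 0.5, 2.0):
        s_frac = truncated_sum(term_as_fraction, a)
        s_prod = truncated_sum(term_as_product, a)
        print(f"a={a}: fraction={s_frac:.12g} product={s_prod:.12g} closed={closed_form(a):.12g}")
```

The two syntactic forms agree to within rounding, and both match the closed form, even though their TeX sources share little text beyond {r^2}.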

The visual unit for this in MathDeck is the blue oval enclosing a formula. These ovals can be created and edited at the top, imported to make a new card, and combined on the canvas to build up larger formulas. The paper calls them “chips,” but for me they evoke hieroglyphic cartouches enclosing royal names. Cards can be marked as favorites, and new cards can be added to the collection.
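One way to picture chips as reusable “chunks” is the following hypothetical Python data model: a chip is a labeled LaTeX fragment, a card is a title plus saved chips, and a template chip has a single {\Box} slot that another chip can fill. None of these names (`Chip`, `Card`, `wrap`, the {\Box} slot convention) come from MathDeck, whose internal representation the post does not describe. The example saves the body {r^2 e^{-ar}} once and reuses it under both a sum and an integral.

```python
from dataclasses import dataclass, field

SLOT = r"\Box"  # hypothetical placeholder marking where a body chip is inserted


@dataclass(frozen=True)
class Chip:
    """Hypothetical stand-in for a chip: a labeled LaTeX fragment."""
    label: str
    latex: str


@dataclass
class Card:
    """Hypothetical card: a title plus the chips saved on it."""
    title: str
    chips: list[Chip] = field(default_factory=list)


def wrap(template: Chip, body: Chip, label: str) -> Chip:
    """Fill the first SLOT in the template chip with the body chip's LaTeX."""
    if SLOT not in template.latex:
        raise ValueError(f"template {template.label!r} has no {SLOT} slot")
    return Chip(label, template.latex.replace(SLOT, body.latex, 1))


body = Chip("body", r"r^2 e^{-ar}")
series = Chip("series", r"\sum_{r=0}^{\infty} \Box")
integral = Chip("integral", r"\int_{0}^{\infty} \Box \, dr")

card = Card("Sum versus integral (example)")
card.chips.append(wrap(series, body, "sum form"))
card.chips.append(wrap(integral, body, "integral form"))

for chip in card.chips:
    print(chip.label, ":", chip.latex)
```

Keeping the body as its own chip is what makes the swap cheap. For {a > 0} the integral form evaluates to {2/a^3}, which is also the leading behavior of the sum as {a \to 0}.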

Here is an example where I made a cartouche and card out of the abstract form of the main equation in my chess model, as I expounded in a post to mark the 2018 world chess championship match in London.



Once again the system pitches in with interesting associations. Some were expected but others are surprises. Logit and logarithmic loss are naturally associated but I had not heard of “Perplexity,” and what is Benford’s Law doing here? (Dick and I have been trying to find natural exceptions to Benford’s Law in Covid-19 statistics—we’ve not had time yet to tell whether we’ll succeed.)

The searches at top are still “vanilla” Google searches, and the search below the canvas runs only within a deck or decks. We look forward to the day when this project engenders a truly smart integration of mathematics into major search engines.

Open Problems

How do you see MathDeck and the larger MathSeer project growing in the near future? We hope MathDeck stokes some immediate enjoyment and curiosity.

One Comment
  1. July 23, 2020 12:45 pm

    In a related story …

    https://mathoverflow.net/a/5680
