# Explanations and Explorations

*Comparing proofs for the Jaccard metric*

BetterExplained source |

Kalid Azad is the founder of the website *Better Explained*. It is devoted to explaining mathematical concepts. He also has written two books.

Today we discuss how some proofs provide a concise *explanation* whereas others promote *exploration* of related concepts.

Azad’s site has a rich page titled, “Math Proofs vs. Explanations (aka Nutrition vs. Taste).” It argues that the best *explanations* start with an analogy to a relation that readers already understand. Even if the connection is not sharp, it can be refined once the reader’s attention is solid. This is opposed to a formal proof in which every step is sharp and correct but intuition is wanting.

To this we add the role proofs can play in *exploration*. If you have one proof of a theorem that you understand, there is value in seeking other proofs that use other ideas. Usually we think of ideas as coming first—as thoughts we refine into a proof. The advantage of starting with a proof is already having certitude and sharpness—you know a recipe that works and now can try varying and augmenting it.

## Jaccard Distance as Example

Azad’s page gives examples of proofs for the Pythagorean Theorem and for . It then quotes from William Thurston’s essay “On Proofs and Progress in Mathematics,” which we once mentioned. We will use the example of Jaccard distance from our previous post. We start with this definition:

now using for the symmetric difference . So the triangle inequality becomes, for any finite sets :

We think the proof we gave in the last post is simple and direct and intuitive but maybe not explorative. It first connects the solid understanding that without the denominators this would be the well-known triangle inequality for Hamming distance. To reprise, it considers fixed and varies to arrive at that simpler fact in three steps:

- If contains elements not in then removing them subtracts from both right-hand numerators and both right-hand denominators. Since those fractions are each (else 1 would be immediately true), the right-hand side goes down.
- Then we have and can replace the denominators by without increasing the right-hand side.
- Now we have a common denominator and a statement equivalent to the known truth about Hamming distance. Since undoing the first two steps to restore the original can only increase the right-hand side, (1) is proved in all cases.

This reasoning readily extends to nonnegative measures on besides simple counting, provided the removal of elements from makes the same additive-or-proportional change to as it does to , and likewise for the other fraction.

## Three Snapshot Proofs

The first short proof should join the pantheon of half-page journal papers. Under fair use, here it is in one screenshot:

Perhaps this is *too* short. We think this proof would have been more satisfying if a few more lines of calculation had been added. Let us divide the region into its inner part and outer part and do likewise for . Then it seems the intent was:

The end uses the symmetric-difference definition of , so perhaps fully expanding this paper’s intent would have been longer. One can also begin with that definition to get a shorter calculation, but it skips over the step. Indeed, it does not mention at all, so it was not intended. The proof by Artur Grygorian and Ionut Iacob in a short paper in last October’s *College J. Math.* strikes us as a similar-style proof.

The second proof comes from a MathOverflow thread. It assigns a variable to each region of the Venn diagram, forms the fractions, and cross-multiplies to obtain “the following monstrosity”:

The fact that no coefficient is negative completes the proof. This is clear from a computer algebra system, but what about *why* no negative term appears?

We have realized since the last post that the second of two proofs given in the 2016 paper by Sven Kosub, which we linked in that post, is really equivalent to ours. This is easier to see if one just presumes in the following:

Here *sub-modularity* is a standard property for which Kosub cites the equivalent condition that whenever and ,

This suffices for step 1 of our earlier proof, first taking then ; the rest of that proof needs only that is monotone (and implicitly ).

## A Magical Proof

Now we look at proofs that add ideas. The first one still strikes us as clean and magical. We are computer scientists so it is natural to think of finite sets as binary-valued vectors of length . They have a in position precisely when is in the set. Of course is the size of the “universe.”

Now let be such a non-zero vector. The key is to use a probabilistic proof. We will show that we can relate the Jaccard distance to the outcome of a simple random experiment. The experiment once selected leads to a simple proof—it only requires the union bound. Recall this is the fact that

for any two events and .

The cool idea is to look at the permutations of the vector . For a permutation let us define to be

Let provided is the first value that is equal to . Of course since is non-empty it follows that this is well defined.

Note is a random variable that depends on the choice of the permutation . The key is to see that the probability that when we average over all permutations uniformly is equal to

This follows by noting that there are ways to select the same and there are total ways to select an . Complementing gives us that the probability of equals .

Now hark back to our sets . The event

is subsumed by the disjunction of events

regardless of what is. By the simple union bound, the probability of the first event is at most the sum of the probabilities of the latter two events. We have thus proved

The last step is the same as in the proof that Hamming distance is a metric. What does the randomized view gain us? It gains a nice interpretation of as the probability that and hash to different values under the *min-hash* function for random . Min-hashing is used all the time—see this book chapter by Jure Leskovec, Anand Rajaraman, and Jeffrey Ullman, with this proof in section 3.3.3.

## A Gradient Idea

Atri Rudra suggested to us the “game” of adjusting one element at a time to walk it toward an extreme value. The sets and can be adjusted too. We start by assuming the triangle inequality (1) is false and make moves that can only keep it that way, until we reach a case where it is obviously true.

Step 1 of our proof already plays this game by removing from any elements not in . So we really start the game with and we want to walk it to . Simply replacing the denominators and in (1) by was good in step 2 of the proof but is not a legal move in this game.

What we can do legally is add elements from to : those leave the denominators unchanged but lower the numerators and . The interesting case is when we want to add to an element from or from . The former add decreases the numerator and increases the denominator while leaving unchanged, but it *increases* the numerator . Let us abstract the right-hand side of (1) to . Then the former add converts it to

If *both* moves increase the right-hand side, then we must have

And from

But cross-multiplying gives the contradiction . So one or both moves must always be possible. This grows to include either all of or all of . The rest of the argument to gobble up all of we leave to you, dear readers.

Compared to the above proofs, this is tedious. But it captures some tensions among the sizes of , , and that may inform intuitions about Jaccard similarity under changes in the sets.

## Open Problems

Which proof do you like best for explanation and which for creative impulse?

This is our post. We intended this discussion as number 800 but were surprised to find the simple proof by reduction to triangle for Hamming distance (steps numbered 1-2-3 above). Are we really the first to write it down, with acknowledgment also to Kosub?

[some typo fixes]

Typo fixes: In Gilbert expansion, extraneous union symbol on 1st line. Missing bars in 2nd line denominator. In “Magical Proof”, para. 2, sentence 2, missing verb.

Well noted—thanks very much. Fixed.

P versus NP is considered as one of the most important open problems in computer science. This consists in knowing the answer of the following question: Is P equal to NP? A precise statement of the P versus NP problem was introduced independently by Stephen Cook and Leonid Levin. Since that date, all efforts to find a proof for this problem have failed. Given a positive integer x and a collection S of positive integers, MAXIMUM is the problem of deciding whether x is the maximum of S. We prove this problem is complete for P. Another problem is SUCCINCT-MAXIMUM. SUCCINCT-MAXIMUM contains the instances of MAXIMUM that can be represented by an exponentially more succinct way. We show this succinct version of MAXIMUM must not be in P. Under the assumption of P = NP, we prove the membership of SUCCINCT-MAXIMUM in P is also hold. In this way, we demonstrate the complexity class P is not equal to NP by the reduction ad absurdum rule. Another major complexity classes are LOGSPACE and NLOGSPACE. Whether LOGSPACE = NLOGSPACE is another fundamental question that it is as important as it is unresolved. We show the problem MAXIMUM can be solved in logarithmic space. Consequently, we demonstrate the complexity class LOGSPACE is equal to P and thus, LOGSPACE is equal to NLOGSPACE.

https://www.academia.edu/38039040/LOGSPACE_vs_NLOGSPACE_and_P_vs_NP

You can participate in the debate

Thanks