# Explaining The Jaccard Metric

*Why is it a metric?*

*Composite of source 1, source 2*

Paul Jaccard was a botanist who worked at ETH in Zurich during much of the first half of the 20th century. He created, or discovered, the similarity notion that became the Jaccard metric. Very neat having a metric named after you.

Today we discuss proofs and explanations that the metric is indeed a metric.

The Jaccard *index* $J(A,B)$ is the ratio of $|A \cap B|$ to $|A \cup B|$. The metric is

$$J_\delta(A,B) = 1 - \frac{|A \cap B|}{|A \cup B|}, \qquad [*]$$

provided $A$ and $B$ are not both empty sets. If $A$ and $B$ are both empty then by definition $J(A,B)$ is $1$ and so $J_\delta(A,B)$ is $0$. Generally we can assume that all the sets are non-empty.
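For concreteness, the two definitions can be coded directly. This is just an illustrative sketch in Python; the function names are ours, not standard:

```python
def jaccard_index(a, b):
    """J(A,B) = |A intersect B| / |A union B|; by convention J of two empty sets is 1."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def jaccard_metric(a, b):
    """J_delta(A,B) = 1 - J(A,B); 0 when both sets are empty."""
    return 1.0 - jaccard_index(a, b)
```

For example, `jaccard_metric({1, 2, 3}, {2, 3, 4})` gives $1 - 2/4 = 0.5$.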

The key question is to show that this satisfies the **triangle inequality**. That is, we must show that

$$J_\delta(A,B) \le J_\delta(A,C) + J_\delta(C,B). \qquad [**]$$
Many proofs of this are known, and it has been remarked that some are fairly complicated. Some are short, but continued recent interest seems to say they haven’t satisfied as *explanations*.

We think we can supply a quick explanation, if you are already familiar with the triangle inequality holding for Hamming distance on sets:

$$|A \triangle B| \le |A \triangle C| + |C \triangle B|,$$

where $\triangle$ is symmetric difference. By rewriting [*] as

$$J_\delta(A,B) = \frac{|A \triangle B|}{|A \cup B|},$$

we can see that [**] becomes similar but includes denominators. If [**] is *false* then

$$\frac{|A \triangle B|}{|A \cup B|} > \frac{|A \triangle C|}{|A \cup C|} + \frac{|C \triangle B|}{|C \cup B|}. \qquad [***]$$

Now if we have $C \subseteq A \cup B$ then replacing both right-hand denominators by $|A \cup B|$ cannot make the right-hand side of [***] bigger. But then we have a common denominator, and we can see from Hamming distance that [**] must be true.

So suppose $C$ includes elements not in $A \cup B$. The left-hand side of [***] is at most $1$, so each right-hand fraction must be of the form $\frac{p}{q}$ with $p < q$. Removing the $b$ elements of $C$ that lie outside $A \cup B$ turns such a fraction into $\frac{p-b}{q-b}$, which is smaller than $\frac{p}{q}$ when $p < q$, so removing those elements from $C$ would also make the right-hand side of [***] smaller. That brings us back to the case $C \subseteq A \cup B$ and the previous contradiction. So [**] must be true. That’s our proof and explanation in brief.
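The inequality [**] is also easy to sanity-check by brute force over a small universe. A sketch of such a check (our own, not part of the proof; names are illustrative):

```python
from itertools import combinations

def jd(a, b):
    # Jaccard metric, with the empty-empty convention J_delta = 0.
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def all_subsets(universe):
    return [set(c) for r in range(len(universe) + 1)
            for c in combinations(universe, r)]

def triangle_holds(universe):
    # Check J_delta(A,B) <= J_delta(A,C) + J_delta(C,B) for every triple,
    # with a tiny epsilon for floating-point rounding.
    subs = all_subsets(universe)
    return all(jd(a, b) <= jd(a, c) + jd(c, b) + 1e-12
               for a in subs for b in subs for c in subs)
```

Calling `triangle_holds({1, 2, 3, 4})` checks all $16^3$ triples of subsets.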

## A Careful Take

We will do the above proof more slowly and carefully, to ensure it is really clear. A convention: to make the formulas a bit more readable we use $AB$ to denote the intersection of $A$ and $B$. So the Jaccard metric is now

$$J_\delta(A,B) = 1 - \frac{|AB|}{|A \cup B|}.$$

First we explain the main ideas:

We will argue that the intermediate set $C$ in the triangle inequality can be constrained: $C$ can be taken to be a subset of $A \cup B$. The intuition is that any extra elements in $C$ can only make the triangle inequality weaker.

We will replace the definition of the Jaccard metric by an equivalent one. This new definition is much closer to a known metric.

We will reduce the triangle inequality finally to a known triangle inequality.

*Proof:*

Let’s assume that $A, B, C$ are the sets, and we wish to prove the triangle inequality:

$$J_\delta(A,B) \le J_\delta(A,C) + J_\delta(C,B). \qquad [**]$$

**Claim**: We can assume that $C \subseteq A \cup B$. Suppose there was an element $x$ in $C$ but not in $A \cup B$. Then removing $x$ from $C$ would only tighten the triangle inequality. That is, the LHS

$$J_\delta(A,B)$$

stays the same and the RHS terms

$$J_\delta(A,C) = 1 - \frac{|AC|}{|A \cup C|} \quad\text{and}\quad J_\delta(C,B) = 1 - \frac{|CB|}{|C \cup B|}$$

can only decrease: removing $x$ leaves the numerators unchanged while shrinking the denominators.
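A toy numeric illustration of this claim (our own example, not from the proof): dropping an element of $C$ that lies outside $A \cup B$ leaves the LHS alone and shrinks both RHS terms.

```python
def jd(a, b):
    # Jaccard metric with the empty-empty convention J_delta = 0.
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

A, B = {1, 2}, {2, 3}
C = {2, 9}        # 9 lies outside A | B
C2 = C - {9}      # drop the extraneous element

lhs = jd(A, B)                      # does not involve C
rhs_before = jd(A, C) + jd(C, B)    # 2/3 + 2/3
rhs_after = jd(A, C2) + jd(C2, B)   # 1/2 + 1/2
```

Here the right-hand side falls from $4/3$ to $1$, while the left-hand side stays at $2/3$.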

**Claim**: We can re-write $J_\delta(A,B)$ as

$$J_\delta(A,B) = \frac{|A \triangle B|}{|A \cup B|}.$$

As noted above, $A \triangle B$ is the set of elements that are in $A$ or $B$ but not both.
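This rewrite can be verified exhaustively over a small universe; note that Python’s `^` operator on sets is exactly symmetric difference (a sketch with our own helper name):

```python
from itertools import combinations

def check_rewrite(universe):
    subs = [set(c) for r in range(len(universe) + 1)
            for c in combinations(universe, r)]
    for a in subs:
        for b in subs:
            if not (a | b):
                continue  # both empty: handled by convention
            lhs = 1 - len(a & b) / len(a | b)   # 1 - |AB| / |A u B|
            rhs = len(a ^ b) / len(a | b)       # |A sym-diff B| / |A u B|
            if abs(lhs - rhs) > 1e-12:
                return False
    return True
```

`check_rewrite({1, 2, 3})` confirms the identity on all 64 pairs of subsets.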

**Claim**: We note that after applying the last claim three times, [**] becomes:

$$\frac{|A \triangle B|}{|A \cup B|} \le \frac{|A \triangle C|}{|A \cup C|} + \frac{|C \triangle B|}{|C \cup B|}.$$

**Claim**: Since $C \subseteq A \cup B$ we can multiply by $|A \cup B|$ and get that the triangle inequality is implied by

$$|A \triangle B| \le |A \triangle C| + |C \triangle B|.$$

Note this uses that

$$|A \cup C| \le |A \cup B| \quad\text{and}\quad |C \cup B| \le |A \cup B|,$$

which follows from

$$C \subseteq A \cup B.$$

**Claim**: But the last step is that

$$|A \triangle B| \le |A \triangle C| + |C \triangle B|$$

is just the triangle inequality for the Hamming distance. It uses that

$$(a \oplus b) \le (a \oplus c) + (c \oplus b)$$

holds for any single bits $a, b, c$; summing this over all elements of the universe gives the set inequality.
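The bit inequality has only eight cases, so it can be checked directly:

```python
from itertools import product

# (a XOR b) <= (a XOR c) + (c XOR b) for all single-bit choices of a, b, c.
bit_triangle = all((a ^ b) <= (a ^ c) + (c ^ b)
                   for a, b, c in product((0, 1), repeat=3))
```

The only way the left side is $1$ is when $a \ne b$, and then $c$ must differ from at least one of them.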

## How We Found The Proof

If you’re curious how we found the above, we were trying to check a different kind of proof. We started with the statement of the triangle inequality:

$$1 - \frac{|AB|}{|A \cup B|} \le \left(1 - \frac{|AC|}{|A \cup C|}\right) + \left(1 - \frac{|CB|}{|C \cup B|}\right).$$
This looks a bit scary, with its multiple ratios and addition and subtraction. But the following feature jumps out:

The right side depends on $C$ but the left side does not.

This suggests the idea of asking what happens if we change $C$ one element at a time. Can we “walk” $C$ to an extreme point at which the truth of [**] is obvious? A promising start was that we could remove any elements from $C$ that are not in $A \cup B$, as shown above. So we can assume $C \subseteq A \cup B$. Can we move $C$ toward $A$ or at least toward $A \cup B$ while preserving the implication of truth for [**]?

Seen in this light, our above proof’s replacing the denominators by $|A \cup B|$ is an “illegal move”—not a change to $C$—so less interesting. But we happened to notice it worked. Let’s follow the train of thought from where we got that $C \subseteq A \cup B$ can be assumed. Okay, what is the next move to make with $C$? Look again at the key expression:

$$1 - \frac{|AB|}{|A \cup B|}.$$

Can we simplify this in some way? The answer is yes. The structure of $1$ minus an expression suggested that perhaps we could combine the $1$ and the ratios. Indeed it is not too hard to note that

$$1 - \frac{|AB|}{|A \cup B|} = \frac{|A \triangle B|}{|A \cup B|}.$$

Okay perhaps this is not obvious. It is not trivial, but it is a standard idea: $J_\delta$ looks like the complementation of the “probability” ratio $\frac{|AB|}{|A \cup B|}$. Once you think of this the exact formula follows, since $|A \cup B| - |AB| = |A \triangle B|$. Thus we can re-write the required triangle inequality now as

$$\frac{|A \triangle B|}{|A \cup B|} \le \frac{|A \triangle C|}{|A \cup C|} + \frac{|C \triangle B|}{|C \cup B|}.$$

We are almost done. The ratios are annoying, so can we get rid of them? We have assumed that $C \subseteq A \cup B$. So it seems like a good idea to assume—for the moment—that $C$ is actually equal to $A \cup B$. But then it is easy to see that $|A \cup C| = |A \cup B|$ and $|C \cup B| = |A \cup B|$. So the above becomes, after multiplying by $|A \cup B|$,

$$|A \triangle B| \le |A \triangle C| + |C \triangle B|.$$

This looks really nice, no more ratios. Wait, what is this expression? The $\triangle$ is the exclusive-or function on characteristic vectors, and it is not hard to note that $|A \triangle B|$ is the classic Hamming distance. So this inequality is a fact. Recall the Hamming distance records the number of differences between two bit-vectors, but sets are really just bit vectors.
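To see the correspondence concretely, here is a small sketch (with an assumed five-element universe of our choosing) showing that the size of the symmetric difference equals the Hamming distance between characteristic bit-vectors:

```python
universe = [1, 2, 3, 4, 5]

def char_vec(s):
    # Characteristic bit-vector of a set over the fixed universe.
    return [int(x in s) for x in universe]

def hamming(u, v):
    # Number of positions where two bit-vectors differ.
    return sum(x != y for x, y in zip(u, v))

A, B = {1, 2, 3}, {2, 3, 4}
sym_diff_size = len(A ^ B)                # A ^ B is {1, 4}
ham = hamming(char_vec(A), char_vec(B))   # differs in positions for 1 and 4
```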

Are we there yet? Almost. We only need to argue that if $|A \cup C|$ or $|C \cup B|$ is less than $|A \cup B|$, then the right-hand fractions only grow, so the inequality we proved is actually stronger than the one we need. So we are done.

## Open Problems

Do you like our proof? Is it clear from the intro—or from the second section? Or are you still unsure why the Jaccard metric satisfies the triangle inequality? We may follow up with more about other proofs.

We don’t think the photo used by this online Alchetron bio of Paul Jaccard is the botanist. It looks too modern, for one. No other photo seems extant. Google Images guesses Alchetron’s image to be Adrian Herzog, but we think its highest Jaccard index is to Orepic user paul.jaccard, as shown at top. Can you solve the mystery of who?

[Switched to standard J-sub-delta notation to clarify and fix issues, fixed HTML conversion glitch with p-q fractions]

If we identify sets with the uniform distribution on them, is there a way to extend the Jaccard metric to arbitrary probability distributions?

This is partly the motivation I see in Sven Kosub’s paper. We intend a followup.

I think the image on the left of your composite is of Jean-Paul Jaccard, who is the executive director of a German company: https://stramec.ch/portrait/about-us/

Thanks very much—I think you have ID-ed him.

If A and B are both empty, shouldn’t J(A, B) be 0? I think you are mixing up J and d_J, to use the notation of the Wikipedia article.

My favorite proof is via minhash: for a random permutation \pi of the universe, let h_{\pi}(A) for each set A denote the smallest element in A w.r.t. \pi. Then it can be seen that:

Pr_{\pi}[h_{\pi}(A) \ne h_{\pi}(B)] = J_\delta(A,B)

This is an embedding of the Jaccard metric isometrically into the (scaled) discrete metric: for each fixed \pi the indicator of disagreement satisfies the triangle inequality, and taking expectations preserves it. Now the triangle inequality follows.

This viewpoint of using hashing to understand similarity was perhaps first expounded by Moses Charikar in “Similarity estimation techniques from rounding algorithms”, STOC 2002.

Another cute application that I like is that the Goemans-Williamson hyperplane rounding allows one to show that the angular distance between unit vectors in R^d is a metric. For d=3, this is apparently the “great circle” distance for which the standard proof, as far as I know, has more of a geometric flavor.
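Returning to the minhash identity above: it can be verified exactly by averaging over every permutation of a small universe. A brute-force sketch (our own; practical minhash samples a few random permutations instead of enumerating all of them):

```python
from itertools import permutations
from fractions import Fraction

def minhash_disagreement(A, B, universe):
    # Exact probability, over all permutations pi of the universe,
    # that the pi-smallest element of A differs from that of B.
    perms = list(permutations(universe))
    def min_rank(pi, s):
        return min(pi.index(x) for x in s)
    bad = sum(min_rank(pi, A) != min_rank(pi, B) for pi in perms)
    return Fraction(bad, len(perms))

A, B = {1, 2}, {2, 3}
universe = (1, 2, 3, 4)
d = Fraction(len(A ^ B), len(A | B))   # Jaccard distance = 2/3
prob = minhash_disagreement(A, B, universe)
```

The hashes agree exactly when the first element of $A \cup B$ under $\pi$ lies in $A \cap B$, which matches the Jaccard index.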

Thanks. This is the most concise explication of the minhash proof we’ve found. A discussion of Jeffrey Ullman’s version (section 3.3.3 here) was originally intended for this post and may be in a followup.

Should J(A,B) be 0 if A and B are both empty?

Thanks, Peter and Aaron above. I went with $0$ for the metric.

I think that the Jaccard metric can be derived from the (more general) fact that the following function, defined on pairs of finite random variables, is a metric. The function is

$$D(X,Y) = 1 - \frac{I(X;Y)}{H(X,Y)},$$

where $I(X;Y)$ denotes the mutual information and $H(X,Y)$ the joint entropy. Horibe (1973) proved that $D$ is indeed a metric and the proof is very, very short. Now, it is well known that any valid formula with entropies and mutual information can be transformed into an equally valid formula (involving finite sets) in which $H(X,Y)$ is substituted with $\mu(A \cup B)$ and $I(X;Y)$ with $\mu(A \cap B)$, where $\mu$ is an arbitrary set-additive function (see Csiszar and Korner's Information Theory book, p. 37). One gets the Jaccard metric by choosing $\mu$ to be equal to set cardinality $|\cdot|$. This should also answer Chris Moore's question, I believe.