Mathematics of COVID-19

May 1, 2020

It’s not just {R_0}

[ Sir Francis Galton by Charles Wellington Furse ]

Francis Galton is a perfect example of a Victorian era scientist. Sir Francis, as he was knighted in 1909, had many roles: a statistician, a sociologist, a psychologist, an anthropologist, a tropical explorer, a geographer, a meteorologist, a geneticist, and an inventor. He coined the phrase “nature versus nurture” and more.

Today we trade in jokes for some mathematics of the virus.

We wish there was something clever we could say about the spread of the virus. But the statistics of the spread are complex. We wish there was something theory could contribute to the fight against the virus. But the front line is clearly dominated by medicine and biology.

However, there are two areas that are relevant. The first is the math of how fast the virus spreads, and the second is how valid the claims about the virus are. The latter is an area where theory could play a role in the future.

In preparing this discussion we noted that Galton was indeed an inventor. He invented a device that gives a visual demonstration of the central limit theorem. You probably have seen some version. It is sometimes called the bean machine and other times the Galton board. Perhaps if Galton were alive today he would be on cable news explaining how the virus spreads.

Galton also had views that are troubling. See this for example. He lived over one hundred years ago, but his views on eugenics are still upsetting. Should we not have featured him? What do you think?


The issue is: will the terrible virus stop infecting people? Will it become extinct? Or will it at least stop infecting more and more people? The part of math that studies such questions was invented by Galton in 1889 as a model to track family names. We wish we were talking today about family names and not a killer virus. The area he invented is now called the theory of branching processes.

There was concern amongst the Victorians that aristocratic surnames were becoming extinct. Galton originally posed the question regarding the probability of such an event in an 1873 issue of The Educational Times, and the Reverend Henry Watson replied with a solution. Together, they then wrote an 1874 paper entitled On the probability of the extinction of families.

We are interested in branching processes and when they are likely to become extinct. We want the virus to stop infecting people and become extinct, or at least to stop its terrible explosive growth. See these comments for more information.

On Average

The contagiousness of a disease is described by its “reproduction rate,” or the average number of people infected by one infectious person in a population without immunity. You might also hear this number referred to as the {R_0} value. When it is less than {1}, the disease does not become a pandemic. The spread of infection from person to person is a branching process. In order to tell if a branching process will eventually become extinct we need more than {R_0}. That is, we need to understand more than the average number of descendants. Let’s see why.

Consider a process {A} that creates {k} descendants with probability {a_{k}} and a process {B} that creates {k} descendants with probability {b_{k}}. The average number of descendants for {A} is

\displaystyle  \mu_{A} = a_{1} + 2a_{2} + \dots

and for {B} is

\displaystyle  \mu_{B} = b_{1} + 2b_{2} + \dots

Is it always better to have the process with the smaller average? The answer is no.

Consider the process {A} so that

\displaystyle  a_{0} = 0, a_{1} = 1 - \epsilon, a_{2} = \epsilon, a_{3} = 0, \dots

And consider the process {B} so that

\displaystyle  b_{0} = 1/(n+1), b_{1} = 1/(n+1), \dots, b_{n} = 1/(n+1), b_{n+1} = 0, \dots

In both cases the probabilities sum to {1}. The first average is {\mu_{A} = (1 - \epsilon) + 2\epsilon = 1 + \epsilon} and the second average is

\displaystyle  \mu_{B} = 1/(n+1) + 2/(n+1) + \cdots + n/(n+1) = n/2.

Clearly the second has a larger value for {n \ge 3} and small {\epsilon}. But the first will never go extinct, since {a_{0} = 0} means every individual has at least one descendant, and the second can become extinct, since {b_{0} > 0}.
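
The contrast can be checked numerically. The sketch below is an illustration, not part of the original argument. It assumes process {A} has {a_{1} = 1 - \epsilon} and {a_{2} = \epsilon}, and that process {B} puts weight {1/(n+1)} on each of {0, \dots, n}, so that both sets of probabilities sum to one. The values {\epsilon = 0.01} and {n = 4} and the function names are arbitrary choices. It relies on the standard fact that the extinction probability of a branching process started from one individual is the smallest nonnegative fixed point of the generating function {f(s) = \sum_k p_k s^k}, which can be reached by iterating {q \leftarrow f(q)} from {q = 0}.

```python
def mean_offspring(p):
    """Average number of descendants, where p[k] = P(k descendants)."""
    return sum(k * pk for k, pk in enumerate(p))


def extinction_probability(p, iterations=200):
    """Smallest nonnegative fixed point of f(s) = sum_k p[k] * s**k.

    Found by iterating q <- f(q) starting from q = 0.
    """
    q = 0.0
    for _ in range(iterations):
        q = sum(pk * q**k for k, pk in enumerate(p))
    return q


# Illustrative parameters (assumptions, not taken from the post).
eps = 0.01
n = 4

proc_a = [0.0, 1.0 - eps, eps]        # a_0 = 0, a_1 = 1 - eps, a_2 = eps
proc_b = [1.0 / (n + 1)] * (n + 1)    # b_0 = ... = b_n = 1 / (n + 1)

mean_a = mean_offspring(proc_a)
mean_b = mean_offspring(proc_b)
q_a = extinction_probability(proc_a)
q_b = extinction_probability(proc_b)

print(f"A: mean = {mean_a:.3f}, extinction probability = {q_a:.3f}")
print(f"B: mean = {mean_b:.3f}, extinction probability = {q_b:.3f}")
```

With these choices {B} has roughly twice the average of {A}, yet {A} has extinction probability exactly zero while {B} dies out with probability somewhat above one quarter.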

It’s In the Variance

The key difference is that the second process has higher variance. The importance of the variance—as opposed to the mean—is remembered in some ways but seems to be forgotten in others. It is the nub of one of the jokes we included in the previous post:

There was a statistician who drowned crossing a river—that was only 3 feet deep on average.

For an example closer to our point, suppose a third-party candidate {C} entering a race expects to take more votes away from candidate {A} than candidate {B} so as to double the margin that {A} expects to lose by. But suppose {C} alters the dynamics of the race so that the standard deviation is quadrupled. Then {A} generally has a better chance of winning under that scenario. If the distribution is normal and the original standard deviation equaled the expected margin, then {A}’s chances of winning improve from 16% to over 30%.
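
The two percentages can be reproduced with the normal distribution function. This is a small check under the stated assumptions (normal distribution, original standard deviation equal to the expected losing margin); the unit of margin and the function names are arbitrary illustrative choices.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def win_probability(expected_loss_margin, std_dev):
    """P(margin > 0) when margin ~ Normal(-expected_loss_margin, std_dev)."""
    return 1.0 - normal_cdf(expected_loss_margin / std_dev)


# Measure everything in units of the original expected losing margin.
before = win_probability(expected_loss_margin=1.0, std_dev=1.0)
after = win_probability(expected_loss_margin=2.0, std_dev=4.0)

print(f"before C enters: {before:.1%}")
print(f"after C enters:  {after:.1%}")
```

Before {C} enters, {A} must beat expectation by one standard deviation; afterward, only by half of one, which is why the chance rises even though the expected margin is worse.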

In our case, “winning” means outcomes where the virus dies out, locally and ultimately globally. Such outcomes are needed for opening up. It is not enough to reduce the number of active cases to “{O(1)}” because the branching started from such a state. Whatever active cases there are must be known and contained as well as {O(1)}. This is the situation currently claimed in New Zealand.

Branching and Uncertainty

At the other end is the state where the virus is not contained but branching stops because of saturation. When the proportion of targeted descendants who have already had the virus and are (we hope) immune is large enough to push the effective branching factor below {1}, then the expectation takes over from the variance as a determinant of stoppage. This threshold proportion is what is meant by “herd immunity” and is estimated to be 60–70% for this virus.
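
The 60–70% figure is consistent with the standard textbook threshold {1 - 1/R_0}, which assumes a homogeneously mixing population: if a fraction {p} is immune, each case infects on average {R_0(1-p)} others, and this drops below {1} once {p > 1 - 1/R_0}. That formula is not stated above, and the {R_0} values in this sketch are illustrative assumptions rather than estimates from this post.

```python
def herd_immunity_threshold(r0):
    """Immune fraction at which the effective branching factor r0 * (1 - p) equals 1."""
    if r0 <= 1.0:
        return 0.0  # the process is already subcritical
    return 1.0 - 1.0 / r0


# Illustrative R_0 values (assumptions, not data from the post).
for r0 in (2.5, 10.0 / 3.0):
    print(f"R_0 = {r0:.2f} -> threshold = {herd_immunity_threshold(r0):.0%}")
```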

What we feel is the central mystery is whether there are enough undetected cases to bring that point even possibly in range. Randomized testing in the New York area has found nearly a 25% rate of antibodies. Analogous tests in less-affected California and in other countries have found under 10% positive rates, however. Those results are subject to uncertainty in the representativeness of the samples.

Open Problems

We had planned on saying something about checking whether reported data about the virus is correct or faked. With so much at stake it seems smart to insist on data being verified. More on that in the future.

11 Comments
  1. Lydia permalink
    May 2, 2020 12:20 am

    Nitpick: it’s “Sir Francis” not “Sir Galton”. Supposedly the reason for this is that a knighthood, unlike a noble title, is not hereditary (although of course, “Professor” and “Doctor” aren’t hereditary either, but we still use them with surnames). Nevertheless, while one might introduce the person as “Sir Francis Galton”, it’s always “Sir Francis” after that, and it’s never correct to use “Sir” just with a surname.

    • rjlipton permalink*
      May 2, 2020 11:46 am

      Dear Lydia:

      Thanks so much. Sir Francis it is. My error.

      Best, be well


  2. Mantas Pajarskas permalink
    May 2, 2020 6:17 am

    If a_k is a probability that k descendants are created, shouldn’t a_0 + a_1 + … = 1?
    This doesn’t hold, a_1 = 1 and a_2 > 0; would it be possible to clarify the definition of a_k?

  3. May 2, 2020 12:44 pm

    You may enjoy this piece of mine, which also has some visualizations. The variance is enormous when the process is close to critical, or when there are rare superspreading events that produce many children in a single step:

  4. May 3, 2020 2:15 am

    What about base rates?

  5. May 5, 2020 9:11 am

    I think the common man is going to know the answer for this virus, before any experts, mathematical or otherwise, have all of their theoretical constants in hand. That point is imminent in my area, in my own view, already: It is clean here. The devil in the details, of course, is the promulgation of fake data by corrupt academics, politicians and other “authorities”.

    • May 5, 2020 9:16 am

      …the key, of course, is to simply observe the variance from the established past, before this virus. When it becomes indistinguishable from that background, it is done.

  6. Sandip Tiwari permalink
    May 9, 2020 2:46 am

    There are too many assumptions that are broken in trying to take this approach and apply it everywhere. Correlations and lack of randomness and independence. Variances at transition becoming large occurs precisely because of the correlations and this is renormalization that came much after Galton.

    Sweden has its societal milieu, China another, Brazil or Russia entirely different, within USA Chelsea or Bronx is different from Maine. Each one has to find its own solution.

  7. E.L. Wisty permalink
    May 21, 2020 10:23 am

    This is wonderful. Of course you both have more than enough to do, however, considering the current rona situation, you could fork a new blog on epidemiology algorithms and models as well as reliability tests of published statistics. This current state of the world seems to be getting ready for an extended stay.

  8. michaeldembinski permalink
    July 25, 2020 9:33 pm

    The problem with ‘Sir Galton’ is that such an error in the opening paragraph suggests that other errors may have cropped up elsewhere in the article. I’d recommend editing it to ‘Sir Francis’ and then deleting my comment.

    • July 26, 2020 8:17 pm

      That’s bizarre—I thought we fixed that one two months ago…see comment at top. Done now.
