Sabermetric Research: Herd immunity comes faster when some people are more infectious

By now, we all know about "R0" and how it needs to drop below 1.0 for us to achieve "herd immunity" to the COVID virus.

The estimates I've seen put the "R0" (or "R") for the COVID-19 virus at around 4. That means, in a susceptible population with no interventions like social distancing, the average infected person will pass the virus on to 4 other people. Each of those four passes it on to four others, and each of those 16 newly-infected people passes it on to four others, and the incidence grows exponentially.

But suppose that 75 percent of the population is immune, perhaps because they've already been infected. Then each infected person can pass the virus on to only one other person (since the other three, who would otherwise be infected, are immune). That means R0 has dropped from 4 to 1. With R0=1, the number of infected people will stay level. As more people become immune, R drops further, and the disease eventually dies out in the population.
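The arithmetic behind that 75 percent figure can be sketched in a few lines. This is only the textbook "everyone is identical" calculation, and the function names are mine, chosen for illustration:

```python
def effective_r(r0: float, immune_fraction: float) -> float:
    """Average number of people each case infects once some fraction is immune.

    Assumes everyone is equally likely to spread and catch the virus.
    """
    return r0 * (1.0 - immune_fraction)


def herd_immunity_threshold(r0: float) -> float:
    """Immune fraction at which effective R reaches 1, under the same assumption."""
    if r0 <= 1.0:
        return 0.0  # the outbreak already shrinks with nobody immune
    return 1.0 - 1.0 / r0


print(herd_immunity_threshold(4.0))  # 0.75
print(effective_r(4.0, 0.75))        # 1.0
```

The key assumption is the one in the docstring: immunity is spread evenly across identical people.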

That's the argument most experts have been making, so far -- that we can't count on herd immunity any time soon, because we'd need 75 percent of the population to get infected first.

But that can't be right, as "Zvi" points out in a post on LessWrong*.

(*I recommend LessWrong as an excellent place to look to for good reasoning on coronavirus issues, with arguments that make the most sense.)

That's because not everyone is equal in terms of how much they're likely to spread the virus. In other words, everyone has his or her own personal R0. Those with a higher R0 -- people who don't wash their hands much, or shake hands with a lot of people, or just encounter more people for face-to-face interactions -- are also likely to become infected sooner. When they become immune, they drop the overall "societal" R0 more than if they were average.

If you want to reduce home runs in baseball by 75 percent, you don't have to eliminate 75 percent of plate appearances. You can probably do it by eliminating as little as, say, 25 percent, if you get rid of the top power hitters only.

As Zvi writes,

"Seriously, stop thinking it takes 75% infected to get herd immunity...

"... shame on anyone who doesn’t realize that you get partial immunity much bigger than the percent of people infected.

"General reminder that people’s behavior and exposure to the virus, and probably also their vulnerability to it, follow power laws. When half the population is infected and half isn’t, the halves aren’t chosen at random. They’re based on people’s behaviors.

"Thus, expect much bigger herd immunity effects than the default percentages."

-------

But to what extent does the variance in individual behavior affect the spread of the virus? Is it just a minimal difference, or is it big enough that, for instance, New York City (with some 20 percent of people having been exposed to the virus) is appreciably closer to herd immunity than we think?

To check, I wrote a simulation. It is probably in no way actually realistic in terms of how well it models the actual spread of COVID, but I think we can learn something from the differences in what the model shows for different assumptions about individual R0.

I created 100,000 simulated people, and gave them each a "spreader rating" to represent their R0. The actual values of the ratings don't matter, except relative to the rest of the population. I created a fixed number of "face-to-face interactions" each day, and the chance of being involved in one is directly proportional to the person's rating. So, people with a rating of "8" are four times as likely to have a chance to spread/catch the virus as people with a rating of "2".

In each of those interactions, if it turned out that one person was infected and the other wasn't, there was a fixed probability of the infection spreading to the other person.

For every simulation, I jigged the numbers to get the R0 to be around 4 for the first part of the pandemic, from the start until the point where around three percent of the population was infected.

The simulation started with 10 people newly infected. I assumed that infected people could spread the virus only for the first 10 days after infection.
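For anyone who wants to try something similar, here is a minimal sketch of a simulation along those lines. It is not the code I ran. The function name, the parameter names, and the choice to draw both participants of each interaction in proportion to rating are all assumptions made for illustration.

```python
import numpy as np


def simulate(ratings, interactions_per_day, p_transmit, n_seed=10,
             infectious_days=10, max_days=365, seed=0):
    """Return the cumulative number of people ever infected, one entry per day."""
    rng = np.random.default_rng(seed)
    ratings = np.asarray(ratings, dtype=float)
    n = len(ratings)
    weights = ratings / ratings.sum()      # chance of being in any one interaction
    infected_on = np.full(n, -1)           # day of infection; -1 means never infected
    infected_on[rng.choice(n, size=n_seed, replace=False)] = 0
    totals = [n_seed]
    for day in range(1, max_days + 1):
        # Infected people can spread only during the 10 days after infection.
        infectious = (infected_on >= 0) & (day - infected_on <= infectious_days)
        if not infectious.any():
            break                          # the virus has died out
        susceptible = infected_on < 0
        # Assumption: both participants are drawn in proportion to rating.
        a = rng.choice(n, size=interactions_per_day, p=weights)
        b = rng.choice(n, size=interactions_per_day, p=weights)
        spread = rng.random(interactions_per_day) < p_transmit
        newly_a = a[infectious[b] & susceptible[a] & spread]
        newly_b = b[infectious[a] & susceptible[b] & spread]
        infected_on[newly_a] = day
        infected_on[newly_b] = day
        totals.append(int((infected_on >= 0).sum()))
    return totals


# Example: 100,000 identical people.
totals = simulate(np.ones(100_000), interactions_per_day=100_000, p_transmit=0.2)
print(len(totals) - 1, "days simulated;", totals[-1], "people ever infected")
```

In this sketch, `interactions_per_day` and `p_transmit` are the numbers you would jig. When everyone has the same rating, each person is in about 2 x interactions_per_day / n interactions a day, so the starting R0 is roughly that times 10 days times `p_transmit`, which works out to about 4 for the example values.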

--------

The four simulations were:

1. Everyone has the same rating.

2. Everyone rolls a die until "1" or "2" comes up, and their spreader rating is the number of rolls it took. (On average, that's 3 rolls. But in a hundred thousand trials, you get some pretty big outliers. I think there was typically a 26 or higher -- 26 consecutive high rolls happens one time in 37,877.)

3. Same as #2, except that 1 percent of the population is a superspreader, with a spreader rating of 30. The first nine infected people were chosen randomly, but the tenth was always set to "superspreader."

4. Same as #3, but the superspreaders got an 80 instead of a 30.
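The four sets of ratings can be generated in a few lines. This is a sketch under the assumption that "rolling until a 1 or 2 comes up" is a geometric draw with success probability 1/3. The function name and the scenario numbering argument are hypothetical, and the rule that the tenth initial infection is always a superspreader is not modeled here.

```python
import numpy as np


def spreader_ratings(n, scenario, rng):
    """Return an array of n spreader ratings for scenario 1, 2, 3, or 4."""
    if scenario == 1:
        return np.ones(n)                  # everyone has the same rating
    # Number of die rolls until a 1 or 2 appears: geometric with p = 1/3, mean 3.
    ratings = rng.geometric(1 / 3, size=n).astype(float)
    if scenario in (3, 4):
        # Overwrite a random 1 percent of the population with the superspreader rating.
        chosen = rng.choice(n, size=n // 100, replace=False)
        ratings[chosen] = 30 if scenario == 3 else 80
    return ratings


rng = np.random.default_rng(0)
for scenario in (1, 2, 3, 4):
    r = spreader_ratings(100_000, scenario, rng)
    print(scenario, "mean rating:", round(r.mean(), 2), "max rating:", r.max())
```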

--------

In the first simulation, everyone got the same rating. With an initial R0 of around 4, it did, indeed, take around 75 percent of the population to get infected before R0 dropped below 1.0.

Overall, around 97 percent of the population wound up being infected before the virus disappeared completely.

Here's the graph:

The point where R0 drops below 1.0 is where the next day's increase is smaller than the previous day's increase. It's hard to eyeball that on the curve, but it's around day 32, where the total crosses the 75,000 mark.
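Rather than eyeballing it, you can read that point off the cumulative totals directly. The sketch below takes the day with the biggest one-day increase as the turning point, which is a slightly more noise-tolerant version of the rule above. The function name and the sample numbers are made up for illustration.

```python
def peak_day(cumulative):
    """Return (day, cumulative total) for the day with the largest one-day increase.

    After that day the daily increases shrink, which is the sign that
    R has dropped below 1.0.
    """
    increases = [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]
    peak = max(range(len(increases)), key=increases.__getitem__)
    return peak + 1, cumulative[peak + 1]


# Made-up cumulative totals, one entry per day starting at day 0.
example = [10, 40, 160, 400, 700, 900, 960, 970]
print(peak_day(example))  # (4, 700)
```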

-------

As I mentioned, I jigged the other three curves so that for the first 22 days, they had about the same R0 of around 4, so as to match the "everyone the same" graph.

Here's the graph of all four simulations for those first 22 days:

Aside from the scale, they're pretty similar to the curves we've seen in real life. Which means that, based on the data we've seen so far, we can't really tell from the numbers which simulation is closest to our true situation.

But ... after that point, as Zvi explained, the four curves do diverge. Here they are in full:

Big differences, in the direction that Zvi explained. The bigger the variance in individual R0, the more attenuated the progression of the virus.

Which makes sense. All four curves had an R0 of around 4.0 at the beginning. But the bottom curve was 99 percent with an average of 3 encounters, and 1 percent superspreaders with an average of 80 encounters. Once those superspreaders are no longer superspreading, the R0 plummets.

In other words: herd immunity brings the curve under control by reducing opportunity for infection. In the bottom curve, eliminating the top 1% of the population reduces opportunity by 40%. In the top curve, eliminating 1% of the population reduces opportunity only by 2%.
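Here is one way to arrive at numbers of that size. The assumptions are that both people in an interaction are drawn in proportion to rating, and that "opportunity" means the share of interactions involving at least one person from the removed group. That is a reading that roughly reproduces the figures above, not necessarily the exact calculation behind them, and the function name is hypothetical.

```python
def opportunity_removed(groups, removed_index):
    """Share of interactions that involve at least one member of the removed group.

    groups is a list of (population_fraction, average_rating) pairs.
    """
    total_weight = sum(fraction * rating for fraction, rating in groups)
    fraction, rating = groups[removed_index]
    w = fraction * rating / total_weight   # chance one participant is from that group
    return 1.0 - (1.0 - w) ** 2            # chance at least one of the two is


# Bottom curve: 99 percent averaging 3, 1 percent superspreaders at 80.
print(round(opportunity_removed([(0.99, 3), (0.01, 80)], 1), 3))  # 0.379
# Top curve: everyone the same, remove any 1 percent.
print(round(opportunity_removed([(0.99, 1), (0.01, 1)], 1), 4))   # 0.0199
```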

-------

For all four curves, here's where R0 dropped below 1.0:

75% -- all people the same
58% -- all different, no superspreaders
44% -- all different, superspreaders 10x average
20% -- all different, superspreaders 26x average

And here's the total number of people who ever got infected:

97% -- all people the same
81% -- all different, no superspreaders
65% -- all different, superspreaders 10x average
33% -- all different, superspreaders 26x average

--------

Does it seem counterintuitive that the more superspreaders, the better the result? How can more infecting make things better?

It doesn't. More *initial* infecting makes things better *only holding the initial R0 constant.*

If the aggregate R0 is still only 4.0 after including superspreaders, it must mean that the non-superspreaders have an R0 significantly less than 4.0. You can think of an "R=4.0 with superspreaders" society as maybe an "R=2.0" society that's been infected by 1% gregarious handshaking huggers and church-coughers.

In other words, the good news is: if everyone were at the median, the overall R0 would be less than 4. It just looks like R0=4 because we're infested by dangerous superspreaders. Those superspreaders will more quickly turn benign and lower our R0 faster.

---------

So, the shape of the distribution of spreaders matters a great deal. Of course, we don't know the shape of our distribution, so it's hard to estimate which line in the chart we're closest to.

But we *do* know that we have at least a certain amount of variance -- some people shake a lot of hands, some people won't wear masks, some people are probably still going to hidden dance parties. So I think we can conclude that we'll need significantly less than 75 percent to get to herd immunity.

How much less? I guess you could study data sources and try to estimate. I've seen at least one non-wacko argument that says New York City, with an estimated infection rate of at least 20 percent, might be getting close. Roughly speaking, that would be something like the fourth line on the graph, the one on the bottom.

Which line is closest, if not that one? My gut says ... given that we know the top line is wrong, and from what we know about human nature ... the second line from the top is a reasonable conservative assumption. Changing my default from 75% to 58% seems about right to me. But I'm pulling that out of my gut. The very end part of my gut, to be more precise.

What we do know for sure, at least, is that the 75%, the top line of the graph, must be too pessimistic. To estimate how much too pessimistic, we need more data and more arguments.

Labels: covid

5 Comments:

At Tuesday, May 19, 2020 9:12:00 PM, Cliff Blau said...: This is all supposing, of course, that there is such a thing as immunity, which we don't know yet. Also that if there is, that it lasts long enough to produce herd immunity. Furthermore, that 20% figure for NYC is probably way high, due to false positives in the testing.
At Friday, May 22, 2020 8:50:00 PM, Chris Migliaccio said...: There'd likely be variance for herd immunity among individual communities as well. In NYC, for example, a higher percentage of the population interacts with a lot of people daily without social distancing (on my average morning commute, I'd be close enough to spread this virus to at least a dozen people, while someone driving alone in a car obviously has no chance to do so). So it's possible that you get herd immunity at 20% in a more rural environment, at 40% in a suburban one, and at 60% in an urban one.
At Thursday, July 23, 2020 11:30:00 AM, Mike said...: Here's a more recent article making the same point: https://www.quantamagazine.org/the-tricky-math-of-covid-19-herd-immunity-20200630/

Thanks for explaining it ahead of time, so I could think "I already knew that" when I read it :-)

I do think Zvi is his actual given name. Uncommon as it is, it rung a bell with me from a past life, and it looks like he was/is fairly well known in Magic: the Gathering circles, where he wrote under the same name.
At Thursday, July 23, 2020 1:03:00 PM, Phil Birnbaum said...: That makes sense ... LessWrong has lots of posts about Magic: the Gathering. I've seen Zvi's last name there too on occasion, but I don't remember it offhand.
At Thursday, August 06, 2020 9:06:00 PM, Luc Boyer said...: Not sure if it matters at all in your analysis but by most accounts the infection rates appear to be considerably lower in most countries than the 20% you ascribe to the situation in New York. The following article is from La Presse (in French) but I am sure there are other articles like it:

https://www.lapresse.ca/covid-19/2020-04-22/les-etudes-suggerant-un-faible-taux-d-infection-se-multiplient

In any case, that still leaves a large segment of the population ripe for infection.


Sunday, May 17, 2020
