Thursday, August 27, 2020

Charlie Pavitt: Open the Hall of Fame to sabermetric pioneers

This guest post is from occasional contributor Charlie Pavitt. Here's a link to some of Charlie's previous posts.


Induction into the National Baseball Hall of Fame (HOF) is of course the highest honor available to those associated with the game. When one thinks of the HOF, one first thinks of the greatest players, such as the first five inductees in 1936 (Cobb, Johnson, Mathewson, Ruth, and Wagner). But other categories of contributors were added almost immediately: league presidents (Morgan Bulkeley, Ban Johnson) and managers (Mack, McGraw), plus George Wright, in 1937; pioneers (Alexander Cartwright and Henry Chadwick) in 1938; owners (Charles Comiskey) in 1939; umpires (Bill Klem) and what would now be considered general managers (Ed Barrow) in 1953; and even union leaders (Marvin Miller, part of this year's class, to be inducted next year). There are also other HOF-associated honors for contributions to the game: the J. G. Taylor Spink Award (given by the Baseball Writers' Association of America) annually since 1962, the Ford C. Frick Award for broadcasters annually since 1978, and, thus far, five Buck O’Neil Lifetime Achievement Awards, given every three years since 2008. Even songs get honored ("Centerfield", 2010; "Talkin' Baseball", 2011).

But what about sabermetricians? Are they not having a major influence on the game?  Are there not some who are deserving of an honor of this magnitude?

I am proposing that an honor analogous to the Spink, Frick, and O’Neil awards be given to sabermetricians who have made significant and influential contributions to the analytic study of baseball. I would have called it the Henry Chadwick Award, to pay tribute to the inventor of the box score, batting average, and earned run average, but SABR has already reserved that title for its award for research contributions, a few of which have gone to sabermetricians but most to other contributors. So instead I will call it the F. C. Lane Award, not in reference to Frank C. Lane (general manager of several teams in the 1950s and 1960s) but to Ferdinand C. Lane, editor of Baseball Magazine between 1911 and 1937. Lane wrote two articles for the publication ("Why the System of Batting Should Be Reformed," January 1917, pages 52-60; "The Base on Balls," March 1917, pages 93-95) in which he proposed linear-weight formulas for evaluating batting performance, the second of which is remarkably accurate.

I shall now list those who I think have made "significant and influential contributions to the analytic study of baseball" (that phrase was purposely worded to delineate the intent of the award). The HOF began inductions with five players, so I will propose who I think should be the first five recipients:

George Lindsay

Between 1959 and 1963, based on data from a few hundred games either he or his father had scored, George Lindsay published three academic articles in which he examined issues such as the stability of the batting average, average run expectancies for each number of outs during an inning and for different innings, the length of extra-inning games, the distribution of total runs for each team in a game, the odds of winning games with various leads in each inning, and the value of intentional walks and base stealing. It was revolutionary work, and opened up areas of study that have been built upon by generations of sabermetricians since.


Bill James

Starting with his first self-published Baseball Abstract back in 1977, James built up an audience that resulted in the Abstract becoming a conventionally published bestseller between 1982 and 1988. During those years, he proposed numerous concepts (to name just three, Runs Created, the Pythagorean Equation, and the Defensive Spectrum) that have influenced sabermetric work ever since. But at least as important, if not more so, were his other contributions. He proposed and got off the ground Project Scoresheet, the first volunteer effort to compile pitch-by-pitch data for games to be made freely available to researchers; this was the forerunner and inspiration for Retrosheet. During the same years that the Abstract was conventionally published, he oversaw a sabermetric newsletter/journal, the Baseball Analyst, which provided a pre-Internet outlet for amateur sabermetricians (including myself) who had few if any other opportunities to get their work out to the public. Perhaps most importantly, his work was the first serious sabermetric (a term he coined) analysis many of us saw, and it inspired us to try our hand at it too. I might add that calls for James to be inducted into the Hall itself can be found in a New York Times article by Jamie Malinowski from January 20, 2019, and on the Last Word on Baseball website by its editor Evan Thompson.
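For readers who haven't seen them, here is a minimal Python sketch of the basic forms of two of those concepts. The first is the original Runs Created formula, (H + BB) × TB / (AB + BB). The second is the Pythagorean Equation with James's original exponent of 2. James later published many refinements of both. The function names and the sample totals below are hypothetical, for illustration only.

```python
def runs_created_basic(h: int, bb: int, tb: int, ab: int) -> float:
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Pythagorean expectation: RS^x / (RS^x + RA^x); the original used x = 2."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical totals, for illustration only.
print(round(runs_created_basic(h=150, bb=50, tb=250, ab=500), 1))
print(round(pythagorean_win_pct(800, 700), 3))
```

A team that scores 800 runs and allows 700 projects to a winning percentage of about .566 under the exponent-2 version.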

Pete Palmer

George Lindsay’s work was not readily available. The Hidden Game of Baseball, written by Palmer and John Thorn, was, and it included both a history of previous quantitative work and advancement on that work in the spirit of Lindsay’s. Palmer’s use of linear-weight equations to measure offensive performance, and of run expectancies to evaluate strategy options, was not entirely new, as Lane and Lindsay respectively had been first; but it was Palmer’s presentation that familiarized those who followed with these possibilities, and, as with James, it inspired many of us to try our hands at baseball analytics ourselves. Probably the most important of Palmer’s contributions has been On-base Plus Slugging (OPS), one of the few sabermetric concepts to have become commonplace on baseball broadcasts.
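OPS itself is simple arithmetic: on-base percentage plus slugging percentage. Here is a minimal sketch using the standard OBP denominator (AB + BB + HBP + SF). The function names and the season line in the example are hypothetical.

```python
def on_base_pct(h: int, bb: int, hbp: int, ab: int, sf: int) -> float:
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slugging_pct(singles: int, doubles: int, triples: int, hr: int, ab: int) -> float:
    """SLG = total bases / at-bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab


def ops(singles: int, doubles: int, triples: int, hr: int,
        bb: int, hbp: int, ab: int, sf: int) -> float:
    """OPS = OBP + SLG."""
    hits = singles + doubles + triples + hr
    return (on_base_pct(hits, bb, hbp, ab, sf)
            + slugging_pct(singles, doubles, triples, hr, ab))


# Hypothetical season: 100 1B, 30 2B, 5 3B, 20 HR, 60 BB, 5 HBP, 500 AB, 5 SF.
print(round(ops(100, 30, 5, 20, 60, 5, 500, 5), 3))  # → 0.896
```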


David Smith

I’ve already mentioned Project Scoresheet, which lasted as a volunteer organization from 1984 through 1989. I do not wish to go into its fiery ending, the product of a fight about conflict of interest and, in the end, money. Out of its ashes, like the proverbial phoenix, rose Retrosheet, the go-to online source for data describing what occurred during all games dating back to 1973, most games back to 1920, and some from before then. Since its beginning, those involved with Retrosheet have taken care not to repeat the Project’s errors, and have made data freely available to everyone, even if the intended use for that data is personal financial profit. Dave Smith was the last director of Project Scoresheet, the motivator behind the beginning of Retrosheet, and the latter’s president ever since. Although Retrosheet is primed to continue when Dave is gone, its existence would be inconceivable without him. Baseball Prospectus analyst Russell Carleton, whose work relies on Retrosheet, has made it clear in print that he thinks Dave should be inducted into the Hall itself.


Sean Forman

It is true that Forman copied data from other sources, but no matter; it took a lot of work to begin what is now the go-to online source for data on seasonal performance. Baseball Reference began as a one-man sideline for an academic, and has become home to information about all the major American team sports, plus worldwide information on “real” football.



Here are two others who I believe should eventually be recipients.

Sherri Nichols

Only two women have received HOF-related awards: Claire Smith is a past winner of the Spink Award, and Rachel Robinson is a recipient of the O’Neil Award. Sherri Nichols would become the third. I became convinced that she deserved it after reading Ben Lindbergh’s tribute, and I recommend it to anyone interested in learning about the "founding mother" of sabermetrics. I remember when the late Pete DeCoursey (I was scoring Project Scoresheet Phillies games and he was our team captain) proposed the concept of Defensive Average, for which (as Lindbergh’s article noted) Nichols did the computations. This was revolutionary work at the time, and it laid the groundwork for all of the advanced fielding information we now have at our disposal.


Tom Tango

Tango has had significant influence on many areas of sabermetric work, two of which have joined Palmer’s OPS as commonplaces on baseball broadcasts. Wins Above Replacement (WAR) was actually Bill James’s idea, but James never tried to implement it. Tango has helped define it, and his offensive index wOBA is the basis of the two most prominent implementations, those from Baseball Reference (alternatively referred to as bWAR and rWAR) and FanGraphs (fWAR). Leverage was an idea whose time had come, as our blogmaster Phil Birnbaum came up with the same concept at about the same time, but it was Tango’s usage that became definitive. His Fielding Independent Pitching (FIP), a corrective to weaknesses in ERA, is also well known and often used. Tango currently oversees data collection for MLB Advanced Media, and has done definitive work on MLBAM’s measurement of fielding (click here for a magisterial discussion of that topic).
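FIP is likewise a short formula: (13 × HR + 3 × (BB + HBP) - 2 × K) / IP, plus a league constant chosen so that league-average FIP matches league-average ERA. Here is a minimal sketch. The 3.10 default for the constant is an assumed placeholder, since the real value varies by season. Innings are assumed to be passed as true fractions (200 1/3, not the box-score notation 200.1).

```python
def fip(hr: int, bb: int, hbp: int, k: int, ip: float,
        constant: float = 3.10) -> float:
    """Fielding Independent Pitching.

    FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + constant.
    `ip` is true innings (200 1/3 -> 200.333...), not box-score notation.
    `constant` is an assumed placeholder; in practice it is recomputed each
    season so that league-average FIP equals league-average ERA.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical pitcher season: 20 HR, 50 BB, 5 HBP, 200 K in 200 innings.
print(round(fip(20, 50, 5, 200, 200.0), 3))
```

Only home runs, walks, hit batters, and strikeouts enter the formula. That is the point of FIP: it ignores balls in play, whose outcomes depend heavily on the fielders.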

There are some historical figures who might be deserving; Craig Wright, Dick Cramer, and Allan Roth come to mind as possibilities. Maybe even Earnshaw Cook, as wrong as he was about just about everything, because of what he was attempting to do without the data he needed to do it right (see his book Percentage Baseball for a historically significant document). Perhaps the Award could also go to whole organizations, such as Baseball Prospectus and FanGraphs; if so, SABR should get it first.


Wednesday, August 05, 2020

The NEJM hydroxychloroquine study fails to notice its largest effect

Before hydroxychloroquine was a Donald Trump joke, the drug was considered a promising possibility for prevention and treatment of Covid-19. It had been previously shown to work against respiratory viruses in the lab, and, for decades, it was safely and routinely given to travellers before departing to malaria-infested regions. A doctor friend of mine (who, I am hoping, will have reviewed this post for medical soundness before I post it) recalls having taken it before a trip to India.

Travellers start on hydroxychloroquine two weeks before departure; this gives the drug time to build up in the body. Large doses at once can cause gastrointestinal side effects, but since hydroxychloroquine has a very long half-life in the body -- three weeks or so -- you build it up gradually.

Hydroxychloroquine can also be used to treat malaria, not just to prevent it. However, several recent studies have found it to be ineffective in treating advanced Covid-19.

That leaves prevention. Can hydroxychloroquine be used to prevent Covid-19 infections? The "gold standard" would be a randomized double-blind placebo study, and we got one a couple of months ago, in the New England Journal of Medicine (NEJM). 

It found no statistically significant difference between the treatment and placebo groups, and concluded:

"After high-risk or moderate-risk exposure to Covid-19, hydroxychloroquine did not prevent illness compatible with Covid-19 or confirmed infection when used as postexposure prophylaxis within 4 days after exposure."

But ... after looking at the paper in more detail, I'm not so sure.


The study reported on 821 subjects who had been exposed, within the previous four days, to someone who tested positive for Covid-19. They received either hydroxychloroquine or a placebo for the next five days (the first day was a higher "loading dose"), and were followed over the next couple of weeks to see if they contracted the virus.

The results:

-- 49 of 414 treatment subjects (11.8%) became infected
-- 58 of 407 placebo subjects (14.3%) became infected.

That's about 17 percent fewer cases in patients who got the real drug. 
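Here is a quick Python check of that arithmetic, using the counts above. The helper names are hypothetical.

```python
def infection_rate(infected: int, total: int) -> float:
    """Fraction of a group that became infected."""
    return infected / total


def relative_reduction(treated_rate: float, control_rate: float) -> float:
    """Fraction by which the treated group's rate is lower than the control's."""
    return 1 - treated_rate / control_rate


treated = infection_rate(49, 414)   # hydroxychloroquine group
placebo = infection_rate(58, 407)   # placebo group
print(f"treatment {treated:.1%}, placebo {placebo:.1%}, "
      f"reduction {relative_reduction(treated, placebo):.0%}")
# prints: treatment 11.8%, placebo 14.3%, reduction 17%
```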

But that wasn't a large enough difference to show statistical significance, with only about 400 subjects in each group. The paper recognizes that, stating the study was designed only with sufficient power to find a reduction of at least 50 percent, not the 17 percent reduction that actually appeared. Still, by the usual academic standards for this sort of thing, the authors were able to declare that "hydroxychloroquine did not prevent illness."
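To see why a 17 percent reduction can't reach significance at this sample size, here is a sketch using two textbook normal approximations. The first is a pooled two-proportion z-test on the counts above. The second is the approximate power of a two-sided test at alpha = 0.05, assuming about 410 subjects per group and a placebo infection rate of about 14.3 percent. These are illustrative assumptions, not necessarily the methods the authors used, so the numbers won't exactly match the paper's.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test (normal approximation); returns (z, two-tailed p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * (1 - STD_NORMAL.cdf(abs(z)))


def approx_power(p_control, rel_reduction, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of proportions.

    Ignores the negligible chance of rejecting in the wrong direction.
    """
    p_treat = p_control * (1 - rel_reduction)
    se = sqrt((p_control * (1 - p_control) + p_treat * (1 - p_treat)) / n_per_group)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(abs(p_control - p_treat) / se - z_crit)


z, p = two_proportion_z_test(49, 414, 58, 407)
print(f"z = {z:.2f}, two-tailed p = {p:.2f}")
for rr in (0.17, 0.50):
    print(f"power to detect a {rr:.0%} reduction: {approx_power(0.143, rr, 410):.0%}")
```

Under these assumptions, the z-test gives a two-tailed p-value of roughly 0.3. A trial this size has under a 20 percent chance of flagging a true 17 percent reduction as significant, versus roughly 90 percent for a 50 percent reduction.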

At this point I would normally rant about statistical significance and how "absence of evidence is not evidence of absence."  But I'll skip that, because there's something more interesting going on.


Recall that the study tested hydroxychloroquine on subjects who feared they were already exposed to the virus. That's not really testing prevention ... it's testing treatment, albeit early treatment. It does have elements of prevention in it, as perhaps the subjects may not have been infected at that point, but would be infected later. (The study doesn't say explicitly, but I would assume some of the exposures were to family members, so repeated exposures over the next two weeks would be likely.)

Also: it took five days of dosing until the full course of hydroxychloroquine was taken. Since subjects could have been exposed up to four days before enrolling, some didn't get a full dose until as many as nine days after exposure to the virus.

So this is where it gets interesting. Here's Figure 2 from the paper:

These lines are cumulative infections during the course of the study. As of Day 5, there were actually more infections in the group that took hydroxychloroquine than in the group that got the placebo ... which is perhaps not that surprising, since the subjects hadn't finished their full doses until that fifth day. By Day 10, the placebo group had caught up, and the two groups were about equal.

But now ... look what happens from Day 10 to Day 14. The group that got the hydroxychloroquine doesn't move much ... but the placebo group shoots up.

What's the difference in new cases? The study doesn't give the exact numbers that correspond to the graph, so I used a pixel ruler to measure the distances between points of the graph. It turns out that from Day 10 to Day 14, they found:

-- 11 new infections in the placebo group
--  2 new infections in the hydroxychloroquine group.

What is the chance that, of 13 new infections, the split would be at least as lopsided as 11:2 if the drug made no difference?

About 1.12 percent one-tailed, 2.25 percent two-tailed.
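The calculation is an exact binomial tail. It assumes that, if the drug made no difference, each of the 13 late infections was equally likely to land in either group, since the groups were nearly the same size (407 and 414). The question is then how often 11 or more of 13 fair coin flips come up "placebo." Here is a minimal Python sketch:

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


one_tailed = binomial_upper_tail(11, 13)   # (78 + 13 + 1) / 2**13
two_tailed = 2 * one_tailed                # valid because p = 0.5 is symmetric
print(f"{one_tailed:.2%} one-tailed, {two_tailed:.2%} two-tailed")
# prints: 1.12% one-tailed, 2.25% two-tailed
```

Weighting by group size instead of assuming an even split (p = 407/821 for the placebo side) gives a slightly smaller tail probability, so the 50/50 assumption is, if anything, conservative here.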

Now, I know that it's usually not legitimate to pick specific findings out of a study ... with 100 findings, you're bound to find one or two random ones that fall into that significance level. But this is not an arbitrary random pattern -- it's exactly what we would have expected to find if hydroxychloroquine worked as a preventative. 
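To put a number on the multiple-comparisons worry, suppose a study reports 100 findings and none of them reflects a real effect. The count that falls below a 2.24 percent threshold by chance alone is then binomial. Here is a sketch, assuming the findings are independent (findings within one study usually aren't); the function names are hypothetical.

```python
def expected_false_positives(num_findings: int, alpha: float) -> float:
    """Expected number of chance 'significant' findings when every null is true."""
    return num_findings * alpha


def prob_at_least_one(num_findings: int, alpha: float) -> float:
    """Chance that at least one of num_findings independent null findings clears alpha."""
    return 1 - (1 - alpha) ** num_findings


print(f"{expected_false_positives(100, 0.0224):.2f}")
print(f"{prob_at_least_one(100, 0.0224):.2f}")
```

Under those assumptions you'd expect a bit over two such findings, with roughly a 90 percent chance of at least one. That is why an isolated hit is weak evidence unless, as argued here, it matches a pattern predicted in advance.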

It takes, on average, about a week for symptoms to appear after Covid-19 infection. So for those subjects whose symptoms appeared on Days 1-5 (the "1-5" group), most were probably infected *before* the start of their hydroxychloroquine regimen (up to four days before, as the study notes). Those cases don't necessarily provide evidence of prevention.

In the "6-10" group, we'd expect most subjects to have been infected before the drugs were administered; the reason they were admitted to the study in the first place was that they feared they had been exposed. So many of those who didn't experience symptoms until, say, Day 9 were probably already infected but had a longer incubation period. Also, most of the subsequently infected subjects in that group probably got infected in the first five days, while they didn't yet have a full dose of the drug.

But in the last group, the "11-14" group, that's when you'd expect the largest preventative effect -- they'd have had a full dose of the drug for at least six days, and they were the most likely to have become infected only after the start of the trial.

And that's when the hydroxychloroquine group had an 84 percent lower infection rate than the placebo group.
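Here is the same relative-reduction arithmetic applied to the Day 10 to Day 14 window, as a small sketch. It uses the rounded pixel-ruler counts of 2 and 11 and adjusts for the slightly different group sizes. The helper name is hypothetical. With these rounded counts the figure comes out near 82 percent. The unrounded pixel measurements may account for the small difference from 84.

```python
def relative_reduction(treated_cases: int, treated_n: int,
                       control_cases: int, control_n: int) -> float:
    """1 - (treated rate / control rate), adjusting for unequal group sizes."""
    return 1 - (treated_cases / treated_n) / (control_cases / control_n)


# Day 10 to Day 14, using the rounded counts read off Figure 2.
late = relative_reduction(2, 414, 11, 407)
print(f"{late:.0%}")  # prints: 82%
```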


In everything I've been reading about hydroxychloroquine and this study, I have not seen anyone notice this anomaly: beyond ten days, there were almost seven times as many infections among those who didn't get the hydroxychloroquine. In fact, even the authors of the study didn't notice. They stopped the study on the basis of "futility" once they realized they were not going to achieve statistical significance (that is, once they realized the reduction in infections was much less than the 50 percent minimum they had designed the study to detect). In other words: they stopped the study just as the results were starting to show up!

And then the FDA, noting the lack of statistical significance, revoked authorization to use hydroxychloroquine.

I'm not trying to push hydroxychloroquine here ... and I'm certainly not saying that I think it will definitely work. If I had to give a gut estimate, based on this data and everything else I've seen, I'd say ... I dunno, maybe a 15 percent chance. Your guess may be lower. Even if your gut says there's only a one-in-a-hundred chance that this 84 percent reduction is real and not a random artifact ... in the midst of this kind of pandemic, isn't even 1 percent enough to say, hey, maybe it's worth another trial?

I know hydroxychloroquine is considered politically unpopular, and it's fun to make a mockery of it. But these results are strongly suggestive that there might be something there. If we all agree that Trump is an idiot, and even a stopped clock is right twice a day, can we try evaluating the results of this trial on what the evidence actually shows? Can we not elevate common sense over the politics of Trump, and the straitjacket of statistical significance, and actually do some proper science?