QB Score and football "box score" statistics
In "The Wages of Wins," the authors introduced a quarterback evaluation statistic called "QB Score." The formula is
QB Score = Yards – (3 * plays) – (50 * turnovers)
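As a quick way to see what the formula does, here is a minimal sketch in Python. It simply takes the three totals as inputs. Exactly which plays count (passes, runs, sacks) and how yards are netted are not spelled out here, so the function leaves that to the caller. The game line is hypothetical.

```python
def qb_score(yards: float, plays: int, turnovers: int) -> float:
    """QB Score = Yards - (3 * plays) - (50 * turnovers)."""
    return yards - 3 * plays - 50 * turnovers


# Hypothetical game line: 250 yards on 35 plays with 2 turnovers
print(qb_score(250, 35, 2))  # 45
```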
Basically, each play costs three yards and each turnover costs 50. So, turnovers aside, any play that gains fewer than three yards is a negative, and any play that gains more than three yards is a positive.
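Because the formula is linear, it can be broken down play by play: each play is worth its gain minus three, minus another 50 if it ends in a turnover. The sketch below assumes a simple, hypothetical `Play` record (net yards plus a turnover flag). It shows that summing per-play credits gives the same total as plugging the aggregate yards, plays and turnovers into the formula.

```python
from typing import Iterable, NamedTuple


class Play(NamedTuple):
    """One offensive play (hypothetical record type for this sketch)."""
    yards: int              # net yards gained on the play
    turnover: bool = False  # interception or lost fumble on the play


def play_value(play: Play) -> int:
    """Per-play credit: gain minus the 3-yard cost, minus 50 for a turnover."""
    return play.yards - 3 - (50 if play.turnover else 0)


def qb_score_from_plays(plays: Iterable[Play]) -> int:
    """Sum of per-play credits; equals Yards - 3*plays - 50*turnovers on the totals."""
    return sum(play_value(p) for p in plays)


drive = [Play(5), Play(2), Play(3), Play(12), Play(0, turnover=True)]
print([play_value(p) for p in drive])  # [2, -1, 0, 9, -53]
print(qb_score_from_plays(drive))      # -43
```

The totals check out the same way: 22 yards on 5 plays with 1 turnover gives 22 - 15 - 50 = -43. A three-yard gain is the break-even point and earns exactly zero.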
In our respective reviews of the book, Roland Beech and I both criticized QB Score. Beech argued that the statistic doesn't take situation into account. "By the writers' measure," he wrote, "an 8 yard pickup on 3rd and 20 is worth more to a team than a 5 yard pickup on 3rd and 5."
My criticism was similar; I argued that teams that play a different style of football might achieve the same yardage and first-down success, but with more plays. QB Score would rate these teams lower than their success rate would suggest.
But I've been thinking about this a bit, and I'm starting to wonder whether the stat might be reasonable after all. (Or, at least, as reasonable as a QB stat can be, given that it doesn't separate the contribution of the quarterback from that of the other players on his team.)
My criticism could be answered by empirical evidence showing that QB Score "works" – in the sense of predicting points scored – for all kinds of teams, rushing teams as well as passing teams. I have no evidence that teams differ enough in their short/long play calling to make the statistic invalid for some of them, and it's possible that it works well enough for all kinds of teams. And even if it doesn't, you might be able to adjust the stat fairly easily by adding another variable to the regression behind it, maybe "percentage running plays" or some such.
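Here is a rough sketch of what "adding another variable to the regression" could look like. Everything in it is hypothetical. The data are synthetic team-games generated from made-up coefficients, chosen so that a play costs 3 yards' worth of points and a turnover costs 50. The fifth column stands in for "percentage running plays." This is not the regression from the book. It only shows the mechanics: fit points on yards, plays, turnovers and the extra variable, then divide by the yards coefficient to express the weights in yards, the unit QB Score uses. The solver is a plain normal-equations fit so the sketch needs only the standard library.

```python
import random


def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y,
    solved with Gauss-Jordan elimination and partial pivoting."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


random.seed(1)
# Made-up "true" coefficients: intercept, yards, plays, turnovers, run share.
TRUE = [15.0, 0.08, -0.24, -4.0, 3.0]

rows, points = [], []
for _ in range(5000):  # 5,000 synthetic team-games
    x = [1.0,
         random.gauss(330, 60),     # yards
         random.gauss(62, 6),       # plays
         random.randint(0, 3),      # turnovers
         random.uniform(0.3, 0.6)]  # hypothetical run-share variable
    rows.append(x)
    points.append(sum(b * v for b, v in zip(TRUE, x)) + random.gauss(0, 3))

coef = ols(rows, points)
yards_per_play = coef[2] / coef[1]      # close to -3 with this synthetic data
yards_per_turnover = coef[3] / coef[1]  # close to -50 with this synthetic data
print([round(c, 3) for c in coef], round(yards_per_play, 2), round(yards_per_turnover, 1))
```

If real data showed the run-share coefficient to be meaningfully different from zero, that extra term would be the adjustment.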
Beech's criticism might be countered by the argument that, over the long term, the effect of situation evens out. After all, we accept that for baseball. Linear Weights values a home run higher than a single. That means it credits a solo home run in a 12-0 game with more value than a two-run, game-winning single with two outs in the bottom of the ninth. That doesn't invalidate Linear Weights, which implicitly assumes that timing is random and evens out in the long run. Can't QB Score make the same assumption?
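To make the "context-free" idea concrete, here is a toy sketch. The run values are round, hypothetical numbers for illustration, not an official Linear Weights table. The point is structural: the credit function accepts a game situation but ignores it, so a blowout solo homer always outscores a walk-off single.

```python
from typing import Optional

# Hypothetical run values per event: round illustrative numbers, not an official table.
RUN_VALUE = {
    "single": 0.5, "double": 0.8, "triple": 1.1,
    "home_run": 1.4, "walk": 0.3, "out": -0.25,
}


def lw_credit(event: str, situation: Optional[dict] = None) -> float:
    """Context-free credit for one event.

    `situation` is accepted but deliberately ignored: only the event type
    matters, which is the timing-evens-out assumption.
    """
    return RUN_VALUE[event]


blowout_solo_hr = lw_credit("home_run", {"score": "12-0"})
walk_off_single = lw_credit("single", {"inning": 9, "outs": 2, "runs_scored": 2})
print(blowout_solo_hr > walk_off_single)  # True
```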
There is a difference between the QB Score case and the Linear Weights case. In baseball, a given type of play almost always either helps the team or hurts the team; that is, it always has the same sign. With rare exceptions, a single is always a good thing: it always improves the team's chances of winning. An out, on the other hand, (almost) always reduces the team's chances of winning. Hits are good; outs are bad.
In football, though, as Roland Beech points out, an 8-yard gain on third down is sometimes good (on third-and-7) and sometimes bad (on third-and-9). A one-yard gain is sometimes good (on third-and-inches) and sometimes bad (on first-and-10). But QB Score always counts the 8-yard gain as a positive and the 1-yard gain as a negative.
It seems intuitively wrong to give the quarterback a "credit" for failing to make a first down, or to "penalize" his statistics for converting a third-and-inches. In baseball, we often don't credit the situation properly, but at least we always give the play the right sign. That's partly what's disconcerting to me about QB Score: we sometimes reward failure or penalize success. It would be strange to watch a game, see the quarterback convert a two-yard pass on third-and-1, and realize that the play made his rating go *down*.
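The sign problem can be checked mechanically. This sketch looks only at third-down, non-turnover plays, where success is a simple yes/no: did the play reach the line to gain? It compares that with the sign of QB Score's credit (gain minus three). The cases are the ones discussed above, written as (yards to go, gain). "Third-and-inches" is modelled as 1 yard to go, which is an assumption. First-down plays are left out because success there isn't a simple conversion.

```python
def qb_score_sign(gain: int) -> int:
    """Sign of QB Score's credit for a non-turnover play (gain minus the 3-yard cost)."""
    value = gain - 3
    return (value > 0) - (value < 0)


def converts(gain: int, yards_to_go: int) -> bool:
    """On third down, success means reaching the line to gain."""
    return gain >= yards_to_go


# Third-down cases as (yards to go, gain); third-and-inches modelled as 1 yard to go.
CASES = [(7, 8), (9, 8), (1, 1), (1, 2), (20, 8), (5, 5)]


def mismatches(cases):
    """Return the cases where QB Score's sign disagrees with conversion success."""
    return [(togo, gain) for togo, gain in cases
            if converts(gain, togo) != (qb_score_sign(gain) > 0)]


print(mismatches(CASES))  # [(9, 8), (1, 1), (1, 2), (20, 8)]
```

Four of the six cases get the "wrong" sign, including Beech's 8 yards on third-and-20 and the two-yard conversion on third-and-1.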
But if these anomalies cancel out over the long term, and QB Score actually "works," should we care that some of the component plays aren't individually accurate? I'm not sure. Maybe we're just spoiled. The structures of baseball, basketball, and hockey just happen to be such that the atom of performance we analyze – the plate appearance or the possession – turns out to be unambiguously good or bad. In football, the sign of the atom depends on the situation, and we're not used to that. It might be a compromise we just have to accept, unless we're willing to move to a different atom, like the "first down" or the "possession". But those comprise many separate plays, which makes it impossible to isolate the performance of individual players, like the quarterback.
All this, of course, is contingent on QB Score actually being shown to work well for all types of teams and quarterbacks. I haven't seen any studies validating QB Score, and I think that's still the biggest reason for doubt. But if we want to see any non-situational "box score" statistic for football, we may have to accept that it's sometimes going to get the signs wrong at the level of individual plays.