## Thursday, February 15, 2007

### QB Score and football "box score" statistics

In "The Wages of Wins," the authors introduced a quarterback evaluation statistic called "QB Score." The formula is

QB Score = Yards – (3 * plays) – (50 * turnovers)

Basically, any play that gains fewer than three yards is a negative; any play that gains more than three yards is a positive.
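Here is a minimal Python sketch of the formula. It assumes, as the book does, that yards and plays include rushes and sacks and that turnovers count both interceptions and fumbles lost. The function names `qb_score` and `play_contribution` are hypothetical, chosen for this illustration.

```python
def qb_score(yards: int, plays: int, turnovers: int) -> int:
    """QB Score = Yards - (3 * plays) - (50 * turnovers)."""
    return yards - 3 * plays - 50 * turnovers


def play_contribution(yards_gained: int, turnover: bool = False) -> int:
    """What a single play adds to QB Score: its yardage minus 3, minus 50 for a turnover."""
    return qb_score(yards_gained, plays=1, turnovers=1 if turnover else 0)


if __name__ == "__main__":
    # The three-yard threshold: below it a play is negative, above it positive.
    for gain in (1, 3, 8):
        print(gain, play_contribution(gain))
    # prints "1 -2", "3 0", "8 5" on separate lines
```

Because the formula is linear, a season's QB Score is just the sum of these per-play contributions.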

In our respective reviews of the book, Roland Beech and I both criticized QB Score.
Beech argued that the statistic doesn't take situation into account. "By the writers' measure," he wrote, "an 8 yard pickup on 3rd and 20 is worth more to a team than a 5 yard pickup on 3rd and 5."

My criticism was similar; I argued that teams who play a different style of football might be able to achieve the same yardage and first-down success, but with more plays. QB Score would rate these kinds of teams lower than their success rate would suggest.

But I've been thinking about this a bit, and I'm wondering whether the stat might be reasonable after all. (Or at least as reasonable as a QB stat can be, given that it doesn't separate the quarterback's contribution from that of the other players on his team.)

My criticism could be answered by empirical evidence showing that QB Score "works" (in the sense of predicting points scored) for all kinds of teams, rushing teams as well as passing teams. I have no evidence that teams differ enough in their short/long play calling to make the statistic invalid for some of them, and it's possible that it works well enough for all kinds of teams. And even if it doesn't, you might be able to adjust the stat fairly easily by adding another variable to the regression, such as the percentage of running plays.

Beech's criticism might be countered by the fact that, over the long term, the effect of situation evens out. After all, we do accept that for baseball. Linear Weights values a home run higher than a single. That means that a solo home run in a 12-0 game is worth more than a two-run game-winning single with two outs in the bottom of the ninth. That doesn't invalidate Linear Weights, which implicitly makes the assumption that timing is random, and evens out in the long run. Can't QB Score make the same assumption?
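To make the "timing evens out" assumption concrete, here is a hedged Python sketch of a context-free run valuation. The run values are round illustrative numbers chosen for this sketch, not an official Linear Weights table, and `event_value` is a hypothetical helper; the point is only that the situation never enters the calculation.

```python
# Illustrative run values per event (round numbers for this sketch, not an official table).
RUN_VALUE = {"single": 0.47, "home_run": 1.40, "out": -0.27}


def event_value(event: str, situation: str) -> float:
    """Value of one event under a context-free (Linear Weights style) scheme.

    `situation` is accepted only to show that it is ignored: the same event
    gets the same credit whether the game is a 12-0 blowout or tied in the ninth.
    """
    return RUN_VALUE[event]


if __name__ == "__main__":
    blowout_hr = event_value("home_run", "solo HR, 12-0 game")
    walkoff_single = event_value("single", "two-run single, two out, bottom 9th")
    print(blowout_hr > walkoff_single)  # True
```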

There is a difference between the QB Score case and the Linear Weights case. In baseball, any given play almost always either helps the team, or hurts the team – that is, it always has the same sign. With rare exceptions, a single is always a good thing, always makes the team's chances of winning better. An out, on the other hand, (almost) always reduces the team's chances of winning. Hits are good; outs are bad.

On the other hand, as Roland Beech points out, an 8-yard gain on third down is sometimes good (on third-and-7) and sometimes bad (on third-and-9). A 1-yard gain is sometimes good (on third-and-inches) and sometimes bad (on first-and-10). But QB Score always counts the 8-yard gain as a positive and the 1-yard gain as a negative.

It seems intuitively wrong to give the quarterback a "credit" for failing to make a first down, or to "penalize" his statistics for converting a third-and-inches. In baseball, we often don't credit the situation properly, but at least we always give it the right sign. That's partly what's disconcerting to me about QB Score, that we sometimes reward failure or penalize success. It would be disconcerting to be watching a game, see the quarterback convert a two-yard pass on third-and-1, and realize that made his rating go *down*.
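The sign problem can be checked mechanically. This Python sketch scores each third-down play with the per-play QB Score (yards minus 3, no turnover) and compares its sign with a crude success test: did the play reach the first-down marker? `ThirdDownPlay` is a hypothetical class for illustration, and "third-and-inches" is approximated as 1 yard to go.

```python
from dataclasses import dataclass


@dataclass
class ThirdDownPlay:
    yards_to_go: int
    yards_gained: int

    def qb_score(self) -> int:
        # Per-play QB Score with no turnover: yards gained minus 3.
        return self.yards_gained - 3

    def converted(self) -> bool:
        # Crude success test: did the play reach the first-down marker?
        return self.yards_gained >= self.yards_to_go

    def sign_disagrees(self) -> bool:
        # True when QB Score credits a failed conversion or debits a successful one.
        score = self.qb_score()
        return (score > 0 and not self.converted()) or (score < 0 and self.converted())


if __name__ == "__main__":
    cases = [
        ThirdDownPlay(yards_to_go=7, yards_gained=8),  # converts, +5
        ThirdDownPlay(yards_to_go=9, yards_gained=8),  # falls short, still +5
        ThirdDownPlay(yards_to_go=1, yards_gained=1),  # third-and-inches converted, -2
        ThirdDownPlay(yards_to_go=1, yards_gained=2),  # third-and-1 converted, -1
    ]
    for play in cases:
        print(play, play.qb_score(), play.converted(), play.sign_disagrees())
```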

But if these anomalies cancel out over the long term, and QB Score actually "works," should we care that some of the component plays aren't individually accurate? I'm not sure. Maybe we're just spoiled. The structure of baseball, basketball, and hockey just happens to be such that the atom of performance we analyze (the plate appearance or the possession) turns out to be unambiguously good or bad. In football, the sign of the atom depends on the situation, and we're not used to that. It might be a compromise we just have to accept, unless we're willing to move to a different atom, like the "first down" or the "possession." But those comprise many separate plays, which makes it impossible to isolate the performance of individual players, like the quarterback.

All this, of course, is contingent on QB Score actually being shown to work well for all types of teams and quarterbacks. I haven't seen any studies validating QB Score, and I think that's still the biggest reason for doubt. But if we want to see any non-situational "box score" statistic for football, we may have to accept that it's sometimes going to get the signs wrong at the level of individual plays.


At Thursday, February 15, 2007 6:32:00 PM,  Via Chicago said...

"I argued that teams who play a different style of football might be able to achieve the same yardage and first-down success, but with more plays. QB score would rate these kinds of teams lower than their success rate would suggest."

The WoW authors used QB Score as a simple way to rate QBs so they could try to see if there is any consistency in how QBs rate from season to season. I am not sure your criticism linking QB Score and team ratings is valid, as QB Score isn't a measure of a team's total offensive efficiency.

QB Score per play is about the efficiency of a QB. I don't see how measuring the efficiency of a QB depends on the type of offense he is running. Evaluating only plays involving the QB (pass attempts, sacks, QB rushes), a QB who needs more plays than another to accumulate the same yardage should be ranked lower. If you want to rate teams and compare different styles of offense, you need more metrics than QB Score alone.
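The per-play comparison in this comment can be sketched in Python. The two stat lines below are hypothetical numbers, not real quarterbacks: both gain 300 yards without a turnover, but one needs 10 more plays to do it.

```python
def qb_score_per_play(yards: int, plays: int, turnovers: int) -> float:
    """QB Score divided by the number of plays involving the quarterback."""
    if plays <= 0:
        raise ValueError("plays must be positive")
    return (yards - 3 * plays - 50 * turnovers) / plays


if __name__ == "__main__":
    # Hypothetical stat lines: same yardage, different number of plays.
    print(qb_score_per_play(300, 40, 0))  # 4.5
    print(qb_score_per_play(300, 50, 0))  # 3.0
```

With no turnovers the per-play figure reduces to yards per play minus 3, so the QB who spends fewer plays on the same yardage rates higher.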

Given that QB Score and DVOA have a 90% correlation according to Berri, and that QB Score is the easiest box-score stat to calculate, I think it has some utility as a quick-and-dirty way of rating QBs. While QB Score isn't a better metric than DVOA, which does take situation and opponent into account, it is definitely an improvement over QB Rating and more than adequate for the authors' purpose of discussing consistency.

Also, a tiny clarification on the formula in the post: yards and plays include rushing and sacks, and it is -50 per turnover (both fumbles lost and interceptions) instead of just interceptions. WoW, and you in your review, state the formula correctly, and you've captured the essence of it here, but for those who haven't read either, it should be noted.

I definitely agree with you that there are certain parts of WoW that make you gnash your teeth but QB Score wasn't one of those for me.

At Thursday, February 15, 2007 6:44:00 PM,  Phil Birnbaum said...

Hi, Kevin,

The TWOW authors got the QB Score formula by regressing the elements of offense on points scored. And so it's perfectly proper to ask if these values apply reasonably well to all teams, not just to the "average" team that the regression assumes.

Suppose team X is so good that they gain exactly three yards every play. In that case, they score a touchdown every possession -- but QB Score will have them at zero!
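Phil's hypothetical team is easy to check. This sketch assumes the team goes for it on every fourth down, never turns the ball over, and ignores the goal line; `three_yard_offense_score` is a hypothetical helper name.

```python
YARDS_PER_PLAY = 3
FIRST_DOWN_DISTANCE = 10


def qb_score(yards: int, plays: int, turnovers: int = 0) -> int:
    return yards - 3 * plays - 50 * turnovers


def three_yard_offense_score(plays: int) -> int:
    """QB Score for an offense that gains exactly 3 yards on every play."""
    return qb_score(YARDS_PER_PLAY * plays, plays)


if __name__ == "__main__":
    # Four 3-yard plays cover 12 yards, enough for a first down, so (going for
    # it on fourth down) the offense never stalls.
    assert 4 * YARDS_PER_PLAY >= FIRST_DOWN_DISTANCE
    for plays in (10, 100, 1000):
        print(plays, three_yard_offense_score(plays))  # always 0
```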

For me to accept QB Score, it has to be shown that these kinds of teams -- to a much lesser degree, of course -- don't exist.

And while the authors do use QB Score only on plays involving the quarterback, the regression comes from *all* plays. So it's reasonable to use total plays and points to check whether QB score "works".

>"I definitely agree with you that there are certain parts of WoW that make you gnash your teeth but QB Score wasn't one of those for me."

Me neither. This is my only concern with QB score that I've written about, I think.

P.S. Thanks for the note about fumbles, I'll update the post.

At Friday, February 16, 2007 1:46:00 PM,  Tangotiger said...

This is the NFL QB rating formula:

C = (Completions - .30) * 5
Y = (Yards - 3) * 0.25
T = (TD) * 20
I = (.095 - Interceptions) * 25

All are "per pass attempted".

(C+Y+T+I) / 6 * 100
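For reference, here is the formula above as a Python function. One detail the comment leaves out: in the official formula each of the four components is capped to the range 0 to 2.375 before summing, which is why a perfect rating is 158.3. The function and argument names are chosen for this sketch.

```python
def passer_rating(attempts: int, completions: int, yards: int,
                  touchdowns: int, interceptions: int) -> float:
    """NFL passer rating, with each component clamped to [0, 2.375]."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")

    def clamp(x: float) -> float:
        return max(0.0, min(2.375, x))

    c = clamp((completions / attempts - 0.30) * 5)
    y = clamp((yards / attempts - 3) * 0.25)
    t = clamp(touchdowns / attempts * 20)
    i = clamp((0.095 - interceptions / attempts) * 25)
    return (c + y + t + i) / 6 * 100


if __name__ == "__main__":
    print(round(passer_rating(30, 20, 240, 2, 1), 1))  # 99.3
```

In this form the interception coefficient (25) is 100 times the yards coefficient (0.25).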

As you can see, the "Y" term is the one that the QB Score also uses. The "C" term, allegedly, offers the counterweight for those "short" situations.

The QB rating gives 100x more weight to the interception than the yard gained, as opposed to the 50x by the QB score. But the QB rating, allegedly, offers the counterweight with the TD term.

I have no idea if the NFL rating or the TWOW rating is any better.

Certainly, it should not be looked at in terms of individual plays. It's meant to be looked at in aggregate terms, say at least 10 or 15 pass attempts.

At Friday, February 16, 2007 1:53:00 PM,  Tangotiger said...

If it's not obvious:
QB Score = Yards – (3 * plays) – (50 * turnovers)

If you divide all the terms by "passes attempted", you get:

QB per pass = Yards per pass - 3 - 50*interceptions per pass

Which is also:
= 4 * 0.25 * (Yards per pass - 3) - 4 * 12.5 * (Interceptions per pass)

Divide the whole thing by 4, and you get:
QB per pass / 4 = (Yards per pass - 3) * .25 - (Interceptions per pass) * 12.5

("Passes attempted" could include actual passes thrown, sacks, etc.)

So, the TWOW is akin to the NFL QB rating.
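The algebra can be sanity-checked numerically. This sketch treats every attempt as a play and every turnover as an interception, as the derivation does; the sample stat lines are arbitrary test inputs, not real data.

```python
import math


def qb_score_per_attempt(yards: float, attempts: int, interceptions: int) -> float:
    # TWOW QB Score divided by attempts.
    return (yards - 3 * attempts - 50 * interceptions) / attempts


def rating_style(yards: float, attempts: int, interceptions: int) -> float:
    # The rearranged form: (Yards per pass - 3) * .25 - (Interceptions per pass) * 12.5
    return (yards / attempts - 3) * 0.25 - (interceptions / attempts) * 12.5


if __name__ == "__main__":
    for line in [(240, 30, 1), (300, 40, 2), (150, 35, 3)]:
        assert math.isclose(qb_score_per_attempt(*line) / 4, rating_style(*line))
    print("QB Score per attempt / 4 matches the rating-style form")
```

The yards term lines up exactly with the NFL's Y term; the interception weight here (12.5) is half the NFL's 25.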

At Friday, February 16, 2007 3:58:00 PM,  Phil Birnbaum said...

Thanks, Tango. Interesting that the NFL came up with a "yards" term exactly the same as the TWOW regression result. Wonder if they used similar methods?

At Friday, February 16, 2007 5:31:00 PM,  Tangotiger said...

That was done in 1973. My guess is that they just tried different things until they got something reasonable.

They may have reasoned that since a running back should get 3 yards a carry, that becomes the benchmark.

At Saturday, February 17, 2007 8:52:00 AM,  Jon said...

Hey, Phil. I'm not sure if I posted here before. I'm a fan of the net yards per attempt metric. It might not be as precise as Football Outsiders DVOA or DPAR, but I can calculate it in my head as I'm watching a game in some bar.

QB Rating seems to put too much emphasis on relatively rare events (TDs and interceptions).
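Net yards per attempt, under its commonly used definition, counts sacks as pass plays and subtracts the yards lost on them. A minimal Python sketch; the function name and the sample stat line are hypothetical.

```python
def net_yards_per_attempt(pass_yards: int, attempts: int,
                          sacks: int, sack_yards: int) -> float:
    """(passing yards - sack yards lost) / (pass attempts + sacks)."""
    dropbacks = attempts + sacks
    if dropbacks <= 0:
        raise ValueError("need at least one attempt or sack")
    return (pass_yards - sack_yards) / dropbacks


if __name__ == "__main__":
    # Hypothetical game: 240 yards on 30 attempts, sacked twice for 14 yards.
    print(net_yards_per_attempt(240, 30, 2, 14))  # 7.0625
```

For a passer with no rushes or turnovers, QB Score per dropback works out to this number minus 3, so the two stats are closely related.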