Friday, April 16, 2010

The flaw in the Scully pay/performance regression

In 1974, Gerald Scully published an academic article called "Pay and Performance in Major League Baseball." (Here's a Google search that finds a copy on David Berri's site.) It's a very famous paper, because it reached the conclusion that, in the pre-free-agent era, teams were paying players far, far less than the players were earning for their employers.

It was also one of the first academic papers to try to find a connection between individual performance and team wins. Unfortunately, Scully chose SLG and K/BB ratio as his measures of performance, but, I suppose, those probably seemed like reasonable choices at the time. But it turns out there's a much more serious problem.

In his set of variables to use in predicting winning percentage, the set of variables that included SLG and K/BB, Scully also included dummy variables for how far the team was out of first place. That is, in trying to predict how well a team did, Scully based his estimates partially on ... how well the team did!

That biases the results so much that I can't believe nobody's mentioned it before ... at least they haven't in all the mentions I've seen of this study.

If it's not obvious why that's the wrong thing to do, let me try to explain.

Suppose you predict that, this season, your favorite team will slug .390 and have a K/BB ratio of 1.5. What will its winning percentage be?

Well, if Scully had used only SLG and K/BB in his regression, it would be easy to figure out: you just take his regression equation, which would look something like

PCT = (a * SLG) + (b * K/BB) + c (if an NL team) + d

Plug in Scully's estimates for a, b, and c, plug in .390 and 1.5, and there you go -- your estimate.

But Scully's actual equation included two extra terms:

PCT = (.92 * SLG) + (.90 * K/BB) - 38.57 (if an NL team) + 37.24 + 43.78 (if the team finished within 5 games of first) - 75.64 (if the team finished more than 20 games out)

So now how do you calculate your team's expected PCT? You can't! Because you don't know whether to include those last two variables. After all, how can you predict in advance whether your team will wind up having finished near the top or the bottom? You can't! If you could, you probably wouldn't need this regression in the first place!
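
To make the problem concrete, here is a minimal Python sketch of the equation as quoted above. The function name, the argument names, and the choice to enter SLG in points (.390 as 390) are my own assumptions for illustration, not anything taken from the paper. The function can't be called at all without first deciding where the team finished.

```python
def scully_pct(slg_points, k_bb, is_nl, within_5_of_first, more_than_20_out):
    """Winning percentage, in points, from the equation quoted above.

    Assumptions (not from Scully's paper): SLG is entered in points
    (.390 -> 390) and K/BB is entered as a plain ratio.
    """
    return (0.92 * slg_points
            + 0.90 * k_bb
            - 38.57 * is_nl
            + 37.24
            + 43.78 * within_5_of_first
            - 75.64 * more_than_20_out)


# Same predicted SLG and K/BB, three different standings outcomes.
inputs = dict(slg_points=390, k_bb=1.5, is_nl=True)
contender = scully_pct(**inputs, within_5_of_first=True, more_than_20_out=False)
middle = scully_pct(**inputs, within_5_of_first=False, more_than_20_out=False)
also_ran = scully_pct(**inputs, within_5_of_first=False, more_than_20_out=True)

# The gaps come entirely from the standings dummies, not from performance.
print(round(contender - middle, 2))
print(round(middle - also_ran, 2))
```

The two printed gaps are just the dummy coefficients: identical hitting and pitching inputs produce very different "predictions" depending on a fact you can only know after the season.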


Not only does the regression not make sense, but, more importantly, by including those two dummy variables, Scully's estimates of productivity wind up completely wrong. For instance: what is the effect of raising your SLG by 10 points? Well, that depends. Keeping all the other variables constant, it's .92 * 10, or 9.2 points. That's .0092, or about 1.5 wins in a 162-game season.

But wait! Those dummy variables for standings position won't necessarily stay constant. What if those 1.5 wins lifted you from 21 games back to 19.5 games back? In that case, the equation would give 9.2 for the SLG, plus an extra 75.6 for shedding the "more than 20 games out" penalty, for a total of 84.8 points! And what if they lifted you from 6 games back to 4.5 games back? In that case, the equation would estimate an extra 43.8 point bump, for a total of 53.0!

So what's the benefit of an extra 10 points slugging on the team's winning percentage?

--> 9.2 points -- for a team 22 or more games out
--> 84.8 points (9.2 + 75.6) -- for a team 21.5 to 20 games out
--> 9.2 points -- for a team 19.5 to 7 games out
--> 53.0 points (9.2 + 43.8) -- for a team 6.5 to 5 games out
--> 9.2 points -- for a team less than 5 games out
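
The arithmetic behind those cases can be checked with a few lines of Python. This is only a sketch using the coefficients quoted from the equation; the variable names are mine, and the 162-game season is the one assumed in the text.

```python
SLG_COEF = 0.92          # points of PCT per point of SLG
WITHIN_5_BONUS = 43.78   # dummy: finished within 5 games of first
OVER_20_PENALTY = 75.64  # dummy: finished more than 20 games out
GAMES = 162

# Direct effect of 10 extra points of SLG, dummies held constant.
direct = SLG_COEF * 10
wins = direct / 1000 * GAMES

# Effect when those extra wins also flip one of the standings dummies.
gain_if_escaping_20_out = direct + OVER_20_PENALTY
gain_if_reaching_within_5 = direct + WITHIN_5_BONUS

print(round(direct, 1), round(wins, 2))
print(round(gain_if_escaping_20_out, 2))
print(round(gain_if_reaching_within_5, 2))
```

The direct effect is the same in every case; only the dummy that happens to flip changes the total.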

That makes no sense. You can't figure out how much the player's productivity is worth unless you know which of the five groups the team is in. But which group the team is in is exactly what you're trying to predict!

In any case, it's obvious that using 9.2 points as the measure of the player's increased productivity is wrong. It's *at least* 9.2 points, but sometimes substantially more. You need to average all five cases out, in proportion to how often they'd occur (and how can you know how often they occur without further study?). When you do that, you'll obviously get more than 9.2 points. But, as far as I can tell, Scully just used the SLG coefficient as his measure -- the player only got credit for the 9.2 points! And so he *severely underestimated* how much a player's performance helps his team.


Here's an example that will make it clearer. Suppose a lottery gives you a 1 in a million chance of winning $500,000. Then, if you do a regression to predict winnings based on how many tickets you buy, you'll probably get something close to

Winnings = 0.5 * tickets bought

Which makes sense: a 1 in a million chance of winning half a million dollars is worth 50 cents.

But now, suppose I add a term that says whether or not you won. Now, I'll get

Winnings = 0.0 * tickets bought + $500,000 (if you won)

That's true, but it completely hides the relationship between the ticket and the winnings. If you ignore the dummy variable, it looks like the ticket is worthless!
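
Here is a small Python sketch of that effect. It is not the million-to-one lottery itself but a hypothetical scaled-down version I made up so the numbers stay manageable: each ticket has a 1 in 1,000 chance at $500, which keeps the value of a ticket at 50 cents. The data set is built so that winners appear in exactly their expected proportion, and each buyer wins at most once; both are simplifying assumptions. The fitting functions are plain least squares with no intercept, matching the form of the two equations above.

```python
def fit_one(xs, ys):
    """Least-squares slope a for y = a*x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def fit_two(x1, x2, ys):
    """Least-squares (a, b) for y = a*x1 + b*x2, via the 2x2 normal equations."""
    s11 = sum(u * u for u in x1)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s22 = sum(v * v for v in x2)
    s1y = sum(u * y for u, y in zip(x1, ys))
    s2y = sum(v * y for v, y in zip(x2, ys))
    det = s11 * s22 - s12 * s12
    return (s1y * s22 - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Hypothetical data: for each ticket count t from 1 to 10, there are 1,000
# buyers, and exactly t of them win the $500 prize (a t-in-1,000 chance).
PRIZE = 500
tickets, won, winnings = [], [], []
for t in range(1, 11):
    for buyer in range(1000):
        is_winner = 1 if buyer < t else 0
        tickets.append(t)
        won.append(is_winner)
        winnings.append(PRIZE * is_winner)

slope_alone = fit_one(tickets, winnings)
slope_with_dummy, dummy_coef = fit_two(tickets, won, winnings)

print(slope_alone)                   # value of a ticket, tickets only
print(slope_with_dummy, dummy_coef)  # value of a ticket once "won" is included
```

With tickets as the only predictor, the slope comes out at the 50-cent value of a ticket. Add the "won" dummy and the ticket coefficient drops to zero, with the whole prize loaded onto the dummy.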

Same for Scully's regression. By including part of the "winnings" for having a good SLG or K/BB "ticket" in a different term, he underestimates the value of the "ticket".


So, since Scully's conclusion was that players are underpaid for their productivity, and Scully himself had underestimated that productivity ... well, the conclusion is completely unjustified by the results of the study. It may be true that players were underpaid -- I think it almost certainly is -- but this particular study, famous as it is, doesn't even come close to proving it.

UPDATE: As commenter MattyD points out (thanks!), I got it backwards. For the previous paragraph, I should have said:

However, Scully's conclusion on pay and productivity still holds. The study underestimated player productivity, but, if it found that players are paid less than even that underestimated production, it's certainly true that they are underpaid relative to their true production.



At Friday, April 16, 2010 10:24:00 AM, Anonymous MattyD said...

Very interesting and I always learn a ton from these posts where you explain flaws in other studies.

On this one, though, does your final conclusion that Scully's conclusion was wrong follow? If I'm reading you right, he found that there was a certain relationship between performance and wins and thus a certain relationship between performance and value. Perhaps he finds that they should be paid an additional $500K for each .100 in SLG, but are only actually being paid $250K. You point out a flaw that says that the relationship between performance and value is even stronger than he thought, say $1M per .100 SLG. Well that just strengthens his conclusion that players are underpaid, no?

At Friday, April 16, 2010 10:26:00 AM, Blogger Phil Birnbaum said...

Oh, you're right, I got it backwards! Let me post a correction ...

At Friday, April 16, 2010 10:44:00 PM, Anonymous Guy said...

Phil: That's a good catch on the inclusion of team record as "predictors" of team record. Perhaps an even bigger problem, though, is the way he estimates a player's contribution to the team. Take hitters, whose contribution is measured in SLG. Scully assigns each player a value equal to the difference between his production and that of a .000 SLG player. So an average hitter (.340 SLG) contributes 28.3 "points" (.028) of SLG in the model.

But this is clearly not a realistic assessment of the average player's contribution. It would actually take a SLG of .680, twice league average, to raise team SLG by 28 points. If we compare players to replacement level, then the average player probably raises team SLG by only 4-5 points. So Scully is in fact massively overestimating the contribution of the average player. (Accepting the model at face value. It obviously has other problems.)

By using a zero baseline, the Scully method hugely underestimates the real variance in player value. In his model, a .680 SLG hitter is worth only twice as much as a .340 SLG hitter (see table on page 923). So Willie McCovey, the 1968 NL leader with a .545 SLG, was worth only 60% more than the average hitter, and about 80% more than the worst hitters in the league.

It's too bad Scully didn't factor defense into his model. Because if you measure the impact of a third baseman by comparing him to someone who makes no plays at all, his value would be hundreds (if not thousands) of runs. Every player would be massively underpaid! Or, maybe Scully would have recognized that a zero baseline doesn't work.....

At Friday, April 16, 2010 11:31:00 PM, Blogger Phil Birnbaum said...

Guy: right, a lot of these kinds of studies use zero as the baseline. I'm not that surprised to see that flaw here ... it still happens sometimes, even 35 years later.

At Saturday, April 17, 2010 6:19:00 AM, Anonymous Guy said...

But this article is the "original sin." This establishes the Scully method of valuation. Most of the other studies you're referencing took the method from Scully.

And the problem is larger than the replacement issue. I think Scully is confusing marginal and average here. Suppose he had the benefits of sabermetric knowledge, and used runs created instead of SLG to value hitters. His regression would tell him that every run is worth about .6 points of win% (.0006). And thus an average team of hitters scoring 750 runs would produce 463 points of winning percentage. The problem, of course, is that you have hitters generating virtually all of the team's wins. Because the marginal value of runs (.1 win) is much larger than the average value (more like .05 wins).

At Saturday, April 17, 2010 9:50:00 AM, Blogger Phil Birnbaum said...

I guess I should read the paper before commenting further: I read a description of the regression in another paper and read only enough of the Scully paper to verify.

You're right that the regression doesn't work. Absolutely, it could be that Scully is confusing marginal and average.

Have you read the paper? Might it also be that Scully is incorrectly assuming declining marginal productivity of runs? Declining marginal productivity is the case in most real-life situations: the first waiter is the most important, the second waiter next most, and so on, until the point where 15 waiters are not much more productive than 14 waiters.

With baseball batting, it's different. The first run is worth very little. The second is worth very little, but slightly more. The third slightly more than that.

The marginal value of runs *increases*, not decreases, up to the average team production (or, more accurately, up to the number of runs the team gives up). Then it starts decreasing down to zero.

If Scully figured that for a .500 team that scores 700 runs, each of the first 699 must be worth *more* than the 700th, that might cause the problem. With waiters, it's true -- with runs, it's the opposite.

Anyway, I'm agreeing with you that the method doesn't work, but I haven't read the paper, so I can't agree (yet) that the problem is confusing marginal and average. But if you've read the paper, and that's your conclusion, it makes sense.

I agree with you that this might be an even bigger problem than the one I wrote about. (Although, how can you decide which is bigger? Either one makes the conclusions invalid.)

I should probably rephrase ... what surprises me about the first flaw, the one about predicting an outcome based on certain characteristics of the outcome, is that it has nothing to do with misunderstanding how baseball productivity works. It has to do with ... just doing an invalid regression. It's a methodology issue, which I wouldn't have expected in a paper that's cited approvingly so much.

At Saturday, April 17, 2010 10:37:00 AM, Anonymous Guy said...

Here's the paper if you want to read it:

I believe other economists (Zimbalist and/or Fort, perhaps?) have critiqued the Scully method. So I don't think it's a consensus among economists. Maybe Millsy or Rodney can shed light on this if they happen by.

BTW, the other "foundational" paper of sports economics, Rottenberg's paper arguing that richer teams will end up with the best players even under the reserve clause, is also online:

