Tuesday, October 30, 2007

r-squared abuse

I bet you think a Ferrari is expensive. You're wrong. I did a statistical study, and I found out that 99.999% of Bill Gates' wealth is explained by factors other than the cost of a Ferrari.

That leaves only 0.001% – a mere thousandth of one percent. That's a very small number. Clearly, buying a Ferrari doesn't affect your wealth!

Not falling for it? You shouldn’t. And neither should you fall for this Andrew Zimbalist quote (as discussed in a recent Tangotiger blog post):

"If you do a statistical analysis [of] the relationship between team payroll and team win percentage, you find that 75 to 80 percent of win percentage is determined by things other than payroll," says Andrew Zimbalist, a noted sports economist ...

This figure seems to be quoted by economists everywhere, in the service of somehow proving that "you can't buy wins" by spending money on free agents. But there are a few problems. First, it's misinterpreted. Second, it doesn't measure anything real. And, third, as Tango points out, you can make the number come out to be wildly different, depending on your sample size.


(For purposes of this discussion, I'm going to turn Zimbalist's statement around. If 75% is *not* determined by payroll, it implies that 25% *is* determined by payroll. I'm going to use that 25% here, since that's how the result is usually presented.)

First, and least important, Zimbalist misstates what the number means. It's actually the r-squared of a regression between season wins and season payroll (for the record, using 2007 data for all 30 teams, I got .2456 – almost exactly 25%). That r-squared, technically speaking, is the percentage of *the total variance* that can be explained by taking payroll into account. Zimbalist simply says it's "win percentage," instead of *the variance* of win percentage. But the way he phrases it, readers would infer that, if you look at the Red Sox at 96-66, payroll somehow accounts for 25% of that.

What could that mean? Does it mean that payroll was responsible for exactly 24 wins and 16.5 losses? Does it mean that without paying their players, the Red Sox would have been .444 instead of .593? Of course not. Phrased the way Zimbalist did, the statement simply makes no sense.

But that's nitpicking. Zimbalist really didn't mean it that way.

What he *did* mean is that payroll accounts for 25% of the total variance of wins among the 30 MLB teams. But there's no plain English interpretation of what that number means. The only way to explain it is mathematically. Here's the explanation:


Look at all 30 teams. Guess at how each one was supposed to do in 2007. With no additional information, you have to project them all at 81-81.

Now, take the difference between the actual number of wins and 81. Square that number. (Why square it? Because that's just the way variance works). So, for instance, for the Red Sox, you figure that 96 minus 81 is 15. Fifteen squared is 225.

Repeat this for the other 29 teams – Arizona's squared difference is 81, Milwaukee's is 4, and so on. Add all those numbers up. I did, and I got 2,488. That's the total variance for the league.

Now, get a sheet of graph paper. Put payroll on the X axis, and wins on the Y axis. Place 30 points on the graph, one for each team. Now, figure out how to draw a straight line that comes as close as possible to the 30 points. By "as close as possible," we mean the line – there's only one – that minimizes the sum of the squares of each of the 30 vertical distances from the line to each point. (Fortunately, there's an algorithm to figure this out for you automatically, so you don’t have to test an infinity of possible lines and square an infinity of vertical distances.)

That line is now your new guess at how each team would do, adjusted for payroll. For instance, the Red Sox, with a payroll of $143 million, would come out to 89-73. Arizona, with a payroll of $52 million, comes out at 77-85.

Now, repeat all the squaring, this time using the projections. The Red Sox won 96 games, not 89, so the difference is 7. Square that to get 49. The Diamondbacks won 90, not 77, so the difference is 13. Square that to get 169. Repeat for the other 28 teams. I think you should get something around 1,877.

Originally, we had 2,488. After the payroll line, we have 1,877. The reduction is around 25%.

Therefore, payroll explains 25% of the variance in winning percentage.
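For anyone who wants to replicate the walk-through, here's a minimal Python sketch of the same steps – total squared differences, the least-squares line, and the reduction. The payroll/win pairs are made up for illustration; they are NOT the real 2007 data.

```python
# A minimal sketch of the walk-through above. The (payroll in $MM, wins)
# pairs here are invented for illustration -- not the real 2007 data.
teams = [(143, 96), (52, 90), (190, 94), (24, 66), (71, 83), (100, 57)]

payrolls = [p for p, w in teams]
wins = [w for p, w in teams]
n = len(teams)

# Step 1: total variance -- squared differences from the average.
mean_w = sum(wins) / n
total_ss = sum((w - mean_w) ** 2 for w in wins)

# Step 2: the least-squares line, wins = a + b * payroll.
mean_p = sum(payrolls) / n
b = sum((p - mean_p) * (w - mean_w) for p, w in teams)
b /= sum((p - mean_p) ** 2 for p in payrolls)
a = mean_w - b * mean_p

# Step 3: squared differences from the line's predictions.
resid_ss = sum((w - (a + b * p)) ** 2 for p, w in teams)

# The r-squared is the fractional reduction in total squared error.
r_squared = 1 - resid_ss / total_ss
print(f"r-squared = {r_squared:.3f}")
```

With the real 30-team 2007 data, this same calculation produces the .2456 quoted above.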


This entire process, taken together, is linear regression. And it's all correct – mathematically, at least. But the statement, "payroll explains 25% of the variance in winning percentage," has meaning only within the definitions of linear regression. It means very little in terms of baseball. Indeed, the 25% is not a statement about baseball at all, or even a statement about payroll, or about wins. It is a statement about what happens when you start accounting for *squares of differences in wins*. I think of it, not completely correctly, as a statement about "square wins." And who cares about square wins?

Most statements that involve a number like 25% have a coherent meaning – when you hear those statements, you can use them to draw conclusions. For instance, suppose I tell you "toilet paper is 25% off today." From that, you can calculate:

-- the same amount of money that bought 3 rolls yesterday will buy 4 rolls today
-- if it cost $10.00 per jumbo pack yesterday, it's $7.50 today
-- when the sale is over, the price will increase by 33%
-- if I use eight rolls a week, which normally costs $4.00, and I buy an 8-week supply today, I will save $8.00, which works out to 12.5 cents per roll.
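Each of those bullets is simple arithmetic; here's the same set of calculations as a quick sketch, using the prices given above:

```python
# Checking the bullet-point arithmetic for a 25%-off sale.
discount = 0.25
pack_price = 10.00
sale_price = pack_price * (1 - discount)                  # $7.50 per jumbo pack
increase_after_sale = (pack_price - sale_price) / sale_price  # 33% when sale ends

weekly_cost = 4.00                        # eight rolls a week
savings = 8 * weekly_cost * discount      # $8.00 saved on an 8-week supply
savings_per_roll = savings / (8 * 8)      # 12.5 cents per roll

print(sale_price, round(increase_after_sale, 2), savings, savings_per_roll)
```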

That's how you know you got useful information – when you can make predictions and calculations based on the figure.

But suppose I now tell you, "payroll explains 25% of the variance of winning percentage." What can you tell me? Nothing! Even if you're very familiar with regression analysis, I challenge you to write an English sentence about wins and payroll – one that uses the number 25%, but doesn't include any stats words like "variance." I think it can't be done, at least not without taking the square root of 25%.

You can't even tell me, from this analysis, if payroll is an important factor in wins or not. If we can't make any statement about what the 25% means, how do we know whether it's important or not? Our intuition suggests that payroll must not be very important, because the percentage is "only" 25. But that's not true, as Tango said. I'll get back to that in a bit.


The thing is that the regression analysis DID tell us lots of useful information about payroll and wins. The 25% figure is actually one of the least important things we get out of it, and it's strange that economists would emphasize it so much.

The most important thing we get is the regression equation itself. That actually answers our most urgent question about payroll – how many extra wins do high-spending teams get? The answer is in this equation (which I've rounded off to keep things simple):

Wins = 70 + (payroll / $7.4 million)

This gives us an exact answer:

-- in 2007, on average, every $7.4 million teams spent on salaries added an extra win above 70-92.

Did payroll buy wins? Yes, it did – at $7.4 million per win. Can't get much more straightforward than that. If you take that number, then figure out how many wins the free-spending teams bought ... well, work it out. The Yankees spent $190 million, which should have made them 96-66. The Red Sox, as we said, should have been 89-73. The Devil Rays, at only $24 million, should have been 73-89.
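A tiny sketch of the equation in action, using the payrolls (in millions) quoted above:

```python
# The 2007 regression equation above: Wins = 70 + payroll / $7.4 million.
def predicted_wins(payroll_mm):
    """Predicted season wins from payroll in $ millions."""
    return 70 + payroll_mm / 7.4

for team, payroll in [("Yankees", 190), ("Red Sox", 143), ("Devil Rays", 24)]:
    print(team, round(predicted_wins(payroll)))   # 96, 89, 73
```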

And that's the answer, based on the regression, on how payroll bought wins in 2007. Based on that, you may think that means payroll is important. Or you may think payroll is unimportant. But your answer should be based on $7.4 million, not "25%" of some mathematical construction.

(By the way, I think $7.4 million is unusually high – there were few extreme records in 2007, which made wins look more expensive last year than in 2006. But that's not important right now.)


Which brings us back to Tango's assertion, that the 25% can be almost anything depending on sample size.

Why is that true? Because the 25% means 25% of *total variance*. And total variance depends on the sample size.

Suppose there are only two things that influence the number of wins – payroll, and luck. Since payroll is 25% of the total, luck must be the other 75%. (This is one of the reasons statisticians like to use r-squared even though it's not that valuable – it's additive, and all the explanations will add up to 100%. That's very convenient.)

Let's suppose that over a single season, payroll accounts for 100 "units" of variance, and luck accounts for 300 "units":

100 units – payroll
300 units – luck

Payroll is 25% of the total.

But now, instead of basing this on 30 teams over one season, what if we based it on the same 30 teams, but on their average payroll and wins over two consecutive seasons?

Over two seasons, payroll should cause variance in winning percentage exactly the same way as over one season. But luck will have *less* variance. The more games you have, the more chance luck will even out. Mathematically, twice the sample size means half the variance of the mean. And so, if you take two seasons, you get

100 units – payroll
150 units – luck

And now payroll is 40% of the variance, not 25%.

If you go three seasons, you get 100 units payroll, 100 units luck. And now payroll is 50%. Go ten seasons, and payroll is 77% of the variance. The more seasons, the higher the r and the r-squared. That's why Tango says

"If [games played] approach infinity, r approaches 1. If GP approached 0, r approaches 0. You see, the correlation tells you NOTHING, absolutely NOTHING, unless you know the sample size."

(Update clarification: this happens ONLY if payroll and luck are the only factors, and payroll reflects actual talent. In real life, there are many other factors, and it's impossible to assess talent perfectly -- so while the r-squared between payroll and talent will increase with sample size, it won't come anywhere near 1.)
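Under the simplified two-factor assumptions above – payroll variance fixed at 100 "units," luck variance shrinking in proportion to the number of seasons averaged – the arithmetic looks like this:

```python
# The toy model above: payroll's share of total variance grows with
# sample size, even though payroll's effect never changes.
def payroll_share(seasons):
    payroll_var = 100            # fixed, regardless of sample size
    luck_var = 300 / seasons     # luck evens out as seasons are averaged
    return payroll_var / (payroll_var + luck_var)

for s in (1, 2, 3, 10):
    print(s, round(payroll_share(s), 2))   # 0.25, 0.4, 0.5, 0.77
```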

And it goes the other way, too – the higher the variance due to luck, the lower the percentage that's payroll. Suppose instead of winning percentage over a *season*, you use winning percentage over a *game*. I did that – I ran a regression using the 4,860 team-games of the 2007 season (30 teams times 162 games), where each winning percentage was (obviously) zero or 1.000.

Now the r-squared was .003. Instead of payroll explaining 25% of total variance, it explained only 0.3%!

Payroll explains 25% of winning percentage variance over a season.
Payroll explains 0.3% of winning percentage variance over a game.

But the importance of payroll to winning has not changed. Payroll has the same effect on winning a single game as it does on a series of 162 games. But, on a single-game basis, the variance due to luck increases by a huge factor, and that dwarfs the variance due to payroll.

It's like the Ferrari. It explains only 0.001 percent of Bill Gates' net worth. But it explains 1% of a more typical CEO's net worth, 100% of a middle class family's net worth, and 1,000,000% of a beggar's net worth. The important thing is not what percentage of anything the Ferrari explains, but how much the damn thing costs in the first place!

And for payroll, what's important is not what percentage of wins are explained, but how much the win costs.

As Tango notes, the r-squared figure all depends on the denominator. Just like the Ferrari, you can wind up with a big number, or a small number. Both are correct, and neither is correct. Just seizing on the 25% figure, and noting that it's a small number ... that's not helpful.

A pinch of salt has more than 1,000,000,000,000,000,000,000 molecules, but weighs less than 0.000001 tons. And payroll explains 25% of season wins, and 0.3% of single game wins. As Tango notes, it's all in the unit of measure. The r-squared, without its dimension, is just a number. And it's almost meaningless.

The r-squared is meaningless, but the regression equation is not. Indeed, the equation does NOT depend on the sample size at all! When I ran the single game regression, with the 4,860 datapoints, I got an r-squared of 0.003, but this equation:

Wins per game = 0.43 + (payroll / $1.2 billion)

Multiplied by 162, that gives

Wins per season = 70 + (payroll / $7.4 million)

Which is EXACTLY THE SAME regression equation as when I did the season.
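The conversion is just multiplying both terms of the per-game equation by 162; a quick check:

```python
# Scaling the single-game equation up to a 162-game season.
intercept_per_game = 0.43          # wins per game at zero payroll
dollars_per_game_win = 1.2e9       # payroll needed for one extra win per game

intercept_per_season = intercept_per_game * 162       # ~70 wins
dollars_per_season_win = dollars_per_game_win / 162   # ~$7.4 million per win

print(round(intercept_per_season), round(dollars_per_season_win / 1e6, 1))
```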


And so:

-- the r-squared changes drastically depending on the total variance, which depends on the sample size, but the regression equation does not;

-- the r-squared doesn't answer any real baseball question, but the regression equation gives you an answer to the exact question you're asking.

That is: if you do it right, and use the regression equation, you get the same answer regardless of what sample you use. If you blindly quote the r-squared, you get wildly different answers depending on what sample you use.

So why are all these economists just spouting r-squared? Do they really not get it? Or is it me?


Saturday, October 27, 2007

Mitchel Lichtman on speed and defense

Over at The Hardball Times, a great article by Mitchel Lichtman investigating the connection between a player's speed and the quality of his defense. It's called "Speed and Defense."

Lichtman estimated every player's speed by using a version of Bill James' Speed Score, but a version that doesn't use defense (to avoid "cheating"). He then checked to see if the faster players played better defense than the slower players.

He found that they did. At every position (except catcher, which wasn't included in the study), fast fielders had a better
UZR than the slow fielders. As you would expect, speed was more important for outfielders and less important for infielders. The highest effect was in center field. Here are the numbers. Differences are in runs per 150 games. Remember, this is the difference between fast and slow – players with "average" speed are not included:

1B – 4.5 runs
2B – 4.5 runs
3B – 1.9 runs
SS – 11.2 runs
LF – 5.9 runs
CF – 10.6 runs
RF – 6.0 runs

After that, Lichtman calculated whether speed is more important in bigger parks than smaller parks. He found that it is.

Fast players improved by 0.9 runs when moving to a small park (from a medium or large park), while slow players improved by 5.0 runs. However, when moving to a *large* park, the fast players improved by 7.6 runs, against only 3.3 runs for the slow fielders.

(Note that all players appear to have better UZRs in small and large parks than in medium parks – Lichtman's article suggests reasons why this might be the case.)

The actual study is very much worth reading ... as far as I can tell, Lichtman's methods, comments, and caveats are all right on. Also, his estimates of the square footage of all fields in MLB might come in useful for other studies.

Hat Tip:
The Book Blog


Tuesday, October 23, 2007

NBA home field advantage larger when home team is behind

In the NBA, the home team's advantage declines steadily over the course of a game. Averaged over the 2002-03 and 2003-04 seasons, the advantage is 1.28 points in the first quarter, but only 0.45 points in the fourth:

1st quarter: 1.28 points
2nd quarter: 1.07 points
3rd quarter: 0.89 points
4th quarter: 0.45 points

The total, including overtime, is about 3.74 points.
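A quick check on the quarter-by-quarter numbers:

```python
# The per-quarter home advantages above, in points.
quarters = [1.28, 1.07, 0.89, 0.45]

regulation = sum(quarters)        # 3.69 points over four quarters
overtime = 3.74 - regulation      # the remaining ~0.05 points come in OT

print(round(regulation, 2), round(overtime, 2))
```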

All this comes from a
study by Marshall B. Jones (fakeable self-identification required for download) in the just-released new issue of JQAS. It's called "Home Advantage in the NBA as a Game-Long Process."

Why does the home field advantage (HFA) decline? It could be because, when the home team is ahead early in the game, it doesn't play as hard. Here are the HFAs, by quarter, when the home team is ahead at the beginning of the quarter (Jones gave the results separately for the two seasons – I averaged them out):

2nd quarter: +0.05 points
3rd quarter: +0.27 points
4th quarter: -0.43 points

But when the home team started the quarter *behind*, the HFA is strong:

2nd quarter: 1.71 points
3rd quarter: 1.77 points
4th quarter: 1.84 points

So is this true, that a team doesn't try as hard when it has the lead? Perhaps teams are more likely to bench their stars when they have the lead. Or maybe they play a different style – trying to use up the clock? – to maximize their chance of winning. Or maybe the players don’t care as much, which
Bill James suggested as a possibility in a slightly different context.

None of these explanations have to do with HFA explicitly, but, rather, with the fact that when the stronger team is trailing, it performs especially well. The paper doesn't distinguish between the possibilities, but I'd bet it's a "stronger team" effect and not a "home team" effect.

Regardless of the explanation, I found this to be a highly unexpected result. I would have expected a bit of a letdown, perhaps, in the fourth quarter, when the game is almost certainly won. But after one quarter? What's going on? Does anyone have any ideas?

By the way, in overtime, the HFA per minute was about the same as in the first quarter. This is what you'd expect for any of the above theories.

One thing I disagree with in the paper is this:

"An NBA team playing at home should be leading at the end of the first quarter. If it is behind, it has lost much of the advantage it had when the game started. Before the game starts, the home team can expect to win the game roughly 62.0% of the time. If the home team is behind at the end of the first quarter, that percentage drops to 44.2% in 2002-03 and 43.8% in 2003-04. The home advantage is not something that the home team retains regardless of how it performs during the game. If the home team lets itself be outscored in the first quarter, then the advantage it had when the game started is lost."

The implication is that the first quarter is especially critical when it comes to HFA. But I don't think it is. First, as we have seen, a team that's behind after one quarter has a high HFA in the rest of the game. And, second, a home team that falls behind in the first quarter is probably the victim of a much stronger team. That would always be the case, regardless of whether the HFA is "frontloaded" or not.


Monday, October 22, 2007

Presentation videos online

Videotaped presentations from the "New England Symposium on Statistics and Sports" are now available online here. Presenters include Dan Rosenbaum, Alan Schwarz, Justin Wolfers, and others.

I haven't watched them yet, just found 'em.


Wednesday, October 17, 2007

New Bill James study on Pythagoras

In the light of lots of discussion on why the Diamondbacks beat their Pythagorean Projection, and what that means, Bill James wrote up a new study on whether such teams improve beyond expected in the following season.

Bill sent the study to interested readers on the SABR Statistical Analysis mailing list, but kindly allowed me to post his article, and the accompanying spreadsheet, for anyone interested:


You will notice that at the end of the article, Bill asks for comments ... if you comment here, I'll post to the list. I already posted some comments, which I will reprint below in small font. You probably want to read the study first.


Bill, thanks for the article! I thought I'd comment, since nobody else has (oops, Dvd Avins posted as I was writing this).


Summarizing Bill's study, if I've got it right:

If you take teams that overachieved – that "should have" been about .482 according to Pythagoras, but were instead actually .538 -- they wound up at .496 the next season.

If you take the most closely matched teams in runs scored and runs allowed, but that *underachieved* Pythagoras – they "should have been" .478, but were actually .447 – they wound up at .474 the next season.

Converting everything to a 162 game season, to make things easier to understand:

Group 1: Should have been 78-84. Were actually 87-75. Next year were actually 80-82.
Group 2: Should have been 77-85. Were actually 72-90. Next year were actually 77-85.

The difference: three games (actually, 3.7 if you don't round everything). Adjusting for the fact that Group 1 was (according to Pythagoras) about 0.6 games better than Group 2 in the first place, brings us back to about 3 games.

In terms of runs scored and runs allowed, the difference is only about 2.3 games. The other 0.7 comes from Pythagoras. That is, the teams with the pythagorean advantage of 13 wins the first year had a pythagorean advantage of only 0.7 wins the next year.


First, I think it's plausible that the 0.7 win advantage is real. Pythagoras just counts runs; it doesn't count how important they are. Bill once wrote (and several others have verified) that each run given up by a stopper is, in terms of wins, worth double what it's worth to an average pitcher. So if the stopper on one team has an ERA 1.00 run better than another, and he pitches 90 innings, the 10 runs he saves are actually worth 20. That will mean his team beats Pythagoras by one win.
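Here's the stopper arithmetic spelled out, using the usual rule of thumb of about 10 runs per win (the doubling factor is the assumption from Bill's finding quoted above):

```python
# A stopper with a 1.00 run ERA advantage over 90 innings.
runs_saved = 1.00 * 90 / 9       # 10 actual runs saved
leverage = 2                     # stopper runs count double in win terms
extra_runs = runs_saved * (leverage - 1)   # 10 runs beyond what Pythagoras sees
pythag_beat = extra_runs / 10              # ~1 win past Pythagoras

print(pythag_beat)
```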

It's probably reasonable to assume that the teams that beat Pythagoras the most would have better stoppers than the ones who "un-beat" Pythagoras the most. 0.7 wins – 7 runs difference in stopper talent – seems reasonable to me.


That brings the unexplained difference down to 2.3 wins. What could explain that?

Dvd Avins suggested that it's management making changes: that, next year, when the overachieving team drops back to normal, the team will make some improvements.

Or, perhaps the changes might come in the off-season. The team that went 87-75 thought it was really an 87-75 team, and went out and signed an expensive free agent – the one guy they thought could take them over the top to the playoffs. The 72-90 team did not.

But an average of one free agent per team, over 100 teams, seems large, especially when Bill's sample covered all of baseball history, much of which didn't allow for easy free-agent signings.

However, these 87-75 teams are different from other 87-75 teams, in that the players' season records (not including pitcher W-L) look like the records of a 79-83 team, not an 87-75 team. That means that there's more opportunity for management to make changes. When your team scores 120 more runs than it allows, almost everyone looks good. But when your team gives up more runs than it scores, there are going to be more players who obviously seem in need of replacement. This may be *especially* true if the team is perceived to be a playoff contender.

That is, take two identical teams who give up a few more runs than they score. They both have a below-average DH who hits .260 with 15 home runs. The team that went 87-75 might consider it urgent to replace him – they think they need just one or two moves to make the playoffs. The team that went 72-90 knows they need a lot more than that, and may not spend money on free agents until their young players start improving.

Anyway, this may be wrong ... I’m just thinking out loud.


As for significance testing ... Tom Tango has figured that in MLB these days, the SD of team wins is 11.66 games. That breaks down as 9.7 games (in 162) for team talent, and 6.3 games for luck.

For the teams in Bill's list, 9.7 games for talent is too much, since the teams were not chosen randomly, but specifically to match the other group. So let's assume that instead of 9.7 games, the SD of the talent difference between the matched pairs is, say, 3 games.

That makes the SD of actual next-season wins come out to about 7 games (the square root of 6.3 squared plus 3 squared).

Since there were 100 pairs in the study, the SD of the average is 7 divided by the square root of 100, which is 0.7. And so the 2.3 win difference is three-and-a-bit SDs. Obviously that's significant, and so not just chance. But the management effect could be causing it.
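The significance arithmetic above, spelled out:

```python
import math

# SD of the next-season win difference for one matched pair.
luck_sd = 6.3      # SD of wins from luck, per 162 games (Tango's figure)
talent_sd = 3.0    # assumed SD of the talent gap between matched teams
pair_sd = math.sqrt(luck_sd**2 + talent_sd**2)   # ~7 games

# SD of the average over 100 pairs, and the observed difference in SDs.
pairs = 100
sd_of_average = pair_sd / math.sqrt(pairs)       # ~0.7 games
z = 2.3 / sd_of_average                          # ~3.3 SDs

print(round(pair_sd, 1), round(sd_of_average, 2), round(z, 1))
```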

Suppose you wanted to get that down to 1 SD, to feel comfortable calling it random variation. You'd have to reduce the difference by 1.6 wins. The most plausible way to do that is to attribute those 1.6 wins to management.

Suppose you have those two identical teams, but one overshoots Pythagoras to win 87 games, and the other undershoots to win 72. Both teams may try to improve ... but is it plausible to argue that, averaged over all 100 pairs, the 87-win teams will deliberately go out and improve their teams by 1.6 wins more than the 72-win teams?

I honestly don't know if that sounds reasonable. It's only 16 runs, though ...


Bill James on competitive balance research

Ten days ago, Bill James wrote an article for the Boston Globe on what he thought would be the next big things in sports research. Bill's main answer: competitive balance. Is it bad for basketball that the best teams win games, divisions, and championships more often than other sports? How can the games be made more competitive? That, according to Bill, is where research is going in the next generation.

Economists, who have been looking at these issues for a while now, were a little upset that Bill didn't seem to know about them.
Dave Berri wrote,

" ... although James believes he is presenting “new” questions, much of his column focuses upon issues that are “old hat” to economists who have studied sports over the past few decades."

At The Sports Economist, Skip Sauer wrote,

"But while the answers are elusive, the study of competitive balance is not "virgin territory." Anyone answering the call of Bill James (and perhaps Bill himself) might profit from using google scholar, a fabulous little tool. The result it delivers is not consistent with the notion that this is 'virgin territory' ... "

And Sauer then gives us an actual
screen print from Google, showing some 3300 academic papers containing the words "competitive balance."

Over at "The Book",
Tangotiger argued that

"Bill James is a self-confessed non-follower of research, publicly stating that he doesn't keep up. He really shouldn't then be commenting on what has or has not been studied, since most would assume that he keeps up with the field."


I have to agree with these comments. Bill James did indeed err in implying that competitive balance is virgin territory for sports research. It most clearly is not. However, I disagree with some of the other comments from the sports economists, the ones that argue that the research has come up with strong answers to Bill's questions.

To summarize Bill's comments on basketball, he argues that the NBA has a competitive balance problem because

(a) players don’t try hard in the regular season, knowing that the better team wins so often that one play probably won't make a big difference;

(b) the playoff picture is decided in December, which reduces fan interest in the second half of the season;

(c) with playoff series as long as they are, the better team is favored so overwhelmingly, and upsets are so rare, that the sport becomes less interesting to the fans; and

(d) it is possible to correct these problems by changing the rules of the game and season.

At "The Wages of Wins," David Berri acknowledges that competitive balance in basketball is lower than in other sports. But he questions whether or not fans care. His reasons?

(a) attendance and revenues in the NBA are way up in the past ten years or so.

And that's it. Berri does mention that the issue of competitive balance and attendance has been studied – and he gives a few references – but he doesn't mention any of the results. He doesn't address Bill's discussion of whether you can increase competitive balance by changing the rules of the game. (Indeed, he has previously implied that such a thing is impossible, arguing that basketball has an imbalance because of "the short supply of tall people," rather than by the fact that basketball has so many possessions with a high chance of scoring on each.)

At Skip Sauer's blog, the argument is similar. Indeed, Sauer quotes Berri in noting that "The NBA does not ... have a problem with attendance." To his credit, though, Sauer notes explicitly that we don’t actually know much about fans' demand for balance; and, again to his credit, he does quote "some well-known results:"

"One tentative conclusion from people who have been thinking about this issue for some time (i.e. most of us), is that while competitive balance is clearly essential in some degree, the payoff function around the optimum may be really flat. The two most successful leagues in the world, the NFL and EPL, have vastly different degrees of balance, suggesting other factors are likely much more important in generating fan interest ... "

And that's fair enough. But that doesn't answer Bill's question, which was, "what level of competitive balance is best for the league?" Sauer is answering a completely different question: "how important is competitive balance compared to other factors?"

And, with respect to Sauer and Berri, the fact that NBA attendance is increasing doesn't mean that competitive balance is unimportant. McDonald's is doing well even though Big Macs cost more than they did ten years ago. Should we infer that customers don’t care about price? People are buying relatively fewer Buicks than in the 60s, even though Buicks are much better cars than they used to be. Does that mean buyers don't care about quality?

The logic here just doesn't make sense to me, especially coming from economists. Isn't it possible that attendance in the NBA might be even stronger if you tweak the game a little more to the fans' liking? Isn't it a bit naive to look at increasing attendance and blindly conclude that the current level of balance must be exactly what the fans want?

I agree with Bill that we don’t know how fans react to different levels of balance, in the short or long term. On the one hand, I can see how some fans don’t like it when the better team is almost certain to win. On the other hand, I kind of enjoyed it last Monday when the Cowboys were 6:1 favorites over the Bills and almost lost. And I also enjoy an occasional blowout. It's reasonable to assume that some fans prefer imbalance, while some prefer balance, isn't it?

So which set of fans is more important? Should leagues cater to one set over the other? Should they try to balance the two somehow?

Is there any study that tries to figure out the optimum? There might be, but I haven't seen it. Indeed, I have seen studies that beg the question, by assuming the more uncertain the outcome, the more the fans like it.
That doesn't make sense to me.

That's "within game" competitive balance. What about competitive balance within a season? Or competitive balance over a number of seasons?

MLB attendance and revenues are way up. But, from my standpoint, I'm less interested in baseball than I used to be. There are many reasons for this – one is the fact that baseball cards are so expensive that I don't know the players anymore. But another is that team spending is so unbalanced that I feel like I'm watching payrolls more than players. This year, the Yankees had an amazing second half and made the playoffs for the Nth consecutive year. Am I impressed? Well, no – they spend so much more than any other team that *of course* they keep making the playoffs. If I were a Yankee fan, how could I have pride in my team, knowing that they win through the pocketbook?

In the past, when my team won, I could be proud of them for drafting well, or judging talent, or making good trades, or even just putting it all together and having a good year. These are, perhaps, weak reasons for feeling pride in the accomplishments in a bunch of strangers, and maybe I'm not typical. Maybe most fans can feel just as much pride that their ownership is successful enough in their shipbuilding business that they can spend some big bucks on their team. To each his own.

But which type of fan is dominant? And will that change over time? Right now, MLB might be making lots of money because the best teams happen to be in the biggest cities. But over the long term, will that get boring? Will Yankee fans be more likely to lose interest when it sinks in more and more that their team's success is just being bought? Will other cities lose interest for the same reason?

Will a salary cap make fans more loyal, if they know that money is taken out of the equation? It would to me. I've been waiting 40 years for my Leafs to win the Stanley Cup – but if they were to have bought one, by spending three times as much on salaries as any other team, I would have probably been too disgusted to celebrate. It seems to me that to take pride in a team, you need them to have won a fair fight. Now, as I said, it could be that other fans aren't like me – but isn't this something that someone should be studying?

Taking this a bit further, is it just coincidence that the cities with the most rabid fans appear to be the ones with a history of failure? Will interest in the Red Sox start to drop once the World Series becomes a distant memory? Is it possible that it's actually in the long-term financial interest of the Cubs and Leafs to *not* win a championship, and milk their fans' longing for another few decades? If so, perhaps competitive balance in *winning seasons* is a good thing, but competitive balance in *championships* isn't. Who knows?

I haven't read all the studies or blog posts that Berri and Sauer (here are some of Sauer's) listed on the topic of competitive balance. But the ones I *have* seen don't really address the more complex questions. Some of them concentrate on the math – which sports have more competitive balance, and which less? Some of them discuss what effect certain changes – say, a luxury tax – would have on the level of balance. Some of them run regressions on balance vs. attendance. All reasonable issues, but none that put much of a dent in Bill's question – "what makes a league succeed?"

"The issue of what is good for leagues is virgin territory," Bill wrote. That's not correct – the issue is out there, and sports economists have produced a significant literature on the topic. But what have we learned from that literature?

I would argue that Bill is almost correct: the *questions* are not virgin territory, but the *answers* are.

Labels: , , ,

Thursday, October 11, 2007

Tom Verducci's theory on overworking young pitchers

Tom Verducci says that if you give a young pitcher 30 more innings than he's ever pitched before, you're abusing him, and there's a good chance he'll get hurt, or "take a major step backward in [his] development."

How does he know? Anecdote, mostly. But he lists the six pitchers under 25 who qualified in 2005. Three of them (Liriano, Chacin, and Kazmir) got hurt, and the other three (Cain, Duke, and Maholm) saw their ERAs jump. (If those really are the only six pitchers who met the criteria, then I have to admit that's a pretty good anecdote.)

But still: does this criterion for the "year-after effect" really make sense? There are several reasons why it might not:

1. Regression to the mean. When a pitcher throws more innings than expected, it's probably because he's pitched better than before; it's hard to set a new high when you're giving up lots of runs in the early innings. And if your performance exceeds expectations, there's a very good chance it's because you were lucky. And so chances are you'll regress the next season. That has nothing to do with overwork or arm trouble, just luck.

2. Total Innings. A pitcher who exceeds his previous high by 30 IP is also likely to have thrown a good number of pitches. How does Verducci know it's the difference between this year and last that causes any effect? Maybe it's just the total number of pitches.

3. Season Length. Wouldn't you expect it to be harder on a pitcher's arm to set a new high in innings over, say, half a season, instead of over a full season? You'd be throwing more pitches per start, or more pitches over a shorter period of time. Suppose a manager reads Verducci, and notices that his pitcher is 29 IP over his previous high, even though it's only July. Should he really sit that young pitcher for three months? That doesn't make sense to me, but I'm not an expert. (Also, Verducci includes post-season innings. If that pitcher is 29 IP over at the end of the regular season, does Verducci really think those three extra playoff starts, on normal rest, are going to cause such a large problem?)

4. 30 innings doesn't seem like a lot. Thirty innings is about one inning per week. That doesn't seem like all that much. For a starter, it's about one inning per start, which perhaps is a lot – that one extra inning comes when the pitcher is somewhat tired. But if the starter set a new high, he's probably pitching well, which means he's giving up fewer hits, which means that he's not throwing as many more pitches as the 30-inning figure would suggest. That's especially true since managers are likely to limit young starters to a certain pitch count.

Anyway, this issue needs a real analysis, not just anecdotes. One way to study this question is to do paired comparisons. For instance, find two pitchers with similar ages and career statistics, but where one exceeded the 30 IP threshold, and the other did not. Does the less-experienced guy do worse later?

Or find two pitchers where both exceeded their innings by 30, but one is younger than the other. Does the young guy suffer more ill-effects than the old guy?
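A study like that could be sketched in a few lines of code. Everything below is hypothetical – the field names, tolerances, and greedy matching are my own placeholders, not anything from Verducci's article – but it shows the shape of a paired comparison:

```python
def paired_comparison(pitchers, age_tol=1, ip_tol=50):
    """Greedily pair each pitcher who exceeded his previous innings high
    by 30+ IP ('over') with a similar-aged, similar-workload pitcher who
    did not ('under'), then average the next-year ERA differences.
    All field names are hypothetical placeholders."""
    over = [p for p in pitchers if p["over_30"]]
    under = [p for p in pitchers if not p["over_30"]]
    diffs = []
    for p in over:
        for q in under:
            if (abs(p["age"] - q["age"]) <= age_tol
                    and abs(p["career_ip"] - q["career_ip"]) <= ip_tol):
                diffs.append(p["next_era"] - q["next_era"])
                under.remove(q)  # each control pitcher used only once
                break
    return sum(diffs) / len(diffs) if diffs else None
```

A positive result would mean the over-threshold group fared worse the next season; near zero would suggest the "year-after effect" is small.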

I'm betting that if you did a real study, you'd find any such effect is small – and, moreover, that you'd find more of an effect using some measure other than "innings over previous high."

In any case, Verducci's article appeared last November. He listed pitchers in 2006 who met his criteria for possible problems. How did they do in 2007?

Cole Hamels: excellent year, better than last year
Justin Verlander: better than last year
Anibal Sanchez: shoulder injury
Jered Weaver: great 2006, regressed to mean in 2007
Sean Marshall: mediocre 2006, improved to mean in 2007
Scott Olsen: good 2006, bad 2007
Jeremy Bonderman: a little worse than expected in 2007
Adam Loewen: out with elbow injury
Anthony Reyes: equally mediocre both years
Scott Mathieson: injured in 2006
Boof Bonser: got a little worse in 2007
Chien-Ming Wang: exactly as good in 2007 as 2006
Rich Hill: improved a bit in 2007

Three injuries (two if you don't count Mathieson), maybe one collapse among the non-injured pitchers, and a few improvers. Make of that what you will.



Wednesday, October 10, 2007

An updated "Moneyball Effect" study

According to "Moneyball," walks were underrated by major league baseball teams. The Oakland A's recognized this, and were able to sign productive players cheaply by looking for undervalued hitters with high OBPs. This (among other strategies) allowed them to make the playoffs several years on a small-budget payroll.

If this is correct, then, once "Moneyball" was published and the A's thinking was made public, the OBP effect should have disappeared. Teams should have started fully valuing a player's walks, and the salaries of players excelling in that skill should have taken a jump.

About a year ago, I reviewed a study that claimed to find such a sudden salary increase. I wasn't convinced. Now, the same authors have updated their study, with better stats and more seasons' worth of data. Again, they claim to find a large effect. And, again, I am not convinced.

The authors, Jahn K. Hakes and Raymond D. Sauer, took the years 1986-2006 and divided them into four time periods. They regressed salary against OBP and SLG. They found that the value of a point of SLG didn't change much over that time period, but OBP did. It increased gradually across the first three periods, but starting in 2004, after the release of "Moneyball," it took a huge jump.

They repeated the study using a measure of bases on balls, instead of OBP (the latter includes hits, which might have confounded the results). Again, they found a huge jump in remuneration for walks starting in 2004.

The numbers are striking, but I'm not sure they mean what the authors think they do. There are several reasons for this. In comments to a post at "The Sports Economist," Guy points out some of them. (The points are Guy's, but the commentary here is mine.)

First, the study grouped together all players, regardless of whether salary was determined by free agency, arbitration, or neither (players with little major league experience have their salaries set by the team; I'll call those "slaves"). In the regression the authors used, the hidden assumption is that for all three types, player salaries increase the same way. That is, if an additional 20 walks over 500 PA will increase a free agent's salary by 10%, it will also increase an arbitration award by 10%, and a team will even offer a slave 10% more.

That's not necessarily true. Suppose that free agent salaries rise because 20% of teams read "Moneyball." That's probably enough that almost 100% of high-OBP players have their salaries bid up. But if the same 20% of arbitrators read "Moneyball," what happens? Only 20% of salaries will increase. And, actually, it'll be less than that, because most of the teams won't be emphasizing walk rate at the hearing.

I'm sure you could come up with scenarios where the changes in compensation are due to changes in patterns between the three groups, rather than to walks. For instance, suppose that slave salaries are increasing faster than free-agent salaries, and slave OBPs are increasing faster than free-agent OBPs. That could account for the observed effect. I'm not saying this is true, because I have no idea. But there are lots of hypothetical scenarios that could also account for what the study found.

Second, the authors used a very low cutoff for inclusion: only 130 plate appearances. They do include a multiplier for plate appearances, so that each PA is worth x% more dollars. However, as Guy points out, the study assumes that the performance of a part-time player is as accurate an indication of his talent as it would be for a full-time player. This can cause problems. Anything can happen in 150 PA; someone who OBPs .400 in that stretch is probably a mediocre player having a lucky year, not an unheralded star.

Also, salaries probably don't correlate all that well to plate appearances. Someone with 200 PA might be a regular who got injured, or a pinch-hitter who had to play regularly for a month because someone else got injured. So the assumption that salary is proportional to PA adds a lot of noise to the data.

In addition, suppose that salaries for star players are increasing faster than for part-time players. That would make sense; there is a much larger pool of mediocre players than regulars, and the competition among the ordinary players keeps their cost down. When the Rangers decided to spend $252 million, they used it to buy Alex Rodriguez, not to give ten bench players $25 million each.

If that's the case, the observed effect could be an increasing difference in walks between regulars and bench players, rather than an increase in the market value of walks.

Suppose in 2002, full-time players OBP .400 and part-time players are also at .400. In 2004, full-timers are still at .400, but part-timers drop to .300. If that happened, that would certainly account for the observed jump in OBP value. The regression would notice that suddenly the spread between the .400 guys and the .300 guys was on the rise, and would attribute that to their walks instead of their full-time status.

Did this actually happen? I don't have data for 2004 handy, but, in 2003, the full-time guys (500+ AB) outwalked the replacement guys (160-499 AB) by .103 to .093 (BB per AB). That's a difference of .010. I ran the same calculation for a few other years (this is a complete list of the years I checked. It averages all players equally, regardless of AB, and includes pitchers):

2003: .103 - .093 = .010
2002: .105 - .098 = .007
2001: .107 - .091 = .016
1997: .107 - .102 = .005
1992: .101 - .096 = .005
1987: .103 - .102 = .001
1982: .099 - .091 = .008

So there is some evidence of a recent increase in the amount by which regulars outwalk non-regulars, which corresponds to the gradual increase in OBP value the authors found. I don't have data for the jump years 2004-2006, but maybe I'll visit Sean Lahman's site and download some.
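For anyone who wants to extend that table with more seasons of Lahman data, the calculation itself is trivial. Here's a minimal sketch – the 500+ and 160-499 AB cutoffs and the equal weighting match the table above, but the input format is my own assumption:

```python
def walk_gap(players):
    """players: list of (AB, BB) tuples for one season.
    Returns the mean BB/AB for regulars (500+ AB) minus the mean for
    part-timers (160-499 AB), weighting all players equally, as in
    the table above."""
    regulars = [bb / ab for ab, bb in players if ab >= 500]
    bench = [bb / ab for ab, bb in players if 160 <= ab < 500]
    return sum(regulars) / len(regulars) - sum(bench) / len(bench)
```

Run it on each season's batting lines and you get one row of the table per year.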

Thirdly – and this is now my point, not Guy's – there is a lag between a player's performance and his salary being set. Most free agents would have negotiated their 2004 salaries well before "Moneyball" was released. If you take that into account, the huge effect the authors found must be even huger in real life, having been created by only a fraction of the players!

This should actually make the authors' conclusions stronger, not weaker, except that if you find the size of the jump implausible, it's even more so when you take this effect into account.


Now, I'm not saying that there isn't a Moneyball Effect, just that this study doesn't measure it very well. How *can* you measure it? Here's a method. It's not perfect, but it'll probably give you a reasonably reliable answer.

First, find a suitable estimator of a player's expected performance in 2007. Bill James used to use a weighted (3-2-1) average of the player's last three years, which seems reasonable to me. You could regress that to the mean a bit, if you like. Or, you could use published predictions, like PECOTA or Marcel.

Now, take that estimate and figure the player's expected value to his team in 2007. Use any reasonable method: linear weights, VORP, extrapolated runs, whatever. Let's assume you use VORP.

Take all full-time players ("full-time" based on expectations for 2007, to control for selection bias) who signed a free-agent contract during the off-season. Run a regression to predict salary from expected VORP. Include variables to adjust for age, position, and so on, until you're happy.

Now, run the same regression, *but add a variable for BB rate*. That coefficient will give you the amount by which the market over- or undervalues walks. That is, suppose the regression says that salary is $2 million per win, less $10,000 per walk. That tells you that if you have two identical players, each of whom creates five wins above replacement, but where one walks 20 times more than the other, that one will earn $200,000 less. That would mean that walks are undervalued – you get less money if you have fewer hits and more walks, even if the walks exactly compensate for the hits.

Repeat this for all years from 1986 to 2006 (adjusting salaries for inflation). If the Moneyball Effect is real, you should find that the coefficient for walks is negative up to 2003, then rises to zero after 2004.
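Here's a toy version of that regression, on made-up data in which the market pays $2 million per expected win above replacement but docks $10,000 per walk – exactly the hypothetical numbers above. All the data here is invented for illustration; the point is just that ordinary least squares recovers the negative walk coefficient:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
wins = rng.uniform(0, 6, n)        # expected wins above replacement
walks = rng.uniform(20, 120, n)    # projected walks
# Invented market: $2M per win, minus $10k per walk (walks undervalued),
# plus noise for everything else the model leaves out.
salary = 2_000_000 * wins - 10_000 * walks + rng.normal(0, 500_000, n)

# Regress salary on expected wins *and* walk rate.
X = np.column_stack([np.ones(n), wins, walks])
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)
# coef[2] should come out near -10,000: the market's discount per walk,
# holding overall expected value constant.
```

If the Moneyball Effect is real, running this on actual contract data year by year should show that walk coefficient climbing from negative toward zero after 2003.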

A fun side-effect is that you can include all kinds of variables, not just walks – batting average, home runs, and so forth – to see which skills are more or less valued through the years. For instance, you could check for a "Bill James" effect, to see if the perceived value of batting average drops through the 80s and 90s. You could include RBIs, to see if the market pays more for cleanup men than leadoff men. And so on.

Labels: , ,

Saturday, October 06, 2007

How often does the best team win the pennant?

There are 30 teams in Major League Baseball. What is the chance that the best team will wind up winning the most regular season games?

According to this article in Science News, the answer is 30%. That figure comes from a new study by Eli Ben-Naim, a New Mexico physicist, and other unnamed authors.

I don't know how Ben-Naim came up with that result, because I couldn't find the study referred to in the article. There were references listed, but they're not much help. One is a study I reviewed a couple of weeks ago, which used an oversimplified model of team talent. It didn't deal with the probability that the best team wins.

In the other Ben-Naim study referred to in the article, the author again uses his oversimplified model (in which every game is won by the underdog with fixed probability, regardless of the actual relative talent of the two teams). He argues that if you want to ensure that the best team wins the most games, you need to play a huge number of games (on the order of N cubed, where N is the number of teams). However, he finds, you can reduce the number of games considerably if you play a series of round-robins, where each round robin eliminates a certain percentage of the teams remaining.

It's an interesting study, using only analytical math (no empirical observations or simulations). But it doesn't address the question of how often the best team wins, at least as far as I could tell.

So where does the 30% come from? Either Ben-Naim did a one-off calculation for the reporter, or there's a forthcoming study.

However, there are at least two existing sabermetric studies addressing this question, studies that Ben-Naim does not reference in his two articles.

In the "1989 Baseball Abstract" (by Brock Hanke and Rob Wood, self-published), there's a guest article by Bill James entitled "How Often Does the Best Team Win the Pennant?" James simulated 2,000 seasons and found that the best team in baseball won its division 72% of the time. (That's back in olden times when there were only four divisions.)

As a rough approximation, if the best team wins its division 72% of the time, it should win *all four* divisions with probability .72 to the power of 4. That's 27%, impressively consistent with Ben-Naim's 30% figure.

And in a BTN article in May, 2000 (see page 15), Rob Wood found that, in a 12-team league with a .060 standard deviation of talent, the best team wins the pennant 52% of the time. If you assume that the best team would also have won the *other* league 52% of the time, the chance is 27% (.52 squared) that the best team wins the most games in both leagues. Again, very consistent.
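Both back-of-the-envelope checks are one-liners. The independence assumption – treating the four divisions, or the two leagues, as separate coin flips – is the rough approximation noted above:

```python
# Bill James: best team wins its division 72% of the time; winning all
# four divisions, treated as independent, is .72 to the fourth power.
james = 0.72 ** 4    # roughly 0.27

# Rob Wood: best team tops a 12-team league 52% of the time; topping
# both leagues is .52 squared.
wood = 0.52 ** 2     # roughly 0.27
```

Both land within a few points of Ben-Naim's 30%.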

Ben-Naim is also quoted as saying that if you want to raise the probability from 30% up to 90%, it takes a huge number of games -- 15,000 per team.

To check that, I ran a simulation. I assigned random talents to the 30 teams, drawn from a normal distribution with mean .500 and standard deviation .060 (about 9.7 games over 162). I then took the top two teams and figured the binomial probability that the top team would finish ahead of the second after 15,000 games. I repeated this for 10,000 seasons.

(Whether or not the best team wins depends, mostly, on the difference in talent between the best two teams. That's because 15,000 games is enough time for the SD of the luck separating them to shrink to about 1 win per 162 games. If the best team is, say, a 97-win talent while the second-best is only a 94-win talent, the runner-up would need a fluke of more than 3 SDs to finish ahead, which is effectively a zero chance.

However, if the best team is only fractionally better than the second-best – say, 96.2 to 96.1 – that's only one-tenth of an SD, and the best team has almost a 50% chance of losing.

So, most of the time, the best team wins easily. But a small fraction of the time, the runner-up in talent is good enough to give the best team a run for its money.)
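A minimal re-creation of this simulation might look like the sketch below. The parameters match those described above; using a normal approximation to the binomial (rather than exact binomial tail sums) is my own shortcut:

```python
import math
import random

def prob_best_finishes_first(n_teams=30, sd=0.060, games=15000,
                             seasons=2000, seed=1):
    """Estimate how often the most talented team outwins the
    second-most-talented team, each playing `games` independent
    games against average opposition."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(seasons):
        talents = sorted(rng.gauss(0.500, sd) for _ in range(n_teams))
        p1, p2 = talents[-1], talents[-2]  # top two teams in talent
        # Difference in win totals is approximately normal:
        mean = games * (p1 - p2)
        var = games * (p1 * (1 - p1) + p2 * (1 - p2))
        # P(best team's win total exceeds the runner-up's)
        total += 0.5 * (1 + math.erf(mean / math.sqrt(2 * var)))
    return total / seasons
```

With these defaults the estimate should come out in the low-to-mid .90s, in the same neighborhood as the simulation result reported here.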

As it turned out, over 15,000 games, the best team won 93% of the time, again consistent with Ben-Naim's claims. You can't get closer than "consistent" because my simulation was oversimplified. Specifically:

-- I didn't play teams against each other; rather, I played 30 independent binomial schedules. This reduces luck (since "upsets" affect only one team instead of two), and would tend to inflate my percentage.

-- I considered only the top two teams in talent. It's possible that the number 3 team might also be close to the others, and manage to beat them out. Ignoring that third team would also tend to inflate my percentage.

-- I used a standard deviation of .060. This is the actual SD of team talent. However, the distribution is tighter on the top end of teams than the bottom (as Bill James pointed out in the Blue Jays comment of the 1984 Abstract). Perhaps I should have used .070 for the bottom teams, and .050 for the top teams, or something. Again, my decision not to do that inflated the observed percentage. (Quick check: dropping the SD to .050 does reduce the frequency, but only to 92%).

So, overall, Ben-Naim's numbers look reasonable. I look forward to seeing his methods when his paper becomes available.

(Hat Tip: Tangotiger)

UPDATE: The 30% was indeed a one-off calculation, so there is no forthcoming paper. Ben-Naim kindly visited to let us know. See the first comment.

Labels: , ,

Wednesday, October 03, 2007


Alan Reifman, of "Hot Hand" fame, now has a blog on sabermetric analysis of volleyball.