## Friday, April 27, 2007

### The "defense first" strategy in college football OT -- part 2

In a post a couple of days ago, I noted a study that found a strong advantage to choosing "defense first" in college football OT. The first couple of paragraphs of that previous post describe the overtime rule ... you can go back and read them if you're not familiar with it.

That study found that the team that goes on offense last in the first OT (which I'll call "the second team" or "team 2") beat the team that went on offense first ("the first team" or "team 1") 54.9% of the time.

With some of the numbers listed in the original study, and some additional assumptions, we can try to figure out a *theoretical* probability with which the second team will win, and compare that to the observed 54.9%. For this theoretical calculation, I'm assuming the two teams are equal in skill.

The original study gave the distribution of first team scoring. I've combined 6- and 7-point touchdowns to keep things simple (which won't affect the results much):

.235 – team 1 scores 0 points
.299 – team 1 scores FG
.466 – team 1 scores TD

What is the distribution of second team scoring? It depends what the first team does, and we have to guess a bit.

Suppose the first team scores a touchdown. Then, the second team never goes for a field goal. So it's in what otherwise would be a field goal situation .299 of the time, but will have to go for a touchdown anyway. Suppose from fourth-and-something, they will score a touchdown 50% of the time, and score 0 points 50% of the time. In those cases, that would change their distribution to:

.385 – team 2 scores 0 after team 1 TD
.000 – team 2 FG after team 1 TD
.615 – team 2 TD after team 1 TD

Now, suppose the first team scored a field goal. We'll assume the second team plays exactly the same way as the first team:

.235 – team 2 scores 0 after team 1 FG
.299 – team 2 FG after team 1 FG
.466 – team 2 TD after team 1 FG

Finally, suppose the first team scored zero. It had a 76.5% chance to score, but failed. The second team must have a greater than 76.5% chance to score, because it's going to go for a field goal in some cases where the first team might have chosen to go for a touchdown (and fumbled or something). Let's call it 80%.

.200 – team 2 scores 0 after team 1 scores 0
.800 – team 2 FG or TD after team 1 scores 0

The chance of the first team winning in the first OT is the sum of these probabilities:

| Probability | Outcome | Calculation |
| --- | --- | --- |
| .180 | team 1 TD, team 2 zero | .466 * .385 |
| .070 | team 1 FG, team 2 zero | .299 * .235 |
| .000 | team 1 TD, team 2 FG | never |
| .250 | Total chance team 1 wins in this OT | |

The chance of the second team winning in the first OT is this sum:

| Probability | Outcome | Calculation |
| --- | --- | --- |
| .188 | team 1 zero, team 2 TD/FG | .235 * .800 |
| .139 | team 1 FG, team 2 TD | .299 * .466 |
| .327 | Total chance team 2 wins in this OT | |

And the chance of a tie is 1 minus the above two totals, which works out to

.423 – Total chance of this OT ending in a tie
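For anyone who wants to check the arithmetic, here is a minimal Python sketch of the first-OT calculation. It uses only the study's distribution and the guesses stated above (the 50% fourth-down conversion and the 80% scoring chance after a scoreless team 1 possession); the variable names are mine. Because it carries unrounded intermediate values, the results can differ from the figures above in the third decimal place.

```python
# First-team scoring distribution quoted from the study (6- and 7-point TDs combined).
P_ZERO, P_FG, P_TD = 0.235, 0.299, 0.466

# Assumption from the post: after a team 1 TD, team 2 turns half of its
# would-be field-goal situations into touchdowns and fails on the other half.
zero_after_td = P_ZERO + 0.5 * P_FG   # about .385
td_after_td = P_TD + 0.5 * P_FG       # about .615

# Assumption from the post: after a scoreless team 1 possession,
# team 2 scores (FG or TD) 80% of the time.
score_after_zero = 0.800

# Team 1 wins the period when it scores and team 2 is shut out.
# (Team 2 never kicks a FG after a team 1 TD, so that case adds nothing.)
team1_wins = P_TD * zero_after_td + P_FG * P_ZERO

# Team 2 wins the period when it scores after a team 1 zero,
# or answers a team 1 FG with a TD.
team2_wins = P_ZERO * score_after_zero + P_FG * P_TD

# Everything else is a tie and goes to another period.
tie = 1.0 - team1_wins - team2_wins

print(f"team 1 wins this OT: {team1_wins:.3f}")
print(f"team 2 wins this OT: {team2_wins:.3f}")
print(f"tie:                 {tie:.3f}")
```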

Now, the chance of the (original) second team winning the game is this sum:

Chance of winning in the first OT + (Chance the first OT is a tie * chance of winning the second OT) + (Chance the first two OTs are ties * chance of winning the third OT) + ...

Also, if a given OT ends in a tie, the order flips for the next period: the first team has to go second, and the second team has to go first. So the probabilities are switched in the even-numbered OTs. Therefore, the above sum works out to:

.327 + (.423 * .250) + (.423^2 * .327) + (.423^3 * .250) + ...

The sum of that infinite series (which is actually two intertwined geometric series) works out to .526.
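The two intertwined geometric series can be collapsed into one closed form: pair up each odd period with the even period after it, and every pair is the previous pair times the tie probability squared. Here is a small Python sketch of that, checked against a brute-force partial sum. The function names are mine, and the inputs are the rounded three-decimal figures from above. With those inputs I get roughly .527, within a rounding step of the figure quoted above.

```python
def team2_game_win_prob(p2, p1, tie):
    """Closed form for the alternating series.

    p2  -- chance the team going second wins a given period
    p1  -- chance the team going first wins a given period
    tie -- chance a period ends tied

    The original team 2 goes second in odd periods and first in even ones,
    so each pair of periods contributes (p2 + tie * p1), and successive
    pairs shrink by a factor of tie ** 2.
    """
    return (p2 + tie * p1) / (1.0 - tie ** 2)


def partial_sum(p2, p1, tie, periods=200):
    """Add the series term by term as a sanity check on the closed form."""
    total = 0.0
    reach = 1.0  # probability the game gets as far as this period
    for k in range(periods):
        win_here = p2 if k % 2 == 0 else p1  # order flips every period
        total += reach * win_here
        reach *= tie
    return total


closed = team2_game_win_prob(0.327, 0.250, 0.423)
brute = partial_sum(0.327, 0.250, 0.423)
print(f"closed form: {closed:.4f}")
print(f"partial sum: {brute:.4f}")
```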

Under the assumptions listed above, the chance of the "defense first" team beating the other, equally matched team in OT is .526.

But as we saw in the other study, the actual chance was .549. Why is our estimate different?

One possibility is that we're assuming evenly-matched teams. In real life, college games are often mismatches. However, mismatches should shrink the effect, not enlarge it. The more mismatched the teams, the less winning the coin toss should affect the result (if a team is good enough, it'll keep scoring TDs and win even if it has to go first). So the real life number for equally matched teams should be higher than .549, not lower.

So what's going on? Why do these estimates not match the observed results? One of three possibilities:

1. My calculations and logic are wrong;
2. The assumptions are wrong;
3. Favored teams won a lot of coin tosses just by luck.

I'm assuming it's not luck. Any suggestions?


## Thursday, April 26, 2007

### The "defense first" strategy in college football OT

If a college football game is tied after four quarters, the game goes to overtime.

The college OT rule is different from the NFL rule. It works a lot like extra innings in baseball. Each team gets one possession, starting at the opponent's 25-yard line. If, after both teams have completed their drives, one team is ahead, the game is over. But if it's still tied – perhaps each team scored a field goal on its one possession – it goes to another "inning," and each team gets another possession. This continues until the tie is broken.

A coin flip decides which team gets the ball first in overtime. The team winning the toss gets to choose whether to go on offense first, or on defense first. Almost always, it chooses defense first. (The choice applies only to the first OT; in each subsequent OT, the order is flipped from the previous.)

Is choosing "defense first" a good strategy? That's the subject of a paper from the newest JQAS, just released yesterday. The study is called "An Analysis of the Defense First Strategy in College Football Overtime Games," by Peter A. Rosen and Rick L. Wilson. (Download is free if you give an e-mail address.)

Rosen and Wilson looked at 328 overtime games they found since this overtime rule was instituted in 1995. They found that it appears that forcing the other team to go on offense first is, indeed, a good strategy. How good? The team going on defense first won 54.9% of overtime games.

.549 Team going on defense first
.451 Team going on offense first

The reason for the advantage is kind of obvious: the team "going second" (on offense) already knows what the first team did. And so, it won't make the mistake of settling for a field goal when the "first team" (on offense) scored a touchdown. On the other hand, the first team *can* make that "mistake." Faced with fourth-and-10 on the opponent's 20, it'll go for the 3 points, because it doesn't know yet that the opponent will score a TD.

It's important to keep in mind that this argument is not a proof. It points out the advantages of going second, but none of the disadvantages. What are the disadvantages? Well, they're probably small, but, in theory, teams can defend differently depending on the score. If the first team fails to score on their possession, a field goal will beat them. And so, they might set up some sort of defense that minimizes the chances of the second team scoring (it would increase the chance of a touchdown, but not as much as it decreased the chance of a field goal).

So, by theory alone, we couldn’t be sure, in advance, that going second is the optimal strategy. We needed the numbers to confirm – and they do.

(Here's an example of where the advantage might be reversed. Suppose that in overtime, there's a rule that you're only allowed to go for a field goal if the defense explicitly allows you to. In that case, the advantage of choice switches sides -- whether you go for a touchdown or a field goal depends on the *defense*, not the offense. So in that case whoever is on *defense* last has the advantage. They would use it by *never* letting the second team try a field goal if they had failed to score themselves.)

Alas, after showing us the results, the study doesn't really give us much extra useful information. The authors do run a regression, predicting the odds of winning from several variables: point spread, home/road, "momentum" (who scored last before overtime), and "pressure" (the points the first team scored in overtime).

As you would expect, "pressure" dominates the equation -- once you know how many points the first team scored, it's much easier to predict who's going to win. Only pressure and point spread turn out to be significant. The authors are surprised that home field advantage doesn't turn out to be important, but they don’t seem to realize that HFA is already included in the point spread.

They do note that the second team advantage seems to have declined recently. From 1995 to 2000, their winning percentage was .624; but between 2001 and 2006, they won at a rate of only .492 – less than .500! The authors don't have any firm explanation for this, and I can't think of one either. (Perhaps underdogs won a lot of coin tosses in the last several years, or something.)

In their conclusions, the authors state that going second is a good strategy. But then they write that "in situations where pressure is seven points (the offense first team scores a touchdown) and the defense first team is not a large favorite, the offense first team holds a considerable advantage." Er, yes, that's true, but I'm not sure that has anything to do with which strategy is better, and I'm not sure what the authors are trying to say here.

Indeed, they seem to treat "pressure" the same as all the other variables all the way through the study, which leads to confusion. The other variables are known prior to the overtime, and it makes sense to predict the outcome from them. But predicting the outcome of overtime from what actually happens in the overtime, and combining that with all those other variables ... well, it seems to me you can't learn much about strategy that way.


## Tuesday, April 24, 2007

### Methanometrics

My natural gas bill arrived yesterday. On it was a note that said

"Your year-to-date gas consumption has increased by 11% compared to last year. Temperatures have been 6% colder."

This confused me a bit.

First, 6% colder? What does that mean? You can't go by the usual temperature measures. Suppose last year it was 2 degrees, but this year it was only 1 degree. Is that 50% colder? (Sure wouldn't feel like it.) Or if the average this year is minus one, do you say it was 200% colder? Or if last year it was zero, and now it's –1, that must be infinity colder, right?

If I remember my grade 12 chemistry, the only scale in which you can meaningfully calculate percentage is the Kelvin scale (or the Rankine scale, for you Fahrenheit devotees). You get the Kelvin temperature by adding 273 to the Celsius temperature. If I understand it, Kelvin is a measurement of a real quantity, so that 200 actually is twice as hot as 100, in a real physical sense.

So suppose last winter the average temperature here in Ottawa was –5 Celsius (23 F). That's 268 Kelvin. Six percent colder than that is about 252 Kelvin, which is –21 Celsius (-6 F). But there's no way it was *that* cold this winter. We had a few nights of –21, but it couldn't have been the *average*.
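Here is the Kelvin arithmetic as a short Python sketch, in case anyone wants to plug in their own city. The –5 C starting point is the same guess as above, and I use the rounded offset of 273 to match the text (273.15 is the more precise figure); the variable names are mine.

```python
KELVIN_OFFSET = 273  # rounded, as in the post; 273.15 is more precise


def c_to_f(celsius):
    """Convert a Celsius temperature to Fahrenheit."""
    return celsius * 9 / 5 + 32


last_winter_c = -5.0                        # assumed average from the post
last_winter_k = last_winter_c + KELVIN_OFFSET

# "6% colder" taken literally, on the absolute scale.
this_winter_k = last_winter_k * (1 - 0.06)
this_winter_c = this_winter_k - KELVIN_OFFSET

print(f"last winter: {last_winter_k:.0f} K")
print(f"this winter: {this_winter_k:.1f} K = {this_winter_c:.1f} C "
      f"= {c_to_f(this_winter_c):.1f} F")
```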

"Temperatures have been 6% colder," then, is just plain wrong.

(Besides, temperatures can't be "colder" than each other, just "lower." That's for the same reason that \$10 is "more" than \$5, not "richer" than \$5. But I digress.)

So what does that 6% mean, then? Maybe they mean that it was 6% colder *relative to inside*. Suppose again that last year was –5 C, and indoor temperature is 20 C (68 F). That means the outdoors were 25 C (45 F) colder than indoors. So this year would be 6% more than that, or 26.5 C (47.7 F) colder than indoors, which would mean the outside temperature was –6.5 C (20 F).
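The "relative to inside" reading can be sketched the same way. This is only my guess at what the gas company might mean, with the same assumed 20 C indoor and –5 C outdoor temperatures; the names are mine. Note that a temperature *difference* converts to Fahrenheit by the 9/5 factor alone, without the 32-degree offset.

```python
def c_to_f(celsius):
    """Convert a Celsius temperature (not a difference) to Fahrenheit."""
    return celsius * 9 / 5 + 32


indoor_c = 20.0        # assumed thermostat setting
last_outdoor_c = -5.0  # assumed average outdoor temperature last year

# How much colder it was outside than inside last year.
last_gap_c = indoor_c - last_outdoor_c

# "6% colder" read as a 6% bigger indoor/outdoor gap.
this_gap_c = last_gap_c * 1.06
this_outdoor_c = indoor_c - this_gap_c

print(f"gap last year: {last_gap_c:.1f} C ({last_gap_c * 9 / 5:.1f} F)")
print(f"gap this year: {this_gap_c:.1f} C ({this_gap_c * 9 / 5:.1f} F)")
print(f"outdoors this year: {this_outdoor_c:.1f} C "
      f"({c_to_f(this_outdoor_c):.1f} F)")
```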

Well, that actually sounds reasonable. Maybe that's what they're talking about. But why didn't they just say so? Maybe it's too many words: "this year, the relative temperature between outdoors and indoors was 6% less." Yeah, that's kind of unwieldy. But at least it's *correct*!

Regardless, why are they telling me that in the same note that they mention my 11% increase in consumption? Are they implying that the two numbers should be the same? I don't know enough physics to comment on whether 6% more degrees should mean 6% more fuel. They probably know what they're doing, and they have physicists on staff, so probably they're right. But then, you'd think those physicists would have spotted the error in the 6% figure.

Anyway, this is more marketing than information. It does seem that the gas company is trying to hint that I'm using too much energy. Along with the bill, they sent me a bunch of coupons in the mail to use on fluorescent light bulbs (which, paradoxically, would have me use even more gas, because my incandescent bulbs give off heat and reduce my furnace usage). So I assume it's all part of a campaign to subtly encourage me to reduce my overall consumption.

If they do have ulterior motives, my skepticism is activated, and I'm not willing to trust their numbers. They're going to have to fix up their errors, and then convince me that their calculations are correct.

In short, if they want me to do more to stop global warming, they're going to have to show me their Al Gore Rythm.

## Monday, April 23, 2007

### Another academic champion of peer review

Here's an interesting guest posting by Steve Walters at the "Wages of Wins" blog.

Walters warns us against putting too much faith in any individual sabermetric study,

"that we need to be careful before we conclude that some “study” by anyone actually “proves” something."

Which is actually excellent advice – studies are often flawed, or incomplete, and there's often statistical error. But I find Walters' arguments a little odd.

He starts out by mentioning Bill James' famous study (in his 1982 Abstract) on player aging. James analyzed all players born in the 30s, and found that they tended to peak, both individually and as a group, at age 27.

Ha! responds Walters. We shouldn't have believed James. Because, 20 years later, Professor Jim Albert did another study, and found that, indeed, while players born in the 30s did peak at 27, players both before and after peaked later. In fact, players born in the 1960s appear to have peaked at almost 30!

Walters writes,

"James’s findings were… well, flukey. ... Why do I bring this up? Emphatically not to suggest that profs always know more than best-selling writers like Bill James. The point is that it’s actually damned hard to figure out what’s really, really true by sifting through numbers. Sometimes profs do it better than intelligent laymen, and sometimes the reverse is true."

---
(Ironically, a careful reading of both studies shows that they're not really contradictory. "Guy" points out in the comments to Walters' post that the James study was based on 502 hitters. The Albert study was based on only 61 hitters, those with at least 5,000 plate appearances. As Guy writes, "it’s possible — indeed, likely — that a sample of players with longer than average careers will have peaked at a later than average age."

Also, Guy notes that the two studies used different measures -- one used bulk value, and one used hitting rates. So, again, the two studies aren't constrained to give the same results.

So perhaps Walters jumps to conclusions when he argues that the "27" hypothesis is convincingly disproven. But I will proceed as if it is.)

---

Perhaps I'm hypersensitive on the subject of academic vs. non-academic researchers, but geez ... "best-selling writer?" "Intelligent layman?"

When it comes to sabermetrics, calling Bill James a "best-selling writer" is kind of missing the point, like calling Abraham Lincoln a "calligrapher" because he wrote the Gettysburg Address in longhand.

And calling Bill James an "intelligent layman" because he doesn't have a Ph.D. is like calling Adam Smith a layman because he never took an econometrics course. It's like calling Shakespeare a layman because he never studied King Lear at the doctoral level. It's like calling Isaac Newton a layman because he never took a high-school calculus course.

As for the broader point, this particular case is not a question about who does it "better." Bill James found, correctly, that one group of players peaked at 27. Jim Albert found, correctly, that players born in other decades peaked at other ages. This isn't a case of "better," or "worse". It's just the scientific method. One researcher extends the work of another, and sometimes finds slightly different results. That's how science proceeds.

From that standpoint, Walters is correct; it's always "damned hard to figure out what's really, really true." That's the case whether it's numbers, or whether it's medicine, or whether it's physics. A researcher publishes a result, and, if the evidence is convincing, it's accepted as true – but only until other evidence comes along. And when that happens, the original researcher is not at fault, nor has he done anything "worse" than the guy who proves his theory wrong.

But perhaps Walters just picked a bad example, or found Bill James to be an irresistibly juicy target. Because he immediately starts talking about errors in logic or methodology, rather than just insufficient evidence:

"[Sometimes] a researcher’s methodology may unintentionally twist things in a particular way. Or a boatload of statistical subtleties may confound things."

That's absolutely true. There are lots of studies, both academic and "amateur," that have flaws – huge, obvious flaws. (Some of them I've reviewed on this blog.) The Bill James study he quotes, though, isn't one of them. But they do exist, and in fairly large numbers.

So when should we trust, and when shouldn’t we? I'd argue that it's just common sense. Don't rely on anything just because it says so in a single paper. Assume that the more a result is cited and used, the more likely that it's been replicated, or found to be sound. If you see two conflicting results, try following the implications of the results and see which ones make sense. And if you're still not confident, read the paper itself and see if the methodology holds up.

Walters has a different solution: rely mostly on academic peer review.

Does academic peer review work in sabermetrics? I say no – I think academic peer review has largely failed to separate the good work from the flawed.

And peer review certainly wouldn't have worked in the Bill James case. How would the peer reviewers have noticed a flaw? And what would the referees have said?

"You know, Mr. James, that's a good piece of work. But it's possible that aging patterns for players born in the 30s may not be the same as for other decades. So we have to reject your paper."

Or maybe, "we weren't sure if the results are generalizable, so we pulled out our Sporting News guides, and spent three weeks repeating your study for all other decades. And it turns out they're different. So we're rejecting your paper."

Or, "Even though it's only 1982, how do we know that players born in the 1960s, who aren't even in the major leagues yet, won't peak closer to 30? So I'm afraid we have to reject your paper."

None of those seem very realistic ... I'm not sure how Mr. Walters thinks any economist, in 1982, would have spotted the results as "flukey." Or on what other criterion they would have rejected it. In every respect, the study is truly outstanding.

Even if you accept that academic peer review works for sabermetrics – which I think it doesn't – Walters admits that it takes "excruciating months" to go through the process "... and a few jealous, picky, anonymous rivals get to dissect our work."

And it's a basic theorem of economics that when you tax something, you get less of it. Forcing researchers to endure "excruciating months" of "dissection by jealous rivals" is a pretty hefty tax. And, fortunately, sabermetric research is something anyone can do. Retrosheet provides free, high-quality data to everyone. You don't need to dissect rats in a dedicated laboratory, or have access to expensive particle accelerators, in order to discover sabermetric knowledge.

As a result of these two factors, a low-tax alternative jurisdiction has sprung up. The non-academic sabermetric community, fathered by Bill James in the 80s, has flourished – and almost all our wealth of knowledge in the field today has come from those "amateurs." This is where the knowledge now resides. And it is my belief that of the true "peers" of the best sabermetric researchers today, at least ninety percent of them work outside of academia. There are several websites where studies can get instant evaluations from some of the best sabermetric researchers anywhere. Academic peer review simply cannot compete, not just in turnaround time, but also in quality.

One more excerpt:

"When you’re consuming statanalysis, ask yourself whether the author is an expert or pseudo-expert—and even then whether other experts have had a crack at debunking the work. (E.g., it’s notable that [The Wages of Wins] is from a renowned university press, and that much of the research on which it’s based was initially published in refereed journals.)"

It's ironic -- but on this last quote, I agree with Walters completely.


## Sunday, April 22, 2007

### AL/NL payroll gap now even bigger

A couple of months ago, I posted about the AL being superior to the NL. One of the factors was the salary gap – in 2006, the average American League team spent \$84 million on payroll, while the National League teams spent only \$72 million.

Today, in the New York Times "Keeping Score" column, Dan Rosenheck notes that the difference in payrolls has increased. The NL moved only slightly, to \$74MM, but the American League jumped substantially, to \$93MM.

The gap is now almost \$20 million. That's about four wins per team per season. (Or maybe it's five. But because of interleague play, let's call it four.) So a National League team that goes 83-79 is probably no better than an American League team going 79-83.

By the log5 method, AL teams should play .524 ball against NL teams this year. That should give the AL a 132-120 record.
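For reference, here is the log5 calculation as a short Python sketch. The inputs are my reading of the four-win gap: an AL team that is roughly a .512 team (83-79) against a common baseline, and an NL team that is roughly .488 (79-83). The 252 interleague games are just 132 plus 120 from the record above; the names are mine.

```python
def log5(a, b):
    """Chance that a team of quality a beats a team of quality b.

    Both a and b are winning percentages against a common .500 baseline.
    """
    return a * (1 - b) / (a * (1 - b) + b * (1 - a))


al_quality = 0.512  # assumed: about 83-79
nl_quality = 0.488  # assumed: about 79-83

al_vs_nl = log5(al_quality, nl_quality)
interleague_games = 252  # 132 + 120

print(f"AL winning percentage vs NL: {al_vs_nl:.3f}")
print(f"expected AL wins: {al_vs_nl * interleague_games:.0f} "
      f"of {interleague_games}")
```

One sanity check on the formula: two equal teams come out at exactly .500, and any team comes out at its own winning percentage against a .500 opponent.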

Last year, the AL was 154-98. I still think that was just a fluke.

Hat tip: The Griddle


## Thursday, April 19, 2007

### Why so few blacks in MLB? A convincing theory

Why are there now so few blacks in baseball? According to this very interesting CNNMoney.com article, by Chris Isidore,

The percentage of [American] black major league players is now 8.4 percent ... That's a touch less than half the level it was at only 10 years ago. Some teams, such as the Atlanta Braves and Houston Astros, have no black players on their rosters.

Why is that? Where have all the black players gone?

For one thing, Isidore writes, major-league teams are devoting more and more of their scouts to checking out players outside the United States. That's because those players aren't eligible for the draft, and can be signed directly. If you scout a player in Louisiana, there's only a 1-in-30 chance you'll wind up with his rights. But if you nurture a prospect in Venezuela, you have a pretty good chance of getting him to sign a contract with you.

But, according to Isidore, there's another important factor, and that's the fact that (as documented in "Moneyball,") teams are increasingly drafting players out of college, instead of high school.

And more white players go to college than black players, for socio-economic reasons. Result: fewer black draft choices.

According to the article, when the draft began in 1965, 56% of drafted players were high-schoolers. In 2005, it was down to 35%. Since it probably takes 15 years for the full effect of those changes to show up in rosters, that seems like a pretty good explanation. I found Bill James' famous draft study from his 1984 newsletter. From James' charts, here are the proportions of draft choices who were high school players:

1965: 71% (article: 56%)
1967: 100%
1969: 88%
1971: 96%
1973: 74%
1975: 74%
1977: 80%
1979: 68%
2005: 35% (from article)

I don't know for sure why James' 1965 figure doesn't match the article's. But James used a weighted formula (so early choices counted more than late choices), and only considered the top 50 draftees, so that's probably the difference.

In any case, if the numbers are correct, the percentage of high-school players fell slowly but steadily until 1979. Then, between 1979 and 2005, it dropped by half. One thing you could do to check further is actually count what proportion of college players are white, compared to high-school players. That would help confirm the hypothesis.

But I think this theory is pretty good ... I bet this is a big part of the answer.


## Wednesday, April 18, 2007

### Review of "The Baseball Economist"

I reviewed J. C. Bradbury's "The Baseball Economist" over at "The Griddle." Comments welcome here or there.


## Tuesday, April 17, 2007

### NBA teams will "tank" for draft choices. Why don't NFL teams do the same?

Two recent posts from the Sports Law Blog talk about NBA teams losing games on purpose. They do that in order to finish worse in the standings, and improve their chances of getting an early pick in the draft.

In the NBA, the draft order is determined by lottery; the worse a team's record, the more lottery tickets it gets for high picks. If a team isn't going to make the playoffs, it might be in its interest to try to lose, or at least not try so hard to win. Nobody deliberately tanks, but coaches may make extensive use of second-tier players, for the ostensible reason that they need to evaluate them for next year.

The first post talks about ways to stop this from happening by changing the incentives. For instance, if all non-playoff teams had an equal shot at the lottery, regardless of record, there would be no reason to lose. Or if every rookie was a free agent, you wouldn't need a draft at all.

The second post is, in my opinion, more interesting. It asks why we see "tanking" happen in basketball, but not in other sports. It doesn't happen in the NHL or NFL. Plus, the NFL doesn't use a lottery, so you'd think the tendency to lose meaningless games would be stronger, not weaker.

The article comes up with five reasons the NBA is unique in this regard:

-- in the NBA, there's often only one or two impact players, and then a steep dropoff in quality. So a number one pick could be extremely valuable, while the number two pick is not a lot of use.

-- the value of a superstar in the NBA is much higher than in other sports, because there are only five men on the court, and top players may see action for almost the entire game.

-- teams who draft a superstar often improve by a huge margin in the following season. (This seems to be simply a consequence of the previous point.)

-- since there's not as much money wagered on NBA games as NFL games, there's less outrage when teams don't try hard to win.

-- "nobody cares" when bad teams lose; there are 82 games in the season, so no game is that big an event.

My vote goes to the first two points. And I'd add a third:

-- it's easier to lose a game in basketball than in other sports. You only have to substitute five players, rather than 30 or 40 in the NFL. There's a big dropoff between superstars and bench players, so the players you sub in are substantially worse. And, finally, in the NBA, a substantially worse team will almost always lose to a substantially better team – there just aren't all that many upsets.

This makes the task of losing a whole lot easier. And that's important; if you're going to take flak for obviously trying to lose, at least you want the strategy to work.


## Saturday, April 14, 2007

### Can payroll buy wins? "Percentage of variance explained" doesn't tell you

I've always been uncomfortable with studies that express their results in "percentage of variance explained." For instance, in "The Wages of Wins," the authors run a regression of wins on salaries, and get an R-squared of .18, which means that "payroll explains about 18% of [the variance of] wins." Since 18 percent is a small number, they argue that the relationship is not strong, and the authors conclude that "you can't buy the fan's love" by spending money to get wins.

I don't think that's correct – and for reasons that have to do with statistics, not baseball. The percentage of variance explained does NOT necessarily tell you anything about the strength of the relationship between the variables.

(Technical note: I'm going to use "percentage of variance explained" most of the time, but you can probably just substitute "R-squared" if you prefer. They're the same thing.)

-----

First, the percentage figure doesn't tell you, outright, the importance of payroll – it tells you the importance of payroll *relative to the total importances of all factors*. If those total importances go up, the percentage goes down.

One of those other factors is luck. The more games you have, the more the luck evens out. So if you were to analyze five seasons instead of one, the luck would drop, so the payroll-to-luck ratio would increase. Then, instead of payroll explaining only 18% of the variance in wins, maybe it would explain 30%, or even 40%.

Going the other way, if you ran a regression on a single day's worth of games, there's a lot more luck. Over an entire season, there's no way the Yankees are going to be worse than the Brewers – but on a single night, it's quite possible that New York might lose and Milwaukee might win. So, on a single game, payroll might explain only, say, 3% of the variance.

The relationship between salary and expected wins is a constant: if paying \$200 million buys you an (expected) .625 winning percentage, it should buy you that .625 over a day, a week, a month, or a year. But depending on how you do the study, you can get a "percentage of variance explained" of 3%, or 18%, or 30%, or 40%. So, obviously, the number you *do* come up with can't, by itself, tell you anything about the strength of the relationship between payroll and wins.
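To make that concrete, here is a rough Python sketch of how the "percentage of variance explained" moves with the number of games while the payroll effect stays fixed. The two talent variances are hypothetical numbers I picked so that one 162-game season comes out near 18%; only the luck term, approximated by the binomial variance .25/games for a roughly .500 team, changes from line to line. The exact percentages depend entirely on those made-up inputs, so treat the direction, not the values, as the point.

```python
# Hypothetical variances of team winning percentage (not from any study).
VAR_PAYROLL = 0.00094  # spread in true talent attributable to payroll
VAR_OTHER = 0.00272    # spread in true talent from everything else


def luck_variance(games, p=0.5):
    """Binomial variance of an observed winning percentage over `games` games."""
    return p * (1 - p) / games


def share_explained(games):
    """Fraction of the variance in observed winning percentage due to payroll."""
    total = VAR_PAYROLL + VAR_OTHER + luck_variance(games)
    return VAR_PAYROLL / total


for label, games in [("one game", 1), ("one season", 162), ("five seasons", 810)]:
    print(f"{label:>12}: {share_explained(games):.1%} of variance explained")
```

The payroll term never changes in this sketch. All of the movement in the percentage comes from the luck term in the denominator.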

-----

We know that smoking causes heart disease, and that the relationship is pretty strong.

Now: suppose you do a regression. How much of the variance in heart disease can be explained by lifetime smoking?

There's no way to tell. It depends on the distribution of smokers.

Suppose, out of the entire world population, only one person smokes. His risk of heart disease increases substantially. But how much of the variance in heart disease is explained by smoking? Close to zero percent. Why? Because even in the absence of smoking, there's substantial variance in the population of six billion people. There's age, there's diet, there's exercise, and there's genetic predisposition, among many other things. The variance contributed by the one smoker, compared to the variance caused by six billion different sets of other causes, is effectively zero.

Now, suppose that half the world smokes, and half doesn't. Now, there is lots more variance in heart disease rates. Instead of just variance caused by genetics and eating habits, you now have, on top of that, 50% of the population varying hugely from the other half in this one area. If smoking is extremely risky compared to eating and genetics, you might find that (say) 40% of the variance is explained by smoking. If it's only moderately more risky, you might get a figure of 15% or something.

What if everyone in the world smokes except one person? In that case, everyone's risk has risen by the same amount, except for that one non-smoker. So the variance caused by smoking is, again, very low. It's the same situation when only one person *doesn't* smoke as when only one person *does* smoke. And so, again, about zero percent of the variance is explained by smoking.

So, in theory, how much of the variance in heart disease should be explained by smoking? It could be almost any number. It depends on the variance of smoking behavior just as much as it depends on the effects of smoking.
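The three scenarios can be put into one small formula. If a fraction q of the population smokes, and smoking adds a fixed amount to a person's risk, then the variance contributed by smoking is q(1 − q) times the square of that added risk. Here is a Python sketch; the added risk and the "everything else" variance are hypothetical numbers chosen only for illustration, and the function name is mine.

```python
def share_explained_by_smoking(prevalence, extra_risk, other_variance):
    """Fraction of the variance in risk that is due to smoking.

    prevalence     -- fraction of the population that smokes (0 to 1)
    extra_risk     -- how much smoking adds to an individual's risk
    other_variance -- variance in risk from age, diet, genetics, and so on
    """
    smoking_variance = prevalence * (1 - prevalence) * extra_risk ** 2
    return smoking_variance / (smoking_variance + other_variance)


EXTRA_RISK = 0.10      # hypothetical effect of smoking, identical in every scenario
OTHER_VARIANCE = 0.01  # hypothetical variance from all other causes

scenarios = [
    ("one smoker in six billion", 1 / 6e9),
    ("half the world smokes", 0.5),
    ("one non-smoker in six billion", 1 - 1 / 6e9),
]
for label, prevalence in scenarios:
    share = share_explained_by_smoking(prevalence, EXTRA_RISK, OTHER_VARIANCE)
    print(f"{label:>30}: {share:.2%}")
```

The effect of smoking on an individual is the same in all three lines. Only the mix of smokers and non-smokers changes.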

If you did an actual study, and you found that 18% of the variance in heart disease was explained by cigarette use, what would that tell you about the riskiness of smoking?

Almost nothing! It could be that (a) there is lower variance in how much people smoke, but the risk of smoking is higher; or (b) there is higher variance in how much people smoke, but the risk is lower. Either of those possibilities is consistent with the 18% figure.

Going back to the baseball example: if 18% of the variance is explained by salaries, it could be that (a) teams vary little in how much they spend, but money buys wins fairly reliably, or (b) teams vary a lot in how much they spend, but money buys wins fairly weakly.

Which is correct? We can't tell yet. The 18% number, on its own, simply does not tell you anything about whether money can buy wins.

-----

So if the percentage of variance explained doesn't tell you a whole lot about the relationship between the variables, then what does? Answer: the regression equation.

"The Wages of Wins" doesn't give full regression results for their payroll study, but they do mention the figure of about \$5 million dollars per win. So the computer output from their regression probably looks something like

R-squared = .18
Expected Wins = 65 + (Payroll divided by 5 million)

The .18 tells us little, but the equation for wins tells us almost everything we need to know – that the actual relationship between wins and payroll is \$5 million a win.

Unlike the R-squared, the \$5 million per win should work out the same regardless of whether our regression is based on a day, a season, or even five seasons. And it should come out the same regardless of whether teams vary a lot in spending, or just a little.

If you want to know the strength of the relationship between X and Y, the R-squared won't tell you. But the equation will.
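Here's a small sketch of that point, using two made-up leagues rather than real payroll data. Both follow the equation above exactly (65 wins plus one win per \$5 million), with an assumed 10 wins of luck added to one team and subtracted from another at each payroll level. The only difference between the leagues is how widely payrolls are spread.

```python
def fit_line(xs, ys):
    """One-variable least squares: returns (intercept, slope, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope, sxy ** 2 / (sxx * syy)


def make_league(payrolls, luck=10):
    """Two hypothetical teams per payroll level (in $ millions): one lucky, one unlucky."""
    xs, ys = [], []
    for payroll in payrolls:
        for bump in (+luck, -luck):
            xs.append(payroll)
            ys.append(65 + payroll / 5 + bump)
    return xs, ys


narrow = make_league([70, 75, 80, 85, 90])    # teams spend about the same
wide = make_league([40, 60, 80, 100, 120])    # teams spend very differently

for name, (xs, ys) in (("narrow", narrow), ("wide", wide)):
    intercept, slope, r2 = fit_line(xs, ys)
    print(f"{name}: wins = {intercept:.1f} + {slope:.3f} * payroll, "
          f"${1 / slope:.1f} million per win, R-squared = {r2:.3f}")
```

Both fits recover the same equation and the same \$5 million per win, but the R-squared is about .02 in the narrow league and about .24 in the wide one.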

Of course, the estimate will be much less precise for samples in which there's less data. If we took 162 different single days, and ran 162 different regressions, some would come out to \$20 million a win, some would come out to \$7 million a win, some would come out to \$0 a win, and some would even come out negative, to maybe -\$10 million a win. But those 162 estimates should average out to \$5 million a win, and cluster around \$5 million with a normal distribution.

If you only had a single day's worth of data, you might find that the 95% confidence interval for the cost of a win comes out very wide. For instance, the interval might say that a win could cost as much as \$20 million, or as little as –\$6 million. That huge range doesn't constitute very useful information – in fact, because zero is in that interval, you can't even conclude with statistical significance that money buys wins at all! And so you'd probably decide not to conclude anything based on the equation, at least until you could rerun the study for a full season's worth of data to get a more precise estimate. But you might still be tempted to look at the r-squared of .03, and say that "only 3% of the variance is accounted for by the model."

You can say it, but, taken alone, it doesn't tell you much about whether there's a strong relationship in real-life terms. Only the regression equation can tell you that.

-----

So what *does* the "percentage explained" figure tell you? It tells you how much more accurate your predictions of wins would be if you had the extra information provided by salary.

Suppose that at the beginning of the season, you had to predict how many wins each team would get, knowing absolutely nothing about any of them. Your best prediction would be that each team would win 81 games. Some of your predictions would be off by many games; one or two might be right on. The standard deviation of your errors would probably have been about 12 games, which means the average variance would have been the square of that, or about 144.

Now, suppose you had, in advance, the information from the regression: the r-squared of .18, each team's payroll, and the finding that each \$5 million in payroll would have bought one win. You would now adjust your prediction for each team based on its salary. You'd predict the Yankees would be at 97 wins, the Nationals would be at 76, and so on.

The R-squared of .18 would tell you that the extra information covers 18% of the variance of 144. Of the 144 points of variance, 26 points are "explained by payroll". What's left is 118 points. The square root of 118 is about 11, so the extra information allowed you to cut your typical error from 12 games down to 11 games.
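The same arithmetic as a short Python sketch, using only the 12-game standard deviation and the .18 from above. The second function is my own addition, turning the question around to ask what R-squared you'd need to reach a given typical error.

```python
import math


def typical_error_after(sd_before, r_squared):
    """Standard deviation of prediction errors once the explained variance is removed."""
    variance_before = sd_before ** 2
    unexplained = variance_before * (1 - r_squared)
    return math.sqrt(unexplained)


def r_squared_needed(sd_before, sd_target):
    """R-squared required to shrink the typical error from sd_before to sd_target."""
    return 1 - (sd_target / sd_before) ** 2


print(round(typical_error_after(12, 0.18), 2))  # about 10.87 games
print(round(r_squared_needed(12, 6), 2))        # 0.75 to cut the error in half
```

Because the error shrinks with the square root of the unexplained variance, even a respectable-sounding R-squared takes only a little off the typical miss.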

The "18%" figure answers the question, "How valuable is knowing the team's payroll if you're trying to predict team wins for the year?"

And that's a completely different question – and a less important one -- than "Can increasing your payroll buy you more wins?"

One last example, just for the sake of overkill. Suppose, in 2020, a vaccine for cancer is invented. It works 90% of the time. Almost the entire population rushes out to get vaccinated.

At that time, you ask yourself these two questions:

1. If I'm trying to predict whether Joe Smith will get cancer, how valuable is knowing if he had the vaccine? Answer: it's not valuable at all – almost everyone has had the vaccine, so knowing that Joe is one of them doesn't give me any useful information. The "percentage of variance explained" is zero.

2. But how strong is the relationship between the vaccine and the incidence of cancer? Answer: extremely strong.

-----

"The Wages of Wins" study shows that payroll does indeed buy wins, at a rate of about \$5 million each. The "percentage of variance explained" is almost completely irrelevant.


## Wednesday, April 11, 2007

### The MLB hitting explosion: another view

What caused the increase in MLB offense starting in the 1990s? J. C. Bradbury speculated in a recent New York Times article, and I had some comments here.

Now, I find a "Nine" article from 2002, by Benjamin G. Rader and Kenneth J. Winkle, called "Baseball's Great Hitting Barrage of the 1990s." (Subscription required.) There, they note that there was a similar hitting increase in the minor leagues, but it started two years later, in 1995-96 instead of 1993-94. Numbers below are per 100AB:

90-95: 27.4 H, 14.3 R, 2.2 HR
1995 : 27.1 H, 14.1 R, 2.3 HR
1996 : 29.4 H, 15.9 R, 3.0 HR
96-99: 28.0 H, 15.5 R, 3.1 HR

Then, they list several possibilities for the MLB increase, and comment on each.

1. The Ball. The authors call this the "most questionable" of the possible hypotheses, citing Bud Selig's denial, and tests commissioned by MLB that found no difference in the physics of today's balls compared to yesteryear's.

I don't find this as convincing as the authors do.

2. The Bat. According to manufacturers, the authors say, the average bat weight dropped from 33 ounces in 1991 to 31 ounces in 1996. "Whereas earlier in the century, players occasionally went for several weeks, and sometimes for an entire season without breaking a bat, players in the 1990s frequently broke several bats in a week of play."

This hypothesis has the advantage that it can probably be checked, if there are records of which players used which bats when. But even if fully 50% of players changed bats, those players would have to have increased their home run count by 80% to account for the observed 40% increase.
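The arithmetic behind that 80% figure, as a quick sketch. It assumes the players who switched bats and the players who didn't were hitting home runs at the same rate beforehand, which is my simplification and not something the article says.

```python
def required_individual_increase(overall_increase, share_of_players):
    """Increase needed among the players who changed, if nobody else changed at all."""
    return overall_increase / share_of_players


for share in (0.25, 0.50, 0.75, 1.00):
    needed = required_individual_increase(0.40, share)
    print(f"{share:.0%} of players switch bats -> each needs {needed:.0%} more home runs")
```

The smaller the share of players who switched, the more implausible the per-player jump has to be.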

3. Cozier Ballparks. The authors dismiss this one after looking at foul territory, distance to the fences, and fence height. They don't look at runs scored in the parks, which would be a more direct way to look at the evidence.

4. Dilution of Pitching Talent. The authors correctly point out that expansion would have diluted batting talent at the same time.

5. Muscles. "Although we are unable to offer statistical support ... it seems likely that the increasing strength of the hitters has contributed significantly to the offensive barrage." The authors cite dietary supplements, but note that average player size and weight were "virtually identical" in 1990 and 1998.

6. A New Style of Hitting. The authors argue that the high strike was taken away from pitchers early in the 1990s, and quote a Kirk McCaskill complaint from 1994. They say that it also became unacceptable to throw inside, which again benefitted the hitters, who could stand closer to the plate. And, finally, they argue that with the availability of videotape, coaching was able to improve the performance of hitters much more than the performance of pitchers.

I'm not sure what to make of this one; it's mostly anecdote. And would all these factors have happened suddenly, between 1992 and 1994?

Hat Tip: Baseball Think Factory, which points to a sequel to this study.


## Saturday, April 07, 2007

### A new "protection" study using ball/strike data

Here's a baseball study described on a brand new blog by Ken Kovash, who works for "Freakonomics" economist Steve Levitt.

Kovash sets out to check whether "protection" exists. But rather than checking the hitter's batting line for evidence, Kovash checks what the pitcher throws him. He finds two statistically significant effects:

-- pitchers are more likely to throw fastballs when the on-deck hitter is better;
-- pitchers are more likely to throw strikes when the on-deck hitter is better.

I can't completely evaluate the study, because Mr. Kovash's blog just posts an outline of the method. I can't even be sure how to interpret the results, because he gives a coefficient without saying whether it's an increase in the probability, or an increase in the log of the odds ratio.

But I think that either way, the results are barely "baseball significant." Assuming the coefficient is a straight increase in the proportions, then:

-- An increase of .200 in the OPS of the on-deck hitter would increase the chance of a strike by about 3/10s of a percentage point (so if the pitcher would normally throw 60% strikes, he would throw 60.3% strikes instead).

-- Similarly, with the same .200 increase, the pitcher will throw 0.2 percentage points more fastballs – say from 40% to 40.2%, or whatever.
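To see why the interpretation question doesn't change the conclusion much, here's a sketch that applies the strike coefficient both ways. The value 0.015 is not from Kovash's post; I backed it out of the 0.3-percentage-point figure above (0.003 divided by .200), and the 60% base rate is the same illustrative number used above.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def logistic(z):
    return 1 / (1 + math.exp(-z))


def rate_if_probability(base_rate, coef, ops_gain):
    """Coefficient read as a straight change in the probability of a strike."""
    return base_rate + coef * ops_gain


def rate_if_log_odds(base_rate, coef, ops_gain):
    """Coefficient read as a change in the log-odds of a strike."""
    return logistic(logit(base_rate) + coef * ops_gain)


COEF = 0.015      # assumption: implied by 0.3 percentage points per .200 of OPS
BASE_RATE = 0.60  # illustrative base rate of strikes

print(f"probability reading: {rate_if_probability(BASE_RATE, COEF, 0.200):.4f}")
print(f"log-odds reading:    {rate_if_log_odds(BASE_RATE, COEF, 0.200):.4f}")
```

Under the log-odds reading the change at a 60% base rate is even smaller than under the straight-probability reading, so the effect stays tiny whichever one Kovash meant.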

What surprises me is not necessarily that the differences are so small, but that such tiny effects are statistically significant – I guess that's what happens when you have four full years' worth of data.

Also, you could argue that the "chance of a strike" number doesn't actually show "protection," since it could be caused by the batter's actions -- swinging at outside pitches he wouldn't normally swing at, or some such.

Hat tip: "Freakonomics" blog.


### So what *did* cause the home run explosion?

In a recent post, I suggested that the recent increase in home run rates might be due to players realizing that if they bulk up, some of their fly balls turn into home runs, and that might be the easiest way to increase offense.

But commenters here, and at BTF, pointed out that the increase pretty much took place instantly, over the two year period 1993-1994. And it's not likely that players bulked up instantly, so that theory is out.

So what was it that made home runs suddenly increase so much? Tangotiger pointed out this study at highboskage.com, which suggests it was a juiced ball. Which makes sense to me, except – what about strikeouts? It wasn't just home runs that increased, but also Ks.

On his blog, Tangotiger gives us several rates of increase from 1992-94:

pre93 post93
----- -----
0.285 0.298 BABIP
0.769 0.739 contact
0.292 0.336 XBH/H
0.318 0.351 HR/XBH
0.147 0.167 HR/K
0.530 0.487 BB/K
0.217 0.254 K/outs

(glossary: BABIP = batting average on balls in play. Contact = (roughly) non-K/AB. XBH = extra base hits.)

And, of course, home runs per game, up 43%:

0.721 1.033 HR/G

A juiced ball could easily explain the increase in home runs, but what about the increase in strikeouts? How would a juiced ball cause strikeouts? It's possible that the strike zone could have been enlarged at the same time, but walks were also up those years.
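For reference, here's a short sketch that turns Tangotiger's before-and-after rates into percentage changes. The last step is my own derivation, not something in his table: multiplying BB/K by K/outs gives walks per out, which shows how walks can be up even though BB/K is down.

```python
# (pre-1993, post-1993) rates copied from the table above
rates = {
    "BABIP": (0.285, 0.298),
    "contact": (0.769, 0.739),
    "XBH/H": (0.292, 0.336),
    "HR/XBH": (0.318, 0.351),
    "HR/K": (0.147, 0.167),
    "BB/K": (0.530, 0.487),
    "K/outs": (0.217, 0.254),
    "HR/G": (0.721, 1.033),
}


def pct_change(before, after):
    return (after / before - 1) * 100


for name, (pre, post) in rates.items():
    print(f"{name:8s} {pre:.3f} -> {post:.3f}  ({pct_change(pre, post):+.1f}%)")

# Derived: walks per out = (BB per K) * (K per out)
bb_per_out_pre = rates["BB/K"][0] * rates["K/outs"][0]
bb_per_out_post = rates["BB/K"][1] * rates["K/outs"][1]
print(f"BB/outs  {bb_per_out_pre:.3f} -> {bb_per_out_post:.3f}  "
      f"({pct_change(bb_per_out_pre, bb_per_out_post):+.1f}%)")
```

Strikeouts per out rise about 17%, and the derived walks per out rise by roughly 7 to 8%, so walks did go up, just not as fast as strikeouts.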

One possibility is that teams suddenly decided to go for more home runs at the expense of more strikeouts. But why would that have happened all of a sudden in 1993, continued into 1994, and then levelled off until today? And it would take a lot of teams to do that to make a 43% difference in home runs. Why would they all do it at once?

A possible, but implausible, hypothesis is this:

Teams suddenly discovered that the ball would be livelier in 1993. Either the commissioner told them, or they figured it out early in spring training. They informed their power hitters that deep fly balls were likely to leave the park this year. The hitters responded by swinging for the fences more, thus increasing their strikeouts at the same time as their home runs; pitchers responded by pitching more carefully, thus also increasing walks. In 1994, the trend continued; and, furthermore, singles hitters found themselves out of a job because what were previously marginal power hitters now were able to hit well enough to take their jobs. That pushed the effect forward for another year. But by 1995, all the adjustments had been made, and offense levelled off.

If that were true, we'd expect to see more fly balls hit. But we don't. I figured the FB/(FB+GB) ratio for the years 1992-1994:

1992 ... 41.6%
1993 ... 41.1%
1994 ... 40.9%

(Technical note: a ground ball was any Retrosheet play with /G. A fly ball was any play with /F, or a home run. These numbers might be very slightly off because I averaged all the monthly figures without weighting by PA.)
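To show how much the unweighted averaging mentioned in the note could matter, here's a sketch with hypothetical monthly counts. These are not real Retrosheet totals, and the pooled version weights by batted balls rather than by PA, as a stand-in.

```python
# Hypothetical (fly balls, ground balls) for three months of uneven length.
months = [(900, 1300), (1500, 2000), (1400, 2100)]


def fly_ball_share(fly_balls, ground_balls):
    """FB / (FB + GB)."""
    return fly_balls / (fly_balls + ground_balls)


# Simple average of the monthly ratios: every month counts equally.
unweighted = sum(fly_ball_share(fb, gb) for fb, gb in months) / len(months)

# Pooled ratio: bigger months count for more.
total_fb = sum(fb for fb, _ in months)
total_gb = sum(gb for _, gb in months)
pooled = fly_ball_share(total_fb, total_gb)

print(f"unweighted average of months: {unweighted:.4f}")
print(f"pooled over all batted balls: {pooled:.4f}")
```

With month sizes as uneven as these, the two methods still differ by well under a tenth of a percentage point, which is much smaller than the year-to-year changes listed above.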

My programming, or the Retrosheet data, could be off, but it does look like, in those years, fly balls went *down*, not up. So what's going on?

Another interesting thing is that the changes didn't occur gradually, month-over-month – almost all the change was between seasons:

Apr 1992: 9 HR per 500 PA
Sep 1992: 9 HR per 500 PA

Apr 1993: 11 HR per 500 PA
Sep 1993: 11 HR per 500 PA

Apr 1994: 14 HR per 500 PA
Aug 1994: 13 HR per 500 PA

What is also apparent here is that there were two separate increases, one to start 1993, and another to start 1994. And strikeouts show the same pattern – jumps between seasons, rather than within them.
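The within-season versus between-season comparison can be made explicit with the six figures above. This sketch uses only those rounded numbers, so the differences are only as precise as the rounding.

```python
# HR per 500 PA in the first and last listed month of each season (from the list above)
hr_per_500_pa = {
    1992: {"first": 9, "last": 9},
    1993: {"first": 11, "last": 11},
    1994: {"first": 14, "last": 13},  # 1994 ends with August
}

# Change from the first month to the last month of the same season
within_season = {year: m["last"] - m["first"] for year, m in hr_per_500_pa.items()}

# Change from the end of one season to the start of the next
between_seasons = {
    year: hr_per_500_pa[year]["first"] - hr_per_500_pa[year - 1]["last"]
    for year in (1993, 1994)
}

print("within season:  ", within_season)
print("between seasons:", between_seasons)
```

The within-season changes are 0, 0, and -1, while the two off-seasons account for jumps of +2 and +3.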

So it wasn't that it was slowly and gradually dawning on hitters that the ball was juiced: any factors responsible appear to have been in place at the beginning of 1993 and 1994.

I'm at a loss. What could be going on that home runs, strikeouts, and walks would all increase at the same time?


## Thursday, April 05, 2007

### 2007 Yankees to win 110 games, says math prof's simulation

According to math professor Bruce Bukiet, the Yankees will win 110 games this year.

Bukiet apparently uses a simulation to project the outcomes of games, and ran it on the 2007 season.

He also suggests a plan to reinvent the "win probability" approach:

Were the model to be commercialized, it could be updated on a play-by-play basis, which fans could monitor to see how every play changes the outcome of a game. “I think some fans would think that’s cool,” Bukiet said.

It's not possible to evaluate Bukiet's system from the article, but I don't think there's been any team in baseball history that would have been *expected* to win 110 games. The very few teams to win that many games mostly did so by luck. For what it's worth, here's my presentation where I found the best teams since 1961 only had 102-game talent (the 1969 Orioles and the 1998 Braves).

Also, TradeSports has the Yankees with only a 39-40% chance to win as many as 98 games, never mind 110.

(Thanks to John Matthew for the pointer.)


## Wednesday, April 04, 2007

### Doug Drinen on NFL draft choices

Doug Drinen has a series of three posts on the Massey/Thaler NFL draft paper, looking at the theoretical consequences of the findings, and comparing them to what actually happened.

Worth reading the whole thing, but here's a quick summary:

-- if you try to predict the sum of a team's next three years' actual wins from the value of its draft choices this year, you get an r-squared of only about .07 (which is an r of about .26). So there is some predictive value, but not much.

-- the "traditional" draft choice values are not (statistically significantly) less predictive of future success than the Massey/Thaler values (which are different because they consider value for money, instead of just bulk value). Even if you ignore the lack of statistical significance, the difference is very small.

-- if you sum all the Massey/Thaler draft pick values for all 32 teams, they are all so close that they're virtually the same. The worst teams (first picks) do get a very slight benefit over the other teams, but it's pretty close to zero.

I previously reviewed the Massey/Thaler paper here.


## Tuesday, April 03, 2007

### Is expansion responsible for the home run explosion?

Economist J. C. Bradbury, author of the recent (and enjoyable) book "The Baseball Economist," weighs in today on the New York Times op-ed page. Bradbury argues that recent historic highs in home runs and strikeouts are not because of steroids, but because of expansion.

His argument goes something like this:

-- Stephen Jay Gould showed, in his book "Full House," that the lower the standard deviation of a statistic, the better the players are, overall, at the skill it measures.
-- The standard deviation of performance is higher lately for both pitchers and hitters, which suggests a dilution of talent.
-- Dilution of pitching talent gives elite home run hitters lots of inferior pitchers to tee off on (and similarly for pitcher strikeouts).
-- The dilution of talent is most likely caused by expansion.
-- And so, expansion is the most likely explanation for the recent explosion in home runs.

I don't agree. Well, I agree partially – I think expansion is certainly responsible for part of the increase, but only a small part. There are three reasons, which I'll state here, then explain:

-- expansion can be shown to have only a small effect on batting statistics;
-- only home runs have exploded, while other stats have stayed much the same; and
-- population effects mitigate the effects of expansion.

First, expansion will cause only a small increase in home runs, not a large one. Suppose the major leagues expand from 28 to 30 teams. And suppose Joe Slugger hit 25 home runs before. How many will he hit now?

Well, 28/30 of the pitchers he faces in the expansion year will be the same ones who would have had jobs if expansion hadn't occurred. So Joe will hit 25 times 28/30 home runs against those pitchers, or 23.3 home runs.

That leaves 2/30 of the pitchers, who have jobs only because of expansion. Suppose they give up 50% more home runs than the average established pitcher. (That's probably high – if the average ERA is 4.50, 50% more than that average is 6.75. Expansion pitchers aren't *that* bad.) So against the expansion pitchers, Joe will hit this many homers: 25, times 2/30, times 1.50, which is exactly 2.5 home runs.

In the pre-expansion year, Joe hit 25 home runs. Post expansion, he'll hit 25.8 home runs. The difference is 0.8. That's about 3% more.

Three percent more home runs can't explain the recent explosion. Without the 1998 expansion, the calculation shows that Barry Bonds would have hit only 70.7 home runs in 2001, instead of 73. McGwire's record would have been 68 instead of 70. Sosa would have been 64 instead of 66.
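Here's the whole calculation as a small Python sketch, so the assumptions are easy to change. The 50% penalty for expansion pitchers is the deliberately generous guess from above; the function and parameter names are mine.

```python
def homers_after_expansion(homers, old_teams, new_teams, expansion_penalty):
    """Expected home runs once a share of opposing pitchers are expansion-quality."""
    established_share = old_teams / new_teams
    expansion_share = 1 - established_share
    vs_established = homers * established_share
    vs_expansion = homers * expansion_share * (1 + expansion_penalty)
    return vs_established + vs_expansion


after = homers_after_expansion(25, old_teams=28, new_teams=30, expansion_penalty=0.50)
boost = after / 25
print(f"Joe Slugger: 25 -> {after:.1f} home runs ({boost - 1:.1%} more)")

# Run it backwards: what would the record seasons look like without the boost?
for name, homers in (("Bonds", 73), ("McGwire", 70), ("Sosa", 66)):
    print(f"{name}: {homers} -> {homers / boost:.1f}")
```

Run without rounding, Bonds comes out at about 70.6 rather than 70.7; the small difference comes from rounding 25.83 to 25.8 along the way. Either way, the totals drop by only two or three home runs.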

Bradbury writes that home runs per game are up 30 percent in the last decade. The last decade included only the one expansion. That could explain a 3% increase, not one ten times as large.

--

Second: why have only home runs and strikeouts risen? If the expansion hypothesis is correct, all hitting stats should have increased. Gould took special pains in his book to talk about .400 hitters, and how the recent increase in talent made it all but impossible for batters to hit .400 these days. So why haven't batting averages gone up too? Actually, I don't know that they haven't, although we haven't seen any monster batting averages in at least a couple of decades. But what about, say, triples, or ERAs?

A quick check: in 1996, there were 855 triples hit, or about 31 per team. In 2003 (the last year covered in my copy of Total Baseball), there were 934, or about 31 per team. Why hasn't dilution affected triples?
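The per-team figures, spelled out. The team counts (28 in 1996, 30 in 2003) are not stated above; I'm supplying them from the known size of the major leagues in those years.

```python
# year: (total triples from the text, number of MLB teams that year)
triples = {1996: (855, 28), 2003: (934, 30)}

for year, (total, teams) in triples.items():
    print(f"{year}: {total} triples / {teams} teams = {total / teams:.1f} per team")
```

That works out to about 30.5 and 31.1 triples per team, which both round to 31.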

--

Third, there's the effect of population. There are more major-league players now than ever, but the pool of people from which they are chosen has also increased. On page 99 of his book, Bradbury notes that, in 1970, there was one major leaguer for every 338,687 US residents. In 2000, it was one per 375,229. Going by these raw numbers, it looks like the quality of baseball should be higher now than it was then – not lower. Why weren't records being broken in 1970 the way they're being broken now?

In fairness, Bradbury acknowledges this argument and addresses it. For one thing, he says, the increase in population is, in substantial part, due to the population living to an older age. That portion of the increase obviously doesn't increase the pool of potential baseball players, and so the raw numbers are misleading. For another thing, he argues that there are so many more sporting opportunities now, relative to the past, that the increase in talent might nonetheless wind up being spread thin because of all the additional sports that attract away the talent. Put another way, although baseball roster spots may have increased only 25% since 1970, roster spots *in all sports* may have increased by much more than that.

And so, Bradbury argues, we shouldn't use population statistics as a proxy for talent – we should use only the standard deviation of observed performance.

It's a reasonable argument, and I think Bradbury might have a point that we can't rely on raw population statistics to compare 2007 to, say, 1950. But what about more recent times? The most recent expansions took place a little over a decade ago, and opened up about 15% more roster spots. Between 1990 and 2000, the population increased about 13%, and the majors started scouting increasing numbers of players from outside the US, so the population increase roughly matches the effects of expansion, even (likely) after taking into account the aging population.

So can we really argue that other sports sucked away substantial amounts of baseball talent specifically in the last decade? Are there lots of American soccer superstars who would have otherwise become major-league pitchers? Have the NFL and NBA expanded so much, or increased salaries so much more than MLB, that would-be baseball players are defecting to football and basketball? And is the effect so large that it would increase home runs by 30% in one decade?

It doesn't seem plausible to me.

--

So if it's not expansion, what *has* caused the modern-day upsurge in power and strikeouts? Here's my hypothesis. (I don't have any evidence to support it, but, hey, what the heck.)

Here's what I think is happening: players realize that it's hard to hit major-league pitching. And they realize that top pitchers are getting better and better, and so harder and harder to hit against.

But they've also figured this out: if they get bigger and stronger, their stats will get better even if they don't do anything different at the plate. If they work out over the winter, and maybe even dabble in steroids, they can do exactly what they did before, but some of what used to be warning-track fly balls will now become home runs.

Put another way: it's hard to improve your hitting by changing your grip, or your batting stance, or the way you react to a breaking ball. But it's easy to improve your hitting by bulking up. So that's what players do.

And that means that a larger percentage of major-league players wind up being power hitters. In the past, the guy who hit .250 with only moderate power might get beat out of a job by the .290 slap hitter. Now, he works out over the winter, gets strong enough to turn 6 fly balls into home runs, and that makes him a .262 hitter with 20 home runs instead of 14. Now, it's the slap hitter who's out of a job.
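The .250-to-.262 jump checks out if you assume a 500 at-bat season and that all six of those fly balls used to be outs. Both assumptions are mine; the paragraph above gives only the averages and home run totals.

```python
AT_BATS = 500  # assumed season length


def batting_line(at_bats, hits, homers):
    """Return (batting average rounded to three places, home runs)."""
    return round(hits / at_bats, 3), homers


hits_before = 125  # .250 over 500 AB
before = batting_line(AT_BATS, hits_before, 14)
after = batting_line(AT_BATS, hits_before + 6, 14 + 6)  # six outs become home runs

print("before:", before)
print("after: ", after)
```

Six extra hits in 500 at-bats are worth 12 points of batting average, on top of the six extra home runs.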

More power hitters means more strikeouts. Even if Joe Pitcher doesn't get better at all, he finds that he's not facing contact hitters who strike out every 12 AB – he's facing power hitters who strike out every 6 AB. Bingo, more strikeouts. That would be true even if the overall quality of the batters he faces didn’t change – as long as their power profile changes.

Here's how you could figure that out: find every batter from 20 years ago who was above average in runs created per game. Repeat for players from last year. Compare the two groups. If last year's group struck out more, and also hit more home runs, that would be confirmation that the increase in strikeouts didn't come solely because the pool of talent got diluted.
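A minimal sketch of how that comparison might be coded. Everything here is hypothetical: the `BatterSeason` record, the choice to weight the league average by plate appearances, and the six placeholder batters, which are there only to show the mechanics and are not real player data.

```python
from dataclasses import dataclass


@dataclass
class BatterSeason:
    name: str
    plate_appearances: int
    strikeouts: int
    home_runs: int
    runs_created_per_game: float


def above_average(seasons):
    """Batters whose RC/G beats the PA-weighted average of the group."""
    total_pa = sum(s.plate_appearances for s in seasons)
    average = sum(s.runs_created_per_game * s.plate_appearances for s in seasons) / total_pa
    return [s for s in seasons if s.runs_created_per_game > average]


def rates(group):
    """(strikeouts per PA, home runs per PA) for a group of batters."""
    pa = sum(s.plate_appearances for s in group)
    return (sum(s.strikeouts for s in group) / pa,
            sum(s.home_runs for s in group) / pa)


# Placeholder data only, invented to exercise the functions.
twenty_years_ago = [
    BatterSeason("Old A", 600, 50, 10, 5.5),
    BatterSeason("Old B", 600, 60, 12, 4.0),
    BatterSeason("Old C", 600, 90, 25, 6.0),
]
last_year = [
    BatterSeason("New A", 600, 110, 30, 6.2),
    BatterSeason("New B", 600, 70, 8, 3.9),
    BatterSeason("New C", 600, 120, 28, 5.6),
]

old_k, old_hr = rates(above_average(twenty_years_ago))
new_k, new_hr = rates(above_average(last_year))
print(f"20 years ago: K/PA = {old_k:.3f}, HR/PA = {old_hr:.3f}")
print(f"last year:    K/PA = {new_k:.3f}, HR/PA = {new_hr:.3f}")
```

With real data, you'd load every batter from each season in place of the placeholder lists. Comparing only above-average hitters holds overall quality roughly constant, so a difference in strikeout and home run rates would point to a change in the power profile.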


### NCAA: Should you bench your superstar so he doesn't foul out?

In his column today in Salon, King Kaufman suggests a new strategy for NCAA teams. They should run at the opposing team's best player until he commits a couple of fouls. At that point, his coach will take him out of the game for a while, to keep him from fouling out early. When he comes back in, they should run at him again, until he does foul out.

I don't know enough about basketball to be able to guess if this would actually work (although the idea seems interesting). But it does seem to me that the strategy of benching the fouling player until later doesn't make much sense.

Suppose, as in Kaufman's example, a player commits two quick fouls in the first three minutes. What's the point of benching him? Suppose if you didn't bench him, he would foul out after an average of, say, 25 minutes. By benching him for a while, you might get his 25 minutes later in the game, instead of earlier. But so what? Points scored early in the first half count exactly as much toward the final score as points scored late in the second half. Unless you think that this particular player plays better in the clutch than at other times, and that there's a good chance of the situation becoming clutch without him, the benching does absolutely no good.

As Kaufman puts it,

"It's kind of like never driving your car so you don't get a flat tire, because if you get a flat tire, you can't drive your car."

The strategy obviously has a negative expectation. There's a reasonable possibility that even if you let the guy play, he won't actually foul out. In that case, you get 35 minutes out of him instead of 25. Benching him takes that possibility out of the equation. Why would you waste ten minutes of the best player on your team?
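A toy model of that argument, in Python. It assumes the number of playing minutes it takes him to foul out doesn't depend on *when* those minutes are played, and it uses the 35 and 25 minute figures from above as the minutes available with and without the precautionary benching. Both are simplifications.

```python
NORMAL_MINUTES = 35   # minutes he'd play if the coach ignored the foul trouble
BENCHED_MINUTES = 25  # assumed minutes left for him once the benching is subtracted


def minutes_played(minutes_until_foul_out, minutes_available):
    """He plays until he fouls out or runs out of available minutes, whichever comes first."""
    return min(minutes_until_foul_out, minutes_available)


for foul_out_after in (15, 25, 30, 40):
    if_played = minutes_played(foul_out_after, NORMAL_MINUTES)
    if_benched = minutes_played(foul_out_after, BENCHED_MINUTES)
    print(f"would foul out after {foul_out_after} min: "
          f"{if_played} min if played, {if_benched} min if benched")
```

Under these assumptions, benching never adds minutes: it ties when he would have fouled out within 25 minutes anyway, and it costs up to ten minutes when he wouldn't have.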

The best reason I see for taking the guy out after two fouls is if he's committing those fouls carelessly, and you want him to take a time out and relax so he'll stop doing whatever dumb thing it is he's doing. But that's not what Kaufman implies is happening. He describes coaches reflexively pulling their star in the first half after two fouls, and then in the third quarter after his third foul – the only purpose of which is to make sure he's available later in the game.

That doesn't seem to have a positive expectation. In the best case, it's neutral, and in the worst case, it's negative. Am I missing something?
