Sunday, October 31, 2010

Can estimates of wheat production improve our evaluation of baseball talent?

Suppose that in a certain amount of playing time, a baseball player has 10 home runs. What's your best estimate of his true talent? One easy answer is: 10 home runs.

We know we can get a better estimate if we do a regression to the mean for all similar players. However, "Stein's Paradox" has a stronger result, one that sounds like it's just plain nuts. According to Stein, you don't have to use home run numbers to get your mean. You'll still get a more accurate estimate if you average the number "10" with a bunch of completely unrelated other samples, and then regress to the mean of all those numbers.

From the Wikipedia page:

"Suppose we are to estimate three unrelated parameters, such as the US wheat yield for 1993, the number of spectators at the Wimbledon tennis tournament in 2001, and the weight of a randomly chosen candy bar from the supermarket. Suppose we have independent Gaussian measurements of each of these quantities. Stein's example now tells us that we can get a better estimate (on average) for the vector of three parameters by simultaneously using the three unrelated measurements [and regressing each one to the mean of all three]."

As I said, it sounds nuts. But it's an established result. A longer explanation, with baseball examples, is here (.pdf). It's a 1977 piece from Scientific American, by Bradley Efron and Carl Morris. (Hat tip to Craig Heldreth of "Less Wrong".)
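The claim is concrete enough to check by simulation. Here's a minimal sketch, with ten made-up true values measured with Gaussian error, using the "positive part" James-Stein estimator that shrinks every observation toward the grand mean (all the numbers are invented for illustration):

```python
import random

def james_stein(ys, sigma2):
    """Positive-part James-Stein: shrink each observation toward the
    grand mean of all observations (valid for 4 or more quantities)."""
    k = len(ys)
    ybar = sum(ys) / k
    s = sum((y - ybar) ** 2 for y in ys)
    shrink = max(0.0, 1.0 - (k - 3) * sigma2 / s)
    return [ybar + shrink * (y - ybar) for y in ys]

random.seed(0)
truths = [random.gauss(0, 2) for _ in range(10)]  # ten made-up true values

raw_total = js_total = 0.0
for _ in range(20000):
    obs = [random.gauss(t, 1) for t in truths]    # one noisy measurement each
    est = james_stein(obs, sigma2=1.0)
    raw_total += sum((o - t) ** 2 for o, t in zip(obs, truths))
    js_total += sum((e - t) ** 2 for e, t in zip(est, truths))

print(js_total < raw_total)  # True: shrinking beats the raw measurements
```

With the true values this close together, the gain is substantial; as the rest of the post argues, it shrivels to almost nothing when the true values are wildly far apart.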

After a bit of thought and reading, I believe I may understand why it's true -- true, but not useful. I'm still a little fuzzy on it, but I'm going to write out what I think is the explanation. If there's anyone reading this who understands this stuff well, please let me know if I got it right.


Let me start with the boring, uncontroversial stuff.

Suppose a baseball batter hits .750 in a single game. What's your estimate of his true talent? It's going to be a lot less than .750. You're going to have to regress your estimate towards the mean of all players, and you might wind up estimating that he's really a .300 hitter who got lucky. (There are formal ways to make a more precise estimate, but we'll just keep it intuitive for now.)

We were able to make reasonable estimates because we already knew the mean of the population we were regressing him to: maybe about .260 or something. But what if we didn't know that?

For instance, I just made up a new offensive stat. I won't tell you what it is. But I selected a player at random, and he had a 13.43 last season. What do you estimate his talent is?

So far, your best estimate of his talent is that very same 13.43. Why? Because you don't know how good or how bad 13.43 is -- you don't know the mean to compare it to. If 13.43 is the equivalent of a .350 average, you'd regress down. If 13.43 is the equivalent of a .200 average, you'd regress up. But without an idea of where the center is, you don't know what to do with the 13.43, and your best estimate is to just leave it alone.

But now, what if I give you *three* random players? The first one is still 13.43. The second one is 9.31, and the third is 18.80. Now things are different. You now have some idea of where the mean is -- maybe around 14 -- and so you can regress all three guys accordingly. The first one regresses up a tiny bit, the second one moves up substantially, and the third one drops down. Maybe your intuitive estimates are now 13.6, 11, and 16. Or something like that.

What "Stein's Paradox" tells you is this: even though you want to estimate all three of these players separately, you get better estimates if you estimate each player using a formula based on the observations for *all three* players.

That doesn't seem all that paradoxical ... sabermetrics has known that for a long time. (However, in 1977, when the Scientific American article was written, this might have been a surprising result.)

Where the *real* paradox appears to arise is when the three things you're looking at have nothing to do with each other. Because, as Charles Stein proved in 1955, you *still* get a better estimate when you regress to the mean, even when it looks like there's no reason to, as in the wheat/Wimbledon/candy example at the top of the post.

Suppose our formal statistical estimates are 10,000,000,000 bushels for US wheat, 200,000 for Wimbledon, and 50g for the candy bar. The mean is around 3,000,000,000, and we'll get a more accurate combination of the three estimates if we regress all three numbers a bit closer to 3,000,000,000.

That's just weird. If we estimate the weight of a candy bar at 50g, why should we increase our estimate just because the US wheat harvest is in the millions?

It doesn't seem right, and I think that's why it's called a "paradox".


How to explain it, then? Well, for one thing, the regression to the mean for the wheat/Wimbledon/candy case is going to be infinitesimal, and so almost useless. As the article puts it,

" ... the James-Stein estimator does substantially better than the [non-regressed] averages only if the true means lie near each other ... what is surprising is that the estimator does at least marginally better no matter what the true means are."

My translation:

"... regression to the mean works really well if the talent levels are close together relative to the size of errors of the estimates ... but if they're not, then it doesn't work that great."

Or, translated into baseball:

"... regression to the mean works well in batting averages, because the SD of talent in batting average is not that much different from the amount of luck in a season's totals ... but regression to the mean doesn't work well in the wheat/Wimbledon case, because the SD of the means (10,000,000,000, 200,000, and 50) is much higher than the SD of the measurement errors."

So now we're getting closer to our intuition, which says that regressing to the mean makes sense for batting averages compared to other batting averages, but not for wheat estimates compared to candy bar estimates.
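The intuition falls straight out of the James-Stein formula: the shrinkage factor is 1 - (k-3)σ²/S, where S is the sum of squared deviations of the observations from their mean. Here's a sketch with invented numbers (the wheat example padded to five quantities, since the shrink-toward-the-mean version needs at least four):

```python
def shrink_factor(ys, sigma2):
    # James-Stein shrinkage toward the group mean: 1 - (k-3)*sigma^2 / S,
    # floored at 0 ("positive part"); needs k >= 4 items.
    k = len(ys)
    ybar = sum(ys) / k
    s = sum((y - ybar) ** 2 for y in ys)
    return max(0.0, 1.0 - (k - 3) * sigma2 / s)

# Batting-average regime: values close together relative to one season's luck
# (variance of a .300-ish average over 500 AB is roughly .0004).
batting = [0.250, 0.270, 0.310, 0.240, 0.290]       # five hypothetical averages
print(shrink_factor(batting, sigma2=0.0004))        # well below 1: real shrinkage

# Wheat/Wimbledon/candy regime: values wildly far apart.
unrelated = [10_000_000_000, 200_000, 50, 5_000, 75]
print(shrink_factor(unrelated, sigma2=1.0))         # ~1.0: negligible shrinkage
```

When the spread S dwarfs the measurement error σ², the factor is indistinguishable from 1, and "regress to the mean" means "leave the numbers almost exactly alone."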


Still, why does the wheat/Wimbledon/candy regression to the mean work even "at least marginally better" than no regression to the mean? It seems the effect should be zero, since those three means appear to have nothing to do with each other at all.

That is, suppose you weigh the candy bar and get an estimate of 50g. And then someone comes up to you and says, "I estimate there were 10,000,000,000 bushels of wheat produced in 1993."

Why, then, should you suddenly say, "Ha! If you change your estimate to 9,999,999,999.999997, and I change my estimate to 50.000000001, we'll reduce our combined error!" That doesn't seem like it should be true, no matter how many decimals I go to.

But I think it does work.

To see why, let me start with a baseball example where I have only one player. I pick a player A at random, look at his season stats, and compute his "blorgs created" (BC) as 375. Should I regress to the mean? It doesn't matter, because, with only one player in the sample, his score IS the mean. That means: without knowing the properties of "blorgs created", the chance of his talent being less than 375 is the same as the chance of his talent being more than 375.

Now, introduce another random player B, and observe that he had a BC of 300. It looks like I don't have any more information about A -- but I do. Before, I only knew he was at 375. Now, I know he was at 375, and ALSO that 375 happens to be the highest value of everyone in the sample. So 375 is now more likely to be a good number, which means it was likely the beneficiary of good luck, which means my estimate of A's talent should be less than 375.

If I look at another random player, C, and he's at 322 ... well, now I have even *more* reason to believe that A's talent is less than 375. And so on.

We can generalize that to:

If you look at a bunch of different outcomes that have some randomness in them, the highest outcome will likely be higher than its talent, rather than lower.

Suppose you have ten players of varying batting talents, and each one takes 500 AB. What is the chance that the highest batting average will turn out to have been higher than the talent of the player who hit it?

It's higher than 50 percent, a lot higher. Here's why.

Consider the player with the highest talent -- call him Bob. Bob has a 50 percent chance of doing better than his talent (and 50 percent of doing worse). If he does better, then it must be true that the highest BA exceeded the talent. So there's 50% right there.

Now, if he does *worse* than his talent, consider the second-best guy, Ted. Maybe Ted is only a little bit worse than Bob.

What's the chance that Ted will finish ahead of Bob's talent? Well, there's a 50% chance that Ted will finish ahead of Ted's talent. And we decided that Bob is only a little bit more talented than Ted. So the chance of Ted beating Bob's talent is maybe 49%.

If Ted beats Bob's talent, then it must be true that the highest batting average is higher than the talent of the player hitting it -- either Ted finished first and beat his own talent (and Bob's), or one of the other eight guys did.

So now our probability is 74.5% -- 50% that Bob beat his talent, plus 49% of 50% that Bob didn't beat his talent but Ted did. And we haven't even got to the other eight guys yet! (Not only that, but we haven't considered the chance that Ted may have beat his own talent but not Bob's, but still finished first in batting.)

If we do add the others in, we'll probably wind up pretty close to 100%. In real life, we don't have ten guys at the top all so close to each other, but I'd still bet that the chance is up past 99%.
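A quick Monte Carlo backs this up. This sketch assumes hits are binomial over 500 AB, with ten invented talent levels bunched between .270 and .315:

```python
import random

def leader_beats_talent(talents, ab=500, rng=random):
    # One simulated season: each player's hits ~ Binomial(ab, talent).
    best_avg, best_talent = -1.0, 0.0
    for t in talents:
        avg = sum(rng.random() < t for _ in range(ab)) / ab
        if avg > best_avg:
            best_avg, best_talent = avg, t
    return best_avg > best_talent

random.seed(1)
talents = [0.270 + 0.005 * i for i in range(10)]   # ten bunched-up talents
trials = 1000
rate = sum(leader_beats_talent(talents) for _ in range(trials)) / trials
print(rate)   # far above 50 percent
```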

Again, what this means is: the highest result is probably higher than the talent, and needs to be regressed to the mean.

This holds even if the measurements are of different things. Suppose you take all the NHL goal scorers and combine them with the MLB home run hitters. The highest value, whether MLB or NHL, still needs to be regressed to the mean, for exactly the same reasons. And you can even combine those with other things, like, say, your estimate of the number of candies in a pack. The highest value, whether Jose Bautista, Alex Ovechkin, or M&Ms, probably needs to be regressed to the mean. It doesn't make a difference that all the measurements aren't of the same thing.

However: it DOES matter that the talent levels be close together. Why? Because, if not, then the argument a few paragraphs above will fail. Suppose you have a sample consisting of Babe Ruth, Rod Carew, Jose Oquendo, and Ozzie Smith. For that sample, what are the odds that the highest home run value will be higher than its talent? It's not 90% any more -- now, it's almost exactly 50%. Why? Because the winner is almost always going to be Babe Ruth. So the chance that the highest home run total exceeds the player's talent is exactly the chance that Babe Ruth exceeds his talent, which is 50%. In the previous example, there were at least 10 guys with a legitimate chance to exceed their talent and also lead the league. Now, there's only one.

That is, suppose the Babe's talent is 50 HR. If you've got nine guys behind him with talent of 48, then it's very likely, well over 90%, that at least one of the 10 guys will hit over 50 HR and exceed his talent. But if the other guys are only at 5 HR, then nobody has a chance of hitting 50+ home runs except the Babe himself, and the chance that Babe exceeds his talent is only 50%.
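The same simulation idea shows the contrast directly. Here a season's home run total is modeled crudely as talent plus Gaussian luck (an assumption for illustration, not the real distribution of home runs):

```python
import random

def p_leader_beats_talent(talents, noise_sd, trials=4000, rng=random):
    # Chance that the league leader's total exceeds the leader's own talent,
    # modeling each season total as gauss(talent, noise_sd).
    wins = 0
    for _ in range(trials):
        outcomes = [(rng.gauss(t, noise_sd), t) for t in talents]
        best_hr, best_talent = max(outcomes)
        wins += best_hr > best_talent
    return wins / trials

random.seed(2)
tight = [50] + [48] * 9   # nine rivals right behind the Babe
wide  = [50] + [5] * 9    # nobody remotely close to the Babe
p_tight = p_leader_beats_talent(tight, noise_sd=5)
p_wide = p_leader_beats_talent(wide, noise_sd=5)
print(p_tight)   # near 100 percent
print(p_wide)    # near 50 percent
```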

That means that when you have a tight distribution of HR talent, you have to regress to the mean. If you have a wide spread of HR talent, then you don't.

Well, you *almost* don't. There is a small but non-zero chance that Rod Carew could, just by luck, hit more home runs than the Babe and surpass his talent. Because the chance is small but positive, the amount of regressing to the mean is also small but positive.

Suppose there's a 1 in 100,000 chance that Rod Carew will lead the league instead, even though his talent level is only 5 HR. Then, if the league leader hits X homers, your estimate of the talent of the league leader is (.99999 * X) + (.00001*5). That is, you DO have to regress to the mean, a tiny, tiny bit.

And that brings us to bushels of wheat vs. Wimbledon fans vs. candy bars. If you are absolutely sure that there is no way the actual candy bar weight could be higher than the number of bushels of wheat, then you don't have to regress to the mean, and Stein's Paradox doesn't apply.

But you can never be absolutely sure. There's always the infinitesimal probability that, because of measurement error, the bushels of wheat got lucky and hit 10,000,000,000 home runs instead of 50. Which means there's an infinitesimal probability that the 10 billion should actually be just 50. Which means there's an infinitesimal amount of regressing to the mean you have to do to account for that infinitesimal probability.

My guess is that for this example, the amount of regression you have to do is so small, so far down the tail of the probability distribution, that you couldn't even calculate it if you wanted to. But as small as it is, it does exist, as Charles Stein proved.


But what if the numbers are fairly close, but still unrelated, like MLB home runs per season, and NBA points per game? Couldn't we *still* get a very slight benefit by taking the home run numbers and adding in some irrelevant data? If we're trying to figure out how good a home run hitter Alex Rodriguez is, based on his 2008 record, won't it help, even just a little bit, to add in NBA players' game scoring numbers and regress to those, too?

I don't think so. I think there's one hidden assumption in Stein's Paradox that doesn't apply here: the assumption that we know *nothing else* except the individual numbers themselves.

If we were to add in NBA numbers, we'd be violating that assumption: we'd know which numbers were NBA, and which were MLB. Stein's Paradox would help us only if we chose to "forget" that information, and the "forgetting" would lead to a big loss of accuracy.

Suppose I mix a bunch of NBA and MLB numbers, 100 in total, and don't tell you which are which. Here are eight of them:

2, 5, 10, 10, 17, 32, 50, 50

And I want to estimate the talent of each of those players.

Stein's Paradox tells me that if I want to estimate one of the "50" guys, I shouldn't just guess "50" -- I should regress him to the mean. There is a mathematical formula to tell me how much to regress him. Because I can't tell the NBA guys from the MLB guys, the regression will be the same in both cases.

But, now, suppose I know which of those guys are NBA guys:

10, 10, 32, 50

And which ones are MLB guys:

2, 5, 17, 50

Now, by running Stein's formula on each of those two groups separately, I can do better. Specifically, the MLB "50" guy is going to be regressed some (maybe to 35 home runs of talent) and the NBA "50" guy is going to be regressed more (maybe to 25 PPG of talent).

So, having the unrelated data made things worse, not better, because the Stein formula ignores that the new data has different properties (a different prior) than the old data.
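A simulation makes the point. Assuming two invented groups of "talents" on different scales (hypothetical MLB home run talents centred near 20, hypothetical NBA scoring talents centred near 12), applying the Stein formula to the pooled numbers does worse than applying it to each group separately:

```python
import random

def james_stein(ys, sigma2):
    # Positive-part James-Stein, shrinking toward the group's own mean (k >= 4).
    k = len(ys)
    ybar = sum(ys) / k
    s = sum((y - ybar) ** 2 for y in ys)
    shrink = max(0.0, 1.0 - (k - 3) * sigma2 / s)
    return [ybar + shrink * (y - ybar) for y in ys]

def sq_err(est, truth):
    return sum((e - t) ** 2 for e, t in zip(est, truth))

random.seed(3)
mlb_truth = [random.gauss(20, 3) for _ in range(20)]   # invented HR talents
nba_truth = [random.gauss(12, 3) for _ in range(20)]   # invented PPG talents

pooled_total = split_total = 0.0
for _ in range(5000):
    mlb_obs = [random.gauss(t, 3) for t in mlb_truth]
    nba_obs = [random.gauss(t, 3) for t in nba_truth]
    # Pooled: pretend we can't tell the groups apart.
    pooled = james_stein(mlb_obs + nba_obs, sigma2=9.0)
    pooled_total += sq_err(pooled, mlb_truth + nba_truth)
    # Split: shrink each group toward its own mean.
    split = james_stein(mlb_obs, 9.0) + james_stein(nba_obs, 9.0)
    split_total += sq_err(split, mlb_truth + nba_truth)

print(split_total < pooled_total)  # knowing the groups beats pooling
```

The pooled version both shrinks too little (the between-group gap inflates the spread) and shrinks toward the wrong center.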


What Stein's Paradox says is this:

1. You can do better by properly regressing to the mean than by not regressing to the mean.

2. If you have no other information about the data that informs your analysis, here's a formula for the amount to regress that will be an improvement.

Fine, but, for practical purposes, not very useful. First, we *already* regress to the mean. When looking at an observation, we NEVER accept it as a direct estimate of the talent behind it. We never say, "well, the Red Sox are 2-0, so their talent must be 162-0". We never say, "George Bell hit three home runs on opening day, so we estimate his talent at 486 home runs per season." We never say, "Jose Bautista hit 54 home runs last year, so we should expect the same next year."

Second, we *do* have lots of other information. For instance, we know something about the "prior distribution" of home runs and points per game. We know that we should avoid estimating a talent level of 35 PPG, because NBA players just don't have that expected level of performance. But we *can* estimate a talent level of 35 HR, because there are many players who *are* that good.

Stein's Paradox is a very interesting mathematical result, but it's not all that applicable in real life. It's like saying, "You'll get from Boston to Los Angeles faster if you jog instead of walking." You'd be right, but who was planning on going on foot anyway?



Wednesday, October 20, 2010

Maybe government should subsidize the Québec Nordiques

In the past few weeks, there's been a little bit of publicity on the part of Quebec politicians trying to get the NHL to bring back the Nordiques. While deliberately avoiding making any kind of commitment, the NHL has said that a new arena in Québec City is a prerequisite for the return of the team. And so, the Province of Québec has agreed to pay about half the cost, and is asking the federal government to pay the other half.

As far as I've read, Conservative Prime Minister Stephen Harper is not committing himself either way. Back in 2000, the Liberal government announced a plan to help out money-losing Canadian NHL franchises, but was forced to back down days later after the public reacted with outrage. At the time, Harper was head of a conservative think-tank, and came out strongly opposed:

“Canadians are being forced to subsidize millionaire hockey team owners and that’s a misconduct. ... It’s a policy which hurts taxpayers and won’t help pro hockey. After all, giving the Ottawa Senators a few million tax dollars just means they might be able to sign a second string winger.”

Now, my own views tend to be strongly libertarian. I generally favor less government, lower taxes, and privatizing government functions. But, when it comes to hockey teams, I believe the case for a state subsidy is very strong, on economic grounds, stronger than for almost any other type of government spending. I think there might be a solid argument for why the government of Canada should indeed spend money on an arena for the city of Québec, if that's what it takes to bring the NHL back.

First, let me explain what my argument is not. It's not about how building the arena will create jobs. It's not about how fans will spend money on tickets and souvenirs, thus boosting the local economy.

Those arguments don't hold water. For one example of a clear argument as to why, here's an op-ed piece by two think-tankers from the (free market) Fraser Institute. (JC Bradbury has a few posts on the subject too.)

Basically, it goes something like this: the residents of Québec City have a certain amount of disposable income to spend. If they don't spend it on hockey, it doesn't mean they'll flush their cash down the toilet. They'll just spend it on other things, like movies or restaurants.

That is, suppose the Nordiques generate $75 million in economic activity. The existence of the Nordiques didn't cause that amount to materialize, like magic, in the pockets of people in Québec. So, obviously, the $75 million in spending on the team has to be balanced out by $75 million *less* spending elsewhere.

As for jobs, there's no reason to think that the new hockey jobs created will be higher in quantity or quality than the jobs lost by restaurant or cinema workers. So, on balance, having a new arena won't create significant economic activity in dollar terms.


So if that's not the argument, then what *is* the argument? Basically, it's that the benefits to the residents of Québec City outweigh the costs of the subsidy.

That is: suppose the various governments have to contribute $300 million to get the team back. I'm arguing that the existence of the team will raise the happiness of the fans by more than $300 million, even after taking into account all the other money the fans themselves will wind up spending on the team.

That may sound ridiculous. How can you put a number on the happiness of the fans? How can it be bigger than how much they spend on the team?

There's a concept in economics called "consumer surplus." That's defined as the difference between what you paid for a product, and the maximum you would have paid if you had to. If you're addicted to Tim Hortons coffee like I am, you'd gladly pay $3 for your first cup in the morning. But Tim's only charges $1.59. Effectively, when I buy my first coffee of the day, I've made myself $1.41 richer, just like that. I don't have an extra $1.41 in my pocket, but I'm still $1.41 better off than if Tim Hortons closed down, and I had to drink some kind of crappy coffee somewhere else, from which I derive little or no surplus.

Consumer surplus is a huge, huge part of everyone's quality of life, even though we take it for granted. We all have our personal favorite products, one way or another. Maybe you can't live without your iPhone, or your internet service, or the pad thai at your favorite restaurant. Imagine if those products no longer existed. If your iPhone disappeared, and you got back the $400 that you paid for it, wouldn't you be a lot worse off? To you, the iPhone must be worth more than $400, or you wouldn't have bought it. How much more? It depends on the person. If you're an Apple fanatic, and the iPhone is something you can't live without, you might have been willing to pay up to $1,000 for it ... in which case your consumer surplus is $600. Or maybe $600 was your limit, and so you only get $100 in surplus. (Indeed, it's possible that you thought you'd like the iPhone, but you hate it, and it was only worth $300 to you -- in which case you have a *negative* surplus. But that's rare ... if you look around, most of the things around you are things that you're glad you bought and would buy again.)

Now, the fact that a product has a lot of value to a lot of consumers doesn't mean that it needs to be provided by government. If a product is in demand, someone will provide it at a profit. Apple is willing to manufacture an unlimited number of iPhones at $400, so that anyone who values it at more than that will make a consumer surplus "profit."

But with a hockey team, it's more difficult, because a hockey team has the characteristics of what economists call a "public good". Specifically, a public good has two properties: I can enjoy it without paying for it, and my enjoyment of it doesn't deprive anyone else of any of their enjoyment of it.

Neither of those two conditions applies to an iPhone. I can't use one without paying for it, because Apple won't let me. And if I buy an iPhone, nobody else can use that iPhone without depriving me of it.

But, for the Nordiques ... if I become a big fan, I can do that for free. And my being a big fan doesn't hurt anyone else -- in fact, it may *help* everyone else, in that they derive more pleasure being part of a bigger fan base. Indeed, even fans of the rival Canadiens might benefit: the more Nordiques fans, the more happiness in Montreal when the Habs beat them 10-0.

Public goods are usually things that are truly "public," like city-owned parks, or lighthouses, or armies. But sports teams can qualify too, even though they're privately owned.

Now, not everything Nordiques-related is a public good. Tickets to the game, for instance, are certainly not, because the two conditions aren't met. Only those who pay get to see it live, and if you buy a ticket, that's one less ticket left for anyone else to buy. But the *existence* of the team, that's certainly a public good. Here in Ottawa, there are probably tens of thousands of residents who are big fans of the Ottawa Senators, even though they may never buy a ticket. They enjoy the team for free -- well, some of them may spend a few extra pennies for a sports channel on cable, and you could argue that sitting through commercials is a non-zero cost of watching a game on TV. But their consumer surplus is still very high.

Take me, for instance. I usually go to a few Senators games every year, but last year I went to none. I don't think I even bought anything with the Sens logo on it, and I didn't buy any of the pay-per-view games. But I had friends over a few times to watch Sens games on cable, and we had a pretty good time. Cost: $0. Benefit: substantially more than $0.

Of course, just because there's a consumer surplus from a certain activity doesn't mean that government should subsidize it. Usually, when there's demand for a product, there's the possibility of making a profit on it, and the market will make exactly enough of the product to maximize consumer surplus. Colby Cosh makes that case for pumpernickel bread -- you don't subsidize it just because bread is important to society.

That's because for pumpernickel, you don't get any benefit unless you actually pay for it. Consumers don't get a lot of happiness from just following the pumpernickel market in Québec City, or from just knowing that pumpernickel exists -- they have to actually eat the pumpernickel. They can't do that unless they pay for it. The physical reality of pumpernickel means that everyone who wants to eat it winds up paying at least what it costs to make.

But for a hockey team, that isn't true. A resident of Québec may value the existence of the Nordiques at $100 a year (not including the value of seeing games in person). In fact, a million people in the Québec area might value the Nordiques at $100 a year, for a total of $100 million in value. But the Nordiques may have no way of collecting that $100 million, and, as a result, the team might not be profitable. If there were a way to somehow get all the fans to chip in what the team is worth to them (or even half what the team is worth to them), someone might create it. But there's no such way to do that.

When this situation happens in other areas, people are quick to ask the government to step in and provide the service. The Canadian Broadcasting Corporation, for instance, received a subsidy of almost $1 billion from the federal government in 2006. The idea is the same: that Canadians get a benefit from the service that the free market cannot provide, that even Canadians who don't watch the programming somehow benefit (from CBC news exposing government scandals, say), and that it's impossible to bill each Canadian individually.

And so the issue, like that of the Nordiques, is this: does the "consumer surplus" the CBC provides really outweigh the billion dollar subsidy? As you can imagine, left-wing types tend to say "yes," and others are more likely to say "no". In fact, right-wing Canadians argue that the CBC is *harmful* and has *negative* benefits because of the left-wing slant in its news coverage.

For a hockey team, there are no such political issues ... seldom do you hear anyone say that the existence of a local sports team actually makes it a worse place to live. At worst, people will say they don't really like hockey, but I've never heard anyone actually say that the City of Ottawa would be a better place to live if the Senators left town.


I think the evidence for the Nordiques as a public good is very strong, and those like Colby Cosh (who, by the way, I normally agree with) are simply ignoring the huge improvement in quality of life when you have a local team to root for. I'm frustrated by how much hostility there is to subsidizing a hockey team, from the same people who are so enthusiastic about subsidizing other enterprises that aren't even public goods. For years, there was a drive to get a new Opera House for Toronto, and finally governments contributed $56 million to subsidize its construction. But ... who benefits from an Opera House? Pretty much only people who go to the opera ... and opera is an art form appreciated by very few people in the first place.

The place seats about 2,000. Assuming there's an average of one performance a night, and every performance is sold out, that's about 700,000 tickets a year. At an interest rate of five percent a year, the interest on $56 million is $2.8 million a year. So the governments are subsidizing patrons to the tune of $4 a performance. Assuming a ticket costs $40 (say), that's about a 10% subsidy.

The Nordiques would presumably sell out 41 games a year, at 20,000 seats each. That's 820,000 tickets -- close to the opera house's total. So even if a government subsidy benefitted only ticket buyers -- as Colby Cosh implies -- why should opera goers get $4, but Nordiques fans get nothing? Especially when opera fans are probably a lot better off financially than hockey fans (many of whom are children).

But, as I argued, the Nordiques benefit not just 820,000 ticketholders, but at least a million people who would become fans at some level. The proposed arena subsidy is $360 million. That's $18 million in interest per year, or $18 for each of a million fans who might benefit. For $18, you can buy an opera fan a 10 percent subsidy for five performances -- or you can bring someone an entire season of hockey (on TV, admittedly, instead of in person) that otherwise wouldn't exist.
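For the record, the back-of-envelope arithmetic, using only the numbers assumed in the post:

```python
# All inputs are the post's own assumptions, not real figures.
rate = 0.05                                     # assumed interest rate

opera_cost = 56_000_000
opera_interest = rate * opera_cost              # $2.8 million/year
opera_tickets = 2_000 * 365                     # ~730,000 tickets/year
print(opera_interest / opera_tickets)           # roughly $4 per ticket

arena_cost = 360_000_000
arena_interest = rate * arena_cost              # $18 million/year
fans = 1_000_000
print(arena_interest / fans)                    # $18 per fan per year
```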

(Of course, people who benefit from the opera house will claim that there's all sorts of externalities, benefits to people like me who would never buy a ticket and don't care whether or not opera even exists. They'll talk about how it improves the culture, and makes the social fabric stronger, and exposing children to it makes them smarter. I think that's all a bunch of baloney.)

Subsidywise, hockey seems to be more than competitive with opera. Why, then, is there so much hostility?

Money, of course. Everyone hates rich people. And everyone running hockey seems to be rich. The owners are rich, the corporations who buy season tickets are rich, and, of course, the players are rich. Giving middle-class taxpayers' money to rich owners and players seems like the wrong thing to do.

But it's not, not really. We give money to rich people all the time. We pay Bill Gates to give us a copy of Microsoft Windows. In fact, we pass copyright laws to prevent making or selling copies of Windows, in order to guarantee that anyone who wants one has to make Bill Gates richer. The Government of Canada spends millions of dollars to provide Windows on almost every computer in the public service.

That is: we have laws that specifically give Microsoft a legal monopoly on Windows, so they can charge as much as they like per copy. And we pay it. The Government itself spends millions of dollars every year for licensed Microsoft software. We support these laws because we recognize that, without the possibility of a return on investment, few people would ever make the effort to create valuable products like Windows, and we'd all be worse off.

Haven't we done the same for sports? Of course. The Stanley Cup belongs to the NHL. The WHA wasn't allowed to make a copy and call it the Stanley Cup. When it put a team in Toronto, it couldn't call them the "Maple Leafs," it had to call them the "Toros". We have decided that sports leagues should have a legal monopoly on their product, just as software companies and movie studios do.

When Colby Cosh argues against subsidies for hockey teams, he says,

"[teams] are owned by a profit-maximizing cartel which limits access and squeezes every penny it can get out of that access."

Well, yes, that's right, just like Microsoft. And it was government that gave them that right to be a profit-maximizing monopoly. Because of that, the NHL found that it could best maximize its profits by limiting the number of teams. Recently, the league entered into an agreement with its players to pay them a certain percentage of revenues. That now means the players have a rational, financial reason to oppose expansion. If you put a team in small-market Quebec, it brings the average revenue per player down, and the players get less money.

It's very conceivable that a team in Québec City might not be profitable, even for the fat-cat NHL owners and players -- at least not without government help.


So what should happen?

Well, what *shouldn't* happen is where the current situation is headed -- the government building a new arena and hoping that convinces the NHL to award a team. That's just bad business. If it doesn't work, it's millions and millions of dollars thrown away.

No, what the government should do is treat this as a business deal with the NHL, the same way software is a business deal with Microsoft. The government should sit down with Gary Bettman and say, look, we want a team in Québec. What do we have to do? They should work out a deal, which may involve a new arena, or a subsidy to the owner, or a subsidy every year, or a tax break.

Then, the government should look at the costs to taxpayers, and try to estimate the benefits to taxpayers. If it works out that hockey fans across Canada will benefit from having a team in Québec more than what it costs to subsidize the team, then do it. But make it a real contract. The NHL agrees to put the team back, and the governments agree to make payments. None of this, "well, come back once you've built an arena and we'll talk."

You need to have a formal agreement, in writing.


As I said, I'm normally not comfortable with the idea that government should subsidize activities that can't support themselves. So I'm sympathetic to arguments that it's not the government's job to buy a hockey team for its fans.

If that's your argument, then let me re-cast my argument another way, that might be more in keeping with your views:

By the *government's own standards,* and by the standards of people who normally favor government intervention in the economy, having a hockey team in Québec is much, much more important than almost all the other money government spends on culture. It's a much better deal than an opera house. It's a much better deal than the CBC's programming. It's a much better deal than TVOntario. It's a much better deal than the National Film Board. It's a much, much, much better deal than "Voice of Fire."

Canadians love their hockey. When you weigh the costs against the benefits to hundreds of thousands of hockey-mad residents of Québec City, a subsidy to an NHL team seems like a huge bargain.

I can't believe I'm saying this, but ... yes, we should consider using taxpayers' money to bring back the Nordiques.

UPDATE: added "maybe" to the title of the post to better reflect what I'm trying to say.

Labels: , ,

Tuesday, October 19, 2010

How should the mainstream media report on sabermetric research?

A lot of ideas for posts to this blog come from mainstream media reports of academic sports studies. Often, those turn out to be flawed. Take, for instance, the recent article claiming that .300 hitters hit well over .400 in their last AB because they are highly motivated to succeed. It turns out that the result is caused by selective sampling, and not actually by batters "hunkering down" (as the New York Times put it, in its headline).

However, the media don't normally follow up after a study turns out to be flawed. As a result, there are probably a lot of people running around today believing that batters do actually exhibit clutch behavior when hitting .299. After all, two Ph.D.s said so, it passed peer review, and the New York Times reported it as fact.

That can't be a good thing, to wind up with people believing something that turns out to be false -- not just from the standpoint of sabermetrics, but also from the standpoint of the press.

Is there a better way to report these things?

My first reaction is that the press should report scientific findings the same way they report claims from political think tanks or interest groups -- with a skeptical tone, and with a response from those who might disagree. But that won't happen, for several reasons.

1. The paper is often not yet published and not yet available, so who's going to be able to say what's wrong with it when they can't even see it?

2. It takes a considerable amount of time for anyone else to read and digest the paper, especially if it uses complex methodology. Reporters don't have that kind of time.

3. If the paper has already been peer reviewed and accepted for publication, there is a presumption that the paper is correct. And it's been thoroughly reviewed by experts with doctorates. What could an amateur skeptic bring to the story?

4. It doesn't even matter if the paper is wrong. The story is not "players hit .463 in their final at-bat because they're motivated." The story is "Academics say that players hit .463 in their final at-bat because they're motivated." That's true, and newsworthy, even if the embedded claim turns out to be false, because, at the time of publication, there's a strong possibility it might be true.

5. Reporters rely on friendly sources. They don't want to get a reputation for being hostile to academics who come to them with newsworthy ideas. Would the two authors of the .300 paper have talked to the reporter had they expected to have their paper challenged? I doubt it.

So what's the solution?

One thing I'd like to see is for the press to insist that, if they're going to publish a story, the study has to be publicly available at the time the story comes out. To their credit, the authors of this particular study had a working version of the paper on the web. But, sometimes, the paper won't come out for days or weeks. To me, when someone says, "I've discovered X is true but I won't allow you to see the evidence until next month," that shouldn't be a story. Further, it should be something that *academia* frowns upon. Science, after all, is supposed to be open and free, not something you exploit so that your institution looks good. If you're not going to allow the world to see the evidence until November 22, you shouldn't promote it until November 22.

But, having said that ... I have to admit that if academics *do* promote a finding before the paper comes out, it's still a story -- if the academics are credible, knowledgeable experts.

And, not to sound like I'm bashing academia, but ... when it comes to sabermetrics, the Ph.D. economist is usually *not* the expert -- the sabermetric community is the expert. And that, I think, is how the press needs to see it.

In my experience, the way the mainstream media works is that, when they quote a lower-credentialed party, they will almost always go to the higher-credentialed party for a counterpoint. But when they quote the higher-credentialed party, that's often enough.

And that kind of makes sense. When some amateur insists that Saturn's rings are made of beer, you publish it as a novelty story if it's interesting, but you make sure you quote a real astronomer saying the guy is nuts. On the other hand, when you're writing a story about Saturn on the science page, you quote the astronomer, but, obviously, you don't need to go to the amateur for the "beer" counterpoint.

That's a reasonable way of doing it. But what the press doesn't understand yet is that when you evaluate credentials, you have to give the nod to the *subject matter expert* (as Tango calls it). In this case, that's the sabermetricians, and it's the Ph.D. who's the amateur, because the subject matter isn't established economic knowledge -- it's established *baseball* knowledge. Any established sabermetrician would instantly realize that a .463 average for a .300 hitter isn't plausible at all, given what's known about clutch hitting. They'd have been able to provide a decent opposing point of view for the article, and perhaps even convince the reporter to write about the issue with a skeptical eye.

Perhaps, for that to happen, sabermetrics needs to lose a little of its "nerd" image. But, even so, it's a reporter's responsibility, when writing about scientific research, to be aware of what's mainstream expertise and what's not. Academics who don't specialize in sabermetrics, and are making surprising or outlandish claims, definitely fall into the "not" category.

Labels: , ,

Thursday, October 07, 2010

Do batters really hit .463 when gunning for a .300 average? Part III

Last post, I forgot the most important findings. Here they are now.

I searched for all PAs from 1975-2008 in the last two games of the season, where the player was hitting less than .300, but, if he got a hit that PA, he'd be at .300 or over. In those plate appearances, the players hit .300.

Then, I looked for players who were at .300 or above, but, if they made an out that PA, they would drop below .300. In those plate appearances, the players hit .297.

Labels: , , ,

Wednesday, October 06, 2010

Do batters really hit .463 when gunning for a .300 average? Part II

Monday, I reviewed a study that found that players hitting .299 late in the season managed to get to .300 suspiciously often, hitting over .400 in their last plate appearance of the season. The authors argued that this is a result of the players expending extra effort to improve performance with .300 on the line. However, I argued that it was just a case of selective sampling: many players, once they hit .300, are benched for the remainder of the season, so cause and effect are reversed: getting a hit causes a certain at-bat to be the last.

After I wrote that, I found that the authors say their results account for substitutions. In footnote 4, they say that they didn't actually use the last plate appearance, but the last *scheduled* PA. So if a player was pinch hit for after getting the hit that got him to .300, he would go into the study as "substituted for" instead of "base hit". However, the authors mentioned only pinch hits, and it's possible that if a player was taken out for a defensive substitute, or a pinch runner, his hit would still have stayed in the study.

Because the paper is so unclear, I decided to try to reproduce the results. Using Retrosheet data like the authors did, I found every player from 1975 to 2008 who was hitting .299 before his last plate appearance of the season. (However, unlike the authors, I used only last PAs after September 25.) My results were similar to the study's.

According to the New York Times article describing the study, there were 62 players who went into their last PA at .299, and recorded an at-bat. My attempt, however, found 68 (listed here). Those "last PAs" break down as follows:

-- 33 played the entire final game and hit .242 in their last PA

-- 13 were replaced during the final game and hit .692 in their last PA before being replaced

-- 22 didn't play the final game and had hit .636 in their last PA before sitting out the rest of the season.

I have six players the study didn't. My best guess is that 6 of the 13 who were replaced during the final game were pinch hit for, and those were the ones the original study omitted. I'm pretty convinced that's what's going on and that I reproduced the results fairly. That's because the batting averages seem to be about right, but mostly because both the original study and my replication show that these players had zero walks -- and zero walks in 60+ PA is very unusual.

If that's the case, and the study only controlled for pinch hits, there's still a large amount of selective sampling in the 22 players who didn't play the final game. 14 of those 22 guys got a hit. How did that happen? Normally, it would take about 47 AB to get 14 hits.

What happened is that there probably *were* 47 of those guys. The other 25, who made an out, dropped to .297 or .298, and therefore weren't benched -- they played again to try to get past .300. Therefore, they weren't included in the study, because that hitless AB wasn't their last.
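That back-of-the-envelope estimate is easy to sketch (assuming, as above, a roughly .300 hit rate for these hitters):

```python
# Rough sketch of the filtered-out-players estimate. Assumes the
# borderline hitters get a hit at roughly a .300 clip.
hits_observed = 14      # benched players in the sample who got a hit
hit_rate = 0.300

implied_players = round(hits_observed / hit_rate)   # ~47 players at the cusp
in_sample = 22
filtered_out = implied_players - in_sample          # out-makers not in the study

print(implied_players, filtered_out)  # prints: 47 25
```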

So, the conclusion remains: the effect the authors found is simply the result of players quitting immediately after they get the hit that pushes them over .300.


Just in case you're not convinced, I decided to check how well players hit late in the season when they're shooting for .300, in a method that avoids the sampling bias. Instead of checking just their last plate appearance, I checked a group of plate appearances chosen in advance.

I looked at every team's last two games of the season, and took every PA by a player who went up to the plate hitting .299. That is: not just their last PA, but *any* such PA in the last two games. That way, there's no bias: that PA winds up in the sample if he gets a hit then quits for the year, but, unlike the original, it also winds up in the sample if he makes an out, and bats again later to try to make up the ground he lost.

The results: not much different from other at-bats. Those players hit .313, going 101 for 323 with 23 walks.

What about batters hitting .300, who, if they made an out, would drop to .299? Those guys hit only .288.

Batters hitting .298, still close enough to .300 to have a decent shot, hit .310. And batters currently at .301 subsequently hit only .289. In chart form:

.298: .310 in 365 AB, 45 walks
.299: .313 in 323 AB, 23 walks
.300: .288 in 233 AB, 30 walks
.301: .289 in 277 AB, 37 walks

Not a whole lot different than you'd expect. Combined, the .299/.300 group hit .302. That's far from the .463 cited by the Times, for the biased sample.

I then tried something slightly different. I stopped the clock before the team's last two games, found all players who were between .297 and .302 at that point in time, and then checked their performance afterwards, regardless of how their batting average may have moved up or down during those two days. That's under the assumption that if the batter is that close that late in the season, every at-bat is critical in his quest to finish at .300 or above.

The results: those players hit only .302.

However, they did hit higher than other nearby groups, as seen below. Plus, you'd expect the players in the .297-.302 group to regress to the mean a bit, and maybe hit .290 or something. So not only did they beat the surrounding groups, but they also beat their expectation.

.291 - .296 — 692/2434 (.284), 229 BB
.297 - .302 — 593/1962 (.302), 205 BB
.303 - .308 — 421/1587 (.265), 167 BB

This is very slight support for the hypothesis that players on the cusp do succeed in pushing themselves a little harder towards .300. I say "very slight" because the standard deviation for these batting averages is about 10 points, so the results certainly aren't statistically significant. Personally, I expect they're mostly random, and that if you did this same study for other seasons, you'd find the effect goes away. But there still might be something there.
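As a sanity check on that 10-point figure: the standard binomial approximation for the SD of a batting average is sqrt(p(1-p)/AB). A quick sketch, assuming p ≈ .300, applied to the sample sizes in the chart:

```python
import math

# SD of a batting average over `ab` at-bats, binomial approximation.
def ba_sd(p, ab):
    return math.sqrt(p * (1 - p) / ab)

for ab in (2434, 1962, 1587):       # AB totals from the chart above
    print(ab, round(ba_sd(0.300, ab), 3))
```

For the 1,962-AB middle group, that works out to about .010 -- ten points of batting average.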


So I think the case is pretty well proven:

1. The factoid, "players hitting .299 or .300 batting a whopping .463 in their final at-bat" is true -- but it's the result of cherry-picking the AB in the sample. If the player got a hit to pass .300, it was likely to *become* his last at-bat, as he tended to sit out the rest of the season. But if he made an out, the AB wouldn't be his last.

2. If you look at *all* AB, not just the cherry-picked "last" ones, players around .300 hit only slightly better than expected, an effect that isn't statistically significant at all.

So it's fair to say that, while there does seem to be motivation to hit .300, that doesn't seem to translate into higher performance when it matters. However, there is strong evidence that players who have achieved the .300 mark late in the season are motivated to stay out of games in order to avoid dropping back to .299 or below. It's the result of that motivation that biases the sample, creating the false impression that batters demonstrate superstar talent in those situations.


P.S. Lots of discussion on this issue at "The Book" blog. Thanks to commenters there, especially Guy and MGL, for their assistance.

Labels: , , ,

Monday, October 04, 2010

Do batters really hit .463 when gunning for a .300 average?

A player enters his final plate appearance of the season batting .299 or .300. Presumably, he wants to finish at .300. What happens?

He hits very, very well. According to the New York Times, describing a soon-to-be-published academic study,

"[The academic authors] found that the 127 hitters at .299 or .300 batted a whopping .463 in that final [plate appearance], demonstrating a motivation to succeed well beyond normal (and in what was usually an otherwise meaningless game)."

.463! Holy crap!

But ... do you see what might be going on? Selective sampling. The "final plate appearance of the season" situation is not known beforehand. It could just be that if that player gets a hit, and passes .300, he's removed from the game. So the "final appearance" sample will be biased in favor of players who got a hit.

Suppose you play a game with dice. Every roll is an at-bat. Every 1 or 2 is a hit. The expected batting average is .333. You plan to simulate a player's season of 500 AB. However, as soon as his batting average passes .300, you stop dead, and that's the end of his season.

What will you find? Over 99 percent of the time, the last AB will be a hit. (The only time it won't is if you go all 500 AB without ever passing .300.) Those dice "players" will hit over .990 in their last AB, the one where they first achieved .300. It's obviously not because the die has a motivation to succeed.
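This is easy to verify by simulation. Here's a quick Python sketch of the dice game (my own code, not from the study):

```python
import random

def simulate_season(n_ab=500, p_hit=1/3, cutoff=0.300, rng=random):
    """Roll out one season; stop the moment the average passes the cutoff.
    Returns True if the season-ending AB was a hit."""
    hits = 0
    hit = False
    for ab in range(1, n_ab + 1):
        hit = rng.random() < p_hit
        hits += hit
        if hits / ab > cutoff:
            break        # season over: he just passed .300
    return hit

random.seed(1)
trials = 50_000
frac = sum(simulate_season() for _ in range(trials)) / trials
print(f"last AB was a hit in {frac:.1%} of seasons")  # well over 99%
```

Only the rare "player" who never passes .300 in all 500 rolls can end his season on an out, which is why the fraction comes out near 1.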

That is: it's not that the situation of being able to pass .300 makes their last AB productive. It's probably that being productive while passing .300 *makes the AB their last*. At least that's what I think is going on.

In fairness, there is a bit of evidence that there may be a motivational component too. First, when hitting .299, none of the 61 players involved walked in their final plate appearance. 0 walks for 61 does suggest those players were specifically gunning for .300.
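Just how unusual is 0-for-61 in walks? Assuming a typical walk rate of around 8 percent per PA (my assumption, not a number from the study), the binomial probability is well under one percent:

```python
# Probability of zero walks in 61 PA at an assumed 8% walk rate.
walk_rate = 0.08          # assumed league-ish rate, not from the study
pa = 61
p_zero = (1 - walk_rate) ** pa
print(f"{p_zero:.4f}")    # well under 1 percent
```

The real walk rate for these particular hitters is unknown, of course; the point is only that zero walks in 61 tries is far too few to be chance.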

Also, the study included batters hitting both .299 and .300. Those hitters already at .300 obviously weren't sat out, at least not right away. There were 66 players already at .300 ... those guys must have hit pretty well in order to keep the overall average at .463. (If they hadn't, presumably the study would have talked only about the .299 hitters.)

If it were indeed all a result of benchings, how many benchings would it take? Regressing to the mean a bit, suppose the 127 hitters in the sample had talent of about .290. The difference between .290 and .463 in 127 AB is the difference between 37 hits and 59 hits. That means 22 hits need to be explained by benchings. How many benchings would that take? More than 22 (because some of the benched guys might have passed .300 anyway in later AB). Maybe we can take 31 as an estimate ... if those 31 weren't benched, 9 would have got a hit next time up, staying at .300, and 22 would make an out, dropping back below .300.

Or, what's the farthest we can go without statistical significance? Two SDs of batting average in 127 AB is about .080, or 10 hits. If, by random chance, the hitters were 2 SD better than normal in those situations, then there's only 12 hits left to explain, and only about 17 benchings are required.

(You can probably go even a bit lower if you take into account that some of the 127 PA were walks, meaning the .463 is based on less than 127 AB. UPDATE: David Pinto finds that it could have been 57-for-123.)
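For the record, here's the arithmetic from the last few paragraphs as a quick sketch (the .290 talent figure and the 2-SD luck allowance are the assumptions from above):

```python
import math

ab, talent, observed = 127, 0.290, 0.463

excess_hits = (observed - talent) * ab            # ~22 hits to explain
# A benched player would have made an out next time up with probability
# 1 - .290, so each "explained" hit costs roughly 1/0.71 benchings.
benchings = excess_hits / (1 - talent)            # ~31

# How much of the excess could be plain luck? Two SDs in 127 AB,
# binomial approximation:
lucky_hits = 2 * math.sqrt(talent * (1 - talent) / ab) * ab      # ~10
benchings_with_luck = (excess_hits - lucky_hits) / (1 - talent)  # ~17

print(round(benchings), round(benchings_with_luck))  # prints: 31 17
```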

Is 31 benchings out of 127 batters reasonable? Is 17 benchings reasonable?

I don't know, but benchings can't be that rare. In 1980, I remember Bobby Mattick sitting Alvis Woods after he got to .300 in the last game (I can't find a reference, but the box score confirms my memory, for what that's worth). And, yesterday's NYT article actually mentions a more recent case (but doesn't seem to realize the implication for the study's findings):

"Five years ago, in a meaningless 162nd game against the Yankees, [David] Ortiz entered batting .299 for the season; he struck out in the first inning to drop to .298 and walked in the third, knowing he still had a few more chances to swing for .300.

One inning later, Ortiz singled to reach .300. He batted one more time in the sixth — he walked, refusing to swing at anything that might result in an out — and was, because of the statistical awareness of Manager Terry Francona, replaced on the bases to make sure that .300 season average would last forever."

So I'm skeptical. I guess we have to wait for the study, by Devin Pope and Uri Simonsohn, to come out to find out what's really going on. The Times says it's been accepted for publication in "Psychological Science", but not yet available.

Or, if anyone wants to reproduce the study, and check to see how many .300 hitters ended their seasons a couple of AB earlier than expected ...

UPDATE: commenter David N. kindly posted a link to the actual study.

If you look at the study, the authors actually show evidence that pinch-hitting is the cause! However, they didn't get the significance of that data.

In their "last scheduled plate appearance of the season", the average batter was pinch hit for 7 percent of the time.

But batters with a .298 or .299 average were pinch hit for only 4.1 percent of the time. Batters with .300 or .301 were pinch hit for 19.7 percent of the time.

And, most importantly, batters hitting exactly .300 were pinch hit for 34.3% of the time!

That basically confirms that the authors' results are likely the result of cherry-picking. If you're hitting .299, you get a chance to jump to .300 with a hit in your last AB. But if you're already hitting .300, you're often pulled from the game before you get the chance to make an out in your last AB and drop back to .299.

You know how, when you're looking for something you lost, you always find it in the last place you look? Well, the same thing applies here. When you're looking for .300, you find it with a hit in the last AB you take.

UPDATE: Over at "The Book," commenter Guy points out that I'm overstating the case a little bit. In computing the batting average, the study did ignore players who were pinch-hit for in their last game. However, it did *not* seem to ignore players who were pinch-run for, or replaced defensively before their next plate appearance. So the selective sampling issue remains.

Labels: , , ,