Saturday, April 30, 2011

The Canadian election gag law

Non-sports post.


Canada votes in a federal election on Monday. But it's not a free election, as Americans would see it. That's because of our election "gag law."

The law prevents any individual or organization from advertising meaningfully during an election. Not just advertising for a political party, or advocating who to vote for, but even on any issue that might be associated with any party or candidate. And even if the ad doesn't even mention a party or candidate.

Effectively, during an election, only the political parties get to speak.

(I say "effectively" because a token amount of advertising is allowed. A "third party" can spend $3,765 per riding (district), or $188,250 overall. Needless to say, it's hard to make much of a splash for that amount of money, when the parties have millions.)

In the debate about the gag law that's been going back and forth over the last few years, the public sentiment seems to be that they don't want special interest groups "buying votes." But that's wrong. You're not "buying" votes with advertising. The only way to turn money into votes is to convince voters that your position is correct.

There's no guarantee that will work, especially if your position *isn't* correct. But, let's consider a case where it *does* work. Let's suppose group X spends millions of dollars trying to convince the public to support legislation in its favor, and, after the advertising is done, more people support X than before.

Is that a bad thing? Is that "unfair"?

No, of course not. It's a good thing. Before the advertising, the public was obviously ignorant of the issue. Now hearing X's argument, they're saying, "hey, you know, they've got a point."

That's what a free society is all about, isn't it, being allowed to air your views and grievances and trying to change people's minds? If you feel that it's somehow unfair that X was able to turn support in its favor, aren't you really saying that it's better if the public remains ignorant?

Or maybe your position is that there should be a "level playing field," that if X gets to spend money, and the anti-X group doesn't have money, the public doesn't get to hear both sides. But that's the opposite of what happens in practice. If X advertises, and X has a controversial position, then that starts a debate. The press reports on it, the parties are asked to comment on it, there are letters to the editor, and editorials in the paper. The money X spends actually *creates* a level playing field. If X doesn't spend, the issue never comes up, the debate never starts, and nothing ever changes. THAT is the unlevel playing field, where the status quo gets 100% of the voters' attention, and the views of the minority get zero.

And, besides, there's NEVER a "level playing field," in terms of equal number of words on both sides. It's never the case that one side gets exactly the same money and attention as the other. Where I live, I've seen ads that advise you to wear a helmet when you ride your bike. I've never seen an ad that tells you, hey, don't bother, the risk is pretty low and not worth the hassle. There are commercials telling you to buy American to create jobs for American manufacturing workers, but no commercials asking you to buy foreign to create jobs for American farmers. There are lots of anti-marijuana ads on TV, but no pro-marijuana ads. There are anti-bullying ads, but no pro-bullying ads.

You will almost never find a 50/50 split. But, often, you'll find a 100/0 split. Why aren't there cries of injustice about that?

The truth is, people never want restrictions on speech when they agree with the speakers. If you think a law is unjust and the status quo is evil and immoral, you want to protest, and advertise, and advocate, and scream from the rooftops. If you think the law is fine, and those who oppose it are evil and immoral, that's when it suddenly seems NOT FAIR that they get to propagandize and falsely persuade unsophisticated voters just for their own selfish interests.

We think it's OK for the anti-bullying lobby to "buy" public perception with their advertising, because, in this case, we agree with the message, and we *like* the anti-bullying lobby. On the other hand, who's going to spend money during elections? People we *don't* like. Evil corporations. Shady special interest groups. Rich people.

But the whole point of freedom of speech is that *everyone* has the right to speak, not just people we happen to agree with. This "buying votes" and "level playing field" stuff is just an excuse.

I think that kind of censorship is wrong. But, hey, maybe it's *me* who's wrong. Maybe it IS perfectly legitimate to censor the public during elections, and allow only the politicians to decide what gets said and what doesn't. In fact, most people think it's me that's wrong. The Supreme Court of Canada thinks I'm wrong. They upheld the gag law. They decided that it's OK for the NDP (Canada's main left-wing party) to spend tons of money advertising that banks and oil companies should be taxed more, but it's OK to legally prohibit those banks and oil companies from advertising in rebuttal. They think it's OK for Canadians to hear only one side of the debate.

Of course, the learned members of the Supreme Court are a lot smarter than I am. They're so smart that they know what the truth is already, even if they only hear one side of the argument. I'm just a normal guy, who needs to hear both sides.


Anyway, there are so many arguments for scrapping the gag law that I don't even know where to start. I sat down at my computer to start listing a few, and within half an hour, I had maybe fifty of them. You could probably add fifty more. Sorry to bore you; feel free to stop reading any time.

Even better, check out Gerry Nichols on the gag law; some of my arguments were inspired by his.


1. A few years ago, the Liberal (left-centre) government passed a law that the political parties would get funding from the government. Each party would get $2 for every vote they received in the last election.

But, that means that the parties that did well would be able to spend more money than the parties that did worse! Isn't that just as unlevel a playing field? Is it really fair that the environmentalist Green Party gets to speak only 20% as much as the Conservative party? Does money buy votes, or doesn't it? If it does, aren't you just handing votes to some parties at the expense of others?

2. Also, the (left-wing) NDP gets less money than the (right-centre) Conservatives, because they got fewer votes. Now, if the unions were allowed to advertise on behalf of the NDP, that would simply make the NDP and Conservatives *more equal* in their advertising budget. By the logic of the "level playing field", wouldn't that be a good thing?

3. What's "fair"? If party X gets more votes than party Y because it has a pretty color and a charismatic leader and the public knows little about the issues, the gag law considers that fair. But if party Y gets more votes than party X because two corporations and a union and a social justice think tank fought it out with duelling TV ads ... that's somehow not fair?

4. If money buys votes, is that because Canadians are so stupid they knee-jerk vote for whatever they hear? Do voters watch oil company ads and unskeptically believe what they hear? I doubt it. But even if that's true, won't they also fall for *politicians'* propaganda?

5. Often, all three major parties agree on an issue, either explicitly or implicitly. For instance, in 1984, all three parties in Ontario agreed on public funding for Catholic high schools. That meant that Catholics would have their kids' education paid for by taxes, while (for instance) Jews and Muslims would have to pay out of pocket (in addition to their taxes). That seems obviously unfair, right? And, indeed, there was a huge public outcry. But, if this happened as Federal policy today, it would be *illegal* for opponents of this discriminatory policy to advertise enough to make it a big campaign issue. And, since all three parties agreed on the policy, almost 100 percent of advertising would be in favor. That's a level playing field?

6. There are issues that all three parties want to bury. Abortion is the biggest one. Even the Conservative party, which attracts a lot of religious right-wing support, doesn't want to talk about it. When asked, I've heard Stephen Harper (the current Conservative prime minister) say, uncomfortably, that he has no intention of reopening the debate or legislating on it (there has been no new abortion law since the Supreme Court struck down the old one in 1988).

The other major parties are the same -- nobody wants to bring it up, because they can only lose votes by doing so. Doesn't that make it *more* important to allow "third parties" to bring it up? Even if you're pro-choice (as I am) and like the law the way it is, is it really fair for Canada to prohibit others from bringing it up just because the leaders, and most of their voters, prefer not to have to address it?

7. If that argument doesn't resonate with you, because you're pro-choice, imagine the reverse. Imagine that abortion was completely illegal, but none of the leaders wanted to talk about it, and the majority of Canadians agreed that it should be illegal. Is it really fair for the government to censor Planned Parenthood's ads during the election? Is it really a good idea that women's groups aren't allowed to buy newspaper space demanding their rights?

8. If all three parties want to discriminate against gays, it's illegal for gays to respond. If all three parties want to fire all the teachers, it's illegal for the teachers to respond. If all three parties want to enact a poll tax, it's illegal for anyone to respond. How is that good for democracy?

9. Right now, even if I agree with Stephen Harper (say) on issue X, I have to let him speak for X his way. Is it really democracy when I can't express the issue myself, but have to let Stephen Harper do it for me, in a way that's inferior? Filtering opinions through a political party is like filtering single-malt scotch through his kidneys -- what comes out bears little resemblance to what went in.

10. In general, it's the fringe views that benefit the most by being heard. The gag law effectively eliminates any out-of-the-mainstream views. And many, many views commonly held today were once out of the mainstream.

11. The gag law applies only during elections. But isn't that the time when it's *most* important for people to be able to speak? Often, it's the only time their fellow citizens are listening!

12. Many voters complain about "attack ads" that don't meaningfully discuss issues. But "third parties" are usually about a specific issue. Allowing free speech will actually improve the substance of the debate.

13. If you let people advertise, and you force them to identify themselves in their advertising, that brings the lobbyists out into the open. Isn't that a good thing?

14. There is a general public belief that corporations shouldn't be able to advertise. But why not? They're regulated by government. They pay taxes. Some of them may indeed be treated unfairly. Why shouldn't they be able to speak up?

15. If a candidate proposes some unfair law against corporation X, who does it affect? Not the executives personally; they go on with their jobs and keep collecting their salaries. It's the shareholders who lose. Sometimes there are thousands of shareholders, many of whom are not that well-off, and who may be counting on the corporation's profits to fund their retirement or their children's education. It's virtually impossible for those thousands of people to form a lobby group. Shouldn't it be legal for the corporation itself to advocate for the interests of those shareholders?

16. And what about the customers? They're often the biggest losers. Suppose you're a big fan of Apple, and you're saving up to buy an iPhone. A candidate declares that if he's elected, iPhones will be banned, in order to benefit RIM, the Canadian company that makes the rival Blackberry.

Stupid law, right? But we, the public, are the biggest losers, even bigger than Apple's shareholders; the benefit we get from iPhones dwarfs the small profit Apple makes off them. So, if Apple protests that law, aren't they acting in the interests of their customers as much as their shareholders? And isn't Apple the best placed entity to advocate for those customers?

17. Corporations are seen as selfish and self-interested. But politicians are too. And that's worse, because politicians can pass laws to benefit themselves. To take just one example: elected members of parliament legislated themselves gold-plated pensions, that kick in after only six years in office. Occasionally a bit of controversy about that will resurface. But, nobody can complain about it during elections, which is when it really matters! In this case, the gag law is in the interests of nobody except the politicians themselves.

18. Generally, it costs a lot of money and publicity to change people's minds, whether about race, gay rights, global warming, or -- significantly -- which candidate to elect. As a result, incumbents win a vast majority of seats, by inertia. The gag law thus confers a huge benefit on the politicians who enacted it, making it that much harder for opponents to convince voters to try someone new. Indeed, the gag law benefits any politician who is ever elected, serving to help protect him or her from losing power.

19. If, in 1981, IBM had been allowed to pass a gag law on Apple, would the world be better or worse today?

20. Not all views are equal. Suppose the government proposes a racist law. Immediately, billions of dollars are raised to oppose it. Only thousands of dollars are raised in favor of it. Is that really bad? Is it really so important to have a "level playing field" between racists and non-racists that the non-racists shouldn't be allowed to advertise?

21. As it stands now, if you donate money to a political party, you get 75% of it back as a tax credit. If you donate money to a group opposing a government policy, you get 0% of it back as a tax credit. So it takes four times as much money to oppose government policy as to support it. So, if you really want a level playing field, shouldn't third parties be allowed to spend *four times as much* as politicians?

22. Mainstream media are not included in the gag law. So a newspaper, for instance, can editorialize in favor of a certain policy or party, and that goes out, legally, to thousands and thousands of readers on their doorsteps in the morning. Effectively, if you're a rich corporation that owns a newspaper, you get to "advertise." If you're a rich corporation that doesn't, it's illegal.

23. As Gerry Nichols has pointed out, citizens are not "third parties". Citizens are "first parties." It's politicians that are second and third parties, as elected representatives only under the citizens' consent. The right of citizens to speak is MORE important than the rights of politicians to speak, not less.

24. As I mentioned earlier, every taxpayer gives $2, with or without his/her consent, to the party he/she voted for. (BTW, doesn't that create an incentive not to vote? I don't want the party I'm voting for to get my $2. I might decide, I'll stay home, so they'll have to get their money the old fashioned way -- by earning it. But I digress.)

A better law would be to give $2, or even $5 or $10 or $20, to every citizen, to put towards whatever advertising he or she wanted. That way, you'd get a diversity of views, not just what the three parties want to say.

You'd have pro-life ads, and pro-choice ads, and "tax the banks" ads, and "lower corporate taxes" ads. The students would contribute to "lower tuition" ads, and the gay rights and anti-gay rights people would go at it in opposing commercials (and probably the pro-gay ads would seriously outnumber the opponents' ads -- but nothing wrong with that.)

Wouldn't it be amazing? Instead of just having politicians attacking each other's character, you'd have ads actually arguing about issues and legislation. You'd have a real debate.

25. People have a fantasy that the way you get change is by electing the right politician, the one who thinks like they do and will courageously face down the special interests, and heroically make everything right. That doesn't work (as a lot of disappointed people who voted for Obama would agree), and has probably never worked.

The way you get change is by slowly, gradually, moving forward in a dialogue with your fellow citizens. Think about, for instance, gay rights. Thirty years ago, gay marriage was illegal in Ontario. Now, it's legal. What happened?

It wasn't a wise and charismatic leader giving a stirring speech and passing a law. It was a change in attitudes among Ontarians. The legislature and the courts just followed along, taking positions they would not and could not have made until fairly recently. Even if the premier had believed in gay marriage thirty years ago, he wouldn't have had the votes ... and, in any case, it would have been political suicide.

Traditional politicians, and traditional political advertising, do not move the dialog along. Third-party ads do. A gag law against third-party ads is a small-c conservative policy, serving to reinforce the status quo.

If you want to help make the world better, you need to talk about the issues more, not less.

26. And new views just take a lot more persuading than others, even if they're correct. Consider gay rights again. Fifty years ago, could you have convinced a reasonable person that gays should be accepted in society? Maybe. But it would take a lot of arguing, wouldn't it? There are a lot of preconceptions you'd have to overcome.

You can probably think of a public policy position that's well-accepted, plausible-sounding, and politically correct, but just plain wrong. Mine might be, "marijuana use is harmful and should be illegal." Do you agree with me on that one? If not, take a second to come up with one of your own.

I bet that whatever fallacy you're thinking of, it can probably be expressed in just a few words. Now, how many words would it take to rebut it? A lot more. In fact, I bet that you could spend ten or twenty times as much time and money arguing for your point of view, and that still wouldn't be enough to overcome the plausible-sounding one.

Some positions are complicated (or "nuanced", as some would say), orders of magnitude more complicated than the opposing position. In those cases, money makes the playing field *more* level, not less.


27. Finally, ask yourself this: If candidate X says that blacks are inferior to whites, should blacks have the right to take out ads urging people not to vote for him? What if candidate Y says global warming is a myth, or Jews control the world, or the Muslim head scarf should be banned, or smoking should be allowed in public schools? Should environmental groups, or Jewish groups, or Muslim groups, or the Canadian Cancer Society, not have the right to rebut?

If they don't have that right -- and candidate X and Y win anyway -- I don't see how you really say that the election was fair, or even free.


Monday, April 25, 2011

Did NFL teams discriminate against black coaching candidates? Part II

I posted recently about a "Rooney Rule" study that appeared in the Journal of Sports Economics. In that paper, the authors found that, from 1990 to 2002, NFL teams with black head coaches won 1.1 games per season more than teams with white head coaches. The authors took this as evidence that the NFL was discriminating against black candidates -- hiring only the best black coaches, and not the average ones.

A few more thoughts on the issue:

1. I'm not a subject matter expert (SME) on NFL coaching, but it seems to me very, very unlikely that a sample of 29 coaches, no matter how you selected them, could be, on average, as much as 1.1 games better than average. That seems way too high. Maybe one coach could, sure, under very specific circumstances (say, if he figures out he should start Tom Brady instead of Drew Bledsoe). But the average of 29 coaches? That would be nearly impossible, wouldn't it?

And it's not like the study chose the best 29 coaches -- they chose the only 29 black coaches there were. That means the best black coaches of the 29 would have to be substantially better than 1.1 wins, season after season. That, again, seems implausible.

It's a critical question, because, if the effect is too big to be coaching, the study is no evidence at all -- it literally has zero value!

Here's the logic. If you argue that the 1.1 games is statistically significant, then you're saying that there's evidence that the teams with the black coaches are significantly different, in some way, from the teams with the white coaches. You may believe that the difference is the coach's race. But since 1.1 is too big an effect to be just the coaches, the difference must be, in part, something else. So, since there must be something else going on, you have very little basis for thinking that there's evidence that even *any part of it* is coaching. After all, whatever the "something else" is, it could be just as easily responsible for all of the 1.1 as part of it. In fact, it could be responsible for *more* than 1.1 games, and the black coaches might be *worse* than the white coaches!

If you get an effect size that couldn't possibly be what you're looking for, then all you have evidence for is that there's something else causing the effect. That means there are confounding factors your study hasn't controlled for, which means you have no evidence at all for your particular hypothesis. That doesn't mean you're wrong -- it's not that you have evidence against it, it's just that you have no evidence *for* it.

This is a little bit counterintuitive -- it means a small effect is better evidence than a large effect. If you get statistical significance with a difference of 1.1 wins, that means nothing. But if you get statistical significance with a difference of 0.1 wins, now at least there's a chance that you're seeing something real.

2. In a different post a while ago, I quoted Bill James on psychology:

"... in order to show that something is a psychological effect, you need to show that it is a psychological effect -- not merely that it isn't something else. Which people still don't get. They look at things as logically as they can, and, not seeing any other difference between A and B conclude that the difference between them is psychology."

After this coaching study, it occurs to me that Bill's argument holds for *any* possible cause, not just psychology. Racial bias, for instance. Editing Bill's quote:

"... in order to show that something is a racial bias effect, you need to show that it is a racial bias effect -- not merely that it isn't something else. Which people still don't get. They look at things as logically as they can, and, not seeing any other difference between A and B conclude that the difference between them is racial bias."

The typical study will spend a lot of time and paragraphs and numbers persuading you that there is evidence that A and B are different at a statistically significant level. But then they'll give you only a few sentences *about what that evidence really means*. Shouldn't it be the other way around?

It's as if you're on trial for murder, and the prosecution spends five days nailing down how many millions of dollars you stand to inherit from the victim. They call a stockbroker, a banker, a real estate agent, all of whom testify for hours about how much the guy left you in his will, down to the penny. And then, after all that, the prosecutor says to the judge, "so, obviously, the accused must have done it. We rest our case."

That's backwards. Showing that A and B are different is the easy part -- it's just regression. The hard part is figuring out *why* A and B are different. Most of the effort should go into the argument, not into the statistics.

3. A reader was kind enough to send me a similar study from "Labour Economics." It's called "Moving on up: The Rooney rule and minority hiring in the NFL," by Benjamin L. Solow, John L. Solow, and Todd B. Walker. (Here's a press release.)

The authors create a model to predict whether a "level-two" assistant coach is promoted to head coach, based on performance, age, calendar year, years of experience, and race. It turns out that race is not significant, either before or after the Rooney Rule. Nonetheless, the coefficient for "minority coach" (most are black) is slightly negative (-0.6 SD) before, and slightly positive (+0.8 SD) after.

If you choose to interpret the pre-2003 coefficient at face value, even though it's not statistically significant (which I don't recommend), it's equivalent to two extra years of high-level coaching experience.


Friday, April 15, 2011

Can managers induce "career years" from their players?

Over at the "Ask Bill" section of Bill James' website, there was some discussion last week about the 1980 Yankees (subscription required; start at April 7). They finished 103-59 despite a team that didn't look that great on paper. Was it that manager Dick Howser somehow got more out of the players than expected?

A few years ago, I did a study that tried to estimate how much a team was affected by the "career years" or "slump years" of their players. (Go here, look for "1994 Expos".) What I did, basically, was take a weighted average of a player's stats the two years before and two years after, regress it to the mean a bit, and use that as an estimate of what the guy "should have" done that year. Any difference, I attributed to luck. In the 1980 Yankees case, it was 12 games of "career years" from their hitters, and effectively zero for their pitchers.
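To make that description concrete, here is a minimal sketch of the idea in Python. The weights (1, 2, 2, 1) and the 20 percent regression to the mean are placeholders I picked for illustration, and the player numbers are made up; none of them are the values from the actual study.

```python
def expected_level(before2, before1, after1, after2,
                   league_mean, weights=(1, 2, 2, 1), regress=0.2):
    """Estimate what a player 'should have' done in the middle season.

    The weights and the regression fraction are illustrative guesses,
    NOT the values used in the original study.
    """
    seasons = (before2, before1, after1, after2)
    weighted = sum(w * s for w, s in zip(weights, seasons)) / sum(weights)
    # Pull the weighted average part of the way back toward the league mean.
    return weighted + regress * (league_mean - weighted)


def career_year_luck(actual, expected):
    """Positive means a 'career year', negative means a 'slump year'."""
    return actual - expected


# Made-up hitter, measured in runs created per game (hypothetical numbers).
expected = expected_level(5.0, 5.5, 5.3, 4.8, league_mean=4.5)
print(round(expected, 2), round(career_year_luck(6.5, expected), 2))
```

Summing the luck figure over every player on a roster, and converting runs to wins, is what gives a team-level number like the 12 games quoted for the 1980 Yankees.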

A bit of discussion followed; Bill James wrote that he wasn't convinced:

"I am leery of describing as luck things that we don't understand. It may well be that players had good years because Howser or someone else was able to help them have good years."

Fair enough. In response, I posted a short statistical argument that if it *was* the manager, it couldn't happen very often, and another reader (Chris DeRosa) disputed what I said (partly, I think, because I didn't say it very well).

Since "Ask Bill" is not a good place for a long explanation, I thought I'd start again here and better explain what I'm talking about.


Suppose we knew the exact talent level of every team in the majors. That is: for every single game, between any two teams, we know the exact chance either team will win. If both teams have an equal chance, it's exactly like flipping a fair coin. If the favorite has a 64 percent chance of winning, it's like flipping a coin that has a 64 percent chance of landing heads.

In real life, this is pretty much the way it works. If not, the Vegas odds on baseball games wouldn't be so close to even. If you could look at the specifics of a game and have a 90% idea of who would win that day, Vegas would routinely offer 9:1 odds on underdogs. And they don't. That means that a huge part of who wins a baseball game is unpredictable.

So, a team's season record is like a series of 162 coin tosses -- heads is a win, tails is a loss. Mathematically, using the normal approximation to the binomial distribution, you can show that the SD of team wins over a season, for a .500 team, is about 6.3 wins. That is, you expect 81-81, but you could easily wind up 87-75, or even 69-93, just due to luck.

The SD drops as the team gets better or worse than .500, but it doesn't drop much. If it's a .600 team, rather than a .500 team, the SD due to "coin tossing" is still 6.2 wins. Even for a .700 team, the SD is still about six games a season -- 5.8, to be exact.
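If you want to check those numbers yourself, the binomial SD is just the square root of n times p times (1 - p). A quick sketch, assuming 162 independent games with the same win probability in every game (the function name is mine):

```python
import math


def season_win_sd(win_prob, games=162):
    """SD of season wins when every game is an independent weighted coin flip."""
    return math.sqrt(games * win_prob * (1.0 - win_prob))


for p in (0.5, 0.6, 0.7):
    print(f"{p:.3f} team: SD = {season_win_sd(p):.2f} wins")
```

The results land within a tenth of a win of the figures quoted above, and you can see how slowly the SD shrinks as the team moves away from .500.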

Also, there's no need to keep the assumption that all games are the same. Suppose, before every game starts, you know the exact talent of both teams, and even the exact home field advantage for that game. You can even be omniscient enough to adjust for the weather, and injuries, and the fact that the starting pitcher had a big fight with his wife last night. Before the game starts, you'll have an extremely accurate estimate of the chance of the home team winning.

Still, that chance will be substantially less than 100%. You'll still have a huge amount of luck happening. Your estimate is almost always going to be less than, say, .700. It is absolutely impossible to get much better than that, for the same reason it's impossible to predict what the temperature will be exactly one year from now.

In theory, it could be predictable -- but the predictability is over uncountable numbers of molecules, beyond any possible computing capability humans could ever devise. So what is left is essentially random.

That means that, when we total up your wins and losses for the season compared to talent, no matter how accurate your talent estimates are, you're going to find that your SD is *still* around 6.2. That's an unalterable, natural limit of the universe, like the speed of light.
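Here's a rough way to see why knowing each game's exact odds doesn't help much. For independent games, the variance of total wins is the sum of p times (1 - p) over all the games. The schedule below is invented: I simply assume 162 games whose win probabilities are spread evenly between .300 and .700.

```python
import math


def season_sd(game_probs):
    """SD of total wins for independent games with known win probabilities."""
    return math.sqrt(sum(p * (1.0 - p) for p in game_probs))


# Hypothetical schedule: win chances spread evenly from .300 to .700.
n = 162
spread = [0.30 + 0.40 * i / (n - 1) for i in range(n)]
print(round(season_sd(spread), 2))

# For contrast, an unrealistic schedule where every game is a 90/10 proposition.
print(round(season_sd([0.9] * n), 2))
```

The spread-out schedule still leaves an SD only slightly below the .500-team figure. You'd need nearly every game to be a 90/10 proposition to get the SD under four wins, and the Vegas odds say real games are nothing like that.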


If you have a model for estimating team talent, a good test of that model is how close your error can get to the natural lower bound of 6.2 wins.

The most naive model is when you predict that every team will wind up 81-81. If you check that, you'll find that the standard error of your estimates is around 11 wins. If you use a prediction method like Tom Tango's "Marcel", you'll get substantially closer. You could also check any other predictions, like the Vegas over/under line. I don't actually know what those are, but I'm guessing they'd be around 8 or 9 wins.

My model is at 7.2 wins. I'm pretty sure it's better than Marcel and Vegas, but that's only because it uses more data. Oddsmakers are predicting the team's talent *before* it happens; I'm predicting it after. Obviously, I have a huge amount more information to work with. From looking at the rest of Norm Cash's career, I know that Norm Cash wasn't as good a player in 1962 as his 1961 suggested, and I can adjust accordingly. Marcel looks only backwards, so it doesn't know that.

If that seems like I'm cheating, well, not really. I'm not using the method to show how good a predictor I am. I'm using it to try to figure out, after the fact, how good a team actually was. I'm not trying to predict the future; I'm trying to explain the past.


My method works like this. Suppose you have a team with a talent of X wins, but, instead, it got Y wins. The difference between Y and X is, by definition, luck. How might we measure that luck?

I think that these five measurements completely add up to the amount of luck, without overlapping:

-- how much the team's hitters got lucky and had a career year;
-- how much the team's pitchers got lucky and had a career year;
-- how much the team differed from its Runs Created estimate;
-- how much the team's opponents differed from their Runs Created estimate; and
-- how much the team's wins differed from its Pythagorean Projection.

The first two items deal with the raw batting and pitching lines. The next two items deal with converting those lines to runs. And the last item deals with converting those runs to wins. (You don't have to consider the opposition's "career year", because the opposition's career year in hitting is your career year in pitching, and vice-versa.)

Any source of luck you can think of winds up in one of those five categories. A pitcher has a lucky BABIP? That shows up as a career year. Team gets lucky and hits unusually well in the clutch? Partly career years, partly beating their Runs Created estimate. Team gets lucky and goes 15-6 in extra inning games? Shows up in their Pythagorean discrepancy. Your shortstop has a lucky defensive year? That shows up in a pitcher's career year (which is based on opposition batting outcomes, and therefore includes defense).

It's all there.
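Here's a minimal sketch of that bookkeeping, just to make the accounting concrete. The class name, the field names, and the sign convention (every component in wins, positive meaning luck in the team's favor) are my own assumptions for illustration, and the numbers in the example are made up, not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class LuckComponents:
    """Five non-overlapping luck estimates, all in wins.

    Assumed convention: positive = luck in the team's favor.
    """
    hitters_career_year: float
    pitchers_career_year: float
    runs_created_own: float    # team vs. its own Runs Created estimate
    runs_created_opp: float    # opponents vs. their Runs Created estimate
    pythagorean: float         # wins vs. Pythagorean projection

    def total(self) -> float:
        return (self.hitters_career_year
                + self.pitchers_career_year
                + self.runs_created_own
                + self.runs_created_opp
                + self.pythagorean)


def estimated_talent(actual_wins: float, luck: LuckComponents) -> float:
    """Talent X = actual wins Y minus total measured luck."""
    return actual_wins - luck.total()


# Made-up numbers for a hypothetical 90-win team.
example = LuckComponents(4.0, 2.5, 1.0, -0.5, 3.0)
print(estimated_talent(90, example))
```

With those made-up inputs, the five components sum to 10 wins of luck, so the hypothetical team's talent estimate comes out 10 wins below its actual record.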


So, for every team since 1961, I figured out their luck in each of the five categories. As I said earlier, the "career year" luck was by players' talent estimates based on the four surrounding years. The Runs Created and Pythagorean estimates were straightforward.

After all that, the unexplained discrepancy, as I said above, was 7.2 games.

That seems very close to the law-of-the-universe binomial limit of 6.2 games. The difference, however, is substantial: it's 3.7 games. (It works that way because 7.2 squared minus 6.2 squared equals 3.7 squared).
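The subtraction works on variances rather than on SDs, which assumes the binomial luck and whatever is left over are independent. A quick sketch of that arithmetic (the function name is just for illustration):

```python
import math


def residual_sd(total_sd: float, known_sd: float) -> float:
    """SD left over after removing an independent component.

    Variances add, so SDs subtract in quadrature.
    """
    return math.sqrt(total_sd ** 2 - known_sd ** 2)


leftover = residual_sd(7.2, 6.2)
print(round(leftover, 1))
```

So a gap of only one game between the two SDs still leaves room for a 3.7-game component.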

What does that 3.7 represent? It's not luck we haven't accounted for, because, I think, we've accounted for all the luck. We haven't accounted for it perfectly -- Pythagoras and Runs Created aren't exact. And, of course, the way I estimated a player's talent isn't perfect either.

So, here's what accounts for that extra 3.7 game standard deviation:

1. imperfections in Pythagoras and Runs Created
2. the fact that my method of estimating talent for "career years" is probably not that great
3. managerial influence in temporarily making players better or worse for a single season (Billy Martin's 1980 pitchers?)
4. injury patterns that make players look better or worse (but not injuries affecting playing time; that's reflected in the estimates already)
5. other sources of good or bad single years that aren't luck or injuries (steroids? Steve Blass disease?)
6. other things I'm forgetting (let me know in the comments and I'll add them here).

If I had to guess, I'd say that #2 is the biggest of all these things. My method just looks at four years. It may not be regressing to the mean properly. It doesn't distinguish between starters and relievers. It doesn't consider age (which is fine for most ages, but not for, say, 27, when it should give an extra boost over the average of 25, 26, 28, and 29). It takes previous or future career years at face value, so that, for instance, it predicts Brady Anderson's 1997 expectation based significantly on his 1996. (If you showed a human Brady's entire career, he probably wouldn't weight 1996 quite so high.)

UPDATE: Tango describes it better than I do:

"As for the reason for that 3.7, a large portion of that is almost certainly the uncertainty of the true talent for each player. There’s only so much we can know about a player, given such a small sample as 3000 plate appearances, combined with such a narrow talent base that is MLB."

In light of all that, my point about Dick Howser is this: since the entire unexplained residual SD is only 3.7 games, there can't be a whole lot of manager influence in temporarily increasing a player's talent. It's certainly possible that Dick Howser managed his team into 12 games of extra talent, but things like that certainly can't happen very often.

If you square the unexplained SD of 3.7, you get an unexplained variance of about 14. Multiply that by the 26 teams that existed in 1980, and you get about 356 total units of unexplained variance.

If Dick Howsers are routine, and there's typically one every season creating a discrepancy of 12 games, that Dick Howser singlehandedly contributes a variance of 144 (12 squared). That's about 40 percent of the total unexplained variance for a typical league. That's a lot.

Furthermore, it's absolutely impossible for there to be an average of two and a half Dick Howsers in MLB per year, each boosting his team by 12 wins worth of talent. If that were the case, then that would account for the entire 356 units of variance, which means all the other sources of error would have to be zero. That's obviously impossible.
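For anyone who wants to check those figures, here's a rough sketch using only the numbers quoted above; the variable names are mine.

```python
RESIDUAL_SD = 3.7      # unexplained SD per team, in games
TEAMS = 26             # MLB teams in 1980
HOWSER_EFFECT = 12     # hypothesized managerial boost, in games

# Total unexplained variance across the league for one season.
league_variance = RESIDUAL_SD ** 2 * TEAMS

# One 12-game manager contributes 12 squared units of variance.
howser_variance = HOWSER_EFFECT ** 2

share_of_total = howser_variance / league_variance
howsers_to_use_it_all_up = league_variance / howser_variance

print(round(league_variance))              # about 356
print(round(share_of_total, 2))            # about 0.40
print(round(howsers_to_use_it_all_up, 1))  # about 2.5
```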

Even if there were only half a Dick Howser every year, that would still be 21 Howsers in the period I studied. In that case, instead of seeing the discrepancies normally distributed, we'd see a normal distribution with 21 outliers.

But we don't.

If "batting career year discrepancy" is normally distributed, we should expect about 24 teams out of 1042 to have discrepancies of 2 SD or more. The actual number of teams at 2 SD or more in the study: 25, almost exactly as expected.

We should also expect 24 teams to have discrepancies of 2 SD or more going the other way. Actual number: 22.
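The expected count of 24 comes from the tail area of the normal curve. A small sketch, assuming the discrepancies really are normal and using only the standard library:

```python
import math


def upper_tail(z: float) -> float:
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


TEAM_SEASONS = 1042
expected_high = TEAM_SEASONS * upper_tail(2.0)

# The normal curve is symmetric, so the low tail has the same expectation.
print(round(expected_high, 1))
```

The tail probability beyond 2 SD is a little under 2.3 percent, which over 1042 team-seasons works out to about 24 on each side.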

So there is no evidence at all that there's anything more than luck going on. That still doesn't mean that Dick Howser can't be a special case ... it could be that career years are just random, *except for 1980 Dick Howser.* But, obviously, the number alone doesn't give us any reason to believe he is. A certain number of teams are going to show as big an effect as the 1980 Yankees, regardless. (And, in fact, three other teams beat them; the 1993 Phillies led the study with a "career year hitting" effect of 13.1 games.)

So, if you think Dick Howser is something other than a random point on the tail of the normal distribution, you have to explain why. It's like when Daphne Weedington, from Anytown, Iowa, wins the $200 million lottery jackpot. You don't know *for sure* that Daphne doesn't have some kind of supernatural power. But, after all, *someone* had to win. Why not Daphne?


Sunday, April 10, 2011

Buck Showalter's $2,000,000 tactic

From Tom Verducci's article on Buck Showalter, in the March 28, 2011 issue of Sports Illustrated:

"Showalter had schooled his players on this: runners at first and third, less than two outs and a ground ball that the second baseman fields near the baseline. Most runners on first are taught either to stop or head toward the infield grass, making it hard for the second baseman to tag them and still have time to throw to first for the double play. Showalter taught the Orioles to slide directly into the second baseman, essentially breaking up a double play in the baseline. "That's six to 10 outs a year if we do it right," Showalter said. Which is 0.2% of the more than 4,000 outs a team gets over a season."

Well, an extra six to ten outs is a lot. Plus, it's not just the outs: it's also the extra runner at first base.

Assuming the runner on third always stays put, and doing a little arithmetic with Tango's base/out matrix:

Suppose there's one out. If the defense turns the double play, the inning ends and the run expectancy is zero. If they don't, it's first and third with two outs, which is worth .538 runs. Difference: .538 runs.

Suppose there are no outs. If the double play is broken up, it's runners on 1st and 3rd with one out, which is worth 1.243 runs. If it's turned, it's a runner on 3rd with two outs, which is worth .387 runs. Difference: .856 runs.

Now, most of the time there'll be one out (it's a lot easier to get two runners on with one out than with no outs). Again from Tango, it's about a 2:1 ratio of one out over no outs. That means the .538 happens twice as often as the .856, which means each broken-up double play averages .644 runs.

"Six to 10" instances of saving .644 runs is 4 to 6 runs. Call it 5.

A free-agent win is worth about $4.5 million. A win is about 10 runs. So, at free-agent rates, 5 runs is worth over two million dollars.

So Buck Showalter has saved his team $2,000,000 -- over half his salary -- in that one small on-field strategy change.
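Here's the whole chain of arithmetic as a short sketch. The run expectancy values, the 2:1 weighting, and the dollar figures are the ones quoted above; the variable names are my own, and the assumption that the runner on third always stays put is carried over unchanged.

```python
# Run expectancies quoted above (from Tango's base/out matrix).
RE_1ST_3RD_TWO_OUT = 0.538
RE_1ST_3RD_ONE_OUT = 1.243
RE_3RD_TWO_OUT = 0.387
RE_INNING_OVER = 0.0

# Runs saved by breaking up the double play, by starting out count.
saved_one_out = RE_1ST_3RD_TWO_OUT - RE_INNING_OVER
saved_no_outs = RE_1ST_3RD_ONE_OUT - RE_3RD_TWO_OUT

# One-out situations are assumed to be twice as common as no-out ones.
avg_saved = (2 * saved_one_out + saved_no_outs) / 3

low_runs, high_runs = 6 * avg_saved, 10 * avg_saved

# Dollar value at free-agent rates: 10 runs per win, $4.5 million per win.
RUNS_PER_WIN = 10
DOLLARS_PER_WIN = 4.5e6
value_of_5_runs = 5 / RUNS_PER_WIN * DOLLARS_PER_WIN

print(round(avg_saved, 3))
print(round(low_runs, 1), round(high_runs, 1))
print(value_of_5_runs)
```

The range works out to roughly 3.9 to 6.4 runs, which is where the "call it 5" comes from, and 5 runs at those rates is $2.25 million.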


I don't know anything about on-field strategy, so I have no way to evaluate all that. So these questions are for you subject matter experts (SMEs) reading this.

Will Showalter's strategy work? Is 6-10 outs a reasonable estimate of what it saves? Are there unstated drawbacks that negate those outs?

By sharing the strategy with Sports Illustrated, Showalter runs the risk that all other teams will adopt it, completely negating the Orioles' $2 million advantage. Why would he do that?

I guess I'm thinking that the story sounds a bit too pat. But, I don't really know. Your comments?


Saturday, April 09, 2011

Did NFL teams discriminate against black coaching candidates?

The "Rooney Rule," adopted by the NFL in December, 2002, required all teams searching for a head coach to interview at least one black candidate. Between 2002 and 2009, the number of black coaches roughly doubled. Was this the result of the rule, or not?

A paper in the latest "Journal of Sports Economics," by Janice Fanning Madden and Matthew Ruther, looks at some evidence on the question. It's called "Has the NFL's Rooney Rule Efforts "Leveled the Field" for African American Head Coach Candidates?" A version of the paper can be found here (.pdf).

The authors find that before the Rooney Rule, black head coaches guided their teams to significantly superior records: an average of 9.1 wins (instead of the overall and white coaches' mean of 8). For first-year coaches, the difference was even bigger: 9.6 wins versus 7.1 wins.

They note that these numbers are consistent with the hypothesis that black coaches had to be significantly better than average to get the job. That suggests discrimination on the part of hiring teams.

Again, that was before the Rooney rule. Afterwards, there was no appreciable difference between white and black coaches. Is the difference between the two time periods significant?

The authors start by doing a t-test on the pre-Rooney race difference of 1.1 games, and they find significance at 2.57 standard deviations from the mean. However, I'm not so sure about that. I think their t-test assumes all observations are independent. In real life, they're not. A team's record this year is positively correlated with its record last year. One black coach being hired by one (perennially) good team might have made all the difference.

And, indeed, the authors do find that black coaches get hired by better teams. They don't give us the data, but they mention it:

"... the teams that hired African American coaches in the 1990-2002 period had better records prior to the hires ... "

So they run a regression that tries to control for team quality, and they still get a significant result. But that regression uses payroll as a proxy for quality. The relationship between payroll and wins is probably pretty decent, but not as good as other possibilities. I'm sure you could find lots of teams that were excellent despite average payrolls, and, again, all it might take is one black coach to be hired by such a team.

Then, they try a regression that uses the Sports Illustrated preseason prediction as a variable. Again, that's not perfect, but it should be pretty good. Actually, it should be better than pretty good. SI writers are subject matter experts, and will use a wide assortment of data to make their predictions. They're probably not perfect, of course, and they're not as good as Vegas odds might be, but I think this is a pretty good way of doing it.

And, now, the result is no longer significant. It's only 1.43 SD, and probably less when you correct for the fact that seasons aren't independent.

But, in fairness, and as the authors mention, that may understate the significance if the SI staff adjust their predictions for the realization that the coach is of higher quality. I'm guessing that's not much of a factor, though.


The authors then look at firings. Controlling for several variables, including wins, whether the team made the playoffs, how many years the coach was with the team (and the square of that figure), they find that, before the Rooney rule, black coaches were more likely to be let go. After the Rooney rule, the difference disappeared.

But ... aren't coaches fired for performance relative to expectations rather than for raw performance? Since the black coaches started with better teams in the first place, you'd expect them to get fired faster for a given record, because it's easier to disappoint from a higher level than from a lower level. If you start 10-6, and then fall to 7-9, your job will be in jeopardy. But if you go from 8-8 to 7-9, you're more likely to be safe.

Since that result is only barely significant (2.15 SD), I'm guessing that if you used more realistic "disappointment" variables, the significance would disappear.


Finally, the authors look at offensive and defensive coordinators. They find no significant difference in the performances of black and white coordinators, either before or after the Rooney Rule. However, they do find that in the entire period of the study -- 1984 to 2009 -- not even one black offensive coordinator was promoted to head coach. The authors say that's statistically significant at p=.01.

But, again, I think the authors are assuming independence, which causes the significance level to be overstated. Moreover, the authors' own Table 8 shows that black offensive coordinators worked for worse teams than white offensive coordinators. After the Rooney Rule, for instance, black offensive coordinators worked for teams in the 34th percentile of performance, while black defensive coordinators worked for teams in the 54th percentile. Perhaps that explains part of the difference.

Also, there are many comparisons in the authors' charts, so it becomes more likely that at least one of them will show significance. My unscientific feeling is that this one datapoint is a random anomaly, and, in any case, not all that significant anyway.


My overall impression when reading this paper was ... geez, there were only 29 black coaches in the pre-Rooney Rule era. Why not actually look at them and see if their performance was unexpectedly good? That would require the assistance of subject matter experts (SMEs) -- people who knew the NFL -- which, admittedly, is not usual for an academic paper of this sort. And, of course, any SME judgments would necessarily be subjective.

But, still, if you want to get the best answer to the question, instead of the most journal-publishable answer to the question, that's the way to do it. Maybe coach X was hired just when player Y blossomed into a superstar, and so it would be incorrect to attribute the team's playoff success to the coach. Maybe black coaches are unproven, and teams are willing to hire an unproven coach only when they have a hugely disappointing season -- which suggests bad luck, which suggests maybe they bounce back to their previous level of excellence.

If there were thousands of datapoints, you couldn't check all those things. But, 29? That doesn't seem too difficult an obstacle. And, it's telling that the regression that comes closest to doing that -- the one that takes into account the SMEs at Sports Illustrated -- was the one that didn't find statistical significance.


Thursday, April 07, 2011

"Pinburgh": pinball sabermetrics

On the weekend of March 18, I competed in the huge (and phenomenally well-run) "Pinburgh" match-play pinball tournament in Pittsburgh.

I finished roughly in the middle of the field of 173 competitors, and, I wondered, if I'm really average, what would my chances be of winning the whole thing next year just by luck?

So I wasted a day or so and wrote a simulation.

It turns out that I'm probably a 2000:1 longshot, unless I get better, or unless I'm *already* better and don't realize it. Still, on average, I should win back half my entry fee.

This is probably of no interest to more than a handful of people in the entire world, but I wrote up a whole bunch of results anyway. They're