Saturday, December 31, 2011

Do NHL referees call "make up" penalties?

Among NHL fans, there's a perception that referees like to call "make up" penalties. If a ref has just called a minor penalty on one team, it's very likely that the next penalty will go to the other team.

I was skeptical, until I downloaded a bunch of data from The Hockey Summary Project ... they're like Retrosheet for hockey. (Their website is here, and if you want data downloads, you can join their group by going here.)

I looked at all penalties from 1953-54 to 1984-85 (for which the HSP data is almost complete). I eliminated all cases where both teams got penalties at the same time. Then, I checked what was left, to see whether the team that got the current penalty was less likely to get the next one.

Absolutely, very much so. There's a 60% chance the next penalty will go to the other team -- 59.7%, to be more exact. (But, since I'm not sure that database is complete, and I forgot to remove misconducts, and I didn't consider situations where both teams got a penalty but one team got an extra one, I'm happier to drop the decimal and just go with 60%.)
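The calculation is simple enough to sketch in code. Here's a hypothetical Python version, assuming the penalty log has been reduced to (game_id, time, team) records with coincidental penalties already removed. The record format is my invention, not the HSP data format:

```python
from collections import defaultdict

def makeup_rate(penalties):
    """Fraction of penalties that go to the opposite team from the previous call."""
    by_game = defaultdict(list)
    for game_id, time, team in penalties:
        by_game[game_id].append((time, team))

    switched = total = 0
    for calls in by_game.values():
        calls.sort()  # consecutive-penalty pairs within a game, in time order
        for (_, prev_team), (_, next_team) in zip(calls, calls[1:]):
            total += 1
            if next_team != prev_team:
                switched += 1
    return switched / total if total else 0.0
```

If referees called penalties with no memory of the previous call, this should come out near 50 percent; the data here says 60.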

The effect is reasonably consistent over time, although it was a little stronger back in the six-team era. Here's a too-long chart.

1953-54: 62.7 888/1416
1954-55: 61.3 857/1397
1955-56: 60.6 912/1506
1956-57: 61.2 833/1360
1957-58: 58.8 793/1348
1958-59: 61.4 801/1305
1959-60: 62.3 723/1160
1960-61: 61.7 740/1199
1961-62: 62.0 821/1324
1962-63: 62.0 797/1286
1963-64: 61.8 826/1337
1964-65: 61.7 841/1362
1965-66: 58.6 820/1399
1966-67: 58.6 710/1212
1967-68: 60.0 1515/2527
1968-69: 59.5 1666/2800
1969-70: 60.5 1793/2962
1970-71: 60.4 1944/3220
1971-72: 57.8 1918/3317
1972-73: 60.6 2200/3633
1973-74: 58.8 2135/3628
1974-75: 56.7 2873/5069
1975-76: 56.3 2890/5130
1976-77: 57.2 2371/4144
1977-78: 59.1 2316/3916
1978-79: 59.5 2337/3925
1979-80: 59.3 3021/5091
1980-81: 60.1 3780/6293
1981-82: 60.7 3613/5957
1982-83: 60.3 3479/5774
1983-84: 60.2 3788/6296
1984-85: 60.6 3542/5847
Overall: 59.7 39597/58543

Even though the effect is real, we can't say for sure that it's referee bias. It could just be that, after a penalty, the penalized team plays more cautiously, trying to avoid a second penalty. Or, it could be that the team that just had the power play decides to play more aggressively.

(As an aside: why did penalties drop so much between 1975-76 and 1976-77? At first I thought it might be bad data, but then I checked power-play opportunities on Hockey Reference, and it checked out.)

Here's what I think is some relevant evidence. I broke down the stats by referee (minimum 300 datapoints). The database only has the referee named for about a quarter of the total games (mostly older ones), but I figure it's probably good enough to at least look at.

The first column is the main number, the percentage of penalties called against the team who drew the last one.

Pctg Z-sc Size Ref
---- ---- ---- ----------------------------
59.3 00.0 0509 Andy Van Hellemond
60.4 +0.4 0846 Art Skov
59.3 -1.2 1128 Ashley
60.0 00.0 0460 Bill Friday
57.0 -1.1 0537 Bob Myers
58.9 -0.3 0878 Bruce Hood
57.1 -0.9 0580 Bryan Lewis
59.3 -1.6 1453 Buffey
64.6 +1.6 0933 Chadwick
59.4 -0.1 0567 Dave Newell
60.4 -0.3 0356 Farelli
60.1 -0.3 0511 Friday
59.9 -0.1 0696 John Ashley
61.3 +0.8 0359 Lloyd Gilmour
61.3 -0.2 0789 Macarthur
56.7 -1.4 0319 Mehlenbecher
63.0 +0.5 0327 Olinski
60.4 -0.8 1906 Powers
55.2 -2.2 0698 Ron Wicks
63.7 +1.7 1095 Skov
57.7 -3.1 2197 Storey
63.1 +2.7 4717 Udvari
59.9 +0.4 0709 Wally Harris

The least "biased" referee is 55%, and the most "biased" is 64%. If you think it's only referee bias that keeps the numbers from being 50%, you'd have to think that EVERY referee is biased almost exactly the same way. It's hard for me to accept that none of the referees noticed the bias and saw fit to try to eliminate it.

The second column of the table is the Z-score, the number of standard deviations the referee is from expected (which is normalized to the seasons he officiated). Normally, you concentrate on those with at least plus or minus 2 SD. That gives you Red Storey and Ron Wicks (less biased than most) and Frank Udvari (more biased than most).

The standard deviation of the Z-scores was 1.29. If every referee were the same, and differences were only random, it would be 1.00. This suggests that there are real differences between referees. Specifically, the SD of referee tendencies (or "talent", you might say) is 0.8 (since 1 squared plus 0.8 squared equals 1.29 squared).
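Both steps can be sketched in a few lines. The binomial standard error below is my assumption about how a Z-score like those in the table would be built (the real ones are normalized to each referee's seasons, which this ignores); the quadrature step is exactly as described:

```python
import math

def z_score(switches, n, expected_p):
    """SDs from expected, under a simple binomial model of n penalty pairs."""
    se = math.sqrt(expected_p * (1 - expected_p) / n)
    return (switches / n - expected_p) / se

def talent_sd(observed_sd_of_z, noise_sd=1.0):
    """If observed variance = noise variance + talent variance, recover talent SD."""
    return math.sqrt(observed_sd_of_z ** 2 - noise_sd ** 2)

# talent_sd(1.29) comes out to about 0.81 -- the "0.8" in the text.
```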

In English, you can perhaps interpret that as saying that the differences in the table are about half real and half random, with a little more random than real (since 1.00 is a little higher than 0.8).

The observed range is 55 to 64. Regressing to the mean, the actual range of referee tendencies is probably 57 to 62, or something like that.
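That regression can be sketched using the two SDs estimated above (talent SD 0.8, total SD 1.29): the fraction of each referee's gap from the mean that's "real" is roughly the talent variance over the total variance. This is a rough sketch of the idea, not a claim about how the 57-to-62 range was actually computed:

```python
def regress_to_mean(observed_pct, mean_pct=59.7, talent_sd=0.8, total_sd=1.29):
    """Shrink an observed rate toward the overall mean by the talent share of variance."""
    shrink = (talent_sd / total_sd) ** 2   # about 0.38
    return mean_pct + shrink * (observed_pct - mean_pct)

# regress_to_mean(55.2) and regress_to_mean(64.6) land at roughly 58 and 61.6,
# consistent with "57 to 62, or something like that."
```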

So if you think it's referee bias, you have to explain why all the referees seem to be biased within such a tight range, especially when, presumably, they are all working hard to be as unbiased as possible.


Here's another interesting breakdown, by time since the previous penalty:

0:01 to 1:00: 69.1 (7000)
1:01 to 2:00: 64.7 (9444)
2:01 to 3:00: 68.5 (11778)
3:01 to 4:00: 64.2 (10574)
4:01 to 5:00: 61.2 (8831)
5:01 to 6:00: 59.7 (7470)
6:01 to 7:00: 58.9 (6328)
7:01 to 8:00: 58.3 (5333)
8:01 to 9:00: 56.6 (4399)
9:01 to 10:00: 55.5 (3719)
10:01 to 11:00: 56.7 (3000)
11:01 to 12:00: 55.5 (2591)
12:01 to 13:00: 53.8 (2193)
13:01 to 14:00: 55.2 (1837)
14:01 to 15:00: 53.3 (1565)
15:01 to 16:00: 53.1 (1376)
16:01 to 17:00: 53.1 (1135)
17:01 to 18:00: 52.4 (1019)
18:01 to 19:00: 51.9 (807)
19:01 to 20:00: 53.6 (757)
20:01 to 99:99: 51.8 (3883)

The longer the interval since the previous penalty, the less likely the next penalty will go to the other team. That's consistent with many theories. The "referees are biased" theory would say that referees "forget" to even things up as the game goes on. The "other team wants revenge and plays aggressively" theory would say that if they don't get revenge early, they don't need it as much later. And the "penalized team takes fewer chances" theory would say that as time goes on, the players "forget" that they have to be more careful.

So, the data doesn't help us choose, but it's interesting nonetheless.

By the way, the 1:01 to 2:00 group is an exception to the pattern, but that's probably due to power plays, since the first penalty is probably still in effect. Actually, I'd have expected that part to go the other way, with the first two minutes being *more* than 50 percent, on the logic that the shorthanded team playing in the defensive zone is more likely to be forced to take a penalty. But, that doesn't happen.

And here's an interesting breakdown of the first half of the first group:

81.9% within 5 seconds
78.1% between 6 and 10 seconds
76.0% between 11 and 15 seconds
73.8% between 16 and 20 seconds
69.3% between 21 and 25 seconds
67.0% between 26 and 30 seconds.


Finally, one more question: after one team gets, say, four straight penalties, what happens then? Is there an even stronger bias for the other team to take the next penalty?


57.1 after exactly 1 in a row (64858 datapoints)
64.0 after exactly 2 in a row (23850)
66.6 after exactly 3 in a row (7042)
66.0 after exactly 4 in a row (1781)
63.8 after exactly 5 in a row (442)
60.6 after exactly 6 in a row (127)
67.5 after 7 or more in a row (40)
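Extending the earlier sketch, the streak breakdown can be computed the same way. Again, the (game_id, time, team) record format is my invention, not the HSP format:

```python
from collections import defaultdict

def streak_rates(penalties):
    """P(next penalty goes to the other team), keyed by length of the current streak."""
    by_game = defaultdict(list)
    for game_id, time, team in penalties:
        by_game[game_id].append((time, team))

    counts = defaultdict(lambda: [0, 0])    # streak length -> [switches, total]
    for calls in by_game.values():
        calls.sort()
        streak = 1                           # current run of penalties by one team
        for (_, prev_team), (_, next_team) in zip(calls, calls[1:]):
            counts[streak][1] += 1
            if next_team != prev_team:
                counts[streak][0] += 1
                streak = 1                   # the other team starts a new run
            else:
                streak += 1
    return {length: s / t for length, (s, t) in counts.items()}
```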


So: what's going on? Any ideas?

UPDATE: Part 2 is here. Part 3 is here.


Wednesday, December 28, 2011

The best goalies should play for the worst teams

Last week, I described a way to look for "bad team" goalies as described by Ken Dryden. I don't think that method is going to work ... the sample sizes are too small.

But, while doing the math (which I'll spare you), it occurred to me that there IS a class of goalies that could be considered "bad team" goalies, in the sense that they're more valuable to a good team than a bad team. That class of goalies is simply ... the best goalies.

The worse the team defense, the more shots the other team gets. So the great goalie will wind up saving a lot more goals for a bad defense than a good defense. It's like how a policeman is more productive in a bad neighborhood.

I guess this isn't a new realization ... people have said that Ken Dryden was wasted, a bit, in the Montreal goal for so many years ... there was very little for him to do. He would have had more value to a team with a worse defense -- at least in terms of goals saved.

So, if teams are rational, you should see the best goalies playing behind the worst defenses.

Or maybe not. I'm surprised at how tiny the effect is. Eyeballing last year's NHL stats, it looks like bad teams gave up maybe 125 more shots than average.

The best goalie in the league might be 2 SD above average, or .008. Multiply 125 by .008, and you get ... exactly one goal. So, even a great goalie is worth only one more goal to a bad team than to an average team.

That assumes that all shots are equal. Suppose bad teams give up harder shots, and good teams give up easier shots. Maybe that doubles the effect. In that case, the advantage becomes two goals. So moving from the best team to the worst is worth four goals -- from two goals worse than normal, to two goals better than normal.

Hmmm ... maybe not as tiny as I thought. To get four goals of goalie improvement is the equivalent of 3/4 of a standard deviation in goalie talent. For a team, saving four goals should get you, what, maybe a couple of points in the standings?
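The arithmetic of that argument, with every number taken from the text above:

```python
extra_shots = 125      # extra shots a bad team allows over an average one
goalie_edge = 0.008    # save-percentage edge of a ~2 SD goalie

basic = extra_shots * goalie_edge   # 1 goal: the great goalie's extra value on a bad team
doubled = 2 * basic                 # 2 goals, if bad teams also allow harder shots
swing = 2 * doubled                 # 4 goals, moving from the best team to the worst
```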


I'm sure this is old hat to hockey sabermetricians, but this is the first time it seriously occurred to me that the same player can be more valuable, in terms of influencing the score, with a bad team than a good team.

You've also got the punter in football ... the worse the offense, the more fourth downs, so the more important punting is overall. And, maybe, the safety: he's the last line of defense, so he gets more chances when his teammates fail to make the tackle before him.

In baseball, good fielders are more valuable on bad teams, since bad pitchers allow more balls in play. Also, a bad team will have a lot more men on base than normal, which means more double-play opportunities. Also, a strikeout pitcher is more valuable on a team that doesn't field well.

You might also argue for the NHL enforcer, if his job is to start fights when his team is behind, and you also accept the premise that the goonery actually helps the team come back.

Which is the strongest example, the one where the player adds the most value moving to the bad team? I'd guess the NHL goalie, but, really, I have no idea.


Tuesday, December 27, 2011

Will taxing the rich improve democracy?

Many people believe that income inequality in our society is too high. I generally don't agree (some of my reasons are here), but I'm open to arguments. However ... not the particular argument that Ian Ayres and Aaron Edlin made in the New York Times last week (and later added to at the Freakonomics site), that income inequality undermines democracy:

The progressive reformer and eminent jurist Louis D. Brandeis once said, “We may have democracy, or we may have wealth concentrated in the hands of a few, but we cannot have both."

Brandeis understood that at some point the concentration of economic power could undermine the democratic requisite of dispersed political power. This concern looms large in today’s America, where billionaires are allowed to spend unlimited amounts of money on their own campaigns or expressly advocating the election of others.

What we call the Brandeis Ratio — the ratio of the average income of the nation’s richest 1 percent to the median household income — [now stands at 36]. We believe that we have reached the Brandeis tipping point. It would be bad for our democracy if 1-percenters started making 40 or 50 times as much as the median American.

Enough is enough. Congress should reform our tax law to put the brakes on further inequality. Specifically, we propose an automatic extra tax ["Brandeis Tax"] on the income of the top 1 percent of earners -- a tax that would limit the after-tax incomes of the club to 36 times the median household income.

So, their idea is: some people have so much money more than the rest of us that they can disproportionately affect the outcome of elections. Therefore, we should tax them to make the income distribution more equal. That way, the rich will have less money, which means they'll spend less influencing politicians, and democracy will be stronger.

There are so many reasons to disagree with this that I won't list them all. But, the most obvious one: are the authors really saying, with a straight face, that a tax on the rich will make them less politically active? Would that work with any other group? "Hey, black people are starting to march on Washington. Let's put a special surtax on black people. That'll quiet them down!"

And that's not even the biggest problem. The biggest problem is that Ayres and Edlin don't own a mirror.

Think of the most important ways that the USA and Canada are different, in terms of government policy, than they were a couple of generations ago. If you were to ask me, the biggest changes are things like: The elimination of racial segregation. Women's Rights. Gay Rights. The Canadian Constitution and Charter of Rights. A more peaceful world. A better welfare system. Somewhat lower crime. Socialized medicine (in Canada).

Add your own. Then, ask yourself, how many of them have much to do with rich people giving money to politicians?

Take, for instance, racial segregation. How did that change? Did some rich black guy come along, slip a politician a couple of million dollars, and then, wham, suddenly everyone gets to eat at the same lunch counter?

Of course not. Racial progress happened by a change in the attitudes of Americans, not by the actions of politicians (who acted late, only in response to public demand). What happened is that Americans fueled the process. They read the newspaper, and debated, and protested, and wondered, and marched, and argued, and pondered, and chatted at the water cooler, and gave speeches, and rode buses, and watched TV.

And, slowly, day by day, week by week, people's views changed.

It had nothing to do with where wealth was concentrated, and nothing to do with the richest 1% of Americans donating money to the right politicians. The "power" WAS dispersed; it was dispersed among millions and millions of citizen voters. The wealthiest black person in the country couldn't possibly have used political contributions to speed up the process much before its time.

Of course, some people had more influence than others, like Martin Luther King. And, the media: newscasters, and columnists, and reporters; TV, and radio, and newspapers. Someone writing for the New York Times, for instance, would get read by millions of people. One New York Times is the equivalent of thousands of water coolers.

That means that if you want to argue that inequality is undermining democracy, you shouldn't be thinking about money. You should be thinking about public discourse.

Let's suppose that over the last year, the Times has had five or six opinion pieces per day. That means that, at most, around 2,000 Americans got to have their voices heard in the op-ed pages of the New York Times. Many writers, of course, appeared more than once. For the sake of argument, let's say there were 1,500 different writers. The US population is 300 million, which gives us this shocking measure of inequality of influence:

The top 0.0005% of US writers wrote 100% of the New York Times op-eds.

Compare that to money: by my estimation, the top 0.0005% of US households earned less than 2% of the overall income.

By this measure, that means the concentration of op-eds is FIFTY TIMES AS HIGH as the concentration of income. Fifty times.

But that's not really fair, since the New York Times isn't the only place you can broadcast your influence. Let's suppose the top 100 venues -- newspapers, TV, or blogs -- have 100,000 different writers, and comprise 75% of the total political influence in the US. Compare that to the top 100,000 households in income:

0.03% of the population has 8% of the income.
0.03% of the population has 75% of the influence.

So, if democracy is compromised by the unequal distribution of income, how can you say it's not compromised by an unequal distribution of public access to opinions and influence -- especially when the latter inequality is EIGHT TIMES AS HIGH?

The fact is that Ian Ayres, with at least three other op-ed pieces in the New York Times in the past ten years -- 50,000 times as many as the average American -- has much, much more influence on public policy than your typical rich guy. And Paul Krugman, the Times economics columnist, writes around 100 columns a year -- fifteen million times as many as the average American.

"Enough is enough," indeed.

I propose a "Krugman Tax." Every year, we'll compute the "Krugman Ratio," the number of words the top 100,000 writers published, divided by the number of words the average American published. If it's more than, say, 40, those writers will be taxed just enough words to bring the ratio back down to 40 in future. That will ensure that everyone, not just the ultra-published like Ian Ayres and Paul Krugman, can influence American political decisions.

We'll do it for democracy.


Wednesday, December 21, 2011

Are there "good team" goalies and "bad team" goalies?

In "The Game," Ken Dryden argues that some players are not psychologically suited to playing on good teams:

Because the demands of a goalie are mostly mental, it means that for a goalie the biggest enemy is himself. The fear of failing, the fear of being embarrassed ... The successful goalie understands these neuroses, accepts them, and puts them under control. The unsuccessful goalie is distracted by them, his mind in knots, his body quickly following.

It is why [Rogie] Vachon was superb in Los Angeles and as a high-priced free-agent messiah, poor in Detroit. It is why Dan Bouchard ... lurches annoyingly in and out of mediocrity. It is why there are good "good team" goalies and good "bad team" goalies -- Gary Smith, Doug Favell, Denis Herron. The latter are spectacular, capable of making near-impossible saves that few others can make. They are essential for bad teams, winning them games they shouldn't win, but they are goalies who need a second chance, who need the cushion of an occasional bad goal, knowing that they can seem to earn it back later with several inspired saves. On a good team, a goalie has few near-impossible saves to make, but the rest he must make, and playing in close and critical games as he does, he gets no second chance.

A good "bad team" goalie, numbed by the volume of goals he cannot prevent, can focus on brilliant saves and brilliant games, the only things that make a difference to a poor team. A good "good team" goalie cannot. Allowing few enough goals that he feels every one, he is driven instead by something else -- the penetrating hatred of letting in a goal.

Dryden seems to be saying at least three things here:

1. Some goalies, like Rogie Vachon, can't handle pressure.
2. Some goalies are better on bad teams than on good teams.
3. Those two groups are the same goalies.

I'm very skeptical about #1, especially with regards to Rogie Vachon. Yes, Vachon had a serious decline after leaving the Kings -- with Detroit, he was worse by more than a goal a game (3.90 to 2.86). But, was it really Vachon's neuroses? After all, he was 33 years old that year. Dryden may know Vachon pretty well -- they were together on the Canadiens for a few months in 1971 -- but is that enough for him to conclude that Vachon's problem is that he choked under pressure?

I'll skip over #3, also, and concentrate on #2, the part about "good team" goalies and "bad team" goalies. What Dryden seems to be saying, as an empirical hypothesis, is something like this:

There are some goalies who make brilliant saves that few others can, but also give up more weak goals. Those goalies are more valuable to bad teams, because bad teams give up more scoring chances where brilliant saves are required. It wouldn't make sense for a good team to pick up a goalie like that, because they'd get only the weak goals, but not the brilliant saves.

That's actually a pretty interesting theory! And it seems plausible. After all, what a team should care about is how many goals a guy allows, not how he looks doing it. A goalie with a 2.50 GAA is more valuable than a goalie with a 2.75 GAA, even if the first guy lets in more bad goals than the second guy.

But, is there any evidence for it?

Not in the book. Dryden gives us only those three examples of "bad team" goalies. Unfortunately, they played on bad teams for most of their careers.

Still, we have a few datapoints.

-- Gary Smith left Oakland (bad) to play two seasons for the Black Hawks (good) as Tony Esposito's backup. The first year, he was very good; the second year, he was mediocre.

-- Denis Herron moved from (bad) Pittsburgh to (good) Montreal (where he replaced Ken Dryden). Like Smith, he was great the first year, but not so great the second year.

-- Doug Favell was nothing special in his last season with Toronto (an average team). Then, he went to a below-average Colorado Rockies team, where it seems like he was pretty good. So, maybe that's a plus.

But, overall ... no real evidence either way, really. And Dryden doesn't give any examples of "good team" goalies, so there's nothing to check there.

But ... maybe here's something we can do.

Goalies play some of their games against good teams, and some against bad teams. If Dryden is correct, that Smith, Herron and Favell give up more bad goals but also make more spectacular saves, they should do better than expected against good teams, and worse than expected against bad teams. That's because they'll give up roughly the same number of bad goals either way, but they'll make more brilliant saves against the good teams.

Does that make sense? Maybe we can find a way to check that.

Here's how that might work. In 1977-78, the Penguins, with Denis Herron as their regular goalie, gave up 321 goals, or 4.01 per game. That season, the best five teams (alphabetically) were the Bruins, Canadiens, Flyers, Islanders, and Sabres. The worst were the Barons, Blues, Canucks, Capitals, and North Stars.

From the game log, I manually calculated that against the five best teams, the Penguins gave up 5.25 goals per game. Against the five worst teams, they gave up 3.14.

For that to be evidence that the Penguins have "bad team" goalies, you'd have to show that 5.25 goals against good teams is actually better than expected for a team that gives up 4.01, and that 3.14 against bad teams is actually worse than expected for a team that gives up 4.01.

How would you do that? Well, one thing you could do is find a matching team, one that also gave up 4.01 goals per game (or close to it), but had a random goalie. If that team gave up 6.00 goals against the good teams, but 2.50 against the bad teams, that would be confirmatory evidence.
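The matching step might look something like this. Everything here -- the record format, the field names, the two comparison teams -- is a hypothetical sketch of the idea, not the HSP data:

```python
def find_match(target_gaa, candidates, exclude_team):
    """Pick the team-season whose overall GAA is closest to the target,
    skipping the team we're trying to match."""
    pool = [c for c in candidates if c["team"] != exclude_team]
    return min(pool, key=lambda c: abs(c["gaa"] - target_gaa))

# Example: match the 1977-78 Penguins (4.01 GAA), then compare the matched
# team's splits against the best and worst five teams to the Penguins'
# 5.25 and 3.14.
candidates = [
    {"team": "PIT", "season": "1977-78", "gaa": 4.01},
    {"team": "XXX", "season": "1976-77", "gaa": 3.95},   # made-up comparison teams
    {"team": "YYY", "season": "1975-76", "gaa": 4.40},
]
match = find_match(4.01, candidates, exclude_team="PIT")
```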

The match wouldn't be perfect, because it might have to be from another season, and the "best" and "worst" groups might not be comparable. Still, even with just those three goalies, you'd have about 33 seasons to compare (if you require a minimum 30 game season). If you found the two most comparable instead of one, that would be 66 comparisons.

That's better than nothing. But there'd still be a lot of noise. To make it workable, you'd have to limit your sample to games those goalies started (in 1977-78, Herron himself gave up only 3.57 goals per game, compared to 3.96 for the team after subtracting empty net goals). There'd be even less noise if you used save percentage instead of goals against.

The process would be a bit like searching for clutch hitting, only with a lot less data. And it would be a lot of work ... but, there's an organization, the Hockey Summary Project, that collects NHL game summaries -- like a hockey Retrosheet -- and I've asked them for access to their database. I'm hoping they have shots on goal. (Also, those summaries might help us try to trace the games that Dryden talked about in his book, the ones that didn't match the game logs.)

Before I go any further, does this make sense as a way to test Dryden's hypothesis? Can you think of any others that might be easier?


Monday, December 19, 2011

Ken Dryden's "The Game"

Ken Dryden's book, "The Game," is considered one of the greatest ever in sports. Many times, I've heard it called "the best hockey book ever written," and that's the quote (unattributed) on the cover of my 1999 edition. The Canadian literary crowd loves it.

So, last week, I thought I'd read it. I was disappointed.

"The Game" takes place towards the end of the 1978-79 NHL season, Dryden's last before retirement. It was actually written later, based on his notes at the time, and published in 1983. It takes the form of a ten-day diary, although most of each day is taken up by Dryden's reminiscences and analyses, rather than the actual events of the particular day.

I'd argue that it's only tangentially a book about hockey. It's really a book about Ken Dryden. What he does, how he feels, what goes on in his dressing room, what he thinks of his teammates, and so forth. If you actually were hoping to learn something about hockey and how it works, there won't be much for you here ... except that you'll hear some stories about the personalities of teammates and coaches.

It doesn't talk much at all about strategy, or playmaking, or statistics, or how to win games. It deals a little more with personalities, and stories. But, mostly, it deals with Ken Dryden's feelings.

That may not sound too interesting ... except that Ken Dryden is very, very articulate. He can take the most common observation, and write paragraphs of poetry about it. Here he is talking about travelling to the Forum:

"I drive down side streets narrowed by drifts and snow-shrouded cars. Traffic is light today, and the few cars on the road move easily, unconcerned by the conditions. After the awkward caution of a winter's first snowfall, for Montreal drivers, like riding a bike, it all comes back, and slippery streets are driven as if bare and dry. I park several blocks away from the Forum and walk. The wind, gusting up Atwater Street, is bitter and cold, and, hunching over, I try to cover up, but can't. I start to jog, then run, faster as the wind bites harder. At de Maisonneuve, the light turns red but I continue across."

Well, OK, great, that's certainly a vivid picture of a windy Montreal winter's day. But the tone is a bit dramatic. It's an impressive feat of writing, and there's nothing wrong with being Shakespearean at times ... but Dryden does it *everywhere*, from the first page to the last. It started grating on me.

What would mitigate the tone, a bit, is if Dryden would talk more concretely about hockey. Sure, there are a few pages on Scotty Bowman's coaching style, and on Larry Robinson's history, and little profiles of some of the other players pop up occasionally. Finally, close to the end of the book, there's a bit of meat -- about 20 pages on the history of hockey, and how the game has suffered, and how rule changes can fix it. It's really good stuff, especially for 30 years ago, when that kind of analysis was rare. Moreover, with something more concrete to explain, Dryden allows some air to come out of the lofty prose, and it reads a lot better.

Still, even accounting for those 20 pages, the book is mostly Ken Dryden, sociologist and psychologist, observing his hockey team. It becomes a little weird because Dryden writes like he's not even there, like he's a psychiatrist floating above the dressing room taking notes.

I guess that's his thing. Ken Dryden is a lawyer; he famously took a year off from the NHL in 1973-74 to finish his law degree. He's intelligent and articulate, and it's almost like he figures there's no everyday scene that can't be made better if you try to analyze it dramatically:

"... Half-naked players move hurriedly about, laughing, shouting for tape (black or white, thick or thin), cotton, skate laces, gum, ammonia "sniffers," Q-Tips, toilet paper, and for trainers to get them faster than they can. It is the kind of unremitting noise that no one hears and everyone feels. But there is another level of dialogue we can all hear. It is loud, invigorating, paced to the mood of the room, the product of wound-up bodies with wound-up minds. It's one line, a laugh, and get out of the way of the next guy -- "jock humor." It is like a "roast," the kind of intimate, indiscriminate carving that friends do to keep egos under control. Set in motion, it rebounds by word association, thought association, by "off the wall" anything association, just verbal reflex, whatever comes off your tongue, the more outrageous the better. Elections, murders, girl friends, body shapes, body parts, in the great Tonight Show / Saturday Night Live tradition, verbal slapstick dressed as worldly comment ..."

Nothing wrong with all that, but ... for a lot of the book, that's all there is. And, for all the analysis and introspection, you won't even feel like you know much about Dryden, which is weird, because he writes a lot about himself: how he feels, and when he's more confident in net, and how he reacts to winning and losing, and his thoughts on retirement. But it's all detached, like he's trying to save you the trouble of understanding him yourself by doing the analysis for you.

There are a few occasions when he tries to say something about the game, something that a sports columnist would say, and it's almost a relief -- finally, something you can get your brain around! At one point, for instance, he talks about how he doesn't think the Canadiens are going to win this year (they eventually did, but that season was their last of four consecutive Stanley Cups). Why? Because, he says, he notices the team is complacent, spoiled by its own success, looking for the "big play" to win the game. Players "shoot from long range, safe from the punishment that goes with rebounds, deflections, screens, and goal-mouth tip-ins." Dryden says he himself cares less about winning, "content that goals appear as 'good goals.'"

I'm not sure I necessarily buy it 100%, but at least there's something concrete there ... you can try to figure out if it's true or false, or at least how you could study the issue. You think, hey, the guy played in the NHL for a few years, finally he's telling us what he learned!

But, that's as meaty as it gets before Dryden starts getting poetic again:

"I have felt it before in other years, but never so often and never with the same feeling, that if we lose, it will be because of us, no one else. It is not fun to feel a team break down, to find weakness where I always found strength; to discover the discipline and desire can go soft and complacent; to discover that we are not so different as we once thought; to realize that winning is the central card in a house of cards, and that without it, or with less of it, motivations that seemed pure and clear go cloudy, and personal qualities once noble and abundant turn on end; to realize that I am a part of that breakdown."

I can see why some people like this stuff, but ... well, I don't. It's just cotton candy, an exquisitely-worded paragraph that melts down to nothing when you try to figure out what it means. It's articulateness disembodied from communication, as if, when you say something beautifully and poetically enough, it doesn't matter that there's no content. Dryden produces this huge fog of articulateness that overwhelms you with feeling and the sense that something important has just been said, but ... there's not much there when you actually look.

I can't resist one more example. Here's Dryden analyzing Guy Lafleur:

"For there is a life there, and in destiny and romance there is no room for life. Painted as they are with broad brush strokes, vivid and lush, they find shape and pattern only with distance. The person who lives them is too close. He feels sweat as well as triumph. He understands what others see, but feels none of it himself."

Huh? Some people eat this stuff up, but I just don't get it.


Call me cynical, but I think the things I see as weaknesses are actually why people like this book. It's about hockey, but not about hockey. It talks about "big issues" that people like to pretend they care about. There are long digressions into Quebec separatism and culture, and personal growth and emotion. It's articulate, it's educated, and it's erudite. It detaches readers from the world of uneducated jocks, allowing them to identify and affiliate with someone they look up to. For some readers, it lets them signal to the world that *this* is the hockey they like, that they watch every Saturday night and talk about at the water cooler, the hockey that's deep and sociological and high-class.

"The Book" is a lot like a politician giving a speech, saying things so beautiful and eloquent and moving that you don't even notice that he's not saying anything concrete about what he'll do once elected. You wind up voting for the guy, not because of how he'll be able to run the country, but because you feel like you're voting for him as a person, good-looking and well-dressed and articulate.

But, no offense to Dryden ... it could be just me, that his book just isn't what I'm looking for, that the emotional stuff doesn't do anything for me. But ... whether it's me or not me, this is still not a book about hockey. It's a book about Ken Dryden, articulate hockey player. If you didn't know much about hockey before the book, you still won't know much about it after.

Of course, if you're a Montreal Canadiens fan, you'll love it ... the little stories about the team you love will be irresistible. In his chapter on Toronto, there are some of Dryden's little psychological observations on some of the Leafs of my childhood. I'm not sure if I really believe them, but I still ate them up, and I wished there were more.


It's kind of an aside, but the reminiscences in the book don't seem to be right. When I started writing this post, I figured I'd try to let you know which ten days Dryden's diary covered. It should have been easy: a ten-day stretch with the first game in Buffalo, and the last game against the Islanders. But the historical record doesn't jibe with Dryden's narrative. It's not just occasionally that the book is off, but almost all the way through.

This may be a little long ... if you don't like this boring "tracers" stuff, you can skip to the next section.

1. At the beginning of the book, Dryden writes "last night was the sixty-second game of my eighth season with the Montreal Canadiens." It was a game in Buffalo, where the Habs won and Dryden played well.

Here's a game log for the 1978-79 Canadiens, one I'll refer to many times in this post. From the log, it seems the 62nd game of the 1978-79 season was a home game against the Leafs, on March 1. Dryden might be referring to the February 18 game, which was a 5-2 win in Buffalo. Not that big a deal.

That game, Dryden writes, came "after a tie in Chicago and a Saturday loss at home to Minnesota." That doesn't work out. The Canadiens played two games in Chicago that year: a 4-1 loss on October 28, and a 5-3 victory on December 20. Neither of their two home games against the Black Hawks was a tie, and neither closely preceded a game in Buffalo. They did tie Chicago the year before, on February 9, 1978, but that was a home game.

They did play Chicago at home on November 25, losing 8-3, making up for it with an 8-1 rout in Buffalo two games later. That's as close as I could find.

As for Minnesota, the Habs played them four times that year, winning three. The loss was a 4-3 game, but not at home, and not on a Saturday. It was on Wednesday, March 14, 1979 -- almost a month *after* the Sabres game that Dryden possibly describes.

2. That Buffalo game was on a Sunday, as it was described as "yesterday" in the "Monday" chapter. The next game occurs on "Wednesday," at Maple Leaf Gardens in Toronto. That could have been the aforementioned 62nd game that year, on March 1, except that it doesn't match. The Habs won it 2-1, but Dryden's description is 6-4.

It can't be that he just misremembers the score, because he describes the game in detail. After tying the game, Dryden writes, the Leafs get confident that they can keep up with the powerhouse Canadiens, and the Leafs begin to take over the play. But Mark Napier and Pat Hughes score two quick goals for the Habs. The Canadiens score two more, and then the Leafs get two late. The next day, the players wonder why coach Scotty Bowman didn't give them hell for allowing those two late goals.

That adds up to 6-4. There was no 6-4 win in Toronto in 1978-79. Also, I couldn't find a 6-4 win in Toronto in either of the two prior seasons.

There was a 6-3 win on February 3. I looked that game up in the February 5 Globe and Mail. It doesn't match Dryden's description: it went 1-0 for the Leafs, then 3-1 Habs, then 3-2, then 4-2, then 6-2, then 6-3.

3. The next day, Dryden writes, the Habs fly to Boston, where they win 3-2 after coming back in the third period from a 2-1 deficit. But, according to the 1978-79 game logs, Montreal played only two regular season games in Boston, and tied both of them, 1-1 and 3-3.

There *is* a game that fits in the *following* season, February 10, 1980. It matches the 3-2 score, and the Habs coming back from 2-1 in the third period. But, of course, it can't be that game, because Dryden was then retired (Denis Herron was in net). Also, Dryden describes Larry Robinson tying the game early in the period on a power-play goal, and Mario Tremblay potting the winner. But the third period goals were actually scored by Mark Napier and Pierre Larouche -- the first goal coming at 10:03 -- and there were no power plays after the first period.

4. The next game is in Montreal, against Detroit. Scotty Bowman says, "we got 'em back in Detroit next week." The home-and-home timeline pins it down to the game of April 4 (the return game in Detroit would have been April 8). According to the game log, Montreal won the April 4 game by a score of 4-1.

But that can't be it. During the game, Dryden notices the Leafs vs. Flyers on the out of town scoreboard. Those two teams didn't play each other on April 4. But they did on March 3, when Detroit was also visiting Montreal. So now we have two candidates.

Dryden doesn't explicitly tell us the final score, but he tells us Montreal won. The book says it was 1-0 after the first period, 2-1 after the second, and, with five minutes left in the game, Guy Lapointe scored on a shot off Mario Tremblay's right knee to make it 3-1.

Montreal lost on March 3, so it couldn't be that one. On April 4, Montreal won 4-1, which looks promising -- perhaps an empty-net goal that Dryden didn't report.

But, nope, the rest of the details don't match. The April 4 game was 1-1 after one period, and 4-1 after the second. Jacques Lemaire had a hat trick, and Steve Shutt the other goal.

5. The next game, Dryden has Montreal 7, Philadelphia 3. Flyers goalie Bernie Parent was out with an eye injury that game, the book says.

Parent did indeed suffer a career-ending eye injury in 1978-79, which occurred on February 17. But, Montreal's last game against Philadelphia was January 29. It was indeed a 7-3 score -- but Bernie Parent was the starting goalie.

6. Next is a road game against the Islanders. The book has been foreshadowing that Islanders game from the beginning -- the Islanders were challenging Montreal for number 1 in the standings, and appeared to be the Canadiens' main rivals for the Cup that year. There are several mentions of that important Islanders game coming up, including one in the first few pages, which puts it nine or ten days after the Sabres game.

As far as the score goes, all Dryden says is that the Canadiens lost. So it could be either of Montreal's two games on Long Island: February 27, by a 7-3 score, or October 17, by 3-1. The February 27 game is nine days after the original Buffalo game, so that must be it. Still, none of the games in between are the ones Dryden describes in the book.

7. Epilogue: "In the season's final game, we needed a tie against Detroit for first place, and we lost. The Islanders, waiting to be crowned, lost to the Rangers in the playoffs. And we won again."

Finally, this time it works out. Detroit beat Montreal 1-0 in the last game, and the playoffs match Dryden's recollection.


So, six out of seven cases don't check out. What happened? Perhaps instead of using an actual ten days out of the season, Dryden pieced together a composite. That might actually make sense. The book has lots to say about Toronto, where Dryden grew up. It has lots to say about Boston, where he went to school and had a major playoff victory in 1972. And it has lots to say about Philadelphia, which Dryden uses as his springboard for what's wrong with hockey.

He'd have liked to have a period that included all three cities. Maybe he just constructed one out of previous games, periods, and goals that he saw.


8. Finally, not a score thing, but: on page 71, Dryden describes Maple Leaf Gardens:

"The enormous Sportimer is gone; an even larger, more versatile scoreboard-clock, the kind you might find in any large arena, is in its place."

But: in 1979, the Gardens sported the same iconic clock it had had since 1966. It wasn't changed until 1982 -- after Dryden's retirement, but before his book was published. Or, maybe he's referring to the original Sportimer, the one before 1966, which he would have seen as a child.

Either way, Dryden's career went from 1971 to 1979, so he would have seen the exact same clock his entire career. So what's this all about? Perhaps he saw the new clock in 1982, before the book was published, and thought he also saw it back in 1979.


Reading the book made me think of how different Dryden is from Don Cherry.

Cherry, in one five-minute episode of Coach's Corner, will say more of substance than Dryden says in ten pages. But Cherry uses uneducated language, dresses funny, is passionate about the things he believes, and occasionally slips into political views that tend to be less accepted by "learned" people. So Dryden gets credited with "the best hockey book ever written," and Cherry gets scorn.

Coincidentally, Cherry has books of hockey stories too (as dictated to sportswriter Al Strachan); I just finished reading his second one. The styles, as you can imagine, are different. Dryden shrouds each locker room conversation in a cloud of profundity and mood; Cherry tells it in his straight-ahead style. But Cherry has something to say, and a point to make, and the stories are actually interesting. Dryden's got a couple of good ones -- like Steve Shutt urinating into a cup, adding Coke for color, and waiting for one of his teammates to come by and drink it -- but most of them are, well, not that engrossing. Like this one:

Amid the business of getting ready for practice, there is talk of beer.

"Calisse, you see the paper?" [Rejean] Houle moans. "Beer's goin' up sixty-five cents a case. Sixty-five cents!"

His words bring a grumble of memory.

"Shit, yeah," says a mocking voice, "the only thing should go up is what they pay fifteen-goal scorers, eh, Reggie?"

There is laughter this time. Across the room, Guy Lapointe stares at the ceiling, lost in thought. Suddenly he blurts, "That's it, that's it. No more drinkin'."

There is loud laughter.

"Hey, Pointu," Steve Shutt says, "ya just gotta learn to beat the system -- drink on the road."

That one, it seems safe to say, wouldn't have made it into Cherry's book. And, as far as actual hockey content goes, Cherry has Dryden beat by a mile. I opened Cherry's book to a random page, 74, where there's a thing about fighting. It's too long to quote entirely, but here's what Cherry tells me in five paragraphs:

-- There is no rush in the world like when you fight.

-- When players get older, they get a conscience and start to hesitate, and they have a tough time. That's why you see few older players fighting.

-- Fighters' hands suffer serious damage. "You wouldn't believe the hands on Joey Kocur ... it looks like he's had a ping-pong ball implanted under each knuckle."

-- The advent of helmets and visors has made hand injuries much more common, so fighters will remove their helmets as a show of respect. "I love it when I see one guy who's having trouble getting the strap loose on his helmet, and the other guy gives him time while he gets it off. ... That's honour! I love those guys."

No exaggeration: I learned more in that half-page than probably in 50 pages of Ken Dryden. Now, I'm not saying Don Cherry is always right, or even mostly right, or that you have to agree with him, or that he's particularly eloquent. But, he does try to tell you something about hockey. He may be wrong about some of the things he believes, like Joe Morgan or Harold Reynolds, or any other commentator in any other sport. But, geez, at least he says things about hockey!

As much as people love the Ken Drydens of the world, it's the Don Cherrys who actually can teach you things. I mean, they're not always right: for every non-Dryden who believes something correctly and passionately, there'll be many other non-Drydens who believe something different, incorrectly but just as passionately. So I'm not saying that when you find a Don Cherry, you should immediately become a mindless follower. I'm just saying that if you actually want to know how things are, you should start with the Cherrys and then go with your brain. Whatever the subject, hockey or otherwise, you'll never learn anything unless you stick with the people who are seriously trying to tell you something -- not just the people who sound the best.


No offense to Ken Dryden. I'm not saying that he doesn't know anything about hockey, just that he chose not to tell us too much about it. He actually wrote a pretty decent book, just one that's not as much to my taste as it could have been. And Dryden is under no obligation to write the kind of book that I like to read.

Still, "The Game" is nowhere near the best hockey book ever written. It's just the most poetic.


Friday, December 09, 2011

A "Grantland" article on Moneyball effects

Here's a baseball salary article at Grantland, by economists Tyler Cowen and Kevin Grier. It’s a strange one ... the impression I get is that the authors are just going on the basics of the "Moneyball" story, but don’t really follow baseball discussions very much. And so some of their arguments are obviously behind the curve.

For instance, they talk about how closers used to be paid inefficiently, but aren't any more, except by free-spending teams like New York:

"This year, the Yankees' Mariano Rivera was ranked fifth in total saves with 44. At a salary of $14.9 million, that works out to be a hefty $338,600 per save. The four closers ranked ahead of him averaged 46.5 saves and a salary of $2.9 million, or $63,771 per save — quite the bargain."

The problem here is obvious to almost any serious baseball fan: closers aren’t normally evaluated by the number of saves, which is mostly a function of the opportunities the team provides. Rather, like any other member of the roster, the closer is paid according to how many wins he can contribute to the team's record, as compared to a replacement player. For Rivera to be worth $15 million, he has to contribute about three extra wins (at a going rate of $4.5 million per win). Which means, basically, he has to blow three fewer saves, given his opportunities. Or, rather, he has to be *expected* to blow three fewer saves; there's still a lot of randomness there.
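That back-of-envelope arithmetic can be written out. The $4.5 million per marginal win is the going rate mentioned above; everything else follows from it:

```python
# Back-of-envelope version of the closer-valuation argument.
# Assumption: $4.5 million buys one marginal win on the free-agent market.
COST_PER_WIN = 4.5e6

def implied_wins(salary):
    """Extra wins a free-agent salary implies at the market rate."""
    return salary / COST_PER_WIN

# Rivera's $14.9 million salary implies roughly 3.3 extra wins -- i.e.,
# roughly three fewer expected blown saves than a replacement closer.
print(round(implied_wins(14.9e6), 1))  # 3.3
```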

But Cowen and Grier don't mention randomness at all. And their only reference to blown saves is in one sentence that mentions the Twins' Joe Nathan and Matt Capps, who blew 12 saves out of 41 opportunities.

Another thing, too, is that the article doesn't mention one big difference between Rivera and the others: Rivera is a free agent, while young players like Neftali Feliz can be paid whatever the team wants. The Yankees might prefer Feliz to Rivera, but that’s not a choice they have open to them.

It's not a new "Moneyball" discovery that "slaves" -- pre-arbitration players, who can be paid near the league minimum -- make less money than established free-agent stars ... but the article seems to imply that teams don’t realize that the $400,000 stopper can be just as valuable, for the money, as the $15,000,000 stopper.

To me, it looks like the problem is that if you don’t know baseball that well, you tend to overrate the “Moneyball” possibilities, because that’s the story that you’ve heard the most.


The authors then go on to say:

"The best-known Moneyball theory was that on-base percentage was an undervalued asset and sluggers were overvalued. At the time, protagonist Billy Beane was correct. Jahn Hakes and Skip Sauer showed this in a very good economics paper. From 1999 to 2003, on-base percentage was a significant predictor of wins, but not a very significant predictor of individual player salaries. That means players who draw a lot of walks were really cheap on the market, just as the movie narrates."

The authors imply that “walks were really cheap on the market” meant the A’s had a huge hole to exploit.

But ... even if walks were indeed “really cheap,” it would still be a small hole. Walks are a significant part of a player’s value, but mispricing them creates only a small edge, not a huge one. Suppose teams valued walks at only half their actual value. If you can pick up a player with 60 walks for the price of 30, you’ll gain about 10 runs, or one win. Not a big deal.

Of course, if you can do that nine times, that’s nine free wins. But the A’s didn’t. In 2002, they walked 609 times, third in the league. But that was only 157 more walks than Baltimore, second-worst in the league. If 157 was the number of walks they got at half-price, that’s still only two or three wins.

You could choose, instead, to compare the A’s to the 2002 Tigers, who walked only 363 times. It would be completely unrealistic, in my view, to assume the A’s would otherwise have been as bad as one of the worst recent teams ever. But even if you do, you *still* gain only four wins.
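Here's the walk arithmetic written out. The run values are standard rules of thumb (roughly a third of a run per walk, ten runs per win), not anything from the Grantland article:

```python
# Rough replication of the walk-undervaluation arithmetic above.
# Assumptions: a walk is worth about 0.33 runs, and about 10 runs = 1 win.
RUNS_PER_WALK = 0.33
RUNS_PER_WIN = 10.0

def surplus_wins(extra_walks, fraction_underpaid=0.5):
    """Wins gained if extra_walks are bought at a discount of fraction_underpaid."""
    surplus_runs = extra_walks * fraction_underpaid * RUNS_PER_WALK
    return surplus_runs / RUNS_PER_WIN

print(round(surplus_wins(60), 1))    # the 60-walks-for-the-price-of-30 player: ~1 win
print(round(surplus_wins(157), 1))   # the A's edge over second-worst Baltimore: ~2.6
print(round(surplus_wins(246), 1))   # the A's edge over the 363-walk Tigers: ~4.1
```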


The authors also put too much faith in the Hakes/Sauer paper. As I wrote a few years ago, it seems to me that the paper has a few problems, and I don’t think it shows what it purports to show.

The study found a huge increase in the correlation between salary and OBP between 2003 (when the "Moneyball" book was released) and 2004. The numbers for 2004 almost exactly matched the actual value of a walk, so the authors concluded that the market became efficient in the off-season, and teams wised up after reading the book.

But that conclusion doesn’t make sense. Since only a small percentage of players got new contracts between 2003 and 2004, for the overall average to move that much, the new contracts would have had to overcompensate, valuing walks at double or triple their real worth! That doesn’t sound like a reasonable possibility, and it’s certainly not consistent with GMs simply learning to be efficient.
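The turnover arithmetic is easy to sketch with hypothetical numbers. Suppose (my assumption, for illustration) that only a quarter of players signed new contracts that off-season, and the league-average implied value of a walk doubled:

```python
# Turnover arithmetic with hypothetical numbers: if only 25% of contracts
# turned over, and the league-average implied value of a walk doubled,
# how much would the *new* contracts have to pay?
old_value = 1.0    # average implied price of a walk in 2003 (normalized)
target = 2.0       # average implied price of a walk in 2004
turnover = 0.25    # share of players on new contracts (assumption)

# target = (1 - turnover) * old_value + turnover * new_value, solved for new_value
new_value = (target - (1 - turnover) * old_value) / turnover
print(new_value)  # 5.0 -- new deals would have to pay five times the old rate
```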


Finally, on the subject of correlation:

"Here's something funny about the Moneyball strategy: It is bringing us a world where payroll matters more and more. Spotting undervalued players boosts their salaries and makes money more important for the general manager; little did Billy Beane know that in the long run he would be strengthening the hand of the large home-market teams, such as the Yankees. From 1986 to 1993, payroll explained 2.2 percent of the variation in team winning percentage, and that meant spending more money yielded little return in terms of quality on the field. In the 2004 to 2006 seasons, after the Moneyball revolution was under way, payroll explained 27.1 percent of the variation in team winning percentage, which means a stronger reason to spend more."

I've written about this before, and Tango’s written about it several times: a higher r-squared does NOT necessarily mean money is more important in buying wins. Rather, the r-squared is a combination of:

1. the extent to which money can actually buy wins;
2. the extent to which teams differ in spending, in real-life.

When the authors say, "spending more money yielded little return," they seem to be assuming it’s all the first thing, when it might be all the second thing.

As an example, take dueling, where two people go out at dawn, draw weapons, and one of them kills the other. Back when it was legal, dueling would explain a lot of the variation in death rates of people who didn’t like each other. Now that it’s illegal, it explains zero.

However, the fact that the r-squared dropped doesn’t mean that dueling is any less dangerous than it used to be (point 1) -- it just means that people no longer vary in how often they get killed in duels (point 2).
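A quick simulation makes the same point numerically: hold the true effect of payroll on wins fixed, vary only how much teams differ in spending, and the r-squared moves anyway. Every parameter here is invented for illustration:

```python
# Simulated payroll-vs-wins r-squared. The "true" payroll effect (0.5 wins
# per unit of payroll) and the noise level are fixed; only the spread in
# team payrolls changes between the two runs.
import random

random.seed(1)

def r_squared(payroll_sd, n_teams=30, seasons=200):
    xs, ys = [], []
    for _ in range(n_teams * seasons):
        payroll = random.gauss(0, payroll_sd)       # payroll above/below average
        wins = 0.5 * payroll + random.gauss(0, 5)   # same true effect each time
        xs.append(payroll)
        ys.append(wins)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

print(round(r_squared(payroll_sd=2), 2))    # low spending spread: low r-squared
print(round(r_squared(payroll_sd=10), 2))   # high spending spread: high r-squared
```

Same "danger" of payroll in both runs; only point 2 differs, and the r-squared jumps.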

The same thing could be happening here. I did a Google search and found an article (.pdf) that gives some team payroll data for the period the Grantland article covers. Table 1 of that paper shows that from 1985 to 1990, fourth-quartile teams (the 25% of teams with the highest payrolls) outspent the first-quartile teams by only about 2 to 1. From 1998 to 2002, the ratio jumped to 3 to 1. The paper covers only through 2002, but a glance at later numbers seems to show around 2.5 to 1 (rising to 3.1 to 1 for the 2011 season).

This is evidence that at least *some* of the difference is probably caused by teams being willing to spend more.

I may be unfair to the authors here ... that might be partly what they’re saying. If I read them right, they’re saying that, armed with "Moneyball" concepts, teams are realizing they can buy wins cheaper by evaluating players more accurately (1) -- and, that teams are therefore more likely to vary in how much they pay when they know it’s money well spent (2).

But ... well, I think these effects are pretty small. As I argued, walks are a small part of the overall equation, even if they were undervalued by half (which itself is probably an overestimate). It’s not like, in 1990, teams were paying Jose Oquendo as much as Wade Boggs. To be sure, teams weren’t perfect in evaluating players -- but they were still reasonably good. Any improvement since then has to be relatively small, at the margins.

So, the idea that teams would say, "hey, we can now evaluate players slightly more accurately, so let’s go on a spending spree" doesn’t seem all that plausible.


What actually *did* happen to tighten the relationship between payroll and wins? As usual, you guys probably know better than I do. I’ll give you my guess anyway, which is that it’s a combination of a bunch of things:

1. It became more "socially acceptable" for teams to pay big money to free agents. Remember, 1985 to 1990 includes the collusion years, and there was probably significant pressure to keep spending down. That pressure probably did more to discourage headline-grabbing salaries than routine signings, so maybe a player who was twice as valuable wouldn’t have been able to sign for twice as much. That would help keep the correlation between salary and success low.

2. When baseball revenues exploded, they grew more in some cities than others. That meant that marginal wins would be extremely valuable to the Yankees, but not so much to the Pirates. That increased the variation in team spending, which pushed up the r-squared.

3. Teams got smarter, in line with Cowen and Grier’s theory. But I think that was a small part of what happened. Also, I’d guess that a lot of improvement in that regard would have happened well before Moneyball, as Bill James’ discoveries got around a bit. Conventional wisdom denies that baseball executives put any faith in what Bill James had to say, but ... I dunno, good ideas tend to get noticed, even if people say they don’t believe in them. Also, Bill James’ ideas showed up early in arbitration hearings, which affected the teams’ bottom lines pretty much immediately.

4. Randomness. In a team payroll to wins regression, Cowen and Grier give an r-squared of .022 for 1986 to 1993.

(By the way, I assume Cowen and Grier's regression adjusted for payroll inflation ... salaries more than doubled between 1986 and 1993. If they didn't adjust, that might explain the low correlation.)

I wonder if that .022 might just be an outlier. Here are equivalent numbers from Berri/Schmidt/Brook in "The Wages of Wins," page 40:

Wages of Wins:

1988 to 1994: r-squared = .062, r = .25
1995 to 1999: r-squared = .325, r = .57
2000 to 2005: r-squared = .176, r = .42

Cowen/Grier:

1986 to 1993: r-squared = .022, r = .15

The numbers sure do move around a lot! It probably doesn’t take much to knock the correlation down: you need a few teams to get lucky in exceeding their talent, and a few teams to get lucky and get some good slaves and arbs. Maybe I’ll try a simulation and see how common a .022 might actually be.
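In the spirit of that proposed simulation, here's a toy version. All the parameters are made up: a modest true payroll effect, gaussian luck, and 8-season samples of 26 teams, checking how often the sample r-squared comes out at .022 or lower by chance alone:

```python
# Toy simulation: how often does an 8-year, 26-team payroll-vs-wins
# r-squared land at .022 or below purely by sampling luck, given a modest
# true effect? All parameters are invented for illustration.
import random

random.seed(2)

def sample_r2(n_teams=26, seasons=8, effect=0.3, noise=1.0):
    xs, ys = [], []
    for _ in range(n_teams * seasons):
        pay = random.gauss(0, 1)                  # standardized payroll
        win = effect * pay + random.gauss(0, noise)
        xs.append(pay)
        ys.append(win)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy * sxy / (sxx * syy)

low = sum(sample_r2() <= 0.022 for _ in range(1000))
print(low / 1000)  # share of simulated 8-year periods with r-squared that low
```

Whether .022 turns out to be a plausible fluke depends entirely on the effect and noise you assume, which is the whole question.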


Tuesday, December 06, 2011

Transparent studies are better, even if they're less rigorous

In 1985, Orioles manager Joe Altobelli claimed that power pitchers do better than finesse pitchers in cold weather. In the 1986 Baseball Abstract, Bill James wanted to check whether that was true.

So, he went through baseball history, and found pairs of pitchers with exactly the same season W-L record, but where one was a power pitcher and the other a finesse pitcher. For example, Tom Seaver, a power pitcher, went 22-9 in 1975. He got paired with Mike Caldwell, a finesse pitcher, who went 22-9 in 1978.

Bill found 30 such pairs of pitchers. So, he had two groups of 30 pitchers, each with identical 539-345 (.610) records overall.

He compared the two groups in April, the cold-weather month. As he put it, "Altobelli was dead wrong. He couldn't have been more wrong." It turned out that the power pitchers were only 49-51 (.490) in April, while the finesse pitchers were 63-37 (.630). That's exactly opposite to what Altobelli had thought.
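The whole design fits in a few lines of code, which is part of its charm. Here's a minimal sketch with invented April records (Bill's actual data isn't reproduced here):

```python
# Minimal sketch of the matched-pair design: pair each power pitcher with
# a finesse pitcher who had the identical season W-L record, then compare
# the two groups' records in April only. The records below are invented.
pairs = [
    # (power pitcher's April W-L, matched finesse pitcher's April W-L)
    ((2, 2), (3, 1)),
    ((1, 2), (2, 1)),
    ((2, 1), (2, 1)),
]

def april_pct(records):
    """Combined winning percentage of a group's April records."""
    wins = sum(w for w, _ in records)
    losses = sum(l for _, l in records)
    return wins / (wins + losses)

power = april_pct([p for p, _ in pairs])
finesse = april_pct([f for _, f in pairs])
print(round(power, 3), round(finesse, 3))  # 0.5 0.7
```

Because the pairs match on full-season record, any April gap between the groups is evidence about the power/finesse distinction itself.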


Nice study, right? I love it ... it's one of my favorites. But it wasn't very "sophisticated" in terms of methodology. For instance, it didn't use regression.

Should it? Well, I suppose you could make that argument. With regression, the study can be more precise. Bill James used only season W-L record in his dataset, but in your regression, you could add a lot more things: ERA, strikeouts, walks. You could include dummy variables for season, park, handedness, and lots of other things. And, of course, you wouldn't be limited to only 30 pairs of pitchers.

And you'd get a precise estimate for the effect, with confidence intervals and p-values.

But ... in my opinion, that would make it a WORSE study, not better. Why? One big reason: Bill's study is completely transparent, and understandable.

Take the four paragraphs above, and show them to a friend who doesn't know much about statistics. That person, if he's a baseball fan, will immediately understand how the study worked, what it showed, and what it means.

On the other hand, if you try to explain the results of a regression, he won't get it. Sure, you could explain the conclusions, and what the coefficient of walks and strikeouts mean, and so on. And he might believe you. But he won't really get it.

With the easy method, anyone can understand what the evidence means. With regression, they have to settle for understanding someone else's explanation of what the evidence means.

Reading Bill's study is like being an eyewitness to a crime. Reading the results of the regression is like hearing an expert witness testify what happened.


Now, you may object: why is that so important? After all, if it takes a sophisticated method to uncover the truth, well, that's what we have to do. Sure, it's nice if the guy on the street can be an eyewitness and understand the evidence, but that's not always possible. If we limited our research studies to methods that were intuitive to the layman, we'd never learn anything! Physics is difficult, but gives us cars and airplanes and electronics and nuclear energy. If it takes a little bit of effort and education to be able to do it, then that's the price we pay!

To which I have two responses. The first is that, actually, I agree. I'm not saying we should limit our research to *only* studies that laymen understand better. I'm just saying that it's preferable *when it's possible*.

The second response, though, is: it's not just laymen I'm talking about. It's also sophisticated statisticians and sabermetricians. You, and me, and Tango, and Bill James, and JC Bradbury, and David Berri, and Brian Burke, and all those guys.

Because, the truth is, a regression study is not transparent to ANYBODY. I mean, I've read a lot of regressions in the last few years, and I can tell you, it takes a *lot* of work to figure out what's going on. There are a lot of details, and, even when the regression is simple, there's a lot of work to do in the interpretation.

For instance, a few paragraphs ago, I gave a little explanation of how a regression might work for Bill's study. And suppose, making some numbers up, I get a coefficient of -.0007 per strikeout, and -.0003 per walk. What does that mean?

Well, you have to think about it. It means, for instance, that if Tom Seaver had 50 more strikeouts and 20 more walks than pitcher X, his April winning percentage would be .041 worse, all things being equal, than X's. But ... it takes a while, and you have to do it in your head.
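The mental arithmetic from the paragraph above, written out (the coefficients are the made-up ones from the text, not real estimates):

```python
# Interpreting the hypothetical regression coefficients from the text.
coef_so = -0.0007   # change in April winning pct per extra strikeout (made up)
coef_bb = -0.0003   # change in April winning pct per extra walk (made up)

# Seaver vs. pitcher X: 50 more strikeouts, 20 more walks
delta = 50 * coef_so + 20 * coef_bb
print(round(delta, 3))  # -0.041: projects .041 worse in April
```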

But wait! Not every pitcher in the study had the same number of innings pitched. So Tom Seaver's 50 extra strikeouts, do I have to convert that to a fixed number of innings? What did the study use? Now I have to go back and check.

And how do I interpret the results? Am I sure they really apply to all pitchers? I mean, suppose there's a pitcher like Seaver, who strikes out a lot of guys, but has better control, so he walks 30 fewer guys. Should I really assume that he'll do better in April than Seaver, since he's "less" of a power pitcher in that dimension?

Also, wait a sec. The regression included season W-L record, but that *includes* the April W-L record that it was trying to predict. That will throw off the results, won't it? Maybe the regression should have used only May to October. Or maybe it did? Now I have to go check that.

And what if it didn't, and there were a bunch of pitchers who pitched only one game after April? Will that throw off the results? If the confidence intervals are suspect, are the coefficients suspect too?

I could go on ... there are a thousand issues that affect the interpretation of the regression's results. And it's impossible for any one person, even the best statistician in the world, to hold all thousand in his head at once.


You could now come back with another argument: "OK, the regression is harder to interpret, but we can just do that interpretation. Indeed, that's the duty of the researcher. When you write up a study, your duty isn't just to do a regression and report the results. It's to figure out what the results mean, and check that everything's been done right. Also, that's what the peer reviewers are for, and the journal editors."

To which the most obvious reply is: it's simply not true that peer reviewing makes sure everything is correct. There are loads and loads of problems with actual interpretation of regressions, some of which I've written about here. There was the paper that forgot to hold everything constant when interpreting a coefficient. There was the paper that decided that X picks ahead was worth zero, but drafting 2X picks ahead was significant. There was the paper that decided that a low "coefficient of determination" meant that the effect was insignificant, regardless of what the coefficient showed. And so on.

And those are the easy ones. I mean, sure, I'm no sophisticated peer reviewer with a Ph.D. who looks through a hundred of these a month, but I do have a decent basic idea of how regressions work and how baseball works. But, for some of these regressions, it took me a long time, measured in hours, to figure out what they actually were doing and (in many cases) why the results didn't really mean what the author thought they meant. It's not that you need the right kind of expertise ... it's that every case is different, and there's no formula. To figure out what a result actually means, you have to look at everything: where the data comes from, how it interacts, what is really being measured, what the coefficients mean, and, especially, if the model is realistic and if other models give different results.

As I've said before, regression is easy. Interpreting the regression is hard -- legitimately hard. And, unlike other hard problems, you don't know when or whether you've found the right answer. You can spend days looking at it, and you might still be missing something.


Which, of course, means that the simple method is more likely to be correct than the complicated method: if we understand a study, we're much more likely to spot its flaws. If Bill James did something dumb in how he compiled his data, most of us will catch it. If Joe Regressor does something wrong in his complicated regression, it's likely that nobody will see it (unless it's in a hard science, in which case a plane will crash or a patient will die or something).

And, of course, there's still the advantage that the simple study is easier to understand. That's an advantage even if the regression study is absolutely 100% correct. If you can see the answer in four paragraphs of arithmetic, that's better than if it takes ten pages of regression notation.

Which advantage is more important? Actually, it looks like the first one is more important ... but, you know, the second one can make a strong case.

That Bill James study that I mentioned earlier ... if you have a copy of the 1986 Abstract handy, you should go read it. It's on page 134. It's only two pages long, and easy reading, like most of Bill's prose.

If you've read it, I'd say that, right now, you KNOW the answer to Bill's question. I don't mean you know 100% that he's right, and what the answer is ... I'm saying that you know, almost 100%, the evidence and the logic. You may or may not think the results are conclusive -- for instance, I'd like to see a larger sample size -- but, either way, it's your own decision, not Bill's.

That is: even though Bill did the study, you instantly absorb the answer. You don't have to trust Bill about it. Well, you have to trust that he aggregated the data properly, and didn't cheat in which pitchers he chose. But you don't have to trust Bill's judgment, or Bill's knowledge of baseball, or Bill's interpretation. His interpretation will become yours: you'll see what he did, and understand it well enough that if someone challenges it, you can defend it. Bill's study is so transparent that, after you read it, you understand the answer as well as Bill did, probably about as well as anyone can.

That's not true for a regression study. Often, for many readers (myself included), the explanation is impenetrable. The references for the methods refer you to textbooks that are hard to find and technical, and, usually, there's no discussion of what the results mean, other than just a surface reading of the coefficients. If you want to truly understand what's going on, you have to read the study, and read critically. Then you have to read it again, filling in the missing pieces. Then you have to look at the tables, and back to the text, then to the model. And then you still probably have to read it again.

And all this is assuming you already know something about how regression works. If you don't, you'll just have no idea.

So the difference is: if the study is simple, you know what it means. If the study is complicated, you don't know anything. You have to trust the person that wrote the study.

It's night and day: knowing versus not knowing. With the Bill James study, you know it's true. With a regression study, you just believe it's true.


To go a bit off topic ...

A lot of the factual things we think we know, and will passionately defend, we don't really know, except from what other people tell us. I bet everyone reading this has a strong opinion on creation vs. evolution, or whether global warming is real or a hoax, or whether 9/11 was or was not partly an inside job by the US government.

Take evolution. Let's suppose you believe in evolution, and you think creationism is not true, just wishful thinking by creationists. I think most of my friends fall into this category, and I've read surveys that say most Americans generally believe this.

But if that's you, can you really say that you KNOW it? Probably not. I bet that most of us (myself included), if asked for examples of actual evidence for evolution, would have nothing. Seriously, zero. I couldn't even give a half-hearted attempt at a single sentence on why and how we know that evolution actually happened.

Sure, I believe evolution happened, but not because of any actual evidence. I believe it for secondhand reasons. I believe it happened because I know that scientists, serious researchers, have looked at the evidence, and that it's strong evidence. What I *do* think I know, from dealing (at arm's length) with teachers and authors and scientists and journalists, is that it's very, very unlikely that the worldwide supply of scientists, working independently and jockeying for discoveries and status and publications, and being so steeped in scientific method, and competing in an open marketplace of ideas, could have deluded themselves into misinterpreting so much evidence over so many decades.

So, you know, I can't say I know evolution is true. But now I DO know there is strong evidence that power pitchers do not outperform in April.

That's the beauty of the Bill James-type study, the one that lays it all out for you without using fancy techniques. It gives you a completely different feeling, the feeling that you actually know something, instead of just believing it from hearsay.

And that, to me, is an important part of what science is all about.
