Thursday, December 24, 2009

Do pitchers throw no worse in their bad starts than in their good starts?

Suppose your starting pitcher does great against the first nine batters he faces, getting them all out. You'd think he's got good stuff tonight, right? And that he'd do better than normal the rest of the game.

It turns out that's not the case. In a study I did nine years ago (.pdf, page 13), I found that when a pitcher had a no-hitter going through three, four, or five innings, his performance in the remainder of the game was almost exactly what you'd expect out of a pitcher of his general quality. The great early performance didn't signal a better performance later on.

The authors of "The Book" did a more comprehensive study (page 189), and came to the same conclusion. If a pitcher is excellent early, that doesn't give you any extra information on how he'll perform later.

What if the pitcher is really crappy his first couple of innings? My study showed that, again, there was no effect. After giving up three runs in the first inning, starters reverted to almost exactly their expected form in the remainder of the game. In this case, though, "The Book" found a different, and more intuitive, result: pitchers were actually a little worse than usual after getting hammered. You would have expected them to give up a wOBA of .360, but they got hit for .378. That's statistically significant, but still fairly small.

Anyway, the point is: there seems to be a whole lot of luck involved in pitching. The pitchers who got hammered gave up a wOBA of .701 to their first nine batters, but only .378 afterwards. Now, I suppose you could argue that they were legitimately unskilled at first, so bad that the entire lineup hit better than Barry Bonds or Babe Ruth at their peak. But it's much more likely that they just had bad luck, that they were pretty much the same pitcher before and after, and just happened to throw pitches a bit worse than usual at first. Or, that the hitters were lucky enough to get good wood on the ball during that inning.

If you think about it, that's roughly how we think of hitters. When a career .250 hitter goes 3-for-4 one day, we praise him for what he did, and might even name him the MVP of the game. But, still, we don't assume that he somehow played better than usual. We don't say it in so many words, but we understand that every player has good games and bad games, lucky at-bats and unlucky at-bats, and 3-for-4 is not that unusually lucky. He was the same hitter as always, but happened to get better results that day.

For pitchers, on the other hand, we're a little bit less understanding that way. When a pitcher gets hammered, we usually talk about how he gave up bad pitches, or how he didn't have his control, or some such. It's rare that a pitcher will give up five runs in the first inning without everyone trying to figure out what's wrong.

But if you think about it for a bit, you'll understand there must be a lot of luck. If you've played simulation games, like APBA or Strat-O-Matic or Diamond Mind, you've experienced that the same pitcher can appear to pitch awesome, or get hammered, just by a few unlucky rolls of the dice. I think I found once that an average team has a standard deviation of 3 runs scored per game due to luck. If you assume that runs scored are normally distributed (they aren't, but that doesn't affect the argument much), you'll see that a decent starter with an ERA of 4.00 would be expected to give up 10 runs in a complete game about once a year, by luck alone. (Of course, he'll likely be relieved long before he gets to 10 runs, but the point remains.)
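The "once a year" figure can be checked with a quick normal-distribution calculation. This is only a sketch under the same simplifications as the paragraph: runs per start are treated as normal with mean 4.0 and SD 3, every start is a full nine innings, and the 33-start season is an assumed round number, not something from the study.

```python
import math

def normal_tail(x, mean, sd):
    """P(X >= x) for a normal random variable with the given mean and SD."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

# Assumptions (illustrative): 4.0 runs per nine innings on average, a luck SD
# of 3 runs per game, and 33 starts a season, each treated as a complete game.
MEAN_RUNS = 4.0
SD_RUNS = 3.0
STARTS = 33

# Use 9.5 as the cutoff so that "10 or more runs" maps onto a continuous curve.
p_blowup = normal_tail(9.5, MEAN_RUNS, SD_RUNS)
per_season = p_blowup * STARTS

print(f"P(10+ runs in a start) = {p_blowup:.3f}")
print(f"Expected 10-run starts per season = {per_season:.2f}")
```

With these inputs the expected count comes out close to one per season; a mean a bit above 4 (unearned runs count too) would push it slightly higher.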

Which brings me to the point of this post, which is an awesome pitching analysis by Nick Steiner, over at The Hardball Times. Steiner looked up PITCHf/x data for A.J. Burnett, during his good starts, and during his bad starts, and found ... pretty much *no difference*. If you haven't seen the article yet, you should go there now, scroll about halfway down, and look at the two scatterplots. They look almost the same to me. Also, the pitch selection bar graphs, at the bottom of the piece ... they look pretty much the same, too.

That's a bit different from what we were talking about before, where pitchers "got better" after a bad start. There, it was still possible that they threw worse pitches even though their "stuff" was just fine. But here, we're finding that not only was their skill level apparently not changed, but the *actual pitches* were the same.

That is, there are two different kinds of luck. First, there's the possibility that, even though your skill is fine, your pitches are a bit unlucky and don't quite work: they don't hit the corners, or the specific curve ball hangs a little more than normal. Second, there's the possibility that your pitches are just as good as any other day, but you're unlucky enough that the batters just happened to hammer them.

It's the second kind of luck that we're talking about here. Of course, even with Steiner's data, we're not 100% sure it's luck. There are other things it could be, as some of the commenters have pointed out:

-- Mistake pitches. Maybe the difference between a good and bad outing is just the number of mistakes. From the charts, it would be easy to miss a few hanging curves, or fastballs down the middle.

-- Combinations. Maybe it's not just what kind of pitch, how fast it is, and how it breaks: maybe it's the timing of certain pitches relative to others. If it's hard to hit a curve ball after a fastball, and the pitcher doesn't choose that combination often enough, the batters will have better results.

-- Slight differences. Maybe a small difference in location makes a big difference in results. It could be that there *are* differences in those two scatterplots, but we can't pick them up with the naked eye.

-- Umpires. Part of the effect could be a different strike zone between the "good" days and the "bad" days -- the same pitch that was called a strike five days ago is called a ball today.

-- Differences that PITCHf/x doesn't tell you about. Maybe certain pitches are deceiving in ways that type, velocity, and spin don't capture: two pitches that look identical on paper might look very different to the batter.

All those things are possible, but they just don't seem very likely to me. The most plausible one, to me, is differences that aren't captured in the data. I'm not well-enough informed to know if that could be happening.

My feeling is that a lot of the luck comes from the batter side. The pitcher has time to plan and decide; the batter has very little time to react. If the batter has to "guess" what kind of pitch is coming, and roughly where it's going to cross the plate, that's inherently a random process. If the hitter is "waiting on a fastball," and he gets one, things are going to work out well for him. If it's a curve ball, not so much.

Doesn't it seem reasonable that a certain pitch might be a "good" pitch only in the sense of probability? Maybe a certain pitch is a strike 50% of the time, a ball 20% of the time, an out 20% of the time, and a hit 10% of the time. But one day, instead of ten such pitches going 5/2/2/1, they might go 5/2/1/2, only because the batter happened to guess right one extra time and got good wood on the ball. That one extra hit is worth an average of more than three-quarters of a run. Depending on circumstances, it could be several runs. And not because of anything the pitcher did differently, but just because the batter decided to wait on a curve ball instead of a slider.
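How often would the "unlucky" version of ten identical pitches show up? A rough sketch, assuming each of the ten pitches independently becomes a hit 10% of the time (this ignores count, batter, and game situation), is a simple binomial calculation:

```python
from math import comb

def prob_at_least(k, n, p):
    """P(at least k successes in n independent trials with success prob p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

# Hypothetical pitch from the text: 10% of the time it ends up as a hit.
P_HIT = 0.10
PITCHES = 10

expected_hits = P_HIT * PITCHES
p_two_or_more = prob_at_least(2, PITCHES, P_HIT)
print(f"expected hits: {expected_hits:.1f}")
print(f"P(2 or more hits): {p_two_or_more:.3f}")
```

Even though one hit is the "expected" result, two or more hits happen roughly one time in four, with no change at all in the pitches themselves.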

Anyway, it shouldn't be too hard to check: find a bunch of pitchers of roughly the same ability. Figure out the variation of their results. Then, run a simulation, and check the variation of *those* results. If they're the same, you've just shown that pitchers perform like their own APBA cards, and that it's likely that almost all of what you see is randomness. If not, then the difference between the two variances (or, in terms of standard deviations, the square root of the difference of their squares) is something other than luck.
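Here is a minimal sketch of that check. Everything in it is an assumption for illustration: on-base rate stands in for runs or wOBA, the "real" pitchers are themselves simulated with a hypothetical talent spread (SD .015) because no data is attached, and each pitcher faces 600 batters. The point is the arithmetic: variances add, so the spread that isn't luck is the square root of the observed variance minus the luck variance.

```python
import math
import random
import statistics

def season_rates(talents, batters, rng):
    """Simulate one season per pitcher; each batter reaches base with
    probability equal to that pitcher's true rate."""
    return [sum(rng.random() < t for _ in range(batters)) / batters for t in talents]

def implied_skill_sd(observed_sd, luck_sd):
    """Variances add, so the non-luck SD is sqrt(observed^2 - luck^2)."""
    return math.sqrt(max(0.0, observed_sd ** 2 - luck_sd ** 2))

rng = random.Random(2009)
PITCHERS = 1000
BATTERS = 600          # batters faced per pitcher (assumed)
LEAGUE_RATE = 0.320    # on-base rate allowed by an average pitcher (assumed)

# Stand-in for real data (hypothetical): true rates vary with an SD of .015.
true_rates = [rng.gauss(LEAGUE_RATE, 0.015) for _ in range(PITCHERS)]
real_results = season_rates(true_rates, BATTERS, rng)

# The "APBA card" simulation: every pitcher gets exactly the same card.
card_results = season_rates([LEAGUE_RATE] * PITCHERS, BATTERS, rng)

obs_sd = statistics.pstdev(real_results)
luck_sd = statistics.pstdev(card_results)
skill_sd = implied_skill_sd(obs_sd, luck_sd)
print(f"observed SD {obs_sd:.4f}, luck SD {luck_sd:.4f}, implied skill SD {skill_sd:.4f}")
```

The implied skill SD lands near the .015 that was built in, which is the sense in which the leftover variance measures "something other than luck". With real data, `real_results` would be replaced by actual pitcher seasons.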

What might that be? 20 years ago, we probably thought it was pitchers having better stuff some days, and not other days. But now, we know that's a fairly small effect, except (as "The Book" found) for inexperienced pitchers.

Up until a few days ago, we thought a lot of it might be pitchers having their stuff, but just getting unlucky and happening to throw bad pitches that day. But now, we seem to have evidence that that's not a big factor either. But even though it's not a *big* factor, it must have *some* effect. That's because we know that some pitches are easier to hit than others; there have been PITCHf/x studies that have shown, as expected, that pitches down the middle get hammered, and that certain levels of movement are easier or harder to hit than others. So it can't be that the pitches actually thrown make *no* difference.

But it does appear that the difference is small, at least compared to luck -- because, when we compare good games to bad games, the difference in the scatterplot of pitches is too small to notice!

What does that mean, in practical terms? It means that you shouldn't necessarily take out your starter just because he gives up a lot of runs, because he's likely just his usual self. That might be hard for some managers, when their ace gives up 7 runs in the first inning.

But managers already know that, perhaps. On "The Book" blog, Tangotiger found that A.J. Burnett threw almost as many pitches in his bad starts as in his good starts (101 vs. 105).

So what else do we learn? Well, from the experience of DIPS, it could turn out that Steiner has found a way to eliminate a lot of noise from a pitcher's record. If and when we can associate a firm run value with a specific pitch, based on type, speed, location, and spin, we might be able to spot those pitchers who were unlucky: who threw good pitches, but were hit hard anyway.

That might be a ways off: it might be that there are things about the individual pitcher that go beyond those measurements that PITCHf/x makes, and it could be that the "mistake" pitches get lost in the scatterplot. But, as a starting point, I think teams would have at least a bit of an edge applying Steiner's conclusions anyway.


Wednesday, December 16, 2009

The renowned neurosurgeon vs. Don Cherry

Dr. Charles Tator is a Toronto neurosurgeon who has treated many young people for concussions and spinal injuries. He's not just a doctor: he's an active medical research scientist with an impressive resume. He's also the founder of "Think First," a program that attempts to help prevent concussions and spinal injuries by promoting the use of safe practices and proper equipment.

On Saturday, at a conference in Regina, Dr. Tator argued that hockey is too aggressive. He said that players are taking too many hits to the head, and that concussions are more frequent than they should be. He said NHL players used to have "respect for their own safety and respect for the safety of their opponents," but not any more.

That probably wouldn't have made the news, except that Tator went on to blame Don Cherry for contributing to the problem. Cherry, a plain-spoken (and often controversial) commentator on "Hockey Night In Canada," advocates an aggressive style of hockey, and has created a series of "Rock 'Em Sock 'Em" videos that prominently feature NHL fights and spectacular checks.

"I think [Cherry] is a negative influence because he promotes aggressive hockey," Tator said.

I'm on Don Cherry's side. I think Tator is out of line.

The reason is: this is not a medical question. The general effects of concussions and spinal injuries are not expert or specialized knowledge: we all understand what it means to have a concussion, or to be paralyzed. We read about these kinds of injuries in the sports pages all the time, about Tim Tebow, or Brett and Eric Lindros, or Kevin Everett, or any number of other players. We laymen understand that less aggressive play will lead to fewer injuries.

We laymen understand other parts of the issue as well. We know that we could eliminate hockey injuries completely if we eliminated hockey. If that's too extreme, we know we could eliminate a very large proportion of serious injuries by banning bodychecking, so that every NHL game would be as contact-free as the All-Star game. There are lots of other ways we could reduce injuries, too. We could make the puck a little softer -- that might have prevented Trent McCleary's frightening injury. We could force goalies to wear a Kevlar neck brace (which would have saved Clint Malarchuk from his near-death experience), limiting their mobility but increasing their safety.

That's not a question of medical expertise. It's a question of tradeoffs. It's a question of, how much injury are we willing to put up with while keeping the quality of game where we think it should be? Or, put another way: how much are we willing to lose in the quality of or interest in the game in order to save a certain number of injuries per year?

It would be nice if there were no tradeoff at all, and, indeed, some people might say that banning aggressive play will make the game *better*, not worse. But if that were the case, the NHL would have done it already, and there would be no controversy. There are already rules banning many dangerous practices, rules that have a very strong consensus of approval in and out of the game. In any case, even in the unlikely event that Tator believes that there is absolutely no cost to implementing his view of what should be allowed in the NHL, that's an opinion about hockey, not about medicine.

Which is my argument, in one sentence: the issue is about hockey, not about medicine. Is there any reason to believe that Dr. Tator's opinion about what's good for hockey is more valid than Don Cherry's? Absolutely not. Certainly, Dr. Tator might have expertise on what kind of hits cause what kind of injuries. But in terms of whether the tradeoff is desirable, and what the rules of hockey should be ... well, on that score, there is a strong argument to be made that Tator is biased, much more biased than Cherry.

If you're Dr. Tator, what is your experience with spinal injuries? Pretty direct. You see many, many victims of sports injuries, some of whom are very badly hurt. Every day, you see their despair and their pain, and you identify with them, and try to help them as best you can. Sometimes, there's nothing you can do. You, and the victim, constantly reflect on what could have been. If only the opposition had been a little more careful; if only the game hadn't gotten just a little too chippy in the third period; if only the players had played in a non-contact league. Then, everything would have been fine.

It's easy to understand why victims and doctors are so concerned with averting as many future injuries as possible, and why some of them, like Tator, start organizations to promote education and prevention. Every day they see the costs, intimately and emotionally. We, the fans, do not. We may feel bad for Kevin Everett and Eric Lindros, but we quickly forget and move on. Dr. Tator cannot do so quite as easily.

But what about the benefits? Dr. Tator doesn't see them nearly as much as he sees the costs. There are literally millions of North Americans who play hockey, without incident. There are millions of us who watch hockey, and many of us see aggressive play as one of the fundamental characteristics of what makes the game great. Suppose reducing body contact will save 20 injuries a year, but reduce fan interest by 5%. Do the benefits outweigh the risks? Dr. Tator will have to deal with some of the 20 injuries, but won't be one of the 5% who lose a little bit of interest in the game.

I don't play ice hockey, but I play ball hockey four times a week. Two weeks ago, one of our players got hit in the hand by a stick. His right index finger was broken in three places; he'll be unable to use it for a month. If you're a doctor, an indexfingerologist, and you see three of these broken fingers a day, wouldn't it be easy for you to get the idea that floor hockey is dangerous, and should be banned -- or at least that everyone should have to wear gloves? But we, the participants, just kept playing. We understand there's a risk of getting our finger broken, or worse, and we accept that risk. Floor hockey is fun, and the risk seems quite reasonable compared to the benefits. My life would be a lot less interesting without ball hockey -- it's my main source of recreation and exercise, and a good part of my social life. If the doctor sees only the thousand broken fingers, but not the millions of happy players who get all these benefits, while knowing about the danger and being willing to put up with it, isn't it the doctor who's seeing things the wrong way?

I recall a few years back, there was another chapter in the ongoing debate on whether motorcycle helmets should be compulsory. A doctor wrote something like: "if you don't think helmets should be mandatory, it's because you're not a doctor dealing with the victims every day. You should come to the emergency room and see how mangled these riders' heads are when their cranium hits the pavement. If you saw a few of these, you'd change your mind."

And that, frankly, is a bulls**t argument. You don't need to see the blood and guts to understand that the accident killed the victim, or turned him into a vegetable. You don't need to have a medical degree. You need to carefully weigh the risks and benefits, and come up with an argument. You need to study the issue. If helmets saved one life a year, it wouldn't be worth it: you could take all the money spent on helmets, use it to buy medical tests, and probably save hundreds of lives. On the other hand, if helmets saved a hundred thousand people a year, then, yeah, there's an argument for requiring them. But "blood and guts are disgusting and tragic, therefore helmets should be mandatory," is not a reasonable argument. It's an argument from a misguided person who thinks the fact that he treats the victims gives him a special moral insight into what risks society should tolerate and what risks it shouldn't.

To his credit, Dr. Tator doesn't make such an argument, but the idea is roughly the same. The argument has to be one of costs vs. benefits, and Dr. Tator's involvement with the victims, no matter how expert, charitable, and concerned, makes him likely to be *more* biased and *less* credible in analyzing the issue. Of course, he could make an argument, with numbers and logic, to show us that he has used valid analysis to overcome the possibility of bias. If he's done that (and the press didn't report that he did), I'm absolutely willing to look at it.

Now, you could argue that for the NHL, the fans are biased just as much, in the other direction. After all, in ball hockey, the injury is mine. In the NHL, the injury is a stranger's. As a fan, I get all the benefits, the entertainment value of fights and bodychecks, and it's the players who pay the price. If I'm willing to criticize the doctor for overemphasizing the costs, shouldn't I also criticize myself for *underestimating* the cost?

Well, yes and no. From an economic standpoint, we fans *are* bearing the costs: more enthusiastic fans create more revenue for the league, which means the players get paid more, and therefore compensated for the risks inherent in the kind of hockey we demand. On the other hand, Dr. Tator doesn't have to pay anything for his demand that hockey get less aggressive. He gets all the benefits, in terms of having to give bad news to fewer victims of spinal injuries, but pays a very small portion of the costs, being only one hockey fan out of millions (and perhaps not liking aggressive play in the first place).

But never mind that argument. Suppose we ignore that we fans are compensating the players for their preference, and we assume that we are so insulated from the reality of career-ending injuries that we're too biased against safety, and are demanding more than the "optimal" amount of hockey aggressiveness. Then we're biased one way, and Dr. Tator is biased the other way.

So it seems like, so far, everyone is biased. What we need is an opinion from someone not so far from the fans and the game, and someone not so far from the victims of injury. Someone who is intimately familiar with both the costs and the benefits.

That's Don Cherry, isn't it? Cherry may have strong (and sometimes controversial) political views, he is often politically incorrect, and it seems to me that he's disliked by many who don't like his blunt style and uneducated way of speaking. But in my (admittedly untested) opinion, he is one of the foremost experts on NHL hockey anywhere. He has a huge fan following, and, more importantly, inordinate respect from the players. He is not in favor of recklessness on the ice; he's spoken out many times against aspects of hockey he thinks are dangerous. For years, he's waged a campaign in favor of "no-touch" icing. Every year, he shows a video, quite unpleasant to watch, of injuries incurred by players chasing each other after an iced puck, and rants against the stupidity of the league for not changing the rule. He's been active in injury prevention in youth hockey, promoting the "STOP" program to help prevent hits from behind. I don't follow Cherry as much as some others, but I have never heard him condone dangerous play.

Cherry has been an NHL coach, and he's active in the league's social circles. He's certainly seen many of his player friends and acquaintances felled by injury, unlike most of us fans, which means that, like Dr. Tator, he has first-hand experience with the costs of aggressive play. But he knows the game, and he knows, intimately, the risks involved. He intuitively knows what types of "aggressive" play are risky, and which ones are not. He has opinions, probably as good as anyone's, with credentials as good as anyone's, on what types of aggression are good for the game, and which ones are not.

Intuition is no substitute for a well-formed argument, backed by evidence and logical argument and measurement of costs and benefits and risks. But if you asked me whose gut argument I would want to hear first, it would be Don Cherry's. It doesn't matter if you're one of Canada's foremost experts in spinal injury treatment and prevention, because the question is, what's the proper balance between aggressiveness and risk? That's a question of opinion, not of science.

Dr. Tator may well be right, that hockey is too aggressive and therefore too dangerous. But until he comes up with numbers and arguments and a way to measure the tradeoffs, I am less inclined, not more, to take his word for it simply on account of his profession.


Sunday, December 13, 2009

Did Tim Donaghy really win 70% of his bets against the spread?

According to disgraced NBA referee Tim Donaghy’s new book, Donaghy won 70 to 80 percent of his NBA bets. (The link is to an excellent article by TrueHoop's Henry Abbott, which I recommend highly.)

70 to 80 percent is huge: these were bets against the spread, so you’d have expected that Donaghy's winning percentage would only be around 50 percent, unless he had some kind of edge.

What was his edge? Did he fix the outcomes of games with biased refereeing? Donaghy says no: the way he won so many games was by knowing which *other* referees were biased. Not corruptly biased, mind you, sometimes just unconsciously biased. He says,

"I listened to the directives from the NBA office, I considered the vendettas and grudges referees had against certain players or coaches, and I focused in on the special relationships that routinely influenced the action on the court. Throw in some quirks and predictable tendencies of veteran referees and the recipe was complete. All I had to do was call it in and let the law of averages take over. During the regular season, I was right on the money seven out of 10 times. There was even a streak when I simply couldn’t miss, picking 15 winners out of 16 games. No one on the planet could be that lucky. Of course, luck had little to do with it."

Does that make sense? I don't think it does. I don't think that even perfect knowledge of the tendencies of referees can get you a winning percentage of .700.

According to basketball researcher Wayne Winston, a reasonable Pythagorean exponent for basketball is 14. (That's from his book "Mathletics," which I've been planning to review for a while now; I'll do it this week, I swear.) To increase your winning percentage from .500 to .700, then, requires scoring about 6% more points than you allow, or about 6 points in a 100-point game.
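The 6% figure falls out of inverting the Pythagorean formula: if W = PS^k / (PS^k + PA^k), then PS/PA = (W / (1 - W))^(1/k). A small sketch, assuming the exponent of 14 from the text and a hypothetical game in which the two teams combine for 200 points:

```python
def required_scoring_ratio(win_pct, exponent):
    """Ratio of points scored to points allowed that the Pythagorean formula
    W% = PS^k / (PS^k + PA^k) maps to the given winning percentage."""
    return (win_pct / (1 - win_pct)) ** (1 / exponent)

EXPONENT = 14        # Winston's basketball exponent, per the text
TOTAL_POINTS = 200   # hypothetical game in which each side averages 100

ratio = required_scoring_ratio(0.700, EXPONENT)
# Split the total so that scored / allowed == ratio, then take the margin.
scored = TOTAL_POINTS * ratio / (1 + ratio)
allowed = TOTAL_POINTS - scored
margin = scored - allowed
print(f"scoring ratio {ratio:.3f}, margin about {margin:.1f} points")
```

The ratio comes out to about 1.062, which is a margin of roughly six points a game.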

Six points is huge. It's twice what home-court advantage is worth. If you assume that teams score an average of one point per possession when they're not fouled, but 1.5 points per two-shot foul, it would take 24 extra foul shots (12 fouls) in a game to account for six points.

Teams shoot about 25 foul shots a game on average. 24 extra foul shots would basically double the number of foul calls per game. If you assume that each of the three referees calls 8 foul shots, one biased referee would have to call *four times as many* foul shots for one team to raise its winning percentage to .700.
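The foul-shot arithmetic, written out step by step. All the inputs are the text's own round numbers (one point per ordinary possession, 1.5 points per two-shot foul, 8 foul shots per referee per team):

```python
POINTS_NEEDED = 6.0            # extra margin needed to go from .500 to .700
POINTS_PER_POSSESSION = 1.0    # ordinary possession, no foul
POINTS_PER_TWO_SHOT_FOUL = 1.5
SHOTS_PER_REF = 8              # each referee's share of ~25 foul shots per team

# A two-shot foul only gains the difference over an ordinary possession.
gain_per_foul = POINTS_PER_TWO_SHOT_FOUL - POINTS_PER_POSSESSION
extra_fouls = POINTS_NEEDED / gain_per_foul
extra_shots = 2 * extra_fouls

# If one referee supplies all the extra shots, compare with his normal load.
biased_ref_multiple = (SHOTS_PER_REF + extra_shots) / SHOTS_PER_REF
print(extra_fouls, extra_shots, biased_ref_multiple)
```

That works out to 12 extra fouls, 24 extra shots, and a biased referee calling four times his normal number.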

Of course, the biased ref could also *refrain* from calling foul shots for the other team ... but, with only 8 shots called per game per ref, there's a natural limit to what you can *not* call.

It just doesn't seem at all plausible that one "vendetta" or "special relationship" could have such a large effect. Especially considering that the examples Donaghy provides are pretty weak. For instance:

"Referee Joe Crawford had a grandson who idolized [Allen] Iverson," writes Donaghy. "I once saw Crawford bring the boy out of the stands and onto the floor during warm-ups to meet the superstar. Iverson and Crawford’s grandson were standing there, shaking hands, smiling, talking about all kinds of things. If Joe Crawford was on the court, I was pretty sure Iverson’s team would win or at least cover the spread."

Doesn't that sound pretty much impossible? First, could any professional referee call an extra 24 foul shots a game for Allen Iverson without drawing some kind of attention? And, second, isn't it completely implausible that anyone could rise to the ranks of NBA referee with judgment so bad that he would be *that biased* in favor of his grandson's idol?

Anyway, we don't have to take Donaghy's word for it: we can check the records. Actually, ESPN's Abbott already checked. It turns out that Donaghy's accusations are completely false. With Crawford refereeing, Iverson's teams went 5-9 against the spread. That's .357, not .700.

Abbott checks a few of Donaghy's other claims, and references others who have done similar checks. The bottom line: absolutely no evidence of any bias at all, much less the kind of bias that would let you pick 70% winners.

That leaves at least four possibilities:

1. Donaghy was not specific enough in describing the circumstances in which he knew he had a .700 chance of winning. Maybe, for instance, that only happened when Joe Crawford's grandson was actually at the game, rather than watching at home.

2. There was other information Donaghy used to make his picks, not just his knowledge of referee bias. As he said: "There were other factors that came into play. Inside information about injuries. Home game or away game. Home crowd. Many more factors to take into consideration."

3. Donaghy himself rigged the games in order to win his bets.

4. Donaghy didn't actually win 70% of his bets.

Number 1 still doesn't seem very plausible: as I argued, it's very hard for a referee to make a team lose 20% of the games it otherwise would have won, and make it look natural. This is especially the case if the grudge the referee holds is against a player, and not a team: can anyone really cause Allen Iverson to lose 6 points a game, without making it obvious?

Number 2 is implausible too: there are thousands of bookies and gamblers analyzing basketball much more thoroughly than Donaghy did. If most of the information he claims to have used was public, the betting line would have adjusted for those factors already.

Number 3 is implausible, for similar reasons. Actually, it's a bit more plausible than number 1, because, for one thing, Donaghy would care about the team, not any individual player, so he could spread out his biased calls. Secondly, he could concentrate his fixes in close games, so that it may not take 6 points a game, but perhaps only 1 or 2 points in a game that's tied in the last minute.

But, to me, Number 4 is the most plausible. It does require you to assume that Donaghy and the FBI are incorrect about the results of the wagers: but if Donaghy can be so wrong about the results of his strategies (which can be verified), why can't he also be wrong about the results of his bets (which cannot)?

Anyway, there would be easier ways to figure this stuff out, if there were a list of games that Donaghy bet on. Just check those games, and the betting lines, and see if there was anything unusual there, either by Donaghy himself, or the other referees at the game. But, according to the article, there is no such list. It seems like the NBA and FBI are taking Donaghy's word for how big his bets were ($2000 each) and how many he made (more than 125 over four seasons).

In that case, isn't it more plausible that the $100,000 Donaghy made came from sources other than winning 70% of his $2,000 bets? Maybe he won the lottery, or he was peddling confidential information about Tiger Woods, or he was selling illegal MRIs to Canadian patients with sore knees. Any of those seem more plausible than being able to go .700 against the spread.
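The reported numbers do hang together arithmetically. A rough sketch, assuming even-money payouts and ignoring the bookmaker's commission (which would reduce the total somewhat): the expected profit is the number of bets times the stake times the margin of wins over losses.

```python
def expected_net(n_bets, stake, win_pct):
    """Expected profit from even-money bets: each win pays +stake and each
    loss costs -stake. Ignores the bookmaker's commission (vig)."""
    return n_bets * stake * (win_pct - (1 - win_pct))

# Figures from the text: about 125 bets of $2,000 at a .700 winning percentage.
net_700 = expected_net(125, 2000, 0.700)
net_500 = expected_net(125, 2000, 0.500)
print(f"at .700: ${net_700:,.0f}   at .500: ${net_500:,.0f}")
```

So $100,000 is exactly what a .700 record on those bets would produce, which is why the claimed winnings alone can't distinguish between the explanations.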

In any case, I'd be willing to put Donaghy to the test. He says he can pick 70% winners. I think he can pick 50% winners. Let's set the bar at 60%. I'm willing to bet him even money that he can't go better than .600 in his choice of 30 NBA games this season.
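How fair is that bet? A sketch, assuming the 30 picks are independent and each is won with a fixed probability, computes the chance of finishing better than .600 (19 or more wins out of 30) for a .500 picker and for a genuine .700 picker:

```python
from math import comb

def prob_more_than(k, n, p):
    """P(more than k wins in n independent picks, each won with probability p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1, n + 1))

GAMES = 30
THRESHOLD = 18  # .600 of 30 games; "better than .600" means 19 or more wins

coin_flipper = prob_more_than(THRESHOLD, GAMES, 0.5)
seventy_pct = prob_more_than(THRESHOLD, GAMES, 0.7)
print(f".500 picker clears .600: {coin_flipper:.3f}")
print(f".700 picker clears .600: {seventy_pct:.3f}")
```

A .500 picker gets past .600 only about one time in ten, while a true .700 picker would do it most of the time, so at even money the bet would separate the two claims reasonably well.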


Thursday, December 10, 2009

The Bradbury aging study, re-explained (Part III)

Last week, J.C. Bradbury posted a response to my previous posts on his aging study.

Before I reply, I should say that I found a small error in my attempt to reproduce Bradbury’s regression. The conclusions are unaffected. Details are in small print below, if you're interested. If not, skip on by.

As it turns out, when I was computing each hitter’s age to include in the regression, I accidentally switched the month and day. (Apparently, that wasn’t a problem when the reversed date was invalid: Visual Basic was smart enough to figure out that when I said 20/5 instead of 5/20, I meant the 20th day of May and not the 5th day of Schmidtember. But when the reversed date was valid, like 2/3 instead of 3/2, it used the incorrect date.)
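The failure mode is easy to reproduce with any parser that quietly falls back to another field order. This is a Python imitation, not Visual Basic's actual rules: `lenient_month_first` is a hypothetical helper that reads month/day first and retries as day/month only when the first reading is impossible, and the 1975 birth year is arbitrary.

```python
from datetime import datetime

def lenient_month_first(text):
    """Hypothetical stand-in for a forgiving parser: read the date as
    month/day/year, and only if that is impossible, retry as day/month/year."""
    try:
        return datetime.strptime(text, "%m/%d/%Y")
    except ValueError:
        return datetime.strptime(text, "%d/%m/%Y")

# Birth dates accidentally written day-first (the year is arbitrary).
may_20 = lenient_month_first("20/5/1975")        # month 20 is impossible, so the retry rescues it
meant_march_2 = lenient_month_first("2/3/1975")  # valid as February 3, so the mistake slips through
print(may_20.date(), meant_march_2.date())
```

The first date comes back as May 20, as intended; the second comes back as February 3 instead of March 2, with no error raised, which is exactly the kind of silent shift that can push a player across an age cutoff.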

That means that some ages were wrong, and some seasons from 24-year-olds were left out of my study. I reran a corrected regression, and the results were very, very similar – all three peak ages I’ve recalculated so far were within .08 years of the original. So the conclusions still hold. If you’re interested in the (slightly) revised numbers, let me know and I’ll post them when I’m done rerunning everything.

Okay, now to Bradbury’s criticisms. I’ll concentrate on the most important ones, since a lot of this stuff has been discussed already.


First, there’s one point on which I agree with Bradbury’s critique. He writes,

" … the model, as he defines it, is impossible to estimate. He cannot have done what he claims to have done. Including the mean career performance and player dummies creates linear dependence as a player’s career performance does not change over time, which means separate coefficients cannot be calculated for both the dummies and career performance. … Something is going on here, but I’m not sure what it is."

He’s right: having both the player dummies and the career mean causes collinearity, which I eliminated by getting rid of one of the player dummies. I agree with him that the results aren’t meaningful this way. I should have eliminated the mean and gone with the dummies alone.
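Here's a small way to see the collinearity, using a made-up design matrix (the players, ages and career means are all hypothetical). Each row is a player-season with columns for age, one dummy per player, and the player's career mean. Because the career-mean column is just a weighted sum of the dummy columns, the matrix loses a rank. Dropping one dummy restores full rank.

```python
from fractions import Fraction

# Hypothetical toy data: player -> (career mean of some rate stat, ages of his qualifying seasons).
PLAYERS = {
    "A": (0.110, [24, 25, 26]),
    "B": (0.095, [27, 28]),
    "C": (0.120, [25, 26, 27, 28]),
}

def design(players, drop_first_dummy=False):
    """One row per player-season: [age, player dummies..., career mean]."""
    names = list(players)
    kept = names[1:] if drop_first_dummy else names
    rows = []
    for name, (career_mean, ages) in players.items():
        for age in ages:
            dummies = [1 if name == k else 0 for k in kept]
            rows.append([age] + dummies + [career_mean])
    return rows

def rank(rows):
    """Matrix rank via exact Gauss-Jordan elimination on Fractions (no rounding error)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column is a combination of earlier columns
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

full = design(PLAYERS)
reduced = design(PLAYERS, drop_first_dummy=True)
print(len(full[0]), "columns, rank", rank(full))        # 5 columns, rank 4
print(len(reduced[0]), "columns, rank", rank(reduced))  # 4 columns, rank 4
```

Full rank after dropping a dummy doesn't make the coefficients interpretable, though. The career-mean coefficient ends up standing in for the dropped player's intercept, which is why the dummies-only (or mean-only) version is the cleaner choice.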

In any case, it doesn’t matter much: the results are similar with and without the dummies. I used the dummies because they made the results more sensible, and more consistent with what Bradbury found. Without the dummies, some of the aging curves were very, very flat; with them, the curves were closer to Bradbury’s.

In retrospect, the reason the curves make more sense with the larger model is that the dummies effectively eliminate any player with only one season in the data: that player’s dummy simply absorbs his single observation, fitting him to whatever curve best fits the other, multi-season, players.
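A sketch of why that happens, on hypothetical data and with a single linear age term for simplicity (the same argument applies to quadratic age terms). With one dummy per player, the least-squares age slope equals the "within" estimate: demean age and performance inside each player, then pool. A one-season player's demeaned values are all zero, so he adds nothing to the fit.

```python
# Hypothetical toy data: player -> list of (age, rate stat) seasons.
WITH_ONE_SEASON_PLAYER = {
    "A": [(24, 0.300), (25, 0.310), (26, 0.315)],
    "B": [(28, 0.330), (29, 0.325), (30, 0.320)],
    "C": [(23, 0.250)],  # only one season
}
WITHOUT = {k: v for k, v in WITH_ONE_SEASON_PLAYER.items() if k != "C"}

def within_slope(players):
    """OLS age slope for y = a_player + b * age, with one intercept (dummy) per player.

    Equivalent to demeaning age and y within each player and pooling
    (the fixed-effects, or 'within', estimator).
    """
    num = den = 0.0
    for seasons in players.values():
        ages = [age for age, _ in seasons]
        ys = [y for _, y in seasons]
        mean_age = sum(ages) / len(ages)
        mean_y = sum(ys) / len(ys)
        for age, y in seasons:
            num += (age - mean_age) * (y - mean_y)
            den += (age - mean_age) ** 2
    return num / den

print(within_slope(WITH_ONE_SEASON_PLAYER) == within_slope(WITHOUT))  # True
```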

Regardless, the peak age is similar either way. But Bradbury’s point is well-taken.


Secondly, Bradbury disagrees with me that players are weighted by the number of seasons they played:

"His belief is based on a misunderstanding of how least-squares generates the estimates to calculate the peak. There is no average calculated from each player, and especially not from counting multiple observations for players who play more."

It’s possible I’m misunderstanding something, but I don’t think I am. The model specifies one row in the regression for each qualifying player-season (that is, for a player with enough PA and seasons). If player A has a 12-year career that peaks at 30, and player B has a 6-year career that peaks at 27, then player A’s trajectory is represented by 12 rows in the regression matrix, and player B’s trajectory by 6 rows.

Bradbury would argue that the scenario above would result in a peak around 28.5 (the simple average of the two players). I would argue that the peak would be around 29 (player A weighted twice as heavily as player B). I suppose I could do a little experiment to check that, but that’s how it seems to me.
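The two views of weighting can be put in numbers. This is only a crude proxy: a regression doesn't literally average the players' peaks. But it shows the difference between counting each player once and counting each player-season once.

```python
# The hypothetical two-player scenario from the paragraph above.
players = [
    {"name": "A", "peak_age": 30, "seasons": 12},
    {"name": "B", "peak_age": 27, "seasons": 6},
]

# One vote per player.
simple = sum(p["peak_age"] for p in players) / len(players)

# One vote per row (player-season) in the regression matrix.
by_rows = sum(p["peak_age"] * p["seasons"] for p in players) / sum(p["seasons"] for p in players)

print(simple, by_rows)  # 28.5 29.0
```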


Thirdly, Bradbury says I misunderstood that he used rate statistics for home runs, not actual numbers of home runs:

"I’m estimating home-run rates, not raw home runs. All other stats are estimated as rates except linear weights. This is stated in the paper."

Right, that’s true, but that wasn’t my point. I was probably unclear in my original.

What I was trying to say was: the model assumes that all players’ HR rates rise and fall by the same fixed amount, regardless of where they started.

So, suppose Bradbury’s equation says that players drop by .01 home run per PA (or AB) the year after age X. (That’s 6 HR per 600 PA.) That equation does NOT depend on how good a home run hitter that player was before. That is: it predicts that Barry Bonds will drop by 6 HR per 600 PA, but, also, Juan Pierre will drop by 6 HR per 600 PA.

As I pointed out, that doesn’t really make sense, because Juan Pierre never hit 6 HR per 600PA in the first place, much less late in his career! The model thus predicts that he will drop to a *negative* home run rate.
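To make the contrast concrete, here's a sketch with illustrative rates. The slugger and slap-hitter figures and the composite rate are assumptions, not anyone's actual numbers. The fixed-subtraction model takes the same .01 HR/PA from everyone. The proportional alternative, shown only for contrast and not as Bradbury's model, takes the same *percentage* that the composite player loses.

```python
# Illustrative numbers only; these are assumptions, not anyone's actual career rates.
DROP_PER_PA = 0.01     # the example's fixed drop: .01 HR per PA, i.e. 6 HR per 600 PA
COMPOSITE_RATE = 0.03  # hypothetical HR/PA of the sample's composite player

def additive_decline(rate):
    """Fixed-subtraction model: every hitter loses the same .01 HR/PA."""
    return rate - DROP_PER_PA

def proportional_decline(rate):
    """Alternative, for contrast: every hitter loses the composite's percentage."""
    return rate * (1 - DROP_PER_PA / COMPOSITE_RATE)

for name, rate in {"slugger": 0.070, "slap hitter": 0.003}.items():
    print(f"{name:12s} before {rate * 600:5.1f}  additive {additive_decline(rate) * 600:5.1f}"
          f"  proportional {proportional_decline(rate) * 600:5.1f}  (HR per 600 PA)")
```

Both models agree for the composite player. They diverge for hitters far from the composite, and only the additive one can go negative.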

I continue to argue that while the curve might make sense for the *composite* player in Bradbury’s sample, it doesn’t make sense for non-average players like Bonds or Pierre. That might be lost on readers who look at Bradbury’s chart and see the decline from aging expressed as a *percentage* of the peak, rather than a subtraction from the peak.


Finally, and most importantly, one of Bradbury’s examples illustrates my main criticism of the method. Bradbury cites Marcus Giles. Giles’s best seasons were at age 25 to 27, but he declined steeply and was out of the league by 30. Bradbury:

"What caused Giles to decline? Maybe he had some good luck early on, maybe his performance-enhancing drugs were taken away, or possibly several bizarre injuries took their toll on his body. It’s not really relevant, but I think of Giles’s career as quite odd, and I imagine that many players who play between 3,000 — 5,000 plate appearances (or less) have similar declines in their performances that cause them to leave the league. I’ve never heard anyone argue that what happened to Giles was aging."

Bradbury’s argument is a bit of a circular one. It goes something like:

-- The regression method shows a peak age of 29.
-- Marcus Giles didn’t peak at 29 – indeed, he was out of the league at 29.
-- Therefore, his decline couldn’t have been due to aging!

I don’t understand why Bradbury would assume that Giles’ decline wasn’t due to aging. If the decline came at, say, 35 instead of 28, there would be no reason to suspect injuries or PEDs as the cause of the decline. So why couldn’t Giles just be an early ager? Why can’t different players age at different rates? Why is a peak age of 25, instead of 29, so implausible that you don’t include it in the study?

It’s like … suppose you want to find the average age when a person gets so old they have to go to a nursing home. And suppose you look only at people who were still alive at age 100. Well, obviously, they’re going to have gone to a nursing home late in life, right? Hardly anyone is sick enough to need a nursing home at 60, but then healthy enough to survive in the nursing home for 40 years. So you might find that the average 100-year-old went into a nursing home at 93.

But that way of looking at it doesn't make sense: you and I both know that the average person who goes into a nursing home is a lot younger than 93.

But what Bradbury is saying is, "well, those people who went into a nursing home at age 65 and died at 70 … they must have been very ill to need a nursing home at 65. So they’re not relevant to my study, because they didn’t go in because of aging – they went in because of illness. And I’m not studying illness, I’m studying aging."

That one difference between us is pretty much my main argument against the findings of the study. I say that if you omit players like Giles, who peaked early, then *of course* you’re going to come up with a higher peak age!
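A toy simulation shows the selection effect. Everything in the model is an assumption chosen for illustration. Peak ages are drawn from a normal distribution, and every player leaves the league exactly six years after his peak. Keeping only the players still active at 35 raises the average peak, even though nothing about how anyone ages has changed.

```python
import random
from statistics import mean

rng = random.Random(2009)  # fixed seed so the run is repeatable

# Hypothetical assumptions, for illustration only:
#   * each player's true peak age ~ Normal(27, 2.5);
#   * every player plays until six years after his peak, then leaves the league.
PEAK_MEAN, PEAK_SD = 27.0, 2.5
YEARS_AFTER_PEAK = 6
SURVIVAL_AGE = 35  # selection rule: keep only players still active at this age

peaks = [rng.gauss(PEAK_MEAN, PEAK_SD) for _ in range(10_000)]
survivors = [p for p in peaks if p + YEARS_AFTER_PEAK >= SURVIVAL_AGE]

print(f"all players:           mean peak {mean(peaks):.1f}")
print(f"still active at {SURVIVAL_AGE}:    mean peak {mean(survivors):.1f}"
      f"  ({len(survivors)} of {len(peaks)})")
```

The survivors' higher peak is produced entirely by the filter, which is the nursing-home problem in miniature.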

Bradbury, on the other hand, thinks that if you include players like Giles, you’re biasing the sample too low, because it’s obvious that players who come and go young aren’t actually showing "aging" as he defines it. But, first, I don’t think it’s obvious, and, second, if you do that, you’re no longer able to use your results to predict the future of a 26-year-old player. Because, after all, he could turn out to be a Marcus Giles, and your study ignores that possibility!

All you can tell a GM is, "well, if the guy turns out not to be a Marcus Giles, and he doesn’t lose his skill at age 31 or 33 or 34, and he turns out to play in the major leagues until age 35, you’ll find, in retrospect, that he was at his peak at age 29." That’s something, but … so what?

I’m certainly willing to agree that if you look at players who were still "alive" in MLB at age 35, and played for at least 10 years, then, in retrospect, those players peaked at around 29. And I think Bradbury’s method does indeed show that. But if you look at *all* players, not just the ones who aged most gracefully, you’ll find the peak is a lot lower. There are a lot of people in nursing homes at age 70, even if Bradbury doesn't consider that to be the result of “aging.”


Sunday, December 06, 2009

Bloomberg enters the baseball analysis market

Bloomberg, the company whose software gives investors sophisticated real-time information on stock and financial markets, now has software that gives GMs baseball information.

The New York Times article describing the new system is sketchy in explaining what kind of information will be provided, but my impression is that the breakthrough is in ease of use, rather than sabermetric sophistication:

"The challenge for Bloomberg is to create software that is better, faster and more visually useful than what rivals offer to help develop players and predict their performances. A demonstration of Bloomberg’s software showed dazzlingly colorful graphics and an easy way to plot statistics and compare players in complex combinations."

Not that there's anything wrong with ease of use ... it's not a lot of fun to calculate player values yourself, or even to get your team of programmers to do it, if there's something available off the shelf.

But at the same time, there's a hint that the software will adjust for park effects, and maybe even do simulations:

"For Jeff Wilpon, the chief operating officer of the Mets, the value in the software will be in evaluating free agents.

"If you take X player on another team who’s around a great cast of players," he said, "we want to look at him in our ballpark with different players around him to see how he will fit in."

In addition, it'll include PitchF/X data:

"What looks impressive are highly visual pitch charts that can be summoned for any particular period, with parameters including arm angles that can, based on diminishing performance, suggest physical injury."

But Bloomberg also makes it sound like a friendly database query engine -- in effect, a version of Baseball Reference's "Play Index":

"It’s one thing to say, I want to see how various players hit home runs over the years," said Bill Squadron, who is managing the product introduction. "But it’s another to say, I want to see home runs, on-base percentage, pitches per plate appearance, take it all together and look at 10 guys who exceed a certain level."

Maybe it's all these things together.

It would be interesting to know what kind of sabermetric analysis is included in the system ... will free-agent evaluation include a version of WAR? Will it include estimates of dollars per win? Will Bloomberg have evaluated the various run estimators and chosen the best one? Will the Bloomberg algorithms become "conventional wisdom?" If so, will some teams be able to gain an advantage by exploiting flaws in Bloomberg's analysis?

My guess is that if teams start using this system, and it does include some of the newer developments, we'll know about it because team management will start internalizing it. It's easy to ignore analysis from bloggers, but harder to ignore analysis from an expensive and sophisticated system from a respected name like Bloomberg, especially when the owners have spent thousands of dollars to provide it for you.
