Sunday, July 22, 2007

Are MLB player skills normally distributed?

There's a paper in the most recent JQAS (Journal of Quantitative Analysis in Sports) on evaluating outfielder and catcher throwing arms. It's called "Evaluating Throwing Ability in Baseball," by Matthew Carruth and Shane Jensen.

I'm still going through it, but the first thing that struck me was that the authors assumed that throwing skills in the baseball population are normally distributed.

That seemed to be wrong. In the 1985 Abstract (Blue Jays comment, page 113), Bill James argued, convincingly, that major-league player skills should be shaped not like the normal bell curve, but like the right tail of that curve. The general population is shaped like the normal distribution, but only the very best players make the majors. Those are the ones at the extreme right tail, which doesn't have a bell-curve shape at all.

So shouldn't outfielder and catcher arms also be shaped like the right tail? If they are, the Carruth/Jensen study shouldn't work at all.

After thinking about it a bit, I wonder whether individual skills might be close to normal after all.

Suppose a player's overall talent was the sum of 1,000 independent skills, each of which is normally distributed in the population. Wouldn't each of those skills be almost normally distributed in MLB? Consider skill number 502. If you were average in skill 502, it would barely affect your chance of making the majors – you'd have 999 other skills to be good at. And so if you took an MLB-wide profile of skill 502, it would barely be different from the general population. The same, of course, would be true for each of the other skills. Each skill would look very close to normal, but the *sum* of all skills would be shaped like Bill James' right tail.

Of course, the example of 1,000 different skills is far-fetched. But, it turns out, the "right tail" disappears with far fewer than 1,000 skills. It also goes away under a much more realistic model of baseball skills.
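One way to see why the tail fades, sketched here with a hypothetical helper (`one_skill_after_selection` is my name, not something from the post): if overall talent is the plain sum of k independent, equally weighted standard-normal skills, each skill correlates with the total at 1/sqrt(k) and can be written as that correlation times the standardized total, plus independent normal noise. Selecting only players whose total clears a cutoff turns the first part into a truncated normal but leaves the noise part bell-shaped. The equal weights and the top-0.01% cutoff below are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal, mean 0, SD 1


def one_skill_after_selection(k: int, top_fraction: float) -> tuple[float, float]:
    """Mean and SD (in population-SD units) of one skill among players whose
    equally weighted sum of k independent N(0,1) skills is in the top
    `top_fraction` of the population. Illustrative assumptions only."""
    z0 = STD.inv_cdf(1 - top_fraction)        # cutoff on the standardized total
    lam = STD.pdf(z0) / (1 - STD.cdf(z0))     # mean of N(0,1) truncated below at z0
    trunc_var = 1 + z0 * lam - lam ** 2       # variance of that truncated normal
    rho = 1 / sqrt(k)                         # correlation of one skill with the total
    # skill = rho * total_z + sqrt(1 - rho^2) * noise; selection only truncates total_z
    mean = rho * lam
    sd = sqrt(rho ** 2 * trunc_var + (1 - rho ** 2))
    return mean, sd


for k in (1, 3, 10, 1000):
    mean, sd = one_skill_after_selection(k, top_fraction=1e-4)
    print(f"k={k:5d}: mean shift = {mean:+.2f} SD, spread = {sd:.2f} SD")
```

With k = 1 the selected skill is the tail itself (mean shifted by roughly 4 SDs, spread under a quarter of an SD); with k = 1000 the mean moves by only about an eighth of an SD and the spread is essentially unchanged.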

Suppose there are three basic skills: hitting, range, and throwing. All three are normally distributed in the general population with mean 100 and SD 16 (which I think puts them on the same scale as IQ). All are independent. A player's overall value is the sum of 70% of his hitting score, 20% of range, and 10% of arm. Any player with an overall score of 146 or more makes the majors.

Under those conditions, you'd still expect the MLB distribution of overall score to be shaped like the right tail of the normal distribution. But what about the three individual skills? Will they be shaped like the right tail, or like a bell? Batting, being 70% of the total, will probably be tail-shaped. But what about range? And arm?

To check, I ran a simulation. I created a general population of 2,000,000 baseball players (I would have used more, but the VB random number generator apparently started repeating). Of those two million, only 87 players made the majors.
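Because a weighted sum of independent normal variables is itself normal, the expected number of major leaguers in this model can also be worked out directly, without simulation. The sketch below assumes exactly the parameters described above (the weights sum to 1, so the overall mean stays at 100) and uses Python's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

MEAN, SD = 100.0, 16.0
WEIGHTS = (0.7, 0.2, 0.1)  # hitting, range, arm
CUTOFF = 146.0
POPULATION = 2_000_000

# Weighted sum of independent normals: mean 100 (weights sum to 1),
# SD = 16 * sqrt(sum of squared weights).
overall_sd = SD * sqrt(sum(w * w for w in WEIGHTS))
z_cutoff = (CUTOFF - MEAN) / overall_sd
share = 1 - NormalDist().cdf(z_cutoff)  # fraction of the population above the cutoff
expected_players = POPULATION * share

print(f"overall SD = {overall_sd:.2f}, cutoff = {z_cutoff:.2f} SD above the mean")
print(f"expected major leaguers = {expected_players:.0f}")
```

This puts the overall SD at about 11.8 and the cutoff about 3.9 SDs above the mean, which works out to roughly 91 expected major leaguers out of two million, so the 87 in the simulation is about what the model predicts.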

I then plotted graphs for those 87 players. As you would expect – and as the simulation forced – the overall rating of those 87 players looked like the right tail. 29 players scored at 146. Only 18 scored at 147. 12 scored at 148. 9 were at 149, 11 at 150, and then they trickled off, so that only 8 were between 151 and 161. Definitely a right-tail picture.
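For anyone who wants to reproduce the simulation, here is a minimal Python sketch of the same model. It is not the original VB code: it assumes Python's `random.Random` (a Mersenne Twister, whose period is far too long to repeat over a few million draws), and it compares real-valued ratings directly against the 146 cutoff rather than rounding them to whole numbers. The function name `simulate` is my own.

```python
import random

# Parameters from the model described above.
MEAN, SD = 100, 16
WEIGHTS = (0.7, 0.2, 0.1)  # hitting, range, arm
CUTOFF = 146


def simulate(population: int, seed: int = 0) -> list[tuple[float, float, float, float]]:
    """Draw `population` players with independent N(100, 16) skills and return
    (hitting, range, arm, overall) for everyone at or above CUTOFF."""
    rng = random.Random(seed)
    major_leaguers = []
    for _ in range(population):
        skills = [rng.gauss(MEAN, SD) for _ in WEIGHTS]  # one draw per skill
        overall = sum(w * s for w, s in zip(WEIGHTS, skills))
        if overall >= CUTOFF:
            major_leaguers.append((*skills, overall))
    return major_leaguers
```

Calling `simulate(2_000_000)` takes a while in pure Python and should return something on the order of 90 players, whose ratings can then be histogrammed one skill at a time.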

That was the overall rating. So I then plotted just batting. And what happened? The distribution looked more like a normal curve! Even though batting comprised 70% of the "right tail" rating, that 70% was bell shaped!

(I'm not good enough with HTML to show the curve here, but I've posted it on my website. Take a look.)

The other two skills – range and arm – looked even "more" normal. They're shown in the above link also.

I didn't run any formal statistical tests for normality. Indeed, it's probably easy to show that none of the three skill distributions should be exactly normal. But they're *approximately* normal, pretty good bell shapes. A normal distribution has a "skewness" (a measure of asymmetry) of zero, and a "kurtosis" (a measure of peakedness and tail weight) of 3. Here are the stats for these four curves, with the normal benchmark first:

Normal : Skewness = +0.0, Kurtosis = 3.0

Overall: Skewness = +2.5, Kurtosis = 11.5
Batting: Skewness = +0.4, Kurtosis = 3.6
Range  : Skewness = -0.1, Kurtosis = 2.8
Arm    : Skewness = +0.1, Kurtosis = 2.4

The above numbers don't tell you anything the graphs don’t – they're just a numerical way of summing up the pictures. I have no idea if the last three are statistically significantly different from normal (0.0, 3.0), but they're pretty close in real-life terms.
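The two statistics in the table can be computed with plain moment formulas. The sketch below uses the population (uncorrected) moment definitions, under which a normal distribution has skewness 0 and kurtosis 3, the same convention the table uses; whether the original numbers included a small-sample correction isn't stated, so treat that as an assumption. `skewness_kurtosis` is a hypothetical helper name.

```python
from statistics import fmean


def skewness_kurtosis(values: list[float]) -> tuple[float, float]:
    """Moment-based skewness and (non-excess) kurtosis, no bias correction.

    A normal distribution gives skewness 0 and kurtosis 3.
    """
    n = len(values)
    m = fmean(values)
    m2 = sum((x - m) ** 2 for x in values) / n  # variance
    m3 = sum((x - m) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - m) ** 4 for x in values) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2
```

For example, `skewness_kurtosis([1, 2, 3, 4, 5])` gives skewness 0 and kurtosis 1.7 (flatter than a normal curve), while a right-skewed list such as `[0, 0, 0, 1]` gives positive skewness.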

I can't help but conclude that most individual player skills, so long as they're not overwhelmingly correlated with the player's overall value, could be pretty close to normally distributed.

A few caveats, though. First, I assumed that all three skills were independent of each other in the general population. In real life, that's obviously not true: good athletes will have both good range and good arms. The more correlated the skills, the more likely they'll be in line with overall value, which is right-tail shaped. So that might change things.

Second, I assumed that an arm of –2 SDs costs exactly twice as much as an arm of –1 SD. Again, that's not realistic: a minus-two player can be taken advantage of by baserunners, and can therefore cost his team three or four times as much as the minus-one guy. That might mean there are fewer –2s in real life than in the model, which would make the distribution look more right-tailed.

(Finally, my sample may be too big. 87 out of two million is the equivalent of some 4,000 players in the U.S. male population of baseball-playing age. Keeping only the better half of those players (that is, raising the cutoff) might make everything more right-tailed. Hang on, let me check ... nope, results stayed roughly the same. Never mind!)

By the way, a few more interesting notes from the sample of 87:

1. The players had an average batting rating of 163. The average range rating was only 119. The average arm rating was barely above average at 105. This is as you'd expect: if you can hit 4 SDs above average, you're a major leaguer, no matter how bad your arm. But if you can throw 4 SDs above average, so what? Unless you can hit, your arm just isn't that valuable to the team.

This works for other sports too – in golf, as I understand it, there are guys who can drive a ball 400 yards. But the other aspects of their game aren't very good, so they're doing driving contests for a living instead of playing on the PGA tour.

2. The best player in the sample, with an overall 161, had only an 88 arm. Again, this makes sense. It would probably take months of going through people at the mall before you found one who can hit like a major-league outfielder. But you could probably find some guy who can throw like a major-league outfielder much more easily.

3. There was a high negative correlation (r = -0.75) between hitting and range. That makes sense too. Players who make it to the majors with their bat don't have to have good range to keep their jobs. And players who earn a job with their glove are unlikely to be among the best hitters.

The correlation between range and arm was –0.14, and between bat and arm was –0.18.

4. Every one of the 87 players had a higher batting rating than overall rating. This probably means the model is oversimplified, since there are numerous players in MLB whose value comes mostly from defense. A better study is probably called for, but I think the conclusion – that individual skills may indeed be normally distributed – is still supported.
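The averages and correlations above can also be checked against the model without any sampling. Each skill can be split into a part proportional to the overall score plus independent normal noise, and selection truncates only the first part. The sketch below (a hypothetical `selected_moments` helper, not the author's code) applies the standard truncated-normal formulas to the post's 70/20/10 weights and 146 cutoff, for an infinitely large population.

```python
from math import sqrt
from statistics import NormalDist

# Model from the post: independent N(100, 16) skills, weighted 70/20/10;
# anyone with an overall score of 146 or more makes the majors.
MEAN, SD = 100.0, 16.0
WEIGHTS = {"hitting": 0.7, "range": 0.2, "arm": 0.1}
CUTOFF = 146.0


def selected_moments():
    """Means, SDs and pairwise correlations of each skill among players
    whose overall score clears CUTOFF (infinite-population limit)."""
    std = NormalDist()
    norm_w = sqrt(sum(w * w for w in WEIGHTS.values()))
    z0 = (CUTOFF - MEAN) / (SD * norm_w)       # cutoff in overall-score SDs
    lam = std.pdf(z0) / (1 - std.cdf(z0))      # mean of N(0,1) truncated at z0
    v = 1 + z0 * lam - lam ** 2                # variance of that truncated normal
    rho = {k: w / norm_w for k, w in WEIGHTS.items()}  # corr(skill, overall)

    means = {k: MEAN + SD * r * lam for k, r in rho.items()}
    var = {k: r * r * v + 1 - r * r for k, r in rho.items()}  # in SD^2 units
    sds = {k: SD * sqrt(x) for k, x in var.items()}
    names = list(WEIGHTS)
    corr = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Selecting on the sum induces covariance rho_a * rho_b * (v - 1) < 0.
            corr[(a, b)] = rho[a] * rho[b] * (v - 1) / sqrt(var[a] * var[b])
    return means, sds, corr
```

These formulas give average ratings of about 163 for hitting, 118 for range, and 109 for arm, with correlations of about -0.69 (hitting/range), -0.34 (hitting/arm), and -0.04 (range/arm). Hitting and range are close to the simulated 163, 119, and -0.75; the arm average and the two smaller correlations are further off, which may simply reflect the noise in a sample of 87. Changing `WEIGHTS` is an easy way to try the positional weightings suggested in the comments.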


  1. I would expect hitting ability to be tail-shaped because there is really no such thing as "hitting ability". There are several unrelated abilities that one must excel at in order to be able to hit a baseball at the major-league level. It's obvious that one must have superior eye-hand coordination to hit a baseball. What's not as obvious is that one must have a much shorter latency than average in saccadic eye movements, or that one must be much better than average at detecting visual contrast, just to name two. The ability to hit is rarer than something like a major-league throwing arm because the former is composed of a suite of many independent abilities, while the latter is less so.

  2. You may be interested in an old "article" of mine on talent distribution.

    I agree that each component would be very close to a normal distribution. You can see that if you look at even things like BB rates and K rates. It's almost a given that the "ability to make contact on a 0-2 pitch" will be normal, even among MLB-only players, for the "cumulative" reason that Phil specifies.

  3. One thing about the distribution of talent (e.g., hitting) is that it may not be possible to infer the characteristics of the distribution of talent from the distribution of outcomes. With hitting, for example, outcomes are a result of the intersection of two distributions--the distribution of hitting skill and of pitching skill. Both may be positively skewed, but the distribution of outcomes may wind up looking normal.

    The problem is to obtain measures of talent/skill that are independent, in this case, independent of efforts to thwart my use of a skill.

    For that reason, I would expect the distribution of outcomes in an individual sport like golf (or bowling) to be less likely to be normal than in a team sport.

  4. Well, so much at least for a piece of that hypothesis. In 2006, the mean driving distance on the PGA Tour was 289.5 yards with a s.d. of 4.35 yards (players, not individual drives, being the unit of observation). Using intervals of 1 s.d. around the mean (the center interval is +0.5 s.d. to -0.5 s.d.), I get this distribution:

    Yards           Players
    320.0 - 311.3   2
    311.2 - 302.9   9
    302.8 - 294.0   52
    293.9 - 285.1   71
    285.0 - 276.2   50
    276.1 - 267.3   10
    267.2 - 258.4   1

    I gotta say, even without doing any tests, that looks symmetrical, even if it's not normal (instead of 2/3 of the observations being within 2 s.d. of the mean, 88% are within 1.5 s.d. of the mean). If anything, the non-normality shows up as compression of the distribution, not as skewness.

  5. That should be a s.d. of 8.7 yards...sorry about that.

  6. The first thing I thought of while reading this analysis was the NFL combine. If these data are public, they would be a great source for determining the distribution of individual tasks such as the 40-yard dash, vertical leap, bench press, etc.

  7. Nate's probably right, but even then it'd be necessary to distinguish position differences, I'd think. The question is what's the relevant (sub)population.

  8. Driving accuracy in golf in 2006 on the PGA tour (greens in regulation or better) is also symmetrically, but non-normally (compressed) distributed. Again, about 88% of the observations are within 1.5 s.d. of the mean of 63.45% of greens in regulation or better.

  9. In a normal distribution, 88.3% of all data points are within +/- 1.5 standard deviations.

  10. Phil: Interesting analysis. Basically, the normal distribution for individual skills is created by valuing multiple skills. What's surprising is that hitting, even weighted at 70%, does not look more like the right tail.

    It seems to me that we should expect pitching talent in the aggregate -- i.e., run prevention -- to look like the right tail, since we don't care about any other skill than that (if you separate starters from relievers).

  11. A few comments ...

    1. Chucko, are you missing the word "not" in your first sentence?

    2. Tango: thanks for the link, I had seen that study of yours. What it implies is that talent in terms of *humans* looks like the right tail, but because the left side of the right tail doesn't play much, the distribution of *playing time* looks symmetrical. I agree with you. My point was a bit different, that even if players get *equal* playing time, you still get something normal-looking.

    3. Doc: Tango is right about the +/- 1.5. The "2/3" figure is for 1 SD, not 2. Thanks for the PGA info, very interesting!

    4. Guy: agreed, that run prevention talent, being the "goal," should be right-tail shaped. The problem is that we only have actual performance, not talent, which complicates things. Perhaps if you use career data, where you can assume that there's not much regression to the mean ...

  12. One problem I have is typing faster than I think. The 88% in golf is actually within 1 s.d. of the mean, not 1.5. (And, yeah, I knew the actual distribution... it's been a long week already...)

  13. Over enough sample points (time) isn't this just the Central Limit Theorem at work?

  14. I think it would only be a truncated right tail of a normal distribution if baseball talent evaluators were perfect.

    Allowing for the imperfection of evaluation, and for the fundamental decline in skills for some pitchers due to injury or otherwise, we'd expect the left side of the "cutoff" to have somewhat of a tail itself. It might look more like a gamma distribution.

  15. Very interesting discussion! It would be interesting to repeat the simulation for different weightings on the three talent categories, which could represent different positions. Presumably the DH position would be more skewed since only the single category of hitting is appropriate.
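Several comments above trade rules of thumb about how much of a normal distribution lies within a given number of SDs. Python's `statistics.NormalDist` gives the exact shares: about 68.3% within ±1 SD, 86.6% within ±1.5 SD (a bit below the 88.3% quoted in comment 9), and 95.4% within ±2 SD. This minimal sketch also counts the middle three bins of the driving-distance table from comment 4; with the corrected s.d. of 8.7 yards those bins span roughly ±1.5 s.d. and hold 173 of the 195 players, about 88.7%. `share_within` and `GOLF_BINS` are hypothetical names.

```python
from statistics import NormalDist


def share_within(k: float) -> float:
    """Fraction of any normal distribution within +/- k SDs of its mean."""
    std = NormalDist()  # standard normal; the share doesn't depend on mean or SD
    return std.cdf(k) - std.cdf(-k)


# Players per 1-s.d. bin in the 2006 PGA Tour driving-distance table
# (comment 4), longest drivers first.
GOLF_BINS = [2, 9, 52, 71, 50, 10, 1]
golf_middle_share = sum(GOLF_BINS[2:5]) / sum(GOLF_BINS)

for k in (1.0, 1.5, 2.0):
    print(f"normal, within +/-{k} SD: {share_within(k):.1%}")
print(f"golf, middle three bins: {golf_middle_share:.1%}")
```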