Sabermetric Research: Blacks in baseball: the peak was 20%, not 27%

Saturday, October 11, 2008

Blacks in baseball: the peak was 20%, not 27%

Remember that statistic that said that the percentage of (American) blacks in major league baseball hit a high of 27% back in the early 1970s? It turns out that isn't true.
According to Carl Bialik, of the Wall Street Journal, the 27% figure applied only to ~~full-time non-pitchers~~ non-pitchers with at least 50 games that season. If you include everyone, the actual high was 20%.

Today's figure is around 8%, so there's still a sizeable drop to explain – just not as large as originally purported.

The originator of the "27%" figure, John Loy, used it in a study where he looked for evidence of "stacking" (restricting black players to certain positions). He found that African-American players were disproportionately represented in the outfield, and suggested that teams put blacks in the outfield because there they would have less interaction with their (mostly white) teammates.

But didn't Bill James note (maybe in his 1987 rookie study?) that black players appeared to keep their foot speed a lot longer than white players did? I remember he once mentioned that Rick Monday was drafted partially because he was so fast. Today, of course, Monday isn't associated with speed at all – he's remembered mostly in connection with flag burning and breaking Tango's heart.

Anyway, if Bill was right, that would certainly explain the effect: you have to be reasonably fast to play the outfield, but not to catch, play first base, or DH. So there would appear to be segregation by race, when it's really by speed.

I haven't read the Loy study – he might have corrected for this. I'm just saying.

There's another effect Bialik mentions:

" ... other research suggests that latent racism within the game tended to reserve bench spots for white players."

...

"[SABR's Mark] Armour found that black players consistently have outperformed their contemporaries in total "win shares," a statistic developed by baseball numbers pioneer Bill James that represents players' total contribution to a team's success. ... One reason black players were, on average, better than white players was that they needed to be to make the roster."

I think Bill James debunked this one a long time ago too. If blacks are slightly better than whites, on average, the effect is magnified at the extremes of ability.

Suppose that whites average 100 "points" of ability, but blacks average 103. And suppose both races are normally distributed with a standard deviation of 10. Finally, let's say blacks are 15% of the population.

In that case, about 24% of blacks will be at 110 or higher, but only 16% of whites. Adjusting for each group's share of the population, 21% of the 110+ players will be black.

But now, let's look at the star players, the ones above 130. Only 0.35% of blacks will reach this mark, but for whites it's a lot less -- 0.13%. So in that group, about 31% of players will be black.

-- 31% black players at 130+ (stars)
-- 21% black players at 110+ (bench players)
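The percentages in this example can be checked with a short Python sketch. It assumes exactly the model described above (normally distributed ability, SD 10, means of 100 for whites and 103 for blacks, blacks 15% of the population); the function names are made up for illustration.

```python
from math import erfc, sqrt

def upper_tail(cutoff, mean, sd=10.0):
    """P(X >= cutoff) for X ~ Normal(mean, sd), via the complementary error function."""
    return 0.5 * erfc((cutoff - mean) / (sd * sqrt(2)))

def black_share_above(cutoff, black_mean=103.0, white_mean=100.0, black_pop=0.15):
    """Fraction of all players at or above `cutoff` who are black, weighting by population share."""
    b = black_pop * upper_tail(cutoff, black_mean)
    w = (1 - black_pop) * upper_tail(cutoff, white_mean)
    return b / (b + w)

for cutoff in (110, 130):
    print(cutoff,
          f"blacks above: {upper_tail(cutoff, 103):.2%}",
          f"whites above: {upper_tail(cutoff, 100):.2%}",
          f"black share: {black_share_above(cutoff):.0%}")
```

Run as-is, it reports a black share of about 21% at the 110 cutoff and about 31% at the 130 cutoff: the same 3-point gap in means produces a bigger imbalance the further out in the tail you look.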

So there's a very plausible explanation of why there are proportionally fewer blacks on the bench than among full-time players.

Again, I'm not sure what specific studies Bialik is referring to, but I hope there's more evidence for the racism hypothesis than just the numbers he mentions.

Hat Tip: Bob Timmermann

Labels: baseball, race

4 Comments:

At Monday, October 13, 2008 2:29:00 PM, Anonymous said...: Hi Phil,

several comments:

"[Loy's] 27% figure applied only to full-time non-pitchers."

Actually Bialik says that Loy used a fifty-game minimum for position players. So he goes a ways down from full time, say to the level of the 22nd man on the roster. The pitchers vs. non-pitchers split does raise an interesting point - an adjustment should be made for the increasing allocation of roster spots over time to pitchers. Your suggestion about outfield segregation being due to speed instead of race also has potential, assuming James' original result holds up. I am not aware that anyone has re-studied that question. But you'd have to look at age adjustments, and how many of the slowing white outfielders transitioned to 1B or 3B (or DH?) to crowd out slower black players, as opposed to just transitioning out of the majors altogether. [Eventually those black outfielders slow as well and transition somewhere ...]

" "... latent racism within the game tended to reserve bench spots for white players." ... I think Bill James debunked this one a long time ago too."

This doesn't ring a bell with me. (Can anyone provide a reference to this debunking?)

Even if for the sake of argument I accept the proposition in your model that blacks are slightly more talented than whites, your model and your argument need to be reformulated to be convincing.

If we're going to center ability on 100 and assume that it is normally distributed, the cutoff for fringe major league talent must be closer to 140 or 150 at least. [This is just from memory, but if you compare the number of American HS baseball players to the eventual number of American-born major leaguers from that approximate birth year (as a proxy for being part of that original HS cohort), that sliver, 140 of 3,000,000 (?), would nominally represent 5 SD above average baseball talent.] Now some of those top HS baseball players chose to become football players or physicists instead of pursuing a baseball career, so I won't be surprised if you only need to be 3-4 SD above average in a normal distribution to make it to the majors, but we're not yet talking stars, or regulars, or regular bench players; this threshold is just players good enough to have appeared in at least one game (ever) in the majors.
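Under a normal talent model, a selection rate like this can be turned into an SD cutoff with the inverse CDF. The sketch below uses Python's `statistics.NormalDist`; the 140 and 3,000,000 inputs are the commenter's own from-memory figures, treated here as hypothetical placeholders rather than verified data.

```python
from statistics import NormalDist

# Hypothetical inputs: the commenter's rough recollection, not verified counts.
major_leaguers = 140        # American-born major leaguers from one birth cohort
hs_players = 3_000_000      # American high-school players in that cohort

selection_rate = major_leaguers / hs_players

# Find z such that P(Z > z) equals the selection rate for a standard normal Z.
z_cutoff = NormalDist().inv_cdf(1 - selection_rate)

print(f"selection rate {selection_rate:.2e}, cutoff about {z_cutoff:.1f} SD above average")
```

With these particular inputs the cutoff comes out to roughly 3.9 SD; the result is sensitive to the cohort size, which is the figure the commenter flagged with "(?)".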

You illustrate the results of your model for the proportion of "star" vs. fringe players, then draw a conclusion about representation among "full time" vs. fringe players. How many full-time players do we want to elevate to the status of stars? One out of six? One more SD up in talent?

Your illustration combined with that conclusion will mislead careless readers; it implies that your initial (possibly modest) assumption would allow you to explain a greater degree of imbalance between whites and blacks in the population of fringe players.

A more convincing discussion of the model would map to reality at each level - are the ratios of blacks to whites among star players and among non-star regulars and among semi-regulars or "bench" players consistent with observation at every step?
At Monday, October 13, 2008 2:38:00 PM, Phil Birnbaum said...: Hi, Joe,

Point taken about 50 games not being full-time ... I read too quickly.

The other result, I think, holds up if you use more realistic numbers. (I'll post a better example later today when I get a chance.) I'll try using 140 and 150 as the cutoffs for bench and stars, or something.

Actually, I can check now ...

At 143, the whites are 4.3 SDs, the blacks are 4.0. The ratio of blacks to whites (here assuming equal numbers of each race, instead of adjusting for the actual percentage) is 3.71 blacks for each white.

At 153, the whites are 5.3 SDs, the blacks are 5.0. The ratio is 4.95.

Assuming 153 is stars, and 143 is bench players, you get (after adjusting for population size):

bench: 26 whites per 100 blacks
stars: 19 whites per 100 blacks

So the relationship still holds.
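The 3.71 and 4.95 ratios can be reproduced with the same normal model (means 100 and 103, SD 10). This sketch assumes equal population sizes for the two groups, as those two ratios do; the function names are illustrative only.

```python
from math import erfc, sqrt

def upper_tail(cutoff, mean, sd=10.0):
    """P(X >= cutoff) for X ~ Normal(mean, sd), via the complementary error function."""
    return 0.5 * erfc((cutoff - mean) / (sd * sqrt(2)))

def blacks_per_white(cutoff, black_mean=103.0, white_mean=100.0, sd=10.0):
    """Black-to-white ratio above `cutoff`, assuming the two populations are the same size."""
    return upper_tail(cutoff, black_mean, sd) / upper_tail(cutoff, white_mean, sd)

for label, cutoff in (("bench", 143), ("stars", 153)):
    print(label, cutoff, round(blacks_per_white(cutoff), 2))
```

The ratio rises as the cutoff moves further into the tail, which is the point of the comment: a small difference in means yields a larger imbalance among stars than among bench players.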
At Tuesday, October 14, 2008 9:22:00 AM, Anonymous said...: Phil,

we're probably talking past each other, since I'm not sure what your exact point is. There is some "conventional wisdom" that discrimination against blacks in baseball continued for a long time in the form of a preference for white bench players.
Now I don't know if anyone has actually argued it exactly this way, but we can imagine that an upholder of this view might argue the deficiency of black bench players in a couple of ways. (15% of population is black, so 15% of bench should be black OR 25% of regulars were black, so 25% of bench should be black). [You keep illustrating with "stars" - could you just mean starters? - I doubt anyone has ever made the 2nd form of argument and based it on an identification of a subpopulation of stars within the population of starting players!]

As I see it, your intention is to criticise the second form of argument, by pointing out that some degree of disparity could occur without any discrimination.

What I tried to say is that this purely theoretical argument is not convincing without any checking against reality. Just to use your numbers, if we compared to reality and happened to observe that 26% of the stars [sic] are black, but only 11% of the bench is black (instead of the model's prediction of 19%), then perhaps your talent model is wrong, but under it there is certainly still plenty of room for discrimination to have been occurring. In addition I suspect the "talent gap" between regulars and bench players is much narrower than your illustration suggests.

Your model could be true without refuting the discrimination explanation. It would merely erode the size of the discrimination effect. In other words, the latent-discrimination explanation is certainly not "debunked" just because an untested alternative explanation exists.

We know that discrimination against blacks in baseball once existed, to 1947 and well beyond, and that it at least declined. We don't know that it disappeared, and we don't know that blacks actually have a different talent distribution. So if you want to argue convincingly against the discrimination explanation, I think you need to provide better justification for an assumption of a difference in talent distribution.

Now I am not hostile to the proposal that the talent distribution may be different and that this may have real effects. We see it "all the time" in populations such as great hockey players (disproportionately Canadian), ballerinas (disproportionately Russian) and elite marathoners (disproportionately Kenyan). But these are historically contingent results - much more connected to availability of opportunity to develop and to motivation - social reasons, not genetic ones. I can imagine that the proportion of major league caliber black players has changed over time - for example that there was a peak maybe a generation after Jackie Robinson, and a natural decline from that with the rise of basketball and football as options and with the disappearance of sandlots.

Indeed, the distribution of talent could have "flipped" over a generation, so that (by your model) blacks should nowadays be under-represented as baseball stars (or starters) and over-represented on the bench. So besides the subjectivity involved in deciding whether a player should be counted as "black", the choice of start and end dates could obscure our ability to measure and justify any "talent effect."
At Tuesday, October 14, 2008 9:49:00 AM, Phil Birnbaum said...: Joe,

Right. I'm not saying there's no discrimination. I'm just saying that you need more evidence than just noting that there are proportionally fewer blacks on the bench than in the field.

As I said in the last line of my post: I hope there's more evidence for the racism hypothesis than just the numbers [Bialik] mentions.
