Tuesday, September 07, 2021

Are umpires racially biased? A 2021 study (Part II)

(Part I is here.)


20 percent of drivers own diesel cars, and the other 80 percent own regular (gasoline) cars. Diesels are, on average, less reliable than regular cars. The average diesel costs $2,000 a year in service, while the average regular car only costs $1,000. 

Researchers wonder if there's a way to reduce costs. Maybe diesels cost more partly because mechanics don't like them, or are unfamiliar with them? They create a regression that controls for the model, age, and mileage of the car, as well as driver age and habits. But they also include a variable for whether the mechanic owns the same type of car (diesel or gasoline) as the owner. They call that variable "UTM," or "user/technician match".

They run the regression, and the UTM coefficient turns out negative and significant. It turns out that when the mechanic owns the same type of car as the user, maintenance costs are more than 13 percent lower! The researchers conclude that finding a mechanic who owns the same kind of car as you will substantially reduce your maintenance costs.

But that's not correct. The mechanic makes no difference at all. That 13 percent from the regression is showing something completely different.

If you want to solve this as a puzzle, you can stop reading and try. There's enough information here to figure it out. 

-------

The overall average maintenance cost, combining gasoline and diesel, is $1200. That's the sum of 80 percent of $1000, plus 20 percent of $2000.

So what's the average cost for only those cars that match the mechanic's car? My first thought was, it's the same $1200. Because, if the mechanic's car makes no difference, how can that number change?

But it does change. The reason is: when the user's car matches the mechanic's, it's much less likely to be a diesel. The gasoline owners are over-represented when it comes to matching: each gasoline owner has an 80% chance of being included in the "UTM" sample, while each diesel owner has only a 20% chance.

In the overall population, the ratio of gasoline to diesel is 4:1. But the ratio of "gasoline/gasoline" to "diesel/diesel" is 16:1. So instead of 20%, the proportion of "double diesels" in the "both cars match" population is only 1 in 17, or 5.9%.

That means the average cost of UTM repairs is only $1059. That's 94.1 percent of $1000, plus 5.9% of $2000. That is $141 below the overall $1200: 13.3 percent of the UTM average, or 11.8 percent of the overall average.

Here's a chart that maybe makes it clearer. Here's how the raw numbers of UTM pairings break down, per 1000 population:

Technician      Gasoline    Diesel    Total
-------------------------------------------
User gasoline     640        160       800
User diesel       160         40       200
-------------------------------------------
Total             800        200      1000 
 

The diagonal (the 640 and the 40) is where the user matches the mechanic. There are 680 cars on that diagonal, but only 40 (1 in 17) are diesel.

In short: the "UTM" coefficient is significant not because matching the mechanic selects better mechanics, but because it selectively samples for more reliable (gasoline) cars.
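Here is a minimal sketch that re-derives the numbers in the diesel example. It assumes, as the example implicitly does, that mechanics own cars in the same 80/20 mix as users and are paired with customers independently of car type. The variable names are mine.

```python
from fractions import Fraction

# Figures from the example: 80% gasoline at $1000/year, 20% diesel at $2000/year.
share = {"gasoline": Fraction(4, 5), "diesel": Fraction(1, 5)}
cost = {"gasoline": 1000, "diesel": 2000}

# Overall average cost across all users.
overall = sum(share[c] * cost[c] for c in share)

# Assumption: mechanics have the same 80/20 mix, paired independently,
# so P(user and mechanic both own type c) = share[c] squared.
match = {c: share[c] * share[c] for c in share}
match_total = sum(match.values())

# Diesel share and average cost within the matched ("UTM") group only.
diesel_share_utm = match["diesel"] / match_total
utm_avg = sum(match[c] * cost[c] for c in share) / match_total

gap = overall - utm_avg
print("overall average:      ", float(overall))
print("UTM diesel share:     ", float(diesel_share_utm))
print("UTM average:          ", float(utm_avg))
print("gap / UTM average:    ", float(gap / utm_avg))
print("gap / overall average:", float(gap / overall))
```

No mechanic effect appears anywhere in the script, yet the matched group still comes out cheaper. That is the whole point.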

--------

In the umpire/race study I talked about last post, they had a regression like that, where they put all the umpires and batters together into one regression and looked at the "UBM" variable, where the umpire's race matches the batter's race. 

From last post, here's the table the author included. The numbers are umpire errors per 1000 outside-of-zone pitches (negative favors the batter).

Umpire             Black   Hispanic   White
-------------------------------------------
Black batter:       ---      -5.3     -0.3
Hispanic batter    +7.8       ---     +5.9
White batter       +5.6      -4.4      ---

I had adjusted that to equalize the baseline:

Umpire             Black   Hispanic  White
------------------------------------------
Black batter:      -5.6     -0.9     -0.3
Hispanic batter    +2.2     +4.4     +5.9
White batter        ---      ---      ---

I think I'm able to estimate, from the original study, that the batter population was almost exactly in the 2:3:4 range -- 22 percent Black, 34 percent Hispanic, and 44 percent White. Using those numbers, I'm going to adjust the chart one more time, to show approximately what it would look like if the umpires were exactly alike (no bias) and each column added to zero. 

Umpire             Black   Hispanic  White
------------------------------------------
Black batter:      -2.2     -2.2     -2.2
Hispanic batter    +3.8     +3.8     +3.8
White batter       -1.7     -1.7     -1.7

I chose those numbers so the average UBM (average of diagonals in ratio 22:34:44) is zero, and also to closely fit the actual numbers the study found. That is: suppose you ran a regression using the author's data, but controlling for batter and umpire race.  And suppose there was no racial bias. In that case, you'd get that table, which represents our null hypothesis of no racial bias.

If the null hypothesis is true, what will a regression spit out for UBM? If the batters were represented in their actual ratio, 22:34:44, you'd get zero:

Diagonal          Effect Weight    Product
-------------------------------------------------
Black UBM          -2.2    22%     -0.5  
Hispanic UBM       +3.8    34%     +1.3  
White UBM          -1.7    44%     -0.8  
-------------------------------------------------
Overall UBM               100%     -0.0  per 1000

However: in the actual population in the MLB study, the diagonals do NOT appear in the 22:34:44 ratio. That's because the umpires were overwhelmingly White -- 88 percent White. There were only 5 percent Black umpires, and 7 percent Hispanic umpires. So the White batters matched their umpire much more often than the Hispanic or Black batters.

Using 5:7:88 for umpires, and 22:34:44 for batters, the relative frequency of each combination looks like this. Here's the breakdown per 1000 pitches:

                                             Batter
Umpire             Black   Hispanic  White    Total
---------------------------------------------------
Black batter        11        15      194      220
Hispanic batter     17        24      300      341
White batter        22        31      387      439
---------------------------------------------------
Umpire total        50        70      881     1000

Because there are so few minority umpires, there are only 24 Hispanic/Hispanic pairs out of 422 total matches on the UBM diagonal.  That's only 5.7% Hispanic batters, rather than 34 percent:

Diagonal       Frequency  Percent
----------------------------------
Black UBM            11     2.6% 
Hispanic UBM         24     5.7%
White UBM           387    91.7%     
----------------------------------
Overall UBM         422     100%

If we calculate the observed average of the diagonal, with this 11/24/387 breakdown, we get this:
                                      
                  Effect  Weight      Product
--------------------------------------------------
Black UBM          -2.2    2.6%    -0.06 per 1000
Hispanic UBM       +3.8    5.7%    +0.22 per 1000
White UBM          -1.7   91.7%    -1.56 per 1000 
--------------------------------------------------
Overall UBM                100%    -1.40 per 1000
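The two weighted averages above can be checked with a short sketch. It uses the rounded shares and effects quoted in this post (22:34:44 batters, 5:7:88 umpires, effects of -2.2, +3.8, -1.7), and it assumes batters and umpires are paired independently of race, which is how the frequency table was built. Because the inputs are rounded, the results only match the tables to about a decimal place.

```python
# Rounded figures quoted in the post.
batter_share = {"Black": 0.22, "Hispanic": 0.34, "White": 0.44}
umpire_share = {"Black": 0.05, "Hispanic": 0.07, "White": 0.88}
effect = {"Black": -2.2, "Hispanic": 3.8, "White": -1.7}  # errors per 1000, no bias


def weighted_mean(values, weights):
    """Average of values, weighted by weights (weights need not sum to 1)."""
    total = sum(weights.values())
    return sum(values[k] * weights[k] for k in values) / total


# Diagonal weighted by batter share alone: close to zero by construction.
null_avg = weighted_mean(effect, batter_share)

# Assumption: independent pairing, so P(match on race r) is the product of shares.
match_weight = {r: batter_share[r] * umpire_share[r] for r in batter_share}
match_share = sum(match_weight.values())

# Diagonal weighted by how often each match actually occurs.
ubm_avg = weighted_mean(effect, match_weight)

print("batter-weighted diagonal:", round(null_avg, 2))
print("share of pitches with UBM:", round(match_share, 3))
print("match-weighted diagonal: ", round(ubm_avg, 2))
```

The only thing that changes between the two averages is the weighting. The effects themselves are identical for every umpire.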

Hispanic batters receive more bad calls for reasons other than racial bias. By restricting the sample of Hispanic batters to only those who see a Hispanic umpire, we selectively sample fewer Hispanic batters in the UBM pool, and so we get fewer bad calls. 

Under the null hypothesis of no bias, UBM plate appearances still see 1.40 fewer bad calls per 1000 pitches, because of selective sampling.

------

That 1.40 figure is compared to the overall average. The regression coefficient, however, compares it to the non-UBM case. What's the average of the non-UBM case?

Well, if a UBM happens 422 times out of 1000, and results in 1.40 pitches fewer than average, and the average is zero, then the other 578 times out of 1000, there must have been 1.02 pitches more than average. 

                  Effect  Weight       Product
--------------------------------------------------
UBM                -1.40   42.2%   -0.59 per 1000
Non-UBM            +1.02   57.8%   +0.59 per 1000
--------------------------------------------------
Full sample                100%    -0.00 per 1000

So the coefficient the regression produces -- UBM compared to non-UBM -- will be 2.42.

What did the actual study find? 2.81. 

That leaves only 0.39 as the estimate of potential umpire bias:

-2.81  Selective sampling plus possible bias
-2.42  Effect of selective sampling only
---------------------------------------------
-0.39  Revised estimate of possible bias

The study found 2.81 fewer bad calls (per 1000) when the umpire matched the batter, but 2.42 of that is selective sampling, leaving only 0.39 that could be umpire bias.

Is that 0.39 statistically significant? I doubt it. For what it's worth, the original estimate had an SD of 0.44. So adjusting for selective sampling, we're less than 1 SD from zero.
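For anyone who wants to retrace that last bit of arithmetic, here is a rough sketch. The inputs are the figures quoted above (42.2% of pitches are UBM, a UBM effect of -1.40, the study's coefficient of -2.81 and its SD of 0.44). The variable names are my own, and the SD is applied to the residual as-is, which is only an approximation.

```python
# Figures quoted in the post (all per 1000 pitches).
ubm_share = 0.422
ubm_effect = -1.40      # UBM pitches relative to the overall average
study_coef = -2.81      # coefficient the study reported for UBM
study_sd = 0.44         # SD of that estimate

# The overall average is zero, so the non-UBM pitches must balance the UBM ones.
non_ubm_effect = -ubm_effect * ubm_share / (1 - ubm_share)

# A regression compares UBM to non-UBM, so the artifact is the difference.
artifact = ubm_effect - non_ubm_effect

# Whatever is left over is the most that could be bias.
residual = study_coef - artifact
z = residual / study_sd

print("non-UBM effect:", round(non_ubm_effect, 2))
print("artifact:      ", round(artifact, 2))
print("residual:      ", round(residual, 2))
print("residual in SDs:", round(z, 2))
```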

--------

So, the conclusion: the study's finding of a 0.28% UBM effect cannot be attributed to umpire bias. It's mostly a natural mathematical artifact resulting from the fact that

(a) Hispanic batters see more incorrect calls for reasons other than bias, 

(b) Hispanic umpires are rare, and

(c) The regression didn't control for the race of batter and umpire separately.

Because of that, almost the entire effect the study attributes to racial bias is just selective sampling.


Monday, August 30, 2021

Are umpires racially biased? A 2021 study (Part I)

Are MLB umpires racially biased? There's a recent study that claims they are. The author, who wrote it as an undergrad thesis, mentioned it on Twitter, and when I checked a week or so later, there were lots of articles and links to it. (Here, for instance, is a Baseball Prospectus post reporting on it.  And here's a Yahoo! report.)

The study tried to figure out whether umpires make more bad calls against batters* of a race other than theirs (where there is no "umpire-batter match," or "UBM," as the literature calls it). It ran regressions on called pitches from 2008 to 2020, to figure out how best to predict the probability of the home-plate umpire calling a pitch incorrectly (based on MLB "Gameday" pitch location). The author controlled for many different factors, and found a statistically significant coefficient for UBM, concluding that the batter gains an advantage when the umpire is of the same race. It also argues that white umpires in particular "could be the driving force behind discrimination in MLB."  

I don't think any of that is right. I think the results point to something different, and benign. 

---------

Imagine a baseball league where some teams are comprised of dentists, while the others are jockeys. The league didn't hire any umpires, so the players take turns, and promise to call pitches fairly.

They play a bunch of games, and it turns out that the umpires call more strikes against the dentists than against the jockeys. Nobody is surprised -- jockeys are short, and thus have small strike zones.

It's true that the data shows that if you look at the Jockey umpires, you'll see that they call a lot fewer strikes against batters of their own group than against batters of the other group. Their "UBM" coefficient is high and statistically significant.

Does that mean the jockey umps are "racist" against dentists? No, of course not. It's just that the dentists have bigger strike zones. 

It's the same, but in reverse, for the dentist umpires. They call more strikes against their fellow dentists -- again, not because of pro-jockey "reverse racism," but because of the different strike zones.

Later, teams of NBA players enter the league. These guys are tall, with huge strike zones, so they get a lot of called strikes, even from their own umpires.

Let's put some numbers on this: we'll say there are 10 teams of dentists, 1 team of jockeys, and 2 teams of NBA players. The jockeys are -10 in called strikes compared to average, and the NBA players are +10. That leaves the dentists at -1 (in order for the average to be zero).

Here's a chart that shows every umpire is completely fair and unbiased. 

Umpire             Jockey    NBA    Dentist
-------------------------------------------
Jockey batter:       -10     -10     -10
NBA batter           +10     +10     +10
Dentist batter        -1      -1      -1

I've highlighted the "UBM" cells where the umpire matches the batter. If you look only at those cells, and don't think too much about what's going on, you could think the umpires are horribly biased. The Jockey batters get 10 fewer strikes than average from Jockey umpires!  That's awful!

But then when you look closer, you see the horizontal row is *all* -10. That means all the umpires called the jockeys the same way (-10), so it's probably something about the jockey batters that made that happen. In this case, it's that they're short.

I think this is what's going on in the actual study. But it's harder to see, because the chart isn't set up with the raw numbers. The author ran different regressions for the three different umpire races, and set a different set of batters as the zero-level for each. Since they're calibrated to a different standard of player, the results make the umpires look very different.

If I had done here what the author did there, the chart above would have looked like this:

Umpire             Jockey    NBA   Dentist
------------------------------------------
Jockey batter:         0    -20      -9
NBA batter           +20      0     +11
Dentist batter        +9    -11       0

If you just look at this chart without knowing you can't compare the columns to each other (because they're based on a different zero baseline), it's easy to think there's evidence of bias. You'd look at the chart and say, "Hey, it looks like Jockey umpires are racist against NBA batters and dentists. Also, dentist umpires are racist against NBA players but favor Jockeys somewhat. But, look!  NBA umpires actually *favor* other races!  That's probably because NBA umpires are new to the tournament, and are going out of their way to appear unbiased."  
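The transformation from the fair chart to the misleading one is mechanical, and a small sketch may make it concrete. It builds the fair table from the team counts above, then re-zeroes each umpire column on its own diagonal cell. The dictionary layout (`table[batter][umpire]`) is just my choice for illustration.

```python
groups = ["Jockey", "NBA", "Dentist"]
teams = {"Jockey": 1, "NBA": 2, "Dentist": 10}

# Called strikes relative to average; the dentists' value is solved for
# so that the team-weighted average comes out to zero.
offset = {"Jockey": -10, "NBA": 10}
offset["Dentist"] = -sum(teams[g] * offset[g] for g in offset) / teams["Dentist"]

# Fair umpires: every umpire group calls a given batter group the same way.
fair = {b: {u: offset[b] for u in groups} for b in groups}

# Re-zero each umpire column on its own diagonal (umpire's own group = 0).
rebased = {b: {u: fair[b][u] - fair[u][u] for u in groups} for b in groups}

for b in groups:
    print(f"{b:8s}", [rebased[b][u] for u in groups])
```

Every number in the second chart comes from unbiased umpires. The apparent pattern is produced entirely by the choice of baseline.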

That's a near-perfect analogue to the actual study.  This is the top half of Table 8, which measures "over-recognition" of pitchers, meaning balls incorrectly called as strikes (hurting the batter). I've multiplied everything by 1000, so the numbers are "wrong strike calls per 1000 called pitches outside the zone".

Umpire             Black   Hispanic   White
-------------------------------------------
Black batter:       ---      -5.3     -0.3
Hispanic batter    +7.8      ---      +5.9
White batter       +5.6      -4.4      ---

It's very similar to my fake table above, where the dentists and Jockeys look biased, but the NBA players look "reverse biased". 

The study notes the chart and says,

"For White umpires, the results suggest that for pitches outside the zone, Hispanic batters ... face umpire discrimination. [But Hispanic umpires have a] "reverse-bias effect ... [which] holds for both Black and White batters... Lastly, the bias against non-Black batters by Black umpires is relatively consistent for both Hispanic and White batters."

And it rationalizes the apparent "reverse racism" from Hispanic umpires this way:

"This is perhaps attributable to the recent increase in MLB umpires from Hispanic countries, who could potentially fear the consequences of appearing biased towards Hispanic players."

But ... no. The apparent result is almost completely the result of setting a different zero level for each umpire/batter race -- in other words, by arbitrarily setting the diagonal to zero. That only works if the groups of batters are exactly the same. They're not. Just as Jockey batters have different characteristics than NBA player batters, it's likely that Hispanic batters don't have exactly the same characteristics as White and Black batters.

The author decided that White, Black, and Hispanic batters all should get exactly the same results from an unbiased umpire. If that assumption is false, the effect disappears. 

Instead, the study could have made a more conservative assumption: that unbiased umpires of any race should call *White* batters the same. (Or Black batters, or Hispanic batters. But White batters have the largest sample size, giving the best signal-to-noise ratio.)

That is, use a baseline where the bottom row is zero, rather than one where the diagonal is zero. To do that, take the original, set the bottom cells to zero, but keep the differences between any two rows in the same column:

Umpire             Black   Hispanic  White
------------------------------------------
Black batter:      -5.6     -0.9     -0.3
Hispanic batter    +2.2     +4.4     +5.9
White batter        ---      ---      ---

Does this look like evidence of umpire bias? I don't think so. For any given race of batter, all three groups of umpires call about the same number of bad strikes. In fact, all three groups of umpires even have the same *order* among batter groups: Hispanic the most, White second, and Black third. (The raw odds of that happening are 1 in 36). 
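As a check on the re-baselining and on the 1-in-36 figure, here is a short sketch. It starts from the study's table as quoted above (diagonal cells treated as 0), subtracts the White-batter row from each umpire column, and then confirms that the three columns rank the batter groups identically. The 1-in-36 calculation assumes each column's ordering would otherwise be equally likely and independent, which is my reading of "raw odds".

```python
from fractions import Fraction
from itertools import permutations

races = ["Black", "Hispanic", "White"]

# study[batter][umpire], wrong strike calls per 1000; diagonal is the baseline.
study = {
    "Black":    {"Black": 0.0, "Hispanic": -5.3, "White": -0.3},
    "Hispanic": {"Black": 7.8, "Hispanic": 0.0,  "White": 5.9},
    "White":    {"Black": 5.6, "Hispanic": -4.4, "White": 0.0},
}

# Shift each umpire column so the White-batter row becomes zero.
rebased = {
    b: {u: round(study[b][u] - study["White"][u], 1) for u in races}
    for b in races
}


def order_for(umpire):
    """Batter groups ranked from most to fewest bad strike calls."""
    return tuple(sorted(races, key=lambda b: rebased[b][umpire], reverse=True))


orders = {u: order_for(u) for u in races}

# Chance that the second and third columns both match the first one's ordering.
n_orders = len(list(permutations(races)))
p_same = Fraction(1, n_orders) ** 2

for b in races:
    print(f"{b:9s}", [rebased[b][u] for u in races])
print(orders, p_same)
```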

The only anomaly is that maybe it looks like there's some evidence that Black umpires benefit Black batters by about 5 pitches per 1,000, but even that difference is not statistically significant. 

In other words: the entire effect in the study disappears when you remove the hidden assumption that Hispanic batters respond to pitches exactly the same way as White or Black batters. And the pattern of "discrimination" is *exactly* what you'd expect if the Hispanic batters respond to pitches in ways that result in more errors -- that is, it explains the anomaly that Hispanic umpires tend to look "reverse racist."

Also, I think the entire effect would disappear if the author had expanded his regression to include dummy variables for the race of the batter.  

------

If, like me, you find it perfectly plausible that Hispanic batters respond to pitches in ways that generate more umpire errors, you can skip this section. If not, I will try to convince you.

First, keep in mind that it's a very, very small difference we're talking about: maybe 4 pitches per 1,000, or 0.4 percent. Compare that to some of the other, much larger effects the study found:

 +8.9%   3-0 count on the batter
 -0.9%   two outs
 +2.8%   visiting team batting
 -3.3%   right-handed batter
 +0.5%   right-handed pitcher
+19.7%   bases loaded (!!!)
 +1.4%   pitcher 2 WAR vs. 0 WAR
 +0.9%   pitcher has two extra all-star appearances
 +4.0%   2019 vs. 2008
---------------------------------------------------
 +0.4%   batter is Hispanic
---------------------------------------------------

I wouldn't have expected most of those other effects to exist, but they do. And they're so large that they make this one, at only +0.4%, look unremarkable. 

Also: with so many large effects found in the study, there are probably other factors the author didn't consider that are just as large. Just to make something up ... since handedness of pitcher and batter are so important, suppose that platoon advantage (the interaction between pitcher and batter hand, which the study didn't include) is worth, say, 5%. And suppose Hispanic batters are likely to have the platoon advantage, say, 8% less than White batters. That would give you an 0.4% effect right there.

I don't have data specifically for Hispanic batters, but I do have data for country of birth. Not all non-USA players are Hispanic, but probably a large subset are, so I split them up that way. Here are batting-handedness stats for players from 1969 to 2016:

Born in USA:       61.7% RHB
Born outside USA:  67.1% RHB

That's a difference of about 5 percentage points in handedness, or roughly 9 percent in relative terms. I don't know how that translates into platoon advantage, but it's got to be the same order of magnitude as what we'd need for 0.4%.
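The back-of-envelope numbers here are easy to reproduce. In the sketch below, the 5% platoon effect and the 8% platoon gap are the made-up figures from a couple of paragraphs back (not measured values), and the handedness shares are the ones just quoted. The gap is shown both in percentage points and in relative terms.

```python
# Hypothetical figures from the post, purely for illustration.
platoon_effect = 0.05   # assumed effect of platoon advantage on bad calls
platoon_gap = 0.08      # assumed shortfall in platoon advantage

# Implied difference in bad-call rate.
implied = platoon_effect * platoon_gap

# Right-handed batter shares quoted in the post (percent).
rhb_usa = 61.7
rhb_non_usa = 67.1

point_gap = rhb_non_usa - rhb_usa     # percentage points
relative_gap = point_gap / rhb_usa    # relative to the USA-born share

print("implied effect:", round(implied * 100, 2), "percent")
print("handedness gap:", round(point_gap, 1), "points,",
      round(relative_gap * 100, 1), "percent relative")
```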

Here's another theory. They used to say, about prospects from the Dominican Republic, that they deliberately become free swingers because "you can't walk off the island."  

Suppose that, knowing a certain player is a free swinger, the pitcher aims a bit more outside the strike zone than usual, knowing the batter is likely to swing anyway. If the catcher sets a target outside, and the pitcher hits it perfectly, the umpire may be more likely to miscall it as a strike (at least according to many broadcasters I've heard).

Couldn't that explain why Hispanic players get very slightly more erroneous strike calls? 

In support of that hypothesis, here are K/BB ratios for that same set of batters (total K divided by total BB):

Born in USA:       1.82 K per BB
Born outside USA:  2.05 K per BB 

Again, that seems around the correct order of magnitude.

I'm not saying these are the right explanations -- they might be right, or they might not. The "right answer" is probably several factors, perhaps going different directions, but adding up to 0.4%. 

But the point is: there do seem to be significant differences in hitting styles between Hispanic and non-Hispanic batters, certainly significant enough that an 0.4% difference in bad calls is quite plausible. Attributing the entire 0.4% to racist umpires (and assuming that all races of umpires would have to discriminate against Hispanics!) doesn't have any justification whatsoever -- at least not without additional evidence.

-------

Here's a TLDR summary, with a completely different analogy this time:

Eddie Gaedel's father calls fewer strikes on Eddie Gaedel than Aaron Judge's father calls on Aaron Judge. So Gaedel Sr. must be biased! 

--------

There's another part of the study -- actually, the main part -- that throws everything into one big regression and still comes out with a significant "UBM" effect, which again it believes is racial bias. I think that conclusion is also wrong, for reasons that aren't quite the same. 

That's Part II, which is now here.

----------


(*The author found a similar result for pitchers, who gained an advantage in more called strikes when they were the same race as the umpire, and a similar result for called balls as well as called strikes. In this post, I'll just talk about the batting side and the called strikes, but the issues are the same for all four combinations of batter/pitcher ball/strike.)



Thursday, May 22, 2014

Are black NBA fans less loyal to their home teams?

Black NBA fans seem to be less loyal to their hometown teams than non-black NBA fans, the New York Times has found.

Here's the article, from Nate Cohn of The Upshot. It found data showing that, in ZIP codes where at least 40 percent of residents are black, the home team got a significantly smaller proportion of Facebook "likes" than in other ZIP codes. In Milwaukee, for instance, the map highlighting black areas is almost identical to the map highlighting areas where more fans prefer teams other than the Bucks. Here are those stolen maps:





It's a very interesting finding. But, I'm not convinced it's a race thing.

In general, what can you say about sports fans whose favorite team isn't their own city's?  It seems like they're more serious fans. Here in Ottawa, we have a lot of fans who are ... not bandwagon jumpers, but just people who support the team, by default, because it's the Ottawa team. A lot of them don't know that much about the players, or hockey in general.

But fans who support unlikely teams, like the Sharks or the Predators, probably have more than a passing interest in the NHL. Maybe they like the style of play, or one of their favorite players is there, or even, they want to root for a more successful team.  

Might that explain what's happening in Milwaukee? As it turns out, blacks are indeed more likely to be serious NBA fans. The second sentence of the article says,
"About 45 percent of people who watched N.B.A. games during the 2012-2013 regular season were black, even though African-Americans make up 13 percent of the country's population."

So, what I suspect is that at least part of the explanation is that black areas are being confounded with "high fan interest" areas.  I have no evidence of that, and I might be wrong.  (But, the article has no evidence against it, either.)

-----

It's not just Milwaukee: Cohn discovered the same effect in Cleveland, Memphis, Atlanta, Detroit and Chicago. But, interestingly, there was no effect in Houston, Philadelphia, Dallas, and Washington.  

What would the difference be? Maybe the success of the teams?  The "disloyal" cities' teams averaged 35.5 wins this past season, and four of the six had losing records. The "loyal" fans' teams averaged 41.5, and only one of the four was below .500 (Philadelphia, at 19-63).

That's something, but it doesn't seem that strong.  

Are there some cities where it's culturally acceptable to root for a different team, and other cities where it's not? Is this one of those random "tipping point" things?

Are basketball fans -- or even black basketball fans -- more fervent in Cleveland than they are in Houston? Probably not ... the Nielsen TV demographics report (.pdf), from which the "45 percent of viewers were black" statistic was taken, shows that Dallas and Chicago were almost identical in the percentage of the population that watches or listens to games.

Is the distinction just one of statistical significance, where Houston *does* have a strong effect, just not strong enough to be 2 SD from zero? Maybe it's that some cities are less segregated by ZIP code than others, so the effect is still there but doesn't show up in maps?

Could it be that there are more natural, opposing loyalties some places than others? Here in Ottawa, we have a ton of Leafs fans and Canadiens fans, because the Ottawa team is relatively new, and people hang on to the teams they loved in their childhoods. Did Milwaukee fans grow up rooting for Michael Jordan and the Bulls, which is why they formed weaker attractions to their Bucks? That sounds plausible, but then, it's hard to explain why the same thing holds for the Chicago area.

Any other ideas? Anybody see anything else that might be a relevant distinction between the two groups of cities?



Thursday, June 20, 2013

Are eBay baseball card buyers racially biased?

Here's a nice recent post by Bo Rasny, "Ten Articles on Baseball Cards and Race".  Number ten is actually one of my own posts ... I'm going through the other nine.   Most of them are academic studies, some of which aren't available in full.

But this one is, a downloadable 2011 study called "Race Effects on eBay."  It finds that cards auctioned on eBay by black (African American) sellers sell for significantly less money than cards by white sellers.

Authors Ian Ayres, Mahzarin Banaji and Christine Jolls -- I'll call them "ABJ" -- bought 394 baseball cards on eBay in order to resell them.  Before posting their auctions, they split the cards into two groups, at random (actually, alternating alphabetically, but close enough).  The first group would ostensibly be posted by a black seller, and the second by a white seller.

How would the bidders know the race of the seller?  When posting, ABJ used a photograph of the card held in what was ostensibly the seller's hand.  Here's one of their example photos I'm stealing, just for fun (and because I love 1973 Topps, and because I keep seeing Don Sutton on Match Game):




When all was said and done, it turned out the white sellers' cards went for 14 or 20 percent more than the black sellers' cards, depending on how you look at it.  The result was significant at 2.2 SDs, and the authors conclude that eBay bidders discriminate by race.

It's kind of a fun study.  But, as usual, I'm dubious.  

--------

Suppose you want to find out whether white sellers can sell their car for more money than black sellers.  So you get a bunch of BMWs, and a bunch of Chevrolets, and assign them randomly to your test subjects.  It turns out the white sellers got higher prices than the black sellers.

But, what if the white sellers got more expensive cars to sell?  It could be that the white guys (and gals) wound up with 70 percent of the BMWs, and that might explain why they got the results they did.  If that's what happened, it explains what happened without recourse to race, right?  It doesn't matter that the cars were assigned randomly, with the best random number generator money can buy.  Remember, even the best-designed study is going to give a false positive, by chance, 5 percent of the time.  This would be one of those times.  The fact that we see the intermediate result -- more BMWs for white guys -- lets us catch that false positive for what it is.  

Of course, if the cars were assigned randomly and *we never looked* at which races got which cars, we'd publish our study without hesitation.  It would still be a false positive, but we wouldn't know it.  

--------

Well, that's what happened with the baseball cards.  The authors showed the original purchase prices of the cards in the two groups.  One group sold cards that were originally purchased for an average of $9.82 each, and the other sold cards that were purchased for an average of $9.23 each.  And that difference was statistically significant at the 5 percent level! 

That is: one group got significantly "more expensive" cards to sell than the other group!  It's exactly like the BMW and Chevy example.

Except -- and here's the interesting part -- the difference went the *other way*.  It turned out that the white sellers had the "cheaper" cards, and the black sellers had the "more expensive" cards! 

In that case, it looks like that makes the effect *harder* to explain, not *easier*.  In fact, the effect is "double" what we thought it was.  Not only did the white sellers get more for their cards, but they did so even while selling less worthy cards in the first place!

The thing is, though, I'm not sure they *were* less worthy cards.  And that's the next part of the argument.  

--------

As you can imagine, when the authors auctioned off the cards they just bought, they didn't sell for the exact amount as the purchase price.  Sometimes they sold for more, and sometimes less.

But, mostly less.  A lot less.  It appears that, on average, the authors paid $9.75 for their cards, and sold them for around $6.15.  They took a 37 percent bath.

It seems that ABJ overpaid quite a bit for the cards they bought.  How is that possible, in an auction format?  On eBay, you never pay much more than the second-highest bidder.  Even if you bid $1000 ... if the next highest bid was $1, you'd wind up paying only $1.25.

It might be just because cards in this price range vary a lot in winning bids.  It's kind of random, depending on how many people happen to be interested at that particular time.  

For instance: the card in the picture, 1973 Topps Don Sutton #10, graded PSA 8.  I'm looking that up in eBay's completed auctions right now ... and I find two of them.  (Here's a link, but the results change over time.)

One sold at $7.50.  The other sold six weeks later at $13.49, almost double.  The first one had only two bids, while the second one had ten.  The first auction drew bids only on the last day.  The second one started drawing bids three days before the auction's close.

It's probably just a random thing -- the first auction had one serious bidder, while the second one happened to have two.

Checking out another card from the study, 1975 Topps #195 ... in PSA 8, there were three: $23, $33.57, and $45.  In PSA 7, they're more consistent, from $9.95 to $11.95.  

Finally, I checked 1983 Fleer Wade Boggs, #179, PSA 8 ... there are six, ranging from $1.04 to $7.50.  (The $1.04 has a "PD" downgrade, but, strangely, so does the $7.50!)

So: prices vary a lot.  And, I think, the authors' rules for when to bid amount to selective sampling on the expensive auctions.  

First: they chose only auctions with existing bids.  The auctions where the price gets bid up are the ones most likely to have bids early.  If ABJ chose to visit eBay on a random day, they'd be three times more likely to wind up in the second Don Sutton auction than the first.  

Second: ABJ chose only auctions where the existing high bid was between $3 and $8.  But bids don't reach close to their final level, usually, until close to the end of the bidding... sometimes, even, the last few minutes of a week-long auction.  That means the authors, again, selectively sampled those auctions where the price had been bid up early.

For instance: on those 1983 Wade Boggs cards, ordered by final selling price (low to high), the amount of time the bidding was over $3 was: 

never (the $1.04 card)
1 minute
36 hours
14 hours
9 hours
13 hours (the $7.50 card)

The weighted average of selling prices, by time over $3, is $4.80.  The average price overall was $4.01.  ABJ's strategy, in this case, would have them overpay by 20 percent.
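The same kind of calculation can be sketched with the two Don Sutton auctions from earlier, since I haven't listed all six Boggs prices here. The weights are an assumption: one day of "exposure" for the auction that drew bids only on the last day, and three days for the one that started drawing bids three days before the close. The function name is made up for the example.

```python
def exposure_weighted_mean(prices, exposure):
    """Average price, weighting each auction by how long a random
    visitor would have found it eligible (e.g. with existing bids)."""
    total = sum(exposure)
    if total <= 0:
        raise ValueError("total exposure must be positive")
    return sum(p * w for p, w in zip(prices, exposure)) / total


sutton_prices = [7.50, 13.49]   # final prices quoted above
days_with_bids = [1, 3]         # assumed exposure, in days

plain = sum(sutton_prices) / len(sutton_prices)
weighted = exposure_weighted_mean(sutton_prices, days_with_bids)

# The weighted figure is what a "bid only where bids already exist"
# buyer would expect to pay; the plain figure is the typical price.
print(f"plain mean:    ${plain:.2f}")
print(f"weighted mean: ${weighted:.2f}")
print(f"overpayment:   {weighted / plain - 1:.0%}")
```

Under those assumed weights, the buyer lands in the expensive auction three times out of four, so the expected purchase price sits well above the simple average.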

ABJ acknowledge that they overpaid for the cards.  They write, " ... in purchasing the cards initially we did not exert significant effort to minimize our buying prices." 

It doesn't really matter that the selling prices were lower than the purchase prices ... so this isn't meant to be a criticism.  But the pattern of how the cards were bought is something that's going to factor into my coming argument.  

------

The authors sold the cards, overall, for about 60 percent of the prices they paid.  But, the relationship between purchase price and selling price is actually weaker than that.  In their regressions, the authors show that for every additional $1 they paid for the card, they received only an extra 40 cents.  That's the kind of relationship you'd expect when you overpay differently for some cards than others.  (For instance, if they paid the same for all the cards in a "grab bag" random purchase, the relationship would be zero.)

But, 40 cents is 40 cents, right?  The cards that cost more actually *did* bring in more, so that means the black/white effect is still compounded.  It's still true that the white sellers brought in more money while selling cards that should have brought in *less* money.  Not as much less as originally thought, but still less.  Right?

Well ... it's possible that there's something else going on.  It may sound a little contrived, but ... I think it might be right.  That is, I'm not just playing devil's advocate in trying to shoot down the finding: I actually think it's plausible.  

Suppose (to simplify things) that there are two types of cards sold on eBay.

Group 1 cards are "commodities".  There are lots of copies sold, and prices are pretty constant.  There are many "buy it now" copies available, so nobody bids prices up too high, because you can always just abandon the auction and buy the card at a fixed price.  

Group 2 cards are "obscure".  They're not actually obscure in the real world -- there are lots of them around -- but they're scarce on eBay, since they're low demand and harder to sell.  They don't come up for auction much.  Common cards, say, or semi-commons.  If you have one of those, you might just put it up on a fixed-price basis, since there aren't going to be throngs of people lining up to bid. 

You can't pay too much for "commodity" cards.  For "obscure" cards, though ... it's a crapshoot.  There are few bidders.  If you're the only bidder, you get it cheap.  But if there's competition, the card goes for more than it's "worth", because it's a card that doesn't come up often and that you want for personal reasons, like to complete a set.  

That is: there is wider variance in prices received for "obscure" cards.  Sometimes they go cheap, and sometimes way expensive.  

In my experience selling on eBay, this actually happens.  I once sold a batch of graded commons, all from the same set, and, it turned out, some of them went for two or three times as much as others.  Of course, they were different players, but ... they were all classed as commons, and there was no obvious reason why one should go for that much more than another.  It just so happened that some cards drew more interest than others, for what I think was just luck of the draw.  

As I argued, ABJ would selectively wind up in the "expensive" obscure card auctions, since they'd never be the only bidder, and they'd never bid until the price passed $3.  The "obscure" cards are probably the ones for which they overpaid the most.

Now -- and here's the weakest part of my argument -- maybe those are the cards they paid more for, on an *absolute* basis.  Maybe the scarce cards went for, say, $12 each, on average, and the commodity cards went for, say, $7 each, on average, just to pick numbers out of a hat.

In that case ... it's possible that the more expensive cards, overall, would actually be expected to return less on resale than the cheaper cards!  And, since the black sellers got more expensive cards, at a statistically significant level, maybe *that's* why they earned less.  Not racial bias, but worse cards.

-------

Here's a made-up example of how that might happen.  There are five commodity cards, where you pay $2, $4, $6, $8, and $10, and their resale value is 80% of what you paid.  And, there are five obscure cards, where you pay $8, $9, $10, $11, and $12, but those have a resale value of only $3 each.

The obscure cards cost you an average $10 each; the commodity cards cost you $6 each.

If you predict resale price from purchase price, using all ten cards in the regression, the coefficient is positive: for every additional $1 you pay, you get an extra 16 cents in resale.

But: suppose you give the black sellers the $4 and $8 commodity cards, and the $8, $11, and $12 scarce cards.  Now, they're selling cards bought for an average of $8.60.  The white sellers get the rest, which were bought for an average of $7.40.  

The black sellers receive $18.60 for their cards.  The white sellers receive $20.40.

The white sellers got higher prices than the black sellers, even though it was the black sellers who sold the "more expensive" cards.  When you run a regression that includes a dummy variable for race, you find that the coefficient for "black seller" is negative 57 cents.

That works -- it matches the direction of every effect in the actual study.  The black sellers receive less money for cards that actually cost more, with no race bias whatsoever.  What makes this work is that the *category* of cards is more important than the *purchase price*, and cards in the loser category tend to be the ones that cost more.
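For anyone who wants to check the made-up example, here's a short sketch that reruns both regressions on the ten cards. It uses NumPy's least-squares solver; the helper name `ols` and the variable names are just for this illustration.

```python
import numpy as np

# Ten made-up cards: five commodity cards, then five obscure cards.
paid   = np.array([2, 4, 6, 8, 10,   8, 9, 10, 11, 12], dtype=float)
resale = np.array([1.6, 3.2, 4.8, 6.4, 8.0,   3, 3, 3, 3, 3], dtype=float)
# 1 = black seller: the $4 and $8 commodity cards,
# and the $8, $11, and $12 obscure cards.
black  = np.array([0, 1, 0, 1, 0,   1, 0, 0, 1, 1], dtype=float)


def ols(y, *columns):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


_, slope_price_only = ols(resale, paid)
_, slope_price, black_coef = ols(resale, paid, black)

print(f"avg paid, black vs white:   {paid[black == 1].mean():.2f} vs {paid[black == 0].mean():.2f}")
print(f"total resale, black vs white: {resale[black == 1].sum():.2f} vs {resale[black == 0].sum():.2f}")
print(f"resale per extra $1 paid:   {slope_price_only:.2f}")   # about 0.16
print(f"'black seller' coefficient: {black_coef:.2f}")         # about -0.57
```

Nothing in the data knows about race; the negative coefficient comes entirely from which category of card each group happened to get.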

--------

Why do I think this is what happened?  Here's one thing.

The lowest selling price for any card was 99 cents, because that was what ABJ chose as the minimum bid.  If you look at the study's "Figure 3b", which shows selling prices, it turns out that white sellers had only 7 cards that sold at the minimum, while black sellers had 16 of them.  Now, that could be racial bias ... but it could also be that black sellers just happened to have more of the "obscure" cards: the overpriced ones, which are more likely to draw only one bidder.

Why do I favor the "obscure" hypothesis over the "racial bias" one?  I find it much more plausible, in general.  The idea that eBayers are so racially biased that, for nine extra cards, *none* of the thousands of baseball card collectors on eBay wanted to bid $1.24 because the seller was black ... that seems way too extreme.  

But even if you don't agree with that ... there are probably other things it could be.  When the study itself shows that there was a significant difference between the two sets of cards, you have to at least suspect that there might be something else going on, don't you?

Look, suppose it turned out that the black sellers had wound up with mostly hockey cards, and white sellers with mostly baseball cards.  Wouldn't it be plausible that hockey cards resell for less because they're easier to overpay for?  Or, suppose that the black sellers had wound up with ungraded cards, and the white sellers with graded cards.  Again, couldn't the argument be that ungraded cards are easier to overpay for in an auction format?

Now, suppose that the black sellers wound up with cards that originally cost more, and white sellers with cards that originally cost less.  Which is what happened!  Isn't it plausible to argue that there might be something about the "expensive" cards that makes them actually harder to resell than the "cheap" cards?  Even if I haven't got that "something" exactly right?

--------

If we had the complete data, we'd have a way to test my hypothesis, that the price difference between the groups is causing the effect.  

You could take a random sample of 50 cards from the white group, and 50 cards from the black group.  If the black group still has more expensive cards, on average, throw it away and repeat.  Eventually, after several tries, you'll have a random sample with roughly equal card prices in both groups.  Test that new sample, based on the original eBay sales, and see if the results still hold.  

You can repeat that a few times, if you like, and see what happens.
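Here's a rough sketch of that resampling procedure, assuming the data were available as lists of (purchase price, selling price) pairs for each group. Everything here is hypothetical: the function names, the 10-cent tolerance for "roughly equal", and the cap on the number of tries are my own choices, not anything from the study.

```python
import random
from statistics import mean


def balanced_sample(white, black, k=50, tol=0.10, max_tries=10_000, seed=0):
    """Draw k cards from each group, redrawing until the two groups'
    average purchase prices are within `tol` dollars of each other.

    `white` and `black` are lists of (purchase_price, sale_price) pairs.
    Returns (white_sample, black_sample), or None if no balanced draw
    turns up within `max_tries` attempts.
    """
    rng = random.Random(seed)
    for _ in range(max_tries):
        w = rng.sample(white, k)
        b = rng.sample(black, k)
        gap = mean([paid for paid, _ in w]) - mean([paid for paid, _ in b])
        if abs(gap) <= tol:
            return w, b
    return None


def sale_gap(white_sample, black_sample):
    """Average selling price, white sellers minus black sellers."""
    return (mean([sold for _, sold in white_sample])
            - mean([sold for _, sold in black_sample]))
```

You'd call `balanced_sample` a few times with different seeds and look at `sale_gap` each time. If the gap shrinks toward zero once purchase prices are balanced, that points to the cards rather than the sellers.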




Sunday, November 25, 2012

Another NBA race study

A recent academic paper claims to prove that NBA coaches discriminate in favor of players of their own race, by giving them extra playing time.  (It's been mentioned in the press here and here.)

Unlike some of the other race studies I've written about, where the problems were subtle, this one is obvious.

------

The authors start by showing differences between white and black players.  From 1996-97 to 2004-05 -- the seasons covered -- the black players performed better than the white players.  In the average of their previous 20 games, they scored 1.4 more points per 48 minutes, and had more assists and steals.  The white players, on the other hand, committed fewer turnovers, and grabbed more rebounds.  


The white players' advantages seem smaller, and that's borne out by the fact that the black players got 4.8 minutes more playing time per game.

So, the black players seem generally better than the white players.

Having shown that, the authors now run a regression on a whole bunch of stuff -- including performance stats -- to predict minutes played. 

Before the regression, the black players got 4.8 minutes more on the floor than the white players.  After the controls, though, it goes the other way: being black *decreased* playing time by 2.9 minutes.

Clearly, the regression doesn't do a particularly good job predicting minutes played. 
 

Remember, the most basic comparison possible showed that the black players played 4.8 minutes more than the whites.  After controlling for everything in the regression, there's still a difference of 2.9 minutes, now in the opposite direction.  Even after all that regressing, there still appear to be large unexplained differences between whites and blacks.

It's pretty obvious why this might happen: playing time isn't linear.  If you have twice the per-minute stats of Kobe Bryant, you're not going to play 80 minutes a game.  And if you have only 1/10 the stats, you're not going to play 4 minutes: you're going to be out of basketball. 

So the model is very poor.  And, since whites and blacks appear to be very different in their statistical characteristics, the model is inaccurate for them in different ways. 

So if the black players get 2.9 minutes less playing time than the regression thinks they should, it's probably that the model overweights the things black players do, and underweights the things white players do. 

In summary, the model the authors used overpredicts for black players -- even ignoring the race of the coach.

-------

So, what happens when the authors include a dummy for the player and coach being of a different race?

Well, most coaches are white, and most players are black.  So, "white-coach/black-player" is going to be much more frequent than "black-coach/white-player".  If the ratios are 70/30 in both cases, the "different race" bucket is going to be 84 percent black players.
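The 84 percent figure is quick to verify. This sketch assumes, as the 70/30 example implicitly does, that a coach's race and a player's race are independent; the function name is made up for the illustration.

```python
def black_share_of_mismatches(p_white_coach, p_black_player):
    """Fraction of 'different race' coach/player pairs in which
    the player is black, assuming coach and player race are independent."""
    white_coach_black_player = p_white_coach * p_black_player
    black_coach_white_player = (1 - p_white_coach) * (1 - p_black_player)
    mismatches = white_coach_black_player + black_coach_white_player
    return white_coach_black_player / mismatches


share = black_share_of_mismatches(0.70, 0.70)
print(f"{share:.1%}")  # 84.5%
```

That's 49 percent of all pairings against 9 percent, so the "different race" dummy is very close to being a second "black player" dummy.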

And we know the model overpredicts for black players.  And that's why, when the player is of a different race than the coach, he gets less playing time than the model thinks he should.  Because, 84 percent of the time, he's black, and the model is biased too high for black players.

It's not necessarily race bias at all; it's just a consequence of having a bad model.

------

In PowerPoint form:

-- the model overpredicts for black players;
-- the "different race" case is overwhelmingly black players;


and therefore,

-- the model overpredicts for the "different race" case. 

That's really all that's going on here. 


Wednesday, May 23, 2012

Racial bias and baseball card values

Recently, I found out about "Econ Journal Watch," a journal and website that debunks bad papers.  According to its website, EJW

"watches the journals for inappropriate assumptions, weak chains of argument, phony claims of relevance, and omissions of pertinent truths."

Most excellent!  And, this issue, there's a sports article, critiquing and extending a study that searched for racial bias in baseball by looking at baseball card values.

In 2005, four researchers -- John D. Hewitt, Robert Munoz Jr., William L. Oliver, and Robert M. Regoli [call them HMOR] -- published a study in "Journal of Sport and Social Issues" that looked at the rookie card values of 51 Hall of Famers.  They ran a regression to predict card price based on the player's statistics, race, and scarcity of the card.  They found no significant effect for race.

Now, David W. Findlay and John M. Santos reviewed the HMOR study, and tried to reproduce it.  They found a few problems.

-- first, there were some problems with some of the data being wrong -- probably transcription errors.  They fixed those.

-- then, they found that the original authors had taken some of their stats from one edition of Total Baseball, and some from another edition (in which the formulas had changed).  They corrected that, too.

-- third, they found that, for five of the players, HMOR had used career numbers from the 1989 edition of Total Baseball -- even though the players were still active at that time!

-- fourth, they noticed that the authors used scarcity numbers from PSA Authenticators, but card values from Beckett.  When they substituted PSA values instead, they got a much better fit.

-- fifth, they criticize the authors for omitting Hispanic players from the sample (I don’t agree that this one is a problem).

After all that, they reproduced the analysis, and still found that there was no significant effect for race.  However, to their credit, they write,

“Although our results indicate that player race has no statistically significant effect on baseball card prices, we are mindful of Ziliak and McCloskey (2004, 334) who note that "statistical significance, to put it shortly, is neither necessary nor sufficient for a finding to be economically important." The estimated coefficient on the Black dummy variable indicates that the price of a black player’s rookie card, all else fixed, is 9.3% lower than that of an otherwise identical white player.”

------

Excellent stuff ... “Econ Journal Watch” is providing an extremely valuable function, giving authors a place to publish their critiques, and thus creating an incentive to do this kind of checking.  Findlay and Santos write that they submitted their article to the journal that published the original, but it was rejected as “not a good fit.”  One suspects that if EJW hadn’t existed as a backup, they wouldn’t have bothered to investigate in the first place.

So, kudos to EJW, and to Findlay and Santos.

------

While data errors are indeed a big concern -- especially the ones resulting from truncated careers -- I think there are problems with the study that are far more serious.  Those, however, are more in the line of “subject matter” issues.  Even with the data corrected by Findlay and Santos, the flaws are so large that I don’t think the study means anything. 

-----

1.  For their measure of player performance, the authors used Pete Palmer’s “Total Player Rating” and “Total Pitcher Index”.  Those are denominated in Runs Above Average.  Do we really value a player only by his career runs?  The various Hall of Fame methods, such as Bill James’, all recognize that there are other factors that influence Hall of Fame voting, such as pitcher wins, times leading the league, times hitting .300, and so on.  Wouldn’t it be expected that collector popularity be similar? 

In fairness, the authors did try to improve on the measurement by trying average runs per season, instead of career total runs.  They found little difference.  

Still, that’s not enough.  It implies that Lou Brock should be only as esteemed as any other player producing +10.5 runs over a long career.  That ignores that there are very good reasons that Brock is in the Hall, whereas most other players at +10.5 are not.

If white players tend to create runs in ways that are more valued than ways that black players create runs, that would create a false perception of racial bias in favor of whites -- and vice-versa.  Even more important: the study is very small, with only 51 players (and only 2 black pitchers!).  Even if blacks and whites are the same, it’s very possible that just by random chance, the whites in this study just happened to create runs in more popular ways.  Over several hundred players, you might be able to assume that the effects would even out.  But not with 51 players.

2.  For card scarcity, the authors used data provided by Professional Sports Authenticator (PSA), a company that grades, authenticates, and slabs cards.  

The company provides a "Population Report", listing the number of each card graded by PSA.  But PSA doesn’t grade cards randomly -- it grades them at the owner’s request and expense.  It stands to reason that owners will submit more valuable cards much more often than less valuable cards.  That will tend to understate the actual scarcity.

To take an extreme example: from the 1960 Topps set, there were 187,192 cards graded.  From the 1988 Topps set, there were only 8,043.  But, of course, there were many, many more cards printed in 1988 than 1960 -- from these estimates, by a factor of perhaps 100 (and even that seems a bit low to me).  People just don’t get 1988 Topps cards graded -- because the cards are worth a penny, and grading costs $10 or $20 or more.

Using the PSA numbers conflates two conflicting effects: scarce cards are graded less often.  But scarce cards are expensive, and expensive cards are graded *more* often.  There’s no obvious way to figure out how to break down the two effects.

And, this creates a very strong bias.  There were more white superstars than black superstars in the 50s.  But the model underestimates the scarcity of their rookie cards.  Therefore, the model predicts a lower price, which can be misread as a racial preference for whites.
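To put a rough number on the grading problem in point 2, here's the arithmetic on the 1960 and 1988 Topps counts quoted above. The print-run ratio of 100 is only the rough guess from the text, so treat the result as an order-of-magnitude illustration.

```python
graded_1960 = 187_192    # PSA-graded 1960 Topps cards (from the text)
graded_1988 = 8_043      # PSA-graded 1988 Topps cards (from the text)
print_run_ratio = 100    # assumed: 1988 print run / 1960 print run

# How many times more 1960 cards were graded, in raw counts.
graded_ratio = graded_1960 / graded_1988

# How many times more likely any single 1960 card was to be graded,
# once the (assumed) difference in print runs is factored in.
per_card_rate_ratio = graded_ratio * print_run_ratio

print(f"graded count ratio:        {graded_ratio:.1f}x")
print(f"per-card grading rate ratio: {per_card_rate_ratio:,.0f}x")
```

If the per-card grading rate can differ by a factor in the thousands between sets, then the population report is measuring owners' willingness to pay for grading at least as much as it's measuring scarcity.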

3.  The only two factors the studies considered were performance and scarcity.  But there are obviously other important reasons that a player may be more popular than another.  For instance: team.  It goes without saying, doesn’t it, that a New York Yankee superstar should be more popular than, say, a Minnesota Twins superstar with the same stats?

If the Yankees were less likely to have black superstars than other teams, that would account for some of the difference.  If the Yankees were more likely to have white superstars with low print runs but high grading numbers -- say, Mickey Mantle -- that would cause the model to doubly underestimate what the value of the card should be.

4.  There are many other factors that influence popularity, that are specific to the particular player.  Mark Fidrych and Kerry Wood, for instance, are loved for reasons other than their career totals.  We Blue Jays fans have a bigger soft spot for Ernie Whitt than for George Bell, for reasons that (I would argue) are related more to personality than race.

You’d also think that players who spent their career with one team would have different fan bases than players whose careers spanned multiple teams.  Carl Yastrzemski, for instance, had lots of seasons to make Red Sox fans love him, and the fact that he played his entire career there makes fans love him more.  On the other hand, Dave Winfield -- the most similar player to Yastrzemski -- left strong memories in at least four different places. 

What’s more important for popularity: having lots of short-term fans in different cities, or having long-term fans in one city? 

I don't know.  But the answer matters.  And it won't necessarily even out in a sample of only 51 players.

------

Anyway, I could probably go on ... the point is, that any one of these four factors could significantly affect the findings of this study.  All four, taken together ... well, I don’t think the results tell us anything at all about how race affects card prices -- or even about how performance affects card prices, or how scarcity affects card prices.

Yes, the authors screwed up the data a little bit, but ... well, that’s by far the least of this study’s problems.


Hat Tip: Marginal Revolution





Thursday, July 28, 2011

More fastballs = fewer called strikes

A couple of weeks ago, I noticed that, from 2004 to 2006, even though hispanic and black pitchers received a lower percentage of called strikes than white pitchers (called strikes as a percentage of called pitches), they were able to post above-average numbers.

The reason, it turned out, was that despite not getting as many called strikes, they got a lot more *swinging* strikes, and that more than compensated.

I wondered why that would happen, what was so special about those pitchers. Then, commenter GuyM e-mailed me a suggestion: it looked like the ten pitchers I highlighted were all fastball pitchers.

I went over to Fangraphs and looked them up ... and Guy was right. With the exception of Ray King, the other nine pitchers threw fastballs at or above the MLB-average rate.

So, I did a more formal test. For 2004, 2005, and 2006 (separately), I split the league into the usual nine pitcher/umpire combinations (white/hispanic/black), and figured out the average fastball percentage (FB%) for each group that year. (I didn't have breakdowns on a per-pitch basis, so I used the player's overall season rate for each cell.)

Here's 2005:

Pitcher ------ White Hspnc Black
--------------------------------
White Umpire-- 62.01 61.87 67.86
Hspnc Umpire-- 61.74 64.91 70.89
Black Umpire-- 62.20 60.57 66.78
--------------------------------

There's a big bump in the H/H row and column -- a lot more fastballs than you would expect. It would be hard to argue that that's racial bias, since the pitch chosen is a deliberate decision from the pitcher and catcher.

It just seems like, in 2005, the H/H pitchers happened to throw a lot of fastballs.

The situation was reversed in 2006:

Pitcher ------ White Hspnc Black
--------------------------------
White Umpire-- 61.09 60.50 62.73
Hspnc Umpire-- 61.93 58.70 58.31
Black Umpire-- 60.80 61.72 61.53
--------------------------------

Suddenly, the H/H group is throwing many FEWER fastballs. Actually, it looks like fastballs were down across the board in 2006 -- I bet that was a change in how the stringers recorded pitches, rather than an actual change in what pitchers threw. In any case, even after adjusting for that, the H/H group is low.

So what's going on? Well, it's probably just different pitchers who make up that cell. It's somewhere around 1,000 pitches each year, which means the equivalent of maybe 20 hispanic pitchers starting against hispanic umpires. Just by chance, the 20 pitchers in 2005 were fastball pitchers, and the 20 pitchers in 2006 weren't.

Finally, here's 2004, just for completeness. It doesn't really show anything interesting.

Pitcher ------ White Hspnc Black
--------------------------------
White Umpire-- 61.81 61.86 66.52
Hspnc Umpire-- 61.66 61.88 64.75
Black Umpire-- 62.66 65.54 66.40
--------------------------------

So, as I was saying ... we want to try to figure out if more fastballs lead to fewer called strikes. To figure that out, I ran a regression to predict fastball percentage based on called strike percentage, using all 27 cases in the above three tables. Since the overall FB% seems to vary from year to year, I added two dummy variables for the individual seasons.

The result: an r-squared of 0.4, and statistical significance. More important, the results of the regression equation: a relationship where, for every 1 percentage point more called strikes you get, you're likely to have thrown 1.67 percentage points fewer fastballs.

When I took out the bottom two cells in each of the "Black" columns (in which the sample sizes are very small, around 100 and 300 pitches each respectively), the result was even more significant (r-squared 0.53), and the relationship changed from 1.67 to 1.1.

So, we have a pretty good indication that more fastballs cause fewer called strikes. Technically, I shouldn't assume causation -- the data leave open the possibility that fewer called strikes cause fastballs, or that some third variable causes both lots of fastballs and fewer called strikes. But neither of those seems very plausible.

-------

Here's a more intuitive way to see the relationship. Here's 2005 again, for fastballs:

Pitcher ------ White Hspnc Black
--------------------------------
White Umpire-- 62.01 61.87 67.86
Hspnc Umpire-- 61.74 64.91 70.89
Black Umpire-- 62.20 60.57 66.78
--------------------------------

And here's 2005 for called strikes:

Pitcher ------ White Hspnc Black
--------------------------------
White Umpire-- 32.15 31.20 31.74
Hspnc Umpire-- 31.55 31.04 24.19
Black Umpire-- 31.39 31.53 30.88
--------------------------------

If you compare the charts, you can see for yourself that the high FB% cells generally seem to be paired with low CS%.
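To put a number on that eyeball comparison, here's a small sketch that correlates the nine 2005 cells from the two tables. It is not the regression described earlier (that one used all 27 cells plus season dummies); it treats every cell equally regardless of sample size, and it uses NumPy purely for convenience.

```python
import numpy as np

# Rows: white, hispanic, black umpire.  Columns: white, hispanic, black pitcher.
fb_2005 = np.array([[62.01, 61.87, 67.86],
                    [61.74, 64.91, 70.89],
                    [62.20, 60.57, 66.78]])
cs_2005 = np.array([[32.15, 31.20, 31.74],
                    [31.55, 31.04, 24.19],
                    [31.39, 31.53, 30.88]])

fb = fb_2005.ravel()
cs = cs_2005.ravel()

# Correlation between fastball rate and called-strike rate across cells.
r = np.corrcoef(fb, cs)[0, 1]

# Slope of FB% on CS%, the same direction as the regression in the text.
slope, intercept = np.polyfit(cs, fb, 1)

print(f"correlation: {r:.2f}")
print(f"FB% change per extra point of CS%: {slope:.2f}")
```

The correlation comes out clearly negative, though with only nine cells, and one very low called-strike cell carrying much of the weight, it shouldn't be read as more than an illustration.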

-------

Another important thing is that, now, we can't assume that when a pitcher gets few called strikes, his performance suffers. In fact, if the reason for fewer called strikes is more fastballs, it could be the other way around.

For instance, in the center cell in 2005, where the hispanic pitchers got only 31.04 percent called strikes, they gave up a very good 3.76 RC27 (like a 3.50 ERA). But in 2006, when they got 34.16% called strikes (which is very high), the batters facing them had an RC27 of 5.52. The more called strikes, the worse the performance. Very much opposite to the way you'd think.

That's when we look mostly *between* pitchers -- pitcher A, with more called strikes, is likely to be worse than pitcher B, with fewer called strikes. We don't know the relationship within the *same* pitcher. If pitcher A gets more called strikes in one start than another, is he likely to be worse in that start? We don't know.

So, when the Hamermesh study asserts that the H/H group benefits from the umpires having called more strikes in their favor, that's not necessarily true. It might be, but it also might not be. It's certainly true if the cause IS umpire bias, because that just changes the identical pitch from a ball to a strike. But if the cause is pitch selection, the relationship could be the exact opposite.

--------

Now, in my own little study, which was an attempt to reproduce the results of the original Hamermesh study, I did indeed find that the CS% in the "hispanic/hispanic" cell was very high. Now, we have an explanation other than umpire bias -- pitching style. It could just be that the overall H/H cell had fewer fastball pitchers than expected, and that caused the results.

But, while that would explain *my* results, it won't explain the original Hamermesh results. That's because the Hamermesh study controlled for the identity of the pitcher. So, if the center cell did indeed feature a lot of finesse pitchers, their study would have adjusted for that, even though mine didn't.

Still, we have a possible *weaker* explanation. Suppose that pitchers vary their fastball tendencies from year to year. One season, they might throw 65% fastballs, but, when they're a year or two older, their slider improves, and now they only throw 55% fastballs. The Hamermesh study adjusted for the identity of the player, but not for the individual player/season. So, if hispanic pitcher X threw 55% fastballs in the season where he faced the hispanic umpire, but 60% fastballs in the season where he faced the white umpire, that would bias the results and make it look like the umpire was biased.

Or, even more granular: if pitchers change their repertoire *from game to game*, that would also do it. For instance, suppose hispanic pitcher Y finds out his curve ball isn't working well one game, and relies more on his fastball. If that happened more in games where the umpire was white, then, again, that would make the hispanic umpire look biased in his favor.

It's important to keep in mind that this is a valid criticism only if pitch selection differences are clustered over games or seasons. If a pitcher randomly decides to throw a fastball this pitch, but a breaking ball next pitch, that's included in the significance levels of the original study. It's only when the fastballs are *clustered* within umpires, rather than random over pitches, that that's something that affects the significance levels.

-------

So where does this leave us? Well, we haven't really found any smoking gun evidence that explains what the Hamermesh study found, since that study did control for who the pitcher is (which means they effectively controlled for fastball percentage). However, we *do* have a potential explanation, which is non-random pitch selection.

Normally, I hate when a study is criticized on the grounds of "you didn't control for X". That's a lazy argument, and it's an argument that can be leveled at any study, because, no matter how thorough, there's always *something* that hasn't been controlled for. Also, there's often no reason to believe X is important to control for. And, even if it is, there's no reason to believe that it's non-randomly distributed among the other variables.

In order to be taken seriously when you say "you didn't control for X," you need to come up with (a) an argument that X is actually an important factor, important enough to change the results, and (b) that there is reason to believe X is distributed non-randomly.

That's what I'm trying to do here. First, (a) I think I have shown that pitch type does seriously and significantly affect called strike percentage. Second, (b) it's plausible that pitch type may vary *by the conscious choice of the pitcher* over seasons, and perhaps even games.

If I knew for sure that (b) happened -- if we had data that showed that it was common that, for some games a pitcher chooses to throw 70% fastballs, and some games he chooses to throw only 50% fastballs -- that would be enough to prove that the Hamermesh study's confidence intervals were overstated. Since we don't, it's just a possibility.

We don't know *for sure* that pitch types tend to cluster together. But it's a reasonable thing to look at in a future study. Based on the little I've looked at it so far, I suspect that it's a small but important factor.

-------

P.S. Thanks to GuyM for his e-mail discussion, and to Fangraphs' David Appelman for assistance in getting the FB% data I needed.
