Do younger brothers steal more bases than older brothers? Part VI
Note: I'm not happy with this post. I've already revised it twice because I found things that were wrong ... as it is now, I think it's right, but it's not very focused and I think some of the emphasis might be on the wrong issues.
So, just a warning that I plan to redo it soon.
Although this is "Part 6" of the discussion of the sibling study, I'm going to try to make it stand alone, so if you're coming to this thread for the first time, read on.
A few months ago, Frank Sulloway and Richie Zweigenhaft published a study on brothers (siblings) in baseball. They came to the conclusion that a younger brother is about 10 times as likely as his older brother to attempt more stolen bases over his career (adjusted for opportunities). Ten times is a LOT.
After I read the paper, I believed the result was incorrect. My previous posts on the subject explained why. Following that, I had an e-mail conversation with one of the authors. I don't believe either of us was able to convince the other of the rightness of our respective positions.
Two weeks ago, the authors released a second paper, which attempted to clarify the arguments and address some of my points. I remain unconvinced. And, indeed, I think I've been able to come up with a better, more easily understood argument that explains why.
The authors' study comprised approximately 95 sets of siblings. In those 95 pairs,
58 times the younger brother attempted more steals
37 times the older brother attempted more steals.
I think the authors and I would agree on this (although my numbers might be off by one or two because of the way the authors handled cases where there were more than two brothers, like the Alous).
So the younger brothers had a 58-37 record against their older siblings, which works out to .610. The SD of winning percentage in 95 games is about .051. So .610 is a little more than 2 standard deviations above .500. That's statistically significant at the 5% level.
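As a quick sanity check, that back-of-the-envelope significance test can be sketched in Python (my own sketch, not anything from the paper):

```python
import math

wins, losses = 58, 37
n = wins + losses                 # 95 brother pairs
win_pct = wins / n                # about .610

# Under a null hypothesis of .500, the SD of winning percentage
# over n coin-flip games is sqrt(.5 * .5 / n).
sd = math.sqrt(0.5 * 0.5 / n)     # about .051
z = (win_pct - 0.5) / sd          # a little more than 2 SDs

print(round(win_pct, 3), round(sd, 3), round(z, 2))
```

The z-score comes out around 2.15, which is why .610 clears the 5% significance bar.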
I believe this is a legitimate finding, and if the authors had left it at that, we'd have no disagreement.
Another thing they did was to express the result as odds instead of as a winning percentage. If you divide 58 by 37, you get 1.57. So you can say something like,
"Younger brothers had odds of 1.57 to 1 of beating their older brothers in steal attempts."
That's perfectly accurate. Again, we have no disagreement.
What the authors did next is where we start to disagree. They took the 95 cases, and split them up into groups, based on the order in which the brothers were called up to the major leagues. To keep things simple, I'm going to leave out the case where the brothers were called up the same season, and concentrate on the case where the brothers were called up in different seasons.
That leaves 80 cases, in which the younger brother's record was 47-33. That's an odds ratio of 1.42 (47 divided by 33). But that's not what the authors come up with.
Because, look what happens when you split them up based on who was called up first:
-- When the older brother was called up first, the brother called up first went 32-42 against the other brother.
-- When the younger brother was called up first, the brother called up first went 5-1 against the other brother.
Converting that to odds:
-- When the older brother was called up first, the brother called up first had odds of .76:1 (32:42).
-- When the younger brother was called up first, the brother called up first had odds of 5.00:1 (5:1).
Now, to compare the younger to the older, the authors divide 5.00 by .76 and get an "odds ratio" of 6.56.
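The division works out like this (again, my sketch of the arithmetic, using the records as tallied above):

```python
# First-callup's record against his brother, split by callup order:
#   older brother called up first:   32-42
#   younger brother called up first:  5-1
odds_older_first = 32 / 42        # about .76:1
odds_younger_first = 5 / 1        # 5.00:1

# The "odds ratio" the authors compute is one odds divided by the other.
odds_ratio = odds_younger_first / odds_older_first
print(round(odds_ratio, 2))       # roughly 6.56
```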
As it turns out, the authors report an even more extreme result, 10.58 instead of 6.56. Why? Mainly (and I'm simplifying here) because the 5-1 can also be interpreted as 5-0, depending on how you handle families with more than two siblings. 5-0 is an odds ratio of infinity. The authors use a mathematical technique to average the "infinity," the "6.56," and the result for when the siblings get called up the same year ("7.0"), and it works out to 10.58.
Which is where the authors get their statement,
"It may be seen that the common odds ratio is 10.58, as previously reported [in our original paper]."
That sentence, actually, is pretty much correct.
So if the sentence is correct, what's the problem? The problem is the authors' interpretation of what an odds ratio means. Remember that odds ratio of 6.56 above? That's the correct number. But the authors write,
"For brothers called up to the major leagues first, a younger brother was 6.56 times more likely than an older brother to have a higher rate of attempted steals."
That is not true. That is not what odds ratio means.
Let's suppose you have 100 younger brothers called up first, and 100 older brothers called up first. 6.56 times more likely implies that you'll have 6.56 times as many "wins" in the first group as in the second group. But that's not the case. The younger brothers would have a ratio of 5:1, which, for 100 trials, is 83-17. The older brothers would have a ratio of 32:42, which, for 100 trials, is 43-57.
That means that the likelihood of winning rises from 43 (out of 100) to 83. That's 1.93 times more likely, not 6.56 times more likely.
The "1.93" figure is called the "relative risk". Relative risk is not the same thing as odds ratio.
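To make the distinction concrete, here's a small sketch (mine, not the authors') that computes both figures from the same two records:

```python
def odds_to_prob(odds):
    """Convert odds of x:1 into a probability of winning."""
    return odds / (1 + odds)

p_younger_first = odds_to_prob(5 / 1)      # 5:1   -> about .83
p_older_first = odds_to_prob(32 / 42)      # 32:42 -> about .43

# Relative risk: ratio of the two probabilities of winning.
relative_risk = p_younger_first / p_older_first      # about 1.93

# Odds ratio: ratio of the two odds.
odds_ratio = (5 / 1) / (32 / 42)                     # about 6.56

print(round(relative_risk, 2), round(odds_ratio, 2))
```

Same data, two very different-sounding numbers: the relative risk is about 1.93, while the odds ratio is about 6.56.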
So if "6.56 times more likely" is not the correct interpretation of an odds ratio of 6.56, what IS the correct interpretation? It's this:
An odds ratio of 6.56 means that if you place a $100 bet on the less likely outcome, your potential winnings will be 6.56 times as high as if you bet $100 on the more likely outcome.
Specifically for this case: If you bet $100 on the 5:1 favorite, you'll win $20. If you bet $100 on the 32:42 underdog, you'll win $131.25. And, $131.25 divided by $20 is 6.56.
*That* is what the odds ratio really means. You can decide how intuitively meaningful it is. It probably means more to you if you're a sports bettor than if you're not.
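The betting arithmetic can be checked with a short sketch (my own framing of the payout rule, not anything from the paper):

```python
def payout(stake, odds_in_favor):
    """Winnings on a bet at odds_in_favor:1 on your pick.
    A 5:1 favorite pays stake/5; a pick whose odds are only
    32:42 in its favor pays stake * 42/32."""
    return stake / odds_in_favor

fav = payout(100, 5 / 1)       # bet on the 5:1 favorite
dog = payout(100, 32 / 42)     # bet on the 32:42 underdog

print(round(fav, 2), round(dog, 2), round(dog / fav, 2))
```

The favorite bet wins $20, the underdog bet wins $131.25, and the ratio of the two payouts is the 6.56 odds ratio.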
So why is that a problem? Doesn't it give a real sense of what the 6.56 (or 10.58) figure actually means? Why, then, do I say it's misleading?
Because it exaggerates the scale of the effect. Roughly, it squares it.
Suppose home field advantage is 2:1 -- the home team has twice the chance of winning as the visiting team. That means that, in turn, the visiting team has half the chance of winning, which is an odds ratio of 0.5:1.
If I do what the authors did, and divide the home team odds by the visiting team odds, I get 2 divided by 0.5, which is 4. But I cannot say, "a home team is 4 times more likely to win than a visiting team." That would be wrong: the correct odds are obviously 2:1. What I'm actually saying is, "if I bet $100 on the visiting team, I'll win 4 times as much money as if I bet on the home team."
Now, that's all well and good, but I would argue that the important measure is the 2:1, not the 4:1. We get the "4" by comparing the 2:1 favorite to the 2:1 underdog. In effect, the odds ratio is roughly "squaring the odds". Which makes sense: if you divide X by the reciprocal of X, you get X squared.
If you take the 6.56 odds ratio, and figure the square root, you get 2.56. That, I think, is a reasonable guess at what the effect actually is.
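In code, the "un-squaring" is just one line (my sketch):

```python
import math

# The odds ratio computed from the two callup groups: 5:1 vs. 32:42.
odds_ratio = (5 / 1) / (32 / 42)      # 6.5625

# Since the odds ratio roughly squares the effect, the square root
# is a reasonable guess at the effect itself.
effect = math.sqrt(odds_ratio)        # about 2.56
print(round(effect, 2))
```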
Put another way: the 6.56 occurs when you switch the status of *two* players -- you make the young one get called up first, and you make the old one get called up last. How do you split up the effect between the two players? The most obvious way is to "give" them 2.56 each.
Anyway, that's mostly semantics, and it's about the odds ratio, which is not the interesting question.
The interesting question, to me, is, how often will a younger brother have more steal attempts than his older brother, even controlling for callup order? The answer is nothing near 10.58.
Look at it this way:
-- if the younger player gets called up first, the odds are 5:1.
-- if the younger player gets called up last, the odds are 1.31:1 (42:32).
Doesn't it follow that the younger player's overall odds have to be somewhere between 1.31 and 5? After all, the younger player is either called up first, or he's called up last. The best case is when he's called up first, and the odds are 5:1 that he'll beat his brother. So the *overall* odds of beating his brother can't be *more* than 5:1, right?
Back to the odds ratio: if the authors agreed with me, and reverted to 3.25 (the square root of 10.58) instead of 10.58, would I believe it? Well, no, because of the confidence interval issue.
There were only 5 or 6 pairs of brothers in the "younger player gets called up first" group. They either went 5-0 or 5-1. The "3.25" figure is almost entirely based on that fact. If they had gone, say, 3-3, instead, the odds would work out to something like the 1.57 we got just by counting. (If you recall, the younger brothers went 58-37 overall, without dividing the sample into "first callup" and "last callup".)
Suppose the actual odds for the "younger called up first" were really the same as the "older called up first". Then, we'd have expected a .610 winning percentage.
The chance of a .610 team going 5-0 is 8.4%. The chance of a .610 team going 5-1 is 20%.
So the observed p-value is somewhere between .084 and .2 -- both higher than the .05 required for significance.
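Those two binomial probabilities can be sketched like this (my arithmetic, not the authors'):

```python
from math import comb

p = 0.610    # null hypothesis: the same .610 winning percentage
             # in both callup groups

p_5_0 = p ** 5                          # chance of going 5-0
p_5_1 = comb(6, 5) * p ** 5 * (1 - p)   # chance of going exactly 5-1

print(round(p_5_0, 3), round(p_5_1, 3))
```

Going 5-0 happens about 8.4% of the time, and going exactly 5-1 about 20% of the time, so neither record is particularly surprising for a .610 team.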
The authors don't do any explicit significance testing, but they say their confidence interval for the 10.58 odds ratio is (2.21, 50.73).
Again, suppose the odds for both callup groups were actually 1.57 in favor of the younger brother. Then the odds ratio we'd observe would be 1.57 squared, which is 2.46.
They did things a little differently, and used more data, but, overall, I'd say their confidence interval of (2.21, 50.73) is pretty consistent with what we found above. We found "almost significant, but not really," and the authors are close. I'm actually not sure, if we did exactly what they did, whether our null hypothesis would fall inside their confidence interval, but it would probably be close either way.
So my conclusions are:
1. A basic look at the overall data shows younger players with odds of 1.57:1 to beat their older brothers in career steal attempts.
2. Dividing the data into "called up first" and "called up last" appears to increase the odds to somewhere between 1.31 and 5.00.
3. The authors' odds ratio of 10.58 does not easily translate into anything intuitive about the odds of one brother beating another, except for the difference in the amount you'd win if you bet.
4. The authors' odds ratio of 10.58 is not how most sabermetricians would express the effect. Going by the example of home field advantage, we'd be more likely to go with an odds ratio of 3.25.
5. In any case, the difference between the 3.25 and the 1.57 we would obtain (if there were no "callup first" effect) is not statistically significant.
6. As I have argued in previous posts, the "called up first / called up last" split is not an appropriate control, because it reverses cause and effect. (You can disagree with this point, if you choose, and the overall argument still holds.)
Bottom line: the data show that younger brothers attempt more steals than their older brothers at a statistically significant rate, with odds of 1.57 to 1. Isn't that interesting enough on its own?
Note: this post is substantially revised. First post was 11/16 am. Took that down, reposted 11/16 pm. Revised again 11/17 am.
Vigorous hug (think Cournoyer/Henderson) to Tango for pointing out that the reported odds ratio is actually approximately the square of the true odds ratio. I hadn't realized that was what's going on until he pointed it out.