## Tuesday, November 16, 2010

### Do younger brothers steal more bases than older brothers? Part VI

Note: I'm not happy with this post. I've already revised it twice because I found things that were wrong ... as it is now, I think it's right, but it's not very focused and I think some of the emphasis might be on the wrong issues.

So, just a warning that I plan to redo it soon.

-----------

Although this is "Part 6" of the discussion of the sibling study, I'm going to try to make it stand alone, so if you're coming to this thread for the first time, read on.

A few months ago, Frank Sulloway and Richie Zweigenhaft published a study on brothers (siblings) in baseball. They came to the conclusion that a younger brother is about 10 times as likely to attempt more stolen bases in his career (adjusted for opportunities) than his older brother. Ten times is a LOT.

After I read the paper, I believed the result was incorrect. My previous posts on the subject explained why. Following that, I had an e-mail conversation with one of the authors. I don't believe either of us was able to convince the other of the rightness of our respective positions.

Two weeks ago, the authors released a second paper, which attempted to clarify the arguments and address some of my points. I remain unconvinced. And, indeed, I think I've been able to come up with a better, more easily understood argument that explains why.

-----

The authors' study comprised approximately 95 sets of siblings. In those 95 pairs,

58 times the younger brother attempted more steals
37 times the older brother attempted more steals.

I think the authors and I would agree on this (although my numbers might be off by one or two because of the way the authors handled cases where there were more than two brothers, like the Alous).

So the younger brothers had a 58-37 record against their older siblings, which works out to .610. The SD of winning percentage in 95 games is about .051. So .610 is a little more than 2 standard deviations above .500. That's statistically significant at the 5% level.

I believe this is a legitimate finding, and if the authors had left it at that, we'd have no disagreement.

Another thing they did is to express the result as odds instead of a winning percentage. If you divide 58 by 37, you get 1.57. So you can say something like,

"Younger brothers had odds of 1.57 to 1 of beating their older brothers in steal attempts."

That's perfectly accurate. Again, we have no disagreement.
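For anyone who wants to check the arithmetic so far, here's a quick Python sketch. It assumes the .051 comes from the usual normal approximation for a .500 team, sqrt(.25/95). The variable names are just for illustration.

```python
import math

# Overall results from the text: pairs in which the younger brother
# attempted more steals, and pairs in which the older brother did.
younger_wins, older_wins = 58, 37
n = younger_wins + older_wins            # 95 pairs

win_pct = younger_wins / n               # about .610
sd = math.sqrt(0.5 * 0.5 / n)            # SD of a win% over n 50/50 "games", about .051
z = (win_pct - 0.5) / sd                 # about 2.15 SDs above .500
odds = younger_wins / older_wins         # about 1.57 to 1

print(f"win% {win_pct:.3f}, SD {sd:.3f}, z {z:.2f}, odds {odds:.2f}:1")
```

A z-score above 1.96 is what "significant at the 5% level" means here. The odds figure is simply wins divided by losses.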

-----

What the authors did next is where we start to disagree. They took the 95 cases, and split them up into groups, based on the order in which the brothers were called up to the major leagues. To keep things simple, I'm going to leave out the case where the brothers were called up the same season, and concentrate on the case where the brothers were called up in different seasons.

That leaves 80 cases, in which the younger brother's record was 47-33. That's odds of 1.42:1 (47 divided by 33). But that's not what the authors come up with.

Because, look what happens when you split them up based on who was called up first:

-- When the older brother was called up first, the brother called up first went 32-42 against the other brother.

-- When the younger brother was called up first, the brother called up first went 5-1 against the other brother.

Converting that to odds:

-- When the older brother was called up first, the brother called up first had odds of .76:1 (32:42).

-- When the younger brother was called up first, the brother called up first had odds of 5.00:1 (5:1).

Now, to compare the younger to the older, the authors divide 5.00 by .76 (more precisely, 5 by 32/42) and get an "odds ratio" of 6.56.
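Here's a minimal Python sketch of that split, using only the two records above. It also checks that, seen from the younger brother's side, the two groups add back up to the 47-33 record for the 80 pairs. The helper name `odds` is just for illustration.

```python
# Record of the brother called up FIRST, split by which brother that was.
older_first = (32, 42)     # older brother called up first went 32-42
younger_first = (5, 1)     # younger brother called up first went 5-1

def odds(record):
    """Odds in favor, as wins divided by losses."""
    wins, losses = record
    return wins / losses

odds_ratio = odds(younger_first) / odds(older_first)   # 5 / (32/42), about 6.56

# From the younger brother's side, the two groups add back up to 47-33.
younger_record = (older_first[1] + younger_first[0],   # 42 + 5 wins
                  older_first[0] + younger_first[1])   # 32 + 1 losses
print(f"odds ratio {odds_ratio:.3f}; younger brother {younger_record}, "
      f"odds {odds(younger_record):.2f}:1")
```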

As it turns out, the authors report an even more extreme result, 10.58 instead of 6.56. Why? Mainly (and I'm simplifying here) because the 5-1 can also be interpreted as 5-0, depending on how you handle families with more than two siblings. 5-0 is an odds ratio of infinity. The authors use a mathematical technique to average the "infinity," the "6.56," and the result for when the siblings get called up the same year ("7.0"), and it works out to 10.58.

Which is where the authors get their statement,

"It may be seen that the common odds ratio is 10.58, as previously reported [in our original paper]."

That sentence, actually, is pretty much correct.

-----

So if the sentence is correct, what's the problem? The problem is the authors' interpretation of what an odds ratio means. Remember that odds ratio of 6.56 above? That's the correct number. But the authors write,

"For brothers called up to the major leagues first, a younger brother was 6.56 times more likely than an older brother to have a higher rate of attempted steals."

That is not true. That is not what odds ratio means.

Let's suppose you have 100 younger brothers called up first, and 100 older brothers called up first. 6.56 times more likely implies that you'll have 6.56 times as many "wins" in the first group as in the second group. But that's not the case. The younger brothers would have a ratio of 5:1, which, for 100 trials, is 83-17. The older brothers would have a ratio of 32:42, which, for 100 trials, is 43-57.

That means that the likelihood of winning rises from 43 (out of 100) to 83. That's 1.93 times more likely, not 6.56 times more likely.

The "1.93" figure is called the "relative risk". Relative risk is not the same thing as odds ratio.
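To make the distinction concrete, here's a small Python sketch that converts both groups' odds to probabilities and computes the two measures side by side. The odds ratio uses the cross-product form (5 × 42) / (1 × 32), which is the same quantity as 5 divided by 32/42.

```python
def win_prob(wins, losses):
    """Turn a wins:losses record (or odds) into a probability of winning."""
    return wins / (wins + losses)

p_younger_first = win_prob(5, 1)      # about .83
p_older_first = win_prob(32, 42)      # about .43

relative_risk = p_younger_first / p_older_first   # about 1.93
odds_ratio = (5 * 42) / (1 * 32)                  # cross-product form, 6.5625

# Scale to 100 trials per group, as in the text.
print(round(100 * p_younger_first), round(100 * p_older_first))   # 83 43
print(f"relative risk {relative_risk:.2f}, odds ratio {odds_ratio:.2f}")
```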

So if "6.56 times more likely" is not the correct interpretation of an odds ratio of 6.56, what IS the correct interpretation? It's this:

An odds ratio of 6.56 means that if you place a \$100 bet on the less likely outcome, your potential winnings will be 6.56 times as high as if you bet \$100 on the more likely outcome.

Specifically for this case: If you bet \$100 on the 5:1 favorite, you'll win \$20. If you bet \$100 on the 32:42 underdog, you'll win \$131.25. And, \$131.25 divided by \$20 is 6.56.

*That* is what the odds ratio really means. You can decide how intuitively meaningful it is. It probably means more to you if you're a sports bettor than if you're not.
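The betting interpretation can be checked in a few lines of Python. This sketch assumes fair odds with no bookmaker's cut, and `fair_winnings` is a hypothetical helper, not anything from the paper.

```python
def fair_winnings(stake, odds_for, odds_against):
    """Profit on a winning bet at fair (no-commission) odds, when the side you
    back is an odds_for:odds_against proposition in its favor."""
    return stake * odds_against / odds_for

favorite = fair_winnings(100, 5, 1)     # younger brother called up first: 20.00
underdog = fair_winnings(100, 32, 42)   # older brother called up first: 131.25
print(f"${favorite:.2f} vs ${underdog:.2f}, ratio {underdog / favorite:.4f}")
# prints: $20.00 vs $131.25, ratio 6.5625
```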

-----

So why is that a problem? Isn't it a real sense of what the 6.56 (or 10.58) figure actually means? Why, then, do I say it's misleading?

Because it exaggerates the scale of the effect. Roughly, it squares it.

Suppose home field advantage is 2:1 -- the home team is twice as likely to win as the visiting team. That means that, in turn, the visiting team is half as likely to win as the home team, which is odds of 0.5:1.

If I do what the authors did, and divide the home team odds by the visiting team odds, I get 2 divided by 0.5, which is 4. But I cannot say, "a home team is 4 times more likely to win than a visiting team." That would be wrong: the correct odds are obviously 2:1. What I'm actually saying is, "if I bet \$100 on the visiting team, I'll win 4 times as much money as if I bet on the home team."

Now, that's all well and good, but I would argue that the important measure is the 2:1, not the 4:1. We get the "4" by comparing the 2:1 favorite to the 2:1 underdog. In effect, the odds ratio is roughly "squaring the odds". Which makes sense: if you divide X by the reciprocal of X, you get X squared.
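The home-field example works out like this in Python. It's a toy illustration using the 2:1 figure above, plus a loop confirming the general "X divided by its reciprocal is X squared" point for a few arbitrary values.

```python
home_odds = 2.0                    # home team is a 2:1 favorite
visitor_odds = 1 / home_odds       # so the visitor is a 0.5:1 underdog
odds_ratio = home_odds / visitor_odds
print(odds_ratio)                  # 4.0, i.e. 2 squared

# Dividing X by its own reciprocal always gives X squared.
for x in (1.57, 2.0, 2.56, 5.0):
    assert abs(x / (1 / x) - x * x) < 1e-9
```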

If you take the 6.56 odds ratio and figure the square root, you get 2.56. That, I think, is a reasonable guess at what the effect actually is.

Put another way: the 6.56 occurs when you switch the status of *two* players -- you make the young one get called up first, and you make the old one get called up last. How do you split up the effect between the two players? The most obvious way is to "give" them 2.56 each.
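Here's that even split as a tiny Python sketch. It assumes, as the argument above does, that each brother's switch contributes equally. Applied to the authors' 10.58, it gives the 3.25 figure.

```python
import math

def per_player_effect(swing_odds_ratio):
    """Split an odds ratio that comes from switching BOTH brothers' callup
    status into two equal per-player factors (an equal-split assumption)."""
    return math.sqrt(swing_odds_ratio)

print(f"{per_player_effect(6.5625):.2f}")   # 2.56
print(f"{per_player_effect(10.58):.2f}")    # 3.25
```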

-----

Anyway, that's mostly semantics, and it's about the odds ratio, which is not the interesting question.

The interesting question, to me, is, how often will a younger brother have more steal attempts than his older brother, even controlling for callup order? The answer is nothing near 10.58.

Look at it this way:

-- if the younger player gets called up first, the odds are 5:1.
-- if the younger player gets called up last, the odds are 42:32, or about 1.31:1.

Doesn't it follow that the younger player's odds have to be somewhere between 1.31 and 5? After all, the younger player is either called up first, or he's called up last. The best case is when he's called up first, and the odds are 5:1 that he'll beat his brother. So the *overall* odds of beating his brother can't be *more* than 5:1, right?
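The bounding argument can be sketched in Python: any blend of the two callup groups has a win probability between the two groups' probabilities, so its odds land between the two groups' odds. The weight 6/80 is the share of the 80 pairs in which the younger brother was called up first. It reproduces the pooled 47-33 record (odds about 1.42:1). The other weights are arbitrary.

```python
def win_prob(wins, losses):
    return wins / (wins + losses)

def to_odds(p):
    return p / (1 - p)

# Younger brother's record in each callup group, from the 32-42 and 5-1 records.
p_last = win_prob(42, 32)    # called up last (older brother first): odds 42:32
p_first = win_prob(5, 1)     # called up first: odds 5:1

# Any blend of the two groups, with weight w on "called up first", has a
# win probability between p_last and p_first, so its odds are in between too.
for w in (0.0, 6 / 80, 0.5, 1.0):
    p = (1 - w) * p_last + w * p_first
    assert p_last <= p <= p_first
    print(f"w = {w:.3f}: odds {to_odds(p):.2f}:1")
```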

-----

Back to the odds ratio: if the authors agreed with me, and used 3.25 (the square root of 10.58) instead of 10.58, would I believe it? Well, no, because of the confidence interval issue.

There were only 5 or 6 pairs of brothers in the "younger player gets called up first" group. They either went 5-0 or 5-1. The "3.25" figure is almost entirely based on that fact. If they had gone, say, 3-3, instead, the odds would work out to something like the 1.57 we got just by counting. (If you recall, the younger brothers went 58-37 overall, without dividing the sample into "first callup" and "last callup".)

Suppose callup order made no difference, so that younger brothers called up first won at the same rate as younger brothers overall. Then we'd have expected a .610 winning percentage.

The chance of a .610 team going 5-0 is 8.4%. The chance of a .610 team going 5-1 or better is about 25% (about 20% for exactly 5-1).

So the observed p-value is somewhere between .084 and .25 -- both higher than the .05 required for significance.
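These binomial probabilities are easy to reproduce in Python. The sketch uses p = .610 as in the text. It counts "this result or better" as the one-sided tail for the p-value, which is my assumption about the right tail to use.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(at least k wins in n independent games, each won with probability p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

p = 0.610   # younger brothers' overall winning percentage (58-37)

print(f"5-0:           {prob_at_least(5, 5, p):.3f}")          # 0.084
print(f"exactly 5-1:   {comb(6, 5) * p**5 * (1 - p):.3f}")     # 0.198
print(f"5-1 or better: {prob_at_least(5, 6, p):.3f}")          # 0.249
```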

-----

The authors don't do any explicit significance testing, but they say their confidence interval for the 10.58 odds ratio is (2.21, 50.73).

Again, suppose the odds for both callup groups were actually 1.57 in favor of the younger brother. Then the odds ratio we'd observe would be 1.57 squared, which is 2.46.

The authors computed their interval a little differently, and used more data, but, overall, I'd say it's pretty consistent with what we found above. We found "almost significant but not really," and the authors are close: the 2.46 we'd expect with no callup effect sits just above the bottom of their interval, 2.21. I'm not sure whether our null value would still be inside their confidence interval if we did exactly what they did, but it would probably be close either way.
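Here's the face-value comparison in Python. It only checks a point value against the authors' published bounds. It is not a reproduction of their method, which is why the question of what happens under their exact procedure stays open.

```python
pooled_odds = 58 / 37                  # about 1.57, ignoring callup order
null_odds_ratio = pooled_odds ** 2     # about 2.46: expected with no callup effect

authors_ci = (2.21, 50.73)             # authors' reported interval for the 10.58
inside = authors_ci[0] <= null_odds_ratio <= authors_ci[1]
print(f"{null_odds_ratio:.2f}, inside authors' interval: {inside}")
```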

-----

So my conclusions are:

1. A basic look at the overall data shows younger players with odds of 1.57:1 to beat their older brothers in career steal attempts.

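2. Dividing the data into "called up first" and "called up last" gives the younger brother odds of about 1.31:1 (42:32) when he's called up last and 5:1 when he's called up first, so his overall odds, controlling for callup order, should fall somewhere between 1.31 and 5.00.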

3. The authors' odds ratio of 10.58 does not easily translate into anything intuitive about the odds of one brother beating another, except for the difference in the amount you'd win if you bet.

4. The authors' odds ratio of 10.58 is not how most sabermetricians would express the effect. Going by the example of home field advantage, we'd be more likely to go with an odds ratio of 3.25.

5. In any case, the difference between the 3.25 and the 1.57 we would obtain (if there were no "callup first" effect) is not statistically significant.

6. As I have argued in previous posts, the "called up first / called up last" split is not an appropriate control, because it reverses cause and effect. (You can disagree with this point, if you choose, and the overall argument still holds.)

----

Bottom line: the data show that younger brothers attempt more steals than their older brothers at a statistically significant rate, with odds of 1.57 to 1. Isn't that interesting enough on its own?

-----

Note: this post is substantially revised. First post was 11/16 am. Took that down, reposted 11/16 pm. Revised again 11/17 am.

Vigorous hug (think Cournoyer/Henderson) to Tango for pointing out that the reported odds ratio is actually approximately the square of the true odds ratio. I hadn't realized that was what's going on until he pointed it out.


At Tuesday, November 16, 2010 11:04:00 AM,  skyjo said...

"To compute an odds ratio, you divide successes by failures. When you divide an odds ratio by another odds ratio..."

I would say that successes divided by failures gives you odds (not an odds ratio). Then, one set of odds divided by another set of odds is an odds ratio.


At Tuesday, November 16, 2010 4:38:00 PM,  Phil Birnbaum said...

Skyjo,

Right, thanks. I've corrected this (I hope) in the revised version of this post. Should be OK now.

At Tuesday, November 16, 2010 7:04:00 PM,  Guy said...

I think the authors' logic (using the term loosely) for using the ratio runs like this: they believe they need to control for year of callup. So if we take only brothers called up first, the younger bro's odds are 5:1 while the older bro's odds are just 32:42. They are dividing the first by the second to determine how large the "younger brother effect" is, while controlling for callup status. Converting back to win%, it's like saying younger bros have a .83 win%, older bros are .43, so the younger bros are .83/.43 = 1.93 times as likely to have more steals.

*

Just in case someone new stops by this discussion, it's worth mentioning that, even leaving aside the misleading odds ratio talk, this is mostly nonsense.

First, there is no reason at all to "control" for year a player is called up. Year of callup has no plausible causal impact on SBAs or success rate. And controlling for year of callup introduces a huge bias, in that younger bros called up first are vastly better players (they arrived in MLB at a much younger age).

Second, the metrics here likely overstate younger bros' tendency to steal bases. First, older bros. have longer careers, and SBAs decline sharply with age, so using career rates downwardly biases the older bros' rate. Second, defined SBA opportunities include XBH (even though steals of 3rd are rare), and older bros have proportionately more XBH -- so this metric exaggerates their true SB opportunities relative to the younger bros.

But other than that, it's one terrific study. :>)

At Wednesday, November 17, 2010 7:59:00 AM,  Phil Birnbaum said...

Guy,

If you're measuring the distance between two things, and the treatment affects both things, the effect you want to talk about is the effect on one of the two things.

Take home field advantage. Maybe team A at home is .540, and team A on the road is .460. The difference is .080. But we say that HFA is 40 points, not 80 points. Why? Because the 80 points is team A minus its opponent team B. A lost 40 points by going on the road, and B gained 40 points by going back home. The 80 points comes from combining the effect of the two teams. The effect *per team* is only 40 points.

Same thing here. If A and B are the brothers, then, to get an odds ratio of 10.58, you need A switching from called up first to called up last, and B switching from called up last to called up first. The *combined* effect is 10.58. The effect on A *or* B separately is the square root of 10.58.

When you use an odds ratio to evaluate a treatment, you usually compare it to a placebo. In that case, you don't have to take the square root, because there's only one actual entity. The authors of this study just followed that convention -- but they actually have a negative and positive treatment, not a placebo and a positive treatment.

You have a battery. When it faces left, the voltage is -9 volts. When it faces right, the voltage is +9 volts. The difference is 18 volts. But you have to recognize it's a 9-volt battery, not an 18-volt battery.

You have another battery. When it's not there, the voltage is zero. When it's there, the voltage is 18 volts. The difference is still 18 volts. But *now* you can say you have an 18-volt battery.

The second case is so standard, that when the authors had the first case, they used the second case calculation anyway. And so their estimate is the square of what it should be.

Now, you CAN say,

"Because switching the first battery causes an 18-volt swing, I'm going to call it an '18-volt swing battery'. Don't misquote me as saying it's an 18-volt battery, because I didn't. I just said that it's an '18-volt *swing* battery,' that when you invert it, it causes an 18-volt swing. It's not my fault if you misinterpret."

That's what I think is going on. The authors are saying, "if you make the young guy called up first, and you make the old guy called up second, you get a swing of 10.58 versus if you make the young guy called up second, and you make the old guy called up first." That's true. But you have to allocate the 10.58 between the two brothers. In the absence of other evidence, I say you want to treat them equally, which means 3.25 each.

At Wednesday, November 17, 2010 8:30:00 AM,  Phil Birnbaum said...

I've updated the post again, to incorporate my response to Guy, and to fix a couple of other things.

At Wednesday, November 17, 2010 8:52:00 PM,  bradluen said...

Thanks Phil, and thanks to Sulloway and Zweigenhaft for posting the clarification. It's clear that a source of confusion was the distinction between odds and odds ratio. The NYT confused the two, and I know that I did on at least one occasion.

Phil has done a good job of teasing out the odds/odds ratio issue, so let me get back to the possibly more substantial issue of controlling for call-up. Hopefully the following hypothetical example will make things clearer.

Let's assume we have 100 pairs of brothers, with no twins. Suppose that the set of older brothers has the same distribution of talent as the set of younger brothers. Let's also assume that half the time, the older brother has a higher steal attempt rate, while the other half the time, the younger brother has a higher steal attempt rate. In fact, let's make a stronger assumption: that your attempted steal rate is a (possibly random) function of your speed, and that birth order is independent of attempted steal rate. That means that if you control for speed, your chance of being the brother who steals more is the same regardless of whether you're the older brother or the younger brother. We can show this in a probability table. If your ability is S, let the chance that you have the higher steal attempt rate be P(S), where P(S) increases with S. The first column in the table below is for older brothers; the second column is for younger brothers.

| | Older brothers | Younger brothers |
|---|---|---|
| Lower steal attempt rate | 1-P(S) | 1-P(S) |
| Higher steal attempt rate | P(S) | P(S) |

The odds ratio is thus [(1-P(S))/P(S)]/[(1-P(S))/P(S)] = 1.

We know that older brothers will be called up first more frequently than younger brothers. However, talented younger brothers are sometimes called up first. In particular, younger brothers who are unusually fast are more likely to be called up first, as well as more likely to try to steal.

Let's say 90% of the time, the older brother is called up first, and the other 10% of the time, the younger brother is called up first. The following set of frequencies is plausible (first column is older brothers, second column is younger brothers):

| | Older brothers | Younger brothers |
|---|---|---|
| Called 1st & lower steal attempt rate | 41 | 1 |
| Called 1st & higher steal attempt rate | 49 | 9 |
| Called 2nd & lower steal attempt rate | 9 | 49 |
| Called 2nd & higher steal attempt rate | 1 | 41 |

Let's check this satisfies our assumptions. The older brother column sums to 100, as does the younger brother column. Of the older brothers, 50 have a higher steal attempt rate, while of the younger brothers, 50 have a higher steal attempt rate. Ten of 100 younger brothers are called up first, and the youngsters who are called up first are more likely to have a higher steal attempt rate.

Now, what's the odds ratio? First, let's just look at those who are called up first. The odds ratio is (41/49)/(1/9), which is about 7.5. If you just look at those who are called up second, the odds ratio is also 7.5. If you want to combine these two odds ratios, it would seem that 7.5 would be the logical choice. But this should be 1!
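Here's a small Python check of this hypothetical example. The counts are the made-up frequencies from the example, not real data. The sketch confirms that each column sums to 100, that each callup group shows an odds ratio of about 7.5, and that pooling the groups gives exactly 1.

```python
# Hypothetical counts from the example, as (older, younger) for each cell.
first_lower, first_higher = (41, 1), (49, 9)      # called up 1st
second_lower, second_higher = (9, 49), (1, 41)    # called up 2nd

def odds_ratio(lower, higher):
    """Odds of a LOWER steal-attempt rate for older brothers, divided by the
    same odds for younger brothers."""
    return (lower[0] / higher[0]) / (lower[1] / higher[1])

def combine(a, b):
    """Add two (older, younger) cells together."""
    return (a[0] + b[0], a[1] + b[1])

cells = (first_lower, first_higher, second_lower, second_higher)
assert sum(c[0] for c in cells) == 100 and sum(c[1] for c in cells) == 100

or_first = odds_ratio(first_lower, first_higher)       # about 7.5
or_second = odds_ratio(second_lower, second_higher)    # about 7.5
or_all = odds_ratio(combine(first_lower, second_lower),
                    combine(first_higher, second_higher))   # exactly 1

print(f"{or_first:.2f} {or_second:.2f} {or_all:.2f}")   # 7.53 7.53 1.00
```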

The reverse Simpson's paradox arises because we've used an inappropriate control. If we were able to control for speed, there would have been no problem. Instead we used call-up as a proxy for speed (via ability). In technical terms, this violates Pearl's unfortunately named "back-door criterion". In layman's terms: it's true that speed has an effect on call-up order. However, it's also true that birth order has an effect on call-up order. Then unless the association between speed and call-up is *much* stronger than the association between birth order and call-up, any existing bias in our estimation will be amplified, not reduced. And it's hard to believe that speed and call-up are more strongly associated than birth order and call-up.

There are some Important Statistical Issues floating around the edges of this problem. I'd write a paper on this if I had another example (one is an anecdote, two is a trend).