### On babies, batting averages, and weighted means

Here's a famous old math problem you've probably seen many times before:

A king decides that his country has too many men and not enough women. So he issues a decree: once a couple has a boy, they're not allowed to have any more babies.

The king reasons as follows: no family will have more than one boy. But some families will have two girls, or three girls, or even six girls before they have a boy. So there will wind up being a lot more girls than boys.

Is the king's reasoning correct?

The answer: the king's reasoning is not correct. There will still be 50 percent boys, and 50 percent girls. There are many ways to figure this out. I'd refer you to a solution on the internet, but I can't seem to find the problem. (Can anyone supply a link?)
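
Here's one way to see it, sketched under two assumptions of mine: every birth is independently a boy or a girl with probability 1/2, and every family keeps going until its first boy. Each family ends up with exactly one boy. A family has $g$ girls followed by a boy with probability $(1/2)^{g+1}$, so the expected number of girls per family is

$$
E[\text{girls per family}] = \sum_{g=0}^{\infty} g \left(\tfrac{1}{2}\right)^{g+1} = 1 .
$$

So the average family has one boy and one girl, and the babies split evenly.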

Now: last week, Steven Landsburg posted a variation on the question, as follows:

There's a certain country where everybody wants to have a son. Therefore each couple keeps having children until they have a boy; then they stop. What [is the expected value of the] fraction of the population [that] is female?

At first, this seems like the same question. But it's not. As Landsburg explains, the answer to this one is *not* 50 percent.

But even though the answer isn't 50 percent, it *seems* like it should be -- so much so that commenters are perplexed. One reader, a physics professor, seems certain that Landsburg is wrong, and that the answer is indeed 50 percent. Landsburg has challenged him to a $15,000 bet.

Landsburg is indeed correct. I'm going to try to explain why, with a baseball analogy. But before I do, think about it a bit, and maybe read Landsburg's posts, to try to figure it out for yourself. It took me a while to get my head around it.

-----

OK, here we go.

Suppose that in a given season, the overall MLB batting average is .250. What is the average player's batting average?

The answer is NOT .250. But this time it's a lot easier to figure out why.

If you check, you will find that the average major leaguer hits below the league-wide average of .250.

Why? Because when you average individual players, you give them equal weight. But when you figure out the composite MLB average, you weight by the number of at-bats. And good players have more AB than bad players. So the composite average gives the good hitters more weight, which pushes it above the average of the individual players weighted equally.

This is easier to see with an example. Suppose there are two players in the league. Player A goes 100 for 399, for a batting average of .251. Player B, who is a pitcher, goes 0 for 1. The average player went .125 (the average of .251 and .000). But, overall, the league hit .250 (100 for 400).

Weighted by AB, the league hit .250. Weighted by player, the league hit .125.
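
If you'd like to check the two kinds of average yourself, here is a minimal Python sketch. The `players` dictionary and both function names are made up for illustration; the numbers are the two-player league from the example.

```python
# Hypothetical two-player league from the example: name -> (hits, at_bats)
players = {"A": (100, 399), "B": (0, 1)}


def average_by_player(stats):
    """Equal weight per player: the mean of the individual batting averages."""
    averages = [hits / at_bats for hits, at_bats in stats.values()]
    return sum(averages) / len(averages)


def average_by_at_bats(stats):
    """Weighted by at-bats: total hits divided by total at-bats."""
    total_hits = sum(hits for hits, _ in stats.values())
    total_at_bats = sum(at_bats for _, at_bats in stats.values())
    return total_hits / total_at_bats


print(f"by player: {average_by_player(players):.3f}")   # by player: 0.125
print(f"by at-bat: {average_by_at_bats(players):.3f}")  # by at-bat: 0.250
```

Same hits, same at-bats, two different "averages". The only thing that changed is what gets equal weight.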

Got it for baseball? Now, let's apply it to Landsburg's problem. Because births are random, and we want the average, imagine repeating the birth simulation for a million different countries.

In baseball, players with more AB have a higher proportion of hits. In Landsburg's example, countries with more births have a higher proportion of girls. That's obvious, isn't it? If there are 100 families in the country, there'll always be 100 boys at the end. The only difference, then, must be the number of girls. Countries with more babies, then, *must* have more girls, and therefore a higher proportion of girls.

So we have exactly the same situation for countries as for batters.

-- Country A might have 100 boys and 91 girls, which means a .476 "girling average".

-- Country B might have 100 boys and 109 girls, which means a .522 "girling average".

Overall, there are 200 boys, and 200 girls, for a composite average of .500. However, the average *country* has an average of only .499 (the average of .476 and .522).

If you weight the average by *babies*, you get .500, as expected. But if you weight it by *countries*, you get .499.

Landsburg's question requires you to weight the average by country, and that's why the answer is less than .500.
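
The "million countries" thought experiment can also be run directly. What follows is only a sketch, under assumptions of mine: every birth is an independent 50/50 coin flip, and every family keeps going until its first boy. The function names, the seed, and the country counts are all made up for illustration. The first function adds up the exact expectation (the total number of girls in a country follows a negative binomial distribution); the second one just simulates countries and averages their "girling averages".

```python
import math
import random


def expected_female_fraction(families, max_girls=None):
    """Exact E[girls / (girls + boys)] when every family stops at its first boy.

    With `families` families there are always `families` boys, and the total
    number of girls G satisfies
        P(G = g) = C(g + families - 1, g) * (1/2) ** (g + families).
    Plain floats are used, so this is only meant for up to a few hundred
    families; larger countries would need log-space arithmetic.
    """
    if max_girls is None:
        max_girls = 20 * families + 200  # truncation point; the tail beyond it is negligible
    prob = 0.5 ** families  # P(G = 0): every family has a boy first
    total = 0.0
    for g in range(max_girls + 1):
        total += prob * g / (g + families)
        prob *= 0.5 * (g + families) / (g + 1)  # step from P(G = g) to P(G = g + 1)
    return total


def simulated_female_fraction(families, countries, seed=0):
    """Average the per-country female fraction over many simulated countries."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(countries):
        girls = 0
        for _ in range(families):
            while rng.random() < 0.5:  # a girl, so this family tries again
                girls += 1
        total += girls / (girls + families)  # boys always equal the number of families
    return total / countries


print(f"exact, 100 families:     {expected_female_fraction(100):.4f}")
print(f"simulated, 100 families: {simulated_female_fraction(100, 5000):.4f}")
print(f"exact, 1 family:         {expected_female_fraction(1):.4f}")
```

Both numbers for the 100-family country should land a little under .500. The one-family country is the extreme case: there the exact answer is 1 - ln 2, or about .307, which you can confirm with `1 - math.log(2)`.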

-----

We'll see if he gets any takers for his bet. I'm guessing he doesn't.
