Sunday, September 14, 2014

Income inequality and the Fed report

The New York Yankees are struggling. Why don't they sign Reggie Jackson? Sure, he's 68 years old, but he'd still be a productive hitter if the Yankees signed him today.

Why do I say that? Because if you look at the data, you'll see that players' production doesn't decline over time. In 1974, the Oakland A's hit .247. In 2013, they hit .254. Their hitting was just as good -- actually, better -- even thirty-nine years later!

So how can you argue that players don't age gracefully?

------

It's obvious what's wrong with that argument: the 2013 Oakland A's aren't the same players as the 1974 Oakland A's. The team got better, but the individual players got worse -- much, much worse. Comparing the two teams doesn't tell us anything at all about aging.

The problem is ridiculously easy to see here. But it's less obvious in most articles I've seen that discuss trends in income inequality, even though it's *exactly the same flaw*.

Recently, the US Federal Reserve ("The Fed") published their regular report on the country's income distribution (.pdf). Here's a New York Times article reporting on it, which says, 


"For the most affluent 10 percent of American families, average incomes rose by 10 percent from 2010 to 2013."

Well, that's not right. The Fed didn't actually study how family income changed over time. Instead, they looked at one random sample of families in 2010, and a *different* random sample of families in 2013.  

The confusion arises because the Fed gave the two groups the same name. Instead of "Oakland A's," they called them "Top 10 Percent". But those are different families in the two groups.

Take the top decile both years, and call it the "Washington R's." What the Fed report says is that the 2013 R's hit for an average 10 points higher than the 2010 R's. But that does NOT mean that the average 2010 R family gained 10 points. In fact, it's theoretically possible that the 2010 R's all got poorer, just like the 1974 Oakland A's all got worse. 

In one sense, the effect is stronger in the Fed survey than in MLB. If you're a .320 hitter who drops to .260 while playing for the A's, Billy Beane might still keep you on the team. But if you're a member of the 2010 R's, but wind up earning only a middle-class wage in 2013, the Fed *must* demote you to the minor-league M's, because you're not allowed to stay on the R's unless you're still top 10 percent. 

The Fed showed that the R's, as a team, had a higher income in 2013 than 2010. The individual R's? They might have improved, or they might have declined. There's no way of knowing from this data alone.

-----

So that quote from the New York Times is not justified. In fact, if even one family dropped out of the top decile from 2010 to 2013, you can prove, mathematically, that the statement must be false.

That has nothing to do with any other assumptions about wealth or inequality in general. It's true regardless, as a mathematical fact. 

Could it just be bad wording on the part of the Fed and the Times, that they understand this but just said it wrong? I don't think so. It sure seems like the Times writer believes the numbers apply to individuals. For instance, he also wrote, 


"There is growing evidence that inequality may be weighing on economic growth by keeping money disproportionately in the hands of those who already have so much they are less inclined to spend it."

The phrase "already have so much" implies the author thinks they're the same people, doesn't it? Change the context a bit. "Lottery winners picked up 10 percent higher jackpots in 2013 than 2010, keeping winnings disproportionately in the hands of those who already won so much."  

That would be an absurd thing to say for someone who realizes that the jackpot winners of 2013 are not necessarily the same people as the jackpot winners of 2010.

Anyway, I shouldn't fault the Times writer too much ... he's just accepting the incorrect statements he found in the Fed paper. 

And I don't think any of the misstatements are deliberate. I suspect that the Fed writers were sometimes careless in their phrasing, and sometimes genuinely thought that "team" declines/increases implied family declines/increases. 

Still, some of the statements, in both places, are clearly not justified by the data and should not have made it into print.

------

I've read articles in the past that made a similar point, that individuals and families might be improving significantly, even though the data appears to give the impression that their group is falling behind. 

It's not hard to think of an example of how that might be possible. 

Imagine that everyone gets richer every year. During the boom, immigration grows the population by 25 percent every year, and the new arrivals all start at $10 per hour.

What happens? 

(a) the bottom 20 percent of each year's population earn the same amount; but 
(b) every individual gets richer every year.

That's because the new arrivals -- 25 percent of last year's population, which works out to 20 percent of this year's -- always fill the entire bottom quintile at $10 per hour, while everyone who was already here moves up.

That is: *everyone* is better off *every year*, even though the data may make it falsely appear that the poor are stagnating.

(Note: the words "rich" and "poor" are defined as "high wealth" and "low wealth," but in this post, I'm also going to [mis]use them to mean "high income" and "low income."  It should be obvious from the context which one I mean.)

-------

Now, even if you agree with everything I've said so far, you could still have other reasons to be concerned about the Fed report. For me, the most important fact is the discovery that 2013's poor (bottom quintile) have 8 percent less income than 2010's poor. 

You can't conclude that any particular family dropped, but you *can* conclude that, even if they're different people, the bottom families of 2013 are worse off than the bottom families of 2010. That's real, and that's something you could certainly be concerned about. 

But, many people, like the New York Times writer, aren't just concerned about the poorer families -- they worry about how "income inequality" compares them to the richer ones. They're uncomfortable with the growing distance between top and bottom, even in good times when the "rising tide" lifts everyone's income. For them, even if every individual is made better off, it's the inequality that bothers them, not the absolute levels of income, or even how fast overall income is growing. If the "Washington R's" gain 20 percent, but the "Oakland P's" gain only 5 percent ... for them, that's something to correct.

They might say something like,


"It's nice that the overall pie is growing, and it's nice that the "P's" are getting more money than they used to. But, still, every year, it seems like the high-income "team" is getting bigger increases than the low-income "team". There must be something wrong with a system where, years ago, the top-to-bottom ratio used to be 5-to-1, but now it's 10-to-1 or 15-to-1 or higher."

"Clearly, the rich are getting richer faster than the poor are getting richer. There must be something wrong with a system that benefits the rich so much while the poor don't keep up."

Rebutting that argument is the main point of this post. Here's what I'm going to try to convince you:

Even when the rich/poor ratio increases over time, that does NOT necessarily imply that the rich are getting more benefit than the poor. 

That is: *even if inequality is a bad thing*, it could still be that the changes in the income distribution have benefited the poor more than the rich.

I can even go further: even if ALL the benefits of increased income go to the poor, it's STILL possible for the rich/poor inequality gap to grow. The government could freeze the income of every worker in the top half, and increase the income of every worker in the bottom half. And even after that, the rich/poor income gap might still be *higher*.

-------

It seems like that can't be possible. If everyone's income grows at the same rate, the ratio has to stay the same, right? If rich to poor is $200K / $20K one year, and rich and poor both double equally, you get $400K / $40K, and the ratio of 10:1 doesn't change. Mathematically, R/P has to equal xR/xP.

So if benefits that are equal keep the ratio equal, benefits that favor the poor have to change the ratio in favor of the poor. No? 

No, not necessarily. For instance:

Suppose that in 2017, the ratio between rich and poor is 1.25. In 2018, the ratio between rich and poor is 1.60. Pundits say, "this is because the system only benefited the rich!"

But it could be that the pundits have it 100% backwards, and the system actually only favored the poor. 

How? Here's one way. 

There are two groups, with equal numbers of people in each group. In 2017, everyone in the bottom group made $40K, and everyone in the top group made $50K. That's how the ratio between rich group and poor group was 1.25.

The government instituted a program to help the poor, the bottom group. Within a year, the income of the poor doubled, from $40K to $80K, while the top group stagnated at $50K. 

So, in 2018, the richest half of the population earned $80K, and the poorest half earned $50K. That's how inequality increased, from 1.25 to 1.60, only from helping the poor!
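
If you want to check the arithmetic, here's a quick sketch in Python (mine, not the Fed's -- the numbers are just the hypothetical $40K/$50K example above). The only thing that changes between the two years is that the bottom group's income doubles:

def rich_poor_ratio(incomes):
    # Ratio of the top half's average income to the bottom half's.
    ranked = sorted(incomes)
    half = len(ranked) // 2
    bottom, top = ranked[:half], ranked[half:]
    return (sum(top) / len(top)) / (sum(bottom) / len(bottom))

y2017 = [40, 40, 50, 50]   # two equal groups: $40K and $50K (in $K)
y2018 = [80, 80, 50, 50]   # the poor doubled; the rich stagnated

print(rich_poor_ratio(y2017))   # 1.25
print(rich_poor_ratio(y2018))   # 1.60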

------

What happened? How did our intuition go wrong? For the same reason as before: we didn't immediately realize that the groups were different people in different years. The 2017 rich aren't the same as the 2018 rich.

When the pundits argued "the system only benefited the rich," whom did they mean? The "old" 2017 rich, or the "new" 2018 rich? Without specifying, the statement is ambiguous. So ambiguous, in fact, that it almost has no meaning.

What really happened is that the system benefited the old poor, who happen to be the new rich. It failed to benefit the old rich, who happen to be the new poor.

Inequality increased from 1.25 to 1.60, but it's meaningless to say the increase benefited the "rich". Which rich? Obviously, it didn't benefit the "old rich."

But, isn't it true to say that the increase benefited the new rich? 

It's true, but it doesn't tell us much -- it's true by definition! In retrospect, ANY change will have benefited the "new rich" more than the "new poor."  If you used to be relatively poor, but now you're relatively rich, you must have benefited more than average. So when you say increasing inequality favors the "new rich," you're really saying "increasing inequality favors those who benefited the most from increasing inequality."  

These examples sound absurd, but they're exact illustrations of what's happening:

-- You have a program to help disadvantaged students go to medical school. Ten years later, you follow up, and they're all earning six-figure incomes as doctors. "Damn!" you say. "It turns out that in retrospect, we only helped the rich!"

-- Or, you do a study of people who won the lottery jackpot last year, and find that most of them are rich, in the top 5%. "Damn!" you say. "Lotteries are just a subsidy for the rich!"

-- Or, you do a study of people who were treated for cancer 10 years ago, and you find most of them are healthy. "Damn!" you say. "We wasted cancer treatments on healthy patients!"

It makes no sense at all to regret a sequence of events on the grounds that, in retrospect, it helped the people with better outcomes more than it helped the people with worse outcomes. Because, that's EVERY sequence of events!

If you want to complain that increasing inequality is disproportionately benefiting well-off people, that can make sense only if you mean it's those who were well off *before* the increase. But the Fed data doesn't give you any way of knowing whether that's true. It might be happening; it might not be happening. But the Fed data can't prove it either way.

----

Here's an example that's a little more realistic.

Suppose that in 2010, there are five income quintiles, where people earn $20K, $40K, $60K, $80K, and $100K, respectively. I'll call them "Poor," "Lower Class," "Middle Class," "Upper Class," and "Rich", for short. We'll measure inequality by the R/P ratio, which is 5 (100 divided by 20).

Using three representative people in each group, here's what the distribution looks like:


2010 group, 2010 income
------------------------
P    L    M    U    R
------------------------
20   40   60   80   100
20   40   60   80   100
20   40   60   80   100
------------------------
R/P ratio: 5


From 2010 to 2013, people's incomes change, for the usual reasons -- school, life events, luck, shocks to the economy, whatever. In each group, it turns out that one-third of people make double what they did before, one third experience no change, and one third see their incomes drop in half. 

Overall, that means incomes have grown by 16.7%: the average of +100%, 0%, and -50%. Workers have 1/6 more income, overall. But the change gets spread unevenly, since life is unpredictable.

Here are the 2013 incomes, but still based on the 2010 grouping. The top row are the people who dropped, the middle row are the status quo, and the bottom row are the ones who doubled.


2010 group, 2013 income
------------------------
P    L    M     U     R
------------------------
10   20   30    40    50
20   40   60    80   100
40   80  120   160   200
------------------------
R/P ratio: 5


You can easily calculate that every 2010 group got, on average, the same 16.7% increase. So, since life treated the groups equally, the 2010 rich/2010 poor ratio is still 5. In chart form:


2010 group, % change 2010-2013
------------------------------
 P     L     M     U     R  
------------------------------
+17%  +17%  +17%  +17%  +17%


But the Fed doesn't have any of those numbers, because it doesn't know which 2010 group the 2013 earners fell into. It just takes the 2013 data, and mixes it into brand new groups based on 2013 income:


2013 group, 2013 income
-------------------------
P    L     M     U     R  
-------------------------
10   30    40    80   120
20   40    50    80   160
20   40    60   100   200
-------------------------
R/P ratio: 9.6


What does the Fed find? Much more inequality in 2013 than in 2010. The ratio between rich and poor is 9.6 -- almost double what it was! 

The Fed method will also see that the bottom three groups are earning less than the corresponding group earned three years previous. Only the top two groups, the "upper class" and "rich," are higher. Here are the changes between each new group and the corresponding old group:


Perceived change 2010-2013
--------------------------
 P    L    M    U    R  
--------------------------
-17%  -8%  -17%  +8%  +60%


If you don't think about what's going on, you might be alarmed. You might conclude that none of the economy's growth benefited the lowest 60 percent at all -- that all the benefits accrued to the well off! 

But, that's not right: as we saw, the benefits accrued equally. And, as we saw, the "R" group ALWAYS has to be high, by definition, since it's selectively comprised of those who benefited the most!

In effect, comparing the 2010 sample to the 2013 sample is a subtle "cheat," creating an illusion that can be used (perhaps unwittingly) to falsely exaggerate the differences. When the poor improve their lot, the method moves them to another group, and winds up ignoring that they benefited. 

For instance, when a $30K earner moves to $90K, a $90K earner moves to $120K, and a $120K earner drops to $30K, the Fed method makes it look like they all benefited equally, at zero. In reality, the "poor" gained and the "rich" declined -- the $30K earner grew 200%, the $90K earner grew 33%, and the $120K earner dropped 75%. 

No matter how you choose the numbers, as long as there is any movement between groups, the method will invariably overestimate how much the "rich" benefited, and underestimate how much the "poor" benefited. It never works the other way.
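
Here's a little Python sketch that reproduces the tables above, in case you want to verify there's nothing up my sleeve. Every 2010 group gets the same average change, but the re-sorted 2013 quintiles make the R/P ratio look like it nearly doubled:

groups_2010 = {g: [inc] * 3 for g, inc in
               zip("PLMUR", [20, 40, 60, 80, 100])}

# Within each group: one person halves, one stays put, one doubles.
incomes_2013 = {g: [inc * m for inc, m in zip(members, (0.5, 1, 2))]
                for g, members in groups_2010.items()}

avg = lambda xs: sum(xs) / len(xs)

# Tracked by 2010 group, the rich/poor ratio is unchanged:
print(avg(incomes_2013["R"]) / avg(incomes_2013["P"]))   # 5.0

# The Fed-style method re-sorts everyone into brand-new 2013 quintiles:
ranked = sorted(x for members in incomes_2013.values() for x in members)
print(avg(ranked[-3:]) / avg(ranked[:3]))                # 9.6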

--------

One last example.

This time, let's institute a policy that does something special for the disadvantaged groups, to try to make society more equal. For everyone in the P and L group in 2010, we institute a program that will double their eventual 2013 income. Starting with the same 20/40/60/80/100 distribution for 2010, here's what we see after the 2013 doubling:


2010 group, 2013 income
-----------------------
P     L    M    U    R  
-----------------------
20    40   30   40   50
40    80   60   80  100
80   160  120  160  200
-----------------------
R/P ratio: 2.5


Based on the 2010 classes, we've cut the rich/poor ratio in half! But, as usual, the Fed doesn't know the 2010 classes, so they sort the data this way:


2013 group, 2013 income
-----------------------
P    L    M    U     R  
-----------------------
20    40  60    80  160
30    40  80   100  160
40    50  80   120  200
-----------------------
R/P ratio: 5.8


Inequality has jumped from 5.0 to 5.8. That's even after we made a very, very serious attempt to lower it, doubling the incomes of the previous poorest 40 percent of the population!
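
Same sketch as before, adjusted for the program -- everyone starting in the 2010 "P" and "L" groups has their 2013 income doubled:

avg = lambda xs: sum(xs) / len(xs)

incomes_2013 = {"P": [20, 40, 80],    # doubled from 10/20/40
                "L": [40, 80, 160],   # doubled from 20/40/80
                "M": [30, 60, 120],
                "U": [40, 80, 160],
                "R": [50, 100, 200]}

# Tracked by 2010 group, the rich/poor ratio collapses to 2.5:
print(avg(incomes_2013["R"]) / avg(incomes_2013["P"]))   # 2.5

# Re-sorted into 2013 quintiles, measured inequality still goes UP:
ranked = sorted(x for members in incomes_2013.values() for x in members)
print(avg(ranked[-3:]) / avg(ranked[:3]))                # about 5.8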

-------

There's an easy, obvious mathematical explanation of why this happens.

When you look at income inequality, you're basically looking at the variance of the income distribution. But, changes from year-to-year are not equal, so they have their own built-in variance.

If the changes in income are independent of where you started -- that is, if the system treats rich and poor equally, in terms of unpredictability -- then

var(next year) = var(this year) + var(changes)

Which means, as long as rich and poor are equal in how their incomes change -- and there's any randomness in those changes at all -- inequality HAS TO INCREASE. 

Take 100 people, start them with perfect equality, $1 each. 

Every day, they roll a pair of dice. They multiply their money by the amount of the roll, then divide by 7. 

Obviously, on Day 2, equality disappears: some people will have $12/7, while others will have only $2/7. The third day, they'll be even more unequal. The fourth day, even more so. Eventually, some of them will be filthy, filthy rich, having more money than exists on the planet, while others will have trillionths of a dollar, or less.
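
You can watch that happen in a few lines of Python -- this is just a sketch of the dice game as described:

import random

random.seed(1)          # arbitrary seed, just for repeatability
money = [1.0] * 100     # 100 people, perfectly equal at $1 each

for day in range(1, 1001):
    # Everyone multiplies their money by (2d6 / 7) -- the same fair bet for all.
    money = [m * (random.randint(1, 6) + random.randint(1, 6)) / 7
             for m in money]
    if day in (10, 100, 1000):
        print(day, max(money) / min(money))   # rich/poor ratio keeps exploding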

That's just the arithmetic of variation. Increasing inequality is what happens naturally, not just in incomes, but in everything -- everything where things change independently of each other and independently over time. 

What if you want to fight nature, and keep inequality from growing? You have to arrange for year-to-year changes to benefit the poor more than the rich. That effect has to be large -- as we saw earlier, doubling the income of the 40 poorest percent wasn't enough. (It was a contrived example, but, still, it sure *seemed* like it should have been enough!)

-----

How much do you have to tilt the playing field in favor of the poor? Thinking out loud, scrawling equations ... I didn't double-check, so try this yourself because I may have screwed up ... but here's what I got:

Without independence, 

var(next year) = var(this year) + var(changes) + 2 cov(this year, changes)

Solving on the back of my envelope ... if I've done it right, using logarithm of income and some rough assumptions ... I get that the correlation between this year's income and the change to next year's income has to be around -0.25.

My scrawls say that if you're in the top 2.5% of income, your next-year change has to be in the bottom 30%. And if you're in the bottom 2.5%, your next-year change has to be in the top 30%. 
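
In case you want to check my scrawls: setting var(next year) equal to var(this year) in that formula gives cov(this year, changes) = -var(changes)/2, and dividing through turns that into a required correlation of -SD(changes) divided by twice SD(this year). Here it is in Python, with my rough assumption made explicit -- that the SD of the year-to-year change in log income is about half the SD of log income itself:

def required_correlation(sd_income, sd_change):
    # Correlation between income and change that keeps variance constant:
    # 0 = var(changes) + 2 * cov  implies  cov = -var(changes) / 2,
    # so rho = cov / (sd_income * sd_change) = -sd_change / (2 * sd_income).
    return -sd_change / (2 * sd_income)

print(required_correlation(sd_income=1.0, sd_change=0.5))   # -0.25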

That seems really tough to do. In a typical year, when the economy grows normally, what percentage of incomes in the Fed survey would be lower than last year's? If it's 30 percent, then ... to keep inequality constant, just ONE of the things you need to do is make sure high-income people, on average, never earn more this year than last year.

You'd almost have to repeal compound interest!

------

I don't mean to imply that increasing inequality is *completely* just the result of normal variation. There are lots of other factors. Progressive taxation pushes a little bit back toward equality. Increased savings while the economy grows contributes to inequality. A growing population means bestselling authors have a larger market, which stretches the top of the distribution. And so on. 

But the point is: because increasing inequality happens naturally, you can't conclude anything just from *the fact that there's an increase*. At the very least, you have to back out the natural effects if you want to really explain what's going on. You have to do some math, and some arguing. 

The argument, "Inequality is growing -- therefore, we must be unfairly favoring the rich" is not a valid one. It is true that inequality is growing. And it *might* be true that we are unfairly favoring the rich. But, the one doesn't necessarily follow from the other. 

It's like saying, "Philadelphia was warmer in June than April; therefore, global warming must be happening."

------

Again, I'm not trying to argue that inequality is a good thing, or that you shouldn't be concerned about it. Rather, I'm arguing that increasing inequality does NOT tell you anything reliable about who benefits from the "system" or how much (if at all) the increase favors the rich over the poor.

I am arguing that, even if you think increasing inequality is a bad thing, the following are still, objectively, true:

-- increasing inequality is a natural mathematical consequence of variation;
-- it is not necessarily the result of any deliberate government policy;
-- it does not necessarily disproportionately favor the rich or hurt the poor;
-- there is no way to know which individuals it favors just from the Fed data;
-- the natural forces that cause inequality to increase are very strong;
-- natural inequality growth may be so strong that it will persist even after successful attempts to benefit the poor generously and significantly;
-- the poor could be gaining relative to the rich even while measured inequality increases.

As for the Fed study itself,

-- the Fed statistics do not measure income changes for any family or specific group of families;
-- the Fed statistics that measure distributional income changes for percentile groups are a biased, exaggerated estimate of the income changes for the average family starting in that percentile;
-- It is impossible to tell, from the Fed's numbers, how the poor are faring relative to the rich.

Finally, and most importantly,

-- all of these statements follow necessarily from basic logic and math -- and do not require any other arguments from politics, economics, compassion, greed, fairness, or partisanship.






Saturday, August 30, 2014

Is MLB team payroll less important than it used to be?

As of August 26, about 130 games into the 2014 MLB season, the correlation between team payroll and wins is very low. So low, in fact, that *alphabetical order* predicts the standings better than salaries!

Credit that discovery to Brian MacPherson, writing for the Providence Journal. MacPherson calculated the payroll correlation to be +0.20, and alphabetical correlation to be +0.24. 

When I tried it, I got .2277 vs. .2346 -- closer, but alphabetical still wins. (I might be using slightly different payroll numbers, I used winning percentage instead of raw win totals, and I may have done mine a day or two later.)

The alphabetical regression is cute, but it's the payroll one that raises the important questions. Why is it so low, at .20 or .23? When Berri/Schmidt/Brook did it in "The Wages of Wins," they got around .40.

It turns out that the season correlation has trended over time, and MacPherson draws a nice graph of that, for 2000-2014. (I'll steal it for this post, but link it to the original article.)  Payroll became more important in the middle of last decade, but then dropped quickly, so that 2012, 2013, and 2014 are the lowest of all 15 years in the chart:






What's going on? Why has the correlation dropped so much?

MacPherson argues it's because it's getting harder and harder to buy wins. There is an "inability of rich teams to leverage their financial resources."  The end of the steroids era  means there are fewer productive free-agent players in their 30s for teams to buy. And the pool of available signings is reduced even further, because smaller-market teams can better afford to hang on to their young stars.


"Having money to spend remains better than not having money to spend. That might not ever change. Unfortunately for the Red Sox and their brethren, however, it matters far less than it once did."


------

My thoughts:

1.  The observed 2014 correlation is artificially low, because it's taken after only about 130 games (late-August), instead of a full season. 

Between now and October, you'd expect the SD due to luck to drop by about 10 percent. So, instead of 2 parts salary to 8 parts luck (for the current correlation of .20), you'll have 2 parts salary to 7.2 parts luck. That will raise the correlation to about .22.

Well, maybe not quite. The non-salary part isn't all binomial luck; there are some other things in there too, like the distribution of over- and underpriced talent. But I think .22 is still a reasonable projection.

It's a small thing, but it does explain a tenth of the discrepancy.
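
Here's that adjustment as a quick sketch, using my "parts" shorthand for the observed correlation of 2/(2+8) = .20:

from math import sqrt

salary_part, luck_part = 2.0, 8.0
shrink = sqrt(130 / 162)   # binomial luck SD scales with 1/sqrt(games);
                           # sqrt(130/162) is about 0.90 -- a 10% drop
print(salary_part / (salary_part + luck_part * shrink))   # about 0.22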

------

2.  The lower correlation doesn't necessarily mean that it's harder to buy wins. As MacPherson notes, it could just mean that teams are choosing not to do so. More specifically, that teams are closer in spending than they used to be, so payroll doesn't explain wins as well as it used to.

Here's an analogy I used before: in Rotisserie League Baseball, there is a $260 salary cap. If everyone spends between $255 and $260, the correlation between salary and performance will be almost zero -- the $5 isn't enough of a signal amidst the noise. But: if you let half the teams spend $520 instead, you're going to get a much higher correlation, because the high-spending half will do much, much better than the lower-spending half.

That could explain what's happening here.

In 2006, the SD of payroll was around 42% of the mean ($32MM, $78MM). In 2014, it was only 38% ($43MM, $115MM). It doesn't look that much different, but ... teams this year are 10 percent closer to each other than they were; that has to be contributing to the difference.

(This is the first time the "coefficient of variation" (the SD divided by the mean) has actually helped me with something -- here, as a way to correct SDs for inflation.

Also, this is a rare (for me) case where the correlation (or r-squared) is actually more relevant than the coefficient of the regression equation. That's because we're debating how much salary explains what we've actually observed -- instead of the usual question of how much salary leads to how many more wins.)


------

3.  While doing these calculations, I noticed something unusual. The 2014 standings are much tighter than normal. 

So far in 2014, the SD of team winning percentage is .058 (9.4 games per 162). In 2006, the SD was larger, at .075 (12.2 games per 162). That might be a bit high ... I think .068 (11 games per 162) is the recent historical average.

But even 9.4 compared to 11 is a big difference.  It's even more significant when you remember that the 2014 figure is based on only 130 games. (I'd bet the historical average for late-August would be between 12 and 13 games, not 11.)

What's going on? 

Well, it could be random luck. But, it could be real. It could be that team talent "inequality" has narrowed -- either because of the narrowing of team spending (which we noted), or because all the extra spending isn't buying much talent these days.

I think the surrounding evidence shows that it's more likely to be random luck. 

Last year, the SD of team winning percentage was at normal levels -- .074 (12.04 games per 162). It's virtually impossible for the true payroll/wins relationship to have changed so drastically in the off-season, considering the vast majority of payrolls and players stay the same from year to year.

Also, it turns out that even though the correlation between 2014 payroll and 2014 wins is low, the correlation between 2014 payroll and 2013 wins is higher. That is: this year's payroll predicts last year's wins (0.37) better than it predicts this year's wins (0.23)! 

Are there other explanations than 2014 being randomly weird? 

Maybe the low-payroll teams have young players who improved since last year, and the high-payroll teams have old players who declined. You could test that: you could check if payroll correlates better to last year's wins than this year's for all seasons, not just 2013-2014.

If that happened to be true, though, it would partially contradict MacPherson's hypothesis, wouldn't it? It would say that the money teams spend on contracts *does* buy wins as strongly as before, but those wins are front-loaded relative to payroll.

We can see how weird 2014 really is if we back out the luck variance to get an estimate of the talent variance.

After the first 130 games of 2014, the observed SD of winning percentage is .058. After 130 games, the theoretical SD of winning percentage due to luck is .044.

Since luck is generally independent of talent, we know

SD(observed)^2 - SD(luck)^2 = SD(talent)^2 

Plugging in the numbers: .058 squared minus .044 squared equals .038 squared. That gives us an estimate of SD(talent) of .038, or 6.12 games per 162.

I did the same calculation for 2013, and got 10.2.

2013: Talent SD of 10.2 games
2014: Talent SD of  6.1 games
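
Here's the back-out for both seasons, as a sketch:

from math import sqrt

def talent_sd_in_games(observed_sd, games):
    luck_sd = sqrt(0.25 / games)    # binomial SD of winning percentage
    return sqrt(observed_sd ** 2 - luck_sd ** 2) * 162

print(talent_sd_in_games(0.058, 130))   # 2014, 130 games in: about 6.1
print(talent_sd_in_games(0.074, 162))   # 2013, full season: about 10.2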

That kind of drop in one off-season is pretty much impossible, isn't it? 

If that huge a compression were real, it would have to be due to huge changes in the off-season -- specifically, a lot of good players retiring, or moving from good teams to bad teams.

But, the team correlation between 2013 wins and 2014 wins is +0.37. That's a bit lower than average, but not out of line (again, especially taking the short season into account). 

It would be very, very coincidental if the good teams got that much worse while the bad teams got that much better, but the *order* of the standings didn't change any more than normal.

So, I think a reasonable conclusion is that it's just random noise that compressed the standings. This year, for no reason, the good teams have tended to be unlucky while the bad teams have tended to be lucky. And that narrowed the distance between the high-payroll teams and the low-payroll teams, which is part of the reason the payroll/wins correlation is so low. 

------

4. We can just look at the randomness directly, since the regression software gives us confidence intervals. 

Actually, it only gives an interval for the coefficient, but that's good enough. I added 2 SDs to the observed value, and then worked backwards to figure out what the correlation would be in that case. It came out to 0.60. 

That's huge!  The confidence interval actually encompasses every season on the graph, even though 2014 is the lowest of all of them.

To confirm the 0.60 number, I used this online calculator. If the true correlation for the 30 teams is 0.4, the 95% confidence interval goes up to 0.66, and down to 0.05. That's close to my calculation for the high end, and easily captures the observed value of 0.23 in its low end. 
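
If you'd rather not trust an online calculator, here's a sketch using the standard Fisher z-transformation -- which, I'm assuming, is roughly what the calculator does under the hood:

from math import atanh, tanh, sqrt

def correlation_ci(r, n, z_crit=1.96):
    z = atanh(r)              # Fisher transform of the correlation
    se = 1 / sqrt(n - 3)      # standard error on the z scale
    return tanh(z - z_crit * se), tanh(z + z_crit * se)

print(correlation_ci(0.4, 30))   # about (0.05, 0.66)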

That's not to say that I think they really ARE all the same, that the differences are just random -- I've never been a big fan of throwing away differences just because they don't meet significance thresholds. I'm just trying to show how easy it is that it *could be* random noise.

I can try to rephrase the confidence interval argument visually. Here's the actual plot for the 2014 teams:




The correlation coefficient is a rough visual measure of how closely the dots adhere to the green regression line. In this case, not that great; it's more a cloud than a line. That's why the correlation is only 0.23.

Now, take a look at the teams between $77 million and $113 million, the ones in the second rectangle from the left.

There are eighteen teams in that group bunched into that small horizontal space, a payroll range of only $36 million in spending. Even at the historically high correlations we saw last decade, and even if the entire difference was due to discretionary free-agent spending, the true talent difference in that range would be only about 3 or 4 games in the standings. That would be much smaller than the effects of random chance, which would be around 12 games between luckiest and unluckiest. 

What that means is:  no matter what happens, that second vertical block is dominated by randomness, and so the dots in that rectangle are pretty much assured of looking like a random cloud, centered around .500. (In fact, for this particular case, the correlation for that second block is almost perfectly random, at -.002.)

So those 18 teams don't help much. How much the overall curve looks like a straight line is going to depend almost completely on the remaining 12 points, the high-spending and low-spending teams. In our case, the two low-spending teams are somewhat worse than the cloud, and the ten high-spending teams are somewhat better than the cloud, so we get our positive correlation of +0.23. 

But, you can see, those two bad teams aren't *that* bad. In fact, the Marlins, despite the second-lowest payroll in MLB, are playing .496 ball.

What if we move the Marlins down to .400? If you imagine taking that one dot, and moving it close to the bottom of the graph, you'll immediately see that the dots would get a bit more linear. (The line would get steeper, too, but steepness represents the regression coefficient, not the correlation, so never mind.)  I made that one change, and the correlation went all the way up to 0.3. 

Let's now take the second-highest-payroll Yankees, and move them from their disappointing  .523 to match the highest-payroll Dodgers, at .564. Again, you can see the graph will get more linear. That brings the correlation up to 0.34 -- almost exactly the average season, after mentally adjusting it a bit higher for 162 games.

Of course, the Marlins *aren't* at .400, and the Yankees *aren't* at .564, so the lower correlation of 0.23 actually stands. But my point is not to argue that it should actually be higher -- my point is that it only takes a bit of randomness to do the trick. 

All I did was move the Marlins down by less than 2 SDs worth of luck, and the Yankees by less than 1 SD worth of luck. And that was enough to bump the correlation from historically low, to historically average.

------

5. Finally: suppose the change isn't just random luck, that there's actually something real going on. What could it be?

-- Maybe money doesn't matter as much any more because low-spending teams are getting more of their value from arbs and slaves. They could be doing that so well that the high-spending teams are forced to spend more on free agents just to catch up. It wouldn't be too hard to check that empirically, just by looking at rosters.

-- It could be that, as MacPherson believes, there are fewer productive free agents to be bought. You could check that easily, too: just count how many free agents there are on team rosters now, as compared to, say, 2005. If MacPherson is correct, that careers are ending after fewer years of free agency, that should show up pretty easily.

-- Maybe teams just aren't as smart as they used to be about paying for free agents. Maybe their talent evaluation isn't that great, and they're getting less value for their money. Again, you could check that, by looking at free-agent WAR, or expected WAR, and comparing it to contract value.

-- Maybe teams don't vary as much as they used to, in terms of how many free-agent wins they buy. I shouldn't say "maybe" -- as we saw, the SD of payroll, adjusted for inflation, is indeed lower in 2014 than it was in 2006, by about 10 percent. So that would almost certainly be part of the answer. 

-- More specifically: maybe the (otherwise) bad teams are *more* likely to buy free agents than before, and the (otherwise) good teams are *less* likely to buy free agents than before. That actually should be expected, if teams are rational. With more teams qualifying for the post-season, there's less point making yourself into a 98-win team when a 93-win team will probably be good enough. And, even an average team has a shot at a wild card, if they get lucky, so why not spend a few bucks to raise your talent from 79 games to (say) 83 games, like maybe the Blue Jays did last year?

-----

I'll give you my gut feeling, but, first a disclaimer: I haven't really thought a whole lot about this, and some of these arguments occurred to me as I wrote. So, keep in mind that I'm really just thinking out loud.

On that basis, my best guess is ... that most of the correlation drop is just random noise. 

I'd bet that money buys free agents just as reliably as always, and at the usual price. The correlation is down not because spending buys fewer wins, but because more equal spending makes it harder for the regression to notice the differences.

But I'm thinking that part of the drop might really be the changing patterns of team spending, as MacPherson described. I wonder if that knot of 18 mid-range teams, clustered in such a small payroll range, might be a permanent phenomenon, resulting from more small-market teams moving up the payroll chart after deciding their sweet spot should be a little more extravagant than in the past. 

Because, these days, it doesn't take much to almost guarantee a team a reasonable shot at a wildcard spot -- which means, meaningful games later in the season than before, which means more revenue. 

In fact, that's one area where it's not zero-sum among teams. If most of the fan fulfillment comes from being in the race and having hope, any team can enter the fray without detracting much from the others. What's more exciting for fans -- being four games out of a wildcard spot alone, or being four games out of a wildcard spot along with three other teams? It's probably about the same, right? 

Which makes me now think, the price of a free agent win could indeed change. By how much? It depends on how increased demand from the small market teams compares to decreased demand from the bigger-spending teams.

------

Anyway, bottom line: if I had to guess the reasons for the lower correlation:

-- 80% randomness
-- 20% spending patterns

But you can get better estimates with some research, by checking all those things I mentioned, and any others you might think of.





Hat Tip: Craig Calcaterra



Tuesday, August 26, 2014

Sabermetrics vs. second-hand knowledge

Does the earth revolve around the sun, or does the sun revolve around the earth?

The earth revolves around the sun, of course. I know that, and you know that.

But do we really? 

If you know the earth revolves around the sun, you should be able to prove it, or at least show evidence for it. Confronted by a skeptic, what would you argue?  I'd be at a loss. Honestly, I can't think of a single observable fact that I could use to make a case.

I say that I "know" the earth orbits the sun, but what I really mean by that is, certain people told me that's how it is, and I believe them. 

Not all knowledge is like that. I truly *do* know that the sun rises in the east, because I've seen it every day. If a skeptic claimed otherwise, it would be easy to show evidence: I'd make sure he shared my definition of "east," and then I'd wake him up at 6 am and take him outside.

But that sun/earth thing?  I can only say I "know" it because I believe that astronomers *truly* know it, from direct evidence.

------

It occurred to me that almost all of our "knowledge" of scientific theories comes from that kind of hearsay. I couldn't give you evidence that atoms consist, roughly, of electrons orbiting a nucleus. I couldn't prove that every action has an equal and opposite reaction. There's no way I could come close to figuring out why and how e=mc^2, or that something called "insulin" exists and is produced by the pancreas. And I couldn't give you one bit of scientific evidence for why evolution is correct and not creationism. 

That doesn't stop us from believing, really, really strongly, that we DO know these things. We go and take a couple of undergraduate courses in, say, geology, and we write down what the professors tell us, and we repeat them on exams, and we solve mathematical problems based on formulas and principles we are told are true. And we get our credits, and we say we're "knowledgeable" in geology. 

But it's a different kind of knowledge. It's not knowledge that we have by our own experience or understanding. It's knowledge that we have by our own experience of how to evaluate what we're told -- how and when to believe other people. We extrapolate from our social knowledge. We believe that there are indeed people, "geologists," who have firsthand evidence. We believe that evidence gets disseminated among those geologists, who interact to reliably determine which hypotheses are supported and which ones are not. We believe that, in general, the experts are keeping enough of a watchful eye on what gets put in textbooks and taught at universities, that if Geology 101 was teaching us falsehoods, they'd get exposed in a hurry.

In other words, we believe that the system of scientists and professors and Ph.D.s and provosts and deans and journals and textbook publishers is a reliable separator of truth from falsehood. We believe that, if the earth really were only 6,000 years old, that's what scientists would be telling us.

------

Most of the time, it doesn't matter that our knowledge is secondhand. We don't need to be able to prove that swallowing arsenic is fatal; we just need to know not to do it. And, we can marvel at Einstein's discovery that matter and energy are the same thing, even if we can't explain why.

But it's still kind of unsatisfying. 

That's one of the reasons I like math. With math, you don't have to take anyone's word for anything. You start with a few axioms, and then it's all straight logic. You don't need geology labs and test tubes and chemicals. You don't need drills and excavators. You don't actually have to believe anyone on indirect evidence. You can prove everything for yourself.

The supply of primes is infinite. No matter how large a prime you find, there will always be one larger. That's a fact. If you like, you can look it up on the internet, or ask your math teacher, or find it in a textbook. It's a fact, like the earth revolving around the sun.

If you do it that way, you know it, but you don't really KNOW it. You can't defend it. In a sense, you're believing it on faith. 

On the other hand, you can look at a proof. Euclid's proof that there is no largest prime number is considered one of the most elegant in mathematics. The versions I found on the internet use a lot of math notation, so I'll paraphrase.

-----

Suppose you have a really big prime number, X. The question is: is there always a prime bigger than X?  

Try this: take all the numbers from 1 to X, and multiply them together: 1 times 2 times 3 .... times X. Now, add 1. Call that really huge number N. That huge N is either prime, or is the product of some number of primes. 

But N can't be divisible by X, or anything less than X, because that division has to always leave a remainder of 1. Therefore: either N is prime, or, when you factor N into other primes, they're all bigger than X. 

Either way, there is a prime bigger than X.
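
If you want to watch the construction work, here's a little Python sketch. It builds N = X! + 1 for a few values of X, and finds N's smallest prime factor -- which always comes out bigger than X:

from math import factorial

def smallest_prime_factor(n):
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n   # no divisor found: n itself is prime

for x in (3, 5, 7, 11):
    n = factorial(x) + 1
    print(x, smallest_prime_factor(n))   # the factor is always bigger than x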

------

I may not have explained that very well. But, if you get it ... now you know that there is no highest prime. If you read it in a book, you "know" it, but if you understand the proof, you KNOW it, in the sense that you can explain it and prove it to others.

In fact ... if you read it in a textbook, and someone tells you the textbook is wrong, you may have some doubt. But once you see the proof, you will *never* have doubt (except in your own logic). Even if the greatest mathematician in the world tells you there's a largest prime, you still know he's wrong. 

-----

In theory, everything in math is like that, provable from axioms. In practice ... not so much. The proofs get complicated pretty quickly. (When Andrew Wiles solved Fermat's Last Theorem in 1993, his proof was 200 pages long.)  Still, there are significant mathematical results where we can all say we know from our own efforts. For years, I wondered why it was that multiplication goes both ways -- why 8 x 7 has to equal 7 x 8. Then it hit me -- if you draw eight rows of seven dots, and turn it sideways, you get seven rows of eight dots.

There are other fields like math that way ... you and I can know things on our own, fairly easily, in economics, and finance, and computer science. Other sciences, like physics and chemistry, take more time and equipment. I can probably prove to myself, with a stopwatch and ruler, that gravitational acceleration on earth is 9.8 m/s/s, but there's no way I could find evidence of what it is on the moon. 

But: sabermetrics. What started me on all this is realizing that the stuff we know about sabermetrics is more like infinite primes than like the earth revolving around the sun. Active researchers don't just know sabermetrics because Bill James and Pete Palmer told us. We know because we actually see how to replicate their work, and we see, all the way back to first principles, where everything came from. 

I can't defend "e equals mc squared," but I can defend Linear Weights. It's not that hard, and all I need is play-by-play data and a simple argument. Same with Runs Created: I can pull out publicly-available data and show that it's roughly unbiased and reasonably accurate. (I can even go further ... I can take partial derivatives of Runs Created and show that the values of the individual events are roughly in line with Linear Weights.)
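
For instance, here's a sketch of that partial-derivative exercise, using the basic version of Runs Created -- (H + BB) times TB, divided by (AB + BB) -- and some league totals I made up, roughly team-season sized:

import sympy as sp

H, BB, TB, AB = sp.symbols("H BB TB AB", positive=True)
RC = (H + BB) * TB / (AB + BB)

# Each event changes the inputs by (dH, dBB, dTB, dAB):
events = {"single": (1, 0, 1, 1), "double": (1, 0, 2, 1),
          "triple": (1, 0, 3, 1), "homer":  (1, 0, 4, 1),
          "walk":   (0, 1, 0, 0), "out":    (0, 0, 0, 1)}

totals = {H: 1400, BB: 500, TB: 2200, AB: 5500}   # hypothetical totals

for name, (dh, dbb, dtb, dab) in events.items():
    marginal = (dh * sp.diff(RC, H) + dbb * sp.diff(RC, BB) +
                dtb * sp.diff(RC, TB) + dab * sp.diff(RC, AB))
    print(name, round(float(marginal.subs(totals)), 2))

That prints run values in the same ballpark as Linear Weights -- a single comes out around 0.57 runs, a homer around 1.52, an out around -0.12.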

DIPS?  No problem, I know what the evidence is, there, and I can generate it myself. On-base percentage more important than batting average?  Geez, you don't even need data for that, but you can still do it formally if you need to without too much difficulty. 

For my own part -- and, again, many of you active analysts reading this would be able to say the same thing --  I don't think I could come up with a single major result in sabermetrics that I couldn't prove, from scratch, if I had to. Even the ones from advanced data, or proprietary data, I'm confident I could reproduce if you gave me the database.

For all the established principles that are based on, say, Retrosheet-level data ... honestly, I can't think of a single thing in sabermetrics that I "know" where I would need to rely on other people to tell me it's true. That might change: if something significant comes out of some new technique -- neural nets, "soft" sabermetrics, biomechanics -- I might have to start "knowing" things secondhand. But for now, I can't think of anything.

If you come to me and say, "I have geological proof that the earth is only 6,000 years old," I'm just going to shrug and say, "whatever."  But if you come to me and say, "I have proof that a single is worth only 1/3 of a triple" ... well, in that case, I can meet you head on and prove that you're wrong. 

I don't really know that creationism isn't right -- I only know what others have told me. But I *do* know firsthand what a triple is worth, just as I *do* know firsthand that there is no highest prime. 

------

And that, I think, is why I love sabermetrics so much -- it's the only chance I've ever had to actually be a scientist, to truly know things directly, from evidence rather than authority.

I have a degree in statistics, but if nuclear war wiped out all the statistics books, how much of that science could I restore from my own mind?  Maybe, a first-year probability course, at best. I could describe the Central Limit Theorem in general terms, but I have no idea how to prove it ... one of the most fundamental results in statistics, one they teach you in your first statistics class, and I still only know it from hearsay.

But if nuclear war wipes out all the sabermetrics books ... as long as someone finds me a copy of the Retrosheet database, I can probably reestablish everything. Nowhere near as eloquently as Bill James and Palmer/Thorn, and I probably wouldn't think of certain methods that Tango/MGL/Dolphin did, but ... yeah, I'm pretty sure I could restore almost all of it. 

To me, that's a big deal. It's the difference between knowing something, and only knowing that other people know it. Not to put down the benefits of getting knowledge from others -- after all, that's where most of our useful education comes from. It's just that, for me, knowing stuff on my own ... it's much more fulfilling, a completely different state of mind. As good as it may be to get the Ten Commandments from Moses, it's even better to get them directly from God.




Tuesday, August 12, 2014

More r-squared analogies

OK, so I've come up with yet another analogy for the difference between the regression equation coefficient and the r-squared.

The coefficient is the *actual signal* -- the answer to the question you're asking. The r-squared is the *strength of the signal* relative to the noise for an individual datapoint.

Suppose you want to find the relationship between how many five-dollar bills someone has, and how much money those bills are worth. If you do the regression, you'll find:

Coefficient = 5.00 (signal)
r-squared = 1.00 (strength of signal)
1 minus r-squared = 0.00 (strength of noise)
Signal-to-noise ratio = infinite (1.00 / 0.00)

The signal is: a five-dollar bill is worth $5.00. How strong is the signal?  Perfectly strong --  the r-squared is 1.00, the highest it can be.  (In fact, the signal to noise ratio is infinite, because there's no noise at all.)

Now, change the example a little bit. Suppose a lottery ticket gives you a one-in-a-million chance of winning five million dollars. Then, the expected value of each ticket is $5.  (Of course, most tickets win nothing, but the *average* is $5.)

You want to find out the relationship between how many tickets someone has, and how much money those tickets will win. With a sufficiently large sample size, the regression will give you something like:

Coefficient = 5.00 (signal)
r-squared = 0.0001 (strength of signal)
1 minus r-squared = 0.9999 (strength of noise)
Signal-to-noise ratio = 0.0001 (0.0001 / 0.9999)

The average value of a ticket is the same as a five-dollar bill: $5.00. But the *noise* around $5.00 is very, very large, so the r-squared is small. For any given ticketholder, the distribution of his winnings is going to be pretty wide.

In this case, the signal-to-noise ratio is something like 0.0001 divided by 0.9999, or 1:10,000. There's a lot of noise in with the signal.  If you hold 10 lottery tickets, your expected winnings are $50. But, there's so much noise, that you shouldn't count on the result necessarily being close to $50. The noise could turn it into $0, or $5,000,000.

On the other hand, if you own 10 five-dollar bills, then you *should* count on the $50, because it's all signal and no noise.
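
You can see both cases in a simulation. One caveat: to get the lottery regression to converge in a sample I can actually run, I've scaled the odds to 1-in-a-thousand and the prize down to $5,000 -- the expected value per ticket is still $5:

import numpy as np

rng = np.random.default_rng(0)
n = 200_000
tickets = rng.integers(0, 20, size=n)              # tickets held per person

# Each ticket pays $5,000 with probability 1/1,000: expected value $5.
winnings = rng.binomial(tickets, 0.001) * 5_000.0

slope = np.polyfit(tickets, winnings, 1)[0]        # the signal
r2 = np.corrcoef(tickets, winnings)[0, 1] ** 2     # the signal-to-noise measure

print(slope)   # hovers around 5.00
print(r2)      # close to zero -- almost all noise

# Swap in five-dollar bills instead -- winnings = tickets * 5.0 -- and the
# slope is still 5.00, but the r-squared becomes exactly 1.00.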

It's not a perfect analogy, but it's a good way to get a gut feel. In fact, you can simplify it a bit and make it even easier:

-- the coefficient is the signal.
-- the r-squared is the signal-to-noise ratio.

You can even think of it this way, maybe:

-- the coefficient is the "mean" effect.
-- the (1 - r-squared) is the "variance" (or SD) of the effect.

Five-dollar bills have a mean value of $5, and variance of zero. Five-dollar lottery tickets have a mean value of $5, but a very large variance.  

------

So, keeping in mind these analogies, you can see that this is wrong: 

"The r-squared between lottery tickets and winnings is very close to zero, which means that lottery tickets have very little value."

It's wrong because the r-squared doesn't tell you the actual value of a ticket (the mean). It just tells you how much noise (variance) there is in the realized value for an individual ticket-holder. To really see the value of a ticket, you have to look at the coefficient.  

From the r-squared alone, however, you *can* say this:

"The r-squared between lottery tickets and winnings is very close to zero, which means that it's hard to predict what your lottery tickets are going to be worth just based on how many you have."

You can conclude "hard to predict" based on the r-squared. But if you want to conclude "little value on average," you have to look at the coefficient.  

------

In the last post, I linked to a Business Week study that found an r-squared of 0.01 between CEO pay and performance. Because the 0.01 is a small number, the authors concluded that there's no connection, and CEOs aren't paid by performance.

That's the same problem as the lottery tickets.

If you want to see if CEOs who get paid more do better, you need to know the size of the effect. That is: you want to know the signal, not the *strength* of the signal, and not the signal-to-noise ratio. You want the coefficient, not the r-squared.

And, in that study, the signal was surprisingly high -- around 4, by my lower estimate. That is: for every $1 in additional salary, the CEO created an extra $4 for the shareholders. That's the number the magazine needs in order to answer its question.

The low r-squared just shows that the noise is high. The *expected value* is $4, but, for a particular case, it could be far from $4, in either direction.  I haven't checked, but I bet that some companies with relatively low-paid executives might create $100 per dollar, and some companies who pay their CEOs double or triple the average might nonetheless wind up losing value, or even going bankrupt.

------

Now that I think about it, maybe a "lottery ticket" analogy would be good too: 


Think of every effect as a combination of lottery tickets and cash money.

-- The regression coefficient tells you the total value of the tickets and money combined.

-- The r-squared tells you what proportion of that total value is in money.  

That one works well for me.

------

Anyway, the idea is not that these analogies are completely correct, but that they make it easier to interpret the results, and to spot errors of interpretation. When Business Week says, "the r-squared is 0.01, so there is no relationship," you can instantly respond:

"... All that r-squared tells you is, whatever the relationship actually turns out to be, the signal-to-noise ratio is 1:99. But, so what? Maybe it's still an important signal, even if it's drowned out by noise. Tell us what the coefficient is, so we can evaluate the signal on its own!"

Or, when someone says, "the r-squared between team payroll and wins is only .18, which means that money doesn't buy wins," you can respond:

"... All that r-squared tells you is, whatever the relationship actually turns out to be, 82 percent of it comes in the form of lottery tickets, and only 18 percent comes in cash. But those tickets might still be valuable! Tell us what the coefficient is, so we can see that value, and we can figure out if spending money on better players is actually worth it."

------

Does either one of those work for you?  




(You can find more of my old stuff on r-squared by clicking here.)

