Sabermetric Research: June 2010

Thursday, June 24, 2010

What were the odds of the 70-68 score at Wimbledon?

Two minutes after posting this, I found that Carl Bialik covered this topic much better, and answered some of my questions. You might want to check out his post instead.

---

At Wimbledon, the fifth set of the Nicolas Mahut / John Isner match took 138 games to decide. It was won by Isner, 70-68.

As I understand it, the scoring system for the fifth set at Wimbledon works like this: the first player to get to 6 games wins, except that he has to win by two. That means that the score must have been 6-6 at one point. Then, for every pair of games after that, it must have been that Mahut won one of them and Isner won one of them (in either order), until the last pair, where Isner won them both.

What are the odds of a pair of games being split? The player serving has a strong advantage, and therefore has a much better than 50% chance of winning the game. I'm not sure how much that advantage is (anyone know?), but let's start with a supposition that it's 80%. Since players alternate serves in consecutive games, that means the chance of splitting is 68%. There's a 64% chance that both servers hold (80% times 80%), and a 4% chance that both servers are broken (20% times 20%).

After 6-6, there were 62 such splits, bringing the score to 68-68. The chance of 62 consecutive results with a probability of .68 is about 1 in 24 billion. If you played one set a second, it would take about 770 years until 68-68 happened.

It seems safe to assume, then, that our estimate of an 80% chance that the server wins is a little too low. Suppose it's 90%. Then the probability of two split games is .82 instead of .68. That brings the chance of 68-68 to only about 1 in 220,000. Better.

What if we go 95%? The probability is now .905 to the 62nd power, which is 1 in 487. That seems much more reasonable -- perhaps even too likely, since 70-68 happens a lot less often than 1 in 487.

Maybe 92%? That's approximately 1 in 19,000. That seems more reasonable to me. Assuming 91% gives about 1 in 66,000. In a chart:

80% -- 1 in 24,000,000,000
90% -- 1 in 220,000
91% -- 1 in 66,000
92% -- 1 in 19,000
95% -- 1 in 487
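
For anyone who wants to try other serve percentages, here's a minimal Python sketch of the arithmetic. It assumes, as above, that both players hold serve at the same rate and that games are independent; the function names are just mine.

```python
def split_probability(hold):
    """Chance that two consecutive games are split 1-1 when each server
    wins his own service game with probability `hold`."""
    # A split happens if both servers hold, or if both are broken.
    return hold * hold + (1 - hold) * (1 - hold)


def one_in(hold, pairs=62):
    """Return N such that `pairs` straight split pairs is a 1-in-N event."""
    return 1 / split_probability(hold) ** pairs


for hold in (0.80, 0.90, 0.91, 0.92, 0.95):
    print(f"{hold:.0%} -- 1 in {one_in(hold):,.0f}")
```

Changing `pairs` lets you ask the same question about any other score past 6-6.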

These, by the way, are the maximums that occur when both players are of exactly even skill. If one player is even slightly better than the other, the odds go way down. It's fair to assume, then, that these two players are fairly evenly matched. Of course, when a set is tied after 136 games, you don't need serious probability calculations to guess that.

In the fifth set, Mahut's "receiving points won" was 23% (which means he won 23% of points when his opponent served). Isner's was 17%. Those are indeed lower than in some of their other sets (which ranged from 18 to 41 percent). And a quick look at some of the other matches suggests that 30% is typical (I'm sure some reader knows where to find actual numbers). But this other close match had 28% and 24% overall, which isn't that much bigger (although small changes lead to very large changes in probabilities when you extend them to 124 games).

Anyway, my question to you tennis watchers is this: 70-68 is so unusual that the best guess is that, in this particular set, the server had a much greater advantage than normal. Is that correct? Were either or both players playing a style of tennis that somehow limited their probability of winning receiving points or breaking serve?

Maybe when you get tired after playing so many hours, your ability to return serves drops substantially? But, the match was called at 59-59 on account of darkness, so there were another 9 split pairs when the players resumed this morning, presumably refreshed.

What really happened here?

Labels: streakiness, tennis

Friday, June 18, 2010

Do younger brothers steal more bases than older brothers? Part III

(Note: please read Part I and Part II to understand what I'm talking about here.)

OK, after rereading the Sulloway/Zweigenhaft paper, and digesting some comments from the previous posts, I think I now have a pretty good idea of what the authors did.

In the previous post, I compared all brothers to their siblings. I found that when the older brother was called up first, which was most of the time, the younger brother attempted more steals 54% of the time. When the *younger* brother was called up first, which was seldom, he attempted more steals 87% of the time.

That's because when the younger brother was called up first, he must have been quite a bit better than his older sibling, in order to make the major leagues at a much younger age. So 87% is reasonable for younger brothers who are so good that they get called up even before their older brothers. On the other hand, 54% is normal for the more common case where the older brother gets called up first.

Now, since the authors explicitly say they ran a regression and controlled for who got called up first, let's do that ourselves.

Simplifying the numbers, let's say

60% outsteal their brothers when called up last and younger.
40% outsteal their brothers when called up first and older.
90% outsteal their brothers when called up first and younger.
10% outsteal their brothers when called up last and older.

Now, let's set that up as a regression. We'll create dummy variables for "called up first" and "older". And so each line of our regression, dependent variable followed by independent variables, is:

0.6 ... 0 0
0.4 ... 1 1
0.9 ... 1 0
0.1 ... 0 1

However, the paper used odds ratios instead of probabilities, so let's do that.

A probability of 0.6 corresponds to 0.6 successes to 0.4 failures. That's an odds ratio of (0.6 divided by 0.4), which is 1.5.

Similarly, the odds ratio of 0.4 is 0.667. The odds ratio of 0.9 is 9 (9 successes to 1 failure). And the odds ratio of 0.1 is 0.111.

Now we have

1.50 ... 0 0
0.67 ... 1 1
9.00 ... 1 0
0.11 ... 0 1

We'll do one more thing: The authors say they did a logarithmic transformation, so we'll take the logarithm of the odds ratios. That makes sense: you'd expect odds to be multiplicative, not additive, and the log of the odds ratio is standard in this kind of regression. So let's do that. Now we have:

+0.405 ... 0 0
-0.405 ... 1 1
+2.197 ... 1 0
-2.197 ... 0 1

Because of the symmetry, we have to tweak the numbers just a little bit to avoid singularity. (I think I changed one of the "405s" to a "407", or something, and a "197" to a "195" or something.)

If we now run that regression, what happens? We get

log of the odds ratio = (1.79 * dummy for called up first) - (2.60 * dummy for being older) + 0.404

And that means that being older subtracts 2.60 from the log of the odds ratio. The antilog of 2.60 equals 13.5. And so, being younger means your odds ratio goes up by 13.5 times!

The authors of the study came up with 10.6 times, but their data was different from mine. I think that what I've done and what the study did are pretty much the same thing.
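
If you want to reproduce the coefficients, here's a minimal sketch in Python, assuming NumPy is available. It fits plain least squares to the four untweaked log-odds values from the simplified example, which is not necessarily the procedure the authors (or my stats package) used; the variable names are mine. With four points and three parameters that happen to line up, the fit is exact.

```python
import numpy as np

# Assumed probabilities from the simplified example above.
probs = np.array([0.6, 0.4, 0.9, 0.1])

# Columns: intercept, "called up first" dummy, "older" dummy.
X = np.array([
    [1, 0, 0],   # called up last, younger
    [1, 1, 1],   # called up first, older
    [1, 1, 0],   # called up first, younger
    [1, 0, 1],   # called up last, older
], dtype=float)

# Log of the odds for each row.
log_odds = np.log(probs / (1 - probs))

# Ordinary least squares fit of log-odds on the two dummies.
coef, *_ = np.linalg.lstsq(X, log_odds, rcond=None)
intercept, first, older = coef

print(f"intercept={intercept:.3f}  first={first:.3f}  older={older:.3f}")
print(f"odds multiplier for being younger: {np.exp(-older):.1f}")
```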

----

(One quick note: the odds ratio of 13.5 does NOT mean that your chance is 13.5 times higher. It means that your ODDS are 13.5 times higher. Suppose your original odds were 2:1 in your favor, which is a probability of .667. Now, your odds multiply by 13.5 times, which is 27:1. That's a probability of .964. Not 13.5 times the chance -- 13.5 times the odds. Obviously, a .964 shot is not 13.5 times the chance of a .667 shot.

Where the original study and NYT article say "10 times as likely", they really mean "10 times the odds" in this narrow sense. The authors use "X times more likely" throughout the paper, when I think they mean "has X times the odds ratio".)
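
The odds-versus-probability distinction is easy to check for yourself. Here's a small Python sketch of the 2:1 example; the helper names are hypothetical, just for illustration.

```python
def prob_to_odds(p):
    """Convert a probability into odds in favor (successes per failure)."""
    return p / (1 - p)


def odds_to_prob(odds):
    """Convert odds in favor back into a probability."""
    return odds / (1 + odds)


start = 2 / 3                               # 2:1 in your favor
boosted_odds = prob_to_odds(start) * 13.5   # multiply the ODDS, not the chance
boosted = odds_to_prob(boosted_odds)

print(f"odds go from {prob_to_odds(start):.1f}:1 to {boosted_odds:.1f}:1")
print(f"probability goes from {start:.3f} to {boosted:.3f}")
```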

----

Going back to the 13.5 odds ratio: why is it wrong? Well, it isn't. It's right!

Suppose that you're an older player who got called up first. Your odds of attempting more steals are 0.667 to 1 (40%). Now, suppose you're magically turned into the younger player, holding everything else equal (so you still got called up first). Now your odds of attempting more steals are multiplied by 13.5. 0.667 times 13.5 equals 9. Your odds are 9 to 1, or 90%, just as we assumed at the beginning!

And, suppose that you're the older player who got called up last. Your odds of attempting more steals are 0.111 to 1 (10%). Now, suppose you're magically turned into the younger player, holding everything else equal (so you still got called up last). Now your odds of attempting more steals are multiplied by 13.5. 0.111 times 13.5 equals 1.5. Your odds are 1.5 to 1, or 60%, just as we assumed at the beginning!

It actually works out!

So does that really mean that younger players DO have 13.5 times (or 10.6 times, as the real study found) the odds of outstealing their older brothers? Yes, in the regression, but not in real life.

In real life, whether you get called up first doesn't really affect your steals directly. Being called up first matters because it's a proxy for ability. If you're younger and called up first, you're likely to be a great ballplayer. Medium-skill slow ballplayers might get called up at 23 or 24, but not at 20. Early callups are reserved for the most excellent players, who are also likely to be fast.

So, if you change old to young, and "keep everything constant," you're really NOT keeping everything constant. By keeping "called up first" constant, but changing older to younger, you're actually CHANGING "of roughly average ability" (old player called up first) to "of much better than average ability" (young player called up first). It's a little sleight of hand, using a proxy variable that means different things depending on the other variable. Younger brothers steal at 13.5 times the odds of old players only if, at the same time, they are much, much better players than the old players.

Analogy again: suppose I run this new regression

Amount of money = number of $5 bills in wallet + dummy for whether number of $1000 bills equals number of $5 bills

Suppose you have no $5 bills and the dummy is "yes", meaning you have no $1000 bills either. Your estimated wealth is therefore $0.

But now, if your inventory of $5 bills increases by 1, *holding everything else constant*, the regression will tell you that $5 bill is worth $1005! Because "holding everything else constant" means you "still" have the same number of thousands as of fives, which means you now have another $1000 bill! The "constant" refers only to holding the regression variable constant. It isn't holding the *real-life* variable constant, which is the number of $1000 bills. What's hidden is that in order to hold the regression variable constant, I have to give you $1000.

Same thing happening here. If you hold "called up first" constant, while changing "older brother" to "younger brother", you're not really holding things constant in the real life sense. To hold the regression variable "called up first" constant, you have to give the younger brother a hidden $1000 worth of talent. And that's why the odds ratio turns out so high.

Want another analogy? Suppose you predict whether a person is likely to be diagnosed a dwarf based on their height and age. Suppose you find that if you hold height constant at 4 feet tall, but increase the person's age from 8 to 28, he's now much more likely to be diagnosed a dwarf. Does that mean age causes dwarfism?

----

So, anyway, that's what's going on. I believe their regression does indeed find a coefficient equal to 10.6, as they report, but when you use a bit of common logic, you see that what they found is absolutely consistent with about 50 to 60% of younger brothers outstealing their older brothers -- not 90%.

Labels: baseball, baserunning, psychology, regression, siblings

Wednesday, June 16, 2010

Do younger brothers steal more bases than older brothers? Part II

A couple of weeks ago, I wrote about a study that purported to show huge differences in steal attempts between brothers in major league baseball. According to the New York Times article describing the study,

"For more than 90 percent of sibling pairs who had played in the major leagues throughout baseball’s long recorded history, including Joe and Dom DiMaggio and Cal and Billy Ripken, the younger brother (regardless of overall talent) tried to steal more often than his older brother."

If that's true, that would be huge. It would, I think, be one of the most surprising findings ever, that nine out of ten times the younger brother steals more than the older brother. But I checked, and found nothing close to that. I was left wondering how the authors, Frank J. Sulloway and Richard L. Zweigenhaft, got the results they did.

Since then, I managed to get hold of the actual study. And I still don't know where the 90% figure comes from.

Our raw numbers, it appears, are almost the same. The authors' results are different from mine, but only by a bit; they might have better data, or used different criteria for eliminating pairs from the study.

(Update: strikeouts below are where I misinterpreted some of the numbers in the authors' tables. The authors didn't actually give the numbers I thought they did.)

Let me start with steals. My numbers showed 56% of younger brothers outstealing their older brothers (adjusted for times on base). ~~Their numbers actually showed only 48%. Pretty close.~~ However, that's not the right comparison, because the authors are talking about steal *attempts* -- that is, SB+CS, not just SB. (I missed that the first time.)

Rerunning the numbers for attempts instead of steals, I get that 57 out of 98 younger brothers out-attempted their older brother, or 58%. ~~What did the authors find? 97 out of 185, or 52%.~~

Putting that in table form:

58% me
52% them

So we're on the same page, right? I think so. It sure seems like their comparisons line up with mine.

So, then, why does the article say 90%? ~~Because the authors, after showing 52% in their chart, change their estimate to 90% later.~~

Why? I can't figure it out for sure; I don't completely understand where they're coming from. The logic is flawed somewhere, but the authors don't explain themselves fully, so I can't actually spot where the flaw happens.

Let me show you what the authors give as the broad reason ~~for bumping the 52% up to 90%~~: they say the sample is biased. Why?

"Call-up sequence and its relationship to athletic talent introduces a potential bias in athletic performance by birth order, as older brothers were more likely to be called up first owing to the difference in age. The extent of this effect turns out to be pronounced, with older brothers being 6.6 times more likely than their younger brothers to receive a call first to the major leagues. We are therefore comparing somewhat more talented athletes who were called up first (and who, by virtue of their relative age, tend to be older brothers) with somewhat less talented athletes who were called up second (and who tend to be younger brothers). To correct for this bias, we have controlled performance data for call-up sequence in all analyses involving birth order."

(UPDATE: I had wondered why the authors assumed older brothers were better. I now see they ran a regression and found the older brothers had longer careers.)

But why is this bias?

Suppose you wanted to know if younger brothers grew up to be taller than older brothers. You measure a couple of hundred sets of siblings, and you find they grew to exactly the same height, on average. But, wait! At the time the older brother started high school, he's almost always taller than the younger brother! See? We have bias!

Well, of course, we *don't* have bias. Height (and talent) is related to age of the person, not calendar year. You'd expect brothers to be of equal height at equal ages, and of equal major-league experience at equal ages, NOT at equal calendar years. The "entering high school" and "getting called up" are just red herrings.

Regardless, the authors proceed to divide the sample of players into three groups:

-- younger brother called up first
-- younger and older brother called up the same year
-- older player called up first

If you do that, you'll find that the younger brother has a huge advantage in the first two groups. Why? It's obvious: if a player gets called up before his brother *even though he's younger*, he's probably a much better player. In addition, speed is a talent more suited to younger players. So when it comes to attempted steals, you'd expect younger brothers called up early to decimate their older brothers in the steals department.

I ran my numbers for all three groups, and got:

-- 87% of younger brothers called up first attempted more steals (7/8)

-- 71% of younger brothers called up the same year attempted more steals (5/7)

-- 54% of younger brothers called up last attempted more steals (45/83)

The overall average, of course, is still 58% (57/98). Splitting the data this way makes the effect seem larger, but it's not really telling you anything new -- you're selectively sampling based on the results, because the younger players who come up first are precisely the players you'd expect to steal more, simply because they were good enough to be called up young.
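
To see that the split doesn't change the overall number, here's a short Python sketch that recomputes each group's percentage and the pooled figure from the counts above. The dictionary layout is my own.

```python
# (younger brother attempted more steals, total pairs) for each call-up group,
# taken from the three lines above.
groups = {
    "younger called up first": (7, 8),
    "called up the same year": (5, 7),
    "older called up first": (45, 83),
}

for name, (wins, pairs) in groups.items():
    print(f"{name}: {wins / pairs:.1%}")

# The pooled rate is just the group rates weighted by group size.
total_wins = sum(w for w, _ in groups.values())
total_pairs = sum(n for _, n in groups.values())
overall = total_wins / total_pairs
print(f"overall: {overall:.1%} ({total_wins}/{total_pairs})")
```

The older-called-up-first group holds 83 of the 98 pairs, so it dominates the pooled rate no matter how lopsided the two small groups are.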

Another analogy: home teams win 54% of games. But home teams that only use up one out in the ninth inning win 100% of games! That doesn't mean the 54% is biased, nor does it mean that you learn any more about home field advantage by breaking out walk-off wins. It's just another way of looking at the same result.

Anyway, having claimed to correct for a bias that I don't think really exists, they go on to compute odds ratios for the three cases separately, and use something called the "Mantel-Haenszel" technique to combine them. The statistical techniques look OK, and it seems to me that should have given them the same 52% that they found for the one group. But they somehow come up with 90%.

So, again: what's going on?

My only guess is that they somehow decided that the "younger brother called up first group" is the important one, and they just quoted that number. I got 87% for that, and their numbers are a little different, so maybe they got 90%. Another possibility: it might have something to do with the regression they used to correct for a bunch of things, including which "called up first" group the batter belonged to. They don't give the regression equation or the details, but depending on the way they chose dummy variables, I can see the regression looking something like:

Percentage of younger brothers outstealing older brothers equals:
-- 90%
-- minus 20% if the brothers were called up at the same time
-- minus 20% if the older brother was called up first.

So you can see "90%" coming up as an estimate in a regression equation. But still, the authors would certainly have noticed that you can't just quote the 90% figure without adjusting for the other dummies.

----

Here's another way they might have got it wrong. As you see in the quote above, the authors note that the older players turned out to play more games, and they were also likely to get called up first. So maybe they adjusted the numbers in the service of "controlling athletic performance" (the actual words they use).

Suppose that in 1950, older brother A stole 10 bases and younger brother B didn't steal any because he wasn't in the majors yet. If you "adjust" for that by including it in the regression, effectively subtracting 10 for every line of A's career, I can see how the "adjustment" might now show B having a 90% chance to steal more "adjusted" bases than A.

Again, I'm speculating, which I shouldn't. I don't think the authors did this, but I wouldn't be surprised if it's an adjustment something along those lines, just more subtle.

----

Another related stat the study comes up with is that younger brothers are 10.58 times more likely to steal a base than their older brother. I don't know where that one comes from either. Again, the authors' own raw data comes up with something much more realistic. As a ratio of times on base, the authors' SB+CS rates were

5.6% for older brothers
9.3% for younger brothers

That's only 1.66 times more likely.

So, obviously, the authors feel there's some huge bias here that, when you correct for it, brings the 1.66 up to 10.58. I can't imagine what that might be, and I can't really duplicate the authors' regression because they don't even give the equation.

Here are my calculated batting lines for the two groups. Is there really a factor of ten difference here?

------ AB -R --H 2B 3B HR RBI BB SB CS -avg RC/G
Young 553 70 146 26 04 11 062 50 12 05 .265 4.39
-Old- 546 75 148 26 03 17 074 54 08 04 .271 4.98

You'd think that they'd give a simple English explanation of how, when you look at the batting lines of the two groups, you see young players attempting steals at maybe 50% more than the older ones, but how when *they* look at the two batting lines, they see one line stealing bases at almost 1000% more than the older ones. But I couldn't find any such explanation in the paper.

So when they say,

"... younger brothers were 10.6 times more likely than their older brothers to attempt steals"

... well, I don't see any way that can possibly be true.

-----

UPDATE: batting lines updated, they were slightly off ... also updated to reflect the authors' explanation of something I had missed.

-----

ADDENDUM: as you can see, my data show the younger players indeed attempting more steals than the older players. In my sample, it's 46% more; in the authors' sample, it's 66% more.

My 46% seems like a lot ... I wondered if there's perhaps a real effect there. So I ran a simulation to check for statistical significance.

For each of my 98 pairs of brothers, I *randomly* decided which one to consider as older, instead of looking at their real ages. Then I computed the relative attempt rate between the two random groups. I repeated that 60,000 times.

The results? 4,126 of the 60,000 random sets had one group attempting at least 46% more steals than the other group. That's a p-value of 0.069 ... not significant at 5%, but close.

I had expected more significance than that, but the players are "lumpy," in that some players steal a lot, and some don't. A few faster players landing in the same group can make a big difference.
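
Here's a rough Python sketch of that kind of relabeling simulation. The pair data below is made up purely for illustration (it is NOT my real 98 pairs, so the p-value it prints means nothing), and the function name and data layout are my own assumptions.

```python
import random


def relabel_p_value(pairs, threshold, trials=10_000, seed=0):
    """Share of random relabelings in which one group's attempt rate is at
    least `threshold` times the other group's (two-sided).

    `pairs` holds one ((attempts, times_on_base), (attempts, times_on_base))
    tuple per pair of brothers.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        attempts = [0, 0]
        on_base = [0, 0]
        for first, second in pairs:
            # Coin flip: which brother gets labeled "older" this time.
            if rng.random() < 0.5:
                first, second = second, first
            attempts[0] += first[0]
            on_base[0] += first[1]
            attempts[1] += second[0]
            on_base[1] += second[1]
        rates = [attempts[0] / on_base[0], attempts[1] / on_base[1]]
        if max(rates) >= threshold * min(rates):
            hits += 1
    return hits / trials


# Hypothetical, made-up careers: "lumpy" steal rates, varying career lengths.
demo_rng = random.Random(1)
demo_pairs = []
for _pair in range(98):
    brothers = []
    for _brother in range(2):
        times_on_base = demo_rng.randint(50, 2000)
        rate = demo_rng.choice([0.02, 0.05, 0.10, 0.25])
        brothers.append((round(times_on_base * rate), times_on_base))
    demo_pairs.append(tuple(brothers))

p = relabel_p_value(demo_pairs, threshold=1.46, trials=2000)
print(f"share of relabelings with a gap of 46% or more: {p:.3f}")
```

The comparison is two-sided on purpose: a random relabeling counts whether the "older" or the "younger" group comes out 46% ahead.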

-----

ADDENDUM 2: In the comments, I think Guy came up with the answer. Younger brothers have shorter careers. Shorter careers tend to be centered on lower ages (there are 40 year old players, but no 10 year old players). Young players are faster.

That's why younger brothers have more attempted steals -- they play more years during peak stealing age.

I bet if you did it season-by-season and controlled for age, most of the effect would disappear.

Good catch by Guy.

Labels: baseball, baserunning, siblings

Tuesday, June 15, 2010

Huge "choke" effect reported in soccer, part II

Here's Part I.

-----

OK, someone was kind enough to send me a copy of one of the studies (gated link), the one that presents this result:

"... 86.6 percent for the first shooter, 81.7 for the second, 79.3 for the third and so on."

The authors looked at three tournaments: Copa America, the European Championship, and the World Cup. Strangely, or perhaps not so strangely (you guys can tell me what you think), the percentages varied quite a bit for the three tournaments:

82.7% Copa America (133 kicks)
84.6% European Championship (104 kicks)
71.2% World Cup (153 kicks)

The difference between the World Cup and European Championship is about 2.5 standard deviations.
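
As a rough check, here's a Python sketch of a standard two-proportion comparison using the pooled standard error. I'm assuming the kicks are independent and plugging in the rounded percentages above; the authors may have tested this differently.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """z-score for the difference between two observed proportions,
    using the pooled standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# European Championship (84.6% of 104) vs. World Cup (71.2% of 153).
z = two_proportion_z(0.846, 104, 0.712, 153)
print(f"z = {z:.2f}")
```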

(Of the missed kicks, here are the percentages that were saves (as opposed to hitting the post, or shooting wide or high):

78.3% non-goal saves Copa America
57.9% non-goal saves European Championships
70.5% non-goal saves World Cup

But, as the authors point out, those differences aren't significant because of the small numbers of misses.)

So why is the percentage of goals so low in the World Cup? I'm not sure. The authors think it's pressure and psychology. They mention the possibility of better goaltending, but argue that the save percentages were "almost 10 percent higher" in the Copa America than the World Cup. But they weren't: those were the save percentages of non-goals. The true save percentages were

13.5% saves Copa America
08.9% saves European Championships
20.3% saves World Cup
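
Those true save percentages are just the miss rate times the share of misses that were saves. A quick Python sketch, using the rounded figures from the two lists above (the dictionary layout is mine):

```python
# (goal rate, share of non-goals that were saves) for each tournament.
tournaments = {
    "Copa America": (0.827, 0.783),
    "European Championship": (0.846, 0.579),
    "World Cup": (0.712, 0.705),
}

# Saves as a share of ALL kicks = (share of kicks missed) * (share of misses saved).
save_rate = {
    name: (1 - goal_rate) * saved_share
    for name, (goal_rate, saved_share) in tournaments.items()
}

for name, rate in save_rate.items():
    print(f"{rate:.1%} saves {name}")
```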

It seems to me that better goaltending could still be part of the reason. If you think goaltending is a bigger factor than kicking -- the same way the starting pitcher is a bigger factor in a baseball game than the opposing hitters -- that theory makes a lot of sense.

-----

Anyway, moving on to the shooters. Here's the raw data for each of the shooting orders:

86.6% shooter 1
81.7% shooter 2
79.3% shooter 3
72.5% shooter 4
80.0% shooter 5
64.3% shooter 6+

(It should be noted that the "shooter 5" is subject to "bottom of the 9th" effects -- the team leading won't take its 5th shot if not necessary, so the fifth kick should be weighted more towards the first-kicking team. Not sure how big a deal that is, but there were only 55 kicks there out of 82 shootouts. The "shooter 4" had 80 kicks, and the "6+" was only 28 kicks.)

My first reaction is still: aren't teams just putting their best shooters first? The authors mention this theory, but then don't come back to it.

What they do instead is run a regression. They include tournament, positional role of the player (forwards should be better at scoring than defenders), and age of the player. They also include a dummy for playing time, in case the substitutes are less tired because they didn't play as much.

What are the results? Well, as you'd expect, young players score more than old players. Forwards score more than midfielders who score more than defenders. And playing time doesn't seem to matter much.

But what about shooting position? After the adjustments, does shooter 1 still score more often than shooter 4?

Shockingly, the authors don't tell us! They list the results for everything in the regression except shooting order. I can't figure out why they would choose to omit that one.

Nonetheless, they write,

" ... tournament and kick order were most strongly related to kick outcome .... there were especially marked differences ... between kicks #6-9 and kick #1 (p=.0002). All kicks except kick #4 were significantly different from kicks #6-9 in the analysis."

So the results appear to be roughly the same as above, even after adjusting for age and position and tournament.

Doesn't that still support the hypothesis that it's just skill? The authors make a good case for that:

" ... skilled players are probably picked for the first kick ... and less-skilled players are picked for the sixth kick and on, because the five most skilled players have already been used for kicks #1-5. Skilled players may have also been picked for kick #5, which would explain why these kicks, contrary to what would be expected from the trend of the other kicks, were more successful than kicks #3 and #4."

Sounds right to me. But, then, the authors make a last-ditch attempt to preserve their "choking under pressure" hypothesis:

"However, these confounding factors cannot explain the decline in success between kicks #2, #3, and #4. Thus, the hypothesis about the influence of kick importance is still plausible."

Er, *why* can't skill explain the decline from #2 to #4? And if skill is not a factor, and players choke under pressure, why is #5 so high?

This seems to be a case where the study couldn't be much more compatible with the obvious explanation, but the authors support the less-plausible hypothesis anyway.

-----

They also say:

"... coaches may be interested to know that forwards have a tendency to score more goals than defenders ... and younger players often score more than older players."

If the authors think coaches don't know this, why do they think some players are assigned to be forwards and some not? And why do they think there are so few 45-year-old professional soccer players?

-----

Anyway, we now have a better idea of what might be causing the other result. From the NYT article:

"Kick takers in a shootout score at a rate of 92 percent when the score is tied and a goal ensures their side an immediate win. But when they need to score to tie the shootout, with a miss meaning defeat, the success rate drops to 60 percent."

The two situations are probably about equally likely if you're taking your fifth shot: my guess is that a 4-4 is about as likely as a 5-4 there. But if you're on your sixth shot, it's much more likely that it's 1-0 and not 0-0 (because there's a 64% chance that the other team scored on their sudden death shot).

So, when you're tied, it's more likely that you're on your fifth shot, where your chance of scoring is high. When you're behind by one, it's a little less likely, which means it's a little more likely that you're on your sixth shot, where your chance of scoring is low (because you're using a lesser shooter).

And that probably explains at least part of the "92% to 60%" result -- it's better players, not just "stress". We'd need the other study to know for sure.

Labels: clutch, psychology, soccer

Sunday, June 13, 2010

Huge "choke" effect reported in soccer

According to an academic quoted in this NYT article, there's a huge "choke" effect in soccer penalty kicks.

Geir Jordet, a professor at the Norwegian School of Sport Sciences in Oslo, reports that, when the score is tied, penalty kick shooters succeed at a 92% rate. But when the shooter's team is behind by a goal, and presumably there's more pressure, he succeeds only 60% of the time.

Wow. That's some serious choking. The effect is so large I can barely believe it.

Another effect Jordet found is that, when the game is decided by penalty kicks in a "shootout" after a drawn match, shooting percentages drop with each successive attempt:

"... 86.6 percent for the first shooter, 81.7 for the second, 79.3 for the third and so on.

“It demonstrates so clearly the power of psychology,” [Jordet] said."

That's difficult to explain too, although I suppose it could be that the team puts the best shooter out first, then the next best shooter, and so on. That would neatly account for the decline.

Still, these are both very surprising results. As always, anyone with access to the studies the article is based on, if you wanted to send along a copy, I'd appreciate it.

(Hat tip: Elizabeth)

UPDATE: Part II is here.

Labels: clutch, psychology, soccer

Friday, June 04, 2010

Payroll and wins and correlation

Yesterday, Stacey Brook posted about payroll and wins. Brook ran a regression and found that, as of June 3, about a third of the way through the 2010 baseball season, the correlation between team salaries and team wins was .224. He writes,

" ...that the two variables move together just over 22%; or there is about 78% not moving together. That does not seem to me a great deal of support for the hypothesis that as team's spend more on payroll, it results in higher team winning percent (or better quality teams)."

What he's saying, paraphrased, is: ".22 is a low number. Therefore, the relationship between payroll and wins is low. QED."

But that's simplistic and wrong. You know what's an even lower number? .00142. Much lower, right? More than 100 times lower. Really small number. Turns out that's Yao Ming's height, in miles. Boy, Yao must be a pretty short little man!

On the other hand, here's a big number: 2,290,000,000. That's Yao's height in nanometers. Huge number! Yao must be really tall!

Well, which is it? Is Yao short or tall?

Obviously, the number alone isn't enough -- you need the units. Brook is simply saying "0.22 is low" without figuring the units, and that's where the problem is. (If anyone reading this thinks otherwise, I invite you to offer me 0.22 Megadollars for my car.)

What are the units of the correlation coefficient of .22? Well, Brook is right when he says it measures how the two variables move together. It means that for every 1 SD that salary moves, you'll see winning percentage moving by 0.22 SDs. Just like "one knot" is a rate of distance divided by time, the correlation coefficient 0.22 is a measure of the SD of winning percentage divided by the SD of money. So we should be able to convert that to wins per dollar. There are 5280 feet in a mile. How many wins per dollar are there in one "correlation coefficient"?

One SD of salary this year is about $38 million, or about $12 million in the 55 games so far. One SD of winning percentage so far this season is .095, or about 5 wins in the 55 games so far.

So one correlation coefficient = (5 wins /12 million dollars) = .42 wins / million dollars = 4.2 * 10^-7 win/dollar.

So 0.22 correlation coefficients equals 0.22 times that, which is 9.2 * 10^-8 win/dollar.

THAT is the number that Brook should be checking to see if it's big or small. Which is it? Well, if you take the inverse, it's about $11 million per win. $11 million is the number Brook should be looking at, not .22.

At the margin, an extra $11 million in spending buys you an extra win. That's the number that the regression is telling us. That's exactly what the 0.22 means, when you figure out what units it's denominated in.
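For anyone who wants to check the arithmetic, here is a rough sketch of the conversion in Python. It relies on the standard fact that a regression slope equals the correlation times (SD of y / SD of x). The function name and constants are just labels made up for this illustration, and the "unrounded" version assumes a 162-game season when prorating the $38 million SD to 55 games.

```python
# Figures quoted in the post.
GAMES_PLAYED = 55
SEASON_GAMES = 162                # assumption: full MLB schedule, used for prorating
SD_SALARY_FULL_SEASON = 38e6      # dollars
SD_WIN_PCT = 0.095
CORRELATION = 0.22


def dollars_per_win(r, sd_wins, sd_dollars):
    """Turn a correlation into a slope (wins per dollar), then invert it."""
    wins_per_dollar = r * sd_wins / sd_dollars
    return 1.0 / wins_per_dollar


# Using the post's rounded inputs: 5 wins and $12 million.
rounded = dollars_per_win(CORRELATION, 5, 12e6)

# Using unrounded inputs computed from the same figures.
sd_wins = SD_WIN_PCT * GAMES_PLAYED
sd_dollars = SD_SALARY_FULL_SEASON * GAMES_PLAYED / SEASON_GAMES
unrounded = dollars_per_win(CORRELATION, sd_wins, sd_dollars)

print(f"rounded inputs:   ${rounded / 1e6:.1f} million per win")
print(f"unrounded inputs: ${unrounded / 1e6:.1f} million per win")
```

Either way the answer lands close to $11 million per win, so the rounding in the text doesn't change the story.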

----

Over at "The Book" blog, Tom Tango criticizes Brook on the same grounds: that the correlation coefficient can be made as high or as low as you like just by using a larger or smaller sample. Again, it's a matter of units. If you use (say) only ten games, you get a very small correlation number, but a large unit of variance. If you use many seasons' worth of games, you get a higher correlation coefficient, but a small unit of variance. It's .001 kilometers vs. .999 meters. The numbers are extreme, but the units make up for it, and the end result is almost exactly the same.

It makes sense that the result should be the same -- after all, if one thing causes another thing at a certain rate, it should cause it at a certain rate no matter the size of the sample. If smoking causes cancer, smoking causes cancer. If payroll causes wins over 10 seasons, then payroll causes wins over 55 games. It doesn't matter that the correlation coefficient over 10 seasons is big, and the correlation coefficient over 55 games is small. They are not comparable without computing the units. Once you put in the units, they'll tell you exactly the same story, subject, of course, to the fact that the 55-game sample will have more random variation.
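A toy simulation makes the point concrete. This is only a sketch under made-up assumptions: 5,000 hypothetical teams (far more than MLB's 30, to tame the noise), normally distributed payrolls with an assumed mean of $90 million and the $38 million SD from above, and payroll as the only thing that affects a team's chance of winning each game. The assumed true effect, 0.00055 of winning percentage per extra $1 million of season payroll, is just 0.22 * .095 / 38 from the numbers above.

```python
import random
from math import sqrt

TRUE_SLOPE = 0.00055   # assumed: win pct gained per extra $1M of season payroll
N_TEAMS = 5000         # assumed: hypothetical league, much bigger than MLB


def simulate(n_games, rng):
    """Return (correlation, fitted slope) of win pct on payroll after n_games."""
    payrolls, win_pcts = [], []
    for _ in range(N_TEAMS):
        payroll = rng.gauss(90, 38)                    # $ millions; mean is assumed
        p_win = 0.5 + TRUE_SLOPE * (payroll - 90)      # payroll is the only factor here
        wins = sum(rng.random() < p_win for _ in range(n_games))
        payrolls.append(payroll)
        win_pcts.append(wins / n_games)
    mx = sum(payrolls) / N_TEAMS
    my = sum(win_pcts) / N_TEAMS
    sxx = sum((x - mx) ** 2 for x in payrolls)
    syy = sum((y - my) ** 2 for y in win_pcts)
    sxy = sum((x - mx) * (y - my) for x, y in zip(payrolls, win_pcts))
    # Pearson correlation, and least-squares slope of win pct on payroll.
    return sxy / sqrt(sxx * syy), sxy / sxx


rng = random.Random(2010)
results = {n: simulate(n, rng) for n in (10, 55, 550)}
for n, (r, slope) in results.items():
    print(f"{n:4d} games: r = {r:.2f}, slope = {slope:.5f} win pct per $1M")
```

The correlation should climb as the number of games goes up, while the fitted slope (the part with units) should hover around the same value throughout, just with more scatter in the short samples.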

----

Another problem is that Brook dismisses the idea that money has been buying wins because the results of his regression are statistically insignificant:

"In other words in statistical terms payroll has zero effect on winning percentage at this point in the season."

That's just not right, for two reasons.

First: Suppose I claim that I have an ability called "sensory perception," which other people call "eyesight". You toss a coin, and I will be able to tell you whether it landed heads or tails -- just by looking at it! You don't believe me. So you toss a coin. I look at it and tell you, "heads." You do it again, and I look and say "tails". Then you do it a third time, and after looking I say "tails" again.

I've called it correctly three times in a row. You run a statistical test on it, and find that the chance of me getting three in a row is 0.125 -- a lot higher than the threshold of 0.05 that you need for statistical significance.

And so you say, "in statistical terms you looking at the coin has no effect on whether you are able to guess it right."

Well, that's not a fair argument about me being wrong about having eyesight. Because, after all, I did exactly what I said I could do. If that's not enough evidence for you, that's a fault of your own experiment. You could have made me call ten tosses, or 20 tosses, or 100 tosses, and then you certainly would have had enough evidence! The fact that three tosses isn't enough to convince you is an issue with your experiment, not with real life.

It's true that your weak experiment doesn't show statistical significance for my ability to call coins. But it doesn't show statistical significance against my hypothesis of *always* being able to call coins. So the results are as consistent with my hypothesis as they are with your hypothesis -- even more so, in fact. So why are you rejecting my hypothesis but not yours?
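The numbers behind the coin example are easy to work out. Here's a short sketch, assuming a fair coin, a skeptic's hypothesis of pure 50/50 guessing, and an "eyesight" hypothesis under which every call is right. The function and variable names are only for illustration.

```python
def p_value_all_correct(n_tosses):
    """Chance that a pure guesser calls n fair coin tosses correctly in a row."""
    return 0.5 ** n_tosses


for n in (3, 5, 10, 20):
    print(f"{n:2d} tosses: p = {p_value_all_correct(n):.7f}")

# Fewest consecutive correct calls needed to get under the 0.05 threshold.
needed = next(n for n in range(1, 100) if p_value_all_correct(n) < 0.05)
print("tosses needed for significance:", needed)

# How likely is three-for-three under each hypothesis?
likelihood_guessing = p_value_all_correct(3)   # guesser: 0.5 per call
likelihood_eyesight = 1.0 ** 3                 # eyesight: always right
likelihood_ratio = likelihood_eyesight / likelihood_guessing
print("likelihood ratio, eyesight vs. guessing:", likelihood_ratio)
```

Three correct calls give 0.125, as in the text; five would have been enough to clear 0.05. And the three-toss result is eight times as likely under the "eyesight" hypothesis as under guessing, which is the sense in which it fits my hypothesis better than yours.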

If the results of your experiment are consistent with both your hypothesis (money doesn't buy wins) and your critics' hypothesis (money *does* buy wins, at the rate of several million dollars each), you haven't proven anything.

Second, Brook contradicts himself. First, he claims that "in statistical terms payroll has zero effect on winning percentage." But then, he claims that's false:

"Over time (in other words adding more seasons) we do find a statistically significant relationship ... "

So what's the point of trumpeting the new experiment? It doesn't contradict what we already know -- it actually confirms it. If a big experiment finds a statistically significant relationship between salary and wins, and a small experiment finds approximately the same relationship, but without enough data to be statistically significantly different from zero ... then why would you argue that the small one contradicts the big one? It doesn't -- it's exactly what you would expect as confirmation!

In fairness, I think what Brook is doing is again just looking at a single number and giving a gut reaction. For this regression, he looks at the significance level, sees it's not significant, and realizes that, if not for the other study, he would be allowed to conclude that the relationship between salary and wins was zero. If he can "almost" conclude that the relationship is zero, then at the very least it must be small, right?

Well, no, not right. There's a big difference between the size of the effect, and the size of the evidence. Suppose I claim there's an elephant in the room, and show you a picture, but you choose to dismiss my claim on grounds of insufficient evidence. That doesn't give you the right to conclude, "therefore, if there IS an elephant in the room, he's probably a very small one."

Labels: baseball, payroll, regression, statistics, The Wages of Wins
