Wednesday, May 26, 2010

Do younger brothers steal more bases than older brothers?

Alan Schwarz's latest "Keeping Score" column, which appeared in last Sunday's New York Times, quotes an academic study that found startling sibling effects among baseball players.

A hypothesis in psychology is that younger siblings exhibit riskier behavior than older ones, "perhaps originally to fight for food, now for parental attention." If that's the case, you'd expect younger brothers to attempt more stolen bases (baseball's equivalent of risky behavior) than older ones.

In a just-published academic study, psychologists Frank J. Sulloway and Richard L. Zweigenhaft checked that, and found evidence supporting their hypothesis to a very significant degree: a full 90 percent of younger brothers outstole their older siblings!

That's astonishing, to find an effect that large. Since the study isn't available online, I tried to reproduce the study. I didn't get 90% -- I got 56%. Which makes a lot more intuitive sense.

Here's what I did. I went to this Baseball Almanac page, which lists all brother combinations in history. I downloaded their list and fixed the spellings as best I could. Then, I eliminated

(a) all sets of twins;
(b) all sets of brothers where either or both was born before 1895 (Babe Ruth's birth year);
(c) all sets of brothers where one or both was primarily a pitcher;
(d) all sets of brothers who had identical SB rates (always both zero, I think).

That left 114 sets of batting brothers. I then computed their rate of SB per (1B + BB), to see which brother tended to steal more bases. (The original study used H+BB+HBP instead of 1B+BB, but I don't think that would affect the results much.)

64 of the 114 younger brothers outstole their older siblings. Since random chance would be 57, I don't think there's an effect there. It's 1.3 SD above expected.
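The arithmetic behind that "1.3 SD" figure can be sketched in a few lines of Python. This is only a rough check, assuming each brother pair is an independent coin flip under the no-effect hypothesis; the function names are illustrative and not taken from the study.

```python
import math

def sign_test_z(successes, n, p=0.5):
    """Standard deviations above the count expected by chance."""
    expected = n * p
    sd = math.sqrt(n * p * (1 - p))
    return (successes - expected) / sd

def upper_tail(successes, n, p=0.5):
    """Exact binomial probability of at least `successes` out of n."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(successes, n + 1))

z = sign_test_z(64, 114)        # 64 of 114 younger brothers stole more
p_value = upper_tail(64, 114)   # one-sided chance of 64 or more by luck
print(f"z = {z:.2f}, one-sided p = {p_value:.3f}")
```

The exact tail probability is included alongside the z-score because, with only 114 pairs, it is cheap to compute directly rather than lean on the normal approximation.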

I have no idea if I did something wrong, or if the authors of the study did something wrong. I'm betting it's them, just because 90% is kind of outrageous.

My data, in not particularly easy to read format, is here.


UPDATE: The authors' study appears to have been a regression where:

" ... several other factors were considered, like age differences, body size and even the order in which the players were promoted to the majors."

Still, it seems unlikely that those factors would raise the rate from 56% to 90%.


UPDATE: Here are the career batting lines for both groups, divided by 1000:

------ AB -R --H 2B 3B HR RBI BB SB -avg RC/G
Young 444 58 118 20 05 06 051 36 10 .267 4.22
-Old- 538 75 145 24 05 11 068 48 12 .270 4.55

And per 600 PA (differences in rate stats due to rounding):

------ AB -R --H 2B 3B HR RBI BB SB -avg RC/G
Young 554 73 148 25 06 08 064 46 13 .267 4.24
-Old- 550 76 149 25 06 11 070 50 12 .271 4.62

So the older brothers were a bit better than the younger brothers, although the younger ones stole bases at a slightly higher rate.


Friday, May 21, 2010

Properly interpreting data is more than just "numeracy"

Here's a recent post from Freakonomist Stephen Dubner on numeracy. Dubner quotes John Allen Paulos on a specific, interesting example on how numbers can mislead:

"Consider the temptation to use the five-year survival rate as the primary measure of a treatment for a particular disease. This seems quite reasonable, and yet it’s possible for the five-year survival rate for a disease in one region to be 100 percent and in a second region to be 0 percent, even if the latter region has an equally effective and cheaper approach.

"This is an extreme and hypothetical situation, but it has real-world analogues. Suppose that whenever people contract the disease, they always get it in their mid-60s and live to the age of 75. In the first region, an early screening program detects such people in their 60s. Because these people live to age 75, the five-year survival rate is 100 percent. People in the second region are not screened and thus do not receive their diagnoses until symptoms develop in their early 70s, but they, too, die at 75, so their five-year survival rate is 0 percent. The laissez-faire approach thus yields the same results as the universal screening program, yet if five-year survival were the criterion for effectiveness, universal screening would be deemed the best practice."

Dubner argues that, in order to better spot these kinds of situations, we need a public that's more numerate.

But, I think, numeracy isn't really what's needed here: it's not really mathematics, but just common logic. Really, you can rewrite Paulos's situation involving almost no numbers at all (except numbers of years, which everyone understands):

"Suppose that if you're diagnosed with a certain disease in one region, you always die within five years. But if you're diagnosed with the same disease in another region, you *never* die in five years. Does that mean that treatment is better in the second region? No, not necessarily. It could be that the disease takes ten years to kill you, and no treatment helps. In the "bad" region, it's not diagnosed until year eight. In the "good" region, they have universal screening, and the disease is always diagnosed in year one. So even if the universal screening does no good at all, it *looks* like it does."

All but the most innumerate person should be able to understand that, right? It's not really a question of mathematics: it's a question of *logic*. Sure, the more used to numbers you are, the more likely you are to think of this possibility -- at least to a certain extent. But, in this particular case, having an open mind and being able to think clearly are more important than mathematical ability.
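The scenario is simple enough to write down as a toy calculation. The sketch below uses Paulos's version, where everyone dies at 75; the two diagnosis ages (66 and 72) are assumptions chosen to stand in for "in their 60s" and "early 70s", and the function name is hypothetical.

```python
def five_year_survival_rate(diagnosis_age, death_age):
    """Share of patients still alive five years after diagnosis.

    Every patient in a region is identical in this toy model,
    so the rate is either 1.0 or 0.0.
    """
    return 1.0 if death_age - diagnosis_age >= 5 else 0.0

DEATH_AGE = 75             # from Paulos's example: everyone dies at 75
SCREENED_DIAGNOSIS = 66    # assumed: screening finds the disease early
UNSCREENED_DIAGNOSIS = 72  # assumed: symptoms appear in the early 70s

screened = five_year_survival_rate(SCREENED_DIAGNOSIS, DEATH_AGE)
unscreened = five_year_survival_rate(UNSCREENED_DIAGNOSIS, DEATH_AGE)
print(screened, unscreened)
```

Nothing about treatment appears anywhere in the calculation: the age at death is the same constant in both regions, and only the diagnosis date moves.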

And it needs a certain amount of creativity, cleverness, interest and skepticism. And data. Even a Ph.D. in math wouldn't necessarily be able to figure out what's going on, because you need the right details. Suppose a newspaper editorial calling for pre-screening simply said,

"In region A, which prescreens for the disease, five-year survival rates after diagnosis are 100%. In region B, which does not, the survival rate is zero. This evidence strongly suggests that we need to encourage prescreening."

If you'd never seen this example before, would you immediately realize the possibility that the pre-screening was no use at all? I probably wouldn't, and I regularly read academic studies with a skeptical eye. How would you expect a guy skimming the morning paper on his way to work to think of it?

Sure, I might think of it if I had seen other studies that had shown pre-screening to be ineffective. Then, I'd think, geez, how did this study come up with such a contradictory result? And maybe I'd then figure it out. But I'd have to care enough to want to think about it skeptically.

And even if I'd figured out that was a logical possibility, consistent with the numbers ... a possibility doesn't mean it's actually happening, right? I might correctly think, "yeah, it could be just a question of when the disease is diagnosed ... but is it really likely that that's *all* of it? It's more likely that *some* of the effect is early diagnosis, and the rest of the effect is more time to treat after diagnosis."

To know for sure, it's not enough to be numerate, or reasonable, or logical. You need more data. But the data might not exist. If you're lucky, the issue arises from a study somewhere, and you can read that study and figure out what's going on. But if it's just raw data, and all that's being recorded is the five-year survival rate, all you can do is note that both hypotheses fit the data: that prescreening always helps, and that it never helps.

Dubner recommends:

"Footnotes, I guess, and transparency, and a generally higher level of numeracy among the populace."

All of which are good things. But "footnotes" assume that the authors of the study realize what might be going on. From what I've seen in sports studies, that's generally not the case. My experience is that authors do just fine at the "numeracy" stuff, dealing with the numbers and doing the regressions and statistical tests. But they're weaker at properly understanding what the numbers actually *mean*. I'd bet you that eight out of ten researchers who found data matching Paulos' example wouldn't think of Paulos' explanation. Peer reviewers would be even more likely to miss it.

Since Dubner's kind of "numeracy" is so difficult, isn't it more efficient to better use what's already out there, instead of making what would probably be a futile attempt to create more of it? If you really want to try to solve the problem of improper conclusions in studies like this, just get lots of people to look at the study, and give those people incentives to spot these kinds of things.

Here's how such a plan might work:

-- a researcher submits a study to the journal, as normal, and peer review proceeds as normal.

-- after deciding to provisionally accept the paper, the journal immediately posts it online, and allows for online comments and discussion.

-- after a suitable interval, a second set of senior referees review the comments and peer-review the paper again in light of those comments.

-- those peer reviewers allocate a fixed sum of money per paper -- $500, say -- among the commenters, in proportion to the usefulness of their comments, whether the paper gets published or not.

The benefits of a system like this are pretty clear: a potentially unlimited number of peer reviewers, motivated by money and reputation, are more likely to spot flaws than a couple of anonymous peer reviewers who are perhaps motivated to go easy on their peers. That means a lot fewer flawed papers, and a lot fewer press reports of incorrect conclusions -- conclusions that newspaper readers, no matter how intelligent or numerate, aren't given enough evidence to question.

There are other beneficial side effects beyond the obvious advantage of fewer fallacious papers:

-- the system provides an incentive for authors to give their conclusions more careful thought before submitting, saving the system time and money.

-- it provides a way for smart grad students to make a little extra money to support themselves.

-- it's cheap. I bet $500 a paper would be enough to clear most flaws. If you figure how long a paper takes to write, and how much professors get paid, $500 a study is a drop in the bucket compared to the cost of the original study.

-- the record of corrections makes it clear to everyone in and out of academia what kinds of flaws to avoid. (Within a few months, nobody will be neglecting to talk about regression to the mean, and everyone will know what selective sampling is.)

-- it will allow for a more accurate measure of academic accomplishment. Since academia is known to be a competitive environment but without formal scorekeeping, anything that allows a more accurate subjective rating of a researcher's proficiency has to be a good thing.

But I don't think this will ever happen, for political reasons. Nobody likes to have their errors exposed publicly. The current system, where peer review is mostly private, allows everyone to save face.

Furthermore, professors would probably be somewhat reluctant to criticize other professors, so the most effective critiquing would come from outside of academia. I am pretty certain that academic Ph.D.s, as a group, will never stand for a system that lowers their status by having laymen publicly correct their mistakes.


Thursday, May 13, 2010

Why are Yankees/Red Sox games so slow? Part II

This is part II -- part I is here.


After posting the results of my regression on how players influence the length of baseball games, some commenters here and at "The Book" blog suggested I add more variables to the mix, to see if that would affect the results.

So I added a few things. I added pickoff throws (even though those weren't available for the first couple of years of the study). I added whether the game was on the weekend. Finally, I split pitches into whether the bases were empty or not (Mike Fast noticed that pitches come a lot slower with runners on).

The "weekend" variable was not significant. Including pickoff throws helped a fair bit. But it was splitting the pitches that made the biggest improvement. It turns out that with the bases empty, a pitch takes about 19 seconds. With runners on, it takes 27 seconds. That was a big difference.

The r-squared of the regression went only from .93 to .94, but I think some of the coefficients became more accurate. The estimate for the effect of adding an extra half-inning went from less than zero (which was obviously wrong) to two minutes (which is probably right). My explanation for why this happened is here.

Still, there wasn't much difference in the overall results. The players who came out as fast last time still came out as fast now, and the slow players are roughly the same too. The rankings changed a little bit, though. Here's a new Excel file with the results (two worksheets), for those who are interested.


Over at "The Book" blog, Mike Fast raised some legitimate questions about whether the results are correct. He wrote,

"I believe these regressions are giving results that are incorrect ... [by actual clock times, in a certain subset of games], Jeter uses a little more than one extra minute per game as compared to Jose Lopez, rather than 6 extra minutes that the regression tells us. Now that’s just comparing between-pitch-time, but I would expect that to be the bulk of the difference between any two players."

So there are three possibilities:

1. Mike's sample of games is not representative;
2. Jeter and Lopez affect game times in other ways than just speed between pitches;
3. The regression has incorrect results.

I think there might be a little bit of all three there, but especially the third. Let me go through them:

1. I ran the regression for different sets of years, and I found that Jeter seemed to be slower in the early years of the decade than in the latter years. Mike's empirical data came from 2008-2009, so that might explain part of it.

2. Here's a New York Times article about the first game of the season, in which Derek Jeter stepped out of the box for 13.6 seconds between pitches. That's not the total time, just the time he stepped out of the box. Robinson Cano delayed 18.7 seconds.

Now, it's only one game. But in the full study, Jeter still comes out as slower than Cano: +3.48 minutes to +2.05 minutes. Again, the NYT only measured one game, which is perhaps not representative. And perhaps Jeter sees more pitches than Cano. But this might be very slight evidence that something else is going on.

What could it be? It would have to be something not covered by the regression. Maybe it's mound conferences. Does Jeter initiate them or prolong them? (That's not a rhetorical question: I legitimately have no idea.) Maybe it's that Jeter takes a long time to get into the batter's box at the beginning of an at-bat. If he takes an extra 10 seconds, that's 40 seconds a game, which is quite a bit. Again, I don't know. Maybe he takes a long time to get to first base on a walk? Nah, that couldn't be more than a second or two per walk, which wouldn't add up to much over an entire season. Maybe when he's on first and a subsequent batter hits a long foul, he runs all the way to third base and takes a long time to get back? Doesn't sound plausible. But, anyway, those are the kinds of things we'd be looking for.

3. Is it possible that the numbers are wrong? For sure, they're somewhat wrong: a regression only gives estimates, with standard errors. For Jeter, the estimate was 3.48 minutes with a standard error of 0.65 minutes. Since, in general, an estimate is within two standard errors 95% of the time, that means there's a 5% chance that Jeter's actual effect is less than 2.18 minutes, or more than 4.78 minutes. Splitting the 5% between "too low" and "too high," we might guess that there's a 2.5% chance that Jeter's impact is actually less than 2.18 minutes, even though he came out at 3.48 minutes.
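For reference, the interval arithmetic can be sketched as follows. It assumes the estimate's error is approximately normal, which is the usual working assumption for regression coefficients but is not something the regression output itself guarantees.

```python
import math

def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * (1 - math.erf(z / math.sqrt(2)))

estimate, se = 3.48, 0.65    # Jeter's coefficient and standard error, in minutes
low, high = estimate - 2 * se, estimate + 2 * se
tail = normal_upper_tail(2)  # chance the estimate is 2 or more SE too high
print(f"{low:.2f} to {high:.2f} minutes; one-sided tail = {tail:.4f}")
```

The exact normal tail at 2 standard errors comes out a little under 2.5%, because the round "95%" rule really corresponds to about 1.96 standard errors. The difference doesn't matter for the argument.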

But, actually, it's more than 2.5%. That's because by choosing Jeter, we're falling prey to selective sampling: we're only picking on Jeter because he came out at the top of the list (actually, second from the top, after Denard Span). And players at the extremes are more likely to be inaccurate than players in the middle.

You can see why this is so if you imagine taking all the batting lines from one day, and putting them in order. All the players at the top -- the ones who were player of the game, who went 3-for-5 with a home run -- overachieved their actual talent. And all the players at the bottom -- the ones who were 0-for-5 -- underachieved their actual talent.

When Ryan Howard goes 3-for-4, it would be silly to assume that .750 is a good estimate of his actual ability. When Ken Griffey Jr. goes 0-for-5, it would be equally silly to assume .000 is a good estimate of his actual ability.

The same thing is true here. When Derek Jeter shows up at 3.48, which is the ".750 batting average" of game slowness, we have to keep in mind that it's probably significantly too high. And when Nick Markakis appears at -2.77, the third-fastest batter in the study, it's probably significantly too low.

How much too high or too low? I don't know. But what we *do* know is that we expect 2.5% of our players to be more than 2 standard errors too high. There are 694 batters in the study, so 17 batters fall into that category. Is Derek Jeter among them? We can't say for sure. We *do* know that those 17 players should be concentrated near the top of the list of players, so there's a pretty good chance that Jeter is one of them. Even if he's not, there's a good chance that he's still too high, just less than 2 standard errors too high.

Having said that: you can only figure the extremes are more likely to have been "lucky" if, after you strip out a reasonable amount of "luck", the player would collapse into the middle of the other players. If not, you have less of a reason to believe that luck was a big deal. For instance, as we figured, if you take 2 SE of luck out of Derek Jeter, he goes from 3.48 to 2.18. That moves him from 2nd place to ... 10th place. That's not that big a drop, and it means that although Jeter is still likely to have his slowness overestimated, it's probably not overestimated as much as if he dropped closer to the middle of the pack. If I absolutely had to guess Jeter's actual slowness, based only on the results of the regression, I'd say maybe regress him to the mean by about 1.5 SDs. That's just a gut feeling.

What we *can* conclude is:

-- the order of the players is roughly correct. As a group, the players at the top are indeed much slower than the guys at the bottom.

-- A player at the top is always more likely to be slower than a player below him in the list. Even though Jeter, the second-slowest hitter in the study, is likely too high, so are the third-, fourth-, and fifth-slowest hitters, so Jeter is still likely to be slower than they are. Not 100% likely, but always more than 50% likely.

-- most of the players near the top and bottom of the list project to being significantly more extreme than they actually are in real life. They are still significantly faster or slower than average -- just not as much so as the raw numbers imply.

-- your limit of plausibility is about 3 standard errors (1 in 100 players, or maybe 10 in the list, would be that far off). So for Tim Wakefield, who is 11.5 standard errors faster than your typical pitcher ... well, maybe, in the worst case, he's only 8.5 SE faster. So he's obviously still really fast. (And, since even that would only move him from fastest to fifth-fastest ... well, 3 SE is probably too much to deduct from him in the first place.)

-- if you do NOT selectively sample your players based on how high they are on the list, you should expect the regression estimates to be about right. So if you average a whole team, or look at lefties vs. righties, or high draft choices vs. low draft choices, or some other criterion that doesn't have to do with how fast or slow a player is ... you should be able to reasonably trust the results.
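The selection effect behind these conclusions can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reconstruction of the actual regression: the 694 players and the 0.65-minute standard error come from the post, but the 1.0-minute spread of true talent, the independence of the errors, and the choice of the top 20 are all hypothetical.

```python
import random

def simulate(n_players=694, true_sd=1.0, se=0.65, top_k=20, trials=200, seed=1):
    """Average estimation error for the top_k highest estimates vs. all players."""
    rng = random.Random(seed)
    top_err = all_err = 0.0
    for _ in range(trials):
        players = []
        for _ in range(n_players):
            true = rng.gauss(0, true_sd)   # real slowness (hypothetical spread)
            est = true + rng.gauss(0, se)  # regression estimate = truth + noise
            players.append((est, est - true))
        players.sort(reverse=True)         # slowest-looking players first
        top_err += sum(err for _, err in players[:top_k]) / top_k
        all_err += sum(err for _, err in players) / n_players
    return top_err / trials, all_err / trials

top_error, overall_error = simulate()
print(f"top-20 average error: {top_error:+.2f} min; everyone: {overall_error:+.2f} min")
```

Under these assumptions, the players picked because they sit at the top of the list are overestimated on average, while the average error across everyone stays close to zero, which is the distinction between selecting on the result and selecting on some unrelated criterion.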


So, anyway, my guess about Jeter is that, these days, he's maybe 2 minutes slower than normal, not 3.5. Why? Because

-- he's faster now than he used to be
-- empirical data suggests he's not as slow as 3.5 minutes, although that only looked at pitch times
-- he finished near the top of the list of players, which suggests that he needs to be regressed to the mean a fair bit.


One technical point: I was assuming that, for the 694 batters, the estimates are all independent (which is why you'd expect 2.5% of players to be too high). I'm not sure if that's true ... it could be that if you found out for sure that Jeter is only (say) 90 seconds slower than average, and forced that into the regression, a whole lot of other players would change significantly. If anyone has expertise on this point, or any other statistical argument raised here, please chime in.


Friday, May 07, 2010

Stumbling on Wins: Are NBA rebounds consistent because of talent or opportunities?

In David Berri and Martin Schmidt's "Stumbling on Wins," the authors paraphrase JC Bradbury on what makes a useful player-evaluation statistic. They write,

"First, one must look at how the measure connects to current outcomes. Then, one must look at the consistency of the measure over time."

Fair enough. But there's a third criterion that the authors need to add.

To see why, take, for instance, saves in baseball. By the first criterion, saves are obviously important -- that's why teams put their best reliever in the stopper role. By the second criterion, saves are very consistent -- for Yankee pitchers over the last 15 years, there's a very high correlation between saves last year and saves this year. There's a much higher year-to-year correlation for saves than any other measure -- ERA, WHIP, DIPS, even strikeouts.

Does that mean that saves are the most useful way to assign value to a reliever? Does it really mean that Mariano Rivera, with 30 saves, is fifteen times as talented at saving as some other guy in the bullpen with two saves? Of course not. The number of saves depends mostly on opportunities. And opportunities are not a characteristic of the player -- they're a characteristic of the manager, who decides how to assign the workload. Yankee pitchers are not consistent because Mariano Rivera has ten or more times as much "save talent" as any other Yankee. Rather, they're consistent because Yankee managers are consistent in giving Mariano almost all the save opportunities.

So, I propose:

Third, one should look at how much the measure is a true reflection of the player's talent, and how much is a measure of other factors unrelated to talent, such as opportunities.

(Note: above update 3/10/10 after suggestion from Guy in the comments.)

The reason I bring this up is that Berri and Schmidt use the first two criteria to defend why they assign the value of rebounds to the player who grabbed the ball:

"When we look at consistency, ... we see that 90% of the variation in a player's per-minute rebounds is explained by a player's per-minute rebounds the previous season. There appear to be no statistics in baseball or football that are as consistent as rebounds in basketball."

But that doesn't mean that rebounds are a useful statistic. They could be like saves -- it could be that the consistency is due to consistency of *opportunities*, not talent. And many people, myself included, have argued that certain players position themselves to compete for rebounds, and others do not. If player X is the designated "rebound guy" on the team, year after year, that would explain the consistency without providing evidence of talent.

If Berri and Schmidt are using the high r-squared to defend their hypothesis that rebounds are talent, then they don't succeed. Indeed, I think the high r-squared shows the opposite. Given that there's a certain amount of binomial randomness in who gets any particular rebound, there's a limit to how much consistency you'd be able to see if everyone had the same number of opportunities. The exceedingly high r-squared is an indication that the cause is probably more than just talent.

I should explain that better. Here's a baseball example. Suppose you computed the year-to-year correlation in hits among players who had at least 400 AB. The r-squared wouldn't be 1, because players don't hit the same every year. Someone who got 150 hits last year might get 160 next year, and vice-versa. Almost everyone would be in the 100 to 200 range, clustering maybe around 150. And you'd get an r-squared of maybe 0.2 (I'm guessing).

Now, suppose you include *every* player, not just those with 400 AB. Now, players are much more likely to have results similar to last year's. You get your typical regular who has 150 this year and 160 next year. Then you have your utility player who has 40 one year and 27 the next year. And you have your pitchers, who have 8 hits last year and 11 this year.

And so you have an r-squared that's much higher, maybe .7 or more. But the jump in r-squared is measuring consistency of *opportunity*, not talent.

So when you have one argument that rebounds are almost all talent, and another argument that rebounds have a huge component in there that reflects opportunity -- and then you get a high r-squared -- that result better supports the second argument, not the first.
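The hits example can be tried out with a quick simulation. Everything numeric here is an assumption made for illustration: three groups of players (regulars, utility players, pitchers) with fixed at-bats per season, a true batting average that stays the same in both years, and a 20-point spread of talent within each group. None of it is fitted to real data.

```python
import random

def pearson_r2(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

def season_hits(rng, avg, at_bats):
    """Hits in one season: each at-bat is an independent trial."""
    return sum(rng.random() < avg for _ in range(at_bats))

def simulate(seed=1):
    rng = random.Random(seed)
    # (number of players, at-bats per season, mean true average): all assumed
    groups = [(200, 550, 0.270), (100, 150, 0.250), (100, 40, 0.150)]
    year1, year2, is_regular = [], [], []
    for count, at_bats, mean_avg in groups:
        for _ in range(count):
            talent = rng.gauss(mean_avg, 0.020)  # same true average both years
            year1.append(season_hits(rng, talent, at_bats))
            year2.append(season_hits(rng, talent, at_bats))
            is_regular.append(at_bats == 550)
    reg1 = [h for h, r in zip(year1, is_regular) if r]
    reg2 = [h for h, r in zip(year2, is_regular) if r]
    return pearson_r2(reg1, reg2), pearson_r2(year1, year2)

r2_regulars, r2_everyone = simulate()
print(f"regulars only: {r2_regulars:.2f}; all players: {r2_everyone:.2f}")
```

Each player's talent is identical in the two runs of the comparison; the only thing that changes is whether players with very different numbers of at-bats are pooled together. The r-squared for the pooled group comes out far higher, and that extra consistency is entirely opportunity.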


Anyway, that's my main point. While I'm here, a couple of other smaller things I disagree with in that section of the book (pages 33 to 39):

1. The authors list the r-squareds for different measures in various sports; they find that their correlations for basketball are higher than other sports, and therefore argue that NBA statistics are more useful than others. But as I have pointed out before, you can't just use the raw r-squared or correlation coefficient as a measure of persistence of talent. The r-squared is dependent on many other factors -- most notably (as Tango has also pointed out many times), the length of a season. The authors found an r-squared of QB completion percentage of 24%, but a 90% r-squared for rebounding. That doesn't necessarily mean anything on its own. That's because the QB numbers are over 16 games and maybe a few hundred attempts, whereas the rebounding numbers are over 82 games and several thousand attempts. You just can't compare raw r-squared values that way, without first interpreting them.

2. When the authors say "there are no statistics in baseball as consistent as rebounds" ... well, they didn't include saves. I don't know for sure if saves have a higher r-squared or not, but I'd certainly be willing to bet they do.

3. The authors do indeed mention that the football season is shorter than the basketball season, but they don't seem to realize that that fact, in and of itself, affects the r-squareds. Instead, they have two alternative explanations. The first is that football statistics depend more on teammates than basketball statistics do -- which doesn't seem unreasonable, even without evidence backing it up.

But their second argument I'm not sure about. Berri and Schmidt argue that another reason professional football players are inconsistent is because of lack of experience. Why lack of experience? Because football players play only 16 games a season, so they're less experienced than basketball players, who play 82. Moreover, basketball players probably played pickup basketball every day as teenagers, while football players had to wait for organized leagues, because they couldn't just get a few friends together and play a real football game. So NBA players are more experienced because they've played a lot more basketball in their lives than NFL players have played football.

Well, it's probably true that NBA players have spent more time in games than NFL players, but I'm not sure why that's important. Why does playing fewer games (but still a lot of games -- a regular lot, rather than a huge lot) make you less consistent?

If I shoot foul shots for 15 minutes every day for a decade, and you shoot foul shots for 30 minutes every day for a decade, it would be expected that you'd be better than me. So maybe suppose I have more talent, so that even with less practice, I'm as good as you. Now: why would you really be more consistent than me? We're both 70% shooters, say. For me to be less consistent, I'd have to have more 60% years and more 80% years, while you'd hover closer to 70% every year. Why would that be the case? I suppose it's possible, but it doesn't seem plausible to me. Where's the evidence? Why would it matter that we got to the same point with different amounts of practice time?

Would I be more variable day to day, too, so I'd wind up having more 60% games and more 80% games? If that were true, if I'm sometimes 80% and sometimes 60%, my shots will be clustered together more than average. That means I'm more likely to make a shot after I've made my previous shot, and I'm more likely to miss a shot after I've missed the previous shot. That's the equivalent of saying that inexperienced players have a "hot hand" effect. But given that numerous "hot hand" studies have failed to find any effect, doesn't that suggest that all players are equally (binomially) consistent within their level of talent?

Now, I suppose you can make the argument that because of inexperience, football players are more likely to still be learning their technique, so they might be continuously improving. In that case, you might see a QB go from 20% to 25% in some measure more often than a basketball player goes from 20% to 25% in a similar measure. But if that were true, wouldn't the QB be improving throughout his entire career, given that he plays only 16 games a season? In that case, he'd still be improving into his 30s, so his age-related dropoff would be mitigated, and he would look *more* consistent later in his career. So there would be a balance: young players appearing less consistent between seasons, and old players appearing more consistent. The result should be a wash.

So I just don't understand how any inconsistency caused by "inexperience" would happen.


It looks to me like the authors are looking at the raw r-squareds, and then coming up with possible explanations for why they differ. But, as I said, they miss what is by far the biggest explanation, which is simply sample size. It's just the nature of how correlations work that the smaller the sample, the more luck dominates the results, and the lower the season-to-season r-squareds. I bet if you looked more closely than just listing correlation coefficients, you'd discover the difference in opportunities accounts for almost all the difference right there.

We can do a quick calculation.

The authors found that NFL QB completion percentage had a year-to-year r-squared of .24. Suppose that's because you have 24 points of variance caused by talent, and 76 points of variance caused by luck.

Now, suppose you played 80 games in an NFL season instead of 16 -- five times as many games, and close to the 82 games that the NBA plays. Now you'd still have 24 points of variance caused by talent, but only one-fifth the original variance caused by luck, which works out to 15.2 points. That would give you an r-squared of (24/39.2), or .61. That fits right in to what you get for similar NBA year-to-year r-squareds:

.47 NBA field goal percentage
.59 NBA free throw percentage
.61 NBA turnovers per minute
.61 QB completion percentage (projected)
.68 NBA steals per minute
.75 NBA points per minute

See? It's just opportunities. Those other explanations, about teammates and inexperience, might be factors too. But they're minor factors at best, and, without evidence, they're just speculation.
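The projection above is short enough to write as a function. Like the calculation in the text, this sketch makes the simplifying assumption that r-squared can be read as the share of variance due to talent, and that luck variance shrinks in proportion to the number of games; `project_r2` is an illustrative name.

```python
def project_r2(r2, season_multiplier):
    """Projected r-squared if the season were `season_multiplier` times longer.

    Treats r2 as talent variance and (1 - r2) as luck variance,
    with luck variance divided by the multiplier.
    """
    talent = r2
    luck = (1 - r2) / season_multiplier
    return talent / (talent + luck)

projected = project_r2(0.24, 80 / 16)  # QB completion %, 16 games -> 80 games
print(round(projected, 2))
```

The same function can be run in reverse with a multiplier below 1, to see what an NBA statistic's r-squared might look like over a 16-game season.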

In fairness, the authors may have evidence for them that they're not telling us about. They don't say that the apparent inconsistency "may" be caused by inexperience, or that they "suspect" or "wonder" if that's the cause. Rather, they say:

"The inconsistency with respect to football statistics can be traced to two issues: inexperience and teammate interactions." [emphasis mine.]

So they imply they traced the effect, but they don't say *how* they did the tracing. So while I'm currently very skeptical that the apparent "inconsistency" is anything more than just straight sample size, I'm still willing to look at the authors' evidence, when they choose to show it.
