Sabermetric Research: May 2012

Friday, May 25, 2012

May, 2012, "By the Numbers" now available

The May, 2012 issue of "By the Numbers" is now available.

The issue (.PDF) is here. Back issues are on the SABR website, and also on my website here.

It features four articles:

-- Charlie Pavitt: "Academic Research: Correcting the Aging Bias"

-- Phil Birnbaum: "Review: "Basic Ball""

-- Donald A. Coffin: "Trends in "Strategy" in Major League Baseball"

-- Kevin D. Dayaratna and Steven J. Miller: "First Order Approximations of the Pythagorean Formula"

Hope you enjoy the issue!

Labels: baseball, SABR

Wednesday, May 23, 2012

Racial bias and baseball card values

Recently, I found out about "Econ Journal Watch," a journal and website that debunks bad papers. According to its website, EJW

"watches the journals for inappropriate assumptions, weak chains of argument, phony claims of relevance, and omissions of pertinent truths."

Most excellent! And, in this issue, there's a sports article, critiquing and extending a study that searched for racial bias in baseball by looking at baseball card values.

In 2005, four researchers -- John D. Hewitt, Robert Munoz Jr., William L. Oliver, and Robert M. Regoli [call them HMOR] -- published a study in "Journal of Sport and Social Issues" that looked at the rookie card values of 51 Hall of Famers. They ran a regression to predict card price based on the player's statistics, race, and scarcity of the card. They found no significant effect for race.

Now, David W. Findlay and John M. Santos reviewed the HMOR study, and tried to reproduce it. They found a few problems.

-- first, some of the data were wrong -- probably transcription errors. They fixed those.

-- then, they found that the original authors had taken some of their stats from one edition of Total Baseball, and some from another edition (in which the formulas had changed). They corrected that, too.

-- third, they found that, for five of the players, HMOR had used career numbers from the 1989 edition of Total Baseball -- even though the players were still active at that time!

-- fourth, they noticed that the authors used scarcity numbers from PSA Authenticators, but card values from Beckett. When they substituted PSA values instead, they got a much better fit.

-- fifth, they criticize the authors for omitting Hispanic players from the sample (I don’t agree that this one is a problem).

After all that, they reproduced the analysis, and still found that there was no significant effect for race. However, to their credit, they write,

“Although our results indicate that player race has no statistically significant effect on baseball card prices, we are mindful of Ziliak and McCloskey (2004, 334) who note that "statistical significance, to put it shortly, is neither necessary nor sufficient for a finding to be economically important." The estimated coefficient on the Black dummy variable indicates that the price of a black player’s rookie card, all else fixed, is 9.3% lower than that of an otherwise identical white player.”

------

Excellent stuff ... “Econ Journal Watch” is providing an extremely valuable function, giving authors a place to publish their critiques, and thus creating an incentive to do this kind of checking. Findlay and Santos write that they submitted their article to the journal that published the original, but it was rejected as “not a good fit.” One suspects that if EJW hadn’t existed as a backup, they wouldn’t have bothered to investigate in the first place.

So, kudos to EJW, and to Findlay and Santos.

------

While data errors are indeed a big concern -- especially the ones resulting from truncated careers -- I think there are problems with the study that are far more serious. Those, however, are more in the line of “subject matter” issues. Even with the data corrected by Findlay and Santos, the flaws are so large that I don’t think the study means anything.

-----

1. For their measure of player performance, the authors used Pete Palmer’s “Total Player Rating” and “Total Pitcher Index”. Those are denominated in Runs Above Average. Do we really value a player only by his career runs? The various Hall of Fame methods, such as Bill James’, all recognize that there are other factors that influence Hall of Fame voting, such as pitcher wins, times leading the league, times hitting .300, and so on. Wouldn’t it be expected that collector popularity be similar?

In fairness, the authors did try to improve on the measurement by trying average runs per season, instead of career total runs. They found little difference.

Still, that’s not enough. It implies that Lou Brock should be only as esteemed as any other player producing +10.5 runs over a long career. That ignores that there are very good reasons that Brock is in the Hall, whereas most other players at +10.5 are not.

If white players tend to create runs in ways that are more valued than ways that black players create runs, that would create a false perception of racial bias in favor of whites -- and vice-versa. Even more important: the study is very small, with only 51 players (and only 2 black pitchers!). Even if blacks and whites are the same, it’s very possible that just by random chance, the whites in this study just happened to create runs in more popular ways. Over several hundred players, you might be able to assume that the effects would even out. But not with 51 players.

2. For card scarcity, the authors used data provided by Professional Sports Authenticator (PSA), a company that grades, authenticates, and slabs cards.

The company provides a "Population Report", listing the number of each card graded by PSA. But PSA doesn’t grade cards randomly -- it grades them at the owner’s request and expense. It stands to reason that owners will submit more valuable cards much more often than less valuable cards. That will tend to understate the actual scarcity.

To take an extreme example: from the 1960 Topps set, there were 187,192 cards graded. From the 1988 Topps set, there were only 8,043. But, of course, there were many, many more cards printed in 1988 than 1960 -- from these estimates, by a factor of perhaps 100 (and even that seems a bit low to me). People just don’t get 1988 Topps cards graded -- because the cards are worth a penny, and grading costs $10 or $20 or more.

Using the PSA numbers conflates two conflicting effects: scarce cards are graded less often. But scarce cards are expensive, and expensive cards are graded *more* often. There’s no obvious way to figure out how to break down the two effects.

And, this creates a very strong bias. There were more white superstars than black superstars in the 50s. But the model underestimates the scarcity of their rookie cards. Therefore, the model predicts a lower price, which can be misread as a racial preference for whites.

3. The only two factors the study considered were performance and scarcity. But there are obviously other important reasons that a player may be more popular than another. For instance: team. It goes without saying, doesn’t it, that a New York Yankee superstar should be more popular than, say, a Minnesota Twins superstar with the same stats?

If the Yankees were less likely to have black superstars than other teams, that would account for some of the difference. If the Yankees were more likely to have white superstars with low print runs but high grading numbers -- say, Mickey Mantle -- that would cause the model to doubly underestimate what the value of the card should be.

4. There are many other factors that influence popularity, that are specific to the particular player. Mark Fidrych and Kerry Wood, for instance, are loved for reasons other than their career totals. We Blue Jays fans have a bigger soft spot for Ernie Whitt than for George Bell, for reasons that (I would argue) are related more to personality than race.

You’d also think that players who spent their career with one team would have different fan bases than players whose careers spanned multiple teams. Carl Yastrzemski, for instance, had lots of seasons to make Red Sox fans love him, and the fact that he played his entire career there makes fans love him more. On the other hand, Dave Winfield -- the most similar player to Yastrzemski -- left strong memories in at least four different places.

What’s more important for popularity: having lots of short-term fans in different cities, or having long-term fans in one city?

I don't know. But the answer matters. And it won't necessarily even out in a sample of only 51 players.

------

Anyway, I could probably go on ... the point is, that any one of these four factors could significantly affect the findings of this study. All four, taken together ... well, I don’t think the results tell us anything at all about how race affects card prices -- or even about how performance affects card prices, or how scarcity affects card prices.

Yes, the authors screwed up the data a little bit, but ... well, that’s by far the least of this study’s problems.

Hat Tip: Marginal Revolution

Labels: academics, baseball, baseball cards, economics, race

Monday, May 14, 2012

A model for explaining home field advantage between sports

It occurred to me that it might be possible to predict, or explain, the difference in home field advantage between different sports, based on their rules and outcomes. In this post, I'll just talk about the "absolute" home field advantage, in terms of goals or points. (The translation to winning percentage is easy after that, but I'll save that for a future post.)

Take a look, and let me know what you think.

----

Here's the home field advantage (HFA) for three different sports, in terms of goals or points.

0.453 - Premier League Soccer (2010-11)
4.000 - NBA (estimate)
0.783 - NHL (1980-81 to 1984-85)

As expected, the HFA is highest for basketball, where the most points are scored, and lowest for soccer, where the fewest "points" are scored. Still, they're very different in terms of rates. Here are the percentages by which the home team outscores the visiting team:

38% - Premier League (+0.453 home goals per 1.1737 road goals)
4% -- NBA (roughly, +4 per 100)
27% - NHL (+.783 per 2.937)

These HFAs are all over the place. In basketball, the home team only outscores the visiting team by 4 percent. But, in the NHL, it jumps to 27%, and in soccer, it's almost 40%!
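For anyone who wants to check the arithmetic, here is a minimal sketch of how those percentages fall out of the numbers quoted above. The dictionary layout and the function name `relative_hfa` are just my own labels for illustration; the NBA entry is the rough "+4 per 100" estimate, not measured data.

```python
# Extra home goals/points per game, and visiting-team goals/points per game,
# as quoted in the post. The NBA row is only a rough estimate.
leagues = {
    "Premier League": (0.453, 1.1737),
    "NBA": (4.0, 100.0),
    "NHL": (0.783, 2.937),
}

def relative_hfa(extra_home, road_scoring):
    """Fraction by which the home team outscores the visiting team."""
    return extra_home / road_scoring

for name, (extra_home, road_scoring) in leagues.items():
    print(f"{name}: home team outscores visitors by {relative_hfa(extra_home, road_scoring):.1%}")
```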

Why the big differences?

It has nothing to do with the length of the game. If the home team scores 4% more points over 48 minutes, you can also expect it to score 4% more points over a minute, a quarter, or a season. (It's the percentage of how many more *wins* the home team gets that depends on game length, but, again, that's not what we're discussing in this post.)

So, what is it then? There are probably many contributing factors, but I think the biggest has to do with the structure of the individual games. That's because it's easy to get a higher or lower HFA just by changing the rules.

------

Start by looking at the NBA, where the HFA is about 4 points a game. Let's change the way basketball works, to move that difference away from 4 points.

In fact, let's do that while keeping many aspects of the game constant. We'll stay with a game where each team gets 100 possessions, has to throw a ball through a hoop on a basketball court, and has an average score of 100. We'll just change the "internal" rules.

------

Suppose we change the game to consist of only foul shooting. Each possession, the team gets two foul shots. If it sinks them both, it gets two points. Otherwise, it gets zero. We can assume the average player shoots 71%, so that the probability of two straight makes is almost exactly 50%. (If 71% seems a bit low, just imagine that we make the hoop a bit smaller at the same time we change the rules.)

We have empirical data that lets us figure out what HFA would be, thanks to King Yao, who compiled home and road free-throw percentages for a few recent NBA seasons. The numbers were:

75.95% home team
75.72% visiting team

For the chance of making two straight shots, then, we can just square those numbers:

57.68% home team
57.34% visiting team

The difference is 0.34%. Over 100 possessions, that's .34 extra scores, or around 0.7 extra points. That's much smaller than the 4 point HFA in "real" basketball. We've reduced HFA by 80 percent just by changing the rules!
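Here's the foul-shooting arithmetic as a small sketch, in case you want to plug in different percentages. The function name `two_straight` is made up for this example, and it assumes the two shots are independent, which is the same assumption the squaring above makes.

```python
def two_straight(ft_pct):
    """Chance of sinking two independent foul shots in a row."""
    return ft_pct ** 2

home = two_straight(0.7595)   # home free-throw percentage quoted above
road = two_straight(0.7572)   # visiting free-throw percentage quoted above

possessions = 100
extra_scores = (home - road) * possessions   # extra successful possessions per game
extra_points = 2 * extra_scores              # each success is worth two points

print(f"home {home:.2%}, road {road:.2%}")
print(f"extra scores per game: {extra_scores:.2f}")
print(f"extra points per game: {extra_points:.2f}")
```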

------

How can we construct a game where HFA is higher? Again, that's easy -- try "double or nothing" basketball.

In that game, when you score a field goal, you don't get the two points yet. Instead, you immediately get a second possession, and you have to score on that one too. If you do get two in a row, you get 4 points. If you don't get the second one too, you get zero.

A game consists of 100 possessions for each team (so each team will get somewhere between 100 and 200 attempts to make a field goal).

In this game, the HFA will be roughly double. How do we know? Well, the real life HFA is 4 points, and each team gets roughly 100 attempts. So, we can guess that, in normal basketball, the home team might score on around 52% of attempts, while the visiting team will score only on 50%.

But, now, each team has to make two in a row. The home team will do that around 27% of the time (52% squared), while the visiting team will be at 25% (50% squared). That's still two extra scores per game, but now each score is worth four points. So, instead of winning 104-100, the home team will win 108-100.

The change in the rules has increased the HFA from 4 points to 8 points -- from 4% to 8%.
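A quick sketch of the same comparison, using the guessed 52% and 50% scoring rates from above. The function and its parameter names are mine, for illustration only; it just computes expected points when a team needs a given number of consecutive scores to cash in.

```python
def expected_points(p_score, in_a_row, points_per_cash_in, possessions=100):
    """Expected points per game when a team must score `in_a_row`
    consecutive times (independently) to collect `points_per_cash_in`."""
    return possessions * (p_score ** in_a_row) * points_per_cash_in

# Regular basketball: one score, two points.
regular_hfa = expected_points(0.52, 1, 2) - expected_points(0.50, 1, 2)

# "Double or nothing": two scores in a row, four points.
double_hfa = expected_points(0.52, 2, 4) - expected_points(0.50, 2, 4)

print(f"regular HFA: {regular_hfa:.2f} points")
print(f"double-or-nothing HFA: {double_hfa:.2f} points")
```

The second figure comes out a shade over 8 because 52% squared is 27.04%, not exactly 27%.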

------

So: three different hoops games, three different HFAs. That shows that you have to examine the rules in order to understand where HFA is coming from. It can't be just crowd influence, or referee bias, or familiarity with the home court, or anything like that. Those are things that could *cause home field advantage to exist*. But they aren't things that could, on their own, cause the level of home field advantage to *vary between sports*. For that, you need to examine the rules.

------

So, is there a factor that explains how the HFAs change for the three games?

It seems to me that the answer is: the level of "compounding" of events, the number and difficulty of the things that all have to go right for you to score.

For the "two in a row" game, you need to score twice in a row, not just once. If you have a 4% advantage on each one, you'll have an 8% advantage on two of them compounded. (1.04 squared is about 1.08.)

For the foul shooting game: In "real" basketball, there's more than just shooting. To score a field goal, you have to do a whole bunch of things right. For instance, you have to (a) pass the ball around accurately; (b) deke out a defender enough to get a good shot; (c) have the other members of your team distract the other defenders so they can't block; and then (d), take an accurate shot.

That's four things that might all have to go right. From the discussion above, we know the HFA for two consecutive foul shots is 0.34%. Suppose each of those four things, from (a) to (d), has that same 0.34% advantage. Then the home team gets an advantage of roughly four times 0.34%, or roughly 1.4 percent.

That hypothetical only gets us to 1.4 percent, not to 4 percent. That suggests that more than four compoundings are necessary -- maybe as many as 10. That's certainly reasonable. Foul shooting seems to be something that's simpler than most other basketball skills. We assumed, for convenience, that "taking an accurate shot" was exactly as complex as foul shooting ... but it might be twice as complex. You have to shoot accurately, but first you also have to judge the shot. If the other three things are also twice as complex, which doesn't seem implausible, then we have eight compoundings.

Anyway, the point is not to get this particular example to work out perfectly, but to show that it's at least a decent approximation.

-------

The "compounding" explanation also seems to work if you compare hockey to basketball.

In basketball, the net is unguarded. In hockey, there's a goalie trying to stop the puck. The goalie has his own HFA, while the hoop presumably does not. So, that's at least one extra compounding in favor of hockey.

In basketball, it's difficult to force a turnover; it's a fairly rare event. In hockey, it's easy, since physical play is allowed -- you just run into the puck carrier, if you can, and dislodge him.

So, in hockey, part of the goal scoring process is avoiding checks from the opposition. If four players touch the puck before a shot, and each one has the same chance of losing the puck as the entire team has in a basketball possession ... then you have three extra compoundings, since there's an additional HFA for each of the four players.

In soccer, it's even more extreme: it isn't unusual to take 10 or 15 passes before you get a decent chance at a shot. So, ten things have to go right. If each pass in soccer has the same chance of being intercepted as a pass in basketball, but soccer requires three times as many passes before a goal ... well, now you have three more compoundings.

This is all theoretical, of course, but we can check the numbers to see if they're reasonable.

In soccer, the absolute HFA was 38 percent more goals. In basketball, it was only 4 percent. To get from 4 percent to 38 percent, you need about eight times as many compoundings (since 1.04 to the eighth power equals approximately 1.38).

Is eight compoundings reasonable? I think it is, because we can get a similar result another way.

In the NBA, about 50 percent of possessions result in a score. In soccer, it's probably, what, around 2 percent?

If soccer had two compoundings to every one basketball compounding, the scoring rate would still be 25 percent (you'd have to do something with a 50 percent success rate, twice). If it had three compoundings, it would be 12.5 percent. Four compoundings, about 6 percent. Five compoundings, 3 percent. Six compoundings, 1.6 percent, and we're there.

So, a naive estimate is that, to score a goal in soccer, you have to be skilled enough to do what it takes to score a goal in basketball, six consecutive times.

We were expecting 8 compoundings from the "compare HFA" argument. The "probability of scoring" method suggests 6 compoundings.
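Both counts are just logarithms, so here is a small sketch of the two estimates side by side. The function names are my own, and the 2 percent soccer scoring rate is the same guess as above, not a measured figure.

```python
import math

def compoundings_from_hfa(target_ratio, base_ratio=1.04):
    """How many times base_ratio must be compounded to reach target_ratio."""
    return math.log(target_ratio) / math.log(base_ratio)

def compoundings_from_scoring(p_score, base_p=0.5):
    """How many consecutive base_p events give an overall chance of p_score."""
    return math.log(p_score) / math.log(base_p)

by_hfa = compoundings_from_hfa(1.38)          # soccer's 38% vs. the NBA's 4%
by_scoring = compoundings_from_scoring(0.02)  # guessed 2% vs. the NBA's 50%

print(f"compoundings implied by HFA: {by_hfa:.1f}")
print(f"compoundings implied by scoring rate: {by_scoring:.1f}")
```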

Not bad! Those two estimates, 8 and 6, are pretty close. Why aren't they closer?

Well, it could be that some of my estimates were off, like the one where I guessed that 2 percent of soccer possessions score.

Or, it could be that there's a large difference in competitive balance between the two leagues. The more lopsided the talent, the lower the HFA (when the better team is so good that it always wins, the HFA is obviously zero).

But, most importantly, it could be that there are factors outside of "compoundings". For instance, referee bias, which need not be anywhere near the same order of magnitude as player HFA. Actually, that could very well be it: in soccer, the referee can have a very large impact on the game. In the NBA, a blown call is worth a couple of points out of 100. But, in soccer, a blown penalty call could be one goal out of two.

------

Anyway, if you buy the idea that these two estimates of compounding should be the same, that suggests a way to get a rough estimate of what HFA should be in other sports, at least for sports that are similar to basketball/hockey/soccer, in the respects we used in our arguments. Specifically:

-- you can divide the game into possessions

-- each team gets roughly an equal number of possessions

-- you can only score once per possession

-- you can estimate a probability of an average team scoring on each possession

-- what keeps you from scoring at will is a defense that's similar to defenses in basketball/hockey/soccer and also subject to HFA (which excludes, say, foul shooting or skills competitions)

-- referee bias is roughly the same order of magnitude as for basketball/hockey/soccer.

To get an estimate, you start with a known sport and a known HFA, and you adjust it by the differences. Let's use the NBA as our reference point. It has a 50 percent chance of scoring on each possession (0.5), and an absolute HFA of 4% (1.04).

We now adjust for the number of compoundings based on the difference in probability of scoring on a possession. That leads to this formula:

Let p be the probability of scoring on a single possession in your particular sport. Then:

HFA = 1.04 ^ [log(p)/log(0.5)]

(Checking that it works for the "double or nothing" basketball variation: p=0.25 gives HFA=1.08, which is 8% more points, which is correct.)
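If you'd rather not do the logs by hand, here's the formula as a tiny function. The name `estimated_hfa` and the keyword defaults are my own choices; the defaults are just the NBA reference point used above.

```python
import math

def estimated_hfa(p, ref_ratio=1.04, ref_p=0.5):
    """Rough home/road scoring ratio for a sport where an average team
    scores on a fraction p of its possessions, scaled from the NBA."""
    compoundings = math.log(p) / math.log(ref_p)
    return ref_ratio ** compoundings

for p in (0.5, 0.25, 0.02):
    print(f"p = {p}: home team scores {estimated_hfa(p) - 1:.1%} more")
```

Remember that this is only the "compounding" part of the estimate; the competitive balance and referee adjustments still have to be made separately.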

That's your rough estimate of HFA. I emphasize it's *rough*. You'd still need to adjust it (slightly) up if your league has more competitive balance than the NBA, or down if it has less. And you'd have to adjust it up (perhaps substantially) if you think the effect of referee bias on HFA is higher, or down if you think referee bias is lower.

And, of course, there might be other factors I haven't thought of.

-------

Does someone want to try this for other leagues or other sports, and see how close it comes? I'd be curious to see the NLL, which has, maybe, 12 goals per team per game. The problem is estimating possessions, which is hard for lacrosse but easier for, say, the WNBA.

Labels: basketball, hockey, home field advantage, NBA, NHL, soccer

Tuesday, May 08, 2012

Factors influencing home field advantage

The more games in a season, the more likely the best teams will rise to the top, and the worst teams will fall to the bottom. That's just common sense, and the law of large numbers.

Similarly, the more innings in a game, the more likely the best team will win. If you put the whole season into a single 1,458-inning game, there's no doubt that (for instance) the Yankees would beat the Twins.

Home field advantage (HFA) is one of those things that makes teams better. And so, the longer the game, the more likely the home team's advantage will show up in the results. HFA for a 3-inning game would be smaller than HFA for a 9-inning game.

So, it's easy to compare HFA within one given sport. But how do you compare two? According to "Scorecasting," from 1999 to 2009, in the NBA, home teams went .605. In the NHL, they went .557. Why the difference?

In the past, I've used an argument that I now somewhat regret. It went something like this: "We know that a longer game means a higher HFA. Therefore, basketball must be "longer" than hockey in some sense. Perhaps there are more confrontations between players, or something, which allows the HFA to expose itself more easily."

But now, I think, that line of thinking is too vague. It's almost a circular argument. "Why is the NBA higher?" "Because the game is longer." "What do you mean by longer?" "I don't know exactly, but it's the attribute of NBA games that makes home field advantage bigger."

It's like, suppose we don't know what causes lung cancer, except smoking. And then we find a country that has a high rate of lung cancer, even though they don't smoke much. Do we say, "that country must be somehow 'cigarettier'?" That would be silly.

And, in any case, we can do better. There are identifiable reasons why the NBA home record is higher than the NHL home record. They don't solve the problem entirely, but at least they're concrete factors.

-------

I'm going to start by calculating the theoretical HFA for the National Hockey League, step by step.

From 1980-81 to 1984-85, the home team outscored the visiting team by .70619 goals per game.

The home team scored an average 4.264 goals per game. Since it's typically assumed that goals have a Poisson distribution, the SD of goals per game is the square root of that, or 2.065. (It's a property of the Poisson distribution that the SD is the square root of the mean.)

The visiting team scored an average of 3.557 goals, for an SD of 1.886.

So, the SD of (home team - visiting team) is 2.797.

Therefore, if the two teams were exactly equal, the goal differential would be a normal curve with mean 0, and SD 2.797.

But the HFA makes those two teams unequal, by .70619 goals. Divide that by 2.797 and we see that they're unequal by 0.252 of an SD. Therefore, the home team wins if the random outcome is greater than -0.252 SDs.

Going to a normal distribution table, that probability is 0.600.

In those actual NHL games, the home team actually had a winning percentage of .592. Not bad!
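The step-by-step calculation can be sketched in a few lines. This is only an illustration under the same assumptions as the text (Poisson goals, a normal approximation for the difference, ties ignored); the function names are mine, and I take the HFA to be the difference between the two scoring averages quoted above.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def home_win_probability(hfa_goals, home_mean, road_mean):
    """Chance the home team outscores the visitors (ties ignored)."""
    # For a Poisson variable the variance equals the mean, so the
    # variance of (home - road) is the sum of the two means.
    sd_diff = math.sqrt(home_mean + road_mean)
    return normal_cdf(hfa_goals / sd_diff)

home_mean, road_mean = 4.264, 3.557
sd_diff = math.sqrt(home_mean + road_mean)
p = home_win_probability(home_mean - road_mean, home_mean, road_mean)

print(f"SD of goal difference: {sd_diff:.3f}")
print(f"theoretical home winning percentage: {p:.3f}")
```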

Why is our theoretical estimate too high? Well, one reason is that our calculation assumed two equally talented teams. But in real life, there are always differences in talent, sometimes large ones. And HFA decreases as the talent gets more uneven. (If an .000 team plays a 1.000 team, the HFA is obviously zero.)

So, that's one reason our estimate is too high. It's probably not all of it.

-------

Now, let's go back to game length. We all agree that if we increased the length of an NHL game, say from 60 minutes to 120, the HFA would increase.

Suppose the league does that. But, at the same time, it decides to also reduce the number of goals scored. Now, every time a goal is scored in the six-period game, the referee flips a coin. If it's heads, the goal stands. If it's tails, the goal doesn't count.

That means the average game score is the same. The distribution of goal differential is the same. The only thing that's different is the length of the game -- the number of confrontations between players, and the length of time one team has to show it's superior to the other team.

So, we should expect the HFA to go up, right?

It doesn't. It stays the same. (Actually, it goes down a bit, but never mind.)

In the old NHL, visiting-team goals had a Poisson distribution with mean 3.557. And in the new NHL, visiting-team goals *also* have a Poisson distribution with mean 3.557. The distributions are identical, because Poisson applies (as an approximation) to any rare events. Whether it's over 60 minutes or 120 minutes, 3.557 goals qualifies as rare.

So if we repeat the calculation for HFA, every step is exactly the same as before! And so we get the same answer.
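Here's a sketch that checks the coin-flip claim numerically, assuming goals are exactly Poisson (the post only claims that as an approximation). It doubles the raw scoring mean, lets each goal stand with probability one half, and compares the result to the original distribution. The function names and the cutoff `max_n` are my own choices.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k events for a Poisson variable."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def coin_flip_pmf(k, raw_mean, keep=0.5, max_n=100):
    """Probability that exactly k goals stand, when raw goals are
    Poisson(raw_mean) and each one independently stands with chance `keep`."""
    total = 0.0
    for n in range(k, max_n):
        stands = math.comb(n, k) * keep ** k * (1 - keep) ** (n - k)
        total += poisson_pmf(n, raw_mean) * stands
    return total

mean_60_minutes = 3.557                 # scoring average quoted in the post
raw_mean_120_minutes = 2 * mean_60_minutes

largest_gap = max(
    abs(coin_flip_pmf(k, raw_mean_120_minutes) - poisson_pmf(k, mean_60_minutes))
    for k in range(15)
)
print(f"largest difference between the two distributions: {largest_gap:.2e}")
```

The gap should be nothing but floating-point noise: throwing out half the goals at random from a Poisson process leaves another Poisson process with half the mean.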

-------

So if "length of the game," in terms of confrontations or action, doesn't matter, what *does* matter?

Goals. The more goals scored, the higher the Poisson mean, and so the higher the SD of game results. That means more randomness, a wider spread. If there's a wider spread, that means the HFA of 0.706 goals is smaller relative to luck. And so, it has less opportunity to express itself, and we get a lower HFA.

Just to give you an example: suppose the average goals per team increases to 6. That means the SD of a game difference is 3.46. The difference of 0.706 goals is now only .204 of an SD, which gives you 58.1 percent of a normal curve. So the HFA drops to .581.
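To see the trend across more than one scoring level, here is a small sweep. It's a sketch under the same assumptions as the example (both teams Poisson with the same average, a fixed 0.706-goal HFA, normal approximation, ties ignored); the function names are mine.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def home_win_probability(hfa_goals, goals_per_team):
    """Home winning percentage when each team averages goals_per_team."""
    sd_diff = math.sqrt(2 * goals_per_team)   # Poisson: variance = mean
    return normal_cdf(hfa_goals / sd_diff)

levels = (3, 4, 6, 8)
results = [home_win_probability(0.706, g) for g in levels]
for g, p in zip(levels, results):
    print(f"{g} goals per team: home team wins {p:.3f}")
```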

Goal difference is part of the reason that the games I looked at had an actual HFA of .592, but "Scorecasting" showed an actual HFA of only .557. The Scorecasting study used the ten seasons ending 2009, when scoring was historically low. I used 1980 to 1984, when scoring was historically high.

-------

So, have we found an answer? Can we say that one reason why basketball HFA is higher than hockey HFA, is that basketball has so much more scoring? Well, yes and no. Yes, scoring is part of it, but, no, we can't use this particular argument, because basketball is not Poisson.

Indeed, non-Poisson-ness is one of the factors boosting the NBA home field advantage. As it turns out, the farther the distribution gets from ideal Poisson, the lower the random variance. And lower randomness boosts HFA, by providing less noise to drown out the HFA's signal.

If the NBA switched to Poisson, by making the game 20 times longer, and making baskets 20 times harder to achieve, HFA would go down, even though scoring would stay the same.

Well, not necessarily. It depends whether teams change the way they play under the new system. The home advantage in the NBA is, what, 3 or 4 points a game? If the hoop became 20 times harder to hit, that 3 or 4 points might change to something else entirely, and we'd have to recalculate.
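Here's a rough sketch of the "20 times longer, 20 times harder" thought experiment, ignoring that caveat about teams changing how they play. Everything in it is an assumption for illustration: 100 two-point possessions per team, the home team scoring on 52% and the visitors on 50%, independent possessions, a normal approximation, and ties ignored. The function names are mine.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def home_win_probability(possessions, p_home, p_road, points=2):
    """Home winning percentage when each possession is an independent
    make-or-miss worth `points` (binomial scoring)."""
    mean_diff = points * possessions * (p_home - p_road)
    variance = points ** 2 * possessions * (
        p_home * (1 - p_home) + p_road * (1 - p_road)
    )
    return normal_cdf(mean_diff / math.sqrt(variance))

# Ordinary game: few possessions, each a coin flip (far from Poisson).
regular = home_win_probability(100, 0.52, 0.50)

# Stretched game: 20 times the possessions, baskets 20 times rarer,
# so scoring is a rare event and close to Poisson.
stretched = home_win_probability(2000, 0.52 / 20, 0.50 / 20)

print(f"regular game:   home team wins {regular:.3f}")
print(f"stretched game: home team wins {stretched:.3f}")
```

Average scores are identical in the two versions (104 to 100), but the stretched game has roughly twice the variance, so the same four points buy fewer wins.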

--------

So we have two factors affecting HFA so far, all else being equal:

1. Non-Poisson-ness increases HFA.
2. More scoring decreases HFA.

--------

You can probably think of more factors. I've got a couple I'll save for a future post.

Labels: basketball, hockey, home field advantage, NBA, NHL, statistics

Monday, May 07, 2012

Solution to "Another puzzle"

This is my solution to the puzzle from the last post. You probably want to read that other post first, or this discussion won't make any sense at all.

--------

The puzzle asked for a proof that the rule change does not change the odds of either team winning the game.

It seems like that shouldn't be true: the rule change gives the better team more possessions. It definitely causes many scores to be very, very lopsided in favor of the better team. How can it improve the score, but not change the odds of winning?

In one sentence, the answer is: all those extra points always go to the team who would have won by the old rules anyway.

Let me show you why. First, I'll give you an explicit "proof", then I'll try to explain why this happens in a sentence or two.

--------

First, note that under the new rules, if a team passes 100 possessions, it is sure to win. Why? Because, by the rules, the game will be over with the other team having exactly 100 possessions (since we stipulated that the 100th possession is a miss, and when both teams hit 100 or more, the game ends).

At that point, the two teams will have equal numbers of misses (because of equal numbers of "innings"), but the first team will have more possessions. Therefore, it must also have more points.

(So we might as well stop the game as soon as one team hits its 101st possession and declare it the winner!)

Now we show that if the game had gone according to the old rules -- stopping at 100 possessions each, regardless -- that same team would have won.

(We're assuming that both teams' first 100 possessions would have been in exactly the same sequence -- although, of course, the teams would have alternated by possession instead of by "inning").

Suppose Team A won under the new rules. Then, both teams had the same number of misses. All Team B's misses came in the first 100 possessions. But at least one of Team A's misses came *after* 100 possessions. Therefore, Team A had fewer misses than Team B in the first 100 possessions. Therefore, it must have had more points, which means it would have won by the old rules.

Similarly, if a team would have won by the old rules, it would have won by the new rules. Suppose team A won under the old rules. Then, after 100 possessions, it had fewer misses than team B. In that case, team A wouldn't have been stopped at 100 possessions -- it would have kept going somewhere earlier in the game, until it caught up to team B's miss total. Therefore, team A would have had more possessions than team B under the new rules. Therefore, it would have won under the new rules.

---------

That's basically the solution in a nutshell. The team that hits 100 possessions with the fewest misses will have won, even if it doesn't get any more possessions at all. So any possessions after 100 -- the ones that cause a blowout -- will have occurred after the win was assured. So they can't increase the chances of winning.

---------

(I haven't mentioned ties yet ... I won't go through the whole explanation, but if there's a tiebreaker required under one set of rules, it would also be required under the other set. And both sets would break the tie the same way.)

---------

P.S. I always try to explain things too many ways. Let me continue that tradition with one last alternative explanation.

For any sequence of hits and misses, this set of alternative rules would lead to exactly the same score as if you had used the "new" rules:

Both teams get 100 possessions. The team leading at that time gets to pad its score by taking extra possessions until it has the same number of misses as the trailing team.

The only difference is that under this set of rules, the extra, "over 100" possessions come after the other team's possessions are done, instead of while the other team still has some to go. And it's obvious they don't affect who wins.
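Here's a small Python sketch of that equivalence, in case it helps to see it run. Everything in it is an assumption made for illustration: the function names are invented, each team's outcomes are passed in as a list of booleans (`True` = basket) that has to be long enough for the game to finish, and possession 100 has to be a miss, as the puzzle requires. Tiebreakers are left out.

```python
import random

N = 100


def random_sequence(p, rng, length=3000, n=N):
    """Hypothetical helper: random outcomes with possession n forced to be a miss."""
    seq = [rng.random() < p for _ in range(length)]
    seq[n - 1] = False
    return seq


def new_rules_score(seq_a, seq_b, n=N):
    """Play by 'innings': each team shoots until it misses.
    Stop after an inning in which both teams have had at least n possessions."""
    seqs = (seq_a, seq_b)
    pos = [0, 0]      # possessions used so far
    baskets = [0, 0]
    while pos[0] < n or pos[1] < n:
        for team in (0, 1):
            while True:
                made = seqs[team][pos[team]]
                pos[team] += 1
                if not made:
                    break
                baskets[team] += 1
    return tuple(baskets)


def padded_rules_score(seq_a, seq_b, n=N):
    """Both teams take n possessions; then the leader pads its score
    until it has as many misses as the trailing team."""
    seqs = (seq_a, seq_b)
    baskets = [sum(seq_a[:n]), sum(seq_b[:n])]
    misses = [n - baskets[0], n - baskets[1]]
    if misses[0] != misses[1]:
        lead = 0 if misses[0] < misses[1] else 1
        pos = n
        while misses[lead] < misses[1 - lead]:
            if seqs[lead][pos]:
                baskets[lead] += 1
            else:
                misses[lead] += 1
            pos += 1
    return tuple(baskets)
```

For any pair of sequences built with `random_sequence`, the two functions should return the same pair of basket totals. The padding possessions are the same shots either way; they just get taken at a different point in the game.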

Sunday, May 06, 2012

Another puzzle

UPDATE, 4:45pm Sunday: Oops! I had to add another condition to make the puzzle work. See below.

---------

(Note: This time, the puzzle has some sports content.)

The rules in the Oversimplified Basketball Association (OBA) are as follows: The teams take turns getting possession of the ball. A team that scores a field goal gets two points. There are no three-point shots, rebounds, or fouls. If there's a turnover or a missed shot, the referee blows the whistle and the other team inbounds the ball to begin its possession.

So, all possessions are independent and identical. The game lasts exactly 100 possessions for each team. (Except that if the score is tied, each team takes one extra possession to try to break the tie. If the score is still tied after that, each team gets one more, and so on.)

The average field goal percentage in the OBA is .500, so the average team scores 100 points. But some teams have a field goal talent of, say, .550, and score an average 110 points a game, while others have a talent of, say, .400, and score only 80 points a game.

One year, the owners decide the fans like blowouts more than close games. They suggest changing the rules slightly.

Under the new proposal, if a team makes a field goal, it gets to keep the ball for another possession. Only when it misses does the ball get turned over to the other team. The game ends when both teams have an equal number of misses, and both have had at least 100 possessions. (If the score is tied at that point, the game continues until each team has missed once more. If it's still tied, repeat.)

It might help to think of the new rules by a baseball analogy. The game begins. The first team gets the top of an "inning", consisting of as many baskets as it can make until it makes an "out" by missing. Then the second team gets the bottom of the inning.

Innings repeat until, at the end of a given inning, both teams have had 100 possessions ("at bats"). Then the game is over, and the most baskets ("runs") wins. If there's a tie, play an extra "inning" until the tie is broken.
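For anyone who'd like to play with the new rules, here's a rough Python sketch of one game, inning by inning. It's an illustration under assumptions: the function name and return format are invented, every shot is an independent draw at the team's field goal percentage (which must be below 1.0, or an inning never ends), and the extra-inning tiebreaker is left out.

```python
import random


def play_new_rules(p_a, p_b, rng, n=100):
    """Simulate one game under the new rules, without the tiebreaker."""
    probs = (p_a, p_b)
    baskets = [0, 0]
    possessions = [0, 0]
    innings = 0
    # Keep playing full innings until both teams have had at least n possessions.
    while min(possessions) < n:
        for team in (0, 1):
            # A team keeps the ball until its first miss (the "out").
            while True:
                possessions[team] += 1
                if rng.random() < probs[team]:
                    baskets[team] += 1
                else:
                    break
        innings += 1
    return {
        "baskets": tuple(baskets),
        "possessions": tuple(possessions),
        "innings": innings,  # also each team's number of misses
    }
```

A call like `play_new_rules(0.75, 0.50, random.Random(1))` returns both teams' basket and possession counts. Since every inning ends on a miss, each team's misses always equal the number of innings played.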

The owners argue to the fans that this should work. They reason,

"Suppose team A has a .750 field goal percentage, and team B is .500. Before the rule change, team A would go 3-for-4, and team B would go 2-for-4. So, team A would outscore team B by a 3:2 ratio.

"But with the new rules, team A scores 3 field goals for every miss. Team B scores only 1 field goal for every miss. So, now, team A should outscore team B by 3:1, not 3:2.

"So, instead of winning by an average 150-100, team A should now win by an average 300-100."

The owners also realize they can sell more commercials, because the games will be longer than before. For instance, by the time team B gets to 100 possessions to end the game, team A will probably have had 200.
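The owners' arithmetic can be written out as a short Python sketch. This is just their back-of-the-envelope reasoning, not a simulation: the function names are invented for the example, and it assumes the weaker team stops at exactly 100 possessions while the stronger team plays until it has the same number of misses.

```python
from fractions import Fraction

POINTS_PER_BASKET = 2
N = 100  # possessions per team under the old rules


def old_rules_points(p, n=N):
    """Expected points under the old rules: n possessions at percentage p."""
    return POINTS_PER_BASKET * p * n


def baskets_per_miss(p):
    """Expected baskets for every miss, e.g. 3 for a .750 team."""
    return p / (1 - p)


def owners_estimate(p_strong, p_weak, n=N):
    """Return (strong team's points, weak team's points, strong team's possessions)."""
    misses = (1 - p_weak) * n                      # weak team's expected misses
    weak_baskets = p_weak * n
    strong_baskets = misses * baskets_per_miss(p_strong)
    strong_possessions = misses + strong_baskets
    return (
        POINTS_PER_BASKET * strong_baskets,
        POINTS_PER_BASKET * weak_baskets,
        strong_possessions,
    )
```

With `Fraction(3, 4)` and `Fraction(1, 2)`, `old_rules_points` gives 150 and 100, and `owners_estimate` gives 300 points, 100 points, and 200 possessions, matching the owners' numbers.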

So, the OBA implements the rule change, and there are lots of blowouts, exactly as expected.

-------

(UPDATE) One more condition, to make the puzzle work: both teams always miss on their 100th possession.

Given that condition, prove that the rule change does not affect the odds of who wins the game.

-------

Hint: As in the previous puzzle, the solution does not require any fancy math.
