Monday, November 30, 2009

Academic article on predicting hitting performance

A new academic article called "Hierarchical Bayesian Modeling of Hitting Performance in Baseball" attempts to beat existing prediction methods -- PECOTA, Marcel, et al -- using a more complicated model and Bayesian techniques.

It's in the new issue of the academic journal "Bayesian Analysis".

The article is accompanied by three reviews; I'm the co-author of one of them, with Jim Albert. (Disclosure: we discussed the article by e-mail, but Jim wrote most of it, except for a few paragraphs that I provided in Section 6.)

There's also an article where the authors, Shane T. Jensen, Blakeley B. McShane, and Abraham J. Wyner, respond to the reviews.

All five articles are available at the above link, near the top.


Thursday, November 26, 2009

The Bradbury aging study, re-explained (Part II)

This is a follow-up to my previous post on J.C. Bradbury's aging study ... check out that previous post first if you haven't already.

My argument was that players with shorter careers should peak earlier than players with longer careers. Bradbury disagreed. He reran his study with a lower minimum, 1000 PA instead of 5000. He found that there was "no drop".

I decided to try to run his study myself, the part where he looks at batter performance in Linear Weights. I think my results are close enough to his that they can be trusted. Skip the details unless you're really interested. I'll put them in a quote box so you can ignore them if you choose.


Technical details:

Here's what I did. I took all players whose careers began in 1921 or later, and looked at their stats until the end of 2008 (even if they were still active). They had to have had a plate appearance in each of at least ten separate seasons. In seasons in which their age was 24 to 35 (as of July 1), they had to have had at least 5000 plate appearances.

Any player who did not meet the above criteria was not included in the regression. Also, the regression included only seasons from age 24-35 in which the player had at least 300 PA.

Each of those seasons was a row in the regression. The model I used was:

Z-score this season = a * age this season + b * age^2 this season + c * career average Z-score + d * player dummy + constant + error term

I didn't include dummy variables for individual seasons (Bradbury's "D" term, if you look at his paper) or park factors. I think those would change the results only slightly.

Another difference I noticed later is that when I calculated the Z-scores, I used the standard deviation only of players who were 24-35 and had 300 PA. Bradbury, I believe, used the SD of all players, regardless of PA. Again, I don't think that affects the results much (although it makes his coefficients about twice as big as mine).

Finally, I'm not 100% sure that I did exactly what Bradbury did in other respects. The study is vague about the details of the selection criteria. For instance, I'm not sure if any ten seasons qualified a player, or only ten seasons of at least 300 PA. I'm not sure if the player needed 300 PA every season between 24 and 35, or if that didn't matter as long as the total was over 5000. So I guessed. Also, for Linear Weights, I used a version that adjusts the value of the out for the specific season, whereas Bradbury used -0.25 for all seasons (and compensated somewhat by having a dummy variable for league/season).


Anyway, here is my best-fit equation, followed by Bradbury's:

Mine: Z = 0.760 * age - 0.0133 * age^2 - 0.901 * mean - 10.6802 + dummies
J.C.: Z = 1.322 * age - 0.0224 * age^2 - 1.205 * mean + other stuff + dummies

These equations look different, but that's mostly because Bradbury used a different definition of the Z-score. If you look at the significance levels, they're similar: for mine, about 12 SDs; for Bradbury, about 11 SDs. Bradbury might be smaller because his regression was more sophisticated, with certain corrections that likely brought the significance down.

More important are our estimates of peak age, which can be calculated as - ( coeff for age ) / ( 2 * coeff for age^2 ):

Mine: 28.62 peak age
J.C.: 29.41 peak age

Why the difference? My guess is that there was something different about our criteria for selecting players for the sample. Again, I don't think the difference affects the arguments to follow.
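As a quick check on the peak-age formula, here's a minimal Python sketch. The function name `peak_age` is just an illustrative choice, and the inputs are the rounded coefficients from the two equations above, so the results land near, but not exactly on, the quoted peak ages (which presumably came from unrounded coefficients).

```python
def peak_age(age_coeff, age_sq_coeff):
    """Vertex of Z = a*age + b*age^2 + ...: the age at which the curve tops out."""
    return -age_coeff / (2 * age_sq_coeff)

# Rounded coefficients copied from the two best-fit equations above.
mine = peak_age(0.760, -0.0133)
jc = peak_age(1.322, -0.0224)
print(round(mine, 2), round(jc, 2))
```

Because the age-squared coefficient sits in the denominator, even rounding it in the fourth decimal place moves the computed peak by a tenth of a year or so.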

Now, this is where J.C. says he ran the regression again, for 1000PA and no 10-year-requirement, and got no difference in peak age. I did the same thing, and I *did* get a difference:

Mine, for 5000 PA: 28.62
Mine, for 1000 PA: 28.06

It looks like a small difference, only .56 years -- and the total of 28.06 is still above the previous studies' conclusion that the peak is in the 27s. However, as it turns out, the way the study is structured, that small difference is really a big difference. Let me show you.

First, I ran the same regression, but this time only for players with 3000-5000 PA:

3000-5000 PA: 27.61

So, these guys with shorter careers did have an earlier peak, about a year earlier than the guys with the longer careers. What if we now look at the guys with really short careers, 1000-3000 PA?

1000-3000 PA: 147.00

That's not a misprint: the peak came out to age 147! But the coefficients of the age curve were not close to statistical significance -- neither the age, nor the age-squared. Effectively, these guys performed almost the same regardless of age. They didn't peak at 29, but neither did they peak at 27. They just didn't peak.
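To see how a nearly flat curve can produce an absurd peak like that, here's a small sketch. The coefficients are made up for illustration (they are not the ones from the regression); the point is only that when the age-squared coefficient is close to zero, tiny changes in it swing the computed peak wildly.

```python
def peak_age(age_coeff, age_sq_coeff):
    """Vertex of a quadratic age curve: -a / (2b)."""
    return -age_coeff / (2 * age_sq_coeff)

# Hypothetical coefficients for a nearly flat age curve (NOT regression output).
age_coeff = 0.0294
for age_sq_coeff in (-0.0005, -0.0002, -0.0001):
    print(age_sq_coeff, round(peak_age(age_coeff, age_sq_coeff), 1))
```

When neither coefficient is statistically distinguishable from zero, the ratio of the two is mostly noise, and so is the "peak."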

And so, it's reasonable to conclude that one of the reasons the peak age dropped so little, when we added more players like Bradbury did, is that the regression wasn't able to find the peak for the players with the shorter careers. And so the sample still consists of mostly players with longer careers.


Can we solve this problem? Yes, I think so. The procedure cut off the sample of players at 24 and 35 years of age. If we eliminate the cutoff, the results start to work.

I reran the regression with no age restrictions: players had to have 5000 or 1000 PA anywhere in their careers, not just between 24 and 35. Also, I considered all seasons in which they had 300 PA, regardless of how old they were that year. The numbers are similar:

28.97 for 5000 PA+
28.66 for 1000 PA+

The difference is smaller now, 0.31 years. But the important result is the breakdown of the 1000+ group:

28.97 for 5000 PA+
27.72 for 3000-5000 PA
26.61 for 1000-3000 PA (now significant)
28.66 for the overall sample

It seems like the shorter the career, the earlier the peak.

But, still, the overall average seems to only have dropped 0.31 of a year, and it's still around 29 years. Isn't that still evidence against the 27 theory?

No, it's not.

Take a look at the above table again: we have three peaks, 28.97, 27.72, and 26.61. Those three numbers average to 27.77. Why, then, is the "overall" number so much higher, at 28.66?

It's because there were a lot more datapoints in the 5000 PA+ category than the others. And that makes sense. The more PA, the more seasons played. And each season gets a datapoint. So the top category is full of batters with 10 or more seasons, while the bottom category is full of batters with only a few seasons. In fact, some of them may have only 1-2 qualifying seasons of 300 PA or more.

If a player has a 15-year career, with a peak at age 29, he gets fifteen "29" entries in the database. If another player has a 3-year career with a peak of 27, he gets only three "27" entries. So instead of the result working out to 28, which is truly the average peak of the two players, it works out to 28.7.

Another way to look at it: Player A has a 12-year career. Player B has a 2-year career. What's the average career? It's 7 years, right? And you get that by averaging 12 and 2.

But the way Bradbury's study is designed, it would figure the average career is 10.57 years. Instead of averaging 12 and 2, it would average 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 2, and 2. That's not the result we're looking to find.
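The two-player example can be sketched in a few lines of Python. The variable names are illustrative; the only inputs are the 12-year and 2-year careers from the example.

```python
from statistics import mean

# Career lengths from the example: Player A, 12 years; Player B, 2 years.
careers = [12, 2]

# One entry per player: the average we actually want.
per_player = mean(careers)

# One entry per season: each career length is repeated once per season played,
# which is what happens when every season is its own row in the regression.
per_season_rows = [length for length in careers for _ in range(length)]
per_season = mean(per_season_rows)

print(per_player, round(per_season, 2))
```

The second average is pulled toward the long career simply because the long career contributes more rows.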

This is less of a problem in Bradbury's original study, because, by limiting players to 12 years of their career, and requiring them to play 10 seasons, most of the batters in the study would be between 10 and 12 years, so the weightings would be closer. Still, this feature of the study means that it's probably overestimating the peak at least a little bit, even for that sample of players.

So, anyway, if 28.66 is not the right average because of the wrong weights, how can we fix it? Simple: instead of weighting by the number of regression rows in each group, we weight by the number of players in each group:

28.97 for 640 players with 5000+ PA
27.72 for 595 players with 3000-5000 PA
26.61 for 1148 players with 1000-3000 PA
27.52 overall average

So what looked like a small drop when we added the shorter-career players -- 0.31 years -- turns into a big drop -- 1.45 years (28.97 minus 27.52) -- when we weight the data properly.
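The player-weighted average can be reproduced with a short sketch. The peak ages and player counts are the ones in the table above; the variable names are illustrative.

```python
# (peak age, number of players) for each PA group, from the table above.
groups = [
    (28.97, 640),    # 5000+ PA
    (27.72, 595),    # 3000-5000 PA
    (26.61, 1148),   # 1000-3000 PA
]

total_players = sum(n for _, n in groups)
# Each group's peak counts once per player, not once per season-row.
player_weighted = sum(peak * n for peak, n in groups) / total_players

print(total_players, round(player_weighted, 2))
```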

Now, this only works when there actually IS a drop between the 5000+ and the 1000+ groups. We found a drop of 0.31. But on his blog, Bradbury said that with his data, he found no drop at all.

How come? I'm not sure. But one reason might be random variation (if he used different selection criteria). Another might be his age restriction causing nonsensical results in the important 1000-3000 group. And there are his other variables for "missed information resulting from playing conditions". Or, of course, I may have done something wrong.


So we're down to 27.52. That's pretty close to the traditional estimates of 27ish. But I think we're not necessarily done: there are at least two factors I can think of that suggest that the real value is lower than even 27.52.

First, we showed that the regression overestimates the peak age by overweighting long careers relative to short careers. We were able to get the average to drop from 28.66 to 27.52 just by breaking the sample down and reweighting.

By the same logic, all three groups above must also be overestimates! In the middle group, players with 5000 PA are going to be weighted 67% higher than players with only 3000 PA. If we were to rerun the regression after breaking the group down further, into (say) 3000-4000 and 4000-5000, we'd get a lower estimate than 27.52. In fact, we could break those new groups down into smaller groups, and break those groups down into smaller groups, and so on. The problem is that the sample size would get too small to get reasonable results. But I'm betting the average would drop significantly.

Second, the study leaves out players with fewer than 1000 PA. That's probably a good thing, because with only 1 or 2 seasons, it's hard to fit a trajectory properly. Still, it seems likely that if there were a way of figuring it out, we'd find those players would peak fairly early, bringing the average down further.


So, in summary:

-- If we use the Bradbury model on groups of players with fewer PA, we find that those players are estimated to have lower peak age. This supports the hypothesis that choosing only 5000+ PA players biases the result too high.

-- The model used in Bradbury's study consistently overestimates peak age for another reason. That's the weighting problem -- it figures the peak for the average *season*, not for the average *player*.

-- Correcting for that shows that if we look at players with 1000 PA, instead of just players with 5000 PA, the peak age drops to the mid 27s.

-- Other corrections that we can't make, because of sample size issues, would drop the peak age even further.

-- There is good evidence that the shorter the career, the younger the peak age.

-- It doesn't seem possible, with this method, to get a precise estimate of average peak age. "Somewhere in the low 27s" is probably the best it can do, if even that.


Monday, November 23, 2009

The Bradbury aging study, re-explained

A few days ago, J.C. Bradbury responded to my recent post on his age study.

Bradbury had authored a study claiming that hitters peak at age 29.4, contradicting other studies that showed a peak around 27. His study was based on the records of all batters playing regularly between age 24 and 35. I argued that, by choosing only players with long careers progressing to a relatively advanced age, his results were biased towards players who peak late -- because, after all, someone with the same career trajectory, just starting a few years earlier, would be out of baseball by 35 and therefore not make the study.

In response, Bradbury denies that selective sampling is a problem. He writes,

"Phil Birnbaum has a new theory as to why I’m wrong (I suspect it won’t be his last)."

Actually, it's not a new theory. I mentioned it at exactly the same time and in the same post as another theory, last April. Bradbury actually linked to that post a few days ago.

Also, the reason "it won't be my last" is that, like many other sabermetricians, I am curious to find out why there's a difference between Bradbury's findings, which find a peak age of 29+, and many previous studies, which find a peak age of 27. They can't both be correct, and the way to resolve the contradiction is to suggest reasons and investigate whether they might be true.

But, Bradbury also said that I showed "a serious lack of understanding of the technique I employed." He's partially right -- I did misunderstand what he did. After rereading the paper and playing around with the numbers a bit, I think I have a better handle on it now. This post, I'm going to try explaining it (and why I still believe it's biased). Please let me know if I've got anything wrong.


Previously, I had incorrectly assumed that Bradbury's study worked like other aging studies I've seen (such as Justin Wolfers', or Jim Albert's (.pdf)). In those other studies, the authors took a player's performance over time, smoothed it out into a quadratic, and figured out the peak for each player.

Then, after doing that for a whole bunch of players, those other studies would gather all the differently shaped curves, and analyze them to figure out what was going on. They implicitly assumed that every player has his own unique trajectory.

Bradbury's study doesn't do that. Instead, Bradbury uses least-squares to estimate the best single trajectory for *every batter in the study*. That's 450 players, all with exactly the same curve, based on the average.

According to this model, the only difference between the players is that some players are more productive than others. Otherwise, every batter has exactly the same shaped curve. The only difference the model allows, between the curves of different players, is vertical movement, up for a better player, down for a worse one.

For instance: take Carlos Baerga, whose career peaks early, in his early 20s, with a short tail on the left and a long tail on the right. Then take Barry Bonds, whose career is the opposite: his career peaks late, with a long tail on the left and a short tail on the right.

What Bradbury's model does is take both curves, put them in a blender, and come out with two curves that look exactly the same, peaking in the late 20s. The only difference is that Bonds' is higher, because his level of performance is better.
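Here's a rough sketch of that "blender" effect, under loudly stated assumptions. The two players are hypothetical (made-up quadratic careers, not Baerga's or Bonds' real numbers), and the fit is a stripped-down stand-in for the study's regression: one shared age curve plus a separate vertical offset per player, with no Z-scores, career-average term, or other controls. Subtracting each player's own mean plays the role of the player dummies.

```python
# Two hypothetical players (invented numbers, not real career data):
# same curvature, but one peaks at 25 and the other at 31, at different levels.
ages = list(range(24, 36))
players = {
    "early": [10 - 0.1 * (age - 25) ** 2 for age in ages],
    "late": [20 - 0.1 * (age - 31) ** 2 for age in ages],
}

def demean(values):
    m = sum(values) / len(values)
    return [v - m for v in values]

# Demeaning within each player is equivalent to giving each player a dummy:
# it removes the vertical offset and leaves only the shared shape to fit.
x1, x2, y = [], [], []
for perf in players.values():
    x1 += demean(ages)
    x2 += demean([age ** 2 for age in ages])
    y += demean(perf)

# Least squares for y = a*x1 + b*x2, solved via the 2x2 normal equations.
s11 = sum(u * u for u in x1)
s12 = sum(u * v for u, v in zip(x1, x2))
s22 = sum(v * v for v in x2)
s1y = sum(u * w for u, w in zip(x1, y))
s2y = sum(v * w for v, w in zip(x2, y))
det = s11 * s22 - s12 * s12
a = (s1y * s22 - s2y * s12) / det
b = (s2y * s11 - s1y * s12) / det

shared_peak = -a / (2 * b)
print(round(shared_peak, 2))
```

In this toy setup neither player actually peaks at the fitted age; the single shared curve splits the difference between them.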

The model fits 450 identical curves to the actual trajectories of the 450 players. They can't be particularly good fits, because they're all the same. If you look at those 450 fitted curves, they're like a vertical stack of 450 identical boomerangs: some great hitter at the top, some really crappy hitter at the bottom, and the 448 other players in between.

I can pull a boomerang off the top, and show you, this is what Barry Bonds looks like. The best fit is that he started low, climbed until he reached 29 or so, then started a symmetrical decline (the model assumes symmetry). You'll ask, "what does Carlos Baerga look like?" I'll say, "it's exactly the same as Barry Bonds, but lower." I'll take my Barry Bonds boomerang, and lower my arm a couple of inches. Or, I can just pull the Baerga boomerang out of the middle of the stack.

(One more way of putting it. See this chart? This is how Justin Wolfers represents the careers of a bunch of great pitchers. He smoothed the actual trajectories, but modeled that every pitcher gets his own peak age, and his own steepness of curve. But for this study, they would all be the same shape, just one stacked above the other.)


Now, it seems to me that the model is way oversimplified. It's obviously false that all players have the same trajectory and the same peak age. People are different. They mature at different rates, both in raw physical properties, and in how fast they learn and adapt. Indeed, this is something the study acknowledges:

"Doubles plus triples per at-bat peaks 4.5 years later for Hall-of-Famers, which indicates that elite hitters continue to improve and maintain some speed and dexterity while other players are in decline."

So, implicitly, even Bradbury admits that the model's assumptions are wrong: some players age differently than others.

However, even if the model is wrong in its assumptions and in how it predicts individual players, it's possible to argue that the composite player it spits out is still reasonable.

For instance, suppose you have three people. One is measured to be four feet tall, one five feet, and one six feet. There are two ways you can get the average. You can just average the three numbers, and get five feet.

Or, you can create a model, an unrealistic model, that says that all three are really the same height, and any discrepancies are due to uncorrelated errors by the person with the measuring tape. If you run a regression to minimize the sum of squares of those errors, you get an estimate that all three people are actually ... five feet.

The model is false. The three people aren't really of equal height, and nobody is so useless with a tape measure that their observations would be off by that much. But the regression nonetheless gives the correct number: five feet. And so you'll be OK if you use that number as the average, so long as you don't actually assume that the model matches reality, that the six-foot guy is really the same height as the four-foot guy. Because there's no evidence that they are -- it was just a model that you chose.
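The tape-measure example is easy to check numerically. This sketch uses a brute-force search over candidate heights rather than a regression package, purely for illustration; the names are arbitrary.

```python
heights = [4.0, 5.0, 6.0]  # feet, from the example above

def sum_sq_errors(guess):
    """Total squared 'measurement error' if everyone were really `guess` feet tall."""
    return sum((h - guess) ** 2 for h in heights)

# Try candidate "true heights" from 4.00 to 6.00 feet in steps of 0.01.
candidates = [4 + i / 100 for i in range(201)]
best = min(candidates, key=sum_sq_errors)

print(best, sum(heights) / len(heights))
```

The least-squares answer and the plain average coincide, even though the "everyone is the same height" model is false.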

I think that's what's happening here. It's obvious that the model doesn't match reality, but it has the side effect of creating a composite average baseball player, whose properties can be observed. As long as you stick to those average properties, and don't try to assume anything about individual players, you should be OK. And that's what Bradbury does, for the most part, with one exception.


A consequence of the curves having the same shape is that declines are denominated in absolute numbers, rather than percentages of a player's level. If the model says you lose 5 home runs between age X and age Y, then it assumes *everyone* loses 5 home runs, everyone from Barry Bonds to Juan Pierre -- even if Juan Pierre didn't have 5 home runs a year to lose!

If Bonds is a 30 home run guy at age X, he's predicted to drop to 25 -- that's a 17% decline. If Juan Pierre is a 5 home run guy at age X, he's predicted to drop to 0 -- a 100% decline.

In real life, that's probably not the way it works -- players probably drop closer to the same percentage than by the same amount. Table VII of the paper says that a typical hitter would lose about half his homers (on a per PA basis) between 30 and 40. If Bradbury used a season rate of 16 homers as "typical," that's an 8 HR decline. But what about players who hit only 4 homers a year, on average? The model predicts them dropping to minus 4 home runs!
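The gap between "same amount" and "same percentage" can be shown with a tiny sketch. It assumes the 16-homer "typical" hitter discussed above and rounds the decline to exactly one half; the function name and the other home run totals are just illustrative.

```python
ABSOLUTE_DROP = 8     # HR lost between 30 and 40 by the assumed 16-HR "typical" hitter
PERCENT_DROP = 0.5    # the same decline, expressed as a share (rounded to one half)

def decline(hr_at_30):
    """Return (HR at 40 if everyone loses the same amount,
               HR at 40 if everyone loses the same percentage)."""
    return hr_at_30 - ABSOLUTE_DROP, hr_at_30 * (1 - PERCENT_DROP)

for hr in (16, 4, 40):
    print(hr, decline(hr))
```

The two rules agree only for the 16-homer hitter; for anyone else they diverge, and for low-power hitters the same-amount rule goes negative.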

Now, that's a bit of an unfair criticism. The text of the study doesn't explicitly argue that a Bonds will drop by the same number of home runs as a Baerga, even though the study deliberately chose a model that says exactly that. Remember, the model is unrealistic, so as long as you stick to the average, you're OK. Bonds and Pierre are definitely not the average.

But, then, why does Bradbury's Table VII deal in percentages? The model deals in absolutes. Bradbury obtained the percentages by applying the absolutes to a "typical" player, presumably one close to average. So why not put "-8 HR" in that cell, rather than "-48.95%"?

By showing percentages, there's an unstated implication, that since the model shows an average player with 16 HR drops to 8, you can extrapolate to say that a player with 40 HR will drop to 20. But that would have to be backed up by evidence or argument. And the paper provides neither.


To summarize:

-- the model assumes all players have the same peak age, and the same declines from their peak (which is another way of saying that it assumes that all players have the same shape of trajectory.)

-- it does assume some players (Barry Bonds) have a higher absolute peak than others (Jose Oquendo), but still have the same shape of career.

-- it assumes that all players rise and decline annually by the same absolute amount. In the agespan it takes for a 10-triple player to decline to 5 triples, a 6-triple player will decline to 1 triple, and Willie Aikens will decline to -5 triples.

What can you get out of a model like that, with its unrealistic assumptions? I think that you can reasonably look at the peak and shape as applied to some kind of hypothetical composite of the players used in the study. But I don't think you can go farther than that, and make any assumptions about other types of players.

So: when Bradbury's study comes up with the result that his sample of players peaked at 29.5 years (for Linear Weights), I think that's probably about right -- for his sample of players. When he says that the average home run hitter loses 8 home runs between 30 and 40, I think that's probably about right too -- for his sample of players.

My main argument is not that the model is unrealistic, and it's not that there's something wrong with the regression used to analyze the model. It's that the sample of players that went into the model is biased, and that's what's causing the peak to be too high.

Bradbury's model works for his sample -- but not for all baseball players, just the ones he chose. Those were the ones who, in retrospect, had long careers.

To have a long career, you have to keep up your performance for many years. To keep up your performance for many years, you need to have a slower decline than average. If you have a slower decline than average, a higher proportion of your value comes later in your career. If a higher proportion of your value comes later in your career, that means that you'll have an older-than-average peak.

So choosing players with long careers results in a peak age higher than if you looked at all players.

Bradbury disagrees. He thinks that Hall of Fame players may have a significantly different peak than non Hall-of-Fame players, but doesn't think that players with long careers might have a different peak than players with short careers.

That really doesn't make sense to me. But Bradbury has evidence. In his response to my post, he reran his study, but for all players with a minimum of 1000 PA, instead of his previous minimum 5000 PA. That is, he added players with short careers.

He found no difference in the peak age.

That's a pretty persuasive argument. I argued A, Bradbury argued B, and the evidence appears to be consistent with B. No matter how good my argument sounds, if the evidence doesn't support it, I'd better either stop arguing A, or explain why the evidence isn't consistent with B.

Still, the logic didn't seem right to me. So I spent a couple of days trying to replicate Bradbury's study. I wasn't able to duplicate his results perfectly, but many of them are close. And I'm not sure, but I think I have an idea about what's going on, why the evidence is consistent with A. That is, why Bradbury's 1000+ study comes up with a peak of 29 years, while other studies have come up with 27.

I'll get to that in the next post.


Monday, November 16, 2009

Selective sampling and peak age

Back a couple of years ago, I reviewed a paper by J.C. Bradbury on aging in baseball. J.C. found that players peak offensively around age 29, rather than the age 27 found in other studies.

I had critiqued the study on three points:

-- assuming symmetry;
-- selective sampling of long careers;
-- selective sampling of seasons.

In a blog post today, J.C. responds to my "assuming symmetry" critique. I had argued that if the aging curve in baseball has a long right tail, the median of the symmetrical best-fit curve would be at a higher age than the peak of the original curve. That would cause the estimate to be too high. But, today, J.C. says that he tried non-symmetrical curves, and he got roughly the same result.

So, I wondered, if the cause of the discrepancy isn't the poor fit of the quadratic, could selective sampling be a big enough factor? I ran a little experiment, and I think the answer is yes.

J.C. considered only players with long careers, spanning ages 24 to 35. It seems obvious that that would skew the observed peak higher than the actual peak. To see why, take an unrealistic extreme case. Suppose that half of players peak at exactly 16, and half peak at exactly 30. The average peak is 23. But what happens if you look only at players in the league continuously from age 24 to 35? Almost all those players are from the half who peak at 30, and almost none of those guys are the ones who peaked at 16. And so you observe a peak of 30, whereas the real average peak is 23.

As I said, that's an unrealistic case. But even in the real world, you expect early peakers to be less likely to survive until 35, and your sample is still skewed towards late peakers. So the estimate is still biased. Is the bias significant?

To test that, I did a little simulation experiment. I created a world where the average peak age is 27. I made two assumptions:

-- every player has his own personal peak age, which is normally distributed with mean 27 and variance 7.5 (for an SD of about 2.74).

-- I assumed that for every year after his peak, a player has an additional 1/15 chance (6.7 percentage points) to drop out of the league. So if a player peaks at 27, his chance of still being in the league at age 35 is (1 minus 8/15), since he's 8 years past his peak. That's 46.7%. If he peaks at 30, 35 is only five years past his peak, so his chance would be 66.7% (which is 1 minus 5/15).

Then, I simulated 5,000 players. Results:

27.0 -- The average peak age for all players.

28.1 -- The average observed peak age of those players who survived until age 35.

The difference between the two results is the result of selective sampling. So, with this model and these assumptions, J.C.'s algorithm overestimates the peak by 1.1 years.
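For anyone who wants to try this at home, here's a minimal sketch of a simulation along these lines. It is a reconstruction from the two assumptions listed above, not the original code. Several details are guesses: peak ages are treated as continuous rather than whole years, survival chances are clamped to the range 0 to 1, and the function name, parameter names, and seed are arbitrary. Exact results will vary with the seed.

```python
import random

def simulate(n_players=5000, mean_peak=27.0, variance=7.5,
             decay_years=15, final_age=35, seed=1):
    """Return (average peak of all players, average peak of survivors to final_age)."""
    rng = random.Random(seed)
    sd = variance ** 0.5
    all_peaks, survivor_peaks = [], []
    for _ in range(n_players):
        peak = rng.gauss(mean_peak, sd)
        years_past_peak = max(0.0, final_age - peak)
        # Chance of still being in the league at final_age: lose 1/decay_years
        # for every year past the peak. Clamping to [0, 1] is an assumption.
        p_survive = min(1.0, max(0.0, 1.0 - years_past_peak / decay_years))
        all_peaks.append(peak)
        if rng.random() < p_survive:
            survivor_peaks.append(peak)
    return (sum(all_peaks) / len(all_peaks),
            sum(survivor_peaks) / len(survivor_peaks))

overall, survivors = simulate()
print(round(overall, 1), round(survivors, 1))
```

Changing `decay_years` (for example, `simulate(decay_years=20)`) is a quick way to explore how sensitive the survivor average is to the dropout assumption.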

We can get results even more extreme if we change some of the assumptions. Instead of longevity decaying by 1/15, suppose it decays by 1/13? Then the average observed age is 28.5. If it decays by 1/12, we get 28.9. And if it decays by 1/10, the peak age jumps to 30.9.

Of course, we can get less extreme results too: if we use a decay increment of only 1/20, we get an average of 27.6. And maybe the decay slows down as you get older, and we might have too steep a curve near the end. Still, no matter how small the increment, the estimate will still be too high. The only question is, how much too high?

I don't know. But given the results of this (admittedly oversimplified) simulation, it does seem like the bias could be as high as two years, which is the difference between J.C.'s study and others.

If we want to get an unbiased estimate for the peak for all players, not just the longest-lasting ones, I think we'll have to use a different method than tracking career curves.

UPDATE: Tango says it better than I did, here.


Friday, November 13, 2009

Consumer Reports alarmist on reverse mortgages

(Warning: non-sports post.)

In their September issue, Consumer Reports issues another muddled panic about a financial product; this time it's reverse mortgages.

Basically, a reverse mortgage is a loan you take out using your house as collateral. Normally, you'd do that with a line of credit -- you borrow the money as you need it, and make at least your minimum payment every month (as interest accrues). It's like a credit card, but with a much lower interest rate because it's backed by your house.

The reverse mortgage is also a loan on your home equity, but it's meant for poorer elderly people who don't have the income to make payments on the loan. With the reverse mortgage, you still get the loan, but the interest accumulates and compounds, and you don't have to pay it back until you move out of the house (or die). The idea is that when you're no longer living in the house, you sell it and use the proceeds to pay off the loan.

What if the loan has compounded so high that the value of the house isn't enough to pay it off? In that case, the borrower is off the hook. One of the benefits of the reverse mortgage is that the borrower is never on the line for more than the house itself.

As CR points out, this benefit has a price: the borrower winds up paying for "insurance" against that happening, insurance that tops up the loan if the house is eventually not worth enough. It's government insurance, and comes with government regulations on reverse mortgages. For instance, you have to be over 62, and you have to do lots of expensive legal paperwork.

It's called a "reverse mortgage" because it's often taken out to provide a stream of payments to supplement social security. That stream of payments is backwards from a normal mortgage: instead of you paying off the mortgage every month, the mortgage pays you.

My feeling, and CR's too, is that a reverse mortgage is a reasonable thing to do if you plan to stay in your house forever, and won't need money afterwards (because either you've died, or you're so ill you move to a nursing home paid for by government). Why die with money in the bank (or equity in your house)?

Another benefit of the reverse mortgage is that it can sometimes provide the only way to get a lump sum of money in case of sudden need, like a medical emergency.


So what's CR's problem with reverse mortgages? They have a few. Some of them are not completely unreasonable. CR gives stories of people being sold expensive reverse mortgages in order to use the money for inappropriate investments, which is certainly a bad thing. But that's not the fault of the reverse mortgage -- seniors are sold questionable financial products all the time, and sometimes persuaded to borrow money in other ways.

And CR gives examples of seniors who didn't really understand what they were getting into. For instance, sometimes salespeople hand customers overoptimistic projections of what their house will be worth, misleading them about the amount of equity that will be left for their children to inherit. But, again, biased salespeople are hazards of any financial transaction.

CR is also concerned that due to the housing meltdown, a lot of reverse mortgages end in the red, where the government-sponsored insurance comes into play. The payouts have started to exceed the premiums paid by borrowers, and CR is concerned about the burden on the taxpayer. It could be "the next financial fiasco."

I don't really understand their concern. In 2008, the worst year of the housing crisis, the fund only had to pay $400 million in claims. Suppose they pay at that rate for five years. That's about $6 per American. Compare that to the $7 trillion bailout, which is $13,000 per American. Is it really worth worrying about a $6 "fiasco" while ignoring the $13,000?

Not only is it a small amount of money, but you could argue that it's money well spent. Reverse mortgage insurance money is not a gift to the irresponsible: it's part of a social program that allows senior citizens to hold on to their homes, while living better in their old age. I'm not a big fan of government spending, but such a small sum, for such a good purpose, as the result of a once-in-a-lifetime anomaly in housing prices, is probably 1000th on my list of government policy issues people should be concerned about.


But the thing that *really* bugged me about the article is the lead anecdote that purports to show the human side of why reverse mortgages are harmful. But, as with their medical credit card screed last year, they got their conclusion completely backwards! Their example actually shows a reverse mortgage that handsomely rewarded the borrower.

When Ernest Minor was 61, his wife had serious medical problems. The Minors still owed $70,000 on their home. They took out a reverse mortgage loan for $176,000. $70,000 of that simply replaced the outstanding mortgage balance. $15,000 went for fees and insurance. And $92,000 went to pay the medical bills.

Because Mr. Minor was not yet 62 (but his wife was), they had to transfer the deed to Mrs. Minor's name in order to be eligible -- which they did.

Mrs. Minor died two years later. That made the loan come due. With interest, it's about $200,000. Obviously, Mr. Minor can't afford to pay that, so he will lose his home.

Sure, it's sad that Mr. Minor will lose the house he lived in for so many years. But, after all, he needed $92,000 for medical bills. Without the reverse mortgage, what would the Minors have done? They'd have sold the house, rented an apartment, and used the proceeds to pay the bills. They would have lost the home immediately. But, with the reverse mortgage, they got to live in the house a couple more years. Plus, if Mrs. Minor had gotten better, they would have been able to stay there indefinitely! Of all the Minors' options, the reverse mortgage was, in fact, the very best way to handle the situation.

CR is upset that the broker had Mr. Minor take his name off the deed to get the loan -- if he hadn't, he'd still be allowed to stay in the house until he moved out. But, unfortunately, government regulation gave him no choice! Instead of picking on the salesman, maybe CR should lobby the government to lower the minimum age, or at least allow one of the spouses to be under 62.

Anyway, Mr. Minor benefits even further. He owes $200,000. But because of the real estate meltdown, his home is only worth $130,000. You'd think he'd be in the hole to the tune of $70,000. But he's not! Remember, with the reverse mortgage, you can't owe more than the house is worth. Mr. Minor can walk away, because his $70,000 deficit is covered by the insurance he bought!

Let's do an accounting. Suppose that before the real-estate crisis, the house was worth $300,000 (which is about right, based on what CR tells us about what percentage of the home's value is loanable). If Mr. Minor had raised the $92,000 some other way, via a conventional loan, he'd be liable for that $92,000. He'd still owe $70,000 on his mortgage. And he'd have made maybe $20,000 in interest payments over the three years on that $162,000.

So his total owing would be $182,000. His house would be worth only $130,000. His net worth would be negative $52,000.

But, with the reverse mortgage, he just walks away! He loses the house, but his net worth is $0. The reverse mortgage saved him $52,000! Of course, most of that savings came from avoiding the housing crisis, but still -- rather than Mr. Minor being a reverse mortgage sob story, he should be a success story!
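Here is a minimal sketch of that accounting in Python, using only the round numbers above. The constant and function names are made up for illustration, the $20,000 interest figure is the rough guess from the text, and the "can't owe more than the house is worth" rule is modeled as a simple cap on the debt.

```python
# Round figures from the post; names are illustrative only.
HOME_VALUE_NOW = 130_000        # value of the house after the meltdown
MEDICAL_BILLS = 92_000          # amount that had to be raised
OLD_MORTGAGE = 70_000           # balance still owed on the original mortgage
INTEREST_PAID = 20_000          # rough guess at conventional-loan interest
REVERSE_LOAN_BALANCE = 200_000  # reverse mortgage balance, with interest


def conventional_net_worth():
    """Net worth if the money had been raised with a conventional loan."""
    total_owing = MEDICAL_BILLS + OLD_MORTGAGE + INTEREST_PAID
    return HOME_VALUE_NOW - total_owing


def reverse_mortgage_net_worth():
    """Net worth with the reverse mortgage, where the debt is capped
    at the value of the house (the insurance covers the rest)."""
    amount_owed = min(REVERSE_LOAN_BALANCE, HOME_VALUE_NOW)
    return HOME_VALUE_NOW - amount_owed


conventional = conventional_net_worth()
reverse = reverse_mortgage_net_worth()
covered_by_insurance = REVERSE_LOAN_BALANCE - HOME_VALUE_NOW
savings = reverse - conventional

print(f"Conventional loan net worth: {conventional:,}")
print(f"Reverse mortgage net worth:  {reverse:,}")
print(f"Shortfall covered by insurance: {covered_by_insurance:,}")
print(f"Difference in favor of the reverse mortgage: {savings:,}")
```

The whole difference comes from the cap: the conventional borrower carries the full $182,000 against a $130,000 house, while the reverse mortgage borrower's debt stops at the value of the home.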

CR prints a half-page photo of a sad Mr. Minor, holding a photo of his deceased wife. It's indeed very sad and moving that the illness lost him his wife and his house. But the reverse mortgage was the lone, small bright spot. It wound up saving Mr. Minor thousands and thousands of dollars. And CR didn't even notice.


Saturday, November 07, 2009

Do younger umpires call a more accurate strike zone?

In a post on the economic incentives facing would-be umpires, J.C. Bradbury has an interesting study on how older umpires are more likely to have a strike zone that's larger or smaller than average.

Bradbury ran a regression to predict an umpire's season strikeout-to-walk ratio based on his age and who he is. He found that the older the ump, the more the size of his strike zone differs from the league average. He writes,

"It turns out that every year an umpire ages he increases his deviation from the league average by about 0.8% [.016 change in K/BB ratio]. That doesn’t seem like a lot—and it really isn’t a huge effect—but over a period of 12 years that pushes the umpire a full standard deviation (9.5%) [.19] above/below the average deviation. Thus, by the end of an umpire’s career, his calls are about two standard deviations from the typical deviation. This is evidence of a tenure effect or a loss of competency." [square brackets mine.]

How big is that effect? Well, 0.19 might be the difference between a K/BB ratio of 2 and a ratio of 2.19. Assuming the same number of total K+BB, that means that instead of (say) 20 strikeouts for every 10 walks, there would be 20.6 strikeouts for every 9.4 walks. That turns 0.6 walks into strikeouts per 30 K+BB events, which is about 0.35 runs.

In 2009, there were 10.33 such events per game (per team), not 30. That means a standard deviation is worth a bit over a third of .35, or .12 runs per game. So a pitcher's ERA might go up or down by .11. For two standard deviations, it would be .24 runs or .22 earned runs. Assuming both teams are equal, there would obviously be no effect on who wins the game (although it seems likely that pitchers may be affected differently).
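The arithmetic in the last two paragraphs can be checked with a short Python sketch. The function name `split_k_bb` is made up for illustration, and the 0.35 runs per 30 events is taken directly from the text rather than derived.

```python
def split_k_bb(ratio, total=30.0):
    """Split `total` K+BB events into (strikeouts, walks) at a given K/BB ratio."""
    walks = total / (ratio + 1.0)
    return total - walks, walks


# One standard deviation: a K/BB ratio of 2.19 instead of 2.00.
k_base, bb_base = split_k_bb(2.0)
k_new, bb_new = split_k_bb(2.19)
walks_converted = bb_base - bb_new  # walks that become strikeouts, per 30 events

RUNS_PER_30_EVENTS = 0.35      # figure from the post
EVENTS_PER_TEAM_GAME = 10.33   # K+BB per team per game in 2009, from the post

# Scale the 30-event figure down to one team's share of one game.
runs_per_sd = RUNS_PER_30_EVENTS * EVENTS_PER_TEAM_GAME / 30.0
runs_per_2sd = 2 * runs_per_sd

print(f"Strikeouts/walks at 2.19: {k_new:.1f} / {bb_new:.1f}")
print(f"Walks converted per 30 events: {walks_converted:.2f}")
print(f"Runs per game at 1 SD: {runs_per_sd:.2f}")
print(f"Runs per game at 2 SD: {runs_per_2sd:.2f}")
```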

What's that in terms of pitches? A study I did (.pdf, page 4) finds that the difference between a ball and a strike is about 0.14 runs. So, at 2 standard deviations, an umpire calls about 1.7 pitches differently per game, per team. That means that more than 95% of umpires are less than 1.7 pitches per game different from the mean.
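As a rough check of the pitch count, here is a small Python sketch. It uses the 0.12 runs per standard deviation and the 0.14 runs per ball/strike call from the text. The "more than 95%" figure is reproduced only under the added assumption that umpire deviations are roughly normally distributed, which the post doesn't state.

```python
import math

RUNS_PER_SD = 0.12      # runs per game, per team, for one SD (from the post)
RUNS_PER_CALL = 0.14    # run value of turning a ball into a strike (from the post)

# Pitches per game, per team, that a 2-SD umpire calls differently.
pitches_at_2sd = 2 * RUNS_PER_SD / RUNS_PER_CALL

# Assumption: deviations are normally distributed. Then the share of
# umpires within 2 SD of the mean is erf(2 / sqrt(2)).
share_within_2sd = math.erf(2 / math.sqrt(2))

print(f"Pitches called differently at 2 SD: {pitches_at_2sd:.1f}")
print(f"Share of umpires within 2 SD (normal assumption): {share_within_2sd:.3f}")
```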

But since the pitchers and batters are presumably aware of the differences between umpires, they would adjust accordingly. So the effect might be more than .12 runs per SD -- it could be .12 for the actual strikeouts and walks observed, but it might cost the batter another .12 (or some other number) in having to swing at bad pitches (which would be called balls by another umpire).

And, of course, even umpires who call normal numbers of strikeouts and walks might have their own particular strike zone -- it might be the same size, but have a different shape or location.

Bradbury concludes that, as umpires gain experience, they get more confident and less reverent, and feel less of a need to stick to the league's interpretation of where the strike zone should be. That sounds reasonable, especially considering that Bradbury's regression controlled for calendar year.

It's worth reading Bradbury's entire post, which includes a list of umpires and their individual K/BB ratios.


(Note: post was updated in response to an e-mail from Guy, who pointed out I had misinterpreted Bradbury's percentages and got the effect being half of what it should be. I think it's correct now.)


Thursday, November 05, 2009

Do field-goal kickers do worse in the clutch?

My favorite studies are the ones that don't need any fancy math or statistics, where you can just look at the data and answer the question almost instantly.

Brian Burke, of "Advanced NFL Stats," had one of those last week. He wondered: is there an overall "choke" effect for field goal kicking? Are kickers less likely to make their kick when the game is on the line, either because of nervousness, or because defenses change their strategy?

The answer appears to be: no. Adjusted for distance, the clutch success rates track the overall success rates almost exactly, except for one blip in the data at 44 yards.

Of course, that doesn't mean that *no* kicker is different in the clutch. The data are consistent with the possibility that only one or two kickers are clutch or choke, which wouldn't be enough to show up in the graph. It's also possible that half of all kickers are clutch, and the other half are choke, and they exactly offset each other so there appears to be no effect.

But in view of the baseball evidence, which shows only very, very slight "clutch" variation among hitters, it doesn't seem likely that field-goal kickers would be significantly clutch.

Furthermore, considering how much data it took to test the clutch hypothesis for baseball, it's probably impossible to find an effect for kickers, even of the same rough size as for batters (less than 3% variation in success rate). Baseball hitters get several hundred opportunities in a season; kickers get maybe 30 or 40. If an effect for individual kickers exists in the NFL, it would have to be huge to have any chance of being detected.
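To get a feel for the sample-size problem, here is a rough Python sketch of the random noise in one season's success rate, using the usual binomial standard error. The success rates (0.80 for kickers, 0.30 for batters) and the 600 batter opportunities are illustrative assumptions, not numbers from Burke's study, and clutch situations would be only a fraction of these totals.

```python
import math


def standard_error(p, n):
    """Binomial standard error of an observed success rate over n attempts."""
    return math.sqrt(p * (1.0 - p) / n)


# Assumed, illustrative values: not taken from the study.
kicker_se = standard_error(p=0.80, n=35)    # 30 or 40 kicks a season
batter_se = standard_error(p=0.30, n=600)   # several hundred opportunities

print(f"Kicker, one season: +/- {kicker_se:.3f}")
print(f"Batter, one season: +/- {batter_se:.3f}")
print(f"Kicker noise is {kicker_se / batter_se:.1f} times the batter noise")
```

Under these assumptions, one season of random variation for a kicker is already bigger than a 3% effect, before narrowing down to clutch attempts only.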
