Sabermetric Research: July 2006

Sunday, July 30, 2006

Are pitchers from the South too politically correct to hit black batters?

Are pitchers from the South too genteel to plunk black batters? Maybe.

In this academic study posted at the Retrosheet research page (and filled with psychological jargon about “social identity,” “hostile attributional bias,” and such), author Thomas A. Timmerman studies a few decades’ worth of Retrosheet HBP data and finds some anomalies among Southern-born pitchers.

Timmerman used a logistic regression to predict the probability of a hit batsman from a whole bunch of factors, including pitcher wildness, batter quality, etc. He specifically considered three retaliation situations: (a) whether the previous batter hit a home run; (b) whether this particular batter hit a home run his last AB against this pitcher; and (c) whether one of the pitcher’s own teammates had previously been hit by the opposing pitcher.

He also considered race of the batter, and only studied HBPs by white pitchers.

The basic results were pretty much as you’d expect. Batter quality (OPS) was the most significant factor, twenty standard deviations above the mean. The year, score, and wildness of the pitcher were also highly significant.

The most important retaliation factor was whether the pitcher’s teammate was hit – if he was, that increased the logit of the HBP probability by 0.321, which translates to about a 38% increase in the chances of a HBP.

(Mathematical explanation which you can skip: If I remember my logistic regression, the 0.321 is the log of the number you need to multiply the odds by. Specifically: the antilog of 0.321 is about 1.38. In the entire study, the odds of a HBP were 20,357 to 3,973,869. To get the odds after the teammate is hit, you multiply by 1.38, and the new odds are now 28,093 to 3,973,869. This is equivalent to a probability increase from .00512 to .00707, which is a 38 percent increase. So hitting a batter increases the chance of your own team getting hit by 38%.

The results in the author’s tables are presented only as logit coefficients. That’s kind of annoying, and you have to read the text to get the percentages out – or figure them yourself by studying all the interactions. It took two of my fourth-year statistics courses for me to be able to figure out the table. I hope I got it right.)
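For anyone who wants to check the conversion, here is a minimal Python sketch of the arithmetic in the parenthetical above. The function names are my own, not Timmerman's, and the only inputs are the 0.321 coefficient and the HBP counts quoted in this post.

```python
import math

def odds_multiplier(logit_coef):
    """Convert a logit coefficient into the factor that multiplies the odds."""
    return math.exp(logit_coef)

def odds_to_probability(odds):
    """Convert odds expressed as a ratio (e.g. 0.005) into a probability."""
    return odds / (1.0 + odds)

# Figures quoted in the post: 20,357 HBP against 3,973,869 non-HBP.
base_odds = 20357 / 3973869
multiplier = odds_multiplier(0.321)   # roughly 1.38
new_odds = base_odds * multiplier

base_prob = odds_to_probability(base_odds)
new_prob = odds_to_probability(new_odds)
relative_increase = new_prob / base_prob - 1

print(f"odds multiplier: {multiplier:.3f}")
print(f"relative increase in probability: {relative_increase:.1%}")
```

Because hit batsmen are so rare, odds and probability are nearly the same number here, which is why the odds multiplier and the percentage increase in probability come out almost identical.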

Second most important was if the previous batter hit a home run – that would increase the odds by 19%. Third, if the same batter had hit a homer last at-bat, that would increase the odds by 11% (which wasn't enough to be significant at the 5% level). And, finally, there was no significant difference between black batters and white batters.

So far, nothing unusual. But now, Timmerman considered the birthplace of the pitcher. This is where the shock starts:

For case (a), when the previous batter hit a home run, normally there would be a 19% increase in HBP. But when the previous batter hit a home run and the pitcher was from the south, the increase was 22%, suggesting that southern pitchers are 16% more vindictive than average.

But look at the breakdown by race. If there were no discrimination on the part of the pitchers, you’d expect equal results: a 22% increase against white batters, and a 22% increase against black batters. But instead, there was a 55% increase among white batters, and a 4% decrease among black batters!

For case (b), when a black player hit a home run off the southern pitcher his previous time up, he was 10% more likely to be hit than average. But when a white player hit a home run off the southern pitcher his last time up, he was 50% more likely to be hit.

Finally, for case (c), southern pitchers did still exact revenge on black batters – 12% more HBPs than normal. But for white batters, it was 55% more. (Also notable is that southerners were no more likely than Northerners to hit a batter just because their own teammate was hit. That is, they were more eager to avenge themselves, but no more eager to avenge a teammate.)

So black batters are treated very, very nicely by Southern pitchers. Timmerman suggests the southerners are motivated by a stronger desire not to appear racist. Or, he says, perhaps it’s that pitchers from the South are afraid of blacks, and worry that they’ll charge the mound if hit. Personally, I think the first explanation is more likely.

Either way, the results still freak me out. I never would have expected this.

----
(late note: found a calculation error in para beginning "For case (a)". Correct number is 22%. Now fixed.)

Friday, July 28, 2006

Tom Ruane on value added

The “Value Added Approach” to offense, first described by Gary Skoog in the 1987 Baseball Abstract, credits the batter (and debits the pitcher) with the difference in run potential caused by the plate appearance.

For instance, in the 1982 American League, there was an average 0.500 runs scored per inning. With one out and nobody on, the future potential dropped to 0.265 runs. So if Alfredo Griffin led off an inning with a ground ball out, he would receive the difference, or negative 0.235 runs.
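The Griffin example can be written out as a tiny Python sketch. The function name and the `runs_scored` argument are my own additions (I am assuming, as value-added systems usually do, that any runs scoring on the play get added in); the two run-potential figures are the 1982 AL numbers quoted above.

```python
def value_added(potential_before, potential_after, runs_scored=0.0):
    """Change in run potential caused by one plate appearance.

    runs_scored is an assumed extra term for runs that cross the plate
    on the play; it is zero in the example from the post.
    """
    return potential_after - potential_before + runs_scored

# 1982 American League figures quoted in the post.
start_of_inning = 0.500   # nobody on, nobody out
one_out_empty = 0.265     # nobody on, one out

griffin = value_added(start_of_inning, one_out_empty)
print(round(griffin, 3))  # the batter's debit; the pitcher gets the opposite sign
```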

In this 2005 study published on the Retrosheet research page, Tom Ruane thoroughly analyzes and computes value added runs for every player from 1960 to 2004. “Thoroughly” is probably not strong enough – Microsoft Word’s thesaurus also lists “methodically,” “carefully,” “systematically,” “painstakingly,” “meticulously,” “scrupulously,” “comprehensively,” and “exhaustively,” and it probably takes all of them added up to equal the study’s level of detail.

First, Tom generates run potential charts and linear weights values for each league and year. Then, he presents the raw numbers for batters, both season and career. Then, he adjusts for park. Then, he adjusts for position. Then, he carefully explains how it happens that Denis Menke figured to lead the National League in 1970. Then, he compares the method to linear weights, and shows which players varied the most in the two measures, either by hitting well or poorly in the clutch.

Then, he repeats all that for pitchers (except the Denis Menke part).

Tom argues that this method is probably not as good as Linear Weights in predicting future performance, because value added contains lots of luck (clutch hitting and situations faced) that does not repeat from year to year. I agree with Tom, and Value Added is not my favorite offensive stat for that reason.

The hidden surprise is that Tom gives us the 90 sets of full, year-to-year and league-to-league linear weights values (embedded partway through the essay), full park factor data based on the run potentials (linked to in the text), and finally the 90 run-potential tables (again via link). Even if the rest of the study was missing, the free data would be enough to ensure the awesomeness of Tom's research.

Ripsometrics

OK, so it's probably not really a sport, and few details are provided, but Steven Levitt (Freakonomics) is obviously doing some kind of scientific experiment with sabermetric overtones.

A non-sabermetric approach to strategy can be found here.

And I promise to get back to baseball real soon.

Thursday, July 27, 2006

"Do Dramatic Wins Matter?"

From the Retrosheet research page, a study by Brian Connolly checks whether dramatic ninth inning victories translate into a psychological boost that means more wins in the future.

The answer: no.

But there was one anomaly: teams that came from one run behind today turned out to have won yesterday's game at a .580 rate (versus only .508 on the season). No idea why that would be the case.

Hockey study: how much is a faceoff worth?

A recent issue of JQAS contains an interesting academic study on hockey that reaches a few conclusions about strategy through Markov Chain analysis.

(I tried to write an easy primer on Markov chains, but I should get it read by people who know what they’re talking about. For Markov chains in baseball, the world leader is Mark Pankin, who has done a whole load of batting order studies.)

The author, Andrew C. Thomas, divided a hockey game into 19 states. Nine states are the 3x3 combinations of team in possession of the puck (team A, team B, faceoff) and zone (team A’s zone, team B’s zone, neutral zone). Two more states are faceoffs at the respective blue lines. Two more states are goals having been scored. Four states are possession in the offensive or defensive zone after a turnover. And the last two states are possession in the defensive zone after deliberately retreating to avoid a forechecker.

Based on observations of 18 games of the Harvard men’s hockey team, he then calculated probabilities of moving from one state to another. With the states and the probabilities in hand, he was able to use Markov Chain techniques to analyze certain aspects of the game.

I don’t understand everything Thomas did, but it seems more complicated than I thought it would have to be. For instance, the study does a lot of work to include a continuous time factor in the Markov Chain. In reality, it doesn’t matter how long it takes a team to move from the defensive zone into the neutral zone – there’s no 24-second rule in hockey, so you can take as long as you want. All that extra complicated math (over the use of a discrete-time Markov chain, like a baseball lineup) doesn’t seem to add much to the conclusions the study draws.
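To show what I mean by the simpler discrete-time approach, here is a toy Python sketch. It is not Thomas's model: it has four states instead of 19, and every probability in it is made up purely for illustration. Each step is one arbitrary tick of the clock, and the two goal states are absorbing.

```python
# Toy discrete-time Markov chain. The states and every probability below
# are hypothetical; none of them come from Thomas's paper.
STATES = ["A_has_puck", "B_has_puck", "A_goal", "B_goal"]

TRANSITIONS = {
    "A_has_puck": {"A_has_puck": 0.60, "B_has_puck": 0.38, "A_goal": 0.02},
    "B_has_puck": {"A_has_puck": 0.40, "B_has_puck": 0.585, "B_goal": 0.015},
    "A_goal": {"A_goal": 1.0},   # absorbing state
    "B_goal": {"B_goal": 1.0},   # absorbing state
}

def step(distribution):
    """Advance a probability distribution over states by one time step."""
    nxt = {s: 0.0 for s in STATES}
    for state, prob in distribution.items():
        for target, p in TRANSITIONS[state].items():
            nxt[target] += prob * p
    return nxt

def distribution_after(start_state, n_steps):
    """Distribution over states after n_steps, starting surely in start_state."""
    dist = {s: 0.0 for s in STATES}
    dist[start_state] = 1.0
    for _ in range(n_steps):
        dist = step(dist)
    return dist

def goal_differential(start_state, n_steps):
    """P(A scores first) minus P(B scores first) within n_steps."""
    dist = distribution_after(start_state, n_steps)
    return dist["A_goal"] - dist["B_goal"]

for start in ("A_has_puck", "B_has_puck"):
    print(start, round(goal_differential(start, 40), 4))
```

With real transition frequencies plugged in, the same loop would give the expected goal differential from any starting state, which is the kind of number the study reports.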

Also, I’m not able to figure out, from the study, how Thomas gets his probabilities. I would have thought he would just watch the games, watch how often things happened, and use those observations as probabilities. But he does something more complicated – “Bayesian inference with a multinomial/Dirichlet model.”

(I’m not an expert on Bayesian, but I know you use that kind of model when you have prior information on what to expect. For instance, if a player goes 3-for-4, the naïve statistician would estimate that he’s a .750 hitter. The Bayesian statistician would note that he can’t be a .750 hitter, because hitters are normally distributed with a bell-shaped curve that ends well below .400. The Bayesian approach is to say, what can you expect from the 3-for-4 hitter *given* that he’s pulled from that normal (prior) distribution? And the answer might be, he’s a .275 hitter on average.)
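Here is a rough Python sketch of that 3-for-4 idea. To keep the arithmetic to one line I am swapping in a Beta prior instead of the normal curve described above, and both the prior mean (.270) and its weight (300 at-bats' worth) are numbers I picked for illustration, not estimates from any study.

```python
def shrunk_average(hits, at_bats, prior_mean=0.270, prior_weight=300):
    """Posterior mean batting average under a Beta prior.

    prior_mean and prior_weight are hypothetical: the prior acts like
    prior_weight extra at-bats hit at a prior_mean clip.
    """
    prior_hits = prior_mean * prior_weight
    return (hits + prior_hits) / (at_bats + prior_weight)

naive = 3 / 4
bayes = shrunk_average(3, 4)
print(f"naive estimate: {naive:.3f}")
print(f"shrunk estimate: {bayes:.3f}")
```

Four at-bats barely move the estimate off the prior, which is the whole point: the less data you have, the more the prior dominates.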

The implication is that there is prior knowledge about what that number should be, and so even if the state goes from defensive zone to neutral zone 75% in real life, you can’t take that figure at face value. But I can’t figure out what that knowledge is – why, if the observed proportion of pucks brought out of the zone is 75%, the study wouldn’t just go ahead and use 75%.

Or maybe I just don’t understand the Bayesian technique at all.

Anyway, given the model, Thomas comes up with these findings:

After 40 seconds, the current situation is no longer dependent on the starting situation. That is, if you start out in your own zone, you’re less likely to score in the first ten seconds than if you’re in the opponent’s zone. But you’re *equally* likely to score between the 40th and 50th second no matter where you start.
Carrying the puck into the opponent’s zone is, in terms of goal differential, almost exactly as valuable as the dump-and-chase. However, the dump-and-chase leads to a slightly lower probability of either team scoring, and so perhaps is slightly worse when behind by one goal in the closing minutes of a game.
If you start with the puck in your own zone, you should expect to be outscored by .0043 goals over the next 40 seconds. That is, every 233 own-zone starts cost you one goal.
If the other team has the puck in your zone, you should be outscored by .0258 goals. That’s one goal for every 39 possessions.
If you give the puck away in your own zone, it only costs you .0244 goals (1 in 41). That’s actually less than if the opposition brings the puck in themselves.
It’s one goal for every 47 faceoffs won in the offensive zone, one in 143 for neutral zone faceoffs, and one in 67 for faceoffs at the blue line.
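The "one goal for every N" figures in the list above are just reciprocals of the goal differentials. A quick Python check, using only the numbers quoted in the list (the function name and labels are mine):

```python
def events_per_goal(goal_differential_per_event):
    """How many events it takes, on average, to add up to one goal."""
    return 1.0 / abs(goal_differential_per_event)

quoted = [
    ("own-zone start", 0.0043),
    ("opponent has puck in your zone", 0.0258),
    ("giveaway in your own zone", 0.0244),
]

for label, differential in quoted:
    print(f"{label}: one goal per {round(events_per_goal(differential))} events")
```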

Wednesday, July 26, 2006

Frederick Mosteller and Poisson golf scores

Baseball Toaster’s Bob Timmermann blogs the death of Frederick Mosteller, a sabermetric pioneer who published on baseball in 1952.

From King Kaufman’s article in Salon:

Mosteller's Washington Post obituary, like various other online citations, credits "The World Series Competition" as "the first known academic analysis of baseball."

It showed that even a very good team relies heavily on luck in winning a short series. "The probability that the better team wins the World Series is estimated as 0.80," the abstract reads.

Pretty simple stuff now, but we stand on the shoulders of giants, etc.
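To get a feel for how much luck a short series involves, here is a small Python sketch. This is not Mosteller's method, just the textbook calculation under two assumptions of mine: the better team has the same chance p of winning every game, and the games are independent.

```python
from math import comb

def series_win_probability(p, wins_needed=4):
    """Chance of winning a first-to-wins_needed series with per-game chance p."""
    total = 0.0
    for losses in range(wins_needed):
        # The team wins the clinching game after losing `losses` earlier games.
        ways = comb(wins_needed - 1 + losses, losses)
        total += ways * p ** wins_needed * (1 - p) ** losses
    return total

for p in (0.50, 0.55, 0.60):
    print(f"per-game {p:.2f} -> series {series_win_probability(p):.3f}")
```

Even a team that wins 60% of its games against the opponent takes a best-of-seven only about 71% of the time under these assumptions.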

The reason I run this note, redundant as it is following the postings of Bob and Mr. Kaufman, is that Bob sent me a link to another Mosteller abstract where we learn something about golf:

Professional golf players on the regular tour are so close in skill that a few rounds do little to distinguish their abilities. A simple model for golf scoring is "base + X" where the base is a small score for a round rarely achieved, such as 64, and X is a Poisson distribution with mean about 8.

I assume the base and Poisson mean vary by golfer.
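The "base + X" model is easy to play with. A minimal Python sketch, using the base of 64 and Poisson mean of 8 from the abstract as defaults (the function names are mine):

```python
import math

def poisson_pmf(k, mean):
    """Probability that a Poisson variable with the given mean equals k."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def score_probability(score, base=64, mean=8.0):
    """Chance of shooting exactly `score` under the base + Poisson model."""
    strokes_over_base = score - base
    if strokes_over_base < 0:
        return 0.0   # the model never produces a round below the base
    return poisson_pmf(strokes_over_base, mean)

expected_score = 64 + 8.0   # base plus the Poisson mean

for score in (66, 70, 72, 76):
    print(score, round(score_probability(score), 3))
```

A different golfer would just get a different `base` and `mean`, if my assumption above is right.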

Does a soccer team playing well have a better chance of winning?

A study of mine a few years ago found an unexpected result: even after a starting pitcher gives up three or more runs in the first inning, he returns to form for the rest of the game, pitching to the level of his overall ability. From just the first inning, you’d expect that he doesn’t have his stuff that day, and that he’d continue to pitch poorly, but he doesn’t. Just because the opposition dominates him for a short period doesn’t mean they will continue to dominate him. In effect, this kind of in-game “momentum” does not exist.

Is the same true in soccer?

I would argue that the answer is yes – based on a closer look at the data in a recent economics study by Ricard Gil and Steven Levitt.

(Levitt is the co-author of the bestselling book Freakonomics (my review), and it was through an entry on his blog that I learned of his study.)

The paper looked at data from the online gambling site intrade.com. Intrade allows bettors to make bets with each other even while the game is on. The odds, of course, will change as events unfold in the game, and are determined by supply and demand, like prices in the stock market.

Gil and Levitt found that the betting market is very efficient, in the following ways:

When a goal was scored, the price of a bet on the winning team increased immediately;
Even in retrospect, it would have been difficult, to the point of nearly impossible, to profit by following any specific betting strategy;
When a mathematical opportunity arose for making a certain profit (for instance, better-than-even odds of team A winning combined with better-than-even odds of team A not winning), alert bettors seized the opportunity in less than fifteen seconds;
And market makers, who offer to take both sides of the bet at different odds hoping to make money on the difference, actually lost money overall, as sophisticated bettors took advantage of mispriced bets.

All this suggests that the intrade market correctly captures the chances of winning at any given time.

Which brings us back to the momentum issue.

In their chart on page 19, Gil and Levitt show that the intrade odds, and therefore the game probabilities, are constant in the fifteen minutes before a goal is scored.

Now, in any game, one team will be playing better than the other, even dominating the game. You would expect it to be that team that eventually scores the goal. But even though bettors are watching the game and can see which team is playing better, they did not bid up the odds of the dominating team winning before the goal was scored!

That is, suppose team A and team B initially have equal odds of winning. Team A comes out flying in the first half, dominating the play and getting several good shots on goal. After 15 minutes, they finally score a goal, making it 1-0. When they score, the odds immediately adjust. But in the 15 minutes preceding the goal, while team A was dominating, its odds did not change. You could have bet A at the same odds as before the game even started.

This very strongly suggests that dominating the play does not tell us anything about the chances of eventual victory. It means that domination is “random,” that either team may dominate at any time, and that teams dominating one minute are equally likely to be dominated the next.

Put another way: if you can’t predict the next fifteen minutes from the previous fifteen minutes, it means that teams don’t appear to have “on days” and “off days.” They just play to the level of their ability, and the appearance of domination is, like the “hot hand” in basketball, simply an illusion.

I’m surprised by this. But I’m even more impressed that bettors on intrade already knew it. The workings of markets are truly amazing.

Tuesday, July 25, 2006

Ball trajectories: do they increase the reliability of offensive stats?

In their JQAS article (mentioned below) “Who Controls the Plate? Isolating the Pitcher/Batter Subgame,” the authors adjust every player's statistics by where he hits the ball. For instance, if a player hit five line drives to an area of the outfield where, historically, 80% of balls have been hits, the player would be credited with the equivalent of 4-for-5, regardless of the actual results.
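In code, the adjustment amounts to replacing each batted ball's actual result with the historical hit rate for where it landed. A minimal Python sketch of the five-line-drive example; the zone name and lookup table are hypothetical stand-ins, not the authors' actual data structure:

```python
def expected_hits(batted_ball_zones, hit_rate_by_zone):
    """Sum the historical hit rate of the zone each batted ball landed in."""
    return sum(hit_rate_by_zone[zone] for zone in batted_ball_zones)

# Hypothetical lookup table: zone name -> share of balls that fell for hits.
hit_rate_by_zone = {"shallow_right_center_liner": 0.80}

# Five line drives to the same 80% zone, as in the example above.
batted_balls = ["shallow_right_center_liner"] * 5

credited = expected_hits(batted_balls, hit_rate_by_zone)
print(f"credited with {credited:.1f} hits in {len(batted_balls)} at-bats")
```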

(Recently, ProTrade used the same idea to figure out which players and teams were having lucky years.)

The idea, I guess, is that this gives you a more realistic picture of the player’s performance than counting the actual results. After all, if two players on two different teams hit exactly the same ball, but one was caught by team A’s gold-glove right fielder, while the other was missed by team B’s slow-footed old guy, why not treat them equally? After all, it’s not B’s ability that got him the hit. He was just lucky to be facing an inferior defense that game.

But there’s another side to it. For some hits (line drives specifically), the difference between the hit areas and the out areas is quite small -- a few feet.

Suppose again that the two players hit the ball exactly the same way, but this time the two fielders were playing in slightly different positions. Under the “credit them by where the ball was hit” method, they both get treated the same. But what if one saw where the fielder was standing, and deliberately hit a line drive in front of him to land for a hit? In that case, they shouldn’t be treated the same, because, even though they did the same thing, player A did it because it was the better thing to do, while player B failed to do the better thing and hit the ball a bit differently.

And, of course, there are many cases of the defense repositioning itself for a pull hitter. For those hitters, the “go by where the ball was hit” method won’t work. With Barry Bonds batting, the defense shifts from where the average player is likely to hit the ball, to where Barry Bonds is more likely to hit the ball. Bonds hits lots of balls that would drop in for hits if, say, Randy Winn hit them, but are easy outs with the defense shifted. The trajectory method would be completely invalid for a player like Bonds, unless you had a separate set of data for a defense in the Bonds configuration.

Given all that, I’m thinking it might actually be less accurate to go by where the ball was hit, and you’re better off just recording whether it was a hit or not. The simpler way, sure, you’re going to credit players with hits and outs they didn’t deserve. But those come randomly, and there are ways to handle random errors. The location way, the results are biased for many players -- and that can ruin a lot of potential studies. (The JQAS study is not really affected because it doesn't depend on the numbers of any specific players.)

But I don't really know which is better. You could find out by running a study -- just see if players' trajectory-adjusted stats tend to predict future batting average better than batting average itself.

Tangotiger on leverage and reliever usage

From Tangotiger (AKA Tom M. Tango), a three-part article on leverage, which is the assessment of a situation’s impact on who wins the game. (Down 13-1 in the fifth has low leverage, but tied in the ninth with the bases loaded has very high leverage.)

Tom talks about four different ways to measure leverage and which one is best, and provides a chart of the leverage for every situation. The highest, if I read the chart correctly, occurs when down by a run, in the bottom of the ninth, with the bases loaded, two outs. That plate appearance has 10.9 times as much impact as average on who wins the game.

The idea, of course, is that you want to get a handle on these situations to help you save your best pitchers for situations with the highest leverage.

In Part 2, Tom suggests a change to my own “relative importance” number from a previous article in BTN (.pdf, see page 7). Tom is correct – his measure is better. But for the specifics of my BTN analysis, the two methods are almost identical.

Tangotiger’s own website is worth a look – there’s lots of other good stuff there, some of which I’ll get to here eventually.

Hockey game "entertainment value"

Remember Bill James’ “game score” for roughly rating a starting pitcher’s performance based on his pitching line? Here, hockeyanalysis.com’s David Johnson presents a similar intuitive formula for determining an NHL hockey game’s entertainment value. Basically, points are awarded for (a) close games, (b) hitting and fighting, and (c) goals and shots.

You can quibble with the methodology – one of the worst-ranking games is Atlanta 9, Carolina 0, which I think would be pretty entertaining if I were a Thrashers fan – but it seems reasonable overall.

It’s not really serious analysis, but I like it anyway. Besides, it’s a rare measure where my beloved Toronto Maple Leafs rank number one.

Are some managers as important as superstars?

At Baseball Think Factory, Chris Jaffe has an amazing two-part (part 1) (part 2) article on how many extra runs managers throughout history have produced for their teams.

Basically, Chris took the methodology that I used to rate “lucky” teams (powerpoint slides here), but allotted the luck to the manager.

By this system, there are five factors considered:

· Did the manager beat the team’s Pythagorean Projection?
· Did the manager beat his team’s Runs Created estimates?
· Did the manager’s opponents underperform their Runs Created estimates?
· Did his hitters have lots of “career years” where they played better than expected?
· Did his pitchers have lots of “career years” where they played better than expected?
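For reference, the first factor compares a team's actual record to its Pythagorean Projection. Here is a small Python sketch of that comparison using the classic exponent-2 version of the formula; the team totals in the example are invented, and I am not claiming this is exactly the version Chris or I used.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Classic Pythagorean estimate of winning percentage."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

def wins_above_pythagoras(actual_wins, games, runs_scored, runs_allowed):
    """Actual wins minus the wins the Pythagorean estimate projects."""
    return actual_wins - games * pythagorean_win_pct(runs_scored, runs_allowed)

# Hypothetical team: 800 runs scored, 700 allowed, 95 wins in 162 games.
extra = wins_above_pythagoras(95, 162, 800, 700)
print(f"wins above projection: {extra:.1f}")
```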

In my study, I suspected all five of these things to be just luck. But Chris found a huge manager effect – in fact, that some managers did so well in these five categories that their apparent influence was greater than that of a superstar player. And, when Chris split the managers up by number of games managed, he found that in every case, groups with more games outperformed groups with fewer games.

Furthermore, the managers we all acknowledge as the best are the ones that repeatedly come out on top.

The results are pretty amazing, and bring up lots of questions. For instance: How can a manager consistently beat Pythagoras? If a manager consistently beat Runs Created, doesn’t that mean his teams hit in the clutch? And doesn’t that contradict clutch hitting as random?

For career years, my algorithm was designed for 1960-2001. Chris used it back to the 19th century, and I suspect the farther back you go, the more it overestimates the effects of luck. So the results might be exaggerated a bit that way. But, still, I would have never suspected that some managers can somehow cause the other team to underperform in Runs Created.

There must be an explanation, but I don’t know what it is.

Once in 175 Years

On July 15, there were no saves recorded in a full slate of 15 games. Baseball Toaster’s Mike Carminati calculates the odds of that happening as about 1 in 26,000.

With a full slate of games happening, say, 150 times a year, this should be expected to happen about once in 175 years.

Yes, it did happen in 1978. But that was when there were only 13 games a day, not 15, and back when saves only happened in 40% of wins (rather than today’s 49%). Mike doesn’t give the odds for 1978, but for 1979, there was a 1 in 778 chance of a saveless 13-game day.
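A back-of-the-envelope version of this calculation is easy to sketch in Python. I am assuming every game independently has the same chance of producing a save, and I am plugging in the rounded 49% and 40% figures from this post, so it only lands in the same neighborhood as Mike's numbers rather than matching them exactly. Function names are mine.

```python
def saveless_day_probability(games, save_rate):
    """Chance that none of `games` games has a save, assuming independence."""
    return (1.0 - save_rate) ** games

def years_between(one_in, full_slates_per_year=150):
    """Expected years between events that happen once per `one_in` full slates."""
    return one_in / full_slates_per_year

p_today = saveless_day_probability(15, 0.49)
p_1978_style = saveless_day_probability(13, 0.40)

print(f"15 games, 49% save rate: about 1 in {1 / p_today:,.0f}")
print(f"13 games, 40% save rate: about 1 in {1 / p_1978_style:,.0f}")
print(f"1 in 26,000 at 150 slates a year: every {years_between(26000):.0f} years or so")
```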

Monday, July 24, 2006

Reporter misrepresents sabermetrics

From Michael Radano, of the South Jersey Courier Post, July 22:

Phillies manager Charlie Manuel sat at his desk and composed a lineup that tore out the hearts of sabermetrics experts everywhere.
Both Pat Burrell and David Bell were in Friday's lineup against Atlanta starter John Smoltz, with a combined history of 2-for-43 against the pitcher.

For “sabermetrics experts everywhere,” read “sabermetrics experts in Michael Radano’s mind.” Radano gets it absolutely backwards -- sabermetricians have long ridiculed those who consider samples as small as 43 AB to be meaningful.

This spring, the authors of The Book demonstrated that there is no evidence that any pitcher owns any batter, or vice-versa.

How Do Touches and Dribbles Relate to Scoring?

A basketball study at 82games.com (from Roland Beech?) investigates how scoring rate changes depending on how many players touch the ball per second (when the ball is at least within three point range). It turns out that the longer the ball is held, the lower the scoring rate:

Touches per second.....Points per 100 possessions
.01 to .14 ..................... 92.8
.15 to .24 ..................... 92.9
.25 to .34 .................... 105.6
.35 to .44 .................... 114.1
.45+ .......................... 122.5

This shows that teams should try to move the ball around faster. Or perhaps it just shows that the high-scoring opportunities that arise also happen to be the fastest (such as offensive rebounds). The study argues that the effect is a combination of both reasons.

There was no obvious scoring relationship found for number of touches, or number of dribbles.

New Issue: Journal of Quantitative Analysis in Sports

The Journal of Quantitative Analysis in Sports is an online academic journal of sports analysis. The quality of articles, in my opinion, varies, and most of them have a lot of technical statisticky stuff in them that makes parts of them tough sledding. But, in general, they’re very good work and well worth reading.

You ostensibly need a subscription, but the articles can be viewed for free if you give ~~your~~ an e-mail address.

The new issue has five articles:

“Which Ball is the Roundest? - A Suggested Tournament Stability Index” by Torbjörn Lundh. This is more recreational mathematics than sports analysis, discussing the mathematical properties of a particular way to measure competitive balance.

“A Variance Decomposition of Individual Offensive Baseball Performance,” by David Kaplan. This figures how much of a player’s performance comes from his own abilities, and how much of it comes from influences of the team he’s on.

“A Simple and Flexible Rating Method for Predicting Success in the NCAA Basketball Tournament,” by Brady T. West. The author comes up with a regression method to predict how many March Madness wins a team will get -- from zero (losing its first game) to seven (winning the championship).

“Who Controls the Plate? Isolating the Pitcher/Batter Subgame,” by Benjamin Alamar Ph.D., Jeff Ma, Gabriel M. Desjardins, and Lucas Ruprecht. How much of the result of a plate appearance (the “subgame”) is caused by the pitcher’s ability, and how much by the batter’s?

“A Review of ‘The Wages of Wins’” by Roland Beech. A fairly negative review of the recent book by three academic economists.

More details on most of these articles to come.

Which Catchers Are Best At Blocking Wild Pitches?

A recent study by Sean Forman of baseball-reference.com comes up with an excellent method of determining which catchers are best at blocking balls in the dirt, thus preventing wild pitches and passed balls (which he collectively calls “missed pitches” (MPs)).

Noting that a pitcher’s wildness affects his propensity for pitches in the dirt, Sean ran a couple of regressions on missed pitches versus walks, strikeouts, HBPs, and whether or not the pitcher throws a knuckleball. He found a strong linear relationship.

The regression equations thus give an estimate of the number of MPs a given pitcher would allow.

Then, if a pitcher’s stats suggest he should be giving up 30 missed pitches, but he only gives up 20, we can conclude that the catcher is responsible for saving the other ten. If, on the other hand, he’s expected to give up 30, but he gives up 40, we can conclude that the catcher is below average.
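The bookkeeping in that last step is simple enough to sketch in Python. The expected totals would come from Sean's regressions, which I am not reproducing here; the two (expected, actual) pairs below are just the examples from the paragraph above, and the function names are mine.

```python
def pitches_saved(expected_missed, actual_missed):
    """Positive means the catcher blocked more than an average catcher would."""
    return expected_missed - actual_missed

def catcher_total(pitcher_lines):
    """Add up pitches saved over every (expected, actual) pair a catcher caught."""
    return sum(pitches_saved(exp, act) for exp, act in pitcher_lines)

good_example = pitches_saved(30, 20)   # expected 30, gave up 20
bad_example = pitches_saved(30, 40)    # expected 30, gave up 40

print(good_example, bad_example)
print(catcher_total([(30, 20), (30, 40)]))
```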

A typical outstanding year in either direction would be about 20 pitches a season. 30 pitches puts you in the top 3 of the last 40 years.

Sean gave a talk on his method at the recent SABR convention in Seattle, and deservedly won the prize for best research presentation.

The full set of Sean’s slides is here. If you only care about the catchers and not the methods, single season bests (Brian Downing leads) are here. Worsts (Bob Uecker, in only 68 games!) here. And here are career bests and worsts.
