Wednesday, May 29, 2013

The OBP/SLG regression puzzle -- Part III

I ran regressions in the previous posts to predict winning percentage from on-base percentage and slugging.  In those regressions, I had adjusted all teams to the league SLG and OBP.  I had to.  If you don't adjust, the results vary a lot.

Here's the regression completely unadjusted.  (It's all teams from 1961 to 2009, except strike seasons.)  Here's the equation.  (I'll put the OBP/SLG coefficient ratio in brackets too.)

wpct = (2.19*OBP) + (0.07*SLG) - .2405  [ratio: 30]

That's an OBP/SLG ratio of over 30!  We were expecting 1.7.  It seems like slugging barely matters at all!

Compare that to the "regular" regression, which adjusts for league-season:

wpct = (2.70*OBP) + (0.89*SLG) - .7843   [ratio: 3]

OK, that's a bit better.  The ratio is down to 3.

Guy argued, in the comments to the first post, that I need to adjust for park, too.  He's right.

If I change winning percentage to what it would be if the team had posted those stats in a neutral park -- while still keeping the league adjustment -- I get this equation:

wpct = (2.65*OBP) + (1.09*SLG) - .8504  [ratio: 2.43]

Even better: we're down to 2.43!

An easier way might be just to not adjust anything, but include the league and park in the regression:

wpct = (2.63*OBP) + (1.15*SLG) - (2.58*league OBP) - (1.16*league SLG) - (0.0029*BPF) + 0.091  [ratio: 2.3]

Now, the ratio is down a bit further, to 2.3 (that's 2.63 divided by 1.15).

What's going on?

This one's pretty simple.  When a team has a high OBP or SLG, it's a combination of two things:

-- batting talent, and
-- a high run environment for the league and park.

The first one actually has an impact on winning percentage.  The second one doesn't.  A high SLG doesn't help you if it's caused by the park, because the opposition benefits from it too.

The same is true for OBP.  But ... SLG should be affected *more*.  Park and era effects are bigger for power than for getting on base: there are more extreme high-HR and low-HR parks than there are, say, extreme low-walk parks, and the "steroids era" was mostly home-run related.

Comparing 1968 to 2001:

1968: OBP .299, SLG .340
2001: OBP .332, SLG .427

OBP increased 11 percent, but SLG increased 26 percent.

So, when you don't adjust, part of every team's SLG is just environment, and that part doesn't help it win, because it benefits the opposition too.  Since SLG varies with the environment more than OBP does, the regression discounts SLG more heavily.  That makes OBP look more important, relatively speaking.
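To see the mechanism in isolation, here's a toy simulation: a "run environment" inflates SLG three times as much as OBP (roughly the 1968-vs-2001 pattern) but contributes nothing to winning, and the true relationship uses the 1.7 weighting.  All the constants are made up for illustration, not estimated from real data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Made-up numbers: "env" is the league/park run environment.  It
# inflates SLG about three times as much as OBP, but contributes
# nothing to winning, because the opposition gets the same boost.
env = rng.normal(0, 1, n)
obp_talent = rng.normal(0, 0.010, n)
slg_talent = rng.normal(0, 0.017, n)
obp = 0.333 + 0.010 * env + obp_talent
slg = 0.430 + 0.030 * env + slg_talent

# True relationship: only talent matters, with the "known" 1.7 weighting
wpct = 0.500 + 1.7 * obp_talent + 1.0 * slg_talent + rng.normal(0, 0.010, n)

# Unadjusted regression: wpct on raw OBP and SLG
X1 = np.column_stack([obp, slg, np.ones(n)])
b1 = np.linalg.lstsq(X1, wpct, rcond=None)[0]
print("unadjusted ratio:", b1[0] / b1[1])   # way above 1.7

# Include the environment as a covariate, like the last regression above
X2 = np.column_stack([obp, slg, env, np.ones(n)])
b2 = np.linalg.lstsq(X2, wpct, rcond=None)[0]
print("adjusted ratio:  ", b2[0] / b2[1])   # back near 1.7
```

The unadjusted ratio comes out wildly inflated, and adding the environment covariate pulls it back to about 1.7, which is the pattern in the regressions above.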

(All credit for this finding goes to Guy ... he actually explained all this to me in his comment.)

-----

As you'd expect, the problem goes away when you combine team offense with opposition offense in the same regression.  Even without adjusting, you don't have a big problem, because both teams are affected the same way.

I used the *differences* between team OBP/SLG and opposition OBP/SLG, without any adjustment, and got

wpct = (2.09*OBP) + (0.897*SLG) + .5  [ratio: 2.33]

That's a ratio of 2.33.

-----

But why do we need to care about the opposition at all?  Commenter Alex suggested that if we try to predict "runs per game" instead of "winning percentage," we'll get even better results, because the opposition won't matter.

I'm checking that out for a future post.

------

Update: that future post, part IV, is here.


Monday, May 27, 2013

The OBP/SLG regression puzzle -- Part II

A point of OBP seems to be worth 2+ times as much as a point of SLG, when you run a regression on team performance.  But, when you look at the marginal effects on an average team, you get 1.7.  Why the difference?

Last post, I suggested two reasons:

1.  Different rates of increasing returns on OBP and SLG;

2.  Different values of OBP and SLG depending on ratio of singles/walks/power.

My argument then was mostly about #1.  But, now, I realize the answer is probably almost all #2.  The evidence was there all along.

When Tango did the analysis that got him the 1.7 factor, he showed the OBP and SLG run equivalents for the various events.  Here they are:

1B:  actual 0.474,  estimate 0.485
2B:  actual 0.764,  estimate 0.786
3B:  actual 1.063,  estimate 1.087
HR:  actual 1.409,  estimate 1.389
BB:  actual 0.336,  estimate 0.313
out:  actual -.302,  estimate -.286

("Actual" refers to the known, accepted values from other methods; "Estimate" refers to the approximation from the OBP/SLG method.)

The values are reasonably close, but not exact.  The differences are:

1B: -0.011
2B: -0.022
3B: -0.025
HR: +0.020
BB: +0.023
out: -0.016

The discrepancies are actually pretty large.  Why?  And, why are there discrepancies at all?

Because: there just isn't a way to get a linear OBP/SLG relationship to be as accurate as one that looks at the underlying events.

It's like ... suppose I create two stats for money. "Bigness" (BGN) reflects whether the bill is $50 or higher.  "One-ness" (ONE) reflects whether the value contains a "1" ($1, $10, $100).  When I use those instead of the real values, I'm obviously losing accuracy, because I'm eliminating valuable information.  (For one thing, a $1 and a $10 look exactly the same to those two stats.)

You can get equal points of BGN and ONE in different ways.  For example, "$1 and $50" has the same effect on BGN/ONE as "$100 and $5."  If you increase your stats with $100s and $5s, you're going to have more money than your BGN+ONE suggests.  If you do it with $1s and $50s, you're going to have less than your BGN+ONE suggests.
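The analogy is easy to check directly; BGN and ONE here are the made-up stats from the paragraph above:

```python
# Made-up "money stats" from the analogy: BGN counts bills of $50 or
# more; ONE counts bills whose value contains the digit 1.
def bgn_one(bills):
    bgn = sum(1 for b in bills if b >= 50)
    one = sum(1 for b in bills if "1" in str(b))
    return bgn, one

a = [1, 50]    # $51 total
b = [100, 5]   # $105 total
print(bgn_one(a), sum(a))   # (1, 1) 51
print(bgn_one(b), sum(b))   # (1, 1) 105
```

Identical BGN/ONE, very different totals: the two stats throw away the information that distinguishes the wallets.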

It's the same for OBP and SLG.  You can get a high OBP and SLG in two separate ways: walks, or hits.  If you do it with walks, you're going to score more runs than your OBP/SLG suggests.  If you do it with hits, you're going to score fewer runs (unless you hit more home runs than any other type of hit, which is unlikely).

Now, that wouldn't be a problem, if all teams had the same walk/hit tendencies.  In that case, the regression would just smooth everything out, and the discrepancies would all cancel.

Suppose it's true that teams don't have different tendencies.  Then, the 1.7 holds everywhere, and a team's success is linear on (OBP*1.7 + SLG).  That means if you have two teams, one of which is the league average .333 and .430 (for a 1.7-weighted total of .996), but the other team is (say) .350 and .401 (for the same .996), you'd expect the two teams to have the same BB/hit ratio.

That doesn't seem right, does it?  It seems like the .350/.401 team should be hitting more singles and fewer home runs.  I mean, not necessarily ... I guess you could come up with a scenario where it hits .200 with lots of walks and power.  But that seems unlikely.  It seems like the higher the contribution of slugging percentage, the more hits relative to walks.

And, yes, that's how it is.  I ran a regression to predict the walk ratio (BB/(BB+H)) from 1.7-weighted OPS (that is, 1.7*OBP + SLG) and SLG.  The results:

Walk ratio = (0.9295*weighted OPS) - (1.5234*SLG) - 0.267

See?  The higher the SLG, the fewer the walks.  Therefore, high-SLG teams will underperform their regression estimate, and high-OBP teams will overperform.

And that's why, when you look at all teams, the regression "notices" that OBP teams are underestimated relative to SLG teams.  And so, it moves the OBP coefficient higher, and the SLG coefficient lower.

And that's the answer to why the ratio is higher than the 1.7 we'd otherwise expect.

------

Part III is here.


Thursday, May 23, 2013

The OBP/SLG regression puzzle

In the second "puzzle" last post, I noticed that, when you run a regression to predict winning percentage from on-base percentage and slugging percentage, you get that a point of OBP is worth between two and three times as much as a point of SLG.  That's different from the consensus value of 1.7 (which Tango derives here).  Why the difference?

When I wrote that post, I thought I knew the answer ... but I actually didn't.  So I spent the last few days trying to figure it out.  I jumped around among a whole bunch of possibilities, and hit a lot of dead ends ... but I think I finally got somewhere.  Here's the current state of my thinking.

As usual, I could be wrong.  I was wrong a few times in the course of working on all this ...

------

I think the issue is one of non-linearity.  The regression assumes that runs/wins are linear in OBP and SLG, but they might not be.  In fact, in a bit, I'll show they're not.

Why does non-linearity matter?  Because, the "1.7" comes from adding events to an average team, and looking at the marginal impact.  If there's linearity, then we know that impact must be the same for all teams.  But, if there's not, that doesn't necessarily work.

To see why: consider a relationship that's actually cubic:

0, 1, 8, 27, 64, 125, 216

If you do a linear regression to predict x-cubed from x, for those seven values (x = 0 through 6), you get

y = 34x - 39

The regression says that if you increase X by 1, Y increases by 34.

But ... that's not true for the *average* value of X.  The average value is 3.  The difference between 3.5 cubed and 2.5 cubed isn't 34 -- it's 27.25.  (To be more precise, we can take the first derivative of x-cubed at 3, and get 27.)
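You can verify both numbers in a couple of lines:

```python
import numpy as np

# Least-squares fit of y = x^3 on x, for x = 0 through 6
x = np.arange(7)
y = x ** 3
slope, intercept = np.polyfit(x, y, 1)
print(round(slope, 6), round(intercept, 6))   # slope ~34, intercept ~-39

# Marginal effect at the average x (= 3): derivative of x^3 is 3x^2
print(3 * 3 ** 2)   # 27
```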

So the regression coefficient is higher than the marginal effect at the average.

Sometimes the coefficient at the average will be "too high", and sometimes, like now, it will be "too low".  I'm not completely sure of the exact conditions for each.  The point is, though, that if you have non-linearity, the two coefficients will probably be different.

And that means if you have two variables, the ratio will be different.  Suppose you have Y = a^3 + b.  The regression will give you coefficients of 34 and 1.  But the values at the average will be 27 and 1.  So the ratio is 34 overall, but 27 at the average.

That might be what's happening here.  OBP and SLG are non-linear in separate ways, and that's what makes the ratio 2.3 overall, but 1.7 at the average.

-------

OBP and SLG are indeed non-linear in a certain way.  To see it the way I'm going to show you, you don't need any baseball knowledge.  I'm going to show you that they're non-linear not in terms of *runs*, but in terms of *raw events*.

Suppose you have a batting line, and you want to add walks to raise the OBP by one point (.001).  How many walks do you have to add?  Well, it depends on your original OBP.

Suppose you're at .333 -- you have 333 "OBs" (walks or hits) in 1000 PA.  How do you get to .334?  You can't just add one walk, because that only brings you to .333666 (334/1001).  It turns out you have to add approximately 1.5015 walks.  That brings you to 334.5015 OBs in 1001.5015 PA, which brings you to .334.

But, now, suppose you're at .400, and you want to get to .401.  How many walks do you have to add now?  This time, it's 1.66945.  401.66945 divided by 1001.66945 equals .401.

(I did a little algebra to get the formula: for every 1,000 PA you start with, the number of additional walks you need is 1 divided by (.999 minus OBP).  That's where those two numbers came from.)
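As a sanity check, here's that formula as a small function -- the derivation is just solving (PA*OBP + w)/(PA + w) = OBP + .001 for w -- and it reproduces the 1.5015 and 1.66945 above:

```python
# Walks needed to raise OBP by one point (.001), starting from `pa`
# plate appearances: solve (pa*obp + w) / (pa + w) = obp + .001 for w.
def walks_needed(obp, pa=1000):
    return pa * 0.001 / (0.999 - obp)

print(walks_needed(0.333))   # ~1.5015
print(walks_needed(0.400))   # ~1.66945
```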

That is: points of OBP give you increasing returns *in terms of number of walks*.  The higher your OBP already is, the more walks each additional point represents.  Or, if you want to put it another way, the higher your OBP, the harder it is to gain another point, because you need more walks to get it.

Again, this is not a baseball observation.  The same thing applies to, say, games of gin rummy.  If you're at .333 and you want to get to .334, you only need to win your next 1.5015 games.  If you're at .400 and you want to get to .401, though, you have to win your next 1.66945 games.

-------

So: as we saw, a point of OBP offers a higher return when OBP is already high.  That, by itself, is enough to make the regression coefficient of OBP different from the marginal value *for an average team*, which is where the 1.7 came from.

But ... what about SLG?  If SLG also offers increasing returns, its coefficient will vary, too.  If it varies the same way, we should still get 1.7!

Yes, indeed.  But, who knows if SLG *does* have increasing returns?  And even if it does, who knows whether they're exactly equal to the increasing returns of OBP?  That would be quite a coincidence, wouldn't it?

Since we have no reason to expect OBP and SLG to offer the exact same distortion caused by increasing returns ... we have no reason to expect the ratio OBP/SLG to be exactly 1.7.

This doesn't explain why it's at the level it's at -- "slightly higher than 1.7," we could call it.  From the logic we've seen so far, it could be anything: lower than 1.7, higher than 1.7, much different, a little different, whatever.

But: that's why, in theory, it won't be exactly 1.7.  If that's all you're looking for, an explanation of why it *could* be different, there it is.  I'm going to keep going, but it gets boring and technical and long for the next bit.

------

OK, so we talked about adding a point of OBP by adding walks.  Now, let's talk about adding a point of SLG by adding an extra base.

Adding extra bases doesn't change the denominator of SLG (which is at-bats).  So, if you want to add one point of SLG where there's 1,000 AB, you can just add one extra base.  Change a double to a triple, or something.

But: the denominator, the number of AB, is not the same for every team.  The more AB you have, the more valuable a point of SLG.  At 1,000 AB, you need only 1 extra base.  At 1,020 AB, you need 1.02 extra bases, which is 2% more valuable.

AB is hits plus outs.  In our regression, every team has roughly the same number of outs (since we used full seasons only), so the only difference is hits.  So, the more hits a team has, the more valuable a point of SLG from extra bases.  And hits correlate highly with OBP.

So: the higher a team's OBP, the more valuable a point of SLG.  But ... it's a weak increase, compared to the other cases.  I'm almost willing to call this one linear.

------

What about adding a point of SLG by adding a single?  That's different, because a single affects both SLG and OBP.  So, we need to do this in two steps: we add enough singles to raise SLG by a point, and then subtract enough walks to lower OBP back to where it was before.

How many singles do we have to add to raise SLG a point?  That's the same formula as for how many walks we had to add to OBP.  For 1,000 AB, it's

1 divided by (.999 minus SLG)

That increases OBP by that many "events", so we subtract that exact number of walks, and OBP is back to where it was before.  (Effectively, we've just converted walks to singles at the exact rate that OBP stays the same, but SLG goes up a point.)

The increase in runs is, therefore,

[1 / (.999 - SLG)] * [value of single - value of walk]

We're assuming singles and walks have constant values -- .47 and .34, say -- so we get that adding one point of SLG adds

+.13 * [1 / (.999 - SLG)] runs.

That's a higher number when SLG is higher, so we see that a point of SLG also has increasing returns.  (I'm not going to try to figure out by how much.)

------

The last case is adding a point of OBP by singles (and leaving SLG alone).

How many singles do we need to add?  Same formula:

[1 / (.999 - OBP)]

But, that will also increase SLG, so we have to subtract enough "extra bases" from SLG to bring it back to where it was before.

Adding hits increases total bases by the same number as it increases AB.  But, to keep slugging the same when adding AB, we need to increase total bases by only SLG times the number of new AB.  So, we need to subtract (1-SLG) total bases for each single added.

That is, in total, we need to subtract

[1 / (.999 - OBP)] * (1-SLG) bases.

Combining the two steps gives a batting line change of

[1 / (.999 - OBP)] cases of "add one single, and subtract (1-SLG) bases".

Assigning run values here -- say, .47 for a single, and .26 for a base -- gives a run increase of

[1 / (.999 - OBP)] * (.47 - .26 * (1-SLG))

That gives increasing returns in OBP, and also increasing returns in SLG.  Again, I'm not going to try and quantify which is bigger.

------

Those are the only four cases I see of how to increase one of OBP and SLG at the margin.  (For extra-base hits, you just add the two cases -- add singles, and then add extra bases.  The math works out the same.)

That means, in terms of increasing returns, we have:

Increase SLG by bases -- roughly linear
Increase SLG by hits  -- increasing in SLG
Increase OBP by walks -- increasing in OBP
Increase OBP by hits  -- increasing in OBP and SLG
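Here's a rough sketch of the four cases, using the post's illustrative run values (.47 for a single, .34 for a walk, .26 for an extra base) and evaluating each marginal formula at a low-offense and a high-offense team.  The specific team lines (.300/.380 and .360/.480) are made up; only the direction of the comparisons matters:

```python
# Marginal run value of one point (.001) of the stat, per 1,000
# opportunities.  k(x) = events needed to move a rate one point
# when the rate is already x.
def k(rate):
    return 1 / (0.999 - rate)

def slg_by_bases(ab):                  # add extra bases only
    return (ab / 1000) * 0.26

def slg_by_singles(slg):               # swap walks for singles
    return k(slg) * (0.47 - 0.34)

def obp_by_walks(obp):                 # add walks only
    return k(obp) * 0.34

def obp_by_singles(obp, slg):          # add singles, strip extra bases
    return k(obp) * (0.47 - 0.26 * (1 - slg))

low, high = (0.300, 0.380), (0.360, 0.480)   # (OBP, SLG), made up
print("SLG by bases  :", slg_by_bases(1000), "->", slg_by_bases(1020))
print("SLG by singles:", slg_by_singles(low[1]), "->", slg_by_singles(high[1]))
print("OBP by walks  :", obp_by_walks(low[0]), "->", obp_by_walks(high[0]))
print("OBP by singles:", obp_by_singles(*low), "->", obp_by_singles(*high))
```

The bases case barely moves (1,020 AB is only 2 percent more than 1,000), while the other three all grow with the team's OBP or SLG, matching the table above.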

So, some ways are increasing in OBP, and some in SLG, and ... it looks like OBP and SLG are represented roughly equally.  It looks like we should expect a ratio that's not too far from 1.7.  It might not be *exactly* 1.7, but our gut says it should be not too different.  Which is about right -- it's in the 2s.

This is all theory.  Is there any evidence we can look at?

Well, it looks like teams with lots of walks should be different from teams with lots of hits.  The walking teams should see lots of increasing returns in OBP, and so a higher ratio.  And the hitting teams should see lots of increasing returns in SLG, and so a lower ratio.

So, I repeated the regression, but included only teams who were at least two percentage points higher than normal in their BB/H ratio.  This is the regression for those teams:

wpct = 2.69 OBP + 1.05 SLG - .845 (ratio: 2.5)

And for teams who walked two percentage points *less* than normal:

wpct = 1.62 OBP + 1.03 SLG - .491 (ratio: 1.6)

So, that seems to support the theory!  More walks = higher ratio, as hypothesized.

The results are similar if I use other point thresholds for higher/lower than average:

0 points: 2.6 low, 3.7 high
1 point : 2.0 low, 4.6 high
2 points: 1.6 low, 2.5 high
3 points: 0.8 low, 1.5 high
4 points: 7.1 low, 3.6 high

(The theory seems to fail in the extreme case ... but it's probably sample size.  If you up the SLG coefficient by 2 SDs, the ratio drops from 7.1 all the way to 1.6.)

Overall, I'd say, the test seems to support the theory.

-----

OK, now the bad news.  I don't think this is the real answer.  Yes, I think it's all correct, but I suspect the effect is much too small to account for such a big difference, from 1.7 to 2.3.

Also, this occurred to me, another explanation that seems bigger:

Walks get lumped in with singles in OBP.  Extra bases get lumped in with singles in SLG.  Which is worth more: a single, or the exact number of walks and extra bases that have the same impact on OBP and SLG?  Whichever is worth more, if the good teams get more of that one relative to the other, that will show up in a higher coefficient.  If the good teams get fewer of that one, the coefficient would be lower.

This last explanation seems to me like the effect would be bigger.  Further research required, I guess.

-----

Part II is here.


Thursday, May 16, 2013

Two regression puzzles

Here's a couple of interesting sports regression problems I ran into in the past week, if you're into that kind of thing.  What struck me about them is how simple the actual regressions are, but how hard you have to think to figure out what they really mean.

----

The first one comes from Brian Burke.

Brian ran a regression to link an NFL quarterback's performance to his salary.  He got a decent relationship, with a correlation of .46.  Based on that regression, it looked like Aaron Rodgers should be worth around $25 million a year.

So far so good.

Then, Brian ran exactly the same regression, but switched the X and Y axes.  He got the same correlation, of course.  And the points on the graph were exactly the same, just sideways.  But, this time, it looked like Rodgers should be worth only about $11 million!

How is that possible?

Here's the post where Brian lays out both arguments -- along with pictures -- and asks which is right.  It took me a couple of hours of pondering, but I think I figured it out.

My answer is in the comments to Brian's post.  I think it's correct, but I'm not completely sure ... and I don't think I even convinced Brian.

----

The second one you can understand, probably, without pictures.  I'll elaborate in the next post, but I'll just lay it out for now.

It's an established result, in baseball analysis, that a point of on-base percentage is worth about 1.7 times as much as a point of slugging percentage.  (Here's a discussion at Tango's old blog; you can probably Google and find others.)

But ... if you do a regression, that's not what you get.

I ran a regression to predict team winning percentage from OBP and SLG, using seasons from 1960 to 2011.  My equation was:

wpct = (2.52 OBP) + (0.71 SLG) - 0.62

By this regression, it looks like a point of OBP is worth 3.5 times a point of SLG -- almost twice as much as the true value of 1.7.  Also, the 2.52 and the 0.71 aren't right either, individually.

It's not just random error ... even if you move the two coefficients together by 2 standard errors each, the ratio still won't reach 1.7.  Also, if you break this down into subsets, you get roughly similar results for each (as long as you keep enough seasons to reduce the randomness enough).

What's going on?

It took me a while -- again -- but I think I figured this one out too.  I'll explain in the next post.

UPDATE, Friday 5/17:  Upon further reflection, I *haven't* figured out the second one yet.  But I'm working on it!


Monday, May 06, 2013

How extreme are simulation game results?

Last post, I tried to figure out the theoretical breakdown of luck in team records.  In standard errors, I got:

31.9 runs from career years batting
31.9 runs from career years pitching
23.9 runs from event clustering
23.9 runs from opposing team's event clustering
39.1 runs from Pythagoras

A couple of posts before that, I had done the same thing, but for my "luck" study.  There, the "career year" luck estimates were higher.  (The clustering and Pythagoras estimates were roughly the same, but that's because I used pretty much the same method).

42.7 runs from career years batting
48.5 runs from career years pitching
23.8 runs from event clustering
25.8 runs from opposing team's event clustering
39.1 runs from Pythagoras

There are good reasons for my "career year" estimates being farther off -- mostly, because I had to interpret changes in talent (aging, injuries, learning to hit a breaking ball) as luck.  (There are also selective sampling issues.)

Anyway, part of the reason I did all this was because of a comment from Ted Turocy in my post on simulation games:

"Do you also assume that the player "cards" have been suitably regressed to the mean? If not -- which is the case with all standard season sets/disks -- then the simulation will tend to have more extreme totals on the player leaderboards."

I hadn't thought of that.  Ted is right ... most simulations don't regress to the mean.  If Mark McGwire hit 70 home runs, the game will be calibrated so that McGwire's expectation is 70.  Which means, around half the time, he'll actually hit *more* than 70.  In fact, the binomial SD for McGwire's 1998 is around 8 homers, so that, in a good proportion of simulated seasons, he might hit 80 or more!
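The binomial arithmetic behind that "around 8 homers," using McGwire's well-known 1998 totals (70 HR in 509 AB) and treating each at-bat as an independent trial:

```python
import math

# McGwire 1998: 70 HR in 509 AB, rough binomial model
ab, hr = 509, 70
p = hr / ab
sd = math.sqrt(ab * p * (1 - p))
print(sd)   # about 7.8 homers
```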

The *team* totals will be more extreme too, then, and so will the team standings.  So I wondered: how much more extreme?  I seem to recall, some time ago, APBA putting out a promotional flyer with the results of a full simulated season (sent in by a customer), showing how similar it was to the actual standings.  You'd think, though, that without regressing to the mean, you'd see too much of a spread.

Well, now we can figure it out.

From 1973 to 2001 (omitting strike seasons), the SD of team records (normalized to 162 games) was almost exactly 11 wins.  The theoretical SD of luck is 6.36 wins, which means the SD from talent is almost exactly 9 wins.  (9 squared plus 6.36 squared equals 11 squared.)

So, that's 90 runs (at the usual 10 runs per win).

But, now, the simulation is increasing that, by taking the "career years" luck, and making it part of the player's "talent" (which is what the card represents).  For the team as a whole, it works out to 31.9 runs pitching, and 31.9 runs hitting.

So the SD of talent is now 101 runs -- the square root of (90 squared plus 31.9 squared plus 31.9 squared).

Which means the SD of observed W-L records is now 119 runs, or 11.9 wins -- the square root of (101 squared + 63.6 squared).

11.0 wins -- real life
11.9 wins -- APBA

Not much difference -- around a single win.
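The chain of square roots above is easy to reproduce (everything in runs, 10 runs per win):

```python
import math

def combine(*sds):
    # independent sources of variance add in quadrature
    return math.sqrt(sum(s * s for s in sds))

# Talent SD implied by observed 11.0 wins and 6.36 wins of luck
talent = math.sqrt(11.0 ** 2 - 6.36 ** 2) * 10
print(talent)                              # ~90 runs

# The simulation folds "career year" luck into talent
sim_talent = combine(talent, 31.9, 31.9)   # ~101 runs
print(sim_talent)

# Observed spread of simulated records
observed = combine(sim_talent, 63.6)       # ~119 runs = ~11.9 wins
print(observed / 10)
```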

So, if you figure the most extreme team in a year is maybe 2 SD from the mean ... the top and bottom teams should be two wins more extreme than real life.  (They won't necessarily be the *same* teams.)

That's not a big deal ... you probably wouldn't even notice.  Real life often bumps the extremes well beyond that.

The real-life SD of wins is 11.0, but it varies quite a bit by season.   Here are the individual season SDs:

10.3,  9.9, 11.7, 11.8, 14.4, 12.3, 12.9, 11.6, 10.5, 9.8, 8.9, 12.6, 10.3, 9.8, 12.0, 10.0, 9.1, 9.7, 10.2, 12.2, 10.0, 9.6, 13.5, 12.5, 10.0, 13.0, 14.8, 13.4, 13.5, 10.8, 10.1, 9.3, 11.1, 11.4, 11.0, 11.4

11.9 fits right in.  The extremes go well beyond it.  In 1984, the SD of wins was less than 9.  In 2002, it was 14.75.  I've never seen anyone mention either one -- that there was so much parity in 1984, or so little in 2002.  It seems that we don't even notice *big* changes.

The overall SD of the SDs was more than 1.5 ... so a change from 11.0 to 11.9 is only 0.6 SDs from the previous mean.  For statistical significance, you'd need to more than triple that ... which means you'd need roughly ten times as many seasons.  It would take 360 years of Major League Baseball to get a 50 percent chance of statistical significance for a single unregressed APBA season.

It's a smaller effect than I had been thinking it would be.

------

In any case, that just applies to team records.  You'll be able to notice the spread more easily in individual results.

For a season of 600 AB, the binomial standard deviation of batting average is 18.7 points.

In 2012, Buster Posey led the majors in batting average, at .336.  In a simulation, he'd have a 50% chance of beating that.

Miguel Cabrera, who hit .330, would probably have a 40% chance.  Andrew McCutchen, half an SD behind, would have a 30% chance.  Mike Trout, also around 30%.  Adrian Beltre (.321), 20%.  So far, that adds up to 170%, or an average of 1.7 simulated hitters beating .336.  And that's only after looking at five players!

So, it's virtually assured that the simulated leader will outhit the actual leader.  That's probably true in any of the major statistics where players get similar numbers of opportunities.
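Here's the arithmetic behind those percentages, using a normal approximation with the 18.7-point SD.  Posey's, Cabrera's, and Beltre's averages are from the discussion above; the McCutchen and Trout figures (.327 and .326) are filled in from memory, so treat them as approximate:

```python
import math

def normal_sf(z):
    # upper-tail probability of the standard normal
    return 0.5 * math.erfc(z / math.sqrt(2))

sd = 0.0187       # ~18.7 points of batting average over 600 AB
target = 0.336    # the actual 2012 league-leading average

# hypothetical true talents set equal to the actual 2012 averages
hitters = {"Posey": 0.336, "Cabrera": 0.330, "McCutchen": 0.327,
           "Trout": 0.326, "Beltre": 0.321}

expected = 0.0
for name, avg in hitters.items():
    prob = normal_sf((target - avg) / sd)
    expected += prob
    print(f"{name}: {prob:.0%} chance of beating .336")
print(f"expected number beating .336: {expected:.2f}")
```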

------

Bottom line: in an APBA season, you'll notice players are more extreme than real life -- but you probably won't notice that teams are.