Tuesday, March 26, 2019

True talent levels for individual players

(Note: Technical post about practical methods to figure MLB distribution of player talent and regression to the mean.)

------

For a long time, we've been using the "Palmer/Tango" method to estimating the spread of talent among MLB teams. You're probably sick of seeing it, but I'll run it again real quick for 2013:

1. Find the SD of observed team winning percentage from the standings. In 2013, SD(observed) was 0.0754.

2. Calculate the theoretical SD of luck in a team-season. Statistical theory tells us the formula is the square root of p(1-p)/162, where p is the probability of winning. Assuming teams aren't that far from .500, SD(luck) works out to around 0.039.

3. Since luck is independent of talent, we can say that SD(observed)^2 = SD(luck)^2 + SD(talent)^2 . Substituting the numbers gives our estimate that SD(talent) = 0.0643. 

That works great for teams. But what about players? What's the spread of talent, in, say, on-base percentage, for individual hitters?

It would be great to use the same method, but there's a problem. Unlike team-seasons, where every team plays 162 games, every player bats a different number of times. Sure, we can calculate SD(luck) for each hitter individually, based on his playing time, but then how do we combine them all into one aggregate "SD(luck)" for step 3? 

Can we use the average number of plate appearances? I don't think that would work, actually, because the SD isn't linear. It's inversely proportional to the square root of PA, but even if we used the average of that, I still don't think it would work.

Another possibility is to consider only batters with close to some arbitrary number of plate appearances. For instance, we could just take players in the range 480-520 PA, and treat them as if they all had 500 PA. That would give a reasonable approximation.

But, that would only help us find talent for batters who make it to 500 PA. Those batters are generally the best in baseball, so the range we find will be much too narrow. Also, batters who do make it to 500 PA are probably somewhat lucky (if they started off 15-for-100, say, they probably wouldn't have been allowed to get to 500). That means our theoretical formula for binomial luck probably wouldn't hold for this sample.

So, what do we do?

I don't think there's an easy way to figure that out. Unless Tango already has a way ... maybe I've missed something and reinvented the wheel here, because after thinking about it for a while, I came up with a more complicated method. 

The thing is, we still need to have all hitters have the same number of PA. 

We take the batter with the lowest playing time, and use that. It might be 1 PA. In that case, for all the hitters who have more than 1 PA, we reduce them down to 1 PA. Now that they're all equal, we can go ahead and run the usual method. 

Well, actually, that's a bit of an exaggeration ... 1 PA doesn't work. It's too small, for reasons I'll explain later. But 20 PA does seem to work OK. So, we reduce all batters down to 20 PA.*  

*The only problem is, we'll only be finding the talent range for the subset of batters who are good (or lucky) enough to make it to 20 plate appearances. That should be reasonable enough for most practical purposes, though.  

How do we take a player with 600 PA, and reduce his batting line to 20 PA? We can't just scale down. Proportionally, there's much less randomness in large samples than small, so if we treated a player's 20 PA as an exact replica of his performance in 600 PA, we'd wind up with the "wrong" amount of luck compared to what the formulas expect, and we'd get the wrong answer.

So, what I did was: I took a random sample of 20 PA from every batter's batting line, sampling "without replacement" (which means not using the same plate appearance twice). 

Once that's done, and every hitter is down to 20 PA, we can just go ahead and use the standard method. Here it is for 2013:

1. There were 602 non-pitchers in the sample. The SD of the 602 observed batter OBP values (based on 20 PA per player) was 0.1067.

2. Those batters had an aggregate OBP of .2944. The theoretical SD(luck) in 20 PA with a .2944 expectation is 0.1019.

3. The square root of (0.1067 squared - 0.1019 squared) equals 0.0317 squared.

So, our estimate of SD(talent) = 0.0317. 

That implies that 95% of batters range between .247 and .373. Seems pretty reasonable.

-------

I think this method actually works quite decently. One issue, though, is that it includes a lot of randomness. All the regulars with 500 or 600 plate appearances ... we just randomly pick 20, and ignore the rest. The result is sensitive to which random numbers are pulled. 

How sensitive? To give you an idea, here are the results of 10 different random runs:

0.0317
0.0286
0.0340
0.0325
0.0464
0.0471
0.0257
0.0421
imaginary
0.0435

I should explain the "imaginary" one. That happens when, just by random chance, SD(observed) is smaller than the expected SD(luck). It's more frequent when the sample size is so small -- say, 20 PA -- that luck is much larger than talent. 

In our original run, SD(observed) was 0.0107 and SD(luck) was 0.0102.  Those are pretty close to each other. It doesn't take much random fluctuation to reverse their order ... in the "imaginary" run, the numbers were 0.01021 and 0.01022, respectively.

More generally, when SD(observed) and SD(luck) are so close, SD(talent) is very sensitive to small random changes in SD(observed). And so the estimates jump around a lot.

(And that's the reason I used the 20 PA minimum. With a sample size of 1 PA, there would be too much distortion from the lack of symmetry. I think. Still investigating.)

The obvious thing to do is just do a whole bunch of random runs, and take the average. That's doesn't quite work, though. One problem is that you can't average the imaginary numbers that sometimes come up. Another problem -- actually, the same problem -- is that the errors aren't symmetrical. A negative random error decreases the estimate more than a positive random error increases the estimate. 

To help get around that, I didn't average the 500 estimates in the list. Instead, I averaged the 500 values of SD(observed), and 500 estimates of SD(luck). Then, I calculated SD(talent) from those.

The result:

SD(talent) = 0.0356

Even with this method, I suspect the estimate is still a bit off. I'm thinking about ways to improve it. I still think it's decent enough, though.

--------

So, now we have our estimate that for 2013, SD(talent)=0.0356. 

The next step: estimating a batter's true talent based on his observed OBP.

We know, from Tango, that we can estimate any player's talent by regressing to the mean -- specifically, "diluting" his batting line by adding a certain number of PA of average performance. 

How many PA do we need to add? As Tango showed, it's the number that makes SD(luck) equal to SD(talent). 

In the 500 simulations, SD(luck) averaged 0.1023 in 20 PA. To get luck down to 0.0356, where it would equal SD(talent), we'd need 166 PA. (That's 20 multiplied by the square of (0.1023 / 0.0356)). I'll just repeat that for reference:

Regress by 166 PA

A value of 166 PA seems reasonable. To check, I ran every season from 1950 to 2016, and 166 was right in line. 

The average of the 57 seasons was 183 PA. The highest was 244 PA (1981); the lowest was 108 PA (1993).  

--------

Now we know we need to add 166 PA of average performance to a batting line to go from observed performance to estimated talent. But what, exactly, is "average performance"?

There are at least four different possibilities:

1. Regress to the observed real-life OBP. In MLB in 2013, for non-pitchers with at least 20 PA, that was .3186. 

2. Regress to the observed real-life OBP weighting every batter equally. That works out to .2984. (It's smaller than the actual MLB number because, in real life, worse hitters get fewer-than-equal PA.)

3. Regress to the average *talent*, weighted by real-life PA.

4. Regress to the average *talent*, weighting every batter equally.

Which one is correct? I had never actually thought about the question before. That's because I had only every used this method on team talent, and, for teams, all four averages are .500.  Here, they're all different. 

I won't try to explain why, but I think the correct answer is number 4. We want to regress to the average talent of the players in the sample.

Except ... now we have a Catch-22. 

To regress performance to the mean, we need to know the league's average talent. But to know the league's average talent, we need to regress performance to the mean!

What's the way out of this? It took me a while, but I think I have a solution.

The Tango method has an implicit assumption that -- while some players may have been lucky in 2013, and some unlucky -- overall, luck evened out. Which means, the observed OBP in MLB in 2013 is exactly equal to the expected OBP based on player talent.

Since the actual OBP was .3186, it must be that the expected OBP, based on player talent, is also .3186. That is: if we regress every player towards X by 166 PA, the overall league OBP has to stay .3186. 

What value of X makes that happen?

I don't think there can be an easy formula for X, because it depends on the distribution of playing time -- most importantly, how much more playing time the good hitters got that year compared to the bad hitters.

So I had to figure it out by trial and error. The answer:

Mean of player talent = .30995


(If you want to check that yourself, just regress every player's OBP while keeping PA constant, and verify that the overall average (weighted by PA) remains the same. Here's the SQL I used for that:

SELECT 
sum(H+bb)/sum(ab+bb) AS actual, 
sum((h+bb+.30995*166)/(ab+bb+166)*(ab+bb)) / sum(ab+bb) AS regressed 
FROM batting
WHERE yearid=2013 and ab+bb>=20 and primarypos <> "P"
The idea is that "actual" and "regressed" should come out equal.

The "primarypos" column is one I created and populated myself, but the rest should work right from the Lahman database. You can leave out the "primarypos" and just use all hitters with 20+ PA. You'll probably find that it'll be something lower than .30995 that makes it work, since including pitchers brings down the average talent.  Also, with a different population of talent, the correct number of PA to regress should be something other than 166 -- probably a little lower? -- but 166 is probably close.

While I'm here ... I should have said earlier that I used only walks, AB, and hits in my definition of OBP, all through this post.)

--------

So, a summary of the method:

1. For each player, take a random 20 PA subset of his batting line. Figure SD(observed) and SD(luck).

2. Repeat the above enough times to get a large sample size, and average out to get a stable estimate of SD(observed) and SD(luck).

3. Use the Tango method to calculate SD(talent).

4. Use the Tango method to calculate how many PA to regress to the mean to estimate player talent.

5. Figure what mean to regress to by trial and error, to get the playing-time-weighted average talent equal to the actual league OBP.

----------

If I did that right, it should work for any stat, not just OBP. Eventually I'll run it for wOBA, and RC27, and BABIP, and whatever else comes to mind. 

As always, let me know if I've got any of this wrong.







Labels: , , , ,