Comments on Sabermetric Research: True talent levels for individual players

Came across your blog via Bill James Online and Ta...

2019-06-21T13:06:44.025-04:00

Came across your blog via Bill James Online and Tango's stuff...

What about simply using a PA-weighted average for the league OBP and OBP variance? That is:

OBP_lg = sum(PA_i * OBP_i)/sum(PA_i)

var_lg = sum(PA_i * (OBP_i-OBP_lg)^2)/sum(PA_i)

And of course stdev_lg = sqrt(var_lg)

I did this with 2018 data (including pitchers) and found OBP_lg = 0.318 and stdev_lg = 0.0518 (var_lg = 0.002679).
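As a rough sketch, the weighted calculation above could be written like this in Python. The three PA/OBP pairs are made-up illustration values, not the 2018 data, and the function name is just a placeholder.

```python
import math

def weighted_obp_stats(pa, obp):
    """PA-weighted league OBP, variance and standard deviation."""
    total_pa = sum(pa)
    # OBP_lg = sum(PA_i * OBP_i) / sum(PA_i)
    obp_lg = sum(p * o for p, o in zip(pa, obp)) / total_pa
    # var_lg = sum(PA_i * (OBP_i - OBP_lg)^2) / sum(PA_i)
    var_lg = sum(p * (o - obp_lg) ** 2 for p, o in zip(pa, obp)) / total_pa
    return obp_lg, var_lg, math.sqrt(var_lg)

# Hypothetical players (illustration only, not real 2018 numbers)
pa = [600, 300, 100]
obp = [0.350, 0.300, 0.250]

obp_lg, var_lg, stdev_lg = weighted_obp_stats(pa, obp)
print(obp_lg, var_lg, stdev_lg)
```

Weighting by PA means a 600-PA regular counts six times as much as a 100-PA part-timer, both in the mean and in the spread.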

Assuming var_lg = var_talent + var_random, and using the Tango criterion that the proper number of plate appearances to regress is the one at which var_talent = var_random, this means:

var_lg = 2 * var_random

And since var_random = OBP_lg * (1-OBP_lg) / PA_tango, then of course

PA_tango = 2 * OBP_lg * (1-OBP_lg) / var_lg

For my 2018 numbers, this works out to PA_tango = 162, which is very close to the number you got by your simulation method.
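Plugging the figures quoted above into the formula is a one-liner; here is a small Python check. The function name is only a label for this sketch.

```python
def pa_tango(obp_lg, var_lg):
    """PA at which binomial (random) variance equals half the league variance."""
    # PA_tango = 2 * OBP_lg * (1 - OBP_lg) / var_lg
    return 2 * obp_lg * (1 - obp_lg) / var_lg

# League figures quoted in the comment above
result = pa_tango(0.318, 0.002679)
print(round(result))  # 162
```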

I found that "imaginary" standard deviat...

2019-03-31T13:58:06.097-04:00

I found that "imaginary" standard deviation interesting, where the SD was less than predicted by luck alone. I first thought about that issue when I read the original Tango Tiger luck vs skill article, and wondered how to handle a SD LESS than predicted by chance.

By analogy with chess, where there's an advantage in playing white, I figure the home field advantage is not a straight 4% advantage, giving the home team a 54-46 edge, but a multiplier factor.

Suppose a team would win P games for every loss against its opponents on a neutral field, and the home field multiplier was K. Then at home, the team should win P*K games for every loss, and on the road they should win P games for every K losses. I did that for every team in 2018 and took an average.

For example, in 2018 Boston was 57-24 at home, making P*K = 57/24.
On the road, they were 51-30, making P/K = 51/30.

That gave two equations with two unknowns, easily solved, with P = 2.0094, K = 1.182.
The average of the different Ks was 1.144, greater than the 1.105 you get by taking the AVERAGE of 42/38. Note that Boston on a neutral field was a hair BETTER than their 108/54 record! Home field edge tends to reduce performance percentage difference for home and away, and will also reduce the SD.
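The Boston calculation can be sketched in Python as follows. Multiplying the two equations gives P^2 and dividing them gives K^2, so each unknown is a square root. The function name is hypothetical.

```python
import math

def solve_p_k(home_w, home_l, road_w, road_l):
    """Solve P*K = home W/L and P/K = road W/L for P and K."""
    home_ratio = home_w / home_l   # equals P * K
    road_ratio = road_w / road_l   # equals P / K
    p = math.sqrt(home_ratio * road_ratio)  # (P*K) * (P/K) = P^2
    k = math.sqrt(home_ratio / road_ratio)  # (P*K) / (P/K) = K^2
    return p, k

# 2018 Boston: 57-24 at home, 51-30 on the road
p, k = solve_p_k(57, 24, 51, 30)
print(round(p, 4), round(k, 3))
```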

A home field advantage, for whatever reason (batting last, visitors' travel fatigue, umpire bias), would tend to reduce the actual SD to a figure LESS than that predicted by 50-50 chance alone. As an extreme example, if the home field K advantage was as high as 9, with the home team winning 90% of the time against an evenly matched opponent, any analysis would show ALL performance SDs as imaginary if it didn't take the home field factor into consideration.
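To put rough numbers on that, here is a sketch of the luck-only SD of season wins for a team facing evenly matched opponents. It assumes independent games and an 81 home / 81 road schedule, neither of which is stated above, and it treats the home win probability as K/(K+1).

```python
import math

def luck_sd_wins(k, home_games=81, road_games=81):
    """SD of wins from chance alone for an evenly matched team (assumed model)."""
    p_home = k / (k + 1)   # home odds of K to 1
    p_road = 1 / (k + 1)   # road odds of 1 to K
    # Variance of a sum of independent Bernoulli trials
    var = home_games * p_home * (1 - p_home) + road_games * p_road * (1 - p_road)
    return math.sqrt(var)

for k in (1.0, 1.144, 9.0):
    print(k, round(luck_sd_wins(k), 3))
```

With K = 1 this is the usual coin-flip figure of about 6.36 wins; with K = 9 it drops to about 3.82, so an analysis that assumes 50-50 games would overstate the luck component.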

It 100% takes number of PA in account, or can. I ...

2019-03-27T19:08:47.975-04:00

It 100% takes number of PA into account, or can. I think the example just throws everyone in together to get the empirical prior, but you wouldn't have to. You could create a distribution using weighted AB, or a subset of players, or whatever you feel. Priors are as simple or as fancy as you feel is justifiable. Then when you add the prior to actual performance, the number of AB is included in actual performance.
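A minimal sketch of what "adding the prior to actual performance" looks like with a beta prior, in Python. The prior parameters alpha0 and beta0 below are invented for illustration (they are not fitted to any data), and the two batting lines are hypothetical.

```python
def shrunk_average(hits, ab, alpha0, beta0):
    """Posterior mean of a beta-binomial model: prior 'hits' and 'outs' added to actual ones."""
    return (hits + alpha0) / (ab + alpha0 + beta0)

# Assumed prior, roughly a .267 hitter worth 300 AB of evidence (illustrative only)
alpha0, beta0 = 80, 220

# Two hypothetical .300 hitters with very different sample sizes
small_sample = shrunk_average(15, 50, alpha0, beta0)    # 15-for-50
large_sample = shrunk_average(150, 500, alpha0, beta0)  # 150-for-500
print(round(small_sample, 4), round(large_sample, 4))
```

Both players hit .300, but the 50-AB player is pulled much closer to the prior mean than the 500-AB player, which is how the number of AB enters the estimate.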

That procedure doesn't take differing numbers ...

2019-03-27T13:04:44.013-04:00

That procedure doesn't take differing numbers of PA into account. The example explicitly considers only seasons with 500+ AB, and assumes that batters with (say) 50 AB are part of the same distribution.

That last assumption is what I'm trying to avoid, which is what causes all the aggravation.

I think what you might want, for at least part of ...

2019-03-26T21:32:54.585-04:00

I think what you might want, for at least part of your process, is something similar to this procedure: http://varianceexplained.org/r/empirical_bayes_baseball/