Monday, January 09, 2017

Apportioning team wins among individual players

In one of my favorite posts of 2016, Tango talks about Win Shares, and about forcing individual player totals to add up to the team's actual wins, and what that kind of accounting actually implies.

I was going to add my agreement to what Tango said, but then I got sidetracked about the idea of assigning team wins to individual players, even without the "forcing" aspect. 

I thought, even if the player wins exactly add up to the team wins without forcing them to do that ... well, even then, does the concept make sense?


The idea goes like this: you know the Blue Jays won 89 games in 2016. How many of those 89 wins is each individual Blue Jay responsible for, in the sense that if you add them all up, you get a total of 89?

One problem is that the criterion "responsible for" is too vague -- like the "most valuable" in "most valuable player."  It can mean whatever you want it to mean. But even if you're flexible and accept any reasonable definition, I'm not sure it still necessarily makes sense.

You drive your car 89 miles on three gallons of gas. How many of those 89 miles was the engine responsible for? How many of those 89 miles was the steering wheel responsible for? Or the tires, or the radiator? 

Well, you can't go anywhere without an engine, and you can't go anywhere without tires -- but they can't both get full credit for the 89 miles, can they? 

Well, it's the same thing for baseball. You can't win without a pitcher, and you can't win without a catcher -- you'd forfeit every game either way.

If you say that Troy Tulowitzki was responsible for (say) 4.0 of the team's 89 wins, what does that actually mean? It sounds like it means something like, "the team won 4.0 more games with Tulo at short than if the rules let them leave the shortstop position open."  

But that can't be it. Even if the rules allowed it ... well, with only eight players on the field, and an automatic out every time Tulo's spot came up to bat ... well, that would have cost the Blue Jays a lot more than four additional games, wouldn't it?


So, my initial thought was, assigning team wins to players makes no sense. But, then, I saw what I think is a way to make it work. 

When you say Troy Tulowitzki was responsible for 4.0 wins, you're implying that he's four wins better than nothing. But "nothing," taken literally, defaults every game. What if you say, instead, that he's four wins better than a zero-level player?

Taking a page from "Wins Above Replacement," let's redefine "nothing" to mean, a player from the best possible team that would still win zero games against MLB opponents. Or, maybe, to make it clearer, a team that would go 1-161. (You could probably use 0.1 - 161.9, or something, but I'll stick to 1-161.)

(I'm curious what kind of team that would be in real life. For what it's worth, I think I once ran a simulation of nine Danny Ainges (1981) versus nine Babe Ruths, and the Ainge team did go exactly 1-161.)

If Pythagoras works at those extremes ... the square root of 161 is about 13, so we're talking a team that would be outscored by MLB teams by a factor of 13 or more. I have no idea what that is. A good high school team?
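For what it's worth, here is the arithmetic behind that square root, as a minimal sketch. It assumes the classic Pythagorean relation with an exponent of 2, which may well not hold this far from .500, and the function name is just for illustration.

```python
import math

def run_ratio_for_record(wins, losses, exponent=2.0):
    """Runs-allowed to runs-scored ratio implied by a W-L record,
    assuming the Pythagorean relation W/L = (RS/RA) ** exponent."""
    return (losses / wins) ** (1.0 / exponent)

# A 1-161 team, under the exponent-2 assumption
ratio = run_ratio_for_record(1, 161)
print(round(ratio, 2))  # 12.69
```

A different exponent (some versions of the formula use something a bit under 2) would give a somewhat larger ratio, so "about 13" is only a rough guide.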

Anyway ... if you define it that way, then, I think, it works. Win Shares is just Wins Above Replacement, with a team replacement level at about a .006 winning percentage (1-161) instead of the usual .333 or .250 or whatever.

Maybe you could call it WAZ, for "Wins Above Zero."


But I'm still uneasy, even though it kind of works. I'm uneasy because I still don't buy the concept. I don't accept the idea that you can start with the 89 wins, and break it down by player, and it has to add up, and the job is just to figure out how. 

Because, that's not how it works. If you didn't like the car analogy, try this:

You have three players on your team. Each takes ten free throws. You get the team score by multiplying the individual scores together. If the players get 5, 6, and 8, respectively, the team gets 240.

Of those 240 points, which players are responsible for how many points? If you replaced player A by a guy who can't shoot at all, the team would score zero -- the product of 0, 6, and 8. So, A's "with or without you" contribution is worth 240. But, so are B's and C's! In this non-baseball sport, the sum of the individual players' contributions adds up to *triple* the team total.
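The free-throw example can be checked in a few lines. This is only a sketch of the "with or without you" idea as described here; the helper name is made up, and "replaced by a guy who can't shoot" is modeled as a score of zero.

```python
from math import prod

def with_or_without(scores, i):
    """Team score lost if player i is replaced by someone who makes zero shots."""
    replaced = list(scores)
    replaced[i] = 0
    return prod(scores) - prod(replaced)

scores = [5, 6, 8]
team = prod(scores)
credits = [with_or_without(scores, i) for i in range(len(scores))]
print(team, credits, sum(credits))  # 240 [240, 240, 240] 720
```

Each player's marginal credit equals the whole team score, so the credits sum to three times the total.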

In this specific case, because the score is straight multiplication, you might be able to make this work by taking the logarithm of everything, and switching to addition. But baseball isn't that easy. It's somewhere between addition and multiplication; the value of your single depends on the chance your teammates will reach base ahead of you and behind you. A home run is still dependent on context, but less so, since you always at least score yourself.
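Here is a rough sketch of the logarithm idea for the free-throw game. It is one possible way to split credit, not a claim about how anyone actually does it, and the resulting shares are shares of the *log* of the team score rather than of the 240 points themselves.

```python
import math

scores = [5, 6, 8]

# On the log scale, the team score is a sum of player terms
logs = [math.log(s) for s in scores]
total = sum(logs)

# Each player's fraction of the log of the team score
shares = [l / total for l in logs]

print(round(math.exp(total)))          # 240
print([round(s, 3) for s in shares])   # [0.294, 0.327, 0.379]
```

Even here it gets awkward: a player who scores 1 gets zero credit on the log scale, and a player who scores 0 breaks it entirely.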

As I argued, baseball is "almost" linear, so you can get all this to work, kind of. But the fact that it works, kind of, doesn't mean the question makes sense. It just happens that the roundish peg fits into the squarish slot, because the peg is kind of squarish too, and the slot is kind of roundish.


Even before Win Shares or other win statistics, we used to do team breakdowns all the time, but for runs rather than wins. 

For instance, the 1986 Chicago White Sox scored 644 runs. We've always been willing to split those up by figuring Runs Created. For instance, I'm OK saying that of those 644 runs, Harold Baines was responsible for 87 of them, Daryl Boston another 28, Carlton Fisk 39, and so on. 

So why do I have a problem doing the same for wins?

Well, this is just me, and your gut will differ. But, personally, it's that when you throw pitching into the mix, it makes it obvious that the splitting exercise is contrived. 

With runs, you have an actual, visible pile of runs, that actually scored, and you can see the players involved, and it seems reasonable to divide the spoils.

But what about pitchers? What do you have a pile of to split? Maybe runs prevented, rather than runs created. But what's in the pile? How many runs did the 1986 White Sox prevent? Infinity, is my first reaction.

With Win Shares, Bill James got around this problem by defining a "zero line" to be twice the league average -- the "pile" is the runs between that and the actual number. For 1986, the zero line is 1492, so the White Sox wind up with a pile of 793 prevented runs to split among them. That's fine and reasonable, but it's still arbitrary, and, for me, it shatters the illusion that when you split team wins, you're doing something real. 
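To see how much the size of the pile depends on where you put the zero line, here is a small sketch. The league average (746) and the runs allowed (699) are not looked up; they are just what the 1492 and 793 figures above imply. The 1.5 and 3.0 multiples are hypothetical alternatives, included only for comparison.

```python
def prevented_runs(league_avg_allowed, team_allowed, multiple=2.0):
    """Size of the 'runs prevented' pile for a zero line set at
    multiple * league average runs allowed."""
    return multiple * league_avg_allowed - team_allowed

league_avg = 1492 / 2       # 746, implied by the zero line of 1492
team_allowed = 1492 - 793   # 699, implied by the pile of 793

# Only 2.0 is the choice described above; the others are hypothetical
piles = {m: prevented_runs(league_avg, team_allowed, m) for m in (1.5, 2.0, 3.0)}
print(piles)  # {1.5: 420.0, 2.0: 793.0, 3.0: 1539.0}
```

The same pitching staff "prevents" 420, 793, or 1539 runs depending only on the multiplier you pick.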

Here's another weird analogy.

You earn $52,000 a year, and at the end of the year, after all your expenses, you have $1,040 saved. How do you split the credit for that $1,040 among your 52 paycheques (batters)? Easily: just divide. Each pay is responsible for $20 of that $1,040.

But ... it's not just your deposits that are involved. It's your withdrawals, too. You could easily have spent all your money, and even gone into debt, if not for your spending prevention skills (pitchers). How much of the $1,040 is due to your dollars deposited being high, and how much is due to your dollars withdrawn being low?

To model that, you have to figure out "spending prevented."  Maybe, under zero willpower, you would have spent double what you earned -- you would have borrowed another $52,000 and blown it on crap. So, it turns out, your willpower prevented about $53,000 in spending.

Your paycheques are responsible for $52,000 deposited, and your willpower for $53,000 not withdrawn. Maybe we'll divide that proportionally. So, of the $20 per paycheque, maybe your job skills were worth $9.90, and your thrift skills were worth $10.10.
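The arithmetic, spelled out as a sketch. The "spend double what you earned" baseline is the made-up assumption from above, and the variable names are just for illustration. The exact prevented figure is $53,040, which the text rounds to $53,000.

```python
income = 52_000
saved = 1_040
paycheques = 52

spent = income - saved                        # 50,960 actually spent
zero_willpower_spending = 2 * income          # assumed baseline: 104,000
prevented = zero_willpower_spending - spent   # 53,040 "spending prevented"

per_cheque = saved / paycheques               # 20.0 saved per paycheque

# Split each paycheque's $20 in proportion to dollars earned vs. dollars not spent
job = per_cheque * income / (income + prevented)
thrift = per_cheque - job

print(round(job, 2), round(thrift, 2))  # 9.9 10.1
```

Change the zero-willpower baseline and the split between "job skills" and "thrift skills" changes with it.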

Does that sound like a real thing? It doesn't to me.


This is not to say that I don't like Win Shares ... I do, actually. But I like them in the same way I like Bill's other "intuitive" stats, like Approximate Value, and Speed Score, and Trade Value. I like those as rough evaluations, not measurements. In fact, Win Shares is almost like Approximate Value, except that because they're roughly denominated in team wins, I find Win Shares easier to process intuitively.

It's not the stat that bugs me, or the process. It's just the idea that it's a real, legitimate thing, demanding that team wins be broken up and credited arithmetically to the individual players. Because, I don't think it is. It just comes close in the baseball case.

Maybe I'm just old and cranky.

Anyway, in Tango's post, which I haven't actually talked about yet, there are better reasons to resist the idea of splitting team wins, based on the idea that they don't actually add up, and that when you force them to, you have to do things that you really can't justify. That's a much better argument, and it was what I was going to talk about before I started getting sidetracked with this conceptual stuff.

Next post.



At Friday, January 20, 2017 6:43:00 PM, Blogger Eduardo Sauceda said...

The Elo rating does not have any absolute value. From Elo's book The Ratings of Chessplayers, past & present 2nd ed:

The rating scale itself -its range of numbers- is, like any scale without reproducible fixed points, necessarily an open ended floating scale. Application of the rating system to the entire membership of a national federation requires a range wide enough to cover all proficiencies, perhaps as many as ten categories from novice to Grand master, and enough ballast numbers so no rating ever goes negative... both the class subdivision into 200 points and the choice of 2000 as the reference point (upper level for the strong amateur or club player) were already steeped in tradition when this author arrived on the scene. So too was the expression of ratings in four-digit numbers, although four-digit accuracy was not present. These features were retained for their general acceptance by the players. Other numbers could have been used. It is only the differences on the scale that have real significance in terms of probabilities.

When someone adapts the Elo system to another sport, he uses an arbitrary number to start the pool. In my Elo implementation published many years ago by Kenneth Massey on I used 2000 as base for any sport. Fivethirtyeight uses 1500. But that has no significance, only the rating difference between members of the pool. As you say, there is no possible comparison between different pools without enough "communicating vessels" in the form of games between the pools. So, trying to compare an Elo rating between NBA and NCAA is futile, as is NHL vs MLB or Chess and Scrabble.

