Friday, March 21, 2014

ESPN quantifies clubhouse chemistry

"ESPN the Magazine" says they've figured out how to measure team chemistry. 

"For 150 years, "clubhouse chemistry" has been impossible to quantify. ... Until now.

"Working with group dynamics experts Katerina Bezrukova, an assistant professor at Santa Clara, and Chester Spell, an associate professor at Rutgers, we built a proprietary team-chemistry regression model.  Our algorithm combines three factors -- clubhouse demographics, trait isolation and stratification of performance to pay -- to discover how well MLB teams concoct positive chemistry.

"According to the regression model, teams that maximize these factors can produce a four-win swing during a season."

The article doesn't tell us much more about the algorithm, calling it "proprietary".  They do define the three factors, a bit:

"Clubhouse demographics" is "the impact from diversity, measured by age, tenure with the team, nationality, race, and position.  Teams with the highest scores have several overlapping groups based on shared traits and experiences."

"Trait isolation" is when you have so much diversity that some players have too few teammates similar to them, and thus are "isolated" in the clubhouse.

"Stratification of performance to pay" -- or, "ego factor" -- is based on how many all-stars and highly-paid players the team has.  A happy medium is best.  Too few big shots creates "a lack of leadership," but too many creates conflict.

Sounds silly to me, but who knows?  The data might prove me wrong.  Unfortunately, the article provides no evidence at all, not even anecdotal.  


This is from the magazine's 2014 MLB preview issue, and their little twist is that they add the chemistry estimates onto the regular team projections.  For instance, Tampa Bay is expected to rack up 91 "pre-chem" wins.  But, their chemistry is worth an extra two victories, for a final projection of 93-69.  (.pdf)

But even if you accept that the regression got the effects of chemistry exactly right -- unlikely as that is, but just pretend -- there's an obvious problem here.

If Tampa's chemistry is worth two wins, how would those two wins manifest themselves?  They must show up in the stats somewhere, right?  It's not like the Rays lose 5-4 to the Red Sox, but the chemistry fairy comes along and says, "Tampa, you guys love each other so much that you're getting the win anyway."

The idea must be that if a team has better chemistry, they'll play better together.  They'll hit better, pitch better, and/or field better.  There are other possibilities -- I suppose they could get injured less, or the manager could strategize better, or an aging superstar might swallow his ego and accept a platoon role.  But, you'd think, most of the benefit should show up in the statistics.

But, if that's the case, chemistry must already be embedded in last year's statistics, on which the magazine based its "pre-chem" projections.  Since teams are largely the same from year to year, ESPN is adding this year's chemistry on top of last year's chemistry.  In other words, they're double counting.

Maybe Tampa's "chemistry" was +1.5 last year ... in that case, they've only improved their 2013 chemistry by half a win, so you should put them at 91.5 wins, not 93.
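To make the arithmetic concrete, here's a minimal sketch of the adjustment. The function name is mine, and the +1.5 for last year is just the made-up number from the example above, not anything ESPN published.

```python
def adjusted_projection(pre_chem_wins, chem_this_year, chem_last_year):
    """Add only the change in chemistry, on the assumption that last year's
    chemistry is already baked into the stats behind the pre-chem projection."""
    return pre_chem_wins + (chem_this_year - chem_last_year)


# Tampa Bay: 91 pre-chem wins and +2 chemistry this year (from the magazine).
# The +1.5 for last year is hypothetical.
naive = 91 + 2.0
adjusted = adjusted_projection(91, 2.0, 1.5)
print(naive, adjusted)  # 93.0 91.5
```

If last year's chemistry were zero, the two numbers would agree. The double counting only shows up to the extent that chemistry carries over from one season to the next.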

It's possible, of course, that ESPN backed out last year's chemistry before adding this year's.  But they didn't say so, and the footnote on page 57 gives the impression that the projections came from outside sources.  


Here's another thing: every overall chemistry prediction adds up to a whole number.  Taking the Rays again, they get +0.1 wins in "ego", +1.7 in "demographics," and +0.2 in "isolation".  The total: +2 even.

There are a couple of teams ending in .9 or .1, which I assume are rounding errors, but the rest are .0.

How did that happen?  Maybe they rounded the wins and worked backwards to split them up?  
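Here's a rough sketch of why that looks odd. If the three components were estimated separately and each rounded to the nearest tenth, you'd expect only about one total in ten to land on a whole number. The simulation below assumes, purely for illustration, that each component is equally likely to be anything from -3.0 to +3.0 wins. Everything is kept in tenths of a win so the arithmetic is exact.

```python
import random


def is_whole_total(tenths):
    """True if components, given in tenths of a win, sum to a whole number."""
    return sum(tenths) % 10 == 0


# The Rays' components from the magazine: +0.1 ego, +1.7 demographics,
# +0.2 isolation, written in tenths of a win.
rays = [1, 17, 2]
print(is_whole_total(rays))  # True


def share_of_whole_totals(trials=10_000, seed=0):
    """Simulate three independently rounded components (assumed uniform
    between -3.0 and +3.0 wins) and return how often the total is whole."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        components = [rng.randint(-30, 30) for _ in range(3)]
        hits += is_whole_total(components)
    return hits / trials


share = share_of_whole_totals()
print(share)  # roughly one in ten under these assumptions
```

The uniform range is arbitrary, but the one-in-ten figure doesn't depend much on it. Nearly every team coming out at .0 is hard to get by chance.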


Another important problem: there are so many possible confounding factors that the effect ESPN found could be one of a million other things.

We can't know for sure, because we don't know what the model actually is.  But we can still see some possible issues.  Like age.  Performance declines as a player gets older, so, holding everything else equal, as a regression does, older teams will tend to look like they underperformed, while younger teams will look like they overperformed.

The regression's "demographic factor" explicitly looks at diversity of age.  The more players of diverse ages, they say, the better the chemistry.  

I did a quick check ... in 2008, the higher the diversity (SD) of batter ages, the older the team.  The five lowest SDs had a team average (integer) age of 27.8.  The seven highest had a team average of 29.1.  

Hmmm ... that goes the opposite way from the regression, which says that the older, high-diversity, high-chemistry teams do *better* than expected.  Anyway, the point remains: there's a hidden correlation there, and probably in the other "chemistry" measures, too.
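For anyone who wants to repeat that kind of check, here's a minimal sketch of the calculation. The team names and ages are invented toy data, not the 2008 numbers, and I'm assuming a population SD here, which may not match the SD behind the figures above.

```python
from statistics import mean, pstdev


def mean_age_by_spread(team_ages, k):
    """Rank teams by the SD of their batters' ages, then return the average
    team age of the k lowest-SD teams and of the k highest-SD teams."""
    ranked = sorted(team_ages, key=lambda team: pstdev(team_ages[team]))

    def average_team_age(teams):
        return mean(mean(team_ages[t]) for t in teams)

    return average_team_age(ranked[:k]), average_team_age(ranked[-k:])


# Made-up ages for four hypothetical teams (not real data).
team_ages = {
    "A": [26, 27, 27, 28],
    "B": [25, 27, 29, 31],
    "C": [24, 28, 32, 36],
    "D": [27, 28, 28, 29],
}
low_avg, high_avg = mean_age_by_spread(team_ages, k=2)
print(low_avg, high_avg)  # 27.5 29
```

In the toy data, the teams with more spread in age are also older, which is the kind of hidden correlation that could be driving a "diversity" effect in a regression.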

A team with lots of All-Stars?  Probably older.  Few highly-paid players?  Probably younger with lots of turnover.  "Isolated" players?  Maybe a Yu Darvish, who plays for a good team that will do whatever it takes to win next year.  Lots of variation in nationality?  Maybe a team with a good scouting department or farm system, that can easily fill holes.

You can probably think of lots of others.

Oh, wait, I found a good one.  

On page 46, ESPN says that high salary inequality is one of the things that's bad for chemistry.  In 2008, the five teams with the highest SD of batter salary had an average age of 30.4.  The seven teams with the lowest SD had an average age of 28.7.  

That one goes the right way.


Anyway, this is overkill, and I probably wouldn't have written it if I hadn't gotten so frustrated when I read ESPN's piece.  Geez, guys, you have as much right to draw dubious conclusions from fancy regressions by academic experts as anyone else.  But if the regression isn't already public, you've got to publish it.  At the very least, give us details about what you did.  "We figured out the right answer and you just have to trust us" just doesn't cut it.

Journalistically, "We have a secret algorithm that measures clubhouse chemistry" is the sports equivalent of, "We have a secret expert that proves Barack Obama was born in Kenya."



At Friday, March 21, 2014 4:05:00 PM, Blogger Bob Timmermann said...

I thought you couldn't count heart. But I guess you can count chemistry. Although to be fair, chemists already do that.

At Friday, March 21, 2014 4:09:00 PM, Blogger Phil Birnbaum said...

Many years ago, a mutual friend once said how it made no sense the way some sportswriters dismiss sabermetrics. "They say 'I don't believe in sabermetrics', which is like saying, 'I don't believe in chemistry.'"

He meant real chemistry, like chemists practice. And it was a great line. But, now it's too ambiguous. Especially in this blog post.

At Sunday, March 23, 2014 8:50:00 PM, Blogger doc said...

And another thing--If Team A is projected to win 2 more games than their projected performance stats suggest, who loses those games? The teams with crappy chemistry? Did they publish their estimates of the crappy chemistry discount, or are we supposed to guess? We should all know before we get our bets down for the season outcomes...

At Tuesday, March 25, 2014 9:09:00 PM, Blogger Ken Swanson said...

This is the annoyance with proprietary models. If there's no way to see what your model is, it's very hard to know if your model makes any sense whatsoever.

Not to mention that there's no mention of actually using the model on test data (i.e., the past) to see if it actually makes sense. You would assume they did, but...

At Tuesday, March 25, 2014 11:03:00 PM, Blogger doc said...

I'll admit, I'm tickled by even an effort to quantify "chemistry." Obviously, they have indicated some things that they look at (as Phil pointed out in his post). But, for example, did the early 1970s A's have good chemistry or bad chemistry? I seem to recall that they fought within the team a lot.

And, yeah, proprietary models are nice, but useless for advancing the conversation. (Good catch, Ken.)

