Thursday, April 11, 2013

Can we tell simulation from real life?

I was a participant on the "randomness" panel at the Sloan Conference last month.  One of the questions was, "How can fans get a feel for how much luck there is in sports?"

My answer went something like this: Play simulation games, like APBA or Strat-O-Matic for baseball.  You'll find that, one game, team A will win 11-1, and, the next, they might lose 8-2 to the same opponents.  Even with exactly the same talent, as determined by the game, the results will vary widely just because of random variation.
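
To see how much scatter pure dice rolls produce, here's a toy sketch in Python -- not APBA's actual engine, and the per-inning run distribution is invented for illustration:

```python
import random

def inning_runs(rng):
    # Toy per-inning run distribution -- made-up weights, not APBA's
    # actual probability tables.  Most innings are scoreless.
    return rng.choices([0, 1, 2, 3, 4], weights=[73, 15, 7, 3, 2])[0]

def simulate_game(rng):
    # Both teams draw from the *same* distribution: identical talent.
    return (sum(inning_runs(rng) for _ in range(9)),
            sum(inning_runs(rng) for _ in range(9)))

rng = random.Random(1978)
for a, b in (simulate_game(rng) for _ in range(5)):
    print(f"{a}-{b}")
```

Run it a few times and you'll see blowouts in both directions, even though the two "teams" are exactly equal by construction.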

What I wanted to add at the time, but trailed off because I lost my train of thought, was: if you're skeptical, you might think that those games are "over"-random, given that they use dice rolls and all.  But ... it turns out that random APBA outcomes are very, very close to real-life outcomes.  For instance, I'd bet that pairs of "11-1 then 2-8" games are almost exactly as common in baseball history as they would be in APBA-simulated baseball history.

Now, I have no actual evidence for that, but I think it's true.  Still, I got to thinking ... what are the ways where real life and APBA *would* be different?  That is, suppose I handed you a bunch of actual game box scores, and a bunch of APBA box scores.  Would you be able to tell which pile was which?

We need to add some assumptions.  Let's suppose that the simulation is as "perfect" as sabermetric knowledge permits -- that is, it uses proper log5, the best park effects, the best guess at how DIPS should work, the proper understanding that batters hit better with runners being held on first base, and so on.  Let's suppose, too, that we clone the team's managers, and let them make game decisions the same way as real life (when to change pitchers, put in a pinch hitter, call for a hit-and-run, etc.).
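
For reference, "proper log5" is Bill James's matchup formula; a minimal version (my sketch, not any game's actual code) looks like this:

```python
def log5(p_batter, p_pitcher, p_league):
    """Expected success rate when a p_batter hitter faces a pitcher
    who allows p_pitcher, in a league whose overall rate is p_league
    (Bill James's log5 matchup formula)."""
    num = (p_batter * p_pitcher) / p_league
    den = num + ((1 - p_batter) * (1 - p_pitcher)) / (1 - p_league)
    return num / den

# A .300 hitter against a league-average pitcher stays a .300 hitter:
print(round(log5(0.300, 0.260, 0.260), 3))  # -> 0.3
```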

And, let's also assume that we're going to weed out games with the really weird things, the ones that no simulation could be smart enough to include with the right probabilities, like Derek Jeter's famous "flip" throw home, or the time the ball bounced off Jose Canseco's head for a home run.  Or, if you prefer, assume that the simulation IS smart enough, if that doesn't bend your brain too much.

Really, what we're trying to do here is assume that the simulation has every probability perfect: it's just that the outcomes are independent and randomly determined, by dice rolls based on the probabilities, instead of by actual play of the game by flesh-and-blood humans.

If we did all that, could anyone tell the difference?  

My gut answer: it would be hard.  There are some things we could look for.  Injuries, for instance, mean that batters would be a tiny bit "streaky", in that bad performance would be clustered more than randomly, during those times when the player is hurt.  You might find that, in real life, rookies start out well and peter out, as opponents figure out their weaknesses, whereas in APBA, the cards are fixed.  

But, overall, I think even the most knowledgeable experts would have trouble telling the pile of real box scores from the pile of simulated box scores.

Think about this in concrete terms: what would you, personally, do?  Suppose I took one of those computer games -- Pursue the Pennant, or something -- and simulated a bunch of games from the 1978 schedule.  Then I print off the box scores, put them alongside the real ones, and hand them to you in person.

Assuming you don't actually remember a lot of details from 1978 games -- like actual scores, or player performances -- what would you do to figure out which was which?



At Friday, April 12, 2013 8:56:00 AM, Blogger Daniel Tilkin said...

Does the simulation take into account player fatigue? In real life, managers give players rest days (especially relievers).

At Friday, April 12, 2013 8:57:00 AM, Blogger Phil Birnbaum said...

Yes, assume that player usage follows the same principles in the simulation as in real life.

At Saturday, April 13, 2013 9:17:00 AM, Anonymous T Turocy said...

Do you also assume that the player "cards" have been suitably regressed to the mean? If not -- which is the case with all standard season sets/disks -- then the simulation will tend to have more extreme totals on the player leaderboards.

However, that's not a way to distinguish the simulation engine itself from real life.

At Saturday, April 13, 2013 9:52:00 AM, Blogger Phil Birnbaum said...

Good point. Yes, we assume that the player cards are as close to true talent as we can guess.

If not, then, right, you could tell the pile of simulated games from the pile of real games because it would have more extreme totals. That's exactly the kind of thing I'm asking about.

At Saturday, April 13, 2013 10:05:00 AM, Blogger Phil Birnbaum said...

That's a point I hadn't thought of, though. I don't think APBA *does* regress to the mean, so, if you play a full season of actual APBA cards, you *will* get more extreme results overall.

If you do regress to the mean, you can tell you're running a simulation because the real-life players with the best stats will appear to underperform in the simulation (for instance, George Brett will be unlikely to hit .390 in the simulated 1980).

I should have said that I was assuming not only that you didn't remember actual game scores, but that you also didn't remember actual player performances.
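
That unregressed-cards effect is easy to demonstrate with a quick toy simulation (invented numbers, nothing from APBA):

```python
import random
import statistics

rng = random.Random(42)

# Toy league: every hitter's true talent is .270, but each "card" is
# built from one observed 500-AB season, so the cards themselves
# already carry a season's worth of luck.
TRUE_BA, AT_BATS, PLAYERS = 0.270, 500, 100

def season(ba):
    # One season of at-bats as independent Bernoulli trials.
    return sum(rng.random() < ba for _ in range(AT_BATS)) / AT_BATS

cards = [season(TRUE_BA) for _ in range(PLAYERS)]   # the observed stats
replay = [season(card) for card in cards]           # a season played from the cards

# Replaying unregressed cards stacks a second layer of luck on the
# first, so the simulated leaderboard spreads wider than the real one.
print(f"real spread:      {statistics.stdev(cards):.4f}")
print(f"simulated spread: {statistics.stdev(replay):.4f}")
```

The simulated spread comes out noticeably wider -- roughly root-2 times the real one, since the two layers of binomial luck add in variance.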

At Sunday, April 14, 2013 12:33:00 AM, Anonymous Anonymous said...

This wouldn't be a large group, but I think players batting at or slightly above .300, in real life, would be taken out more often to preserve the accomplishment. Unlike a simulation, real life is aware of itself.

At Sunday, April 14, 2013 1:36:00 AM, Blogger Phil Birnbaum said...

That's a good one, the .300 thing. Thanks!

At Sunday, April 14, 2013 11:03:00 AM, Anonymous Eddy Elfenbein said...

I've given this a lot of thought, and I think one giveaway of a simulation would be the distribution of HBPs. They're random with a sizeable non-random element. A clever simulation could pick up some of the nuances, but it would require some thought.

At Sunday, April 14, 2013 1:09:00 PM, Blogger Phil Birnbaum said...

That's a good one, Eddy, thanks!

At Wednesday, April 17, 2013 12:39:00 PM, Blogger Scott Segrin said...

Colin Wyers @cwyers posted this on Twitter in a discussion of the non-independence of PA. I think it applies here...

The obvious example is if you have a guy who is 4-for-4 on the day the opposing manager is more likely to issue an IBB.

At Wednesday, April 17, 2013 12:41:00 PM, Blogger Phil Birnbaum said...

That's a good one too. Thanks!

At Wednesday, April 17, 2013 12:46:00 PM, Blogger Scott Segrin said...

Just a follow up to my last post. There are probably a number of different instances where the "perception" of a hot or cold streak by a player will cause a manager to act in a particular way. Pinch hitting, bunting for a hit, moving players in or out of particular roles all come to mind. Here in Milwaukee, John Axford lost the closer role after only three bad outings. That probably wouldn't happen in a simulation.

At Thursday, April 18, 2013 8:44:00 PM, Blogger Don Coffin said...

Would we see the game-by-game data, or only the seasonal totals? I suspect the simulation might be easier to detect with game-by-game data.

At Thursday, April 18, 2013 10:31:00 PM, Blogger Phil Birnbaum said...

You can see the game-by-game data.



