Thursday, December 13, 2007

Stats vs. scouting: a thought experiment

I was thinking about the Moneyball debate, about traditional scouting vs. statistical analysis. Here's a thought experiment I came up with.

Suppose you take the 25 best scouts today, and put them in suspended animation for 40 years. Then, you wake them up. You ask them to evaluate the major-league first basemen of 2047. Of course, none of the scouts know anything about the players, who weren't even born when the scouts went to sleep in 2007.

The scouts get to watch the players hit. You don't want them to evaluate the players by keeping track of their stats, so you make sure all the stats work out the same. To do that, you show them only 300 PA for every player. You pick those plate appearances by making sure to include exactly 80 hits, 10 home runs, 14 doubles, and so on. (The exact PA in each category are picked randomly.) The scouts can watch those plate appearances as many times as they want. The technology of 2047 lets them see everything holographically, in 3D, from any angle. They can even use radar guns if they like. (Indeed, since this is a thought experiment, assume any additional technology you want.)
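Just to make the sampling procedure concrete, here's a minimal sketch in Python. The 10 HR, 14 doubles, and 80 total hits are from the setup above; the singles count (56, implied if we ignore triples), the walks (30), and the outs (190) are hypothetical filler so everything sums to 300 PA.

```python
import random

# A sketch of the standardized sample described above. Each player's
# pool of recorded plate appearances is grouped by outcome, and we draw
# a fixed number from each group, so every player's stat line comes out
# identical. Only WHICH plate appearances the scouts see is random.
TARGET = {"HR": 10, "2B": 14, "1B": 56, "BB": 30, "OUT": 190}

def standardized_sample(pa_by_outcome, target=TARGET):
    """pa_by_outcome maps an outcome code to that player's recorded PAs."""
    sample = []
    for outcome, n in target.items():
        # pick which plate appearances the scouts see, but not how many
        sample.extend(random.sample(pa_by_outcome[outcome], n))
    random.shuffle(sample)  # show them in random order
    return sample
```

Every player gets an identical stat line; only the *look* of the plate appearances differs, which is exactly what the scouts are being asked to judge.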

You then ask the scouts to rank the 30 players by how well they'll do next year. Would they do a decent job?

I'm probably less qualified than most readers of this post to guess at this question, but I'll try anyway. I'd bet that the scouts wouldn't do very well. I'd bet that an Albert Pujols single doesn't look that much different from a Kevin Youkilis single. However, I think the scouts might figure out who has power by looking at home run distances, and who walks a lot by noting plate discipline and the ability to lay off pitches. They'd also see who has good speed.

Now suppose you froze 25 sabermetricians. To this group, instead of showing them holographic replays of plate appearances, you show only the players' stats. Would they do better than the scouts? I think it's almost certain they would. The sabermetricians would have the stats for each player's whole career in front of them. The traditional scouts wouldn't have that. The scouts might know a few small things the stats group doesn't – plate discipline, for instance – but unless they counted, their impressions would be off a bit, over 300 PA times 30 players. But the sabermetricians would know a LOT more than the scouts – batting average, home runs, walks, and so on.

And suppose that you *included* statistics for all these things for the sabermetricians – speed, pitch counts, home run distances, line drive frequency, average pitch speed against, and so on. In fact, let the sabermetricians have any stats they want (within reason). Would there be anything left for the scouts? Only things that can't be measured. What are those things? Subjective impressions of personality and drive to win? Leadership? Certain aspects of body type? Are those really enough to measure up against all that data?

Doesn't it seem like a copy of the 2047 Baseball Prospectus and 2047 Baseball Forecaster should beat the crap out of a bunch of scouts who aren't allowed to count things?

Before this thought experiment, I felt like traditional scouting was of substantial value – although not as important as the statistical record. But now, it seems to me that hard data would trump live scouting in almost every case.

Here's an experiment you could do right now, to check that. Find your top 25 scouts right now, and ask them: you've seen a lot of current major-league players live this year. For which players have you seen live indications that suggest the player's prospects are better or worse than what his statistical record suggests? Maybe you've seen something like, "hey, Joe Blow normally hits .320, but he's weak on curve balls on the outside corner, and once pitchers catch on, he'll only hit .270." Or, maybe, "you know, these five guys have had stats very similar to those five guys. But these five have drive and leadership, and are going to make themselves into better players. Those other five just coast through the season, and they're going to be washed up before too long."

That is: ask scouts to make testable predictions that are based only on observations of things that can't be measured by sabermetricians.

Can any scouts reliably make successful predictions like that? If they can, that would be evidence that scouting is valuable, much more valuable than I think it is. If not, though, isn't that itself evidence that traditional scouting has value only because there isn't enough good data?

It seems to me that scouting is a *substitute* for data, and an inferior one. For those who think it's a *complement* to data, my view is that you have to show me where the benefit is.

P.S. As Tango points out, scouts sometimes add value by noticing trends that statistical analysts can verify. In that case, you can argue that they're really doing sabermetrics ...



At Thursday, December 13, 2007 8:35:00 PM, Anonymous Anonymous said...

I'd say the thought experiment is a little biased, depending on what you imagine the stats guys to have stats for.
We think of scouts mostly operating in domains (high school, college, low minors) in which the context (quality of competition, park effects, etc.) tends to vary from place to place and year to year, with smaller sample sizes. You will agree this poses difficulties for the stats guys, at least in 2007?
There is advance scouting in the majors, but it isn't focused very much on who's good or bad, but exploitable strengths and weaknesses. These scouts aren't evaluating whether Albert Pujols has more power potential than Kevin Youkilis, but instead whether you can throw him fastballs inside, and whether you can bunt on him.

At Thursday, December 13, 2007 10:48:00 PM, Blogger Phil Birnbaum said...

>You will agree this poses difficulties for the stats guys, at least in 2007?

Sure. But my point is that what the scouts do is not as good as having real stats. As I said, the scouts are substitutes, not complements.

I'm sure the scouts are very, very good at watching a high-school player and figuring out how good he's going to be. But they're not as good as a statistical record would be, if you could figure out the context properly.

Your point is taken on advance scouting. But, with all that watching, shouldn't the scouts be able to figure out which players are better than their stats? Say a guy out of high school plays a year in AA, and his MLE is .260/.310/.390. Shouldn't the scouts be able to (sometimes) say, hey, he's actually much better than that record, he was just unlucky? Maybe they can, I don't know. But if they can't, or can't reliably, again that would mean that scouts are a second-best to a decent stat line.

At Thursday, December 13, 2007 11:29:00 PM, Anonymous Anonymous said...

I will present the case of a guy I went to high school with, Ryan Strieby. Ryan is 6'6" and was a first-team All-American at Kentucky, a top-25 school in one of the best conferences. For some reason, he wasn't drafted until the 4th round. He hasn't done much in the minors, although he is still young. The point is that, statistically, this guy should have been a top-10 pick (an All-American from a major school), but he slid. Scouts have a purpose, like Joe said, in areas where stats are unreliable, like the NCAA or high school. In the majors, I would argue that any weakness is also quantifiable and that scouts really don't add anything. The only purpose I feel a scout would have in the majors is to propose potential analyses (Pujols can't hit the outside pitch, plays badly on turf, etc.). If he can't, stats will show it.

At Friday, December 14, 2007 12:46:00 AM, Blogger jinaz said...

I guess I see scouts adding quite a bit more than you indicate. I think they're particularly helpful, as others have noted, when one needs to:
* Evaluate players with limited sample sizes (a chronic problem).
* Evaluate players in contexts other than the majors (again, major problem).
* Identify the underlying basis for numerical findings (has Andruw Jones suffered a major skill loss, or was there something wrong with his mechanics, or was he just unlucky, or...?).

I'm a stathead, of course, so if I were in charge of a team, my tendency would be to focus on the numbers. But if I had access to high-quality, quantified scouting report data, I'd certainly want to integrate those data directly into my projections and player evaluations. Whether that's sabermetrics or scouting at that point ... I dunno. I don't much care, either, if it makes my estimates better.

Anyway, with respect to scouting predictions and tests, the Rule 5 is a good place to start, as I've often seen it said that that's a draft that comes down largely to scouting. Here's an n=1 study in the making:

Carlos Guevara was taken from the Reds by the Florida Marlins in last week's Rule 5 draft. He's pitched in AA the past two seasons and has put up excellent stats, especially his strikeout totals. This would make one wonder why the Reds, who have had terrible bullpens since 2003, wouldn't have promoted him to AAA last season, much less protected him in the Rule 5.

The reason, scouts will tell you, is that his strikeout totals are largely due to his screwball. And they believe that this "trick" pitch will not be effective at all against higher-quality hitters.

Well, the Marlins have to keep him on their 25-man roster next season or sell him back to the Reds, which means he'll probably get a decent trial next year. Will he be successful against major league hitters? Most stat-based projections would say he has a shot. Scouts say he'll get crushed.

On the other side of the coin, the Reds took this guy in the Rule 5 after losing Guevara. If that's not a pure scouting decision, I don't know what is. But apparently they think he has a better chance to survive in the majors next season than Guevara.

Proof will be in the pudding...

At Friday, December 14, 2007 2:29:00 AM, Blogger Phil Birnbaum said...

Hi, Justin,

As Tango said in the link at the end of my post, the "screwball" scout is actually doing some intuitive sabermetrics ... he's inferred that pitchers with trick pitches don't make the jump to the majors well. And that's something that can be checked out objectively, if you have the right data.

In this case, it's not a case of the scout seeing something subjective, but of him using a rule. Any scout (or layman) armed with the same rule could reach the same conclusion. "Subjective is objective" here, as Tango puts it.

The value of the scouts in this case is that their expertise and experience were able to produce this new knowledge.

A parallel is the doctor who is so experienced at seeing anemia patients that he can tell 90% of the time whether a patient has anemia or not. He is widely sought-after for his accuracy.

But then, another doctor invents a test for anemia that works 99% of the time. Doctor A is no longer valuable -- unless the data (test results) are not available.
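To see how big the gap between 90% and 99% really is, here's a hedged back-of-the-envelope calculation. I'm reading "90% accurate" as 90% sensitivity and 90% specificity, and assuming anemia shows up in 10% of this doctor's patients; both numbers are illustrative assumptions, not part of the original analogy.

```python
# Positive predictive value: the chance that a positive call is
# actually correct, given sensitivity, specificity, and prevalence.
# This is just Bayes' rule applied to a 2x2 diagnostic table.
def ppv(sensitivity, specificity, prevalence):
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

doctor_a = ppv(0.90, 0.90, 0.10)  # positive calls right 50% of the time
lab_test = ppv(0.99, 0.99, 0.10)  # positive results right ~92% of the time
```

At those assumed numbers, a positive call from Doctor A is right only half the time, while a positive result from the 99% test is right about 92% of the time. Small-sounding accuracy gaps can be huge in practice.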

But even if Doctor A's specialty is obsolete, *doctors in general* are still valuable, because they invent better and better tests.

At Friday, December 14, 2007 2:40:00 AM, Blogger Phil Birnbaum said...

Maybe I should phrase my main point more clearly. Scouts have value in two ways:

1. They come up with new hypotheses that, once verified, become new knowledge about baseball; for example, "AA pitchers who strike out a lot of guys with trick pitches won't do well in the majors."

2. When reliable, interpretable statistics are not available, their observations substitute for the numbers.

My post was mostly about (2); I mentioned (1) only in the P.S. But they're both important.

At Friday, December 14, 2007 2:53:00 AM, Blogger Phil Birnbaum said...

Okay, one more point. Suppose you had the best scouts make MLB player predictions for next year. You then test those predictions against a mechanical statistical system like Marcel (or PECOTA, or whatever).

Would the scouts win? I think they wouldn't.

But let's make things a little easier on them. Suppose instead that you actually gave the scouts the actual Marcel predictions, and just asked them to change any they disagreed with. Would their predictions be an improvement? I think they'd break even -- some would be better, some would be worse. I don't think they would be much different from luck.

Do you agree? If you do, I will now ask one more question:

If the statistical record, with sabermetric methods, is so good that even the scouts can barely improve on it at the major-league level, then doesn't it follow that, if we only had better stats and methods at the college and high-school levels, the scouts couldn't beat those either?

Scouts are still valuable because we don't have enough data yet, and because we don't have good enough methods yet.
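Since I keep leaning on Marcel, it's worth noting how mechanical it really is: a few lines of code. This is a simplified sketch, not the real thing; the 5/4/3 weights follow Tango's published recipe, but the 1200-PA regression amount is a round number of my own choosing, and I've left out the age adjustment.

```python
def marcel_rate(seasons, league_rate, weights=(5, 4, 3), reg_pa=1200):
    """Project a single rate stat from up to three past seasons.

    seasons is a list of (pa, rate) pairs, most recent first. Recent
    seasons get heavier weights, and the estimate is pulled toward
    league_rate as if we'd also seen reg_pa plate appearances of
    league-average performance.
    """
    weighted_pa = sum(w * pa for w, (pa, _) in zip(weights, seasons))
    weighted_successes = sum(w * pa * r for w, (pa, r) in zip(weights, seasons))
    return (weighted_successes + reg_pa * league_rate) / (weighted_pa + reg_pa)
```

A hitter at .300 over three straight 600-PA seasons in a .260 league projects to about .294: mostly his record, nudged toward the mean.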

At Friday, December 14, 2007 3:07:00 AM, Anonymous Anonymous said...

I think I can agree with your theoretical premise, but I have a very hard time imagining the utopian(?) future in which every high school and college field in the USA, and perhaps a dozen other countries, is outfitted with the 2047 version of pitchf/x technology! I think scouting will be around for quite some time, especially as the game gets more and more international.

At Friday, December 14, 2007 10:41:00 AM, Anonymous Anonymous said...

I'm usually a fan of Phil's thought experiments, but I don't like the bias he introduces here. In any event, I think a Pujols single does look different from a pitcher's single. The bias is that I don't like it that you are only showing 30 K for Pujols and also only 40 K for some bad hitter.

A better example is with fielders. There are no "stats" for a fielder, other than errors. And we have a real-life example here: the Fans' Scouting Report. And Fans, absent most stats, will nail the evaluations.

Finally, I'm way behind on presenting the Community Forecasts for hitting, exactly as Phil is suggesting. We'll see how they did, relative to Marcel.

At Friday, December 14, 2007 10:42:00 AM, Anonymous Anonymous said...

Obviously, I meant 40 K for both.

At Friday, December 14, 2007 11:09:00 AM, Blogger Phil Birnbaum said...

"The bias is that I don't like it that you are only showing 40 K for Pujols and also only 40 K for some bad hitter."

But doesn't that imply that much of scouting is just counting statistics? Or maybe I don't understand what you mean.

At Friday, December 14, 2007 12:39:00 PM, Anonymous Anonymous said...

What if you were to give a scout Pujols' 40 best statistical performances (all his HR) and his 40 worst (his K), and give him a pitcher's 40 best performances (singles) and 40 worst (K)? Then, a scout will see how Pujols really crushes the ball, and is not helpless even as he K's. Basically, the scout and the numbers guy see the same things.

Or, in your case, why not give the stats guy ONLY the same PA you gave the scout guy. Then what?

You are withholding certain things from the scout. Pujols works the count fantastically, but you are not showing the scout all those times he does so; you're showing only a subset of his performance, and at that, one in which he's likely behind in the count more often than he normally would be.

I think a better example would have been the fielding, where you have neither a selective sampling issue, nor data to pollute the scout.

At Friday, December 14, 2007 9:34:00 PM, Blogger Phil Birnbaum said...

Tango, I'm still not sure I understand. The scouts would know they might be seeing too many of Pujols' strikeouts, and they would be free to infer such on the basis of what they did see -- i.e., that his strikeouts are of the intelligent, non-helpless variety.

When a scout sees a high-school player over, say, four games and 20 AB, he similarly knows that the frequency of what he's seeing is close to random. And so he has to make inferences based on things that are consistent over those 20 AB, such as how he works the count.

The thought experiment is designed to simulate the process of evaluating a high-school player, or any player when you don't have any reliable stats for him.

To put this another way, you say:

"... why not give the stats guy ONLY the same PA you gave the scout guy. Then what?"

Then the stats guy has NO information about the players. What I'm trying to do is create a symmetry between the scout and the stats guy.

The scout sees HOW the events happen, but not HOW OFTEN they happen. The stats guy sees HOW OFTEN they happen, but not HOW they happen.

Your example of the scout counting how often Pujols takes a pitch is an example of the scout adding to his "how often" base of information. Which is really a way of saying the scout is compiling statistics.

And I agree with you that that's valuable, but it's a data-collection exercise. The scout is basically acting like a micro-Retrosheet.

At Monday, December 17, 2007 12:18:00 PM, Blogger jinaz said...

I was thinking that there's another possible function of scouts that we haven't mentioned: forecasting injuries, especially among pitchers. You certainly hear predictions about future arm problems on certain pitchers all the time due to their mechanics (e.g. Tim Lincecum). It'd be interesting (though challenging...) to test those predictions at some point and see if they hold any water. Because variation in injury risk might be something that's very difficult to predict based on traditional stats, or even pitchf/x data (though I imagine release point data might be useful).

