Tuesday, August 26, 2014

Sabermetrics vs. second-hand knowledge

Does the earth revolve around the sun, or does the sun revolve around the earth?

The earth revolves around the sun, of course. I know that, and you know that.

But do we really? 

If you know the earth revolves around the sun, you should be able to prove it, or at least show evidence for it. Confronted by a skeptic, what would you argue?  I'd be at a loss. Honestly, I can't think of a single observable fact that I could use to make a case.

I say that I "know" the earth orbits the sun, but what I really mean by that is, certain people told me that's how it is, and I believe them. 

Not all knowledge is like that. I truly *do* know that the sun rises in the east, because I've seen it every day. If a skeptic claimed otherwise, it would be easy to show evidence: I'd make sure he shared my definition of "east," and then I'd wake him up at 6 am and take him outside.

But that sun/earth thing? I only say I "know" it because I believe that astronomers *truly* know it, from direct evidence.


It occurred to me that almost all of our "knowledge" of scientific theories comes from that kind of hearsay. I couldn't give you evidence that atoms consist, roughly, of electrons orbiting a nucleus. I couldn't prove that every action has an equal and opposite reaction. There's no way I could come close to figuring out why and how e=mc^2, or that something called "insulin" exists and is produced by the pancreas. And I couldn't give you one bit of scientific evidence for why evolution is correct and not creationism. 

That doesn't stop us from believing, really, really strongly, that we DO know these things. We go and take a couple of undergraduate courses in, say, geology, and we write down what the professors tell us, and we repeat them on exams, and we solve mathematical problems based on formulas and principles we are told are true. And we get our credits, and we say we're "knowledgeable" in geology. 

But it's a different kind of knowledge. It's not knowledge that we have by our own experience or understanding. It's knowledge that we have by our own experience of how to evaluate what we're told -- how and when to believe other people. We extrapolate from our social knowledge. We believe that there are indeed people, "geologists," who have firsthand evidence. We believe that evidence gets disseminated among those geologists, who interact to reliably determine which hypotheses are supported and which ones are not. We believe that, in general, the experts are keeping enough of a watchful eye on what gets put in textbooks and taught at universities, that if Geology 101 was teaching us falsehoods, they'd get exposed in a hurry.

In other words, we believe that the system of scientists and professors and Ph.D.s and provosts and deans and journals and textbook publishers is a reliable separator of truth from falsehood. We believe that, if the earth really were only 6,000 years old, that's what scientists would be telling us.


Most of the time, it doesn't matter that our knowledge is secondhand. We don't need to be able to prove that swallowing arsenic is fatal; we just need to know not to do it. And, we can marvel at Einstein's discovery that matter and energy are the same thing, even if we can't explain why.

But it's still kind of unsatisfying. 

That's one of the reasons I like math. With math, you don't have to take anyone's word for anything. You start with a few axioms, and then it's all straight logic. You don't need geology labs and test tubes and chemicals. You don't need drills and excavators. You don't actually have to believe anyone on indirect evidence. You can prove everything for yourself.

The supply of primes is infinite. No matter how large a prime you find, there will always be one larger. That's a fact. If you like, you can look it up on the internet, or ask your math teacher, or find it in a textbook. It's a fact, like the earth revolving around the sun.

If you do it that way, you know it, but you don't really KNOW it. You can't defend it. In a sense, you're believing it on faith. 

On the other hand, you can look at a proof. Euclid's proof that there is no largest prime number is considered one of the most elegant in mathematics. The versions I found on the internet use a lot of math notation, so I'll paraphrase.


Suppose you have a really big prime number, X. The question is: is there always a prime bigger than X?  

Try this: take all the numbers from 1 to X, and multiply them together: 1 times 2 times 3 .... times X. Now, add 1. Call that really huge number N. That huge N is either prime, or is the product of some number of primes. 

But N can't be divisible by X, or by any number from 2 up to X, because dividing N by any of them always leaves a remainder of 1. Therefore: either N is prime, or, when you factor N into primes, they're all bigger than X. 

Either way, there is a prime bigger than X.
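If you'd rather watch the argument work than take it on faith, here's a minimal Python sketch of the construction above. The function names are made up for illustration. It builds N = 1 x 2 x ... x X + 1, checks that every number from 2 to X leaves a remainder of 1, and finds N's smallest prime factor by plain trial division, which is only practical for small X.

```python
from math import factorial


def smallest_prime_factor(n):
    """Return the smallest prime dividing n (n >= 2), by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d  # the first divisor found this way is always prime
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


def euclid_witness(x):
    """Build N = 1*2*...*x + 1 and return (N, smallest prime factor of N)."""
    n = factorial(x) + 1
    # Every number from 2 to x divides x!, so each leaves remainder 1 in N.
    assert all(n % d == 1 for d in range(2, x + 1))
    return n, smallest_prime_factor(n)


if __name__ == "__main__":
    for x in (3, 5, 7, 10):
        n, p = euclid_witness(x)
        print(f"X={x}: N={n}, smallest prime factor {p} > {x}")
```

For example, `euclid_witness(5)` returns `(121, 11)`. N = 121 isn't prime (it's 11 x 11), but its prime factor 11 is still bigger than 5, which is the "or" branch of the proof.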


I may not have explained that very well. But, if you get it ... now you know that there is no highest prime. If you read it in a book, you "know" it, but if you understand the proof, you KNOW it, in the sense that you can explain it and prove it to others.

In fact ... if you read it in a textbook, and someone tells you the textbook is wrong, you may have some doubt. But once you see the proof, you will *never* have doubt (except in your own logic). Even if the greatest mathematician in the world tells you there's a largest prime, you still know he's wrong. 


In theory, everything in math is like that, provable from axioms. In practice ... not so much. The proofs get complicated pretty quickly. (When Andrew Wiles announced his proof of Fermat's Last Theorem in 1993, it was 200 pages long.) Still, there are significant mathematical results that we can all say we know from our own efforts. For years, I wondered why multiplication goes both ways -- why 8 x 7 has to equal 7 x 8. Then it hit me -- if you draw eight rows of seven dots, and turn it sideways, you get seven rows of eight dots.
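The dots argument is easy to act out in code. In this small Python sketch (the helper names are just for illustration), "turning it sideways" is modeled as swapping rows and columns, which is what the turn does to the grid's shape. No dots are added or lost, so the count stays the same.

```python
def dot_grid(rows, cols):
    """A picture of `rows` rows, each holding `cols` dots."""
    return [["*"] * cols for _ in range(rows)]


def turn_sideways(grid):
    """Swap rows and columns, which is what turning the picture does to its shape."""
    return [list(column) for column in zip(*grid)]


def count_dots(grid):
    return sum(len(row) for row in grid)


if __name__ == "__main__":
    eight_by_seven = dot_grid(8, 7)
    seven_by_eight = turn_sideways(eight_by_seven)
    print(len(eight_by_seven), "rows of", len(eight_by_seven[0]))  # 8 rows of 7
    print(len(seven_by_eight), "rows of", len(seven_by_eight[0]))  # 7 rows of 8
    print(count_dots(eight_by_seven), count_dots(seven_by_eight))  # 56 56
```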

There are other fields like that ... you and I can know things on our own, fairly easily, in economics, finance, and computer science. Other sciences, like physics and chemistry, take more time and equipment. I can probably prove to myself, with a stopwatch and ruler, that gravitational acceleration on earth is 9.8 m/s/s, but there's no way I could find evidence of what it is on the moon. 
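The stopwatch-and-ruler check comes down to one formula. A dropped object falls a distance h = (1/2) g t^2, so g = 2h / t^2. Here's a minimal Python sketch that assumes air resistance is negligible. The drop height and times are hypothetical numbers, not real measurements. In practice, hand-timing error would swamp everything else, which is why it averages several drops.

```python
def g_from_drop(height_m, fall_time_s):
    """Solve h = (1/2) g t^2 for g, ignoring air resistance."""
    return 2 * height_m / fall_time_s ** 2


if __name__ == "__main__":
    # Hypothetical measurements: a ball dropped from 2.0 m, timed by hand three times.
    times = [0.63, 0.65, 0.64]
    avg_time = sum(times) / len(times)
    print(f"g is roughly {g_from_drop(2.0, avg_time):.2f} m/s/s")
```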

But: sabermetrics. What started me on all this is realizing that the stuff we know about sabermetrics is more like infinite primes than like the earth revolving around the sun. Active researchers don't just know sabermetrics because Bill James and Pete Palmer told us. We know because we actually see how to replicate their work, and we see, all the way back to first principles, where everything came from. 

I can't defend "e equals mc squared," but I can defend Linear Weights. It's not that hard, and all I need is play-by-play data and a simple argument. Same with Runs Created: I can pull out publicly-available data and show that it's roughly unbiased and reasonably accurate. (I can even go further ... I can take partial derivatives of Runs Created and show that the values of the individual events are roughly in line with Linear Weights.)
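Here's a sketch of that partial-derivative check, using Bill James's basic Runs Created formula, (H + BB) x TB / (AB + BB). Rather than differentiating symbolically, it adds one event at a time to a season line and measures how much RC changes. That finite difference stands in for the partial derivatives. The team totals below are invented for illustration, not real data, and the function names are hypothetical.

```python
def runs_created(h, bb, tb, ab):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


# How one more of each event changes (H, BB, TB, AB).
EVENTS = {
    "single": (1, 0, 1, 1),
    "double": (1, 0, 2, 1),
    "triple": (1, 0, 3, 1),
    "home run": (1, 0, 4, 1),
    "walk": (0, 1, 0, 0),
    "out": (0, 0, 0, 1),  # a hitless at-bat
}


def marginal_values(h, bb, tb, ab):
    """Change in RC from adding a single event: a finite-difference partial derivative."""
    base = runs_created(h, bb, tb, ab)
    return {
        name: runs_created(h + dh, bb + dbb, tb + dtb, ab + dab) - base
        for name, (dh, dbb, dtb, dab) in EVENTS.items()
    }


if __name__ == "__main__":
    # Hypothetical team season line: 1400 H, 500 BB, 2200 TB, 5500 AB.
    for event, value in marginal_values(1400, 500, 2200, 5500).items():
        print(f"{event:>8}: {value:+.3f} runs")
```

With this made-up line, a single comes out around +0.57 runs, a triple around +1.20, and an out a little over a tenth of a run negative. Running the same check on real team totals is how you'd compare the implied event values with Linear Weights.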

DIPS?  No problem, I know what the evidence is, there, and I can generate it myself. On-base percentage more important than batting average?  Geez, you don't even need data for that, but you can still do it formally if you need to without too much difficulty. 
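For the OBP question, the "no data needed" argument is that batting average ignores walks, so it can't tell you how often a hitter avoids making an out. This Python sketch uses two hypothetical hitters with identical .300 averages. It computes the standard OBP formula and a rough count of outs made at the plate (at-bats minus hits, which ignores errors and double plays).

```python
def batting_average(h, ab):
    return h / ab


def on_base_percentage(h, bb, hbp, ab, sf):
    """Standard OBP: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


if __name__ == "__main__":
    # Two hypothetical hitters, 600 plate appearances each, both hitting .300.
    hitters = {
        "A": dict(h=171, bb=30, hbp=0, ab=570, sf=0),
        "B": dict(h=150, bb=100, hbp=0, ab=500, sf=0),
    }
    for name, p in hitters.items():
        avg = batting_average(p["h"], p["ab"])
        obp = on_base_percentage(**p)
        outs = p["ab"] - p["h"]  # rough: ignores reaching on errors, double plays, etc.
        print(f"{name}: AVG {avg:.3f}  OBP {obp:.3f}  outs {outs}")
```

Batting average can't tell A and B apart, but B makes 49 fewer outs in the same number of trips to the plate.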

For my own part -- and, again, many of you active analysts reading this would be able to say the same thing --  I don't think I could come up with a single major result in sabermetrics that I couldn't prove, from scratch, if I had to. Even the ones from advanced data, or proprietary data, I'm confident I could reproduce if you gave me the database.

For all the established principles that are based on, say, Retrosheet-level data ... honestly, I can't think of a single thing in sabermetrics that I "know" where I would need to rely on other people to tell me it's true. That might change: if something significant comes out of some new technique -- neural nets, "soft" sabermetrics, biomechanics -- I might have to start "knowing" things secondhand. But for now, I can't think of anything.

If you come to me and say, "I have geological proof that the earth is only 6,000 years old," I'm just going to shrug and say, "whatever."  But if you come to me and say, "I have proof that a single is worth only 1/3 of a triple" ... well, in that case, I can meet you head on and prove that you're wrong. 

I don't really know that creationism isn't right -- I only know what others have told me. But I *do* know firsthand what a triple is worth, just as I *do* know firsthand that there is no highest prime. 


And that, I think, is why I love sabermetrics so much -- it's the only chance I've ever had to actually be a scientist, to truly know things directly, from evidence rather than authority.

I have a degree in statistics, but if nuclear war wiped out all the statistics books, how much of that science could I restore from my own mind?  Maybe, a first-year probability course, at best. I could describe the Central Limit Theorem in general terms, but I have no idea how to prove it ... one of the most fundamental results in statistics, one they teach you in your first statistics class, and I still only know it from hearsay.
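A simulation doesn't prove the Central Limit Theorem, but it does let you check its claim yourself instead of taking it on authority. This minimal Python sketch uses only the standard library, and its function names are illustrative. It averages 12 uniform(0,1) draws many times over. A single uniform(0,1) draw has mean 1/2 and variance 1/12, so the theorem predicts that the averages look roughly normal, with mean 1/2 and variance 1/144.

```python
import math
import random
import statistics


def sample_means(n, trials, seed=2014):
    """Return `trials` averages, each of `n` independent uniform(0,1) draws."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(trials)]


def clt_check(n=12, trials=20000):
    means = sample_means(n, trials)
    # Predicted spread of the averages: sqrt((1/12) / n).
    predicted_sd = math.sqrt(1 / (12 * n))
    observed_mean = statistics.fmean(means)
    observed_var = statistics.variance(means)
    # For a normal distribution, about 68.3% of values lie within one SD of the mean.
    within_one_sd = sum(abs(m - 0.5) <= predicted_sd for m in means) / trials
    return observed_mean, observed_var, within_one_sd


if __name__ == "__main__":
    mean, var, frac = clt_check()
    print(f"mean={mean:.4f}  variance={var:.6f} (predicted {1/144:.6f})  within 1 SD={frac:.3f}")
```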

But if nuclear war wipes out all the sabermetrics books ... as long as someone finds me a copy of the Retrosheet database, I can probably reestablish everything. Nowhere near as eloquently as Bill James and Palmer/Thorn, and I probably wouldn't think of certain methods that Tango/MGL/Dolphin did, but ... yeah, I'm pretty sure I could restore almost all of it. 

To me, that's a big deal. It's the difference between knowing something, and only knowing that other people know it. Not to put down the benefits of getting knowledge from others -- after all, that's where most of our useful education comes from. It's just that, for me, knowing stuff on my own ... it's much more fulfilling, a completely different state of mind. As good as it may be to get the Ten Commandments from Moses, it's even better to get them directly from God.



At Tuesday, August 26, 2014 6:19:00 PM, Blogger K. Medvedovsky said...

Just to give you a hard time, I'll point out that your "axiom" here is being given the datasets. Be it box score data, play-by-play data, or whatever, we're largely taking it on faith that they were compiled correctly.

That's different than the set of axioms we work with in number theory for instance. You can't have the set of real numbers without the 11 axioms that define the set. We're taking the axioms as given because they're literally givens.

The process of creating the Retrosheet database, meanwhile, was the result of tens of thousands of hours of human observation, which we're taking on faith was done (largely) correctly, the occasional Hack Wilson RBI aside.

In a less nitpicky sense (or maybe more?), as you note, you can't prove the central limit theorem (nor could I obviously). Doesn't that also mean, to the extent you rely on it, or other uncontroversially true statistical or mathematical principles, that it's similarly going on faith?

That said, I agree with the thrust of this post. Something like (x)(R)APM in basketball went from being something that smarter people than I relied on as a strong measure of player value to being something "real" only when I learned to actually replicate it, even if I have no idea why ridge regression works, and played no part in assembling the datasets.

I don't know why that's the line, but it's there.

At Tuesday, August 26, 2014 7:11:00 PM, Blogger Phil Birnbaum said...

Right, I was deliberately ignoring easily-observable facts (like play-by-play data). I guess I'm willing to take those as axioms, things that any observer can notice. Like, "A shot rang out and Kennedy slumped": I'm willing to accept that I know that, based on numerous eyewitnesses, but not "The trajectory of the bullet showed it came from the School Book Depository," which is a conclusion drawn from evidence.

I had that in the post, but decided to leave it out. That always comes back to bite me in the comments. :)

Sure, when I rely on CLT, I'm going on faith. Like not drinking arsenic. But ... as you use something, and you see how it connects to other things, you DO get a sense of "knowing" it. I guess, from the standpoint of induction, I can count on the many times it actually did prove to be true, in my own simulations.

But here's one that I still take on faith: that the variance of the uniform distribution on (0,1) is 1/12. I got that from Wikipedia. I could probably prove it, but never bothered.

At Tuesday, August 26, 2014 10:52:00 PM, Blogger Don Coffin said...

Phil, not to pick nits, but...about the economics thing...it's more like math than you may like to believe. And people believe, on the basis of their own experience, things that are completely wrong. For example, that being on the gold standard would lead to stable prices. (Keynes demolished that proposition back in 1921.) Or that government budgets are the same as household budgets--households can't "live beyond their means," and neither can governments. And more...

At Tuesday, August 26, 2014 10:59:00 PM, Blogger Phil Birnbaum said...


Sure, you can do economics wrong, just like you can do sabermetrics wrong ("RBIs are important"). But you can also do a lot of it right, from first principles, no?

At Wednesday, August 27, 2014 11:16:00 AM, Blogger Don Coffin said...

Yeah, but the first principles aren't necessarily intuitively obvious to the casual observer...much like math...

At Thursday, August 28, 2014 9:58:00 AM, Blogger Zach said...

"Most of the time, it doesn't matter that are knowledge is secondhand."

I think you mean "our knowledge" (just an fyi).

I've always wondered how much the datasets that sabermetrics is based on have been influenced by pre-sabermetric managing and scouting. It's probably not significant, but it's something I've wondered.

At Thursday, August 28, 2014 10:18:00 AM, Anonymous Bill Frank said...

Copernicus actually did prove that the Earth revolves around the Sun.

At Thursday, August 28, 2014 11:13:00 AM, Blogger Phil Birnbaum said...

Thanks, Zach, fixed the error. Bill Frank: Copernicus proved it, but I can't!

At Monday, September 01, 2014 8:09:00 PM, Anonymous Anonymous said...

Are you simply saying you are a sabermetrics expert, and not an expert in another field? Couldn't a geologist say the same thing you did, about geology? And sabermetrics to them would be like geology is to you.

At Wednesday, September 03, 2014 8:49:00 AM, Anonymous aweb said...

You got a degree in Statistics without having to prove the Central Limit Theorem? I find that odd. I couldn't reproduce it right now, but I definitely had to do that a few times in higher level Stats classes that I took (I also have a Stats degree or two). It was a painful proof - looking around, I used Moments and characteristic functions for it at some point.

My favourite type of Mathematics for proofs and the spooky way things work out, perfectly, every time, was always linear algebra (matrices).

At Thursday, October 23, 2014 3:22:00 AM, Blogger Mo said...

Phil, you need to write a book. Your writing is very entertaining (and enlightening).

I've often thought it would be cool to have a time machine and travel back a hundred years or so. Then I realize that there would be no radios, or TVs or phones, and I would know that we had them in the future, but I would have no idea how to build one.

Then again, if I had a time machine, I would make sure I learned how to invent something major before going back. Yes... then I could be famous like Farnsworth or Tesla.

Good stuff-- your posts often make me think.

