Netflix, the big rent-movies-by-mail company, has a website that tries to guess which movies you’ll like based on your evaluations of movies you’ve already seen. They have now announced that they will award a million dollars to anyone who can improve their prediction algorithm by 10%. It’s open to anyone, including sabermetricians, anywhere in the free world except Québec.
Here’s how it works. When you register at the site, you download a “training” database of 100,000,000 assorted movie ratings (1 to 5 stars, integers only) from 450,000 different customers. You then devise the best algorithm you can to accurately predict some of those ratings from other of those ratings. When you’re happy with your algorithm, you run it on a “final exam” database of 2,800,000 movies from those same customers (with the actual ratings not provided). You send Netflix your 2,800,000 estimates, they compare it to the “real” ratings to calculate your accuracy, and post your results on the site.
If you beat Netflix’s standard error by more than 10%, you win a million bucks. If you can’t beat 10%, but you’re still the best for the year, you get $50,000.
My first reaction is that reducing the standard error by 10% could very easily be impossible. I’m pretty sure it’s impossible to chop 10% off the standard error of, say, Runs Created, because there’s a minimum level of inherent noise in the data, and Runs Created is already within 10% of it. You could figure this out mathematically – take a typical batting line, scramble its at-bats around randomly, and see how many runs score. Repeat a couple of hundred thousand times. Find the standard error of all your results. That’s the minimum intrinsic error of any stat that’s based on a batting line. If that’s more than 90% of the standard error of Runs Created, well, you can’t win the million dollars.
To say that another way, suppose a team matching Derek Jeter’s batting line scored 80 runs half the time, and 90 runs the other half. In that case, the best you could do would be to guess 85 runs, for a standard error of 5. If the rules of baseball are flaky enough to evenly split real life between 80 and 90, that’s your limit of accuracy, period. No matter how much money you were offered, it would be impossible to do any better.
And I think estimators like Runs Created are already butting up against that limit. I’m pretty sure that 10% is out of the question.
Can you do the same kind of calculation for Netflix data? Not the same way – there are decent simulations of baseball, but no reliable simulation of the human brain’s response to movies. But maybe by analyzing the hundred million ratings, and the statistical properties of the data, you might be able to come up with a persuasive argument that the million dollar threshold is unreachable. I don’t think they’d pay you for the argument, but I’d certainly buy you a beer.
Of course, I might be wrong about this. Someone has already taken 1% off, and the contest has been running for only a week …
(Thanks to Freakonomics for the pointer.)