Thursday, February 14, 2013

Explaining the MLB Forecaster's Over/Under Projections

THE IDEA

The idea for the MLB Forecaster's projections of an over/under line came from two different places. At Christmas I was given the Nate Silver book "The Signal and the Noise". Silver gained superstar status in 2012 when his 538.com blog correctly projected fifty out of fifty states in the U.S. presidential election. But prior to moving into the realm of political projections, Silver had been an above-average online poker player (like myself) and a fan of major league baseball.

Silver developed the PECOTA projection system (which he later sold) to forecast baseball players' career trends. Without getting into the details of PECOTA here, the basic idea was this: in sports, as with anything in life, you can't expect to predict things perfectly. What you can do is work through an array of possibilities, and see where things average out.

How could this be applied to projections for a team? The idea then came to me in a conversation with a friend of mine about the Blue Jays' fortunes for 2013. The Jays made a huge splash in the off-season by netting R.A. Dickey, Melky Cabrera, Jose Reyes, Josh Johnson, Mark Buehrle and Emilio Bonifacio without giving up any significant roster pieces. Vegas placed the Jays as the new World Series favourites. But my friend, who is even more pessimistic than I am when it comes to sports, said "Well, this is all great...but what if Morrow gets hurt? Or what if Bautista doesn't come back from his wrist surgery as the same player? What if Romero stinks again?"

That got me thinking...yes, what if? But against those pessimistic what-ifs, there are also corresponding optimistic what-ifs for each team. So I thought that it would be a good idea to run through different permutations for players, and see how those permutations combined together to project a team's overall success.

(As a side note, I am not by any means claiming originality on this...ZiPS and countless other projection systems, including presumably the ones used by the Vegas bookmakers, already have far more scientific methods of setting W/L records for teams.)

HOW IT WORKS

Using baseball-reference.com's WAR statistics, I have projected expected WAR levels for each team's starting lineup, bench, starting rotation, closer, and bullpen. But it's not as simple as just picking a number for each player. For each one I've projected ten tranches, then averaged those tranches into three groups (poor, medium, good). This way we can answer the questions: what does a bad Jose Bautista season look like? How about a career year for Adam Jones? How much value does Derek Jeter have if age catches up to him this year? What if it doesn't?

By then combining these numbers into every different permutation, we can see how many wins above replacement a team is expected to get. For example, once in a blue moon, every single player (plus the bench and bullpen) will have a terrible season by his own standards. In other cases, 14 of the 17 might. Welcome to the 2012 Boston Red Sox and the 2012 Miami Marlins.
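To make the permutation idea concrete, here is a minimal sketch in Python. It is not the actual spreadsheet behind these projections. It assumes that each roster slot has exactly three WAR outcomes (poor, medium, good), that the three are equally likely, and that slots are independent of one another; the function name and the sample numbers are made up for illustration. Rather than listing every permutation one at a time (17 slots with 3 outcomes each is over 129 million combinations), it adds one slot at a time to a running distribution, which gives the same totals.

```python
from collections import defaultdict
from fractions import Fraction


def team_war_distribution(slots):
    """Combine per-slot WAR outcomes into a distribution of team WAR totals.

    slots: list of (poor, medium, good) WAR tuples, one per roster slot.
    Assumption: outcomes within a slot are equally likely and slots are
    independent. Returns a dict mapping total WAR -> probability.
    """
    dist = {0.0: Fraction(1)}
    for outcomes in slots:
        weight = Fraction(1, len(outcomes))
        updated = defaultdict(Fraction)
        for total, prob in dist.items():
            for war in outcomes:
                # Round to one decimal so equal totals share a single key.
                updated[round(total + war, 1)] += prob * weight
        dist = dict(updated)
    return dist


# Hypothetical two-slot "team": a star hitter and a starting pitcher.
example = team_war_distribution([(1.0, 3.0, 5.0), (0.0, 2.0, 4.0)])
for total in sorted(example):
    print(total, example[total])
```

With only two slots there are nine permutations, and three of them land on a total of 5.0 WAR, so that total carries a probability of 1/3.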

What the MLB Forecaster attempts to pinpoint is the midpoint of that distribution of projected win totals. If a team crosses the 50% mark at, say, the 90-win point, then I believe that you should bet the over if the team is listed at 85.5 by the Vegas bookmakers, and the under if it is listed at 95.5.
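The rule above can be sketched in a few lines of Python. This is an illustration only: the distribution is invented, the function names are hypothetical, and it ignores the bookmaker's vig and how far the line sits from the midpoint.

```python
def median_wins(dist):
    """Return the win total at which cumulative probability first reaches 50%.

    dist: dict mapping win total -> probability (assumed to sum to 1).
    """
    cumulative = 0.0
    for wins in sorted(dist):
        cumulative += dist[wins]
        if cumulative >= 0.5:
            return wins
    raise ValueError("probabilities sum to less than 0.5")


def pick(dist, line):
    """Compare the projected midpoint with the posted over/under line."""
    midpoint = median_wins(dist)
    if midpoint > line:
        return "over"
    if midpoint < line:
        return "under"
    return "no bet"


# Made-up distribution that crosses the 50% mark at 90 wins.
sample = {80: 0.1, 85: 0.2, 90: 0.4, 95: 0.2, 100: 0.1}
print(median_wins(sample))   # 90
print(pick(sample, 85.5))    # over
print(pick(sample, 95.5))    # under
```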

DOES WAR ACTUALLY CORRELATE TO A TEAM'S WINS?

Over the past five years, WAR has had a definite correlation to a team's Pythagorean W-L expectancy. It has also had a strong correlation to the actual number of games a team wins, with one adjustment: the American League gets docked 2 wins from its total for difficulty of schedule, while the National League gets 2 wins added. I may have to make further adjustments for the stacked American League East.
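As arithmetic, the league adjustment is just a flat shift. A small Python sketch follows, assuming the input is a win total already implied by team WAR; the function and table names are hypothetical, and no separate adjustment for the American League East is included because I haven't settled on one.

```python
# Flat schedule-difficulty adjustment described above, in wins.
LEAGUE_ADJUSTMENT = {"AL": -2.0, "NL": 2.0}


def adjust_for_league(war_implied_wins, league):
    """Shift a WAR-implied win total by the league's schedule adjustment."""
    return war_implied_wins + LEAGUE_ADJUSTMENT[league]


print(adjust_for_league(90.0, "AL"))  # 88.0
print(adjust_for_league(90.0, "NL"))  # 92.0
```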

The leaguewide win total always correlates with the leaguewide WAR total, which is the only way that the statistic provides any value. For any given team, however, there can be extreme exceptions as to whether team WAR matches the number of games actually won. Over the past ten seasons, the majority of teams have finished within a margin of error of plus or minus 3 wins. Outliers do occur every once in a while, though. In 2012, for example, the Baltimore Orioles had a team WAR of 83.5. Their Pythagorean W-L expectancy was 84 wins. They in fact won 93 games. This could be the result of luck or perhaps a certain moxie when it comes to close ballgames. Regardless, there is no point, from my perspective, in allowing for these discrepancies...the statistics show that they balance themselves out over the course of time.

So when I say that a team only finishes with, say, under 75 wins in 6.7% of our projections, what I'm really saying is that, with even luck, they finish with a WAR expectancy of under 75 wins 6.7% of the time. They could finish with a WAR expectancy of 75 but actually win only 68 games because of poor luck, or they could similarly win 82 games because of good luck. But from a probabilities perspective, I can only forecast neutral luck.
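A figure like "under 75 wins in 6.7% of our projections" is simply the share of the projected distribution that falls below that mark. A minimal Python sketch, using an invented distribution and a hypothetical function name, and assuming neutral luck as described above:

```python
def prob_under(dist, threshold):
    """Probability that the WAR-implied win total finishes below threshold.

    dist: dict mapping win total -> probability (assumed to sum to 1).
    """
    return sum(prob for wins, prob in dist.items() if wins < threshold)


# Made-up distribution of WAR-implied win totals for one team.
projection = {70: 0.05, 74: 0.10, 78: 0.25, 82: 0.30, 86: 0.20, 90: 0.10}
print(f"{prob_under(projection, 75):.1%}")
```

Actual wins can still land several games on either side of the WAR-implied total because of luck, which this calculation deliberately leaves out.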

FLAWS IN THE SYSTEM

There are two major flaws in this projection system. The first is legitimate, the second somewhat dubious:

1. A team either in contention early on or out of contention early on could significantly change their roster throughout the year.

2. The projections treat each player as if he were isolated. Baseball is the only sport that could even conceive of this. A similar projection for basketball, for example, would make no sense...if LeBron James were injured and finished with a bottom-10% LeBron James season, it stands to reason that the rest of his team would also suffer. But baseball is essentially an individual sport masquerading as a team one.

With that said, there is the possibility that slumps and/or hot streaks can infect an entire team. Players on a disappointing team might be less willing to return quickly from an injury. The bullpen of a team with underachieving starting pitchers might be taxed from their extra workload. All of these are legitimate ways in which a team's fortunes can snowball in either positive or negative directions.

But for the sake of placing a bet on a team's over/under total before the season starts, we are all equally ignorant. We can only look forward based on the information we have now. So with all of that said, it's time to start projecting.
