Cabrera's MVP award was thought to be on the backs of "bad stats." Is this a bad thing in general? Photo AP via sportingnews.com
(First article in a series discussing baseball statistics, which I mostly wrote months ago and was waiting for downtime to post. As it happens, the posts I have in the can for months on end tend to get rather bloated; this one is over 3,000 words. Apologies in advance if you think that's, well, excessive.)
(Note: a good starting point/inspiration for this series was a post from February 2012 on ESPN-W by Amanda Rykoff, discussing some of the stats used in the movie Moneyball. Some of the stats we’re discussing in the next few posts are covered in her article).
The more you read modern baseball writing, the more often you see "modern" baseball statistics interspersed in sentences without definition or explanation, used to prove whatever point the writer is making. More and more, you need a glossary just to read the more Sabr-tinged articles out there. At the same time, these same writers are hounding the "conventional" statistics that defined the sport for its first 100 years, patently ridiculing those writers who dare use statistics like the RBI and (especially of late) the pitcher Win to state an opinion. This is an important trend change in baseball, since these modern statistics are increasingly used by writers to vote on year-ending (and career-defining) awards, and as these writers mature they pour into the BBWAA ranks that vote on the ultimate "award" in the sport: enshrinement in the Hall of Fame.
This year's AL MVP race largely came down to writers using "old-school" stats to value a player (favoring Miguel Cabrera and his triple-crown exploits) versus "new-school" stats (favoring Mike Trout, who may not have the same counting stats but put in a historic season in terms of WAR). And as we saw, the debate was loud, less than cordial, and only exacerbated a growing divide between older and newer writers. The same argument now plays out in Hall of Fame voting, and has gotten so derisive that some writers refuse to vote for anyone but their old-school, stat-driven pet candidates as a petulant reaction to new-school writers who, in some senses, can't see the forest for the trees.
A good number of the stats that have defined baseball for the past 100 years are still considered "ok" within context. Any of the "counting stats" in the sport say what they say: how many X's did player N hit in a season? Adam Dunn hit 41 homers in 2012, good for 5th in the league. That's great; without context you'd say he's having a good, powerful season. But look deeper and you realize he hit .204, he didn't even slug .500 with all those homers, and he struck out in more than a third of his plate appearances. And then you understand that perhaps home runs by themselves aren't the best indicator of a player's value or the quality of his season.
Let's start this series of posts with this topic:
What’s wrong with the “old school” baseball stats?
Most old-school stats are "counting" stats, and they are what they are. So we won't talk about things like R, H, 2B, 3B, HR, BB, K, SB/CS. There's context when you look at some of these numbers combined, or divided by games or at-bats (to get a feel for how often a player hits a home run, steals a base, or strikes a guy out). In fact, K/9, BB/9 and K/BB ratios are some of my favorite quick-evaluator statistics, especially when looking at minor league arms. But there are some specific complaints about a few of the very well-known stats out there. Let's discuss.
1. Runs Batted In (RBI). Or, as some Sabr-critics now call it, "Really Bad Stat." The criticism of the RBI is well summarized on its Wiki page; it is perceived more as a measure of the quality of the lineup directly preceding a hitter than as a measure of the hitter's own value. If you have a bunch of high-OBP guys hitting in front of you, you're going to get more RBIs no matter what you do yourself. Another criticism of the stat is stated slightly differently: a hitter also benefits directly from his position in the lineup. A #5 hitter behind a powerful #4 hitter will (in theory) have fewer RBI opportunities, since the #4 hitter should be cleaning up (no pun intended) the base-runners with power shots. Likewise, a lead-off hitter absolutely has fewer RBI opportunities than anyone else on the team; he leads off games with nobody on base, and every other time up he hits behind the two weakest hitters in the lineup.
I'm not going to vehemently argue for the RBI (the points above are inarguable). But I will say this: statistical people may not place value on the RBI, but players absolutely do. Buster Olney touched on this with an interesting piece in September that basically confirms it; ask major leaguers whether RBIs are important and you'll get an across-the-board affirmative. Guys get on base all the time; there's absolutely skill and value involved in driving runners home. Guy on 3rd with one out? You hit a fly ball or a purposefully placed grounder to 2nd base and you drive in that run. Players absolutely modify the way they swing in these situations in order to drive in that run, and RBI is really the only stat that accounts for such a situation. The Runs Created statistics (the original RC plus the wRC stats) don't account for this type of situation at all; they measure only hits and at-bats.
(As a side effect, the Ground-into-Double-Play statistic has a similar limitation to RBI: it largely measures how often runners were on base ahead of you rather than a hitter's ability to avoid hitting into double plays. But thankfully GIDP isn't widely used anywhere.)
2. Batting Average (BA): The isolated Batting Average is considered a "limited" stat because it measures a very broad hitting capability without giving much context to what the hitter contributes to the end goal (scoring runs). A single is treated the same as a home run in batting average, despite the huge difference between those two "hits" in terms of creating runs. Consider: would you rather have a .330 hitter with zero home runs on the season, or a .270 hitter who hit 30 home runs? Absolutely the latter; he's scoring more runs himself, he's driving in more runs for the team, and most likely, by virtue of his power, he's drawing more walks than the slap-hitting .330 guy. More properly stated, the latter hitter in this scenario is likely "creating more runs" for his team.
Statistical students of the game learned this limitation early on, and created two statistics that need to go hand in hand with Batting Average: On-Base Percentage (OBP) and Slugging Percentage (SLG). This is why you almost always see the "slash line" represented for hitters; it provides this context. But be careful about REPLACING batting average with these two numbers (or with OPS, which is On-Base Percentage plus Slugging). Why? Because Batting Average usually comprises about 80% of a player's On-Base Percentage. Even the highest-walk guys (guys like Adam Dunn or Joey Votto) only have walks comprising 17-18% of their OBP. Sort the league by OBP and then by BA, and the league leaders are almost always the same (albeit slightly jumbled). So the lesson is this: if someone says that "Batting Average is a bad stat" but then says that "OBP is a good stat," I'd question their logic.
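To make the slash-line arithmetic concrete, here's a minimal sketch; the two hitters and all of their counting stats are invented for illustration, echoing the .330-no-power versus .270-with-power comparison above:

```python
def slash_line(ab, bb, hbp, sf, singles, doubles, triples, hr):
    """Return (BA, OBP, SLG) from raw counting stats, rounded to 3 places."""
    hits = singles + doubles + triples + hr
    ba = hits / ab
    obp = (hits + bb + hbp) / (ab + bb + hbp + sf)
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    slg = total_bases / ab
    return round(ba, 3), round(obp, 3), round(slg, 3)

# Hypothetical slap hitter: .330 average, no home runs, few walks.
slap = slash_line(500, 30, 0, 0, 140, 22, 3, 0)
# Hypothetical power hitter: .270 average, 30 home runs, more walks.
power = slash_line(500, 70, 0, 0, 75, 28, 2, 30)
print(slap)   # (0.33, 0.368, 0.386)
print(power)  # (0.27, 0.36, 0.514)
```

Despite a 60-point batting-average deficit, the power hitter's OBP lands within 8 points of the slap hitter's while his slugging dwarfs it, which is exactly why the full slash line beats BA alone.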
Lots of people like to use the statistic OPS (OBP + SLG) as a quick, shorthand way of combining all of these stats. The caveat is this: is a "point" of on-base percentage equal to a "point" of slugging? No, it is not; the On-Base Percentage point is worth more because of what it represents. Per the correction provided in the comments, about 1.7 times more.
Coincidentally, wOBA attempts to fix all of these limitations of BA; we'll discuss it in part 2 of this series.
3. ERA: Earned Run Average. Most baseball fans know how to calculate ERA (earned runs divided by innings pitched, multiplied by nine) and regularly refer to it when talking about pitchers. So what's wrong with ERA?
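One quick aside on the arithmetic before the criticisms: ERA scales earned runs to a nine-inning basis, and box-score innings like "200.1" mean 200 innings plus one out, not a decimal. A minimal sketch (the example numbers are invented):

```python
def innings(box_ip):
    """Convert box-score IP notation (e.g. 200.1 = 200 innings plus 1 out)
    into true decimal innings."""
    whole = int(box_ip)
    outs = round((box_ip - whole) * 10)
    return whole + outs / 3

def era(earned_runs, ip):
    """ERA = 9 * earned runs / innings pitched (ip in true decimal innings)."""
    return 9 * earned_runs / ip

# 75 earned runs over 210 innings:
print(round(era(75, 210), 2))                 # 3.21
# 68 earned runs over "200.1" box-score innings:
print(round(era(68, innings(200.1)), 2))      # 3.05
```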
Specifically, ERA has trouble with situations involving inherited runners. If a starter leaves a couple guys on base and a reliever allows them to score, two things happen:
- those runs are charged to the starter, artificially inflating his ERA after he's left the game.
- those runs are NOT charged to the reliever, which artificially lowers his ERA despite his giving up hits that lead to runs.
ERA is also very ballpark- and defense-dependent: if you pitch in a hitter's park (Coors, Fenway, etc.) your ERA is inflated versus those who pitch in pitcher's parks (Petco, AT&T). Lastly, a poor defense leads to higher ERAs simply because balls that would normally be turned into outs become hits that lead to more runs. Both of these issues are addressed in "fielder independent" pitching stats (namely FIP), which are discussed in part III of this series.
A lesser issue with ERA is the fact that it is so era-dependent. League-average ERAs started incredibly high in the game's origin, then plummeted during the dead-ball era, rose through the 40s and 50s, bottomed out in the late 60s, rose slightly and then sharply during the PED era, and now are falling again as more emphasis is placed on power arms and small-ball. So how do you compare pitchers of different eras? The ERA+ statistic is great for this; it indexes a pitcher's ERA to his peers': a pitcher with a 110 ERA+ had an ERA roughly 10% better than the league average that particular year.
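The indexing itself is simple arithmetic. A sketch, with invented example numbers; note that the real Baseball-Reference figure also folds in a park factor, which I default to neutral here:

```python
def era_plus(pitcher_era, league_era, park_factor=1.0):
    """ERA+ = 100 * (league ERA * park factor) / pitcher's ERA.
    100 is exactly league average; higher is better."""
    return round(100 * league_era * park_factor / pitcher_era)

# A 3.64 ERA in a league averaging 4.00 is roughly 10% better than the league:
print(era_plus(3.64, 4.00))  # 110
# A league-average pitcher indexes to exactly 100:
print(era_plus(4.00, 4.00))  # 100
```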
4. Pitcher Wins. The much-maligned "Win" statistic's limitations are pretty obvious to most baseball fans and can be stated relatively simply: the guy who gets the "Win" is not always the guy who most deserves it. We've all seen games where a pitcher goes 7 strong innings but his offense gives him no runs, only to have some reliever throw 1/3 of an inning and get the Win. Meanwhile, pitchers get wins all the time when they've pitched relatively poorly but their offense explodes and hands the starter a lead too big to squander.
Those two sentences are the essence of the issue with Wins; to win a baseball game requires both pitching AND offense, and a pitcher can only control one of them (and his “control” of the game is lost as soon as the ball enters play; he is dependent on his defense to get a large majority of his outs, usually 60% or more even for a big strike out pitcher). So what value does a statistic have that only measures less than 50% of a game’s outcome?
The caveat to Wins is that, over the long run of a player's career, the lucky wins and unlucky losses usually average out. One year a guy may have a .500 record but pitch great; the next year he may go 18-3 despite an ERA in the mid-4.00s. I have to admit, I still think a "20-game winner" is exciting, and I still think 300 wins is a great Hall of Fame benchmark. Why? Because by and large, wins do end up mirroring a pitcher's performance over the course of a year or a career. The downside: with today's advances in pitcher metrics (to be discussed in part III of this series), we no longer have to depend on such an inaccurate statistic to determine how "good" a pitcher is.
Luckily the de-emphasis of Wins has entered the mainstream, and writers (especially those who vote for the end-of-year awards) have begun to understand that a 20-game winner may not necessarily be the best pitcher that year. This was completely evident in 2010, when Felix Hernandez won the Cy Young award despite going just 13-12 for his team. His 2010 game log is amazing: six times he pitched 7 or more innings, gave up 1 or fewer runs, and got a no-decision, and in nearly half his starts he still had a "quality start" (which we'll talk about briefly below). A more recent example is Cliff Lee's 2012 performance, in which he didn't get a win until July, taking 8 no-decisions and 5 losses in his first 13 starts. For the year he finished 6-9 with a 3.16 ERA and a 127 ERA+. Clearly Lee is a better pitcher than his W/L record indicates.
(Coincidentally, I did a study to try to “fix” pitcher wins by assigning the Win to the pitcher who had the greatest Win Percentage Added (WPA). But about 10 games into this analysis I found a game in April of 2012 that made so little sense in terms of the WPA figures assigned that I gave up. We’ll talk about WPA in part 4 of this series when talking about WAR, VORP and other player valuation stats).
5. Quality Starts (QS). Quality Starts aren't exactly a long-standing traditional stat, but I bring them up because of the statistic's ubiquity. It is defined simply: a start in which the pitcher goes 6 or more innings and gives up 3 or fewer earned runs. But immediately we see some issues:
- 6 IP and 3 ER works out to a 4.50 ERA, not exactly a "quality" ERA for a starter. In fact, a 4.50 ERA in 2012 would have ranked 74th out of 92 qualified starters.
- If a pitcher throws 8 or 9 complete innings but gives up a 4th earned run, he gets no credit for a quality start, even though a 9-inning complete game with 4 earned runs works out to a 4.00 single-game ERA, actually BETTER than the 4.50 the QS minimum allows.
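The definition boils down to a two-condition check, which also exposes the complete-game edge case from the second bullet (a sketch; the function name is my own):

```python
def quality_start(ip, earned_runs):
    """Official QS definition: at least 6 innings pitched, at most 3 earned
    runs allowed (ip in true decimal innings)."""
    return ip >= 6 and earned_runs <= 3

# The bare-minimum QS is a 4.50 single-game ERA, yet it qualifies...
print(quality_start(6, 3))  # True
# ...while a 9-inning complete game allowing 4 runs (a 4.00 ERA) does not:
print(quality_start(9, 4))  # False
```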
Why bring up QS at all? Because ironically, despite its limitations, the quality start is a pretty decent indicator of a pitcher's performance in larger sample sizes. Believe it or not, most of the time a quality start occurs, the pitcher (and the team) gets the win. Take our own Gio Gonzalez in 2012: he made 32 starts, 22 of them quality starts. His record? 21-8. Why does it work out this way? Because most pitchers, when you look at their splits in wins versus losses, have lights-out stuff in wins and get bombed in losses. Gonzalez's ERA in wins, losses and no-decisions (in order): 2.03, 5.00 and 4.32. And in the long run, most offenses that score 5 or more runs get wins. So your starter gives up 3 or fewer runs, hands things over to a bullpen that keeps the game close, your offense averages 4 runs and change ... and it adds up to a win.
I used to keep track of what I called “Real Quality Starts (rQS)” which I defined as 6 or more IP with 2 or fewer earned runs, with allowances for a third earned run if the pitcher pitched anything beyond the 6 full innings. But in the end, for all the reasons mentioned in the previous paragraph, this wasn’t worth the effort because by and large a QS and a rQS both usually ended up with a Win.
6. Holds: A "hold" has a definition very similar to the Save's, and thus has the same limitations as the Save (discussed in a moment). A game earlier this season best highlights the issues with holds, as discussed in this 9/21/12 post on the blog Hardball Times. Simply put: a reliever can pitch pretty poorly but still "earn" a hold.
Holds were created as a counting stat in the mid-1980s in order to have some way to measure the effectiveness of middle relievers. Closers have saves, but middle-relief guys had nothing. The problem is that the hold is a pretty bad statistic. It has most of the issues of the Save, which we'll dive into last.
7. Saves. I have "saved" the most preposterous statistic for last: the Save. The definition of a Save includes three conditions a reliever must meet: he finishes a game but is not the winning pitcher, he records at least one out, and he meets certain criteria regarding how close the game is or how long he pitches. The problem is that the typical "save situation" is not really that taxing on the reliever; what pitcher can't manage to protect a 3-run lead when given the ball at the start of the ninth inning? You can give up 2 runs, still finish the game, have a projected ERA of 18.00 for the outing, and still get the save. Ridiculous. And that's nothing compared to the odd situation in which a reliever can pitch the final 3 innings of a game, irrespective of the score, and still earn a save. In the biggest blow-out win of the last 30 years or so (Texas's 30-3 win over Baltimore in 2007), Texas reliever Wes Littleton got a save. Check out the box score.
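For reference, the save rule's entry conditions can be sketched roughly as follows. The parameter names are my own, and I'm omitting the official scorer's judgment calls (like pitching "effectively" over a three-inning stint):

```python
def earns_save(finished_game, is_winning_pitcher, outs_recorded,
               lead_on_entry, innings_pitched, tying_run_within_reach):
    """Rough sketch of the save rule. tying_run_within_reach means the
    tying run was on base, at bat, or on deck when the reliever entered."""
    if not finished_game or is_winning_pitcher or outs_recorded < 1:
        return False
    return (
        (lead_on_entry <= 3 and innings_pitched >= 1)  # lead of 3 or fewer, pitch a full inning
        or tying_run_within_reach                      # enter with the tying run within reach
        or innings_pitched >= 3                        # or just pitch the final 3 innings, any lead
    )

# Wes Littleton in the 30-3 blowout: 3 innings with a 27-run lead still qualifies.
print(earns_save(True, False, 9, 27, 3, False))  # True
# Giving up 2 runs of a 3-run ninth-inning lead qualifies too:
print(earns_save(True, False, 3, 3, 1, False))   # True
```

That last case is the 18.00-projected-ERA save described above; the rule simply doesn't care how ugly the inning was.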
I wrote at length about the issue with Saves in this space in March of 2011, and Joe Posnanski wrote the defining piece criticizing Saves and the use of closers in November 2010. Posnanski's piece is fascinating; my biggest takeaway from it is that teams today win games at exactly the same rate (with specialized setup men and closers) that they did in the 1950s (when you had starters and mop-up guys).
I think perhaps the most ridiculous side effect of the Save is how ingrained it has become in baseball management. Relievers absolutely want saves because they're counting numbers they can use at arbitration and free-agency hearings to command more salary (I touched on this in a blog post about playing golf with Tyler Clippard this fall; he absolutely wanted to be the closer because it means more money for him in arbitration). Meanwhile, there are managers out there who inexplicably leave their closer (often their best reliever, certainly their highest paid) out of tie games in late innings because ... wait for it ... it's not a save situation. How ridiculous is it that a statistic now alters the way some managers handle their bullpens?
What is the solution? I think there's absolutely value in trying to measure a high-leverage relief situation, a "true save" or a "hard save." Just off the top of my head, I'd define the rules as follows:
- there can only be a one-run lead if the reliever enters at the top of an inning
- if the reliever enters in the middle of an inning, the tying run has to at least be on base.
- the reliever cannot give up a run or allow an inherited run to score.
Now THAT would be a save. Per the Wikipedia page on the Save, Rolaids started tracking a "tough save" back in 2000 and uses it to help award its "Fireman of the Year" award, but a search online shows that the stats are out of date (dated 9/29/11), indicating that either they only calculate Tough Saves annually or they've stopped doing so. Most likely the former, frankly.
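The three proposed rules could be sketched as a simple checker, entirely hypothetical and with names of my own invention:

```python
def hard_save(entered_at_inning_start, lead_on_entry,
              tying_run_on_base, runs_allowed, inherited_runs_scored):
    """Sketch of the 'hard save' rules proposed above:
    - entering at the start of an inning requires exactly a one-run lead;
    - entering mid-inning requires the tying run to already be on base;
    - the reliever may not allow a run or let an inherited runner score."""
    if runs_allowed > 0 or inherited_runs_scored > 0:
        return False
    if entered_at_inning_start:
        return lead_on_entry == 1
    return tying_run_on_base

# A clean ninth protecting a one-run lead: a hard save.
print(hard_save(True, 1, False, 0, 0))   # True
# A clean ninth protecting a three-run lead: routine, not a hard save.
print(hard_save(True, 3, False, 0, 0))   # False
```

Under these rules, Littleton's 27-run-lead "save" obviously wouldn't qualify, and neither would the give-up-two-runs-but-hang-on variety.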
Phew. With so many limitations in the stats that have defined Baseball for more than a century, it's no surprise that a stat-wave has occurred in our sport, with smart people looking for better ways to measure hitters and pitchers.
Next up is a look at some of the new-fangled hitting stats we see mentioned in a lot of modern baseball writing.