This is the second entry in a series, GoodEnough For Me, a companion series to Extrapolator’s Smell’s Like Pujols. The original GoodEnough can be found here, and the latest entry in SMP can be read here.
We are just about two weeks into spring training, a time when sports writers are happy to be wearing short sleeves and sunglasses under the Florida sun instead of hats and coats and gloves and misery in places like Philadelphia, Boston, New York, Milwaukee, Cleveland, Cincinnati, and so on. (Mr. Thursday does not care for winter.)
And so these middle-aged men, so enlivened by the lovely weather, so happy to be away from nagging wives and copy editors, away from snow, so giddy at again being able to eat fresh crab in February, knowingly and intentionally pass their joy on to the readers of their columns, in the form of making you believe your team is better than it has been in the past. The best writers convince you every year. Joe Posnanski, one of baseball’s finest writers (so fine, in fact, that the Curious Mechanism keeps loose tabs on the Royals just so we can follow his columns) has made a habit every year of predicting, boldly, that the Royals will advance to the championship rounds. This year, he’s scaled back–no, no, no, not the postseason, he says, but they’ll be better! Other writers in every baseball city will convince their ballpark faithful of this. The young guys are coming along, the veterans are feeling healthy and spry, the new coaches are having a positive impact. The first Cactus and Grapefruit League games start today–games where the writers and fans alike will ignore all the screw-ups and concentrate only on the positives. Hope springs eternal in February.
Our first entry in this series took a cursory look at Similarity Scores for pitchers. Today, we take a more in-depth angle, and we issue our complaints. For purposes of explanation, we’ll use real-life examples–namely the starting pitchers of 2006.
For the record, we used any starting pitcher with at least 50 IP. We chose this number somewhat arbitrarily, but it’s still a good one, as it’s entirely possible that many of our rookies will fail to get significantly above that mark. We calculated the league averages for the pitchers, and we selected the best pitcher (Johan Santana) and the worst (Josh Towers). Here are some of their relevant numbers:
Player – W – L – ERA – IP – SO – BB
Santana – 19 – 6 – 2.77 – 233.67 – 245 – 47
Towers – 2 – 10 – 8.42 – 62 – 35 – 17
MLB Average – 9 – 8 – 4.50 – 144.33 – 101 – 47
It’s plainly obvious that Santana far exceeded the Major League averages, which, in turn, far exceeded the performance of Josh Towers. If we set Johan Santana as our basis of comparison, the average performance scores only a 795. By this measure, Towers scores a 772.
This is where we have a problem.
The Similarity Score says the MLB average was 79% as good as Santana. However, it also says Josh Towers was 97% as good as the MLB average. A pitcher 8 games under .500 is as good as 1 game above. 62 innings pitched are as good as 144. Double the ERA? No problem!
At least two reasons for this exist. First, SimScores value certain things very highly, and everything else very little. Of the 228 points subtracted from Santana to get Towers’ score, 200 came as a result of winning percentage and ERA. Similarly for the MLB average, of the 205 subtracted points, 187 came at the hands of those two statistics. Even though SimScores consist of wins, losses, winning percentage, ERA, games pitched, starts, complete games, innings pitched, hits allowed, strikeouts, walks, shutouts, and saves, close to 90% of the value of similarity scores comes from ERA and winning percentage.
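To make the "close to 90%" claim concrete, here is a small sketch that recomputes the share from the figures quoted above. It uses only the scores and deductions given in the text. The dictionary layout and the function name are ours, for illustration only.

```python
# Scores and two-stat deductions as quoted in the text (Santana = 1000).
BASELINE = 1000
quoted = {
    # name: (similarity score, points lost to winning percentage + ERA)
    "Towers": (772, 200),
    "MLB average": (795, 187),
}

def rate_stat_share(score, rate_points):
    """Fraction of all deducted points that came from winning percentage and ERA."""
    total_deducted = BASELINE - score
    return rate_points / total_deducted

shares = {name: rate_stat_share(*vals) for name, vals in quoted.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%} of deducted points")
```

Both shares land within a few points of 90%, which is the imbalance we are complaining about.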
However, these two stats are actually limited in their capabilities. Both are capped at 100 points. So, even though Josh Towers won only 16.7% of his games and the MLB won 52.9% of theirs, both had the same 100 points subtracted from them, as they paled in comparison to Santana’s 76.0% win rate. Likewise, the MLB’s ERA was so bloated next to the fine work of Santana that it cost the MLB 87 points. Towers’ 8.42 ERA was only 100 points worse than Santana’s 2.77, as the number was capped. Had it not been, Towers’ ERA would have subtracted 286 points from his total score.
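The effect of the cap can be sketched in a few lines. The 100-point cap comes from the text, but the conversion rates are an assumption on our part: one point per .002 of winning percentage and one point per .02 of ERA, which are the rates usually quoted for pitcher similarity scores and which reproduce the 87-point ERA figure for the MLB average. Under these assumed rates the uncapped numbers will not necessarily match ours above to the point (we quoted 286 for Towers' ERA), so treat the output as illustrative.

```python
CAP = 100  # stated in the text: both stats are capped at 100 points

# ASSUMED conversion rates (not stated in the text).
WPCT_PER_POINT = 0.002
ERA_PER_POINT = 0.02

def deduction(a, b, per_point, cap=None):
    """Points subtracted for the gap between two values, optionally capped."""
    points = abs(a - b) / per_point
    return min(points, cap) if cap is not None else points

def win_pct(wins, losses):
    """Winning percentage from a win-loss record."""
    return wins / (wins + losses)

# Records and ERAs from the table above.
santana = {"wpct": win_pct(19, 6), "era": 2.77}
others = {
    "MLB average": {"wpct": win_pct(9, 8), "era": 4.50},
    "Towers": {"wpct": win_pct(2, 10), "era": 8.42},
}

results = {}
for name, p in others.items():
    results[name] = {
        "wpct_capped": deduction(santana["wpct"], p["wpct"], WPCT_PER_POINT, CAP),
        "wpct_uncapped": deduction(santana["wpct"], p["wpct"], WPCT_PER_POINT),
        "era_capped": deduction(santana["era"], p["era"], ERA_PER_POINT, CAP),
        "era_uncapped": deduction(santana["era"], p["era"], ERA_PER_POINT),
    }

for name, r in results.items():
    print(name, {k: round(v, 1) for k, v in r.items()})
```

With the cap in place, Towers and the MLB average lose the same 100 points on winning percentage, even though their uncapped gaps to Santana are very different.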
Without caps, the MLB would have a SimScore of 780. Josh Towers would have 390.
While this seems like a more realistic reflection of single season performance, we do have a problem. Our two main statistics have gone from accounting for 85-90% of the score, to accounting for 90-95%. This is too much weight put on two stats, especially when both stats are rate stats. By this method, a pitcher who pitches very well over a short period of time would compare very favorably. Let’s imagine a September call-up starting pitcher (we’ll call him “Thursday”) who produced the following line:
Pitcher – W – L – GS – ERA – IP – SO – BB
Thursday – 4 – 1 – 6 – 3.50 – 39 – 42 – 22
This pitcher clearly performed well over his six starts, but even if every start had been Santana-like, he pitched only one-sixth as many innings, and his similarity score should reflect that. Instead, our hypothetical pitcher scores a 962. Thursday actually gains on Santana by having a better win percentage and by having fewer walks allowed, hits allowed, and losses. We must find some way to increase the value of cumulative statistics.
The most significant cumulative statistic is innings pitched. Let’s look at our SimScores if we dramatically increase the value of IP:
Pitcher – SimScore – Uncapped SimScore – Uncapped SimScore w/IP
Santana – 1000 – 1000 – 1000
MLB – 795 – 780 – 692
Towers – 772 – 390 – 222
Thursday – 962 – 962 – 771
As you can see, this has had a significant effect on the scores, dropping each pitcher other than Santana by an average of 149 points from his uncapped score. It also, we think, better reflects the value of each pitcher. An average major league pitcher isn’t quite 70% as good as Santana, and Josh Towers was barely worth one-fifth of Santana. The biggest eyesore is, again, Thursday, as his Santana-like performance for a single month is worth more than a decent pitcher for an entire year. That may be true, but we’re not confident of it.
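For anyone who wants to check the arithmetic, here is a short sketch built only from the table above. It compares the uncapped column with the IP-weighted column for the three non-Santana rows, which is where the 149-point average comes from. The variable names are ours.

```python
# Scores from the table: (capped, uncapped, uncapped with heavier IP weight).
table = {
    "MLB": (795, 780, 692),
    "Towers": (772, 390, 222),
    "Thursday": (962, 962, 771),
}
SANTANA = 1000  # the baseline every row is compared against

# Points lost when the heavier IP weighting is applied to the uncapped score.
drops = {name: uncapped - with_ip for name, (_, uncapped, with_ip) in table.items()}
average_drop = sum(drops.values()) / len(drops)

# Each pitcher's IP-weighted score as a fraction of Santana's.
relative = {name: round(with_ip / SANTANA, 3) for name, (_, _, with_ip) in table.items()}

print(drops)
print(f"average drop: {average_drop:.0f}")
print(relative)
```

The last line of output is the uncomfortable part: Thursday's one good month still rates ahead of a full season of league-average pitching.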
Regardless, we’re making progress, and we promise to get this weekly update back on Mondays where it belongs for next week.