Ranking MLB Organizational Strength by Position

The charts that follow combine each team’s share of 2018 WAR at a given position with their share of Future Value (FV) at that same position (as determined by the Fangraphs BOARD prospect ratings). The sum of these two shares determines the relative positional strength at each position, which is how the charts are ordered. So, for example, the Indians have Francisco Lindor and a whole bunch of shortstop prospects. Lindor and friends combined for about 9% of the WAR at shortstop in 2018, and Cleveland’s plethora of SS prospects makes up 8% of the future value at SS as well. When we combine the two shares (9%+8%) we get 17% – far and away atop the shortstop position, and well in front of the runner-up Nationals.

This iteration reflects rosters as of 12/10/2018 – so Paul Goldschmidt’s share of WAR is embedded in the Cardinals’ portion of the 1B chart, and Patrick Corbin’s with the Nats. As you can see, a considerable share of 2018 Catcher and 2B WAR is still available on the FA market. I’ll have a longer post where I walk through each of these at some point, but it’s a little difficult at this juncture since they’re changing almost every day – with the M’s and Dbacks both in some kind of tear-down mode, and the bulk of free agents yet to find a team, these charts are far from what they’ll look like on Opening Day.

The Baseball-Nerdiest Cities in the US According to Google Trends

Here we’ll use data from Google Trends to determine the nerdiest baseball cities in the US. About 90% of the words in this post cover the basis and methodology of the analysis. If you just want to see the rankings, skip to the end.

Whenever I discover that someone I know is a baseball fan, I try throwing a few advanced metrics into the conversation just to gauge familiarity. I should preface this by mentioning I don’t think any level of familiarity with advanced metrics changes a person’s value as a baseball fan – whether we’re exporting Fangraphs data or listening to sports talk radio, passion is passion regardless of how we waste our time with something as pointless as baseball fandom. But while I enjoy baseball conversations with fans of all types simply because it’s baseball, I love getting insight from fellow stat geeks because I want to know what others value. There’s so much to learn from the massive collection of data generated by baseball that it’s impossible for one person to know everything on their own. The biggest problem I have throwing in the statistical jargon is simply the lack of bites on the other side of the conversation; I never get the long-awaited bWAR versus fWAR debate that I truly long for in casual chitchat. I see it on the message boards in droves. Fangraphs, Beyond the Box Score, and even Reddit all seem packed with geeks, so why isn’t the bar by Angel Stadium flooded with a few of the same people an hour before first pitch? This had me thinking about something that’s probably kind of dumb, but to me at least, is still very interesting…

Geographically speaking, where do I find all the baseball stat geeks? At the brewery nearest to which stadium am I most likely to find someone just as annoyed as I am about being unable to split half-seasons and years in the same export at Fangraphs?

Thanks to Google’s dominance in both search engine quality and creepy monitoring of our every move, Google Trends was my go-to resource for the data I collected. For the uninitiated, Google Trends is a way to measure the search interest in a particular term over time or space (geography), and to compare the interest of different terms to each other over those same dimensions. “Search interest” is probably better defined by the less-marketable term “search volume”, though the data produced by Google Trends isn’t a direct measure of volume like total searches or search percentage – it’s a 0-100 scale that controls for general search activity in a given area (similar to controlling for population). Now I probably could’ve simply looked up Fangraphs on Google Trends (which I did) and called it a day, but the lack of rigor made it seem shallow.
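Google doesn’t publish the exact formula behind its 0-100 scale, but a rough mental model looks something like the sketch below: divide each region’s searches for the term by that region’s total search activity, then rescale so the busiest region (relative to its size) scores 100. The per-DMA counts here are entirely made up for illustration.

```python
# Rough mental model of Google Trends' 0-100 "search interest" scale
# (Google's exact formula isn't public; this is an illustrative sketch).
def trends_scale(term_searches, total_searches):
    """Scale per-region search counts to 0-100, controlling for overall activity."""
    shares = [t / total for t, total in zip(term_searches, total_searches)]
    peak = max(shares)
    return [round(100 * s / peak) for s in shares]

# Hypothetical per-DMA counts: (searches for the term, total searches in that DMA)
term = [120, 90, 30]
total = [100_000, 50_000, 200_000]
print(trends_scale(term, total))  # [67, 100, 8]
```

Note that the middle DMA wins despite having the fewest total searches – the scale rewards *relative* interest, which is exactly why small markets can top these rankings.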

It’s obvious from the Google Trends graphic when and where Fangraphs garners the most interest: baseball season, the Pacific Northwest, the area between Chicago and St. Louis (I looked this up and it’s apparently called the “North Central Midwest”), and the Pittsburgh area. But we’re still not done. For US-only searches, Google Trends usually returns more data points using the “Metro” subregion, which is actually the Designated Market Area (DMA) used by Nielsen (the TV ratings people) rather than the Metropolitan Statistical Area as I’d first assumed. The exported data from Google Trends for Fangraphs revealed a handful of DMAs with search volume too low to register any quantifiable level of interest. I wondered if the same places would garner similar results for Baseball-Reference, and much to my delight, I found a correlation coefficient of r=0.85 (R2 is shown on the chart). I also pulled Google Trends data for the phrase “Happy Thanksgiving” (it was trending at the time) as a control set to reassure myself the correlation between Fangraphs and Baseball-Reference wasn’t a probable outcome for any random Google search; this yielded a correlation coefficient of r=-0.10 with Fangraphs…hooray!
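The sanity check above can be sketched with synthetic data. The numbers below are made up (the real values came from the Google Trends “Metro” exports), but they illustrate why two terms driven by the same underlying interest should correlate strongly while an unrelated control shouldn’t:

```python
import numpy as np

# Illustrative version of the correlation check: Fangraphs vs Baseball-Reference
# interest by DMA, plus an unrelated control term. All data here is synthetic.
rng = np.random.default_rng(0)
nerdiness = rng.uniform(0, 100, 50)              # latent "baseball nerdiness" by DMA
fangraphs = nerdiness + rng.normal(0, 15, 50)    # two noisy views of the same signal
bbref     = nerdiness + rng.normal(0, 15, 50)
control   = rng.uniform(0, 100, 50)              # "Happy Thanksgiving" stand-in

r_related = np.corrcoef(fangraphs, bbref)[0, 1]
r_control = np.corrcoef(fangraphs, control)[0, 1]
print(f"related: r={r_related:.2f}, control: r={r_control:.2f}")
```

The related pair lands well above the control because both series share the same latent signal – the same logic behind trusting the r=0.85 result above.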

This means it’s very likely certain regions are more baseball-nerdy than others – not that searching for Fangraphs or Baseball-Reference makes a person a baseball nerd, but the aggregated data certainly represents a solid proxy. I wanted to collect more Google Trends data on search terms that are similarly stat-geeky, so I tried “sabermetrics”, “Bill James”, and “Moneyball”, but neither sabermetrics nor Bill James yielded enough data points due to lack of volume, and the vast majority of Moneyball’s search volume was generated when the movie came out – not a desirable trait. Still, with more search terms we have more data, and we’ll be a lot more confident in the results while mitigating bias – similar to diversifying your portfolio to mitigate risk. So I eventually ran the following through Google Trends, both individually and combined (for volume comparison between terms):

My goal, if you couldn’t tell, was to use the aggregated data to determine the best (and worst) baseball-nerd cities and regions by summing the total interest generated for the five Google searches by location. To accurately reflect their search proportions, each Google Search was weighted by its individual search volume relative to the combined volume of all five (visualized in the appropriately titled donut chart).
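The weighting scheme works roughly like this. The five terms and their relative volumes below are placeholders (the real shares came from the combined Google Trends query shown in the donut chart); the point is that each term’s weight is its share of the five terms’ combined volume:

```python
# Volume-weighted combination of per-term interest scores for one DMA.
# Terms and relative volumes are hypothetical stand-ins for the real five.
volumes = {"term A": 38, "term B": 30, "term C": 14, "term D": 10, "term E": 8}
total = sum(volumes.values())
weights = {term: v / total for term, v in volumes.items()}

# A DMA's combined score is the weighted sum of its per-term interest scores.
dma_scores = {"term A": 100, "term B": 85, "term C": 60, "term D": 40, "term E": 20}
combined = sum(weights[t] * dma_scores[t] for t in volumes)
print(round(combined, 1))  # 77.5
```

Weighting by volume keeps a tiny-volume term from swinging the combined score as much as the heavyweight terms do.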

I added a sixth and final Google search-related measurement as a finishing touch, one that doesn’t fall completely in line with the other five:

  • The search volume of MLB compared to the search volume of NFL
    • Each metro has a combined MLB-NFL search score of 100
      • One score for MLB
      • One score for NFL
      • They sum to 100
    • So the most baseball-ish city possible would have a MLB score of 100 and an NFL score of 0 (this place doesn’t exist…or else it’s way too small to be a blip on Google’s radar)
      • A sad (though not too relevant) side note – no metros returned an MLB score above 42…but way to go Peoria-Bloomington, IL!
    • Since it isn’t measured in terms of volume, I weighted the MLB-NFL score at 1/6th (~16.7%) of the combined score and re-weighted the other five accordingly
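Mechanically, the re-weighting looks like the sketch below (the five volume-based weights are hypothetical): fix the MLB-NFL score at 1/6 and scale the other five into the remaining 5/6.

```python
# Re-weight five volume-based weights so the MLB-NFL score takes up 1/6
# of the combined score. The volume weights here are hypothetical.
volume_weights = [0.38, 0.30, 0.14, 0.10, 0.08]   # sum to 1.0
mlb_nfl_weight = 1 / 6
final_weights = [w * (1 - mlb_nfl_weight) for w in volume_weights] + [mlb_nfl_weight]

print([round(w, 3) for w in final_weights])
# [0.317, 0.25, 0.117, 0.083, 0.067, 0.167]
```

Scaling by (1 − 1/6) keeps the five terms in the same proportion to each other while everything still sums to 1.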

The final weighting of the combined score is shown in the next chart:


The very last table in this post reflects the complete rankings, which I can’t say are too surprising. The map generated by the initial Fangraphs search made them a little more predictable, but there’s certainly more clarity after we combine all the data. I also put together a heat map in Tableau that visualizes nerdiness in the 48 contiguous states (sorry Hawaii, Honolulu [112th] did register some data…nowhere in Alaska did). Let me also mention that Tableau doesn’t recognize Designated Market Areas as a geographic variable, so I had to map them out by ZIP code…which was the greatest pain in this entire post before I learned it would’ve been much easier had I mapped them out by county instead of ZIP.

Well…the numbers basically speak for themselves. As much as I cringe hearing Cardinals fans claim the “Best Fans in Baseball” designation, they’re easily the nerdiest. They’re also the most engaged on social media, which makes their nerdiness pretty understandable. So congratulations St. Louis and surrounding area, you’re a bunch of nerds – which makes me reeeeeeally want to visit Busch Stadium when the A’s come to town in 2019. The Columbia-Jefferson City area is the market directly west of St. Louis, and directly east of Kansas City, though the Cardinals generate about twice the search volume the Royals do in the area. The third result, Champaign-Springfield-Decatur, IL, generates more search volume for the Cubs than the Cardinals, however – SO YOU AREN’T THAT GREAT CARDINALS FANS! The other strong areas include Pittsburgh, Chicago, and New England, each one home to notably loyal and passionate fanbases – though I have to admit Pittsburgh ended up higher than I might’ve guessed. I’m also guessing Meg Rowley and Patrick Dubuque are solely responsible for Wisconsin appearing twice in the top 15…and probably a little for Seattle not being as sad as the rest of the west coast.

The west coast is basically inept when it comes to nerding out on baseball, which is sad news for me. 23rd-ranked Seattle-Tacoma is the only west coast area in the first 38, and that’s when the Bay Area finally joins in at 39th. I find the very bottom of the rankings interesting – maybe even more so than the top. These areas could easily be the places where the most blue chip high school football prospects come from in any given year – 12 of the bottom 15 are from deeeeeep football country – Texas, Oklahoma, Florida, Mississippi, and Georgia. Compared to the top of the list, though, the bottom is also generally much farther from any MLB team.

What’s all this mean? Probably not a whole lot. But I’ve been to the bars near Fenway, and they were definitely enthusiastic about baseball in a way I don’t ever expect to witness in Anaheim. If that same enthusiasm is topped by the nerdiness engulfing the area between St. Louis and Chicago, I actually look forward to visiting the Midwest – something I’ve never felt before in my life. At the very least, I’m guessing it beats trying to talk about run differential in El Paso.

Building a Farm of the Best Unranked Prospects – Part 1: Pitchers

A lot is made of the strength of some farm systems, and likewise the weakness of others. There are plenty of reasons why stockpiling noteworthy, ranked, and on-the-radar prospects can be advantageous:

  • Longevity and cost control at the positions held by your prospects
  • You can easily fill the voids of outgoing free agents
  • Prospects are the preferred currency of low-budget teams who can land big names on the trade market but not the free agent market
  • A healthy farm is often seen as a proxy for the future health of the organization

With all that said, how good of a farm could you possibly have if you don’t have any ranked prospects? If your farm is comprised of nothing but under-the-radar guys, the benefits bulleted above pretty much all go away. The best you could really hope for is that you saw something that everyone else either missed or undervalued. But I’m guessing there’s still something in the public data that could help us find a few hidden gems. So I’m going to build a farm of unranked prospects and see if it becomes anything some time down the line.

I’m going to start with pitchers, TINSTAAPP be damned. Looking at THE BOARD over at Fangraphs, I noticed that there are only 31 pitchers in the top 100 (compared to 49 at MLB.com as of 11/25/18), so it’s not exactly uncommon for a team to find themselves without a ranked pitching prospect. Since my goal is to build the best possible farm system that appears to be worse than basically every other MLB team’s – but (hopefully) only on the surface – I have to set some baseline criteria to establish who’s eligible for my Island of Misfit Prospects, and do so in a way that ensures my farm looks really bad.

Eligibility Requirements & Composition of Farm

If this is done right, my fantasy farm will be at or near the bottom in each of the charts and tables above, and the eligibility requirements can assist with that. So here are the rules for a pitcher to be eligible for my farm:

  1. The pitcher cannot be ranked any higher than 220 overall on THE BOARD at Fangraphs.
    1. This is due in part to how common it is for a team to lack a pitcher in the top-100, which is how we generally understand “unranked” prospects.
    2. This increases the likelihood that no pitcher in our farm will be included on the most recent iteration of the major top-100 prospect lists (Baseball America, Keith Law, MLB.com, etc.)
    3. Finally, the Brewers are the only team without a pitcher in the top 220. I’d have made the cutoff lower than the Brewers’ highest-ranked pitcher, but unfortunately Caden Lemons is ranked 807th on THE BOARD, and that would have been extremely limiting.
  2. The pitcher cannot have any MLB experience.
  3. The pitcher must have thrown at least 10 IP in 2018.
  4. The pitcher must be a part of an MLB organization.
  5. The pitcher must have already reached the MiLB assignment level.
    1. This means that a pitcher on my triple-A team must have reached triple-A at some point in his career.
    2. So I’m not necessarily advancing anyone to a higher level, but we’re going to say my fantasy farm is being created retroactive to the start of the 2018 season.

The farm should reflect the experience at each level realistically, just so I’m not throwing a bunch of unranked pitchers who threw well in low-A ball onto my triple-A team. I’m also not really interested in inexperienced pitchers who didn’t get any further than short-season A-ball, so I’m only going as deep as high-A in my fantasy farm. That gives me three teams to make up my farm: A+, AA, and AAA. Let’s also say I’m building the 2018 version of my farm, just so I don’t have to worry about 2019 assignments or promotions; the goal is to see how everyone pans out moving forward. So here’s how each team’s pitching will be comprised:

  • 8 relief pitchers; over half of minor league appearances must have been made in relief
  • 5 starting pitchers; must have started at least 50% of minor league appearances
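The eligibility and role rules above can be expressed as a simple filter. The field names here are illustrative, and the level-reached rule (no. 5) is omitted for brevity since it depends on which of the three teams a pitcher is assigned to:

```python
def eligible(p):
    """Apply the farm eligibility rules (field names are illustrative)."""
    return (
        (p["board_rank"] is None or p["board_rank"] > 220)  # no top-220 BOARD rank
        and p["mlb_ip"] == 0                                 # no MLB experience
        and p["ip_2018"] >= 10                               # at least 10 IP in 2018
        and p["org"] is not None                             # in an MLB organization
    )

def role(p):
    """Classify as SP/RP by share of appearances made as a starter."""
    return "SP" if p["starts"] / p["games"] >= 0.5 else "RP"

# A hypothetical unranked prospect line
prospect = {"board_rank": None, "mlb_ip": 0, "ip_2018": 64.0,
            "org": "BAL", "starts": 12, "games": 12}
print(eligible(prospect), role(prospect))  # True SP
```

Running every minor league pitcher season through a filter like this yields the candidate pool, from which the 8 RP / 5 SP roster per level gets picked.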

So without further delay, let’s move on to the players I’ve selected…




Zac Lowther


In retrospect, it seems I maybe should’ve been more restrictive on the eligibility of Fangraphs BOARD players, either by setting a rank cutoff at something worse than 220, or by capping the number of players who appear on the BOARD at a maximum that better reflects a bottom-of-the-barrel collection of minor league pitching. I say this not because my Fantasy Farm is “good”, but because it didn’t turn out remarkably “bad” in terms of perception. While it is completely devoid of stud prospects, and appears to be pretty below average when the analysis gets deeper, it isn’t exactly a bottom feeder. Check out the following charts:

Notice how the red column (representing my Fantasy Farm) moves a little more to the left as the charts descend – that’s by design. It illustrates how my collection of pitchers may appear a little less hopeless depending on how we want to evaluate farms.

Nevertheless, by common standards, I’ve put together an unremarkable, below-average farm system with no big-time prospects as of the end of the 2018 season. Other than the eligibility rules I set up, the biggest factor in my selection was a data model of MLB performance based on minor league performance. The end result is the collection of no-name prospects you saw in the tables above: pitchers who are generally young for their league and possess either good K%-BB% or GB% rates, if not both.

Evaluating Outcome & Final Thoughts

In terms of how I’ll measure my success, I probably put my selections at a slight disadvantage by prohibiting any pitchers with MLB experience and assigning them only to levels they’ve already pitched at. Those two rules make it so I have a collection of guys who are, in all likelihood, set to repeat triple-A in 2019. I don’t have data to cite, but I’d imagine non/unranked prospects who repeat triple-A are considerably less likely to make it to the Big Leagues than a guy who ended the prior year in double-A and is assigned to triple-A at the start of the following year. Additionally, it’s hard to find a diamond in the rough who’s managed to reach the highest levels of the minors without gaining any traction as a prospect, so I’d imagine the true talent level of my Fantasy Farm is much lighter at the top than at the bottom.

That being said, I still think my farm would outperform whatever expectations would be set if it actually existed. I don’t have anything set in stone as far as how I’ll determine the success of my farm, but I have a few ideas, and I’ll probably use at least a couple of them (save for those that I ultimately conclude are relatively useless). Keeping in mind that I’m referring to future performance/debuts/numbers, here’s what I’m thinking…

  • Measure success by MLB debuts and/or performance
    • 160 pitchers made their MLB debut in 2018 (though not all were full-time pitchers, e.g. Willians Astudillo)
    • The average number of debuts by a team was 5.3
    • This number is skewed toward bad teams and doesn’t necessarily represent talent on the farm
    • Only about 20% of debuts came from playoff teams, which make up a third of MLB (33.3%)
    • These pitchers went 265-276 with a 4.55 ERA and 1.39 WHIP, striking out 4248 and walking 1978 in 4766.1 IP, so the bar probably won’t be set too high
  • Measure success by comparing performance across all levels to that of top-ranked farms (compile aggregated figures by org and compare)
  • Measure success by aggregating numbers of all pitching prospects ranked better than 239 on the 2018 update of the Fangraphs BOARD
    • According to the version of the Fangraphs BOARD in reference, the highest ranking member of my Fantasy Farm is Jose Suarez of the Angels at 239
    • By compiling figures of all pitchers ranked above 239, I’m looking exclusively at pitchers regarded more highly than my top-ranked pitcher
    • 98 pitchers are ranked in front of Jose Suarez, so there’d likely be a large body of work to compare
    • Since both my fantasy farm and the top 98 pitchers are likely to put up numbers at all levels, the data can be collected and measured both as a whole, and/or by level
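One wrinkle when working with totals like the ones cited above: “4766.1 IP” uses baseball notation, where .1 and .2 mean one and two thirds of an inning, so rate stats can’t be computed from the raw decimal. A quick sketch of the conversion and the resulting rates for the 2018 debut class:

```python
def ip_to_innings(ip):
    """Convert baseball IP notation (.1/.2 = thirds of an inning) to true innings."""
    whole = int(ip)
    thirds = round((ip - whole) * 10)   # 0, 1, or 2
    return whole + thirds / 3

# Totals cited above for 2018 MLB debuts: 4248 K, 1978 BB in 4766.1 IP
innings = ip_to_innings(4766.1)
k9  = 9 * 4248 / innings
bb9 = 9 * 1978 / innings
print(f"{k9:.2f} K/9, {bb9:.2f} BB/9")  # 8.02 K/9, 3.73 BB/9
```

Those rates back up the point that the debut-class bar isn’t set terribly high.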

Consider this particular exercise just part of an ongoing analysis. Regardless of how the results look at the end of 2019, it still won’t be finished. The conclusion of all this probably won’t come for a few seasons, but I’ll keep checking in on the results periodically and reporting them. Sure, in the end I’d like to be able to say my farm did better with a bunch of no-names (at the time) than the top-ranked farms, but I might crash and burn too. All we can do is watch it unfold.

MLB Starting Rotations: Using Data to Define an Ace (and a 2 and a 3 and a 4…)

Since I really want to use the blogosphere to solve as many of baseball’s infinite puzzles as I possibly can (within the constraints of life), it probably seems like I’m not being very ambitious with this post – at least if you’re judging by the title. I get it…there’s even a definition of “Ace” provided by Major League Baseball at MLB.com. That’s about as official as it gets, so consider this a closed case, right? Well, you can probably assume from the inclusion of hundreds of words below this paragraph that my answer is no. There’s really not much existing literature that delineates the parameters of an “ace” (or any other spot in the rotation) in an objective, data-driven manner. The great Jeff Sullivan was on the cusp with this Fangraphs post, but ultimately conducted an opinion poll in which readers were asked if they considered the top SP in each team’s rotation an “ace”. Both Jeff’s methodology and his conclusions underscore additional benefits of establishing objective, context-neutral parameters:

(NOTE: This isn’t a criticism of Jeff Sullivan or his post…he’s probably my favorite baseball writer by a wide margin, and his objectives with said post were not the same as my objectives with this post):

  1. Posted prior to the 2016 season, the content is contemporaneously relevant, and 71% of respondents considered Sonny Gray an ace. With statistically-rigid definitions of what an ace is, we could compare Sonny Gray’s performance at that point instead of laughing at the mere thought of being asked “Is Sonny Gray an ace?”. At this juncture I’d imagine Gray’s perception is that of a fringe starter who fills in when someone goes down. But is that what he really is? I don’t know, we haven’t established what makes a fringe starter either. With context-neutral definitions of each rotation spot, we can eliminate the contemporaneous relevance and easily make comparisons across seasons or even eras.
  2. Jeff concluded there were about 20 starting pitchers in Major League Baseball that most people would agree were aces, which makes us 10 shy of what we’d expect given the MLB definition of “ace” (the top starting pitcher on a team). While small year-to-year variances are to be expected, we should consistently find about 30 pitchers to fall within the parameters of acehood. So really, Jeff’s poll found there was a perception that 20 aces were active at the time – I contend that there were actually around 30, and roughly a third of them weren’t all that obvious. We want to eliminate the perception aspect with definitive criteria that undeniably establishes acehood.
  3. It turns out that the perception of an ace wasn’t completely performance-based (shocker!): pitchers from more talented rotations were penalized for being teammates with other good starting pitchers. Stephen Strasburg outperformed many of the pitchers who scored higher than him, yet only 57% of respondents considered him an ace – largely due to being in the same rotation as Max Scherzer (and probably injuries). While some may consider it fundamentally incorrect to label multiple pitchers from the same rotation “aces”, it’s going to be hard to convince me that a league-average pitcher who leads a rotation of 4 below-average teammates is more worthy of the ace label than a better pitcher who happens to share a rotation with a superstar. Objectively speaking, an ace is unconditionally an ace based on his own performance (not on that of his teammates). The ace parameters will rid us of the perception penalty incurred by aces who are teammates with aces, and likewise the perception benefit bestowed on non-aces who overshadow their relatively inferior rotation mates.

Before we go any further, I want to make it clear that I’m writing under the assumption that an “ace” and a “#1” are synonymous. On a recent episode of Effectively Wild, Ben, Jeff, and Meg Rowley all bantered about how we define an ace, and even briefly attempted to distinguish the differences between an ace and a #1; not that they’re mutually exclusive, but it sounded more like the beginning of an LSAT logic game where ‘all aces are #1s, but not all #1s are aces…’ from what I gathered. I don’t want to strictly adhere to the MLB.com definition, but for the sake of this post, we’re going to at least continue under the assumption that aces and #1s meet the same defining criteria as each other.

Perhaps counterintuitively, the task of defining each role within a rotation is even more important given the lightening workloads of starting pitchers and, inversely, the increasing workloads of relievers. The paradigm shifts with caution, and no team should have a perennial Cy Young candidate throw anything less than the greatest quantity of innings he can possibly throw without sacrificing performance or health.

With the advent of the Opener, what truly constitutes a “Starting Pitcher” is becoming increasingly vague. It wouldn’t be much of a surprise to see some of the more traditional roles played by back-of-the-rotation starting pitchers completely disappear in the near future. But it should be a little more than obvious that this evolutionary process isn’t necessary for all SPs, right? Perhaps the most likely progression begins with the teams under tighter budget constraints, those with relatively deeper relief corps than starting corps, and the ones just a little more forward-thinking. We saw the Rays unveil the strategy out of necessity, soon followed by the injury-stricken Athletics. But what was spawned initially out of necessity for the early adopters should presumably expand to teams doing it out of practicality.

But in the wake of all this, one puzzle we’re left to figure out revolves around the pitchers to cut from their traditional role – who should be sacrificed to this developing experiment?

I’m not going to try and answer that in THIS post, because we need to solve another puzzle as a prerequisite – the definition of each spot in the rotation. On one hand, it couldn’t be simpler; each spot is based on the order of talent within a given pool of starting pitchers, beginning with the most talented at the top. On the other hand, it’s a complex and generally subjective matter, albeit unnecessarily; a lot of credible baseball people might require seemingly arbitrary attributes, like a minimum fastball velocity for an ace, or more strikeouts than innings for anyone in the one or two spot. I’m not saying these ideas are necessarily incorrect either, but my goal is to wash away the ambiguity. Defining the performance expectations of each spot in the rotation can be done objectively by analyzing some key metrics and keeping the parameters simple.

First we’ll define the parameters. We know MLB’s definition of an “ace” is the best starting pitcher on a given team. We also concede that not every team has an ace because talent isn’t equally distributed. So how we divide the pitcher roles will be across teams rather than within them; this means “aces” will be the top 30 starting pitchers in MLB, not the single best starting pitcher from each of the 30 teams (which is how we’d determine acehood using MLB’s definition).

As easy as it is to envision the stereotypical grumpy baseball traditionalist reciting how only a few pitchers handled the majority of innings decades ago, 5-man rotations outnumbered all other combinations for the first time in 1926 (believe it or not, the 6-man rotation was actually more common than the 3-man rotation at that point). So we can call a rotation a pool of five starting pitchers without much controversy. However, given how improbable it is to expect the same 5 pitchers to make all their scheduled starts in a given year, every team generally has a 6th pitcher who can start (either in theory or with an actual place on the 25-man roster) whenever someone from the top 5 can’t. As a role every team has been forced to utilize, and the means by which many SPs crack their first rotation, the 6th spot is by no means trivial. So, while we’ll call a rotation a set of 5 SPs, we’re also saying they’re the top 5 from a pool of 6 pitchers. This establishes 6 tiers that, under optimal conditions, would be represented by sextiles (that’s what you call 6 equally-sized groups) of talent, where the first sextile holds the top 16.7% of talent, and each tier descends from there.

Unfortunately, since true talent can’t really be quantified, we’ll have to proxy talent with performance metrics. Here I’m going to use ERA-, FIP-, and xFIP-. This lets us compare the metrics equally across different seasons, leagues, and parks, creating a context-neutral benchmark for comparison. I assume anyone who finds themselves on this blog is familiar with these three metrics and why they’re more useful than their slightly-more-traditional-non-minus counterparts. But if not, I highly recommend checking out their entries in the Fangraphs Glossary (you’ll learn a ton in like 5 minutes).

(NOTE: If you REALLY don’t feel like leaving the page, the key here is the number 100; 100 is average. An ERA-/FIP-/xFIP- under 100 is better than average, and anything above 100 is worse than average, with the absolute difference representing the percent better or worse than average. For example, a FIP- of 75 is 25% better (less) than league average: 75 – 100 = -25%. For normalized stats that end in “-“, any measure below 100 is good, while the opposite holds true for normalized metrics ending with a “+”, such as wRC+.)
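That note boils down to a tiny helper (the function here is just an illustration of the arithmetic, not anything from Fangraphs):

```python
def vs_average(minus_stat):
    """Describe an ERA-/FIP-/xFIP- value relative to the league average of 100."""
    diff = minus_stat - 100
    if diff < 0:
        return f"{-diff}% better than average"
    if diff > 0:
        return f"{diff}% worse than average"
    return "exactly league average"

print(vs_average(75))    # 25% better than average
print(vs_average(110))   # 10% worse than average
```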

Instead of using these metrics individually for our approximation of talent, I’m going to use their average. ERA often comes under fire because it’s a relatively poor predictor of future performance due to the amount of luck associated with its inputs – which is well warranted given that FIP, xFIP, and even K%-BB% all predict future ERA better than past ERA does. But I’m including ERA here because I don’t see any reason to omit past success as a component that defines an ace, or any other tier of a rotation, lucky or unlucky. However, since we’re attempting to approximate talent to define each tier, it’s important we limit the magnitude of ERA since much of its variance is fielding-dependent. We do this by including the other two metrics, FIP(-) and xFIP(-), both of which are fielding-independent and rely exclusively on the pitcher. Furthermore, while each metric is results-based, the most forward-looking of them is xFIP, which is a better predictor than both FIP and ERA are of their future selves. So while xFIP might be the worst descriptor of what actually happened, it’s easily the best indicator of what will eventually happen. This is important because it makes future expectations a part of the equation.

Additionally, while it won’t be perfect given the differing year-to-year variance of each respective metric, the average also gives us an idea of the rough cutoff for each metric individually. So once we establish our cutoffs, we could say, “player X had an ERA- of 99 but an xFIP- of 75. So he pitched like a #3 starter, but I expect him to pitch like an ace moving forward”.

So our talent proxy is simply the average of ERA-, FIP-, and xFIP-, which I’ll call MEAN-. Once we establish the cutoff for each sextile, our tiers will be defined. Using data from 2002 through 2018, I looked at every pitcher who threw at least 100 IP as a starter, calculated both their MEAN- and their respective MEAN- percentile rank, and here’s what we have:
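The tiering step can be sketched like this. The inputs below are randomly generated stand-ins for the real 2002-2018 pool of 100+ IP starter seasons, but the mechanics – average the three minus stats into MEAN-, then split the pool into sextiles – are the same:

```python
import numpy as np

# Sketch of the tiering step with synthetic data standing in for the real pool.
rng = np.random.default_rng(42)
n = 600
era_minus, fip_minus, xfip_minus = (rng.normal(100, 20, n) for _ in range(3))
mean_minus = (era_minus + fip_minus + xfip_minus) / 3

# Sextile cutoffs: the 16.7th, 33.3rd, 50th, 66.7th, and 83.3rd percentiles
cutoffs = np.percentile(mean_minus, [100/6, 200/6, 300/6, 400/6, 500/6])
tiers = 1 + np.searchsorted(cutoffs, mean_minus)   # Tier 1 = lowest (best) MEAN-
print(np.bincount(tiers)[1:])                       # equal-sized tiers of 100
```

By construction each tier holds a sixth of the pool, which is why roughly 30 “aces” exist league-wide in any given season.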

While splitting our data into sextiles gives us the mathematical explanation as to why this happened, at first glance it might seem odd to see Tier 4 begin with the league average MEAN-…because league average should be a #3, shouldn’t it? Actually it shouldn’t. There’s a reason top pitching prospects are often given labels that imply something as seemingly underwhelming as a “3rd starter” – it’s because 3rd starters are (barely) above average pitchers. Sure, they’re seen as the midpoint in the rotation, but they’re only the midpoint when the best 5 options make all their scheduled starts, themselves included. At some point, every team utilizes their 6th option, with few exceptions. In 2018, the Indians and Rockies used the fewest starting pitchers with 7, while the average big league team utilized 12. Starting pitchers whose innings total ranked 6th or lower on their respective teams accounted for 18.8% of starting pitcher innings – only the top ranked starting pitcher (and presumable ace) accounted for more with 21%. This helps explain why the 4th Tier is where league average goes, and not the 3rd Tier.

The table above shows some average performance metrics of the starting pitchers within each tier dating back to 2002. Everything descends or ascends in the order you’d expect it to, but one interesting thing about the table is the WAR column. Tiers 2 through 6 are separated pretty evenly, ranging anywhere between a 0.6 and 0.8 WAR differential with the adjacent tier. The exception is Tier 1 (our Ace Tier), which is a full 1.5 WAR ahead of Tier 2. We can see this more clearly in the table of average WAR by tier; the linearity holds steady for the most part in tiers 2 through 6, only to slope sharper from 1 to 2. So even while we’ll find roughly the same number of pitchers within each tier on an annual basis, upgrading from a Tier 3 pitcher to a Tier 2 pitcher won’t yield the same improvement you’d see from upgrading a Tier 2 to a Tier 1. The roughly equal tier-by-tier difference in WAR from the bottom 5 tiers suggests we get essentially flat marginal returns from any single-tier upgrade unless we’re adding a Tier 1 guy (an ace!).


That may have been tough to follow, so let me put it another way. Let's say you're a GM headed into the offseason with the goal of upgrading your rotation via trade. For the sake of this hypothetical, you're only able to offer one trade package, composed of a starting pitcher from your current rotation, a prospect, and cash. In return, you'll receive a starting pitcher one tier better than the SP you're trading away (the prospect and cash are irrelevant other than making the tier downgrade worthwhile for your trade partner). We'll hold the prospect and cash fixed, so the only part of the offer you can change is the tier of the pitcher you give up, and therefore the tier of the pitcher you receive. So here's what you're looking at in the trade for a new SP:

  • Assume your 5-man rotation is composed of a starting pitcher from each of the top 5 tiers
  • You also have a Tier 6 pitcher you use as a spot starter
  • Your ace is the only pitcher you’re unable to trade
  • If you give up a Tier 6, you’ll receive a Tier 5    (~0.8 net WAR)
  • If you give up a Tier 5, you’ll receive a Tier 4    (~0.7 net WAR)
  • If you give up a Tier 4, you’ll receive a Tier 3    (~0.6 net WAR)
  • If you give up a Tier 3, you’ll receive a Tier 2    (~0.8 net WAR)
  • If you give up a Tier 2, you’ll receive a Tier 1    (~1.5 net WAR)

The right thing to do here is to give up your Tier 2 pitcher, landing a Tier 1 SP in return. Sure, you now have two aces in the rotation, but the reason for giving up your #2 isn't as simple as 'adding an ace'. You gave up your Tier 2 for a Tier 1 because it was the only offer on the table with a meaningfully larger marginal upgrade. In other words, the added benefit from swapping a Tier 6 for a Tier 5 is roughly the same as swapping a Tier 5 for a Tier 4, a Tier 4 for a Tier 3, or a Tier 3 for a Tier 2 – only the Tier 2-for-Tier 1 swap breaks the pattern.
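To make the arithmetic concrete, here's a minimal sketch of the decision above – the tier labels and net-WAR values come straight from the bullets, while the function names are just illustrative:

```python
# Approximate net WAR gained by trading a pitcher away for one a tier better,
# keyed by the tier you give up (values from the bullets above).
NET_WAR_BY_TIER_TRADED = {6: 0.8, 5: 0.7, 4: 0.6, 3: 0.8, 2: 1.5}

def best_trade(tradable_tiers):
    """Return the (tier_to_trade, net_war) pair that maximizes the upgrade."""
    options = {t: NET_WAR_BY_TIER_TRADED[t] for t in tradable_tiers}
    tier = max(options, key=options.get)
    return tier, options[tier]

# The ace (Tier 1) can't be traded, so Tiers 2-6 are on the table.
tier, gain = best_trade([2, 3, 4, 5, 6])
print(tier, gain)  # trading away the Tier 2 for a Tier 1 nets ~1.5 WAR
```

Because the bottom five swaps all cluster between 0.6 and 0.8 WAR, the maximization only ever turns on whether the Tier-2-for-ace swap is available.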

Since I have a habit of overexplaining things, I'll end with some examples of each tier using numbers from the 2018 season. The table of 2018 Tier Examples shows 5 randomly selected pitchers from each tier, just to give readers a better idea of who falls where.


How to Identify Bounceback Candidates (Pitcher Edition)

Okay, a lot of people think ERA sucks. I don't really disagree, in the sense that it's luck-laden and a poor predictor of future performance. It's a shallow measure, but it still seems to fool even those at the highest levels; Jon Gray was left off the Rockies' playoff roster after posting a 5.12 ERA that wasn't really compatible with his 9.6 K/9 and 2.72 BB/9. Domingo German couldn't stick in the Majors with his 5.57 ERA, in spite of striking out nearly 11 per 9 while walking just 3.5 per 9.

This isn’t a defense of ERA by any means – its not. This is a guide to find out who’s 2019 ERA is (probably) going to be better than their 2018 ERA, and it’s pretty simple. Fangraphs features a metric called “E-F”, which is simply a pitcher’s ERA minus FIP. This can give us some idea of how representative the pitcher’s ERA actually is – grossly oversimplified, it gives us a measure of luck. The following facts have been fairly well-documented, but just for a refresh, I want to reiterate the following:

  • ERA is a relatively poor predictor of future ERA
  • FIP is a better predictor of future ERA but still not great
  • xFIP is a better predictor of future ERA and future FIP than both ERA and FIP

Results-based analysis is tricky business, but it isn't totally unreliable when done correctly. ERA is far from an ideal indicator of a pitcher's ability; FIP addresses some of that, but still carries a lot of noise that gets washed away in xFIP. Inputs that show little or no year-to-year correlation, such as HR/FB% or BABIP, are controlled for by applying constants in the calculation of xFIP, which is why it's probably the best metric we have for evaluating how good a pitcher has been, at least in the same context as ERA. Unfortunately, fans, fantasy leagues, and the general consumption of baseball continue to emphasize ERA in spite of its obvious shortcomings, probably due to resistance to change. So even though it would be easier (and arguably more practical) to predict future xFIP, we're going to predict future ERA with xFIP, since that's still the best we've got.

Let’s check out the correlation matrix of ERA predictors I put together. This uses all big-league pitchers from 2010-2017 with at least 30 IP in a given half-season who also threw at least 30 IP in the subsequent half-season. I did notice that the within-period correlations aren’t identical in both time periods (ERA’s respective correlation to FIP and xFIP is .67 and .49 in t=0, but .70 and .55 in t+1…this still occurs even when ERA-/FIP-/xFIP- are used instead, so I’m theorizing that it’s just a matter of a pitcher gaining consistency with an additional year of experience, but that’s another post for another day.) We can see that each of the bullet points above are reflected in the matrix, and that xFIP does a much better job of predicting the future than any other metric. So what am I trying to prove here? That xFIP is a super useful metric that isn’t used enough for predictive analysis! And unlike ERA, xFIP is a superb predictor of itself, which is why I highlighted that particular part of the matrix, and added the chart on xFIP predictability. Worth noting is that the full-season correlation between ERA and xFIP is a much better-looking 0.64, compared to the half-season correlations shown in the matrix, so being able to predict xFIP from one period to the next is pretty valuable.


So now that I’ve emphasized the value of xFIP versus the other metrics as predictors with some visual overkill, I’m going to rework the Fangraphs’ metric I mentioned earlier: instead of E-F (ERA-FIP), we’ll be using E-X (ERA-xFIP).

Let’s set up some definitions that will apply to the remainder of this post:

  1. Overachiever – A pitcher whose xFIP exceeds his ERA. In this case, E-X is negative.

    2018 Example: Wade Miley; 2.57 ERA/ 4.30 xFIP/ -1.73 E-X with MIL

  2. Underachiever – A pitcher whose xFIP is less than his ERA. In this case, E-X is positive.

    2018 Example: Marcus Stroman; 5.54 ERA/ 3.84 xFIP/ 1.70 E-X with TOR
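The E-X bookkeeping is trivial to code up; here's a quick sketch using the two examples above (the function names are mine, not Fangraphs'):

```python
def e_minus_x(era, xfip):
    """E-X: ERA minus xFIP. Negative -> overachiever, positive -> underachiever."""
    return round(era - xfip, 2)

def label(era, xfip):
    ex = e_minus_x(era, xfip)
    return "overachiever" if ex < 0 else "underachiever" if ex > 0 else "neutral"

# The two 2018 examples from above:
print(e_minus_x(2.57, 4.30), label(2.57, 4.30))  # -1.73 overachiever (Miley)
print(e_minus_x(5.54, 3.84), label(5.54, 3.84))  # 1.7 underachiever (Stroman)
```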

The intuition here is simple enough – overachievers are due for positive regression (remember that "positive" is bad when it comes to ERA/FIP/xFIP) and underachievers are due for negative regression. In other words, pitchers with a negative E-X should see their ERAs rise, while pitchers with a positive E-X should see their ERAs fall. I said "should", but I really mean "do", because the effect is quite robust in aggregate. The first chart, looking at ERA changes from 2017 to 2018, suggests that while E-X is a good indicator of the direction a pitcher's ERA is headed, underachievers appear to be more predictable than overachievers – at least using non-normalized metrics.


Now since ERA is known to fluctuate over time and we need normalized metrics to compare across eras, I wanted to see how predictability changes (if it does at all) when we use ERA- and xFIP- instead of standard ERA and xFIP. Here, the effect is consistent across both groups (both overachievers and underachievers). Take a look at the chart below:

ERA & xFIP (ERA- & xFIP-)

This tells us that roughly 73% of overachieving pitchers in 2017 saw a rise in their 2018 ERA, while an almost identical portion of 2017 underachievers (72%) saw a decline in their 2018 ERA. In other words, within this sample, we accurately predicted the direction of future ERA nearly three-quarters of the time simply by subtracting xFIP- from ERA-. This is pretty powerful, but it's limited in the sense that we're making a binary prediction – it's yes or no; while we can reasonably expect the ERA to increase or decrease, we don't know by how much. And we all know to be skeptical when sample sizes are small; just 169 pitchers threw at least 40 IP in both 2017 and 2018, so let's see what happens when we have a sample 8.5 times larger than what's reflected in the 2017/2018 chart…
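The direction-accuracy figures above amount to a simple group-and-count; here's a sketch with pandas on a made-up sample (the values are illustrative, not the 169-pitcher dataset):

```python
import pandas as pd

# Hypothetical sample: each row is a pitcher's year-t E-X (ERA- minus xFIP-)
# and the change in his ERA the following year.
df = pd.DataFrame({
    "e_minus_x": [-15, -8, -3, 4, 9, 20, -12, 6],
    "era_change": [0.40, 0.15, -0.10, -0.30, -0.55, -1.10, 0.25, 0.05],
})

over = df[df.e_minus_x < 0]   # overachievers: expect ERA to rise
under = df[df.e_minus_x > 0]  # underachievers: expect ERA to fall

over_hit = (over.era_change > 0).mean()    # share whose ERA actually rose
under_hit = (under.era_change < 0).mean()  # share whose ERA actually fell
print(over_hit, under_hit)  # 0.75 0.75 on this toy sample
```

On the real data, those two hit rates are the ~73% and ~72% quoted above.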

ERA & xFIP (ERA- & xFIP-)

And there you go: 71% of overachievers saw their ERA go up in the subsequent half-season, and 72% of underachievers saw their ERA go down – basically unchanged from the previous chart. Here, time is grouped into half-seasons rather than full seasons, which gives us a much larger sample to work with. So E-X is legit when it comes to predicting improvement or decline, but why not build on that if we can? If we're trying to identify bounceback candidates, wouldn't it be nice to know exactly how likely it is that a pitcher's ERA will be lower next season (or next half-season) than it was in the most recent one?

Obviously the answer is 'yes', so I modeled the probability of ERA improvement by running a logistic regression with E-X as the sole explanatory variable and the binary outcome of whether or not ERA improved in half-season t+1 as the dependent variable. The summary statistics are shown below, as well as how to calculate the probability.
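As a sketch of the setup (not the real half-season sample), here's a logistic regression fit by Newton-Raphson on synthetic data generated from assumed coefficients close to the ones quoted later in the post:

```python
import numpy as np

# Simulate E-X values and a 0/1 "ERA improved in t+1" outcome from an
# assumed true relationship; everything here is illustrative.
rng = np.random.default_rng(0)
e_x = rng.uniform(-40, 40, 2000)
p_true = 1 / (1 + np.exp(-(-0.06 + 0.059 * e_x)))
y = (rng.random(2000) < p_true).astype(float)

# Fit the logistic regression by Newton-Raphson on the log-likelihood.
X = np.column_stack([np.ones_like(e_x), e_x])
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    W = p * (1 - p)
    beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))

print(beta)  # intercept and slope, which should land near -0.06 and 0.059
```

A packaged fit (statsmodels `Logit` or scikit-learn's `LogisticRegression`) would recover essentially the same coefficients; the Newton loop is just to show there's nothing mysterious inside.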


Calculating the probability estimate from this model isn't like a typical linear regression, so if you wanted to apply it to a particular pitcher on your own, here's how it works:
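A minimal sketch of that calculation, using the rounded coefficients quoted in the worked examples that follow (-0.06 intercept, 0.059 slope). Because of the rounding, these outputs land a shade below the post's 71.8% and 24.5%, which came from the unrounded estimates:

```python
import math

def p_improve(e_minus_x, b0=-0.06, b1=0.059):
    """Logistic-curve probability that ERA improves next (half) season.
    b0/b1 are the rounded coefficients from the model summary."""
    return 1 / (1 + math.exp(-(b0 + b1 * e_minus_x)))

# Worked examples from the post:
print(round(p_improve(16), 3))   # Sonny Gray, E-X = 16 -> ~0.708
print(round(p_improve(-19), 3))  # Jacob deGrom, E-X = -19 -> ~0.235
```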

So rather than going through too much more math, let's move on to what the model tells us by using the Probability of ERA Improvement chart:

This shows us the estimated probability of a given pitcher improving his ERA in the next time period (in this case, half of a season), based on his E-X in the most recent period. While the model is built off half-season samples, we can reasonably apply it to different consecutive time groupings, like full seasons (we don't want to stray too far from the half-season, though, because we'd fail to account for a lot of player-specific changes that might occur between the two periods. For example, we wouldn't want t=0 to be the last 5 years, with the model predicting improvement over the next 5 years, because too much could change with the pitcher we're looking at: his pitch mix might change, his velocity almost certainly will, perhaps he has Tommy John surgery, etc.) At an E-X of 0, we see the probability of improving ERA is 50%, which is right where we'd expect it to be (it's actually 49.8% taken to the thousandths place; the absolute difference in probability between an E-X of 0 and -10 is almost the same as the difference between 0 and +10, but I kept the probability estimates to two decimal places for simplicity). The greater the E-X in the most recent (half) season, the more likely it is the pitcher's ERA will drop in the next (half) season; only 18% of pitchers post an E-X of at least 20, but it's certainly worth noting that their probability of improvement is better than three-quarters. Rarer still is an E-X of 40 or greater, which occurs just 4% of the time but is practically a guarantee of improvement at 91%.

So just for fun, let’s apply the model to a pitcher using his 2018 E-X, and determine the probability that his ERA will improve. One guy a lot of people might be curious about is Sonny Gray; are greener pastures ahead for Sonny in 2019? Or was all that chaos in New York City the catalyst to an irreversible downward trend? Well…let’s find out!

2018 Sonny Gray – NYY

ERA: 4.90    xFIP: 4.10

ERA-: 113    xFIP-: 97

E-X = 113-97 = 16    Now we’ll apply the model…

1/(1+e^-[-0.06+{0.059*16}]) = 0.718

Estimated probability of improvement is 71.8%! So Sonny Gray’s got a pretty good shot at being a better pitcher in 2019 than he was in 2018.

Let’s do another…how about NL Cy Young Award winner Jacob DeGrom? DeGrom had an absolutely insane year that a bunch of morons tried discrediting at various stages, but most of the people reading this are probably aware of how special it actually was. So how likely is it that DeGrom could be even better next year?

2018 Jacob deGrom – NYM

ERA: 1.70    xFIP: 2.60

ERA-: 45    xFIP-: 64

E-X = 45-64 = -19    Now we'll apply the model…

1/(1+e^-[-0.06+{0.059*-19}]) = 0.245

So the model gives deGrom a 24.5% shot at improving his ERA in 2019, which isn't bad considering there's not much room for improvement when your ERA is 1.70 – the closer you get to 0, the more improbable improvement becomes!

Instead of continuing with case-by-case examples, I added a few names to the probability chart to go along with Sonny Gray and Jacob deGrom. I also built a table of 25 semi-randomly selected pitchers alongside their 2018 numbers and their respective 2019 ERA-improvement probabilities. One thing that's fairly clear, though also quite intuitive, is that it's difficult to improve upon good performances; deGrom, Max Scherzer, and Justin Verlander are unlikely to be better in 2019 than they were in 2018, largely because they were just so good. Applying the same intuition to the other end of the spectrum, it's pretty easy to improve on bad performances – Clayton Richard is almost certainly going to be better in 2019 because he set the bar so low. Those are the predictable cases – the ones where the probability model does nothing but reaffirm what we basically knew. Among those shown in the table, the more interesting cases are Josh Hader and Carlos Carrasco, both of whom enjoyed incredible 2018 seasons and are still more likely than not to improve in 2019. There are also a few names not shown in the table who are in the same boat as Hader and Carrasco – Patrick Corbin, Dellin Betances, Ross Stripling, and Edwin Diaz are all likely to improve in 2019 after being phenomenal in 2018.
