The Z Files: Adjusting 2021 Projections for the Regional Schedule


This article is part of our The Z Files series.

Last time out, I reviewed several things I'm thinking about as I set to embark on 2021 MLB projections. One of those things really has the wheels spinning, so let's dig a little deeper. Specifically, should 2020 numbers be adjusted for quality of competition?

In a normal 162-game season, teams play 76 games (47 percent) against divisional opponents, so the stats are somewhat influenced by the strength of their divisional foes, but there's ample play against other teams to dilute the bias and not have to worry about it. However, in 2020, 40 games (67 percent) were played within the division with the other 20 (33 percent) contested in the cross-league geographical zone. Each club faced only nine others. In a standard campaign, everyone plays 20 other squads.

Intuitively, quality of opposition must have played a part in the past season's performances. It may be that the quality is the same across the three regions, so it washes out. What if it is not, though? A batter's 110 wRC+ may not equate to the same level of hitter from one of the other divisional pairings. Pitchers with a 24 percent strikeout rate may not exhibit the same level of dominance if they pitched on a team in one of the other groupings.

Even if there is a difference, adjustments will be more judgment call than data driven. Let's dig into some numbers to see what we're dealing with and perhaps add a little objectivity to a mostly subjective dilemma.

Surface stats like ERA and WHIP can be misleading on an individual basis, but when applied en masse, their relatability comes in handy. Here's a breakdown of those ratios from the 2020 campaign, beginning with the three geographical zones.

 

| Region | ERA | WHIP |
|---|---|---|
| East | 4.72 | 1.41 |
| Central | 4.12 | 1.26 |
| West | 4.52 | 1.31 |

Sure enough, the three divisional pairings suggest the quality of players within each is different. That said, to say the Central has the best pitching while the worst is in the East isn't necessarily accurate. Think of it this way. Is a 4.00 ERA from a Triple-A pitcher akin to that of an MLB hurler? Of course not. This is not to say the difference in the MLB combined divisions is like that between Triple-A and the majors, but the example holds true.

The above ratios reflect the difference in prowess between the hitting and pitching of the respective zones. It may be the West or East pitching is collectively better than the Central, or that the hitting in the Central is significantly worse than the bookending coasts, or a mix of both.

Here's a look at the data per division:

 

| Division | ERA | WHIP |
|---|---|---|
| AL East | 4.53 | 1.37 |
| NL East | 4.92 | 1.45 |
| AL Central | 4.10 | 1.27 |
| NL Central | 4.14 | 1.25 |
| AL West | 4.65 | 1.33 |
| NL West | 4.39 | 1.30 |

Again, there isn't any cogent analysis applicable to projections from this data. The comparisons are all relative.

Some are drawing conclusions from the wild card playoff results, specifically how many teams from each division advanced to the next round.

 

| Division | Wild Card Round | Divisional Round |
|---|---|---|
| AL East | 3 | 2 |
| NL East | 2 | 2 |
| AL Central | 3 | 0 |
| NL Central | 4 | 0 |
| AL West | 2 | 2 |
| NL West | 2 | 2 |

The obvious deduction is that despite the two lowest ERAs of the six divisions, the AL Central and NL Central teams are worse than those from the East and West.

If each series were a coin flip, there would be a 1 in 128, or 0.78 percent, chance all seven Central teams would lose. I don't know what the odds were, but for the sake of math, if each Central team was a 3:1 underdog, there's a 13 percent chance all would fail to advance. There seems to be some teeth to the narrative.
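The arithmetic behind those two figures can be sketched as follows. This is a minimal illustration, assuming each series is an independent event and that no two Central clubs were paired against each other; the function name `prob_all_lose` is made up for this example.

```python
def prob_all_lose(n_series: int, p_lose: float) -> float:
    """Chance that every one of n independent series ends in a loss."""
    return p_lose ** n_series


# 3 AL Central + 4 NL Central clubs played in the Wild Card Round
CENTRAL_TEAMS = 7

coin_flip = prob_all_lose(CENTRAL_TEAMS, 0.5)
# A 3:1 underdog loses three times out of four (assumed odds, per the text)
underdog = prob_all_lose(CENTRAL_TEAMS, 0.75)

print(f"coin flip:    {coin_flip:.2%}")   # 0.78%
print(f"3:1 underdog: {underdog:.2%}")    # 13.35%
```

Even under the generous underdog assumption, a clean sweep of all seven clubs is roughly a one-in-eight event.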

Looking at one year in a vacuum can be misleading since there isn't anything against which to compare. However, an issue with investigating stats prior to 2020 is there wasn't a universal designated hitter, so the comparisons have to be kept at the league level (American to American, National to National). Here's the divisional ERA data since 2017, with the standard deviations between divisions included.

| Division | 2020 | 2019 | 2018 | 2017 |
|---|---|---|---|---|
| AL East | 4.53 | 4.99 | 4.59 | 4.17 |
| AL Central | 4.10 | 5.05 | 4.88 | 4.53 |
| AL West | 4.65 | 4.92 | 4.34 | 4.43 |
| NL East | 4.92 | 4.73 | 4.45 | 4.60 |
| NL Central | 4.14 | 4.69 | 4.34 | 4.27 |
| NL West | 4.39 | 4.80 | 4.27 | 4.15 |
| AL St. Dev. | 0.29 | 0.07 | 0.27 | 0.19 |
| NL St. Dev. | 0.40 | 0.06 | 0.09 | 0.23 |

The larger standard deviations from 2020 support the notion there's a bigger difference this season in divisional quality, but it's far from unequivocal proof. Not to mention, it could be driven by sample size, so let's repeat the breakdown using an equivalent number of games from each season.
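For anyone wanting to check the standard deviation rows, here is a minimal sketch that recomputes them from the divisional ERAs above. It assumes the sample (n-1) form of the standard deviation, which is the version that lines up with the listed values; the names `DIVISION_ERA` and `league_spread` are illustrative only.

```python
from statistics import stdev

SEASONS = [2020, 2019, 2018, 2017]

# Divisional ERA by season, in the same order as SEASONS
DIVISION_ERA = {
    "AL East":    [4.53, 4.99, 4.59, 4.17],
    "AL Central": [4.10, 5.05, 4.88, 4.53],
    "AL West":    [4.65, 4.92, 4.34, 4.43],
    "NL East":    [4.92, 4.73, 4.45, 4.60],
    "NL Central": [4.14, 4.69, 4.34, 4.27],
    "NL West":    [4.39, 4.80, 4.27, 4.15],
}


def league_spread(league: str) -> dict:
    """Sample standard deviation of the three divisional ERAs, per season."""
    rows = [eras for name, eras in DIVISION_ERA.items() if name.startswith(league)]
    # zip(*rows) turns three per-division rows into one column per season
    return {
        season: round(stdev(column), 2)
        for season, column in zip(SEASONS, zip(*rows))
    }


for league in ("AL", "NL"):
    print(league, league_spread(league))
```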

Data from 2017-2019 will be used. Each season will be parsed into two-month intervals (Apr/May, Jun/Jul and Aug/Sep). Instead of presenting the individual data, here is the average of the nine standard deviations for each league, compared to the 2020 mark.

 

| League | 2020 | Average | High |
|---|---|---|---|
| AL | 0.29 | 0.24 | 0.39 |
| NL | 0.40 | 0.17 | 0.35 |

The average of the standard deviations is smaller than 2020's level, suggesting the 2020 spike isn't simply due to variance. However, there was one period from the AL higher than the 2020 standard deviation, so there is still a chance the results are a sample size effect.

Admittedly, this is a simplistic approach fraught with flaws, since the assumption is each division from 2017-2019 was equal, and they almost assuredly weren't. The question is whether the difference this season is significant enough to warrant neutralizing the 2020 stats, since the regions were essentially three separate leagues.

A big factor yet to be investigated is park effects. First, here's a synopsis of the data for runs and homers by handedness. An estimate is necessary for Globe Life Field (new venue), Marlins Park/Oracle Park (right field fences moved in at both venues) and Fenway Park/Citi Field/T-Mobile Park (each installed a humidor).

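| Division/Region | Runs | HR LHB | HR RHB |
|---|---|---|---|
| AL East | 100 | 102 | 106 |
| NL East | 98 | 100 | 100 |
| AL Central | 102 | 100 | 97 |
| NL Central | 100 | 101 | 98 |
| AL West | 96 | 102 | 100 |
| NL West | 101 | 93 | 96 |
| East | 99 | 101 | 103 |
| Central | 101 | 101 | 98 |
| West | 98 | 98 | 98 |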

The key is the aggregate ERA in the Central was lowest, despite the games being played in the most run-friendly venues when viewed as a group. As will be discussed in a moment, ballpark neutralization is part of the projection process. The purpose of looking at this data is to determine if the parks are responsible for the discrepancies in regional ERA. They are not. In fact, the park data helps support the theory there was a difference in the quality of teams between the three geographical zones.

Let's work with the assumption the aggregate quality of the Central teams is weaker than the East and West, so an adjustment to their numbers is needed if the schedule returns to normal. Keep in mind, there's no guarantee this will be the case, but fantasy drafts will almost assuredly be approached in that manner, hence projections need to follow suit.

My method begins with generating a neutral projection. I take the actual numbers and flush out all the outside influences like park factors, luck and age. A weighted average of the neutralized stats is computed, which is the neutral projection. The actual projection then adds the context back – specifically age and parks. This offseason, neutralizing for regional quality will be part of the process. But how big a factor should it be?

One possibility is to regress each skill towards its corresponding league average. For example, here's the breakdown of strikeout rates from this past season.

| Division/Region/League | K% |
|---|---|
| AL East | 23.24% |
| NL East | 22.91% |
| AL Central | 24.11% |
| NL Central | 25.77% |
| AL West | 22.48% |
| NL West | 22.23% |
| East | 23.07% |
| Central | 24.93% |
| West | 22.35% |
| American | 23.27% |
| National | 23.60% |

This is not necessarily indicative of what I'll do; it's an example of a regression calculation. Say I choose to regress 25 percent. If an AL pitcher posted a 22% strikeout rate, his neutralized mark would be (.75 x 22%) + (.25 x 23.27%) or 22.32%.
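That regression can be written as a small helper. As with the example itself, the 25 percent weight is a placeholder rather than a chosen setting, and the function name `regress_to_league` is hypothetical. `Decimal` is used only so the rounding to two places behaves the way it would by hand.

```python
from decimal import Decimal, ROUND_HALF_UP

# League-wide strikeout rates from the table above, in percent
LEAGUE_K_PCT = {"AL": Decimal("23.27"), "NL": Decimal("23.60")}


def regress_to_league(observed: str, league: str, weight: str) -> Decimal:
    """Blend a pitcher's K% with his league average.

    weight is the share given to the league average (e.g. "0.25");
    values are passed as strings to keep the Decimal math exact.
    """
    w = Decimal(weight)
    blended = (1 - w) * Decimal(observed) + w * LEAGUE_K_PCT[league]
    return blended.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


neutral = regress_to_league("22", "AL", "0.25")
print(f"{neutral}%")  # 22.32%
```

A weight of "0" leaves the observed rate untouched and a weight of "1" replaces it with the league average, so the open question in the text comes down to where on that scale the adjustment should sit.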

Over 1200 words and a bunch of tables and I'm still not sure what I'll do. At this point, it's fair to wonder, "Is it worth it?"

Let's use a very simplistic example: a pitcher who posted an identical 3.65 ERA over 170 innings in both 2018 and 2019. If he repeated it this season in 60 frames, his 2021 projection would be 3.65 (work with me, this isn't how ERA is projected, but I want to keep it relatable).

What if this pitcher toiled for one of the Central teams in 2020, though? His adjusted ERA would be higher. For the sake of this example, let's say the neutralized mark is 4.10. The manner I'd project the 2021 ERA is

((11 x 4.10 x 60) + (7 x 3.65 x 170) + (4 x 3.65 x 170)) / ((11 x 60) + (7 x 170) + (4 x 170)) = 3.77
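The same calculation in code form, as a rough sketch. The 11/7/4 season weights and the 4.10 neutralized ERA come straight from the example above; the function name `project_era` and the tuple layout are invented for illustration, and, as noted, this is not how ERA is really projected.

```python
def project_era(seasons):
    """Innings- and recency-weighted ERA.

    seasons: iterable of (weight, era, innings) tuples, newest first.
    """
    numerator = sum(weight * era * innings for weight, era, innings in seasons)
    denominator = sum(weight * innings for weight, _, innings in seasons)
    return numerator / denominator


# (weight, ERA, innings) for 2020, 2019 and 2018
unadjusted = project_era([(11, 3.65, 60), (7, 3.65, 170), (4, 3.65, 170)])
adjusted = project_era([(11, 4.10, 60), (7, 3.65, 170), (4, 3.65, 170)])

print(f"unadjusted: {unadjusted:.2f}")  # 3.65
print(f"adjusted:   {adjusted:.2f}")    # 3.77
```

Despite carrying the heaviest weight, the 2020 season moves the projection by only about a tenth of a run because 60 innings is a small share of the total workload.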

In terms of projected earnings, the difference in ERA is about $1. If WHIP and strikeouts were also adjusted, we're looking at a $2-$3 difference. In the first round, this is just a spot or two. For the next couple of rounds, it's about one round difference. As you serpentine down the snake, we're talking about several rounds.

So yeah, adjusting for region could make a difference.

Landing on the level is far from straightforward. In a way, I feel sorry I just went through all sorts of mental gymnastics for somewhat minimal payoff. That said, it serves as a great example of the issues involved with 2021 draft plans, whether you're a spreadsheet person or more of a feel-type drafter.

Welcome to my world.

ABOUT THE AUTHOR
Todd Zola
Todd has been writing about fantasy baseball since 1997. He won NL Tout Wars and Mixed LABR in 2016 and is a multi-time league winner in the National Fantasy Baseball Championship. Todd is now setting his sights even higher: The Rotowire Staff League. Lord Zola, as he's known in the industry, won the 2013 FSWA Fantasy Baseball Article of the Year award and was named the 2017 FSWA Fantasy Baseball Writer of the Year. Todd is a five-time FSWA awards finalist.