Extrapolating the Pythagorean Expectation to Spring Training
For those desperate for the return of baseball, the wait is nearly over: the world’s best players will at last congregate in the Cactus and Grapefruit leagues for the first full Spring Training in over three years. While stars and established starters often use this time to get their feel back and loosen up for the season, hungry minor-leaguers will pounce on the opportunity to showcase their skills.
Most fans treat Spring Training as a warm-up too, but it’s tempting to read at least some correlation between Spring Training and regular season (if not postseason) results. Surely success in Spring Training has some relationship to success in the regular season, or perhaps even in the playoffs? Recent circumstantial evidence seems to help that case: the 2018 champion Red Sox placed first in the Grapefruit League that spring, the Dodgers won the Cactus League in 2020, the year they won the title, and the reigning champion Astros finished second in Spring Training last year. At a glance, successful teams seem to do well in that year’s Spring Training.
Convincing as that may seem, I’m here to give you some cold, hard advice: whatever you do, refrain from tying your team’s preseason success to its future results, at least for the time being.
My contention is based on the Pythagorean expectation model, which I wrote about last year. Created by Bill James, the model estimates a team’s winning percentage from its runs scored and runs allowed.
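For readers who want the formula in concrete terms, here is a minimal sketch in Python. It assumes James's original exponent of 2, so expected winning percentage is runs scored squared divided by the sum of runs scored squared and runs allowed squared. The function name and the sample run totals are hypothetical, and this article does not state which exponent its own calculations used.

```python
def pythagorean_expectation(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed.

    exponent=2.0 is Bill James's original choice; treating it as the
    exponent used in this article is an assumption.
    """
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    if runs_scored == 0 and runs_allowed == 0:
        raise ValueError("at least one run total must be positive")
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


# Hypothetical team: 800 runs scored, 700 runs allowed.
expected = pythagorean_expectation(800, 700)
print(round(expected, 3))  # 0.566
```

A team that scores exactly as many runs as it allows comes out at .500, which is a quick sanity check on any implementation.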
In fact, the model is often a better predictor of a team’s future success than winning percentage itself. Author and data scientist Wayne L. Winston tested how well the Pythagorean expectation forecasts by comparing it to a prediction based just on games won. Across playoff series between 1980 and 2007, he found that winning percentage predicted the winner of a series 50% of the time, while the Pythagorean expectation predicted 54% of the series correctly.
So let’s apply this to Spring Training: does Spring Training success translate in any way to regular season success?
First, I gathered the 2022 regular season data and the 2022 Spring Training data and computed a Pythagorean expectation, the expected winning percentage, for every team in both seasons. Before extrapolating the Spring Training Pythagorean expectation to the regular season, I wanted to see how well, in general, expected winning percentage (from the Pythagorean model) tracks actual winning percentage when the 2022 regular season and Spring Training are combined.
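The comparison described here can be sketched as follows. This is only an illustration: the team rows are made-up numbers rather than real 2022 data, the helper name r_squared is hypothetical, and it assumes the r-squared comes from a simple least-squares line of actual winning percentage on expected winning percentage (which equals the squared Pearson correlation).

```python
def r_squared(x, y):
    """R^2 of a least-squares line of y on x (the squared Pearson r)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("both samples must vary")
    return (sxy ** 2) / (sxx * syy)


# Hypothetical rows: (runs scored, runs allowed, wins, losses).
# These are illustrative numbers, NOT real 2022 data.
teams = {
    "Team A": (800, 650, 95, 67),
    "Team B": (720, 700, 84, 78),
    "Team C": (650, 760, 70, 92),
    "Team D": (700, 690, 79, 83),
}

# Expected winning percentage, assuming an exponent of 2.
expected = [rs ** 2 / (rs ** 2 + ra ** 2) for rs, ra, _, _ in teams.values()]
# Actual winning percentage from wins and losses.
actual = [w / (w + l) for _, _, w, l in teams.values()]

print(round(r_squared(expected, actual), 3))
```

A value close to 1 means expected and actual winning percentage move together; a value close to 0 means the expectation says little about the record.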
As we can see, it’s not the most promising. Visually, Spring Training shows more variance and more outliers than the regular season. Combined, the data give an r-squared value of 0.8387, meaning that, across the 2022 spring and regular seasons, the Pythagorean expectation explains most of the variation in winning percentage. The next question is whether the Spring Training Pythagorean expectation can explain regular season winning percentage.
And… it performs quite poorly, to say the least. With an r-squared value of -0.081, there is no statistical evidence to suggest that a team’s Spring Training Pythagorean expectation relates to its regular season winning percentage.
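A negative r-squared cannot come from squaring a correlation, so one possible reading (an assumption, since the exact calculation is not given here) is that the Spring Training expectation was scored directly as a prediction of regular season winning percentage, using 1 minus the ratio of residual to total sum of squares. Under that definition, a negative score means the prediction does worse than simply guessing the league-average winning percentage for every team. The sketch below uses hypothetical teams and numbers to show how that can happen.

```python
def r_squared_direct(predicted, observed):
    """1 - SS_res / SS_tot, treating `predicted` as the forecast itself.

    Unlike a squared correlation, this can be negative when the forecast
    is worse than always predicting the mean of `observed`.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values must vary")
    return 1 - ss_res / ss_tot


# Hypothetical values, NOT real data: spring expectation vs. regular season record.
spring_expectation = {"Team A": 0.60, "Team B": 0.45, "Team C": 0.55, "Team D": 0.40}
regular_win_pct = {"Team A": 0.48, "Team B": 0.56, "Team C": 0.50, "Team D": 0.52}

# Pair the two seasons by team so each prediction lines up with its outcome.
teams = sorted(spring_expectation.keys() & regular_win_pct.keys())
predicted = [spring_expectation[t] for t in teams]
observed = [regular_win_pct[t] for t in teams]

score = r_squared_direct(predicted, observed)
print(score < 0)  # True: worse than predicting the mean for every team
```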
Going forward, I would like to see whether Spring Training winning percentage has any relationship to regular season winning percentage. I would also be interested to see whether Spring Training standings correlate with regular season standings.