Wiffle Ball and the Pythagorean Expectation

Growing up, mathematics and backyard sports were like oil and water. When I was outside playing pick-up games or wiffle ball, the last thing on my mind was my overdue math homework, and when I was inside starting on my two-week-overdue algebra packet, the only thing I could think about was rallying the neighbors for a quick game of pre-dinner wiffle ball.

I’d like to think that I’ve matured since then, having found a passion for merging the two fields of sports and mathematics. A couple of days ago, I learned about Bill James’ “Pythagorean Expectation” formula, so I decided to apply it to, and adapt it for, one of my favorite childhood sports: Wiffle Ball.

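“Pythagorean Expectation” is a formula intended to estimate the share of games a baseball team “should” (or would) win, its expected winning percentage, using the number of runs the team scored and the number of runs it conceded (or allowed):

$$\text{Win\%} = \frac{\text{Runs Scored}^2}{\text{Runs Scored}^2 + \text{Runs Allowed}^2}$$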

Notice that besides both variables being squared, there’s nothing inherently “Pythagorean” about this model.
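To make the formula concrete, here is a minimal Python sketch. The function name `pythagorean_win_pct` and its `exponent` parameter are my own illustrative choices, not part of any library; the default exponent of 2 matches James’ squared version, and the 800/700 run totals are hypothetical.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Expected winning percentage: RS^e / (RS^e + RA^e), with e = 2 by default."""
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("runs must be non-negative")
    if runs_scored == 0 and runs_allowed == 0:
        raise ValueError("at least one of runs_scored or runs_allowed must be positive")
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


# Hypothetical team that scores 800 runs and allows 700
print(round(pythagorean_win_pct(800, 700), 4))  # → 0.5664
```

Over a 162-game season, that 0.5664 would translate to roughly 92 expected wins.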

The power of this formula lies in its ability to predict future winning percentages. For example, a team’s predicted winning percentage, calculated from its runs scored and runs allowed in the first half of the season, often correlates more strongly with its actual second-half winning percentage than its actual first-half winning percentage does.

It’s key to note that the exponent (which I’ll call the e-value) has since been modified from 2 to 1.83 to better fit baseball data, and similar versions of the formula have been applied to other sports.

For example, although different analyses reach different conclusions for many of these sports, the generally agreed-upon e-values are 1.3 for the EPL, just above 2 for the NHL, 2.37 for the NFL, and 13.91 for the NBA, the largest of the group.
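To see what these different e-values mean in practice, the sketch below applies each one to the same hypothetical team that outscores its opponents by 10% (110 points for, 100 against). The exponents come from the text; the NHL is left out because only “just above 2” is given, and James’ original 2 is included for comparison.

```python
def pythagorean_win_pct(points_for: float, points_against: float, exponent: float) -> float:
    """Expected winning percentage: PF^e / (PF^e + PA^e)."""
    scored = points_for ** exponent
    allowed = points_against ** exponent
    return scored / (scored + allowed)


# e-values quoted in the text (NHL omitted: no single value given)
SPORT_EXPONENTS = {
    "EPL": 1.3,
    "MLB": 1.83,
    "Original (James)": 2.0,
    "NFL": 2.37,
    "NBA": 13.91,
}

# Hypothetical team that outscores its opponents by 10%, in whatever unit the sport scores
points_for, points_against = 110, 100

results = {}
for league, e in sorted(SPORT_EXPONENTS.items(), key=lambda item: item[1]):
    results[league] = pythagorean_win_pct(points_for, points_against, e)
    print(f"{league:<17} e = {e:<5} expected win% = {results[league]:.3f}")
```

The same 10% scoring edge is worth only about a .531 expected win percentage with e = 1.3 but about .790 with e = 13.91: the larger the exponent, the more decisively a scoring margin turns into wins.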

While I am admittedly not an MLW Wiffle Ball diehard, I came across the official wiffle ball league after a two-hour TikTok binge-scroll of pitches breaking six feet. Although nearly everything, from the ball and bat to the innings, team size, and field size, differs from baseball, the league, like baseball, has kept an extensive record of its statistics since almost its founding in 2009. So I wanted to see whether the Pythagorean expectation fits MLW Wiffle Ball and, if so, what its appropriate e-value should be.

I first gathered the data from the league’s LeagueLineup website. I used data from the 2012 season through the end of the 2022 regular season, recording each team’s wins, losses, runs scored, and runs allowed. Though there is data from 2011 and 2012, I noticed that much of it was unreliable because some players’ stats were not recorded properly. I also excluded postseason games, since not every team played the same number of them.

I then transferred the data into a statistics program (it’s not strictly necessary, but it is more convenient). From there, I created a new variable for expected win percentage, which according to the Pythagorean expectation model is runs^2/(runs^2 + runs allowed^2). I then plotted each team’s actual win percentage against its expected win percentage.

*Note that teams are split by season, so the 2021 Downtown Diamondbacks and the 2022 Downtown Diamondbacks are separate data entries, each with its own actual win percentage, runs, runs allowed, and expected win percentage.
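The spreadsheet step can be sketched in Python with one record per team-season, which keeps each season of a franchise as its own entry. The `TeamSeason` class is a hypothetical structure of my own, and the rows below are placeholder numbers, not real MLW statistics.

```python
from dataclasses import dataclass


@dataclass
class TeamSeason:
    """One team in one regular season; each season of a franchise is its own entry."""
    team: str
    season: int
    wins: int
    losses: int
    runs: int
    runs_allowed: int

    @property
    def actual_win_pct(self) -> float:
        return self.wins / (self.wins + self.losses)

    def expected_win_pct(self, exponent: float = 2.0) -> float:
        """Pythagorean expectation: runs^e / (runs^e + runs_allowed^e)."""
        scored = self.runs ** exponent
        allowed = self.runs_allowed ** exponent
        return scored / (scored + allowed)


# Hypothetical placeholder rows, NOT real MLW data
records = [
    TeamSeason("Team A", 2021, 12, 8, 95, 80),
    TeamSeason("Team A", 2022, 9, 11, 70, 78),
    TeamSeason("Team B", 2022, 15, 5, 110, 60),
]

for r in records:
    print(f"{r.team} {r.season}: actual {r.actual_win_pct:.3f}, expected {r.expected_win_pct():.3f}")
```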

Using an e-value of 2 produced an r-squared value of 0.9243 and a root mean square error of 0.08311.
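For readers without a statistics program, here is one plausible way to compute both fit measures in plain Python. This is an assumption about the definitions, not necessarily what my statistics program did: r-squared is taken as the squared Pearson correlation between actual and expected win percentage (equal to the R² of a simple linear regression of one on the other), and the root mean square error is taken directly on actual minus expected. A program that reports the RMSE of a fitted regression line, with a degrees-of-freedom adjustment, will give somewhat different numbers.

```python
import math


def r_squared(actual: list[float], predicted: list[float]) -> float:
    """Squared Pearson correlation (the R^2 of a simple linear regression)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    # Raises ZeroDivisionError if either list is constant
    return cov ** 2 / (var_a * var_p)


def rmse(actual: list[float], predicted: list[float]) -> float:
    """Root mean square error of predicted versus actual win percentage."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


# Hypothetical actual and expected win percentages for a few team-seasons
actual = [0.600, 0.450, 0.750, 0.300]
expected = [0.585, 0.446, 0.771, 0.272]
print(f"r^2 = {r_squared(actual, expected):.4f}, RMSE = {rmse(actual, expected):.5f}")
```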

While this is evidence, as predicted, that the Pythagorean expectation correlates with actual win percentage, I was curious whether a better e-value existed: one that minimizes the root mean square error.

I painstakingly repeated the process, changing the e-value by 0.05 each time and recording the corresponding r-squared value and root mean square error. Through this plug-and-check approach, I found that 1.3 minimized the root mean square error, but I wanted to double-check.
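The plug-and-check search can be automated with a simple grid search. This sketch steps the e-value by 0.05 as described, but the search range of 1.00 to 2.50, the direct RMSE definition, and the five data rows are all my own assumptions; the rows are hypothetical placeholders, not MLW data, so the printed best e-value says nothing about the real league.

```python
import math

# Hypothetical (wins, losses, runs, runs_allowed) rows; NOT real MLW data
SEASONS = [
    (12, 8, 95, 80),
    (9, 11, 70, 78),
    (15, 5, 110, 60),
    (6, 14, 55, 90),
    (10, 10, 82, 81),
]


def expected_pct(runs: float, runs_allowed: float, e: float) -> float:
    """Pythagorean expectation with exponent e."""
    return runs ** e / (runs ** e + runs_allowed ** e)


def rmse_for_exponent(e: float, seasons=SEASONS) -> float:
    """RMSE between actual and expected win percentage for a given e-value."""
    errors = [(w / (w + l) - expected_pct(rs, ra, e)) ** 2 for w, l, rs, ra in seasons]
    return math.sqrt(sum(errors) / len(errors))


# Step by 0.05; building from integers avoids accumulating float drift
grid = [round(1.0 + 0.05 * i, 2) for i in range(31)]  # 1.0, 1.05, ..., 2.5
scores = {e: rmse_for_exponent(e) for e in grid}
best_e = min(scores, key=scores.get)
print(f"best e on this grid: {best_e} (RMSE {scores[best_e]:.5f})")
```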

So, using Desmos, I entered all of my inputs into a table and plotted them on a graph. Plotted against e-value, the root mean square error takes roughly the shape of a parabola, so I fit a quadratic regression model to get an equation relating root mean square error to e-value.

The quadratic regression model confirmed my prediction, suggesting that an e-value of 1.29 minimizes the root mean square error, yielding an r-squared value of 0.9266 and a root mean square error of 0.08193.
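The Desmos step amounts to a least-squares fit of a parabola, RMSE ≈ a·e² + b·e + c, followed by taking its vertex at e = -b/(2a). Here is a pure-Python sketch of that idea; the function names are hypothetical, and the demo points are synthetic, generated from a parabola with a known minimum at 1.5 purely to check that the fit recovers it. They are not my MLW results.

```python
def fit_quadratic(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Least-squares fit y ≈ a*x^2 + b*x + c via the 3x3 normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]                   # s[k] = sum of x^k
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]  # t[k] = sum of x^k * y
    # Augmented matrix for the unknowns (a, b, c)
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    # Back substitution
    coeffs = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        acc = m[r][3] - sum(m[r][k] * coeffs[k] for k in range(r + 1, 3))
        coeffs[r] = acc / m[r][r]
    return coeffs[0], coeffs[1], coeffs[2]


def vertex(a: float, b: float) -> float:
    """x-coordinate of the parabola's minimum, -b / (2a)."""
    if a <= 0:
        raise ValueError("parabola opens downward, so it has no minimum")
    return -b / (2 * a)


# Synthetic check only: points from a parabola whose minimum is at x = 1.5
xs = [1.0 + 0.05 * i for i in range(21)]
ys = [0.1 + 0.5 * (x - 1.5) ** 2 for x in xs]
a, b, c = fit_quadratic(xs, ys)
print(round(vertex(a, b), 4))  # → 1.5
```

Feeding in the real (e-value, RMSE) pairs instead of the synthetic ones would reproduce the vertex calculation Desmos performs.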

Effectively, then, an appropriate Pythagorean Expectation model for MLW Wiffle Ball would look like:

$$\text{Win\%} = \frac{\text{Runs Scored}^{1.29}}{\text{Runs Scored}^{1.29} + \text{Runs Allowed}^{1.29}}$$

So next time you and your neighbors hit a slump in wiffle ball, grab your team and a calculator and work out your predicted win percentage to see whether you need to step up your play.

Sources:

https://en.wikipedia.org/wiki/Pythagorean_expectation

https://harvardsportsanalysis.org/2014/05/bringing-pythagorean-expectation-to-college-lacrosse/

https://www.leaguelineup.com/welcome.asp?url=mlwcultzfield