Does Rory McIlroy Actually Play Better on Easier Courses?
After the 2nd round of the US Open, journalist Dan Rappaport asked Rory McIlroy, “It’s kind of a funny statistic, but I don’t think you’ve yet won a tournament where the winning score was single digits under par. Do you feel like this [LACC] setup plays into your hands where you have to go out and get it?”
McIlroy was quick to shut the question down: “If I had shot single digits at Congressional I would’ve won. If I would have shot single digits at Kiawah, I would’ve won. So I think it’s a flawed statistic.”
Now frankly, it’s actually shocking that in his nearly 15-year career, McIlroy has never won a tournament with a winning score in single digits under par. It’s like hearing that up until his 2019 Masters victory, all 14 of Tiger Woods’s major victories came from a solo 1st or T-1 position after 54 holes.
McIlroy’s correction is valid, but also a bit misleading. Of his 30 wins (PGA Tour, majors, and DP World Tour), just 4 had a runner-up who finished in single digits, meaning that in those events he could still have won with a single-digit score of his own. Two of these victories were the aforementioned dominant major titles at Congressional and Kiawah; the other two were the Wells Fargo a couple of years ago and the 2016 Irish Open, where the runners-up finished at -9. If you compare McIlroy to another world-beater like Koepka, the contrast is glaring. 3 of Koepka’s 5 major titles were won with single-digit scores, one of them being the notorious +1 winning score at Shinnecock. So, it seems there is credence to Rappaport’s stat…right? Is McIlroy a better player when the field plays better?
To see if McIlroy actually performed better the easier the course was, I collected data from every tournament (European/DP World Tour, majors, and PGA Tour) that McIlroy competed in from the beginning of 2017 to last week’s US Open. By wrangling the data in RStudio, I calculated the field’s scoring average to par for each round. This serves as my measure of "the field". Then I plotted that against Rory’s strokes-gained total for each individual round. Strokes Gained (SG) measures how many strokes a player beat the field by: it is the difference between the field’s scoring average in a given round and the player’s score. An SG of 5 means a player gained 5 strokes on the rest of the field. *Note that McIlroy's missed-cut rounds are omitted from the dataset. Refer to the limitations paragraph near the end.
The striking clustered pattern in the plot likely arises because the 4 rounds of a given tournament share similar field performance and similar McIlroy performance. While it’s an eye-catcher, it makes a linear regression on the round-level data hard to justify, because the observations are not independent: if I know that two randomly selected rounds are from the same event, knowing one tells me a good deal about the other. There is clear dependence within events.
We have to adjust the dataset to allow for linear regression analysis. So, I grouped all the rounds by event and year to find each event’s mean field scoring average to par and McIlroy’s mean SG per round for that event.
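To make the strokes-gained definition concrete, here is a minimal Python sketch. The original analysis was done in R, so this is only an illustration: the function names and the scores are made up, and a positive value is assumed to mean the player beat the field.

```python
from statistics import mean

def field_average_to_par(scores_to_par):
    """Mean score relative to par across everyone who played the round."""
    return mean(scores_to_par)

def strokes_gained(player_to_par, field_avg_to_par):
    """Strokes gained on the field; positive means better than the field average."""
    return field_avg_to_par - player_to_par

# Made-up scores to par for one round (illustrative only, not real data).
round_scores = [-3, 0, 1, 2, 5]
field_avg = field_average_to_par(round_scores)

# The hypothetical player who shot -3 is compared with the field average.
sg = strokes_gained(-3, field_avg)
print(field_avg, sg)
```

A harder round pushes the field average up, so the same score to par is worth more strokes gained.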
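The grouping step can be sketched as follows. This is a hypothetical Python version rather than the R code used for the article; the tuple layout, the event names, and all numbers are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical round-level rows: (event, year, field_avg_to_par, player_sg).
rounds = [
    ("Event A", 2021, 1.2, 2.0),
    ("Event A", 2021, 0.8, 1.0),
    ("Event A", 2022, -0.5, 3.0),
    ("Event B", 2022, 2.0, -1.0),
    ("Event B", 2022, 3.0, 1.0),
]

def summarize_by_event(rows):
    """Collapse rounds to one row per (event, year): mean field average, mean SG."""
    grouped = defaultdict(list)
    for event, year, field_avg, sg in rows:
        grouped[(event, year)].append((field_avg, sg))
    return {
        key: (mean([f for f, _ in vals]), mean([s for _, s in vals]))
        for key, vals in grouped.items()
    }

event_means = summarize_by_event(rounds)
for key, (avg_field, avg_sg) in sorted(event_means.items()):
    print(key, avg_field, avg_sg)
```

Each (event, year) pair now contributes a single observation, which is what weakens the within-event dependence.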
A linear regression analysis produced the following model:
predicted mean SG = 1.8299 + 0.104 × (field scoring average to par)
In this case, the slope suggests that for an event, if the field’s scoring average increased by 1 stroke, we would expect McIlroy to gain an average of 0.104 more strokes on the field. This runs counter to Rappaport’s stat, which would suggest that as the field scores lower (scoring average to par decreases), McIlroy’s SG total should increase. The intercept of 1.8299 suggests that when the field’s scoring average is even par (0), we would expect McIlroy to gain an average of 1.83 strokes on the field for that week.
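For readers who want to see where a slope and intercept like these come from, here is a minimal least-squares sketch in Python. The data points are made up, and `fit_line` and `predict` are hypothetical helper names. Only the final prediction uses the coefficients reported above.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def predict(intercept, slope, field_avg_to_par):
    """Predicted mean SG for an event with the given field scoring average."""
    return intercept + slope * field_avg_to_par

# Made-up event-level data (illustrative only, not the article's dataset).
field_avg = [-1.0, 0.0, 1.0, 2.0]
mean_sg = [1.0, 2.0, 2.0, 3.0]
intercept, slope = fit_line(field_avg, mean_sg)
print(intercept, slope)

# Using the article's reported coefficients for a week where the field
# averages 2 over par.
reported = predict(1.8299, 0.104, 2.0)
print(reported)
```

With the reported coefficients, a week where the field averages 2 over par corresponds to a predicted mean SG of about 2.04.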
But what do the tests have to say about our model? Is the relationship at play significant?
To determine if this relationship is statistically significant, our data must first satisfy these conditions:
Independence
Note that we already modified our dataset to weaken the dependence. But obviously, some weak dependence remains; knowing that the event is the Masters likely tells us something about McIlroy’s other Masters appearances. However, since many tournaments switch courses from year to year and conditions generally differ, I treat the remaining dependence as relatively weak and the independence condition as reasonably satisfied.
Linearity
Does the data show a linear trend? On the residual plot, the points scatter evenly above and below the horizontal line and there does not seem to be any curvature. A linear model seems appropriate for this data.
Constant Variance
Barring one or two outliers, the model residuals mostly fall within a band between -4 SG and 4 SG across the values of predicted SG.
Normality
Residuals on the Q-Q plot appear approximately normally distributed. There is a slight departure from normality in the tails.
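The residual and Q-Q checks above are visual, but the numbers behind such plots can be sketched in a few lines of Python. This is an illustration on made-up data, not the article's R output. The plotting positions `(i + 0.5) / n` are one common convention and are an assumption here.

```python
from statistics import NormalDist

def residuals(x, y, intercept, slope):
    """Observed minus fitted value for each observation."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def qq_points(values):
    """Pair sorted values with standard normal quantiles at (i + 0.5) / n."""
    n = len(values)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(values)))

# Made-up event-level data; 1.7 and 0.6 are the least-squares
# coefficients for these four points.
x = [-1.0, 0.0, 1.0, 2.0]
y = [1.0, 2.0, 2.0, 3.0]
res = residuals(x, y, intercept=1.7, slope=0.6)

# Points close to a straight line suggest roughly normal residuals.
qq = qq_points(res)
for theoretical_q, observed in qq:
    print(round(theoretical_q, 3), round(observed, 3))
```

Plotting residuals against fitted values covers the linearity and constant-variance checks, while the paired quantiles are what a Q-Q plot displays.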
Now that the conditions are satisfied, we will run a summary on the coefficients as well as determine a 95% confidence interval on the coefficients of this model.
The p-value of the scoring_average coefficient says that if there were truly no association between the field’s performance and McIlroy’s performance, the probability of observing a slope of 0.104 strokes gained per 1-stroke increase in the field’s scoring average, or one more extreme, in a sample of similar size is 2.14%. There’s significant evidence that Rory plays better as the course gets harder, not easier. In fact, using our 95% confidence interval, we can be 95% confident that Rory gains a true average of between 0.015 and 0.193 additional strokes on the field for every 1-stroke increase in the field’s scoring average. I think it’s important to note that while the finding is statistically significant, it’s not necessarily substantial or especially helpful. While 0.015 strokes is almost negligible, 0.193 strokes is a substantial increase. So we know that McIlroy plays better relative to the field as the field plays worse, but we don’t know if he plays a little better or a lot better.
An analysis of Koepka shows a clearer picture. For the tournaments he played, a linear regression model yielded a coefficient of 0.428 and a confidence interval of 0.323 to 0.533 SG. We can state with confidence that for every 1-stroke increase in the field’s average score to par, we would expect Koepka to gain an average of between 0.323 and 0.533 more strokes on the field.
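As a rough illustration of how an interval like this is built, here is a Python sketch that computes a slope, its standard error, and an approximate confidence interval on made-up data. It uses a normal critical value, which is an assumption made to keep the example within the standard library. Regression software normally uses a t critical value, which gives a wider interval for small samples.

```python
from math import sqrt
from statistics import NormalDist

def slope_inference(x, y, level=0.95):
    """Return (slope, standard error, approximate CI) for simple regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares, then the usual slope standard error.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = sqrt(sse / (n - 2) / sxx)
    # Normal approximation to the critical value (assumption, see lead-in).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return slope, se, (slope - z * se, slope + z * se)

# Made-up event-level data (illustrative only).
x = [-1.0, 0.0, 1.0, 2.0]
y = [1.0, 2.0, 2.0, 3.0]
slope, se, (ci_low, ci_high) = slope_inference(x, y)
print(slope, se, ci_low, ci_high)

# Sanity check on the article's reported interval: its midpoint should be
# the reported slope, and half its width is the margin of error.
midpoint = (0.015 + 0.193) / 2
margin = (0.193 - 0.015) / 2
print(midpoint, margin)
```

The reported interval is centered on the slope of 0.104 with a margin of about 0.089, which is why the lower end sits so close to zero.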
*Note that this dataset did not include European Tour/DP World Tour data.
Given this, it may not be surprising that Rory hasn’t won a tournament with a single-digit score. But do not let mere circumstantial evidence (such as Rory not having won a tournament in single digits) lead you to believe a narrative that Rory plays worse on easier courses. In fact, circumstantial evidence can at times be misleading.
Look at Jon Rahm. Of his 11 victories on the PGA Tour and in majors, 3 were won with single-digit scores: the 2020 Memorial Tournament, 2020 BMW Championship, and 2021 US Open. However, when we look at his PGA Tour data since 2017, the picture is quite ambiguous.
*Note that this dataset did not include European Tour/DP World Tour data.
The coefficient is -0.03, which is really small, but it’s also not significant. Our relatively large p-value of 0.344 suggests that, assuming there is no true association between the field’s performance and Rahm’s strokes gained, there’s a 0.344 probability of observing an association at least as extreme as -0.03. Because a p-value of 0.344 is relatively large in the world of statistics, we can’t state that there is a linear relationship between the field’s performance and Rahm’s performance. Despite Rahm having 3 more single-digit wins than Rory (in fewer total wins, might I add), the data does not suggest that Rahm is a better relative performer than Rory when the event is harder.
Some limitations of the research include the structure of cuts in professional golf. For PGA Tour events, players that finish lower than 65th place after the first two rounds are “cut”, and do not play the third and fourth rounds. Cut rules vary, and each tour and tournament could have different rules to determine the “cut”. Nonetheless, weeks in which a player was cut would only include mean strokes gained based on two rounds, whereas weeks in which a player was not cut would be based on all four rounds. It could very well be that a player happened to play poorly in the first two rounds and would have played better in the third and fourth rounds, but due to the cut was not able to play the last two days. In other words, there is a clear incentive for a good start in golf that this data does not take into account. Nonetheless, we hope that a sufficiently large sample mitigates this issue.
Further research could look into the effect of the cut and how it impacts performance on easier and harder courses. Furthermore, it would be interesting to examine whether better players tend to play relatively better (to their baseline) on harder courses compared to weaker players: Do the top 50 players in the Official World Golf Ranking tend to have a more positive coefficient than players ranked 51 to 100?
These are topics for another day. But to give a non-answer to Rappaport’s original question: you’re right that Rory hasn’t won a tournament in single digits, but that doesn’t mean Rory is a worse player on harder courses.
Sources:
Datagolf.com
Pgatour.com
https://www.youtube.com/watch?v=QObaplLZtLo