
December 08, 2008


Robert Novy-Marx, an Associate Professor of Finance at The University of Chicago Booth School of Business and former professional triathlete, offers an in-depth statistical analysis of the rankings and relative importance of the various events of the 2008 CrossFit Games. He offers some interesting, if not controversial, conclusions.

It is no exaggeration to say that the 2008 CrossFit Games tested who could do the fastest high-pull Fran and heavy Grace. This is somewhat surprising, as the deadlift and run events constituted roughly 40% of the average competitor’s total time. Nevertheless, these events were largely irrelevant with respect to the Games’ final outcome. Fran-plus-Grace times alone explain 94% of the variation in overall times, for both men and women, and ranking the athletes on the basis of Fran and Grace alone would have yielded outcomes very similar to the actual results.
This analysis gives some guidance on how one could construct a multi-event competition, scored using cumulative time, in which every event “matters.” It would require designing the events in a manner that roughly equates the standard deviations, not the averages, of times across events. This generally demands that strength-limited events are shorter than endurance-limited events. Events that some athletes struggle to just complete, like the relatively heavy clean and jerk in this year’s Grace, generate far more time dispersion per average duration (i.e., standard deviation relative to the mean) than events that are easy when you're not “on the clock,” like the run.
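A minimal sketch of the point about dispersion, using made-up times rather than the actual Games results: under cumulative-time scoring, the events with the largest standard deviations drive the total, however long the other events take on average. The event names, the five hypothetical athletes, and the helper `pearson_r` are assumptions for illustration only.

```python
from statistics import mean, stdev

# Hypothetical finishing times in seconds for five athletes (NOT Games data).
times = {
    "fran": [240, 300, 360, 420, 480],
    "deadlift": [300, 310, 305, 315, 320],
    "run": [600, 610, 590, 605, 615],
    "grace": [180, 400, 300, 700, 900],
}


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Dispersion per event: standard deviation, and SD relative to the mean.
sd = {event: stdev(t) for event, t in times.items()}
cv = {event: sd[event] / mean(t) for event, t in times.items()}

# Cumulative-time score, and the share of its variation explained by
# Fran-plus-Grace alone (R^2 of a one-variable linear fit).
total = [sum(ts) for ts in zip(*times.values())]
fran_plus_grace = [f + g for f, g in zip(times["fran"], times["grace"])]
r_squared = pearson_r(fran_plus_grace, total) ** 2

for event, t in times.items():
    print(f"{event:10s} mean={mean(t):6.1f} sd={sd[event]:6.1f} cv={cv[event]:.2f}")
print(f"R^2 of Fran+Grace vs total: {r_squared:.3f}")
```

In this toy data the run has the longest average time but almost no spread, so it barely moves the rankings. Rescaling events so their standard deviations are comparable is one way to make each of them "matter."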



24 Comments on “A Statistical Analysis of the 2008 CrossFit Games”


wrote …

Fascinating. As a spectator and intensely interested observer I can report that the "buzz" in the crowd was about the "fairness" of the heavy Grace, with all conversations to which I was privy agreeing that the other three were without question fair and appropriate. The correlation of ONLY C2B Fran and Grace to the final results would ask for some greater variety or variation of challenge in the other events to try to explore or tease out some other measures of fitness, to further provide an opportunity to separate the top finishers, and I think settles the fairness issue as well.

Equally fascinating, and in my opinion putting to rest any controversy about the scoring differences between 2007 and 2008, is how LITTLE effect the change in scoring had on the outcome.

Black box, meet Dr. Novy-Marx! That was a great read. Thank you, sir.



wrote …

I'm no expert, but the results seem to be largely determined by the shorter events that have a large variation in time, while the short run seemed pointless since the runners would more than likely all get fairly similar times. I agree that a dominant performance should be rewarded (as Jason's performance in Grace was), but to have an indisputable result, each event should allow a noticeable and fairly equal difference in time between first and last place. This should mean an even longer run, maybe 3k.
I would be curious to see a modified Linda with predetermined weight. Maybe something like a 315# deadlift, 155# clean, 155# push press/jerk. Since CrossFit covers all time and modal domains, maybe a CrossFit Total should be thrown in there too. Top result could be scored as 0. All following scores could be scored as +1 second for every pound (lb) of difference.
I guess the biggest question is, what qualities are we looking for in a winner? Once that is defined, it might be easier to put together events with a judging system that will find the athlete that fits the most to that definition. Everyone probably has their opinion on how this thing could be run, but this was only the 2nd event and I have no doubt that the people in charge will figure it out.
Although I will not be at the next games, I am looking forward to seeing what surprises they will hold.


Aaron Shaffer wrote …

Wow, definitely one of the most well thought out articles I've read. Good work!

"Designing events in a manner that roughly equates the standard deviations, not the averages, of times across events" is naturally appealing. BUT this doesn't necessarily consider work output which is what I thought the competition is all about. I'd like to see the games measured as an athlete's total work capacity across all events -- a measurement using CrossFit's own definition of fitness. This would require bodyweight be considered as a factor. A 200lb man that runs 800m in 2:00 min is surely more fit than a 150lb man who accomplishes the same task -- at least by the definition of work capacity.


wrote …

I think it's also important to keep in mind the effect of variable judging against the standards. In reality, the times were influenced in an absolute sense by the requirement for some athletes to repeat reps, and in a relative sense by leniency that allowed substandard reps to pass for one athlete but not another.


An interesting point. The issue of judging is one which has troubled the sport of kettlebell lifting (Girevoy Sport). I won't comment on the quality or standard of judging at the CF Games '07, but rather relate how the 'speed' of an individual judge can affect an event. The current women's weight in the snatch for time is 16kg. At this weight, the top women can snatch for 10 minutes with 1 hand switch in the 26+ RPM rep range. However, this high speed taxes the ability of the judges to determine whether the rep is legitimate or not. They must declare the rep legitimate before the next rep is begun, so their speed in judging validity affects the max rep speed of the competitor. The only way to combat this high rep speed is to increase the weight, which the AKC seems to have done or to be about to do. This slows down the competitors so that the judges have time to judge the rep as legitimate without unduly affecting the overall speed of the competitor. Or so the thinking goes.


James Crichton wrote …

That was an excellent article. It is nice to see some proper analysis of the data rather than conjecture and argument over the legitimacy of the results. A well thought-out and informative read. Thank you!


wrote …

Very powerful stuff when designing an equitable measure of fitness for future competitions. It also shows you how difficult it must have been to design a test for our sport that really did reward the best athlete. The two events that had the largest impact clearly went over that tipping point where you couldn't just smash your way through. If that is the way we want to measure our most elite, then I would expect to see more of those style WODs for the '09 games.


wrote …

Fantastic article. As was briefly mentioned earlier, it seems that body weight should become a factor in the scoring process. If there is some rationale behind the decision for BW to not be a factor, I'd be interested in knowing what it is; however, it seems the only reason is that accounting for an athlete's BW complicates administration.

The most striking example was Speal's poor performance in the modified Grace. The weight was roughly 20 pounds more than his BW, while it was roughly 50 pounds less than Khalipa's BW. I'd be interested to see an analysis of work capacity compared to BW for some of the top competitors. What's done is done, and I certainly don't mean to minimize the accomplishments of heavier athletes, but future games should not penalize athletes for being smaller, as this one did.


wrote …

I'd like to get ahold of the raw games data with height/weight info on the athletes. Was a data set that complete ever released?


replied to comment from Paul Vandenbos

It isn't so obvious that high bodyweight conferred an advantage in the games. I don't have data on the athletes' weights, but I don't think they would correlate strongly with total times. The smallest athletes did struggle with heavy Grace, but beyond being "big enough" size probably didn't confer a huge advantage, and was a disadvantage in all the bodyweight elements of the competition.

More important, and less talked about, is the disadvantage conferred by height. In order to complete a workout, a tall athlete has to do more work than a shorter athlete with the same weight. WODs typically specify reps, not work, and work is proportional to the distance the weight has to move. To complete the same workout a six foot athlete must perform about 20% more work than an equally heavy five foot athlete. The obvious expression of this disparity is the ability of shorter athletes to "cycle faster." You aren't going to see a lot of tall athletes on the games podium.
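The 20% figure follows from simple proportionality. A minimal sketch, assuming the distance a load travels on each rep scales linearly with the athlete's height (the function name is hypothetical):

```python
def relative_work(height_a_in, height_b_in):
    """Ratio of work per rep for athlete A versus athlete B at equal load,
    assuming bar travel is proportional to height."""
    return height_a_in / height_b_in


six_foot, five_foot = 72, 60  # heights in inches
extra_work = relative_work(six_foot, five_foot) - 1
print(f"Extra work for the taller athlete: {extra_work:.0%}")
```

Real range of motion depends on limb proportions and the movement, so this is only a first-order estimate.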


wrote …

Indeed Robert, height is the variable I'm really interested in. Sorry to hear none of that was in the data you had access to. I'd have loved to know how those correlations panned out. Maybe next year.


wrote …

A very well written article... a fun read for the nerd in all of us.

With all due respect, I hope they continue to change the format each year, with no emphasis on finding a "best" method. Constantly varied should refer to the games as well, there is no way to best prepare other than to be prepared. We never know what life will ask of us, and I hope we never know what the games have to offer either.


wrote …

Jason Khalipa (1st), Josh E. (2nd), and Jeremy Thiel (3rd) are all 5'9" or taller. All three weigh 185 or more... making the short-person advantage not seem like much...


wrote …

Interesting stuff indeed!
Run differences are not so big, but heavy Grace makes a big difference.
In simple words: can we say stronger is better?


wrote …

Loved every bit of the article.

As for everyone else, stop trying to make the games "fair," it's not going to happen. Life doesn't care how tall or light you are when throwing obstacles at you, and CrossFit is supposed to prepare you for the sport of life.

Sure, heavy Grace was unfair for Speal. But it was also unfair for Josh Everett. If it was C&J 200# 15 reps, he would finish ahead of everyone by 5 minutes, at least. And if it was Nasty Girls, then I would beat the top 3 competitors.

If you want fair, participate in as many crossfit games as you can. Then you are more likely to do well in one of them (assuming the games keep changing and the fair-pansies don't get their hands on them).


wrote …

Statistics are very informative, but they seem to take away from the anticipation, the unknown, the "battle," etc. What I liked most about watching the games, and look forward to in next year's, is that anybody can take the competition. If the workout selection is random, as it should be, it is nearly impossible to have repeat champions. I like watching extraordinary feats, like Khalipa in C&J. You could argue that Khalipa is a relative specialist for strength, because he was relatively weaker in the other events, but so what? It was still exciting and good for the overall outcome of the games, in my opinion.


replied to comment from Patrick Barber

I agree that the effects of height are second order. I don't have data, but I don't think the heights of the top games competitors look like a random draw from the population. Presumably height isn't impacted by training, so significant deviations in the height distribution of the top competitors from the general population are almost certainly due to selection issues. You see the same thing in basketball (where it is admittedly a way bigger issue)-- playing doesn't make you taller, and you don't have to be tall to be good, but good players are, on average, taller than the general population.

The top three competitors at the CrossFit games may have been of average height, but if being shorter confers no advantage we would expect the top 20 competitors to look roughly like the average population. About half would be taller than average, so roughly ten guys would stand 5'10"+. We would also expect there to be only about a 50-50 chance that there would be even one guy in the top 20 as short as 5'4" (2 standard deviations below the average height for men), and we would expect there to be as many 6'3"+ guys as there are 5'4"- guys. I don't know for sure, but I don't think that's what we see among the top competitors.
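The back-of-the-envelope numbers above can be checked with a short sketch. It assumes heights are normally distributed, that the top 20 are independent draws from the general population, and that 5'4" sits exactly 2 standard deviations below the mean, as stated in the comment; the variable names are illustrative.

```python
from statistics import NormalDist

field_size = 20   # top 20 competitors, as in the comment
z_short = -2.0    # 5'4" taken as 2 SD below the mean, per the comment

std_normal = NormalDist()  # mean 0, standard deviation 1

# Chance that one random man is at least 2 SD below (or above) the mean.
p_short = std_normal.cdf(z_short)
p_tall = 1 - std_normal.cdf(-z_short)

# Chance that at least one of 20 independent draws is that short.
p_at_least_one_short = 1 - (1 - p_short) ** field_size

# Expected number of the 20 who are taller than average.
expected_above_average = field_size * 0.5

print(f"P(one man <= -2 SD)       = {p_short:.4f}")
print(f"P(at least one in top 20) = {p_at_least_one_short:.2f}")
print(f"Expected above average    = {expected_above_average:.0f}")
```

Under these assumptions the chance of seeing at least one such short athlete comes out somewhat below even odds (roughly 37%), which is in the same ballpark as the rough "50-50" figure and does not change the argument.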


wrote …

In response to Comment #8, a possible negative effect of factoring in bodyweight would be a bias towards heavier athletes. Although it'll determine the total amount of work performed, the results may not be the best indicator of fitness. Take for example a 100m race between an elite Olympic sprinter at 160 pounds vs. a larger 250 pound athlete. If the elite sprinter runs a very quick 10 second 100 meter dash, and the larger athlete runs it in 15 seconds, the 250 pound athlete will still have produced a higher power output.
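The sprinter example can be worked through with a crude proxy for power: bodyweight times distance divided by time. This is only the simplification implied by the comment, not a real model of sprinting mechanics, and the function name is hypothetical.

```python
def crude_power(bodyweight_lb, distance_m, time_s):
    """Bodyweight x distance / time, in mixed units (lb*m/s).
    A rough stand-in for power output, not true mechanical power."""
    return bodyweight_lb * distance_m / time_s


sprinter = crude_power(160, 100, 10)       # elite sprinter from the example
heavy_athlete = crude_power(250, 100, 15)  # larger, slower athlete

print(f"sprinter:      {sprinter:.0f} lb*m/s")
print(f"heavy athlete: {heavy_athlete:.0f} lb*m/s")
```

By this measure the heavier athlete edges out the sprinter despite being five seconds slower, which is the bias the comment warns about.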


wrote …

Dude, you're hired!

Official score keeper, 2009 CF games.


wrote …

Thanks for the analysis. It was indeed an interesting read.

My assumption is that the Games are trying to determine the "fittest" athlete, much as a decathlon event does. Why not mirror the scoring system used in this Olympic event? Ranking is based on a points system, rather than position achieved, and the two-day event is split between speed, power and endurance. It seems like it would be fairly adaptable.


wrote …

Robert, I wonder if you could apply your magic to the results in the first Games. Specifically, what would happen if the rankings were determined by raw numbers rather than place in each event?

I envision an analysis that ADDS the time in seconds of the Hopper WOD and the Run, and then SUBTRACTS the CFT in pounds from the timed sum. The arithmetic result would be the score for the individual athlete, with a lower score being better. Presumably this would reward the strong to roughly the same degree as it would reward the runner or the athlete working in the phosphagen pathway.
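A minimal sketch of that proposed score, with invented athletes and numbers (none of this is 2007 Games data). It assumes one second and one pound are weighted equally, which is itself a design choice the proposal leaves open.

```python
def raw_score(hopper_s, run_s, cft_lb):
    """Hopper time plus run time (seconds) minus CrossFit Total (pounds).
    Lower is better under this proposal."""
    return hopper_s + run_s - cft_lb


# Hypothetical athletes: (Hopper WOD seconds, run seconds, CFT pounds).
athletes = {
    "A": (600, 1500, 900),
    "B": (650, 1400, 800),
    "C": (700, 1600, 1150),
}

scores = {name: raw_score(*results) for name, results in athletes.items()}
ranking = sorted(scores, key=scores.get)  # best (lowest) score first

for name in ranking:
    print(name, scores[name])
```

Here athlete C is slowest in both timed events yet ranks first on the strength of the Total. How the three events balance depends on how much their raw numbers vary across the field.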



replied to comment from Mark Lanza

Have you ever looked at the specifics of the decathlon scoring? The results are completely dependent upon the weight assigned to each event and those weightings have not been consistent over time. When all is said and done, the 2008 scoring system was exactly what they do in the decathlon.


replied to comment from Robert Novy-Marx

Do you think it is possible to design the 2009 games so that the events, whether it is just 2 or even 10, are "balanced" by time variation (through measured experimentation using benchmark athlete performances)? How does one determine the optimal time variation? Is it 1.2 seconds per as in the run or 5 seconds per as in the heavy Grace? It seems to me that the closer to zero you make the variation, the more true to the normal curve the variation in results will be. This would argue for shorter events (fewer reps, shorter distances), but I think that will be less fun (less suffering).

Also, if you take the deadlift and extend it to 7 rounds, will the results be linear? I mean to say, the weight is such that there could be a larger variance than expected, and maybe only 6 rounds are needed, or 6 and a half. I expect that a linear relationship exists for small variations in work done, but that as you increase the total work there is likely to be a noticeable curve (more variation at the margin). I also expect that it is possible to measure that curve given enough data, which brings me back to my original question. Is it possible to design the 2009 games so that the events are "balanced" by time variation?


replied to comment from Patrick Barber

I believe that 5'9" is the average height of American males, so if anything this should show us that being "average" height is an advantage over both taller and shorter competitors because they're able to be more well-rounded athletes.
