Empowering athletes with real-time, data-driven decision support

Big Data Evangelist, IBM

Split-second decisions are the essence of any activity in constant motion. In a rigorously timed athletic competition, you can't stop to reconsider your strategy. If any analytics are to guide you toward your goal, they need to be embedded dynamically in the activity itself.

Auto racing isn't just about how fast a souped-up car can go when you put pedal to the metal. It's very much a team sport, in which the driver coordinates tightly with the pit crew throughout the event. If you're unfamiliar with professional racing, you might think pit crews serve only a maintenance function: pumping gas, changing tires and hydrating the sweat-soaked driver at regular intervals. In fact, pit stops play a key strategic role in winning races. Tire changes in particular are essential at key points in the race, because they let the cars run on brand-new treads that grip the track as well as possible. Racing at 200+ miles per hour wears treads down quickly, and worn tires keep a car from reaching its maximum potential speed.

But there's an obvious trade-off in any in-race tire-change decision: the pit stop consumes precious seconds and costs the car rank position in a fast-moving pack. So, for each car's crew, the core analysis concerns how many pit stops they can afford to take before those stops become counterproductive.
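As a rough illustration of that trade-off (not the authors' model), suppose tire wear adds a fixed number of seconds per lap of tire age and every pit stop costs a fixed time loss. Total race time for a given number of stops can then be computed directly and minimized by brute force. The function names and all numbers below are hypothetical, and the sketch ignores rank position, traffic and caution laps entirely.

```python
from typing import Dict, List, Tuple


def stint_time(laps: int, base_lap: float, wear_per_lap: float) -> float:
    """Time for one stint on a single set of tires, assuming linear wear.

    Tire age runs 0, 1, ..., laps - 1, and each lap of age adds
    `wear_per_lap` seconds to the base lap time.
    """
    return laps * base_lap + wear_per_lap * laps * (laps - 1) / 2


def race_time(stops: int, total_laps: int, base_lap: float,
              wear_per_lap: float, pit_loss: float) -> float:
    """Total race time with `stops` pit stops and stints split as evenly as possible."""
    stints = stops + 1
    short, extra = divmod(total_laps, stints)
    # `extra` of the stints run one lap longer than the rest.
    lengths: List[int] = [short + 1] * extra + [short] * (stints - extra)
    driving = sum(stint_time(n, base_lap, wear_per_lap) for n in lengths)
    return driving + stops * pit_loss


def best_stop_count(total_laps: int, base_lap: float, wear_per_lap: float,
                    pit_loss: float, max_stops: int = 20) -> Tuple[int, Dict[int, float]]:
    """Try 0..max_stops stops; return the fastest count plus every total."""
    times = {k: race_time(k, total_laps, base_lap, wear_per_lap, pit_loss)
             for k in range(max_stops + 1)}
    return min(times, key=times.get), times


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the shape of the trade-off.
    best, times = best_stop_count(total_laps=200, base_lap=50.0,
                                  wear_per_lap=0.05, pit_loss=25.0)
    print(best, round(times[best], 1))
```

With these illustrative numbers, five stops comes out fastest: fewer stops leave too many laps on worn tires, while more stops cost more pit time than fresh tires win back.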

A recent article illuminates the complex challenges that a data scientist might face in statistically modeling this decision-support scenario. The authors, Theja Tulabandhula and Cynthia Rudin of MIT, spell out the many variables and metrics relevant to any such exercise. They preface their discussion by noting that, essentially, nobody in the auto racing industry currently does pit crew analytics of any formal nature. Under the direction of a team captain, all professional pit crews still rely largely on gut feel and traditional rules of thumb.

Where decision support is concerned, the pit crew analytic model would need to combine all time horizons. In other words, it would need to be real-time (for obvious reasons), predictive (of the cumulative impact of pit crew decisions on race outcomes) and historical (considering only the within-race "history" that began with someone exclaiming "Start your engines!").


Though not averse to season-level historical analysis, the authors built their statistical model on the assumption that races are won or lost primarily on within-race decisions (especially tire changes), regardless of how well a specific team fared in prior races.

In addition, the authors note that the within-game "history" of a pro auto race cannot easily be modeled as a sequence of distinct "plays." They use this latter term in a more abstract sense than the average sports fan does. In building their statistical model, they discuss "plays" in the context of different "evolutions of the game." What they call a "play" is any segment of the action that begins and ends with a distinct stoppage, followed by either resumption or completion of the game. Consistent with their parlance, a "play" might be defined by any or all of the following, depending on the sport: distinct periods (baseball's nine innings), possessions (football's four downs), timeouts (hockey's penalty calls) and moves (tennis serves). These four terms are my own, not the authors'; I use them to call out the underlying conceptual model in a more abstract, sport-agnostic fashion than the article does.

Where "plays" are concerned, pro racing's game evolution can be modeled as specific periods, timeouts and moves that take place within the real-time closed loop of a motor speedway. The chief periods are laps; most of these are full-tilt, high-velocity "green" laps, when cars race as fast as possible. The chief timeouts are "yellow" laps, when all racers must slow down and follow a safety car, and pit stops, which each team takes at its own discretion. The chief moves are accelerating, passing and pit stopping. The chief outcomes are rank positions.
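The periods, timeouts, moves and outcomes vocabulary above can be captured as a small data model. What follows is a minimal sketch; every type and field name (`LapFlag`, `Move`, `Lap`, `timeout_laps`) is hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class LapFlag(Enum):
    GREEN = "green"    # full-speed racing lap: the basic "period"
    YELLOW = "yellow"  # caution lap behind the safety car: a mandatory "timeout"


class Move(Enum):
    ACCELERATE = "accelerate"
    PASS = "pass"
    PIT_STOP = "pit_stop"  # also a "timeout", but taken at the team's discretion


@dataclass
class Lap:
    number: int
    flag: LapFlag
    moves: List[Move] = field(default_factory=list)
    rank_position: Optional[int] = None  # outcome observed at the end of the lap


def timeout_laps(laps: List[Lap]) -> List[int]:
    """Lap numbers that contain a timeout: a yellow flag or a pit stop."""
    return [lap.number for lap in laps
            if lap.flag is LapFlag.YELLOW or Move.PIT_STOP in lap.moves]


if __name__ == "__main__":
    race = [
        Lap(1, LapFlag.GREEN, [Move.ACCELERATE], 3),
        Lap(2, LapFlag.YELLOW, [], 3),
        Lap(3, LapFlag.GREEN, [Move.PIT_STOP], 7),
        Lap(4, LapFlag.GREEN, [Move.PASS], 6),
    ]
    print(timeout_laps(race))
```

Treating both caution laps and pit stops as "timeouts" keeps the distinction the text draws between league-imposed and team-discretionary stoppages in one place: the flag on the lap versus a move the team chose.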

Looked at this way, auto racing clearly flows more continuously than, say, basketball, hockey or soccer. That's because racing's periods (laps) run continuously from one to the next, its timeouts are ad hoc and discretionary, and its moves are real-time and dynamic. The objective outcome (finishing in first place) depends on the efficacy of the seamless stream of actions and decisions from the race's start to its finish.

According to the authors' discussion: "At each point in time of a race, the entire history of the race determines the racer's current rank position. On the other hand, in basketball, the game is restarted at the beginning of each play and the team's current state does not heavily depend on their state before the restart. One can reasonably approximate a basketball game to be a sequence of independent plays and even model them as independent observations drawn from a distribution. These long-standing correlations of decisions within the race make racing inherently much more difficult to model."
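To make the authors' point concrete, here is a minimal sketch, assuming rank position is simply the order of cumulative elapsed time (it ignores lapped cars, penalties and timing details). The function name `ranks_by_lap` is hypothetical. Because each rank is computed from a running total, one slow lap keeps influencing rank on every later lap, which is exactly the dependence an independent-plays model would throw away.

```python
from itertools import accumulate
from typing import Dict, List


def ranks_by_lap(lap_times: Dict[str, List[float]]) -> List[Dict[str, int]]:
    """Rank position of each car after every lap, from cumulative elapsed time.

    Assumes every car has completed the same number of laps. The rank after
    lap t depends on laps 1..t, not on lap t alone.
    """
    cumulative = {car: list(accumulate(times)) for car, times in lap_times.items()}
    n_laps = len(next(iter(lap_times.values())))
    result = []
    for t in range(n_laps):
        order = sorted(cumulative, key=lambda car: cumulative[car][t])
        result.append({car: pos + 1 for pos, car in enumerate(order)})
    return result


if __name__ == "__main__":
    # Car B loses a full second on lap 1, then runs 0.4 s per lap faster than A.
    for lap, ranks in enumerate(ranks_by_lap({"A": [50.0] * 4,
                                              "B": [51.0, 49.6, 49.6, 49.6]}), 1):
        print(lap, ranks)
```

In this example, car B is the faster car on laps 2 through 4, yet it only takes the lead on lap 4, once its per-lap gains have paid back the second it lost on lap 1.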

What's fascinating is how the authors boil down the strategic in-game decisions to just a handful, all involving the pit crew: how many pit stops to make, when in the race to make them and how many tires to change at each stop. There is always a trade-off between the number of fresh tires per pit stop (all other things being equal, a car with four fresh tires will race faster than one with two) and the rank position lost during the stop (changing four tires takes longer than changing two, or none at all).
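The tires-per-stop trade-off can be sketched the same way. This is a toy, not the authors' model. It assumes the car is pitting anyway, that changing 0, 2 or 4 tires costs a fixed amount of pit time (the `STOP_LOSS` values are hypothetical), and that lap-time loss grows with the average age of the four tires.

```python
from typing import Dict

# Hypothetical seconds lost in the pit lane for each option at a stop the car
# is making anyway; the numbers are illustrative only.
STOP_LOSS: Dict[int, float] = {0: 20.0, 2: 26.0, 4: 32.0}


def option_cost(tires_changed: int, tire_age: int, laps_left: int,
                wear_per_lap: float) -> float:
    """Pit time plus wear penalty over the rest of the race for one option.

    Simplifying assumptions: all four tires start at the same age, fresh tires
    have age 0, and the lap-time penalty is proportional to the mean tire age.
    """
    mean_age = tire_age * (4 - tires_changed) / 4
    # Mean age over the remaining laps runs mean_age, mean_age + 1, ...
    wear = wear_per_lap * (laps_left * mean_age + laps_left * (laps_left - 1) / 2)
    return STOP_LOSS[tires_changed] + wear


def best_option(tire_age: int, laps_left: int, wear_per_lap: float) -> int:
    """Number of tires (0, 2 or 4) that minimizes the toy cost."""
    return min(STOP_LOSS,
               key=lambda n: option_cost(n, tire_age, laps_left, wear_per_lap))


if __name__ == "__main__":
    print(best_option(tire_age=30, laps_left=40, wear_per_lap=0.05))
    print(best_option(tire_age=30, laps_left=5, wear_per_lap=0.05))
```

With 30-lap-old tires and a 0.05 s per lap wear rate, four fresh tires win when 40 laps remain, but with only 5 laps left it is cheaper to take none: there are too few laps left to recoup the extra pit time.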

Clearly, other factors also determine the race's outcome, and the authors have built those into their predictive model as well, drawing on detailed race data. Some non-obvious, non-linear racing dynamics come into play. For example, they observe that racers in higher rank positions tend to go faster than those farther back; the higher speed wears out the leaders' tires faster, so they slow down more quickly than cars back in the pack. This sets up a non-linear feedback: the leaders need fresh tires sooner, yet every pit stop for a tire change costs them rank position.
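The wear dynamic the authors describe can be illustrated with a toy model in which tire degradation scales with how hard a car is pushed. The functions `lap_times` and `crossover_lap` and all parameter values are assumptions for illustration only, not the paper's model.

```python
from typing import List


def lap_times(base: float, push: float, wear_per_unit_push: float,
              laps: int) -> List[float]:
    """Lap times for a car running `push` seconds under base pace on fresh tires.

    Assumption for illustration: each lap of tire age adds a penalty that grows
    in proportion to how hard the car is pushed, so faster cars degrade faster.
    """
    wear_rate = wear_per_unit_push * push
    return [base - push + wear_rate * age for age in range(laps)]


def crossover_lap(fast: List[float], slow: List[float]) -> int:
    """First tire age at which the faster-starting car is the slower one (-1 if never)."""
    for age, (f, s) in enumerate(zip(fast, slow)):
        if f > s:
            return age
    return -1


if __name__ == "__main__":
    leader = lap_times(base=50.0, push=1.0, wear_per_unit_push=0.08, laps=20)
    pack = lap_times(base=50.0, push=0.5, wear_per_unit_push=0.08, laps=20)
    print(crossover_lap(leader, pack))
```

Here a leader that starts 0.5 s per lap quicker than a car in the pack becomes the slower of the two once its tires are 13 laps old. In this linear toy the crossover always falls just past a tire age of `1 / wear_per_unit_push`, whatever the two push levels are.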

The paper is worth reading for any data scientist who wants to dabble in the "moneyball" arena. The bottom line, as the authors make explicit, is that each sport needs to be modeled on its own terms. A within-game decision-support predictive model for one sport cannot be applied directly to another, even one that shares a common ancestor or many surface similarities (baseball vs. cricket, tennis vs. badminton). No two sports have the exact same "game evolution" structure, follow the exact same rules, play on the same surface, use the same equipment or generate the same types of performance data.

If they did, big data analysis would reveal that the seemingly different sports are in fact identical twins in different uniforms.