Last weekend, for the very first time, we introduced ‘Race Insights’, an innovative race prediction system which combines Artificial Intelligence and historical race data to predict key outcomes and eventualities of Formula One races.
Although the Turkish GP provided exciting and unpredictable racing, on reflection our data proved promising, correctly predicting several of the race's final outcomes.
This article has been designed (with the assistance of Erick Mahle) to give you a greater understanding of the AI technology and data collection program behind the predictions, looking closely at the three main components which define the system.
Data, Data, Data
One of the central components of the AI model is the input of race, driver and track information. The circuits themselves play a key role in predicting the result of a race: variables such as elevation changes, the number of DRS zones and the number of corners are all taken into account within the data input. These fixed variables are combined with changeable variables such as weather and air temperature to paint an accurate picture of track complexity and race day conditions.
In addition to this, the AI model also takes into account driver- and team-specific data to predict and analyse how individual drivers will perform on race day. For example, the model looks at a driver’s average race pace, their DNF rate and the number of overtakes they perform (on average) during a season.
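As a rough illustration of the driver figures mentioned above, the sketch below summarises a handful of race entries into an average race pace, a DNF rate and an average number of overtakes. The record layout (`RaceEntry`), the function name and the numbers are all assumptions made for this example, not the actual Race Insights data format, and overtakes are averaged per race here for simplicity.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class RaceEntry:
    """One driver's result in one race (hypothetical record layout)."""
    driver: str
    avg_lap_time_s: Optional[float]  # None if no representative laps were set
    overtakes: int
    finished: bool  # False means a DNF


def driver_features(entries: list[RaceEntry]) -> dict[str, float]:
    """Summarise the race entries of a single driver."""
    if not entries:
        raise ValueError("need at least one race entry")
    lap_times = [e.avg_lap_time_s for e in entries if e.avg_lap_time_s is not None]
    return {
        "avg_race_pace_s": mean(lap_times) if lap_times else float("nan"),
        "dnf_rate": sum(not e.finished for e in entries) / len(entries),
        "avg_overtakes_per_race": mean(e.overtakes for e in entries),
    }


# Made-up numbers, purely for illustration.
season = [
    RaceEntry("Driver A", 92.0, 3, True),
    RaceEntry("Driver A", 94.0, 1, True),
    RaceEntry("Driver A", None, 0, False),
    RaceEntry("Driver A", 93.0, 4, True),
]
features = driver_features(season)
```

A summary like this would give one row per driver, which could then sit alongside the circuit and weather variables as input to a model.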
Not only does the model analyse circuit and driver data from the 2020 season, it also analyses patterns and trends from as far back as the 2010 Formula One season. By using historical race data we can analyse a driver’s progression or a team’s regression over time. This is most clearly seen in the gradual disappearance of constructors such as Marussia, Caterham and Lotus and the changing fortunes of teams such as McLaren and Ferrari.
After all this information has been captured, the data is combined in a ‘story’ which, alongside an algorithm and a statistical overview, aims to predict four key criteria.
Building the model
Machines running artificial intelligence can use a range of algorithms to create reliable outcomes. Two of the central algorithms within this program are gradient boosting and linear regression.
Once all the algorithms and data sets have been factored into the AI, we can try to accurately predict our four key criteria:
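To give a feel for how the two algorithms differ, here is a minimal sketch written in plain Python. It fits a straight line by least squares and, separately, a tiny gradient boosting model built from one-split trees ("stumps"). It assumes a single numeric input and made-up data; the function names, the number of rounds and the learning rate are illustrative choices, not the settings or library used in Race Insights.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def fit_stump(xs, residuals):
    """Find the single split on x that best fits the residuals (squared error)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        l_val, r_val = mean(left), mean(right)
        err = (sum((r - l_val) ** 2 for r in left)
               + sum((r - r_val) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, threshold, l_val, r_val)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1:]


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Each round fits a stump to what the current model still gets wrong."""
    base = mean(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, l_val, r_val = fit_stump(xs, residuals)
        stumps.append((threshold, l_val, r_val))
        preds = [p + learning_rate * (l_val if x <= threshold else r_val)
                 for x, p in zip(xs, preds)]
    return base, stumps, learning_rate


def predict_gradient_boosting(model, x):
    """Add up the base value and every stump's scaled correction."""
    base, stumps, learning_rate = model
    return base + sum(learning_rate * (l if x <= t else r) for t, l, r in stumps)


# Made-up data with a sudden step, which a straight line cannot follow.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 2.0, 2.0, 8.0, 8.0, 8.0]

slope, intercept = fit_line(xs, ys)
boosted = fit_gradient_boosting(xs, ys)
```

On data like this the straight line smooths over the step, while the boosted stumps settle close to the two levels. That flexibility is the usual reason to reach for gradient boosting when relationships in the data are not linear.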
- Qualifying results – How each driver will perform on a Saturday
- Final race results – Where each driver will finish
- Fastest Lap – Which driver will achieve the crucial extra point
- DNFs – Whether a driver will finish the race
Furthermore, the program also provides fascinating insights and statistics into the contribution that a driver may have to a team’s success. For example, you can follow the performance of Kimi Raikkonen as the Finnish driver moved from Lotus to Ferrari and then to Alfa Romeo.
The final stage of creating race insights is the statistical review. This review centers around the crucial ‘R²’.
R² (the coefficient of determination) is how the accuracy of the model (and the data behind it) is measured: it shows how much of the variation in the real results is explained by the predictions, with 100% meaning a perfect match. To have a high degree of confidence in the prediction model, the aim should be an R² of around 90%. However, this high degree of certainty isn’t always possible with the information available to the program.
This model is constantly evolving: as more driver, team and circuit data is fed into it, the data set will grow and the accuracy should improve.
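For readers who want to see how R² is worked out, the sketch below uses the standard definition: one minus the ratio of the squared prediction errors to the squared spread of the actual values around their mean. The finishing positions are invented for the example and are not Race Insights output.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all actual values are identical")
    return 1 - ss_res / ss_tot


# Invented example: two drivers predicted in each other's places.
actual_positions = [1, 2, 3, 4, 5]
predicted_positions = [1, 3, 2, 4, 5]
score = r_squared(actual_positions, predicted_positions)  # about 0.8, i.e. 80%
```

In this made-up case a single swapped pair of positions is enough to pull the score down to about 80%, which shows why a 90% target is demanding.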
A massive thank you to Erick Mahle for being the brains behind our race insights and for his assistance in creating this article!
Follow our socials as we reveal our race insights for the final races of the 2020 season!