POWER BI - Formula One via REST API

Hendrik Speelman
Jan 10, 2022
5 min read

Nothing more than passion

Formula One is one of my greatest passions. Ever since I was young I sat down in front of the television every race weekend to look at my racing heroes. Every race I was astonished by the atmosphere, culture, and of course the speed of the drivers. It almost feels like that speed is as fast as data going from one place to another.

Combining this passion with my passion for data was a challenge of mine that I always wanted to achieve, and now the possibilities are there with open data and tools like Power BI.

Tools, methods, and why

Power BI is one of my favorite BI tools to work with and has tons of possibilities to connect to data and to provide good looking visualizations. In this post, I will explain how I connected Power BI to Formula One data via an open REST API.

Always keep in mind that the goal of your project determines the way to achieve it. In this particular case, I choose to retrieve the data via a REST API, so I can easily refresh the data without exporting and importing the data again every time a new race has been driven. If I would have made an infographic for a specific driver, the approach would have been different.

The bottomline stays the same: always look for the right answer to your data question and challenge every possibility and outcome.

First things first, find data

Every time I search for data on the web, it can be a challenge, as I never know if there is available data. Some datasets can be bought; others are free to use. When I search for data (via our friend Google), I always add 'open data' to my search terms. This way, I am sure that the data is available and (often) free to use. When I did this for my formula one dataset, I found an open data REST API which I could use from a guy called Ergast. It provides a lot of information like drivers, races, countries, and even lap times. The dataset is updated frequently, so it feels convenient to use in my report.

Let's connect to the dataset

When I analyzed the data and knew what I was going to show on my dashboard (after a couple of minutes of brainstorming with my wife), I could finally set up the connection to the desired web URLs. Below you have a walkthrough on how to connect via Power BI:

1. Get Data

This can be done via the Home tab of Power BI. When a URL was prompted I filled in the following:

https://ergast.com/api/f1/results?limit=1000

As you can see, I only retrieve the first 1000 records, as this is the limit via this REST API. The result is as follows:

Looking at the data table itself, I can see that there are more than 1000 records available (25.399 to be exact), so this already gave me a first challenge: iterate over the pages of the dataset.

A REST API often works with pages, where you can iterate over the pages until all the data is retrieved, but in this case we have another parameter to work with in the URL: offset.

With the help of the offset parameter, I can navigate through the data and retrieve the data in buckets of 1000 records per REST call. I also have the total records of the dataset, so I know exactly how many calls I need to do.

Below you can find the code snippet which can be copy pasted in the advanced editor:

let
    Source = Xml.Tables(Web.Contents("https://ergast.com/api/f1/results?limit=1000")),
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Attribute:total"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Removed Other Columns",{{"Attribute:total", Int64.Type}}),
    ValueMaxRecords = #"Changed Type"{0}[#"Attribute:total"],
    GeneratedOffsets = List.Generate(() => 0, each _ < ValueMaxRecords, each _ + 1000),
    #"Converted to Table" = Table.FromList(GeneratedOffsets, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "Offset"}}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Renamed Columns",{{"Offset", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type2", "GetData", each Xml.Tables(Web.Contents("https://ergast.com/api/f1/results?limit=1000&offset="&[Offset])))
in
    #"Added Custom"

When running this code with that URL, Power BI will generate the necessary data, taking the offset parameter into account.

This needed to be done for each REST call for a dataset that had more than 1000 records. So in this case I created a function which could be called in any dataset. This allowed me to centralize the list generation and minimize the maintenance work.

When implementing this function, it was a piece of cake to retrieve other datasets as well. Here is a full list of the datasets I retrieved:

Results
Qualification Results
Driver
Constructor
Season
Race
Status

2. Model the data

When the data is retrieved, it is often hard to work with, as the data is not yet modeled to be linked together for analytical purposes. Questions arise like: "do I need role-playing dimensions?", "Do I need a time dimension?", "Is all necessary data available?".

In this step, it was needed to model the data in a way to link everything together for analytical purposes. Here you can find out what I did:

Treat the tables 'Results' and 'Qualification Results' as facts
Other tables are treated as dimensions
Addition of surrogate keys to make the lookup to certain tables more clear
Provided a role-playing dimension for the driver dimension, as I wanted to be able to compare drivers with each other

The final data model that ended up in the relationship view is the following:

3. Visualize the data

In order to visualize the data, I needed to understand the data. So I first took a look at what was possible to show, and which visual would be ideal. I also needed to have a frame where I would place my visuals in, so I downloaded some images from the web to get a background etc.

With regards to Power BI, the approach to build visuals is really easy. You can switch from one visual to another, and determine fast which one is better. For this specific project, I wanted to see the status of a certain season. I also wanted to dig out which driver had the overall better performance according to some parameters:

Most fastest laps
Most pole positions
Most race wins
Most points

Next to that, I also wanted to be able to switch easily between points and wins, as these are two different things to analyze and could potentially uncover some insights.

For the visual part itself, and if you are a fan of Formula One, you know that teams have their own specific colors, so I added some conditional formatting on the labels to display the colors accordingly.

Finally, I wanted to be able to compare drivers with each other. Creating the roleplaying dimension in the modelling step really helped me to be able to generate the necessary views. Images are downloaded from the web and regenerated via a base64 creator.

In the driver selection, it is possible to select a driver that has driven in the specified season.

Information is gathered to show the driver's code, full name, nationality, and age.

Next to that, the user gets to see the total points scored, pole positions, race wins, and DNFs (did not finish).

The same goes for selecting the second driver in a different frame.

The final result looks like this:

Conclusion

If the data is there and ready to use it, don't hesitate to exploit its power. In this case, the REST API was used to dig into the data and overcome challenges like pagination, dimensional modelling, role-playing dimensions, and visualization creation. Power BI is a tool which can offer all of these, making it very versatile. Even though the report itself can still be improved, I am pretty happy with the result and I can't wait for the next Formula One season to begin to uncover more insights!

HENNY SPEELMAN |