Adapted from Marvin Ronsdorf on Unsplash

Bundesliga in Numbers: Before and After the Corona-Restart [Project]

This project accompanies my article ‘Bundesliga in Numbers: Before and After the Corona-Restart’. In that article I described how the performance statistics changed after the German Bundesliga restarted following the Corona break. The analysis showed that some clubs clearly performed better after the break, while others did worse. All in all, the home-field advantage disappeared.

Since this was a statistical analysis, there is not much to say on the technical side, so here I want to focus on the data visualization. Specifically, I want to talk about two little tricks which improve the images and plots significantly. The first is annotating points in plots; the second is using images, such as logos, as points in a graph. The second trick is especially helpful for presentations in which, for instance, two companies are compared. Not only do the plots become more appealing, the information in them can also be taken in more quickly. I used Matplotlib, Python's oldest and most common plotting library, and Seaborn, a data visualization library based on Matplotlib.

The Data

First, a few words on how I collected the data, although I will not go into detail on this topic. Originally, the data was collected for another project, on which an article will be released at a later date. Most of the data is from API-Football, supplemented by data made available at football-data.co.uk.

Annotation of Points

For the plots in the article I used seaborn’s pointplot function. This function immediately gives us the mean of the data without the need to compute it beforehand.

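import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from matplotlib.offsetbox import OffsetImage, AnnotationBbox

# pointplot plots the mean of y_column for each value of x_column;
# ci=None disables the error bars (newer seaborn versions use errorbar=None)
ax = sns.pointplot(x=x_column, y=y_column, data=data, ci=None)
sns.despine()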

We use ci=None to disable the error bars; the sns.despine() command removes the upper and the right axis. The pointplot function returns a Matplotlib Axes object from which we can extract the coordinates of the plotted points.
I would like to mention that we could do the whole thing the other way around by simply calculating the means of the groups of data first and plotting afterwards. Then there would be no need to get the coordinates from the Axes object. However, the advantage of the approach used here is that we do not have to change the initial data or create a new DataFrame, so the Python script stays cleaner with respect to the data and copies of it. This method can also be used for all other types of plots. Sometimes it is easier to get the coordinates from the plot afterwards than to do the necessary calculations beforehand.
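
The alternative mentioned above, computing the group means first and plotting afterwards, could look roughly like the following sketch. The DataFrame, its column names (period, goals) and all values are invented for illustration and are not the data from the article.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data: goals per match before and after the restart
data = pd.DataFrame({
    "period": ["before", "before", "before", "after", "after", "after"],
    "goals": [3, 1, 2, 0, 2, 1],
})

# Compute the group means explicitly; sort=False keeps the order of first appearance
means = data.groupby("period", sort=False)["goals"].mean()

fig, ax = plt.subplots()
ax.plot(range(len(means)), means.to_numpy(), marker="o")
ax.set_xticks(range(len(means)))
ax.set_xticklabels(means.index)

# The coordinates are already known, so nothing has to be read back from the Axes
right_point = means.iloc[-1]
```

The price of this variant is the extra means object, which is exactly the kind of additional copy of the data that the pointplot approach avoids.
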
A detail for this plot: we only want to annotate the points on the right. For this purpose we get the y-values of all points of a line via line.get_ydata(). This returns an array in which each value, together with the corresponding element from the array of x-values, forms the coordinates of a point. Since the lines in this plot go from x = 0 to x = 1, we are interested in the elements at index 1.

right_points = [line.get_ydata()[1] for line in ax.lines if np.array_equal(line.get_xdata(), np.array([0, 1]))]
colors = [line.get_color() for line in ax.lines if np.array_equal(line.get_xdata(), np.array([0, 1]))]

But why do we need the if-statement? Because I wanted to plot one black point at the origin of the coordinate system, which overlays the other colors. The condition keeps only the lines whose x-values are exactly [0, 1], so this extra point is skipped.
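
To see the filter in isolation, here is a minimal sketch that uses plain Matplotlib lines instead of a seaborn pointplot. The two "club" lines, their values and the black point are made up for the example; only the get_xdata()/get_ydata() logic mirrors the snippet above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
# Two hypothetical lines running from x = 0 to x = 1
ax.plot([0, 1], [0.0, 0.4], color="red", label="Club A")
ax.plot([0, 1], [0.0, -0.2], color="blue", label="Club B")
# A single black point at the origin that covers the starting points
ax.plot([0], [0.0], color="black", marker="o")

# Keep only the lines whose x-values are exactly [0, 1]
two_point_lines = [
    line for line in ax.lines
    if np.array_equal(line.get_xdata(), np.array([0, 1]))
]
right_points = [line.get_ydata()[1] for line in two_point_lines]
colors = [line.get_color() for line in two_point_lines]
```

Without the condition, indexing get_ydata()[1] on the single black point would raise an IndexError, because that line only has one element.
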
We want to change the font color of the annotations from black to that of the corresponding line. We can get the line colors in the same way as the coordinates above. The texts used to annotate the points are simply the entries from the legend.
The legend itself gets disabled afterwards. A function which is extremely useful but rarely used is enumerate. It comes in handy if we want to iterate over a list and use both the value of the element and its index. This way we can assign the right coordinates and the right color to the text. We hardcode the x-value, which should be a little larger than 1 so that the text does not overlap the points but is shifted slightly to the right.

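# The legend entries serve as annotation texts; i selects the matching point and color
for i, label in enumerate(ax.legend().get_texts()):
    annotation = label.get_text()
    ax.text(1.07, right_points[i], annotation, color=colors[i])
ax.get_legend().remove()  # the annotations replace the legend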
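
Put together, the annotation trick can be tried out with the following self-contained sketch. It again uses plain Matplotlib lines with invented labels and values rather than the Bundesliga data, and the x-limits at the end are my own choice to make room for the text.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Hypothetical lines; the labels become the legend entries
ax.plot([0, 1], [0.0, 0.4], color="red", label="Club A")
ax.plot([0, 1], [0.0, -0.2], color="blue", label="Club B")

right_points = [line.get_ydata()[1] for line in ax.lines]
colors = [line.get_color() for line in ax.lines]

legend = ax.legend()
texts = []
for i, label in enumerate(legend.get_texts()):
    # Place each legend entry next to its right-hand point, in the line's color
    text = ax.text(1.07, right_points[i], label.get_text(),
                   color=colors[i], va="center")
    texts.append(text)

legend.remove()          # the annotations replace the legend
ax.set_xlim(-0.1, 1.4)   # leave room on the right for the annotations
```

This relies on the legend entries being in the same order as the lines, which holds here because both follow the order of the plot calls.
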

Images as Points in a Plot

One question I often asked myself was how to use logos as points in a plot. This comes in very handy in a presentation where one would like to compare some data of two companies, or in this particular case of Bundesliga clubs. This way, one does not depend on the line colors. Instead, the information can be taken in more quickly, because assigning the values to one of the companies is much easier and more intuitive. The function which accomplishes this is the following:

def imscatter(x, y, label, image_dict, ax=None):
    if ax is None:
        ax = plt.gca()
    x, y = np.atleast_1d(x, y)
    artists = []
    for x0, y0, l0 in zip(x, y, label):
        image = plt.imread(image_dict.get(l0, some_default_image_path))
        height, width = image.shape[0], image.shape[1]
        if 2*height > width:
            zoom = 20 / height
        else:
            zoom = 40 / width

        im = OffsetImage(image, zoom=zoom)
        ab = AnnotationBbox(im, (x0, y0), xycoords='data', frameon=False)
        artists.append(ax.add_artist(ab))
    ax.update_datalim(np.column_stack([x, y]))
    ax.autoscale()
    return artists

In the following we will go through it step by step. First, it is checked whether the function was given an Axes object. If not, the current Axes is used. Then we iterate over the coordinates x and y and the corresponding label of each point; all three are passed as lists. We use the label to look up the path to the image we want to use. The mapping between these two values is stored in a dictionary, which is passed as an argument of the function; if a label is missing from the dictionary, a default image path is used. Afterwards, the zoom factor of the image is set: images that are taller than half their width are scaled to a nominal height of 20 pixels, all others to a nominal width of 40 pixels. Instead of using fixed values, the zoom factor could alternatively be another argument of the function. The class OffsetImage is then used to scale the image, which is wrapped in an AnnotationBbox and added to the plot via the add_artist() command. Artists are the objects which are rendered in a plot. Lastly, we add all the new coordinates to the data limits of the Axes object and autoscale so that all relevant points are shown in the plot.
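
As a rough sketch of the variant with configurable sizes, the zoom logic can be moved into a small helper. To keep the example runnable without any image files, this version takes image arrays instead of paths. The names compute_zoom and imscatter_arrays, the max_height and max_width parameters and the synthetic "logos" are all assumptions made for this example and are not part of the original project.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.offsetbox import AnnotationBbox, OffsetImage


def compute_zoom(height, width, max_height=20, max_width=40):
    """Zoom factor: fit tall images to max_height, wide images to max_width."""
    if 2 * height > width:
        return max_height / height
    return max_width / width


def imscatter_arrays(x, y, images, ax=None, max_height=20, max_width=40):
    """Hypothetical imscatter variant that takes image arrays, not file paths."""
    if ax is None:
        ax = plt.gca()
    x, y = np.atleast_1d(x, y)
    artists = []
    for x0, y0, image in zip(x, y, images):
        zoom = compute_zoom(image.shape[0], image.shape[1], max_height, max_width)
        box = AnnotationBbox(OffsetImage(image, zoom=zoom), (x0, y0),
                             xycoords="data", frameon=False)
        artists.append(ax.add_artist(box))
    # Artists added via add_artist do not affect the data limits, so update them here
    ax.update_datalim(np.column_stack([x, y]))
    ax.autoscale()
    return artists


# Two synthetic RGB "logos": a 100x100 red square and a 50x200 blue banner
square = np.zeros((100, 100, 3))
square[..., 0] = 1.0
banner = np.zeros((50, 200, 3))
banner[..., 2] = 1.0

fig, ax = plt.subplots()
artists = imscatter_arrays([0, 1], [0.5, 1.5], [square, banner], ax=ax)
```

With real logos, one would read the files with plt.imread first and pass the resulting arrays, or keep the dictionary of paths as in the function above.
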

Summary

Today I showed you two little tricks which will improve your plots substantially. First, I explained how to annotate points in a Matplotlib Axes object; second, I showed how to use images and logos as points in a plot. I hope you find these two tricks helpful. The corresponding article can be found here.

Marian Biermann
Data Scientist
