Using analytics to predict the World Cup winner
At the beginning of the 2014 World Cup in Brazil, speculation was, as always, intense about who would win this time around. As the most important event in the football world was about to start, the sheer size of the global audience it draws was bound to produce a vast array of predictions, biased or unbiased, about its final outcome. Needless to say, it did. Among this variety of suppositions and guesses, some companies turned to big data analysis to offer an information-driven, unbiased prediction of the eventual winner.
The most prominent was FiveThirtyEight, which used a model developed by ESPN, called the Soccer Power Index (SPI), to predict each team’s chances of advancing through the stages and ultimately winning the cup. The SPI, detailed in a Nate Silver article, is a rating system that estimates a team’s overall quality from several years of historical data on team results and on the performances of individual players.
The team at FiveThirtyEight used the SPI forecasting model to run over 10,000 simulations in order to calculate each team’s probability of advancing through the tournament and winning the World Cup. Brazil was given a 45.2% chance of winning, with Argentina a distant second at 12.8% and Germany rounding out the top three at 10.9%. The result was not especially surprising, given that an earlier prediction published by Goldman Sachs had credited the home country with a 48.5% chance of winning the cup.
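The simulation idea behind such forecasts can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not FiveThirtyEight’s or ESPN’s actual method: the team ratings are made-up numbers rather than SPI values, and the logistic formula that turns a rating gap into a match-win probability (with its hypothetical scale parameter) is an assumption borrowed from Elo-style rating systems. Each run plays out a small knockout bracket, and a team’s estimated title chance is simply the fraction of runs it wins.

```python
import random
from collections import Counter

# Hypothetical ratings for illustration only; these are NOT SPI values.
RATINGS = {
    "Brazil": 90.0,
    "Germany": 88.0,
    "Argentina": 87.0,
    "Netherlands": 85.0,
}


def win_probability(rating_a, rating_b, scale=10.0):
    """Chance that team A beats team B in a knockout match.

    Assumed Elo-style logistic model: equal ratings give 0.5, and the
    two sides' probabilities always add up to 1 (no draws modelled).
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def simulate_knockout(bracket, ratings, rng):
    """Play one knockout bracket (team list whose length is a power of two)."""
    teams = list(bracket)
    while len(teams) > 1:
        next_round = []
        # Pair neighbours in the bracket: (0 v 1), (2 v 3), ...
        for a, b in zip(teams[0::2], teams[1::2]):
            p = win_probability(ratings[a], ratings[b])
            next_round.append(a if rng.random() < p else b)
        teams = next_round
    return teams[0]


def estimate_title_odds(bracket, ratings, n_sims=10_000, seed=2014):
    """Monte Carlo estimate: fraction of simulated tournaments each team wins."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    wins = Counter(simulate_knockout(bracket, ratings, rng) for _ in range(n_sims))
    return {team: wins[team] / n_sims for team in bracket}


if __name__ == "__main__":
    bracket = ["Brazil", "Germany", "Netherlands", "Argentina"]
    odds = estimate_title_odds(bracket, RATINGS)
    for team, p in sorted(odds.items(), key=lambda kv: -kv[1]):
        print(f"{team}: {p:.1%}")
```

The estimates are only as good as the ratings fed into them: with 10,000 runs the sampling error is well under a percentage point, so any large error in the final forecast comes from the ratings and the match model, not from the number of simulations. A full tournament forecast would also need to simulate the group stage, draws and penalty shoot-outs.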
The basis for this statistical model is not easy to argue with, as the methodology is well documented and backed by data that is publicly available and presented in the aforementioned article. Nonetheless, giving Brazil a 45.2% chance of winning the tournament is, at the very least, overly enthusiastic, given the quality of the current crop of Brazilian players selected by Scolari and the crucial fact that Brazil, as hosts, did not take part in the South American qualifiers.
Last night’s semi-final exposed, in my opinion, exactly these weaknesses. Before the game, FiveThirtyEight’s model still gave Brazil a 65% chance of winning in its pre-match article, even after taking into account the absence of arguably the two most important players in Brazil’s current team: Thiago Silva and Neymar.
But the reality of football kicked in yesterday, and everyone saw one of the worst Brazilian sides in recent memory, maybe even in history, mauled 7-1 by Germany, who in my view should have been touted as the tournament’s favourites from the start. Even though the magnitude of the defeat shocked everyone, the fact that Germany beat Brazil came as a surprise to no one, except perhaps Brazilians and FiveThirtyEight.
Even though the SPI model is a clear advance in measuring the quality of football players and teams, this World Cup, and particularly last night’s game, shows that we are still some distance from being able to accurately measure key aspects of the beautiful game. The actual quality of each individual player, their current form, the cohesiveness of the team as a unit and the number of official matches the players have played together before a tournament all have a critical influence on a team’s performance. Important strides are clearly still needed before we can adequately quantify a football squad’s quality and make accurate forecasts based on it.
Mihai Toma, Business Research Analyst