How Likely Is a Six Nations Grand Slam in 2025? | by Harry Snart | Jan, 2025

Quantifying uncertainty in sports fixtures

Photo by Thomas Serer on Unsplash

For rugby fans the long wait is nearly over. Like Christmas, the Six Nations comes every year to lift our spirits in the cold winter months. If you're not very familiar with rugby, the Six Nations is an annual tournament where the top national sides in Europe (England, France, Ireland, Italy, Scotland, Wales) each play five fixtures, alternating who plays at home or away each year. All teams compete to win, but the most coveted prize is a 'Grand Slam', where a team wins all five of their fixtures. Given how competitive the tournament is, a Grand Slam is fairly rare, and since the tournament was expanded to six sides in 2000 there have only been 13 Grand Slams of a possible 25.

This year, in the 2025 tournament, Ireland come into the competition competing for a third consecutive series win, with stiff competition from France, whose domestic league (the Top 14) has been electric this year in the European Champions Cup.

With that in mind, and given that roughly half of tournaments have ended in a Grand Slam, how likely is a Grand Slam in 2025? In this short article we'll explore how we can use previous fixture results and other information to make a best guess at how likely a Grand Slam is. We'll be focusing on linear models, and we'll explore this from both the Frequentist and Bayesian perspectives. The models are built using scikit-learn and the Bayesian modelling library Bambi (which is built on top of the excellent PyMC framework).

Read on to understand how and why I estimate the probability of a Six Nations Grand Slam to be around 30–40% in 2025.

In the age of AI, people are increasingly used to mapping inputs to outputs with highly accurate predictions. Whether this is using LLMs to generate natural language responses, Computer Vision models to tag images or even AutoML to predict on tabular datasets, it's increasingly taken for granted that these models just work.

Despite this, the relationship between inputs and outputs naturally involves a degree of uncertainty, and when you are working with small or noisy datasets, as you often see in sports, it is important to attach an estimate of uncertainty to your predictions. For example, in the opening fixture of the 2025 Six Nations France host Wales: we may predict that France will win, but how confident are we about this?

The dataset used for this analysis is sourced from publicly available sources, such as Wikipedia. The challenge with predicting 2025 fixture results is that the out-of-sample predictions are based on panel data, and team form often fluctuates across the years as squads and managers change.

In our publicly sourced data we gather stats from 2020–2024 including:

  • The age profile of squads
  • The experience of squads (i.e. number of international caps)
  • The number of distinct club sides that make up a national squad
  • Previous table position
  • Previous fixture result
  • Whether there is a change of coach since the previous tournament

The data preparation here is done using Pandas. Figure 1 shows how we merge the data on a fixture-level basis, incorporating information about the squad for each year of the tournament. Looking at this we can see that in 2025:

  • Ireland have the oldest squad with a proportionally high number of caps on average. This tells us that the squad is highly established and, since Irish rugby is provincial, the squad is made up of only four sides. Given the age profile of the side and that they have a new coach for this tournament, there may be uncertainty over whether they are at or near their 'peak' as a squad
  • France have one of the youngest squads on average and, on average, the lowest number of caps. Despite this they have been performing exceptionally well, and came second in the 2024 tournament, suggesting their squad is on the rise
  • England have the second youngest squad, but proportionally more caps on average, suggesting they are trying to balance youth with experience in the 2025 tournament
  • Scotland have the second oldest and one of the most capped squads in the tournament. They have an established side and, arguably, underperformed in 2024 where they came in fourth place. Their side may be nearing its peak before they go through a period of rebuilding
  • Italy are in a similar position to Scotland in terms of average number of caps, but with a slightly younger age profile. There have been a number of changes in management over the years, but they come into the competition this year with an established squad and the same coach. They may surprise people this year
  • Wales are in a period of rebuilding and have a young and inexperienced squad, and underperformed in the 2024 tournament where they came in last place

Since we are using linear methods to predict outcomes, I created a binary flag for whether or not the home side won the fixture, and for each fixture we'll predict the probability of the home side winning (i.e. yes/no). The probability of not winning at home is, implicitly, the same as the probability that the away side win.

Figure 1: Match History and 2025 Fixtures prepared using Pandas
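As a rough illustration of the kind of fixture-level merge described above, here is a minimal sketch. The column names (`avg_age`, `avg_caps`), the helper `add_squad_stats` and the placeholder sides and values are all assumptions made for the example; they are not taken from the real dataset.

```python
import pandas as pd

# Hypothetical fixture-level results: one row per fixture (placeholder sides)
fixtures = pd.DataFrame({
    "year": [2024],
    "home": ["Side A"],
    "away": ["Side B"],
    "home_win": [1],
})

# Hypothetical squad-level stats: one row per side per year (illustrative values)
squads = pd.DataFrame({
    "year": [2024, 2024],
    "team": ["Side A", "Side B"],
    "avg_age": [27.5, 25.0],
    "avg_caps": [40.0, 20.0],
})


def add_squad_stats(fixtures, squads, side):
    """Attach squad stats for the 'home' or 'away' side of each fixture."""
    stat_cols = [c for c in squads.columns if c not in ("year", "team")]
    renamed = squads.rename(
        columns={"team": side, **{c: f"{side}_{c}" for c in stat_cols}}
    )
    # A left join keeps every fixture even if a squad row is missing
    return fixtures.merge(renamed, on=["year", side], how="left")


merged = add_squad_stats(add_squad_stats(fixtures, squads, "home"), squads, "away")
print(merged)
```

Joining the same squad table twice, once per side, gives each fixture row a `home_*` and an `away_*` set of features for that year.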

Before building a predictive model, it is important to do some exploratory analysis. Figure 2 shows the correlation plot for the features.

As you might expect, where you finished last year is highly correlated with winning this year. Likewise, your squad profile is highly correlated with winning. Having a change of coach is correlated, but not as strongly, though this may be because there are proportionally fewer instances where this happens between tournaments.

An important consideration here is whether there is correlation amongst the inputs (features) of the model, since multicollinearity can negatively affect model reliability. We can see here that there is a strong correlation between age and number of caps; this is intuitive since older players will (on average) have more caps. To accommodate this we replace these inputs with a composite feature which represents the ratio of caps to age. We also remove some of the less correlated inputs from the model, since sometimes less is more when fitting a model and we want to avoid overfitting.

Figure 2: Correlation plot on historic results
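The composite feature can be sketched as follows. The column names and values are placeholders chosen for illustration, and the exact definition of the ratio used in the analysis may differ.

```python
import pandas as pd

# Illustrative values only: age and caps rise together, as described above
df = pd.DataFrame({
    "home_avg_age": [25.0, 26.0, 27.0, 28.0],
    "home_avg_caps": [20.0, 28.0, 35.0, 45.0],
    "home_win": [0, 1, 0, 1],
})

# Pairwise correlations, i.e. the numbers behind a correlation plot
corr = df.corr()
print(corr.loc["home_avg_age", "home_avg_caps"])

# Replace the two correlated inputs with a single caps-to-age ratio
df["home_caps_per_age"] = df["home_avg_caps"] / df["home_avg_age"]
features = df.drop(columns=["home_avg_age", "home_avg_caps"])
print(features)
```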

Once we have identified the features of our model we can prepare the data for training. Since this is a panel data problem we split the data as below.

Model Validation: We start by validating the model and getting an estimate of out-of-sample accuracy. To do this we back-test on previous tournaments

  • Train dataset: fixture results 2020–2023
  • Test dataset: fixture results in the 2024 tournament

Model Predictions: We can create our predictive model for 2025 for out-of-sample predictions as

  • Train dataset: fixture results from 2020–2024
  • Prediction dataset: upcoming fixtures for 2025
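The two splits above can be expressed as a simple filter on the tournament year. This is a minimal sketch that assumes a `year` column and a hypothetical helper `split_by_year`; the toy frame stands in for the real fixture data.

```python
import pandas as pd


def split_by_year(df, train_years, holdout_year):
    """Split panel data by tournament year rather than at random."""
    train = df[df["year"].isin(list(train_years))]
    holdout = df[df["year"] == holdout_year]
    return train, holdout


# Toy data: one row per year; the 2025 outcome is unknown
data = pd.DataFrame({
    "year": [2020, 2021, 2022, 2023, 2024, 2025],
    "home_win": [1, 0, 1, 1, 0, None],
})

# Back-test: train on 2020-2023, evaluate on 2024
train_bt, test_bt = split_by_year(data, range(2020, 2024), 2024)

# Final model: train on 2020-2024, predict the 2025 fixtures
train_final, predict_2025 = split_by_year(data, range(2020, 2025), 2025)
```

Splitting by year keeps later tournaments out of the training set, which a random split would not guarantee.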

We prepare the dataset for modelling using:

  • One-hot encoding for fixtures
  • MinMax scaling for numeric features

It is important to apply the scaling to each dataset separately to mitigate the risk of data leakage.
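A minimal sketch of this preprocessing with scikit-learn is shown below. The column names and values are placeholders. Note that the sketch takes one common route to avoiding leakage, which is to learn the encoding and scaling parameters from the training rows only and then reuse them on new fixtures; this is an assumption for the example and may differ from how the scaling was applied in the analysis.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Placeholder training fixtures and one new fixture to predict
train = pd.DataFrame({
    "home": ["Side A", "Side B", "Side C"],
    "away": ["Side B", "Side C", "Side A"],
    "caps_per_age": [0.8, 1.2, 1.6],
})
new = pd.DataFrame({"home": ["Side B"], "away": ["Side A"], "caps_per_age": [1.0]})

preprocess = ColumnTransformer(
    [
        ("teams", OneHotEncoder(handle_unknown="ignore"), ["home", "away"]),
        ("numeric", MinMaxScaler(), ["caps_per_age"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

# Parameters (categories, min, max) are learned from the training rows only
X_train = preprocess.fit_transform(train)
# The new fixtures are transformed with those same parameters
X_new = preprocess.transform(new)
```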

We can create our Frequentist model using scikit-learn's Logistic Regression classifier. Figure 3 shows the Confusion Matrix for the back-test (trained on 2020–2023 fixtures, evaluated on 2024 fixtures)

Figure 3: Confusion Matrix for Backtesting
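The back-test can be sketched as below. The features and labels here are synthetic random numbers generated purely so the example runs; they are not the Six Nations data, and the resulting accuracy has no connection to the figure reported in this article.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Synthetic stand-in data: 3 features, binary home-win label
rng = np.random.default_rng(0)
X_train = rng.normal(size=(120, 3))
y_train = (X_train[:, 0] + 0.5 * rng.normal(size=120) > 0).astype(int)
X_test = rng.normal(size=(30, 3))
y_test = (X_test[:, 0] + 0.5 * rng.normal(size=30) > 0).astype(int)

model = LogisticRegression()
model.fit(X_train, y_train)

# Hard predictions for the confusion matrix and accuracy
pred = model.predict(X_test)
cm = confusion_matrix(y_test, pred, labels=[0, 1])
acc = accuracy_score(y_test, pred)

# Point estimates of the home-win probability for each test fixture
home_win_prob = model.predict_proba(X_test)[:, 1]
print(cm, acc)
```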

In Figure 3 we can see that the accuracy of the model is around 73%. You may be wondering why there is a total of 30 fixtures for the 2024 predictions when there are only 15 fixtures in each tournament. The reason is that, in order to improve model accuracy, we stack the data so that we get a Home and Away result for each fixture. This is because sides only play each other once per year and swap home and away each tournament. We, as humans, understand that France v Wales is the same as Wales v France, but the model can't directly understand this. To handle this we swap home and away, and then flip the binary flag for home win, preserving the integrity of the data.

For instance:

  • 2024 Wales v France → HomeWin = 0 [original]
  • 2024 France v Wales → HomeWin = 1 [inverted]
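The stacking step can be sketched in Pandas as follows. The helper name `stack_mirrored` and the `view` column are assumptions added for illustration; any side-specific feature columns would need to be swapped in the same way as the team names.

```python
import pandas as pd


def stack_mirrored(fixtures):
    """Append a mirrored copy of each fixture with home/away swapped."""
    # Swapping the two labels in one rename exchanges the columns
    mirrored = fixtures.rename(columns={"home": "away", "away": "home"}).copy()
    # Flip the binary flag so it still describes the (new) home side
    mirrored["home_win"] = 1 - mirrored["home_win"]
    mirrored["view"] = "inverted"
    original = fixtures.assign(view="original")
    return pd.concat([original, mirrored[original.columns]], ignore_index=True)


fixtures = pd.DataFrame(
    {"year": [2024], "home": ["Wales"], "away": ["France"], "home_win": [0]}
)
stacked = stack_mirrored(fixtures)
print(stacked)
```

Each of the 15 fixtures in a tournament becomes two rows, which is where the total of 30 comes from.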

Using our out-of-sample predictions for 2025 we get the below win probabilities for the upcoming 2025 tournament.

Table 1: Point Estimates for 2025

In Table 1 we see that:

  • Ireland are expected to do well based on previous form, with a chance to get a 'three-peat' (third consecutive title)
  • France are expected to do very well, particularly at home
  • England have a reasonably strong chance, but in all likelihood will finish mid-table
  • Scotland are expected to have the slight edge in the Calcutta Cup again this year, but it will be tight
  • Italy and Wales are expected to compete to avoid the wooden spoon, with Italy expected to be slight favourites

Once we've estimated the probabilities for the fixtures, we can use Monte Carlo methods to simulate the tournament and estimate the probability of a Six Nations Grand Slam. Monte Carlo methods use random sampling to estimate probabilities and quantify uncertainty.

To do this we run 10,000 tournament simulations, making a random choice weighted by our win probabilities. We use NumPy's random choice method for our set of home and away fixtures with the corresponding win probabilities. Figure 4 shows a violin plot of the simulated number of wins per tournament per side

Figure 4: Simulated Match Outcomes from Frequentist Probabilities
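A minimal sketch of the tournament simulation is given below. It uses placeholder sides and a placeholder home-win probability of 0.5 for every fixture rather than the model's estimates, and it draws each fixture by comparing a uniform random number with the win probability, which is a vectorised equivalent of a weighted random choice between a home win and an away win. The function name `simulate_wins` is hypothetical.

```python
import itertools

import numpy as np

teams = ["A", "B", "C", "D", "E", "F"]  # placeholder sides
# Single round robin: 15 fixtures, first index treated as the home side
fixtures = list(itertools.combinations(range(len(teams)), 2))
home_idx = np.array([h for h, _ in fixtures])
away_idx = np.array([a for _, a in fixtures])
p_home = np.full(len(fixtures), 0.5)  # placeholder win probabilities


def simulate_wins(p_home, home_idx, away_idx, n_teams, n_sims=10_000, seed=0):
    """Return an (n_sims, n_teams) array of wins per side per tournament."""
    rng = np.random.default_rng(seed)
    # One independent Bernoulli draw per fixture per simulated tournament
    home_wins = rng.random((n_sims, len(p_home))) < p_home
    wins = np.zeros((n_sims, n_teams), dtype=int)
    for f in range(len(p_home)):
        wins[:, home_idx[f]] += home_wins[:, f]
        wins[:, away_idx[f]] += ~home_wins[:, f]
    return wins


wins = simulate_wins(p_home, home_idx, away_idx, len(teams))
# Share of simulated tournaments in which some side won all five fixtures
grand_slam_rate = (wins.max(axis=1) == 5).mean()
print(grand_slam_rate)
```

With every fixture a coin toss, the exact probability of any Grand Slam is 6 × 0.5⁵ = 0.1875, which gives a handy sanity check on the simulation.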

It's worth noting that these points are jittered to improve the aesthetics of the plot, but overall, we can see from Figure 4 that:

  • France and Ireland are clear favourites to win, though based on past form Ireland might be expected to be more likely to win a Grand Slam
  • It's important to note that past form doesn't always predict current form; for example, Ireland have a new head coach, the oldest team and are in a rebuild phase following the retirement of their key playmaker, Johnny Sexton
  • England and Scotland could cause some upsets, but are likely to be battling it out for the upper-mid table position. Based on recent form Scotland are more likely to get 3 wins and England 2 wins, but there is more uncertainty on how England could do in the competition
  • Wales and Italy are likely to be scrapping it out for the bottom of the table, with both teams fairly likely to pick up at least one win in the tournament, though this may be the Italy-Wales fixture, for which Italy are possible favourites given home advantage in 2025

Overall, this model appears in line with what many pundits have said about their expectations for the tournament. One limitation of this approach is that we are making the assumption that the win probabilities of the fixtures are normally distributed around the point estimates from the Logistic Regression model. This may be a strong assumption.

Another assumption of the model is that the outcome of one fixture does not affect the win probabilities in other fixtures, i.e. that fixtures are independent. Personally, I don't think this is entirely unreasonable since this is professional sport, sides are coached to have a winning mindset in each fixture, and sides are often inconsistent between fixtures. For example, Scotland performed very well against England in 2024 but went on to lose subsequent fixtures, and England went on to beat Ireland, who ultimately won the tournament.

We can avoid making strong assumptions on the distribution of win probabilities across the tournament by instead sampling these directly. To do this we can use Markov Chain Monte Carlo (MCMC) methods, which provide a Bayesian approach to estimating the distribution of model parameters by random sampling. Essentially, the models work by updating their prior beliefs on the distribution of model parameters as the sampler observes real data. Once the sampler converges around the 'true' distributions it samples directly from the posterior distribution of the model parameters. In the case of a Logistic Regression model, we model the target variable as a Bernoulli distribution.
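Written out, a Bayesian Logistic Regression of this kind takes roughly the form below. The symbols are notation introduced here for illustration: y_i is the home-win flag for fixture i, x_i its feature vector, and α, β the intercept and coefficients. The priors p(α, β) are left unspecified because the article does not state which were used.

```latex
\begin{aligned}
y_i &\sim \mathrm{Bernoulli}(p_i) \\
\operatorname{logit}(p_i) &= \alpha + \mathbf{x}_i^{\top} \boldsymbol{\beta} \\
p(\alpha, \boldsymbol{\beta} \mid \mathbf{y}) &\propto
  p(\alpha, \boldsymbol{\beta}) \prod_i p_i^{\,y_i} (1 - p_i)^{1 - y_i}
\end{aligned}
```

MCMC draws samples of (α, β) from the posterior in the last line, and each draw implies its own set of fixture win probabilities.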

There are potential drawbacks to using Bayesian Logistic Regression models: for example, they can be sensitive to the priors that the model assumes, the prediction probabilities may not be well calibrated (depending on the prior assumptions) and, in the case of a hierarchical model, there may be 'shrinkage'. Shrinkage occurs where hierarchy levels are pulled towards the mean of the parent level. In sports modelling the impact of this is that teams at the top and bottom of the table may have their estimates pulled down or up towards the mean of the table.

Figure 5: Estimating Grand Slam probabilities as samples directly from the posterior predictive distribution

Figure 5 shows the violin plot for the estimated distribution of wins taken directly from the posterior predictive distribution. The distributions look a little more spread out than from our Logistic Regression, possibly indicating the greater uncertainty in our model. Looking at the plot there may be some shrinkage, as both Wales and Italy are expected to do better than in the Logistic Regression model, and Ireland appear to have less chance of a Grand Slam.

We can use our samples to directly estimate the probability of a Grand Slam by simply taking the number of Grand Slams over the number of simulated tournaments; this is shown in Figure 6.

Figure 6: Estimating probabilities from Monte Carlo and MCMC Samples
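The counting step can be sketched as below. The function name `grand_slam_probabilities` and the four hand-written sample tournaments are assumptions for illustration; in practice the `wins` array would hold one row per simulated or sampled tournament.

```python
import numpy as np


def grand_slam_probabilities(wins, teams, fixtures_per_team=5):
    """Estimate Grand Slam probabilities as simple sample proportions."""
    slam = wins == fixtures_per_team  # True where a side won every fixture
    per_team = dict(zip(teams, slam.mean(axis=0)))
    any_team = slam.any(axis=1).mean()
    return per_team, any_team


teams = ["A", "B", "C", "D", "E", "F"]  # placeholder sides
# Four illustrative sampled tournaments (rows), wins per side (columns)
wins = np.array([
    [5, 4, 3, 2, 1, 0],
    [4, 4, 3, 2, 1, 1],
    [3, 5, 3, 2, 1, 1],
    [5, 3, 3, 2, 2, 0],
])

per_team, any_team = grand_slam_probabilities(wins, teams)
print(per_team, any_team)
```

The same function works for both the Monte Carlo simulations and the posterior predictive samples, since each is just a table of wins per side per tournament.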

We can then compare our model results to published odds. I found some odds published by a bookmaker on January 1st that gave the following odds:

  • No Winner 5/6 [this implies Any Winner odds of 6/5]
  • Eire 10/3
  • France 9/2
  • England 9/1
  • Scotland 14/1
  • Wales 500/1
  • Italy 2000/1

We can convert the published fractional odds to approximate probabilities using the formula below, where odds of a/b imply a probability of:

P = b / (a + b)

There are two things to consider here:

  • Firstly, betting companies publish implied odds rather than true odds since they factor in a profit margin for the odds they publish (i.e. the house always wins)
  • Secondly, odds change as new information becomes available. Our analysis is relatively simple and doesn't consider injuries or other factors. This is important since there have been notable injuries and withdrawals ahead of the start of the tournament, so the odds may have changed. This is why I'm comparing the odds we've estimated to ones published at the beginning of the year, where recent injuries won't affect the published odds.
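To make the first point concrete, the sketch below converts the odds listed above to implied probabilities and shows that they sum to more than 1, the excess being the bookmaker's margin. Rescaling the probabilities proportionally so they sum to 1 is one simple way to approximate 'true' odds; it is an assumption made for this example, as is treating all seven prices as a single market, and it is not necessarily how the comparison in this article was produced.

```python
# Fractional odds (numerator, denominator) as listed above
odds = {
    "No Winner": (5, 6),
    "Ireland": (10, 3),
    "France": (9, 2),
    "England": (9, 1),
    "Scotland": (14, 1),
    "Wales": (500, 1),
    "Italy": (2000, 1),
}


def implied_probability(numerator, denominator):
    """Fractional odds of a/b imply a probability of b / (a + b)."""
    return denominator / (numerator + denominator)


implied = {name: implied_probability(*frac) for name, frac in odds.items()}
total = sum(implied.values())
overround = total - 1  # bookmaker margin built into the prices

# Proportional rescaling so the probabilities sum to 1 (an assumption)
normalised = {name: p / total for name, p in implied.items()}
any_grand_slam = 1 - normalised["No Winner"]

print(f"overround: {overround:.3f}")
print(f"any Grand Slam (normalised): {any_grand_slam:.3f}")
```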

So how do our models compare to published odds? Our Frequentist model was surprisingly close, and our Bayesian model implied there was less certainty on the likelihood of a Six Nations Grand Slam. In Table 2 you can see a comparison of the converted odds and our estimated probabilities

Table 2: Comparison of our model to published odds

Overall, our estimates don't look unreasonable despite the relatively small and sparse dataset we were using.

Our analysis found that:

  • In the 2025 Six Nations France are likely to end up punching above their weight given the relatively young side they've got
  • Ireland look the most likely to get a Grand Slam, but this is based on past performance. With a new coach, ageing squad and changing of playmakers the outlook is less certain
  • England's True Odds are likely to be worse than their Implied Odds, and based on past performance they should aim for a strong mid-table position. They have one of the youngest squads but with more caps than other strong sides relative to their age profile. They have the potential to be disruptive in the tournament
  • Scotland have a better chance of a Grand Slam than England and are also likely to be competing for a strong mid-table position. They have the second oldest and most experienced team after Ireland and may be at or near their peak as a squad. Could it be now or never for this squad?
  • Wales and Italy are unlikely to be high performers in the 2025 Six Nations, and Italy will be vying to finish above Wales for the second year running
  • There is a reasonably strong chance of a Grand Slam by any team, around a 30–40% chance
  • This could be a very competitive tournament overall, with many sides having a good chance of winning

In this article we've seen how we can leverage Frequentist and Bayesian methods to quantify uncertainty around the likely winners of the Six Nations in 2025. Whilst our models were relatively simple and constrained to using a small dataset, our probabilities were not too dissimilar from published odds, though these have since changed as events have developed (injuries, call-ups, etc.).

Thanks for reading this article, I hope it's been interesting. If you're interested in learning more about the analysis you can find the full code on my GitHub account.