How Expected Goals Became a Crutch — and the Four Ways Football Gets It Wrong

FotMatch Insights · Data Story

xG was invented to remove bias from shot quality assessment. A decade later it is quoted by pundits, printed on broadcast graphics, and weaponised by fans in online arguments. Somewhere along the way, the metric stopped describing the game and started replacing it.

By FotMatch Editorial Team · Updated 2026-05-06 · 6 min read

Expected goals was designed to answer a narrow question: how good was this shot, based on historical data? It was never designed to answer who deserved to win, which player is better, or whether a manager should be sacked.

What xG actually measures — and what it does not

Expected goals assigns a probability, between zero and one, to every shot taken in a match based on a model that has been trained on hundreds of thousands of historical shots. The inputs vary by model — Opta's version uses shot location, body part, assist type, and defensive pressure; StatsBomb's adds goalkeeper position, shot height, and number of defenders between ball and goal — but the output is the same: a decimal that represents the historical likelihood of that shot becoming a goal.

The critical, and frequently ignored, limitation is that xG is a retrospective aggregate. It tells you what happened, on average, to a large sample of similar shots in the past. It does not tell you what happened in this specific instance, and it certainly does not tell you why. An xG of 0.75 means that, historically, three out of four such shots were scored. If the striker missed, xG does not explain whether the miss was caused by a slip, a deflection, a goalkeeper's extraordinary save, or the striker's poor technique. It simply records that the shot was taken in conditions that usually produce goals.

This distinction matters because xG is increasingly treated as a diagnostic tool — a way to identify which team "deserved" a result or which striker is underperforming. That is not what it was built for. The metric was designed to improve shot-quality assessment over raw shot counts, which had been the standard for a century. It was never intended to become the single number that settles debates about luck, skill, and managerial competence.

Misuse one: xG as a justice scoreboard

The most common distortion occurs after a match in which one team wins despite a lower xG total. A team that creates 0.8 xG and wins 1-0 against a team that created 2.4 xG is described, almost automatically, as "lucky" or "undeserving." The implication is that xG reveals the "true" result beneath the actual scoreline, and that the team with the higher xG was somehow robbed.

This interpretation ignores what football is. A team that creates 2.4 xG but fails to score has, by definition, failed to convert chances. A team that creates 0.8 xG and scores once has, by definition, converted its chance. The match is not a simulation that should have produced a different result; it is a contest in which finishing, decision-making under pressure, and defensive organisation are as much a part of the sport as chance creation. xG measures the quality of the chances, not the quality of the players who took them.

The "deserved" framing also misrepresents what high xG can mean. A team that dominates possession against a deep defensive block often accumulates high xG through a large number of low-quality shots — long-range efforts, crowded headers, and rushed attempts under pressure. A team that defends deeply and scores on a single well-worked counter-attack may have a lower xG total but a higher average xG per shot. The aggregate number obscures this. Two teams with 1.5 xG can have arrived at that total through entirely different tactical processes, one sustainable and one not.
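To make the point about identical totals concrete, here is a minimal sketch that computes the exact distribution of goals from a list of per-shot xG values. It assumes every shot is an independent event, which real matches do not guarantee, and the two shot profiles are hypothetical illustrations rather than data from any match.

```python
def goal_distribution(shot_xgs):
    """Return [P(0 goals), P(1 goal), ...] for the given per-shot xG values.

    Assumption: shots are independent, so the goal count follows a
    Poisson-binomial distribution, built here one shot at a time.
    """
    dist = [1.0]  # before any shot, zero goals is certain
    for p in shot_xgs:
        nxt = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            nxt[goals] += prob * (1.0 - p)   # this shot is missed
            nxt[goals + 1] += prob * p       # this shot is scored
        dist = nxt
    return dist


# Two hypothetical routes to the same 1.5 xG total.
many_poor_shots = [0.10] * 15   # sustained pressure against a deep block
few_good_shots = [0.50] * 3     # a handful of clear counter-attacking chances

for label, shots in (("15 shots at 0.10", many_poor_shots),
                     ("3 shots at 0.50", few_good_shots)):
    dist = goal_distribution(shots)
    print(f"{label}: total xG {sum(shots):.2f}, "
          f"P(no goals) {dist[0]:.3f}, "
          f"P(2+ goals) {1.0 - dist[0] - dist[1]:.3f}")
```

Under these assumptions the fifteen-shot profile fails to score about 21% of the time and the three-shot profile 12.5% of the time, even though both add up to 1.5 xG. The total alone cannot tell the two apart.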

Misuse two: xG overperformance as proof of finishing skill

When a striker scores twenty goals from fifteen xG over a season, the reflexive conclusion is that he is an elite finisher — a player who consistently converts chances at a rate above the historical average. Sometimes this is true. More often, it is a statistical mirage that will correct itself over time.

The problem is sample size. Even a regular starting striker takes only on the order of a hundred shots in a Premier League season, most of them low-probability chances. Over a sample that small, random variation can produce significant divergence from the expected total. A striker who overperforms his xG by five goals in one season has roughly a 40% probability, based on historical Premier League data, of regressing toward or underperforming his xG in the following season. The overperformance is not necessarily skill; it may be variance.
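The size of that random variation can be estimated directly. The sketch below assumes a purely hypothetical season of 125 independent shots at 0.12 xG each, chosen only so that the total matches the fifteen xG in the example above, and asks how often an exactly average finisher would still reach twenty goals.

```python
import math


def goals_sd(shot_xgs):
    """Standard deviation of total goals, treating each shot as an
    independent Bernoulli trial with success probability equal to its xG."""
    return math.sqrt(sum(p * (1.0 - p) for p in shot_xgs))


def prob_at_least(shot_xgs, target):
    """Exact probability of scoring at least `target` goals from these shots."""
    dist = [1.0]
    for p in shot_xgs:
        nxt = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            nxt[goals] += prob * (1.0 - p)
            nxt[goals + 1] += prob * p
        dist = nxt
    return sum(dist[target:])


# Hypothetical season: 125 shots, each worth 0.12 xG (an assumption, not data).
season_shots = [0.12] * 125
total_xg = sum(season_shots)
sd = goals_sd(season_shots)
p_twenty_plus = prob_at_least(season_shots, 20)

print(f"total xG: {total_xg:.1f}")
print(f"standard deviation of goals: {sd:.2f}")
print(f"P(20+ goals with average finishing): {p_twenty_plus:.3f}")
```

With these inputs the standard deviation is about 3.6 goals, so a five-goal overperformance sits well inside two standard deviations of what chance alone produces.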

There are genuine elite finishers — Harry Kane, Erling Haaland, Mohamed Salah — who have demonstrated, over multiple seasons and hundreds of shots, a sustained ability to outperform xG. But even these players show season-to-season fluctuation. Kane overperformed his Premier League xG by roughly 25% across the 2015-16 to 2021-22 seasons, a sample large enough to suggest genuine finishing skill. Yet within that period, individual seasons varied: he overperformed by 35% in 2016-17 and by only 8% in 2019-20. The multi-season trend is meaningful; the single-season deviation is not. Treating a one-year overperformance as evidence of a permanent "clinical" finishing ability is a misuse of statistics that has led clubs to overpay for strikers who promptly regressed.

Misuse three: defensive xG and the invisible goalkeeper

The application of xG to defensive performance is even more problematic. Some analysts use "xG against" — the total expected goals conceded by a team — as a measure of defensive quality. A team with a low xG against is described as defensively solid; a team with a high xG against is described as defensively porous. The metric seems objective. It is not.

xG against takes no account of goalkeeper performance, which standard xG models deliberately exclude: each chance is scored as it is struck, before any save is made. A goalkeeper who makes a series of saves on high-xG shots therefore reduces his team's goals conceded, not its xG against. The two numbers are routinely blurred. A team that concedes far fewer goals than its xG against is credited with a solid defence, and if the goalkeeper is then sold or injured, goals conceded may spike in the following season and the defence will be described as having "regressed" when in fact the goalkeeper was the variable that changed.

The point cuts both ways. A defence that concedes few goals because its goalkeeper has been exceptional is not necessarily a good defence; it is a defence protected by a good goalkeeper. Separating the two requires "post-shot xG" models, which estimate the probability of a goal after the shot has been taken, based on placement and power, and so allow analysts to compare the quality of the chances conceded with the quality of the saves made. Even then, the interaction between defensive pressure and shot placement is complex, and most xG models do not capture it fully. Without that separation, a defensive record that appears to measure defence actually reflects a combination of defence, goalkeeping, and random variation that xG against alone cannot disentangle.
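The comparison between chances conceded and saves made can be sketched in a few lines. The example uses one commonly seen convention, goals prevented as post-shot xG faced minus goals conceded. The class, its field names, and both teams' figures are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DefensiveSeason:
    """Hypothetical season summary; all field names are illustrative."""
    xg_against: float     # pre-shot xG of every chance conceded
    psxg_against: float   # post-shot xG of the on-target shots faced
    goals_conceded: int

    def goals_prevented(self) -> float:
        # Positive means the goalkeeper saved more than the shots' placement implied.
        return self.psxg_against - self.goals_conceded

    def gap_to_xg_against(self) -> float:
        # The headline gap that is often credited to "the defence".
        return self.xg_against - self.goals_conceded


# Two invented teams that allowed chances of identical quality.
team_a = DefensiveSeason(xg_against=45.0, psxg_against=44.0, goals_conceded=35)
team_b = DefensiveSeason(xg_against=45.0, psxg_against=46.0, goals_conceded=50)

for name, season in (("Team A", team_a), ("Team B", team_b)):
    print(f"{name}: xG against {season.xg_against:.1f}, "
          f"goals conceded {season.goals_conceded}, "
          f"goals prevented {season.goals_prevented():+.1f}")
```

Both invented teams allowed the same 45.0 xG, so by that measure their defences were equal. The fifteen-goal difference in goals conceded comes largely from what happened after the shot was struck.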

Misuse four: using xG to evaluate managers in isolation

Perhaps the most damaging misuse of xG is its application to managerial evaluation. A manager whose team consistently underperforms its xG is described as unable to "coach finishing" or guilty of creating "the wrong kind of chances." A manager whose team overperforms its xG is praised for tactical genius, often without evidence that the overperformance is caused by tactics rather than individual finishing skill or random variation.

The error here is causal confusion. A manager's tactical system influences shot quality — the positions from which shots are taken, the defensive pressure on the shooter, and the assist type — and these are legitimate inputs to xG models. But the manager does not control whether the striker slips, whether the goalkeeper guesses right, or whether the woodwork is struck. These are the variables that produce the gap between xG and actual goals, and they are largely outside managerial influence. A manager who creates high-xG chances but watches his strikers miss them repeatedly is, in xG terms, underperforming. But the cause of the underperformance may be striker quality, luck, or both — none of which is a direct reflection of the manager's tactical competence.

The responsible use of xG in managerial evaluation is comparative, not absolute. A manager should be assessed on whether his team's xG trends are improving over time, whether the quality of chances created is rising relative to the league average, and whether the team's xG profile matches its tactical identity. A defensive team that is conceding fewer high-xG chances than the previous season is improving defensively, regardless of whether the goalkeeper is saving them. An attacking team that is creating more high-xG chances per match is improving offensively, regardless of whether the striker is converting them. The metric is a tool for describing process, not a verdict on outcomes. When it is used as the latter, it becomes not just misleading but actively harmful to the sport's understanding of itself.
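One simple way to look at a trend rather than a single result is a rolling average of xG created per match, compared against a league baseline. The sketch below is only an illustration: the match values, the window of five matches, and the league average of 1.35 are all invented assumptions.

```python
def rolling_mean(values, window):
    """Mean of each consecutive run of `window` values, oldest first."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical xG created in ten consecutive matches under one manager.
xg_created = [0.9, 1.1, 0.8, 1.4, 1.3, 1.6, 1.2, 1.8, 1.7, 1.9]
league_average = 1.35  # assumed baseline, not a real figure

for end_match, avg in enumerate(rolling_mean(xg_created, 5), start=5):
    diff = avg - league_average
    print(f"matches {end_match - 4}-{end_match}: "
          f"rolling xG {avg:.2f} ({diff:+.2f} vs league)")
```

A rolling window smooths out single-match noise, which is the comparative reading described above: the question is whether the line is rising, not whether any one match produced goals.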