How to build an xG analysis service based on open data?

What is xG in football and why an xG analysis service is needed

xG (expected goals) is a metric that estimates the likelihood that a specific shot will result in a goal. For each shot, the xG model assigns a value from 0 to 1, where 0 means a nearly hopeless chance and 1 means a virtually guaranteed goal. At the aggregate level, a team's xG for a match or season shows how many goals the team should have scored based on the quality of the chances it created, rather than the actual result on the scoreboard.
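As a minimal illustration of that aggregation, a team's match xG is simply the sum of its per-shot values. The shot values below are made-up numbers for illustration, not output from a real model, and the helper names are hypothetical.

```python
# Hypothetical per-shot xG values for one team in one match (illustrative only).
shot_xg = [0.05, 0.12, 0.76, 0.03, 0.31]


def team_xg(shots):
    """Sum per-shot goal probabilities into a team-level expected goals total."""
    return sum(shots)


def prob_no_goal(shots):
    """Probability that none of the shots is scored, assuming independent shots."""
    p = 1.0
    for xg in shots:
        p *= 1.0 - xg
    return p


print("Team xG:", round(team_xg(shot_xg), 2))
print("Chance of a blank:", round(prob_no_goal(shot_xg), 3))
```

The second helper shows why xG is more informative than a raw shot count: five shots can still leave a noticeable probability of not scoring at all.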

xG analysis services reveal what is hidden behind the score: the real quality of play, the stability of style, the effectiveness of attack and defense. Clubs use xG to evaluate coaches and players, scouts use it to select reinforcements, media use it for in-depth match analysis, and betting operators and tipsters use it to build advanced forecasting models. Unlike simple shot or possession statistics, xG takes into account the context of each chance: the shooting zone, the type of pass, the position of defenders, the position of the goalkeeper, and other factors (in advanced models).

A dedicated xG analytics service turns these calculations into a product: interactive dashboards, in-match xG graphs, shot maps, and team and player ratings over time. Such a service can be monetized through subscriptions, B2B access, and partnerships with media and bookmakers. For the xG service to be sustainable and scalable, it must rely on a dependable data source, primarily a sports events API that consistently provides matches, events, and statistics in a convenient machine-readable format.

Open data and free APIs for football statistics for xG models

You can start working on an xG service with open data: publicly available match reports from leagues and federations, CSV files from enthusiasts, and free demo or free-tier API plans. However, such sources often provide a limited amount of information: the final score, goals, goalscorers, and sometimes basic team statistics. This is usually not enough to build a sustainable xG model, especially if you want to take into account the quality of chances and shooting zones, or create advanced metrics like xGOT (expected goals on target) and xA (expected assists).

Free APIs are generally limited in the number of requests per day, the number of tournaments, or the depth of history. For a prototype, this may be sufficient, but a commercial xG service needs guaranteed availability, scalability, and detailed match data. Here, it is more convenient to rely on specialized solutions such as the api-sport.ru sports events API, which combine a wide pool of tournaments, season history, and detailed match statistics in a unified format.

For example, even at the level of team statistics, you can collect important xG features: the number of shots on target, shots from the penalty area, "big chances", threats from set pieces, and so on. All of this is available through a single HTTP request to the API. A simple example of a request for matches by date using the Sport Events API might look like this:

fetch('https://api.api-sport.ru/v2/football/matches?date=2025-09-03', {
  headers: {
    'Authorization': 'YOUR_API_KEY'
  }
})
  .then(res => res.json())
  .then(data => {
    console.log('Total matches:', data.totalMatches);
    // here you can filter the tournaments you need and then fetch the details for each match
  });

Based on such responses, you form a raw dataset: matches, tournament, teams, score, basic statistics. You can then enrich it with events (goals, shots, cards) and build your xG model on top of this layer. The key advantage of the API approach is that the data comes in a structured form and is ready for direct loading into storage without manual tagging and HTML parsing.
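Forming the raw dataset can be sketched as flattening each match object into one row. This is only a sketch under assumptions: the fields status, homeScore.current, and awayScore.current follow the examples in this article, while the id field name, the helper names, and the sample payload are hypothetical.

```python
import csv
import io

FIELDS = ["match_id", "status", "home_goals", "away_goals"]


def flatten_matches(payload):
    """Turn an API response into flat rows suitable for a CSV file or a DB table."""
    rows = []
    for match in payload.get("matches", []):
        home = match.get("homeScore") or {}
        away = match.get("awayScore") or {}
        rows.append({
            "match_id": match.get("id"),  # assumption: the match ID is exposed as `id`
            "status": match.get("status"),
            "home_goals": home.get("current"),
            "away_goals": away.get("current"),
        })
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical payload mimicking the response shape (values are made up).
sample = {
    "matches": [
        {"id": 1, "status": "inprogress",
         "homeScore": {"current": 2}, "awayScore": {"current": 1}},
    ]
}
print(rows_to_csv(flatten_matches(sample)))
```

Missing fields become empty cells rather than raising errors, which keeps bulk loading of incomplete matches from failing halfway through.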

How to choose and connect a sports events API for xG calculation

When choosing a sports events API for xG tasks, several criteria matter: the depth of match statistics, tournament coverage, operational stability, and the availability of historical data and live updates. For xG use cases, fields with advanced statistics on shots and chances are especially useful. In the Sport Events API, this data is available through the matchStatistics entity, which can be obtained when requesting a list of matches or detailed information about a specific game.

The selection and connection process usually looks like this: first, you test the functionality on several leagues and evaluate the structure of responses and the completeness of statistics; then you arrange access and set up automatic loading. The API-Sport personal account lets you quickly obtain an API key and manage it (restrictions, usage statistics, key rotation). The key is then simply passed in the Authorization header with each request.

Below is an example of how to get detailed information about a match by its ID, including events and statistics, which can then be used in xG calculations:

const matchId = 14570728;
fetch(`https://api.api-sport.ru/v2/football/matches/${matchId}`, {
  headers: {
    'Authorization': 'YOUR_API_KEY'
  }
})
  .then(res => res.json())
  .then(match => {
    const stats = match.matchStatistics;
    const odds = match.oddsBase; // bookmaker odds for betting models
    const events = match.liveEvents; // goals, cards, and other events
    console.log('Match statistics for the xG model:', stats);
  });

Importantly, the same response contains not only statistics (the basis for xG) but also bookmaker odds in oddsBase. This allows you to build hybrid models that compare the "fair" score by xG with market estimates and line movement.
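One way to sketch such a comparison is to convert both sides into win/draw/loss probabilities. The structure of oddsBase is not described here, so the functions below take plain decimal odds as input. Modelling goals as independent Poisson variables with the team xG as the mean is a common simplification, used here as an assumption rather than as the provider's method.

```python
import math


def implied_probabilities(home_odds, draw_odds, away_odds):
    """Convert decimal odds to probabilities, normalizing away the bookmaker margin."""
    raw = [1.0 / home_odds, 1.0 / draw_odds, 1.0 / away_odds]
    total = sum(raw)
    return [p / total for p in raw]


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number of goals is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probabilities(home_xg, away_xg, max_goals=10):
    """Home win / draw / away win probabilities from independent Poisson goal counts."""
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    total = home + draw + away  # renormalize because the score grid is truncated
    return [home / total, draw / total, away / total]


# Illustrative inputs only: xG of 2.4 vs 0.9 and made-up decimal odds.
model = outcome_probabilities(2.4, 0.9)
market = implied_probabilities(1.80, 3.60, 4.50)
edges = [m - k for m, k in zip(model, market)]
print("model:", [round(p, 3) for p in model])
print("market:", [round(p, 3) for p in market])
print("difference:", [round(e, 3) for e in edges])
```

A persistent gap between the two vectors across many matches is the kind of signal a hybrid model would then investigate further.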

How to build an xG model based on match events from open data

An xG model is essentially an algorithm that outputs the probability of a goal based on a set of features of the chance (shot). Ideally, the input includes the coordinates of the shot, the type of pass, the body part, the situation (open play or set piece), pressure from defenders, and other detailed features. If you have event data at this level of detail from open sources, you can train a logistic regression, gradient boosting, or a neural network. However, even without coordinates, you can build a useful team xG estimate from the aggregated statistics available through the Sport Events API.
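To make the shot-level idea concrete, here is a minimal sketch of a logistic model that uses only two features derived from coordinates: the distance to the goal and the angle subtended by the goal mouth. The coefficients are hypothetical placeholders, not fitted values; in practice they would be estimated from labelled shot data.

```python
import math

GOAL_WIDTH_M = 7.32  # width of a football goal in metres


def shot_features(x, y):
    """x: distance from the goal line (m); y: lateral offset from the goal centre (m)."""
    distance = math.hypot(x, y)
    # Angle between the lines from the shot location to the two goalposts.
    angle = math.atan2(GOAL_WIDTH_M * x, x * x + y * y - (GOAL_WIDTH_M / 2) ** 2)
    return distance, angle


def shot_xg(x, y, coef=(-1.0, -0.1, 1.5)):
    """Logistic model: sigmoid(b0 + b1 * distance + b2 * angle).

    The default coefficients are made up for illustration only.
    """
    b0, b1, b2 = coef
    distance, angle = shot_features(x, y)
    z = b0 + b1 * distance + b2 * angle
    return 1.0 / (1.0 + math.exp(-z))


print("Close central shot:", round(shot_xg(6.0, 0.0), 3))
print("Long-range shot:", round(shot_xg(25.0, 0.0), 3))
print("Tight-angle shot:", round(shot_xg(6.0, 15.0), 3))
```

The same function shape carries over when you add more features such as body part or situation: each one contributes another term to z.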

For example, the Shots, Attack, and Match overview groups in matchStatistics contain features that correlate strongly with expected goals: totalShotsOnGoal, shotsOnGoal, totalShotsInsideBox, bigChanceCreated, bigChanceScored, hitWoodwork, and others. On a historical dataset of matches, you can fit weights for these metrics and obtain an approximation of team xG without tracking data. Even this approach is enough to build team rankings and compare the quality of play over time.

An example of a simplified calculation of team xG in Python based on match statistics obtained through the API:

import requests

API_KEY = 'YOUR_API_KEY'
MATCH_ID = 14570728

resp = requests.get(
    f'https://api.api-sport.ru/v2/football/matches/{MATCH_ID}',
    headers={'Authorization': API_KEY},
    timeout=10,
)
resp.raise_for_status()  # fail early on HTTP errors instead of parsing an error body
match = resp.json()

# Statistics for the whole match (period 'ALL')
stats = match['matchStatistics']
all_period = next(p for p in stats if p['period'] == 'ALL')

# Collect items from every group: the features used below may be spread
# across the Shots, Attack, and Match overview groups
values = {
    item['key']: (item['homeValue'], item['awayValue'])
    for group in all_period['groups']
    for item in group['statisticsItems']
}

# Primitive linear xG model on aggregates; placeholder weights
# that should be fitted on historical data
weights = {
    'totalShotsInsideBox': 0.10,
    'shotsOnGoal': 0.08,
    'bigChanceCreated': 0.30,
}

def calc_team_xg(side_index):
    """Weighted sum of the available statistics for one side (0 = home, 1 = away)."""
    xg = 0.0
    for key, w in weights.items():
        if key in values:
            # treat a missing (None) value as 0
            xg += (values[key][side_index] or 0) * w
    return xg

home_xg = calc_team_xg(0)
away_xg = calc_team_xg(1)
print('Expected goals (simplified model):', home_xg, away_xg)

This example is far from advanced academic models, but it demonstrates the approach itself: you use stable aggregates from the API and fit weights on historical data. As the service evolves, you can add further features (for example, possession, the number of entries into the final third, types of attacks) and connect AI models for more accurate probability predictions, including on top of new capabilities that API providers introduce.
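Fitting the weights on historical data can be sketched as an ordinary least squares problem: each row holds a team's aggregates for one match, and the target is the number of goals it scored. The sketch below solves the normal equations in pure Python, without an intercept. The function name and the toy rows are hypothetical, and a real pipeline would more likely use a statistics library and many more matches.

```python
def fit_weights(features, goals):
    """Ordinary least squares without intercept: solve (X^T X) w = X^T y."""
    n = len(features[0])
    # Build the normal equations as an augmented matrix [X^T X | X^T y].
    aug = []
    for i in range(n):
        row = [sum(r[i] * r[j] for r in features) for j in range(n)]
        row.append(sum(r[i] * g for r, g in zip(features, goals)))
        aug.append(row)
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot fit weights")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    weights = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * weights[j] for j in range(i + 1, n))
        weights[i] = (aug[i][n] - tail) / aug[i][i]
    return weights


# Toy rows (made up): [totalShotsInsideBox, shotsOnGoal, bigChanceCreated] per team-match.
toy_features = [[8, 5, 2], [4, 3, 1], [10, 4, 3], [6, 6, 0], [3, 1, 1]]
toy_goals = [2, 1, 3, 1, 0]
print([round(w, 3) for w in fit_weights(toy_features, toy_goals)])
```

With only a handful of rows the fitted weights are meaningless (they can even be negative), which is exactly why the depth of history matters when choosing a data source.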

Architecture of the xG analytics service: storage, processing, and updating data

A full-fledged xG service is not just a model but also a data architecture. A typical solution has several layers: a data ingestion layer for external APIs, a storage layer (OLTP/OLAP), a calculation layer (batch and near-real-time), and a delivery layer (your public or internal API). Historical matches are loaded in batches, while new games are loaded on a schedule or in near-live mode, so users can see current xG graphs during the match.

The Sport Events API is convenient to use as a single source of data. For history, you iterate over seasons and tournaments; for live data, you regularly poll the /v2/football/matches endpoint with the status inprogress and then fetch the details for the matches that interest you. Providers of such APIs, including api-sport.ru, are in the process of adding WebSocket subscriptions, which let you abandon frequent polling and receive events in push mode. This is especially important for live xG graphs and betting scenarios.

Below is an example of a simple worker in Node.js that regularly updates data on live matches and writes it to your storage (saveMatch is a placeholder for your own persistence function):

const API_KEY = 'YOUR_API_KEY';

async function loadLiveMatches() {
  try {
    const res = await fetch('https://api.api-sport.ru/v2/football/matches?status=inprogress', {
      headers: { 'Authorization': API_KEY }
    });
    if (!res.ok) {
      throw new Error(`API responded with status ${res.status}`);
    }
    const data = await res.json();
    for (const match of data.matches) {
      // here you can additionally fetch events or statistics if needed
      await saveMatch(match); // save to the database
    }
  } catch (err) {
    // log and wait for the next tick so that one failed poll does not crash the worker
    console.error('Failed to update live matches:', err);
  }
}

// simple scheduler: update every 30 seconds
setInterval(loadLiveMatches, 30_000);

In a real architecture, you would add a task queue, retries on errors, caching, and a separate pipeline for historical loading. The xG calculation itself is an important layer of its own: the model can be run both periodically (batch for completed matches) and in near-live mode (recalculation after each statistics update). As WebSocket channels and AI tools appear in third-party APIs, you will be able to simplify this pipeline further by offloading part of the logic to stream processing.
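The retries mentioned above are often implemented with exponential backoff. The following is a minimal sketch with hypothetical names and default values; a production worker would typically also add jitter and restrict the set of exceptions that are retried.

```python
import time


def with_retries(func, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**n seconds and retry.

    Re-raises the last exception when all attempts are exhausted.
    The `sleep` parameter is injectable so the helper can be tested without waiting.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Usage sketch (load_match_details is a hypothetical loader function):
# details = with_retries(lambda: load_match_details(14570728))
```

Wrapping each API call this way keeps transient network errors from leaving gaps in the historical dataset.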

How to create a public API for the xG service and integrate it with the website and applications

When xG calculations are already being performed regularly and the data is stored, the next step is to create your own public xG service API. Usually, several main endpoints are implemented: xG by match (team-level and player-level), aggregates by tournaments and seasons, team and player rankings, as well as timelines for building graphs during the game. Such an API should be simple and predictable: stable URLs, clear response schema, convenient authentication (API keys or OAuth), and request limits.
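Request limits per API key can be sketched with a fixed-window counter. The class name and parameters below are hypothetical, and the state lives in process memory, which is an assumption that only holds for a single-instance service; a multi-instance deployment would need shared storage.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` requests per API key in each time window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable for testing
        # api_key -> [index of the current window, requests counted in it]
        self.counters = defaultdict(lambda: [None, 0])

    def allow(self, api_key):
        window_index = int(self.clock() // self.window)
        state = self.counters[api_key]
        if state[0] != window_index:
            # A new window has started: reset the counter for this key.
            state[0], state[1] = window_index, 0
        if state[1] >= self.limit:
            return False
        state[1] += 1
        return True


limiter = FixedWindowLimiter(limit=2, window_seconds=60)
print([limiter.allow("demo-key") for _ in range(3)])
```

A request handler would call allow() before doing any work and respond with HTTP 429 when it returns False.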

The public xG API integrates well with websites, mobile applications, analytical dashboards, and betting platforms. On the backend, the xG API can combine data from its own database with up-to-date information received in real time from the api-sport.ru sports API, for example by mixing in live bookmaker odds from oddsBase or updating the match status. Below is a simplified example of a route in Node.js (Express) that returns xG by match along with basic info about the game itself:

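const express = require('express');
// Node.js 18+ ships a global fetch; on older versions use require('node-fetch') with node-fetch v2
const app = express();
const API_KEY = 'YOUR_API_KEY';

app.get('/api/xg/match/:id', async (req, res) => {
  const matchId = req.params.id;
  try {
    // 1. Take the precomputed xG from your own database
    const xgData = await loadXgFromDb(matchId);
    // 2. Mix in up-to-date match information from the Sport Events API
    const url = `https://api.api-sport.ru/v2/football/matches/${encodeURIComponent(matchId)}`;
    const matchRes = await fetch(url, {
      headers: { 'Authorization': API_KEY }
    });
    if (!matchRes.ok) {
      return res.status(502).json({ error: 'Upstream API error' });
    }
    const match = await matchRes.json();
    res.json({
      matchId,
      matchInfo: {
        status: match.status,
        score: {
          home: match.homeScore.current,
          away: match.awayScore.current
        }
      },
      xg: xgData
    });
  } catch (err) {
    // Express 4 does not catch errors thrown inside async handlers on its own
    console.error(err);
    res.status(500).json({ error: 'Internal error' });
  }
});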

Such an abstraction layer lets you avoid exposing the details of the data-source integration and the structure of internal tables to external users. The website or application simply calls your xG API, receives a unified response, and builds the visualization on the frontend. For a commercial product, you can add billing, usage analytics, and different access tiers (for example, only pre-match xG, or access to live charts and bookmaker APIs on top of your own xG).

Visualization of xG statistics: dashboards, graphs, and reports for users

The strong point of any xG service is visualization. Users care not only about numbers but also about visual forms: xG graphs by match minute, comparative bar charts of teams, shot maps, ranking tables, and PDF/HTML reports. Based on your public xG API and data obtained from the Sport Events API, you can build interactive dashboards in which users select a tournament, match, team, or player and receive a visual story of the quality of play.

The frontend usually retrieves aggregated data from your xG-API and renders it through visualization libraries (Chart.js, D3.js, ECharts, etc.). In live mode, data is updated via WebSocket or through periodic requests. It is important to think through the response format so that it is convenient for you to build both classic xG graphs and advanced representations — for example, combining xG with the dynamics of bookmaker odds or combining xG with a possession map by zones.

An example of a simple frontend request to your xG endpoint to build a graph of expected goals by time slots:

async function loadMatchXgTimeline(matchId) {
  const res = await fetch(`/api/xg/match/${matchId}/timeline`);
  const data = await res.json();
  // data = { points: [{ minute: 5, homeXg: 0.1, awayXg: 0 }, ...] }
  const labels = data.points.map(p => p.minute);
  const homeSeries = data.points.map(p => p.homeXg);
  const awaySeries = data.points.map(p => p.awayXg);
  renderXgChart(labels, homeSeries, awaySeries); // chart rendering function (Canvas or SVG)
}
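On the backend, the payload for such a timeline endpoint could be assembled from per-shot values roughly as follows. This is a sketch under assumptions: the input shot format and the function name are hypothetical, and the points are taken to be cumulative totals, which is one common reading of an xG timeline.

```python
def build_xg_timeline(shots):
    """Build {'points': [...]} with running xG totals for both teams.

    shots: list of dicts with keys 'minute', 'side' ('home' or 'away'), and 'xg'.
    """
    home_total = 0.0
    away_total = 0.0
    # Start from a zero point so the chart begins at the origin.
    points = [{"minute": 0, "homeXg": 0.0, "awayXg": 0.0}]
    for shot in sorted(shots, key=lambda s: s["minute"]):
        if shot["side"] == "home":
            home_total += shot["xg"]
        else:
            away_total += shot["xg"]
        points.append({
            "minute": shot["minute"],
            "homeXg": round(home_total, 3),
            "awayXg": round(away_total, 3),
        })
    return {"points": points}


# Made-up shots for illustration.
demo_shots = [
    {"minute": 30, "side": "away", "xg": 0.2},
    {"minute": 5, "side": "home", "xg": 0.1},
    {"minute": 62, "side": "home", "xg": 0.45},
]
for point in build_xg_timeline(demo_shots)["points"]:
    print(point)
```

Rounding on the server keeps the JSON compact and avoids floating-point noise in chart tooltips.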

For commercial users (clubs, media, betting companies), automatic reports are also important: after each round or match, your service can generate ready-made reviews with key xG metrics, graphs, and textual conclusions. Based on the same API data and AI tools, you can automate the writing of brief analytical texts such as "Team A deserved to win by xG, creating 2.4 expected goals against 0.9 for the opponent", which further increases the value of your product.
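Even without AI tools, a first version of such texts can come from a plain template. The sketch below is hypothetical: the function name, the wording, and the margin threshold for calling a game level are all arbitrary choices.

```python
def xg_summary(home, away, home_xg, away_xg, home_goals, away_goals, margin=0.3):
    """Build a one-sentence xG verdict for a finished match.

    `margin` is an arbitrary threshold below which the teams are called level.
    """
    diff = home_xg - away_xg
    if abs(diff) < margin:
        verdict = f"{home} and {away} were level on chances"
    else:
        leader, trailer = (home, away) if diff > 0 else (away, home)
        verdict = f"{leader} created the better chances against {trailer}"
    return (
        f"{verdict}: {home_xg:.1f} xG vs {away_xg:.1f} xG "
        f"(final score {home_goals}-{away_goals})."
    )


print(xg_summary("Team A", "Team B", 2.4, 0.9, 2, 0))
```

A language model can later rephrase or expand these sentences, while the template guarantees that the numbers in the text always match the data.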