How to build an ML model that determines the «turning point» in a match?

What is the «turning point» in a match and how to formalize it for an ML model

The «turning point» in a match is a period of time after which the probability of one team winning changes sharply. In a football match, this could be a goal in the 88th minute, the sending off of a key player, or a series of dangerous attacks. In basketball, it could be a long streak of scoring possessions; in tennis, a key break. For an ML model to find such moments automatically, they need to be not just described in words but formalized as metrics and labels on the match timeline.

In practice, this is done through time series analysis: we treat the match as a sequence of states (timesteps), where each moment corresponds to a set of features (score, statistics, events, bookmaker odds, etc.). A «turning point» can be defined as a moment after which the assessment of the teams’ chances changes significantly: for example, the conditional probability of the home team’s victory, calculated from bookmaker odds or from our own model, jumps by more than a specified threshold (for example, by 15–20% over a short period of time). Another approach is to look for sharp changes in the intensity of dangerous events (xG, shots, dangerous attacks, possession).
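
To make this concrete, here is a minimal sketch of such a rule. It converts an assumed minute-by-minute series of decimal odds on a home win into an implied probability (ignoring the bookmaker margin for simplicity) and flags minutes where that probability moves by more than a chosen threshold over a short window; the sample odds, the 0.15 threshold, and the 3-minute window are illustrative assumptions, not values dictated by any API.

# A minimal sketch: detect sharp jumps in the odds-implied win probability.
# home_win_odds is an assumed minute-by-minute series of decimal odds on a home win.
home_win_odds = {60: 2.40, 61: 2.35, 62: 2.30, 63: 1.55, 64: 1.50, 65: 1.48}
THRESHOLD = 0.15   # assumed jump threshold (15 percentage points)
WINDOW = 3         # compare with the probability WINDOW minutes earlier
implied_prob = {minute: 1.0 / odds for minute, odds in home_win_odds.items()}
turning_minutes = []
for minute, prob in implied_prob.items():
    past = implied_prob.get(minute - WINDOW)
    if past is not None and abs(prob - past) >= THRESHOLD:
        turning_minutes.append(minute)
print("Candidate turning-point minutes:", turning_minutes)  # -> [63, 64, 65] for this sample series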

From a machine learning perspective, the task is conveniently formulated as a binary or multi-class classification problem over time windows. For each timestep or time window (for example, 1–5 minutes), we calculate features and assign a label: «before the turning point», «turning point», «after the turning point», or a binary «turning point / no turning point». The scheme is similar across sports; only the composition of features changes. The rich stream of live data on matches and bookmaker markets provided by APIs such as api-sport.ru makes it possible to formalize these labels precisely and to rely not on intuition but on objective statistical signals.
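
Once candidate turning-point minutes are found (for example, by the odds-jump rule above), they can be turned into per-minute labels. The helper below is a hypothetical illustration of the binary scheme: every minute within a small tolerance window around a turning point gets class 1, everything else class 0; the three-class «before / turning point / after» variant is built the same way.

# A minimal labeling sketch: 1 inside a tolerance window around a turning point, else 0.
def label_minutes(total_minutes, turning_minutes, tolerance=2):
    labels = []
    for minute in range(1, total_minutes + 1):
        is_turning = any(abs(minute - t) <= tolerance for t in turning_minutes)
        labels.append(1 if is_turning else 0)
    return labels

labels = label_minutes(total_minutes=90, turning_minutes=[63], tolerance=2)
print(labels[60:68])  # minutes 61..68 -> [1, 1, 1, 1, 1, 0, 0, 0]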

Which sports APIs to use for obtaining real-time match data

To build an ML model for the turning point, you need a stable source of structured match data in near real time. The sports events API platform api-sport.ru provides unified endpoints for football, hockey, basketball, tennis, table tennis, and esports. The basic structure is the same for all sports: you receive a list of matches, detailed information for each match, live events, extended statistics, and bookmaker odds in one response. This is convenient when building a unified ML platform for different disciplines.

The main data for online match analytics is available through the endpoint /v2/{sportSlug}/matches and the method for obtaining a specific match, /v2/{sportSlug}/matches/{matchId}. The filter status=inprogress selects only games that are currently live, while the response fields give access to the current minute of the match (currentMatchMinute), the event chronology, detailed statistics, and the betting markets (oddsBase). During the model training phase, you can export the complete history of matches by dates and tournaments, and in production query only the current live games or, in the future, switch to a WebSocket subscription.

Below is an example of a request in Python that retrieves all current football matches and selects key fields for future feature engineering of the model:

import requests
API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.api-sport.ru/v2/football/matches"
params = {
    "status": "inprogress"  # only matches that are in progress right now
}
headers = {
    "Authorization": API_KEY
}
response = requests.get(BASE_URL, params=params, headers=headers)
response.raise_for_status()  # fail fast on HTTP errors
data = response.json()
for match in data.get("matches", []):
    match_id = match["id"]
    minute = match.get("currentMatchMinute")
    score_home = match["homeScore"]["current"]
    score_away = match["awayScore"]["current"]
    odds = match.get("oddsBase", [])
    print(match_id, minute, score_home, score_away, len(odds), "markets")

Such integration through REST already allows building a reliable data collection pipeline. As the service api-sport.ru develops, WebSocket channels and built-in AI services will be available, simplifying online detection of turning points without constant polling of the API.

What data and metrics to collect through APIs to determine the turning point

For a robust determination of the turning point in a match, it is important to gather as many signals as possible that reflect the change in the balance of power. In the responses of the endpoints /v2/{sportSlug}/matches and /v2/{sportSlug}/matches/{matchId} you receive several levels of data: basic information (score, status, minute), detailed live events, extended statistics, and bookmaker odds (oddsBase). Together they form a complete description of the match dynamics, based on which the ML model can «see» the turning point even if it does not coincide with a specific goal or red card.

Key features include: score and time dynamics (changes in homeScore, awayScore, currentMatchMinute), live events by type (goals, cards, substitutions, penalties), and grouped team statistics: ball possession, shots on goal, shots on target, dangerous moments, interceptions, duels, and other metrics. Additionally, it is useful to track the bookmaker odds in oddsBase: a sharp shift in the odds on one side winning, or a change in the totals line, often precedes a visually noticeable turning point on the field and can serve as a strong early indicator.

Below is an example of obtaining detailed information about a specific match with a focus on events and odds. Such a request is convenient to use in offline scripts when building a dataset. The API key can be obtained in your personal account at app.api-sport.ru.

import requests
API_KEY = "YOUR_API_KEY"
SPORT = "football"
MATCH_ID = 14570728
match_url = f"https://api.api-sport.ru/v2/{SPORT}/matches/{MATCH_ID}"
events_url = f"https://api.api-sport.ru/v2/{SPORT}/matches/{MATCH_ID}/events"
headers = {"Authorization": API_KEY}
match = requests.get(match_url, headers=headers).json()
events = requests.get(events_url, headers=headers).json()
# Example of extracting a few metrics
minute = match.get("currentMatchMinute")
score_home = match["homeScore"]["current"]
score_away = match["awayScore"]["current"]
odds_markets = match.get("oddsBase", [])
print("Minute:", minute, "Score:", score_home, ":", score_away)
print("Odds markets:", [om["group"] + "-" + om["name"] for om in odds_markets])
print("Total events:", events.get("totalEvents"))

From such data you can build both raw minute-by-minute time series and metrics aggregated over time segments (rolling number of shots, odds dynamics, series of dangerous attacks). The richer and more diverse the feature set, the more accurately the model will capture subtle turning points in the game across different sports.
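
As a small illustration of such aggregation, the sketch below uses pandas to compute rolling shot counts and the change in an odds-implied win probability over the last 5 minutes. The synthetic dataframe and its column names (shots_home, home_win_prob, etc.) are assumptions of the example; in a real pipeline they come from the API data described above.

import numpy as np
import pandas as pd
# Assumed minute-by-minute dataframe; in practice it is built from the API responses.
df = pd.DataFrame({
    "minute": np.arange(1, 91),
    "shots_home": np.random.poisson(0.2, 90),
    "shots_away": np.random.poisson(0.2, 90),
    "home_win_prob": np.clip(0.5 + np.cumsum(np.random.normal(0, 0.01, 90)), 0, 1),
})
# Rolling activity over the last 5 minutes and the 5-minute probability shift.
df["shots_home_5m"] = df["shots_home"].rolling(5, min_periods=1).sum()
df["shots_away_5m"] = df["shots_away"].rolling(5, min_periods=1).sum()
df["prob_change_5m"] = df["home_win_prob"].diff(5)
print(df[["minute", "shots_home_5m", "prob_change_5m"]].tail())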

How to prepare the dataset and features for the ML model based on sports API data

Preparing a quality dataset is a critical step in building the ML model for the turning point. First, export historical matches through the endpoint /v2/{sportSlug}/matches, filtering by dates, tournaments, and finished-match status. Next, for each match it makes sense to additionally request the events (/matches/{matchId}/events) and, if necessary, information on seasons and tournaments to enrich the data with context (playoff stage, match importance). All responses need to be brought onto a unified timeline: for example, discretized by minute or by fixed time windows (1, 3, or 5 minutes).
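
A sketch of such an export might look like the snippet below. It assumes that the matches endpoint accepts a date filter and a finished-status filter; the exact parameter names (shown here as date_from, date_to, and status=finished) are assumptions for illustration and should be checked against the current api-sport.ru documentation.

import json
import requests
API_KEY = "YOUR_API_KEY"
SPORT = "football"
BASE_URL = f"https://api.api-sport.ru/v2/{SPORT}/matches"
headers = {"Authorization": API_KEY}
# Hypothetical filter names for a historical export; verify them in the API docs.
params = {"status": "finished", "date_from": "2024-08-01", "date_to": "2024-08-31"}
response = requests.get(BASE_URL, params=params, headers=headers)
response.raise_for_status()
matches = response.json().get("matches", [])
# Persist raw responses so the feature pipeline can be re-run without re-querying the API.
with open("matches_2024_08.json", "w", encoding="utf-8") as f:
    json.dump(matches, f, ensure_ascii=False)
print("Exported matches:", len(matches))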

At each step of the time series, you form a feature vector: current score, score difference, match time, statistics (shots, possession, fouls, etc.), accumulated values over the last N minutes (rolling features), and shifts and derivatives of bookmaker odds (for example, the change in the odds on the home team’s victory over the last 5 minutes). The target label can be defined in two main ways: either by manually marking turning points in historical data (analysts mark the key episodes), or automatically by a rule tied to a sharp jump in the conditional probability of a team’s victory derived from the odds in oddsBase.

Below is a simplified example of how to turn a list of match events into a tabular format by minutes, suitable for training a model. In a real project, a separate ETL pipeline should be built for this block, which will regularly fetch new data from api-sport.ru and update the datasets.

import pandas as pd
# Assumes match_json and events_json have already been fetched from the API
minutes = list(range(1, 91))  # example for football, 90 minutes
rows = []
for minute in minutes:
    row = {
        "minute": minute,
        "score_home": 0,
        "score_away": 0,
        "shots_home": 0,
        "shots_away": 0,
        "yellow_cards_home": 0,
        "yellow_cards_away": 0,
        # ... other features
    }
    # here you would walk through the events and statistics and fill in the values
    rows.append(row)
features_df = pd.DataFrame(rows)
print(features_df.head())

After forming tables with features, you can proceed to the standard ML workflow: splitting into train/validation/test by matches or seasons, normalizing numerical features, encoding categorical ones (tournament, stage, sport type), class balancing. It is important to watch for target variable leakage: features should not contain information from the future regarding the moment in time for which you are predicting the presence of a turning point.
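
A simple way to enforce this is to build every rolling feature only from information available strictly before the minute being predicted, for example by shifting the series one step back, as in the sketch below (features_df and its columns are the assumed dataframe from the previous example).

# Leakage guard: each feature at minute t uses only information up to minute t-1.
features_df["shots_home_5m_past"] = (
    features_df["shots_home"].shift(1).rolling(5, min_periods=1).sum()
)
features_df["score_diff_past"] = (
    features_df["score_home"] - features_df["score_away"]
).shift(1)
# The first minute has no history, so fill it with neutral values.
features_df = features_df.fillna(0)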

How to train and test the ML model for determining the turning point in a match

When the dataset and features are prepared, you can proceed to selecting and training the ML model. For the first version of the system, gradient boosting models on tabular data (CatBoost, XGBoost, LightGBM) are often sufficient, as they work well with heterogeneous features and are relatively robust to noise. The task is formulated as classification: for each timestep or time window, predict the probability that this is a «turning point» (class 1) or a regular course of the match (class 0). Different types of turning points can also be introduced: a turning point in favor of the home team, in favor of the away team, or a general turning point in game intensity.

In terms of quality metrics, it is appropriate to use ROC-AUC, PR-AUC, the F1 score, and, importantly, metrics at the match level: the share of matches in which the model found a turning point within a reasonable window around the actual event. Be sure to use a temporal split: matches from recent seasons should go into the test set to check how well the model transfers to new data. It also makes sense to calibrate the predicted probabilities so that the model’s estimated chances of a turning point align with the actual frequencies.

Below is an illustrative code fragment that trains a simple model on the prepared features and saves it for subsequent online inference on data obtained from the API:

from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
import joblib
# features_df contains the features, target_series the turning-point labels
X_train, X_test, y_train, y_test = train_test_split(
    features_df, target_series, test_size=0.2, shuffle=False  # keeps temporal order; the split can be made more elaborate
)
model = XGBClassifier(
    n_estimators=300,
    max_depth=6,
    learning_rate=0.05,
    subsample=0.8,
    colsample_bytree=0.8,
)
model.fit(X_train, y_train)
print("Train score:", model.score(X_train, y_train))
print("Test score:", model.score(X_test, y_test))
joblib.dump(model, "momentum_break_model.xgb")
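
To go beyond plain accuracy, the sketch below shows one way to add the probability calibration and the ROC-AUC / PR-AUC metrics mentioned earlier. Wrapping a fresh XGBClassifier in CalibratedClassifierCV with internal cross-validation is just one possible setup; in practice you may prefer a dedicated calibration fold.

from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import roc_auc_score, average_precision_score
from xgboost import XGBClassifier
# Calibrate predicted probabilities using 3-fold internal cross-validation on the train set.
calibrated = CalibratedClassifierCV(
    XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.05),
    method="isotonic",
    cv=3,
)
calibrated.fit(X_train, y_train)
proba = calibrated.predict_proba(X_test)[:, 1]
print("ROC-AUC:", roc_auc_score(y_test, proba))
print("PR-AUC:", average_precision_score(y_test, proba))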

After the initial calibration of the model, it is useful to run A/B tests in the real product: compare user behavior, alert accuracy, bet quality, or content recommendations before and after introducing turning-point predictions. This will not only improve the ML part but also optimize the integration with the business logic of your service.

How to integrate the ML model with sports event APIs and use it in real services

After training the model, the next step is to integrate it into the production system, which receives match data from the sports API in real time and requests predictions at the right moments. A typical architecture looks like this: a microservice or background worker periodically requests live matches through /v2/{sportSlug}/matches?status=inprogress, updates the internal storage of match states, forms features in the required format, runs them through the saved ML model, and issues a signal when a turning point occurs. In the future, switching to a WebSocket channel from api-sport.ru will allow receiving updates without polling and further reduce the delay between the real event and the alert.

Integration is especially valuable for betting projects and analytical platforms: when the probability of a turning point changes, matches can be promptly highlighted to the user, output priorities rearranged, and notifications and betting tips generated automatically. The oddsBase block with bookmaker market odds in the API allows combining your own predictions with market estimates and building complex risk management strategies. All of this can be implemented on top of a single data source, api-sport.ru, an API for sports events and odds, without spending resources on parsing websites and supporting dozens of providers.

Below is an example of a minimalist service in Python that periodically polls the API, forms a simple set of features, and queries a locally saved model. In a real project, it is advisable to move the model into a separate service (REST/gRPC) and add caching and an alerting system.

import time
import requests
import joblib
API_KEY = "YOUR_API_KEY"
SPORT = "football"
BASE_URL = f"https://api.api-sport.ru/v2/{SPORT}/matches"
MODEL = joblib.load("momentum_break_model.xgb")
headers = {"Authorization": API_KEY}
while True:
    params = {"status": "inprogress"}
    matches = requests.get(BASE_URL, params=params, headers=headers).json().get("matches", [])
    for m in matches:
        # here goes your feature preparation code that builds features_row from m
        # features_row = prepare_features(m)
        # prob = MODEL.predict_proba(features_row)[0, 1]
        # if prob > 0.8:
        #     send_alert(m["id"], prob)
        pass
    time.sleep(30)  # polling interval; not needed once WebSocket is available
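
If you later move the model into a separate service, as suggested above, a minimal sketch with FastAPI could look like this; the /predict route, the flat feature payload, and the reuse of the saved model file are assumptions of the example rather than a prescribed interface.

import joblib
import pandas as pd
from fastapi import FastAPI

app = FastAPI()
MODEL = joblib.load("momentum_break_model.xgb")  # the model saved in the training step

@app.post("/predict")
def predict(features: dict):
    # Hypothetical payload: a flat dict with the same feature names the model was trained on.
    row = pd.DataFrame([features])
    prob = float(MODEL.predict_proba(row)[0, 1])
    return {"turning_point_probability": prob}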

This approach lets you grow the functionality gradually: first implement basic REST inference, then move to streaming processing, add support for new sports, and introduce more accurate models (recurrent or transformer architectures). The flexible endpoint structure and data extensibility of the API make the api-sport.ru platform a convenient foundation for any solution at the intersection of sports, betting, and machine learning.