How to use ML to automatically identify dangerous moments?

Contents

What is machine learning for analyzing sports events and dangerous moments
What data is needed to identify dangerous moments in sports through the sports events API
How to use the sports events API for automatic detection of dangerous moments
How to build a machine learning model to identify dangerous moments in sports matches
Examples of integrating an ML model with the sports events API in Python (REST, Webhook)
How to evaluate the accuracy of the model and reduce false positives when searching for dangerous moments

What is machine learning for analyzing sports events and dangerous moments

Machine learning in sports allows for the automatic identification and marking of the most intense episodes of a match: one-on-one situations, powerful attacks, moments with a high probability of a goal, dangerous shots in hockey, or decisive fights in esports. Instead of manually reviewing hours of broadcasts, the algorithm is trained on historical data and learns to recognize patterns that precede a dangerous moment.

The key feature of the approach is that the model does not rely on the subjective opinion of an expert but extracts patterns from large volumes of statistics: shots on goal, ball possession, number of duels, bookmaker odds, and the sequence of game events. Because the sports events API delivers this data in a structured form, you can launch an MVP quickly and later grow it into a more complex ML model.

The api-sport.ru sports events API platform provides unified access to data on football, basketball, tennis, table tennis, hockey, esports, and other sports. Through the REST API, you get matches, events, advanced statistics, bookmaker odds, and video highlights that can be used to train and run models that automatically identify dangerous moments in real time.

Example: getting a list of available sports for ML analytics

import requests
API_KEY = 'YOUR_API_KEY'
BASE_URL = 'https://api.api-sport.ru'
headers = {
    'Authorization': API_KEY,
}
response = requests.get(f'{BASE_URL}/v2/sport', headers=headers, timeout=5)
response.raise_for_status()
for sport in response.json():
    print(sport['id'], sport['slug'], sport['translations']['ru'])

Thus, you can programmatically list all supported sports and build your own ML models for detecting dangerous moments for each, taking into account the specifics of the rules and dynamics of the game.

What data is needed to identify dangerous moments in sports through the sports events API

For the machine learning model to reliably identify dangerous moments, it needs at least three groups of data: a chronicle of game events, detailed match statistics, and context in the form of team lineups and bookmaker odds. All these types of data are available in the API: matches, events, extended statistics by keys such as ballPossession, totalShotsOnGoal, bigChanceCreated, as well as oddsBase and video highlights.

The basis of the dataset consists of historical matches obtained through the endpoints /v2/{sportSlug}/matches and /v2/{sportSlug}/matches/{matchId}. The responses contain liveEvents arrays with a chronology of goals, cards, substitutions, and other episodes, as well as matchStatistics with statistics aggregated by period. For football matches, for example, you can extract shots from the penalty area, accurate passes in the final third, duels won, and goalkeeper saves, and then link them to the occurrence of a dangerous moment or goal.

Additionally, for advanced models, bookmaker odds from the oddsBase field are used. Sharp changes in quotes often reflect changes in the balance of power on the field, so including these features helps improve the quality of predicting dangerous episodes. For calibration and model validation, links to video highlights are useful, allowing for a quick check of whether the moments marked by the algorithm indeed appear dangerous subjectively.
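One way to turn sharp changes in quotes into a feature is to measure the relative change in odds over a short window. The sketch below is only an illustration: the snapshot format (a list of minute and decimal-odds pairs) and the function name odds_shift are assumptions, not the actual layout of the oddsBase field.

```python
def odds_shift(snapshots, minute, window=5):
    """Relative change in decimal odds over the last `window` minutes.

    snapshots: list of (minute, decimal_odds) pairs sorted by minute.
    This format is hypothetical and not the API's own layout.
    """
    # Quotes known at or before the current minute
    current = [o for m, o in snapshots if m <= minute]
    # Quotes known at or before the start of the window
    past = [o for m, o in snapshots if m <= minute - window]
    if not current or not past:
        return 0.0  # not enough history yet
    return (current[-1] - past[-1]) / past[-1]


# Made-up home-win odds that drop sharply between minutes 20 and 25
home_win_odds = [(0, 2.50), (10, 2.40), (20, 2.00), (25, 1.60)]
print(round(odds_shift(home_win_odds, minute=25), 3))
```

A strongly negative value means the odds on that outcome shortened quickly, which may indicate sustained pressure from that team.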

Example: exporting completed matches with extended statistics for training

import requests
API_KEY = 'YOUR_API_KEY'
BASE_URL = 'https://api.api-sport.ru'
headers = {'Authorization': API_KEY}
params = {
    'date': '2025-09-03',
    'status': 'finished',
}
resp = requests.get(f'{BASE_URL}/v2/football/matches', headers=headers, params=params, timeout=10)
resp.raise_for_status()
matches = resp.json()['matches']
for match in matches:
    stats = match.get('matchStatistics', [])
    odds = match.get('oddsBase', [])
    print(match['id'], match['homeTeam']['name'], '-', match['awayTeam']['name'])
    print('Statistics:', len(stats), 'blocks, betting markets:', len(odds))

Based on such samples, a training dataset for the ML model is built: for each match and time interval, features are formed based on statistics and events, and the target label reflects the presence or absence of a dangerous moment.
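The construction of such a dataset can be sketched as follows. The event shape (dicts with 'minute' and 'type' keys), the event type names, and the labeling rule (a goal within a few minutes after the window) are assumptions chosen for illustration; the real liveEvents structure and your own definition of a dangerous moment may differ.

```python
def build_windows(events, match_minutes=90, window=5, horizon=5):
    """Split a match into fixed windows and label each one.

    events: list of dicts with 'minute' and 'type' keys (assumed shape).
    A window is labeled 1 if a goal occurs within `horizon` minutes
    after the window ends.
    """
    rows = []
    for start in range(0, match_minutes, window):
        end = start + window
        in_window = [e for e in events if start <= e['minute'] < end]
        goal_soon = any(
            e['type'] == 'goal' and end <= e['minute'] < end + horizon
            for e in events
        )
        rows.append({
            'start': start,
            'shots': sum(e['type'] == 'shot' for e in in_window),
            'corners': sum(e['type'] == 'corner' for e in in_window),
            'label_dangerous': int(goal_soon),
        })
    return rows


# Made-up events for the first 30 minutes of a match
sample_events = [
    {'minute': 12, 'type': 'shot'},
    {'minute': 13, 'type': 'corner'},
    {'minute': 14, 'type': 'shot'},
    {'minute': 17, 'type': 'goal'},
]
rows = build_windows(sample_events, match_minutes=30)
for row in rows:
    print(row)
```

Because the label looks only at events after the window ends, the features of a window never include the goal they are meant to predict.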

How to use the sports events API for automatic detection of dangerous moments

After the model is trained, the next step is to connect it to streams of current data. To do this, request matches with the status inprogress, regularly refresh their data from the API, and at each step pass fresh feature values to the ML engine. The endpoints /v2/{sportSlug}/matches and /v2/{sportSlug}/matches/{matchId} contain the fields currentMatchMinute, liveEvents, matchStatistics, and oddsBase, which are sufficient for making real-time decisions.

A typical workflow scenario: your service requests a list of all live matches on a schedule, selects the tournaments or teams of interest, and then pulls detailed statistics by match IDs. This data is transformed into a set of features, fed into the trained ML model, which returns the probability that the current phase of the attack is a dangerous moment. If the probability is above the threshold value, you instantly create a trigger for a push notification, a mark in the interface, or automatic video clipping.
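The threshold step of this scenario might look like the sketch below. The class name DangerTrigger, the payload fields, the default threshold of 0.7, and the cooldown that suppresses repeated alerts for the same match are all assumptions added for illustration, not part of the API.

```python
class DangerTrigger:
    """Turns model probabilities into alerts, at most one per cooldown period."""

    def __init__(self, threshold=0.7, cooldown_minutes=2):
        self.threshold = threshold
        self.cooldown_minutes = cooldown_minutes
        self._last_fired = {}  # match_id -> minute of the last alert

    def check(self, match_id, minute, probability):
        """Return an alert payload, or None if no alert should be sent."""
        if probability < self.threshold:
            return None
        last = self._last_fired.get(match_id)
        if last is not None and minute - last < self.cooldown_minutes:
            return None  # suppress repeated alerts for the same attack
        self._last_fired[match_id] = minute
        return {
            'match_id': match_id,
            'minute': minute,
            'probability': probability,
            'action': 'push_notification',
        }


trigger = DangerTrigger()
print(trigger.check(match_id=1, minute=30, probability=0.90))  # alert
print(trigger.check(match_id=1, minute=31, probability=0.95))  # suppressed
print(trigger.check(match_id=1, minute=33, probability=0.80))  # alert again
```

Without some form of cooldown, a single long attack polled every few seconds would produce a burst of near-identical notifications.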

The api-sport.ru platform already provides a stable REST API, and WebSocket support for receiving match updates with minimal delay, along with additional AI services, is planned for the near future. This should allow tighter integration of machine learning models with the data and near-instant event processing without frequent polling of the API.

Example: monitoring live matches and preparing data for the model

import requests
API_KEY = 'YOUR_API_KEY'
BASE_URL = 'https://api.api-sport.ru'
headers = {'Authorization': API_KEY}
params = {
    'status': 'inprogress',
}
resp = requests.get(f'{BASE_URL}/v2/football/matches', headers=headers, params=params, timeout=10)
resp.raise_for_status()
for match in resp.json()['matches']:
    match_id = match['id']
    detail = requests.get(
        f'{BASE_URL}/v2/football/matches/{match_id}',
        headers=headers,
        timeout=10,
    ).json()
    minute = detail.get('currentMatchMinute')
    stats = detail.get('matchStatistics', [])
    events = detail.get('liveEvents', [])
    # Build the feature vector here and call your ML model
    print(f'Match {match_id}, minute {minute}, events {len(events)}')

This approach easily scales to different sports and tournaments, as the structure of API responses is unified and allows for the reuse of a large part of the integration code.

How to build a machine learning model to identify dangerous moments in sports matches

Building an ML model begins with a formal definition of the term "dangerous moment". In football, this can be a shot from within the penalty area with few defenders, a one-on-one situation, or an episode preceding a goal within a few minutes. In hockey, it may be a series of shots from the point; in esports, a mass team fight around a key objective. This definition becomes the target label that you calculate from historical events and statistics from the API.

Then, a training dataset is formed: for each match, time slices are created (for example, every 30 seconds or 1 minute), and features are calculated for each slice. These may include ball possession over the last N minutes, the number of shots and accurate passes, the number of dangerous attacks, changes in bookmaker odds, and the number of fouls and cards. A classification model is trained on this dataset: logistic regression, gradient boosting, random forest, or a neural network, depending on the volume of data and performance requirements.
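As a rough illustration of the training step, the sketch below fits a logistic regression on synthetic time slices. The three features, the way the labels are generated, and the use of class_weight='balanced' are assumptions for the demo; real features would come from the API data described above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 400

# Synthetic features per time slice: shots, possession share, odds change
shots = rng.poisson(1.0, n)
possession = rng.uniform(0.3, 0.7, n)
odds_change = rng.normal(0.0, 0.05, n)
X = np.column_stack([shots, possession, odds_change])

# Synthetic label: more shots and possession make danger more likely
logit = 1.2 * shots + 3.0 * (possession - 0.5) - 2.5
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

# 'balanced' reweights classes because dangerous slices are the minority
model = LogisticRegression(class_weight='balanced', max_iter=1000)
model.fit(X, y)

# Compare a quiet slice with a slice full of shots
calm = model.predict_proba([[0, 0.4, 0.0]])[0, 1]
busy = model.predict_proba([[4, 0.6, 0.0]])[0, 1]
print(f'calm slice: {calm:.2f}, busy slice: {busy:.2f}')
```

A linear model like this is a reasonable baseline; gradient boosting can be swapped in later with the same fit and predict_proba calls.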

It is important to split the data into training, validation, and test parts chronologically to avoid information leakage from the future. For cross-validation in sports, a sliding window across seasons is often used: the model is trained on several past seasons and tested on more recent tournaments. This approach shows how well the algorithm holds up when team tactics and championship dynamics change.
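The sliding window across seasons can be sketched in a few lines. The function name season_folds and the choice of two training seasons per fold are assumptions for illustration.

```python
def season_folds(seasons, train_size=2):
    """Yield (train_seasons, test_season) pairs in chronological order.

    Each fold trains on `train_size` consecutive seasons and tests on
    the season that immediately follows them.
    """
    ordered = sorted(seasons)
    for i in range(train_size, len(ordered)):
        yield ordered[i - train_size:i], ordered[i]


folds = list(season_folds([2021, 2022, 2023, 2024]))
for train, test in folds:
    print('train:', train, 'test:', test)
```

Every test season is strictly later than its training seasons, so no fold can learn from matches that had not yet been played.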

Example: preparing a dataset for the model based on API data

import requests
import pandas as pd
API_KEY = 'YOUR_API_KEY'
BASE_URL = 'https://api.api-sport.ru'
headers = {'Authorization': API_KEY}
# Pick a few completed matches by ID (illustrative values)
match_ids = [14570728, 14586240]
rows = []
for match_id in match_ids:
    detail = requests.get(
        f'{BASE_URL}/v2/football/matches/{match_id}',
        headers=headers,
        timeout=10,
    ).json()
    stats_blocks = detail.get('matchStatistics', [])
    # Example: extract a few features from the summary statistics
    for period_block in stats_blocks:
        period = period_block['period']
        for group in period_block['groups']:
            for item in group['statisticsItems']:
                if item['key'] in ['ballPossession', 'totalShotsOnGoal']:
                    rows.append({
                        'match_id': match_id,
                        'period': period,
                        'metric': item['key'],
                        'home_value': item['homeValue'],
                        'away_value': item['awayValue'],
                        # Target label: a placeholder here; in practice it
                        # should be derived from goal events or highlights
                        'label_dangerous': 0,
                    })
df = pd.DataFrame(rows)
print(df.head())

Based on such a dataframe, you add a correctly calculated target variable and proceed to train the model in any ML framework: scikit-learn, XGBoost, LightGBM, or PyTorch, depending on the chosen architecture.

Examples of integrating an ML model with the sports events API in Python (REST, Webhook)

Integration of the machine learning model with the sports events API can be implemented either as periodic REST polling or through your own Webhook layer. In the first case, a separate service requests updates on the matches of interest every few seconds, runs the data through the model, and saves the results to the database or sends them to the interface. In the second case, you build a microservice that your backend calls every time the match state changes; at that moment it queries the API and invokes the ML engine.
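The polling variant can be sketched as a small loop. To keep the example runnable without network access, the API call and the model are passed in as functions; poll_matches, fake_fetch, and fake_predict are hypothetical names, and the stand-in functions return made-up data.

```python
import time


def poll_matches(fetch_detail, predict, match_ids, iterations=3,
                 interval=5.0, sleep=time.sleep):
    """Poll each match `iterations` times and collect model outputs."""
    results = []
    for i in range(iterations):
        for match_id in match_ids:
            detail = fetch_detail(match_id)  # one REST call per match
            results.append((match_id, predict(detail)))
        if i < iterations - 1:
            sleep(interval)  # wait before the next polling round
    return results


# Stand-ins so the sketch runs without network access
def fake_fetch(match_id):
    return {'currentMatchMinute': 30, 'liveEvents': [{}] * (match_id % 3)}


def fake_predict(detail):
    return min(1.0, 0.25 * len(detail['liveEvents']))


results = poll_matches(fake_fetch, fake_predict, [1, 2],
                       iterations=2, sleep=lambda seconds: None)
print(results)
```

In a real service, fetch_detail would wrap the requests.get call to the match endpoint, and the results would be written to a database or a queue instead of a list.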

Since api-sport.ru provides a flexible REST API and is preparing to launch WebSocket subscriptions and additional AI features, developers can evolve the architecture gradually: starting with a simple cron script, then moving to message queues and reactive updates. You can obtain a personal API key in your api-sport.ru personal account and pass it in the Authorization header of every data request.

Below is a simplified example of a Python service on Flask that acts as a Webhook handler: it receives a notification from your application that the danger moment for a specific match needs to be recalculated, pulls fresh data from the API, and calls a local model prediction function.

Example: Flask Webhook that calls the ML model when a match is updated

from flask import Flask, request, jsonify
import requests
API_KEY = 'YOUR_API_KEY'
BASE_URL = 'https://api.api-sport.ru'
headers = {'Authorization': API_KEY}
app = Flask(__name__)

def predict_danger(features: dict) -> float:
    # Stub: load and call your trained model here,
    # for example via pickle or a separate ML service
    return 0.78

@app.route('/webhook/match-updated', methods=['POST'])
def match_updated():
    data = request.get_json(force=True)
    sport_slug = data.get('sport', 'football')
    match_id = data['match_id']
    detail = requests.get(
        f'{BASE_URL}/v2/{sport_slug}/matches/{match_id}',
        headers=headers,
        timeout=10,
    ).json()
    minute = detail.get('currentMatchMinute')
    stats = detail.get('matchStatistics', [])
    odds = detail.get('oddsBase', [])
    features = {
        'minute': minute,
        'stats': stats,
        'odds': odds,
    }
    prob = predict_danger(features)
    return jsonify({'match_id': match_id, 'danger_probability': prob})

if __name__ == '__main__':
    app.run(port=8000, debug=True)

This pattern is easily extensible: you can handle multiple sports, store predictions in a database, send them to the frontend or a push system, and build a full-fledged analytics ecosystem on top of data from the sports events API.

How to evaluate the accuracy of the model and reduce false positives when searching for dangerous moments

After training the model on API data, it is critically important to evaluate its quality on historical matches and understand how reliably it highlights truly interesting episodes. For the task of finding dangerous moments, the key metrics are recall, precision, and the F1 score. High recall means that you rarely miss dangerous phases of the game, while high precision means that the user is not overwhelmed with false alarms.

Class imbalance deserves particular attention: dangerous moments occur far less frequently than regular game episodes. Stratified splitting, weighted loss functions, and oversampling techniques help address this. To reduce the number of false positives, tune the probability threshold: the model outputs a danger probability, and you choose the threshold at which the balance of false alarms and missed moments is acceptable for a specific scenario, from media clips to betting analytics.
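One possible way to choose the threshold is to take the most precise operating point that still meets a minimum recall. The labels, probabilities, and the 0.75 recall requirement below are made up for illustration.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Made-up validation labels and predicted danger probabilities
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1, 0, 0])
y_prob = np.array([0.05, 0.2, 0.35, 0.4, 0.6, 0.7, 0.15, 0.9, 0.55, 0.1])

precision, recall, thresholds = precision_recall_curve(y_true, y_prob)

# precision and recall have one more element than thresholds,
# so drop the last point before matching them up
min_recall = 0.75  # assumed product requirement
eligible = recall[:-1] >= min_recall
best = np.argmax(np.where(eligible, precision[:-1], -1.0))
chosen_threshold = thresholds[best]

print(f'threshold={chosen_threshold:.2f}, '
      f'precision={precision[best]:.2f}, recall={recall[best]:.2f}')
```

A media clipping product might accept lower precision for higher recall, while a notification product would usually do the opposite; changing min_recall moves the chosen point accordingly.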

Based on historical data from the API, backtesting is conveniently conducted: you reproduce the course of the match based on liveEvents and matchStatistics, apply the model step by step in time, and compare its decisions with the occurrence of a goal, a serious moment, or the addition of an episode to the official highlights. This approach provides a realistic assessment of the model’s behavior in conditions close to real-time.
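Such a replay can be sketched as a loop over a match timeline. The timeline format, the toy scoring function, and the rule that an alert counts as a hit when a goal follows within three minutes are assumptions for illustration.

```python
def backtest(timeline, goal_minutes, predict, threshold=0.7, horizon=3):
    """Replay a match step by step and score the model's alerts.

    timeline: list of (minute, features) pairs in chronological order.
    An alert counts as a hit if a goal follows within `horizon` minutes.
    """
    hits = false_alarms = 0
    for minute, features in timeline:
        if predict(features) < threshold:
            continue  # the model stays silent at this step
        if any(minute < g <= minute + horizon for g in goal_minutes):
            hits += 1
        else:
            false_alarms += 1
    return {'hits': hits, 'false_alarms': false_alarms}


# Toy model: danger grows with the number of recent shots
def toy_model(features):
    return min(1.0, 0.3 * features['shots_last_5'])


timeline = [
    (10, {'shots_last_5': 0}),
    (20, {'shots_last_5': 3}),
    (40, {'shots_last_5': 3}),
    (60, {'shots_last_5': 1}),
]
report = backtest(timeline, goal_minutes=[22], predict=toy_model)
print(report)
```

The same loop can be extended to count goals that no alert preceded, which gives the missed-moment side of the evaluation.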

Example: calculation of basic quality metrics of the model

from sklearn.metrics import classification_report, confusion_matrix
import numpy as np
# y_true and y_pred_prob must be saved in advance from backtesting results
# y_true: 1 if the episode was dangerous, 0 otherwise
# y_pred_prob: the model's predicted probability of a dangerous moment
threshold = 0.7
y_true = np.array([0, 1, 0, 1, 1, 0])
y_pred_prob = np.array([0.1, 0.8, 0.3, 0.9, 0.4, 0.2])
y_pred = (y_pred_prob >= threshold).astype(int)
print('Confusion matrix:')
print(confusion_matrix(y_true, y_pred))
print('Classification report:')
print(classification_report(y_true, y_pred, digits=3))

Regularly reviewing metrics, retraining the model on fresh seasons, and using richer data from the API (including bookmaker odds and new statistical fields) help gradually reduce the number of false positives and steadily improve accuracy in finding truly dangerous moments.