How to train a model to predict the number of corners?

Factors influencing the number of corners in football for accurate prediction

The number of corners in a match is better explained by stable tactical and gameplay patterns than by luck or randomness. Teams that attack down the wings, deliver many crosses, and shoot often create more situations that end in deflections and clearances over the goal line. In contrast, teams that patiently keep the ball in the center of the field and rarely finish attacks with shots create fewer situations that lead to corners. Therefore, when building a model, it is important to translate tactical traits into measurable indicators: shots, crosses, possession, entries into the final third, and other metrics that are readily available from match statistics.

The match scenario significantly affects the number of corners. An underdog that concedes an early goal is forced to open up and often finishes attacks with shots or crosses under defensive pressure, which increases the likelihood of clearances for corners. A team defending a narrow lead may deliberately sit deep near its own goal, which also increases the number of clearances and blocked shots. The gap in quality between opponents, home advantage, the specifics of the tournament, and fixture congestion also matter. All these factors are reflected in the numbers: indicators such as totalShotsOnGoal, totalShotsInsideBox, accurateCross, finalThirdEntries, and ballPossession are available in the Sport Events API, as well as a separate cornerKicks metric broken down by home and away teams.

For an accurate prediction, the model must consider the entire context of the match, not just the average number of corners for the season. At the data level, this means that the sample should include the teams’ form over the last N matches, the typical playing style of the opponents, the tournament and stage of the season, attack and defense statistics, and the venue (home or away). Once these factors are correctly encoded as numerical features, it is possible to build robust models that predict both the total number of corners and the breakdown by teams and halves, relying on real data rather than intuition.

What data is needed for corner prediction and how to obtain statistics via API

To train the corner prediction model, a historical dataset of matches with detailed post-match statistics is required. The target variable is usually the total number of corners (home + away), or the corners for each team separately. In the Sport Events API from api-sport.ru, this value is available in the matchStatistics dataset, where the metric with the key cornerKicks contains the numerical values homeValue and awayValue. Thus, for any historical match, it is possible to obtain the exact number of corners and use it as a target for the machine learning model.

In addition to the target variable, it is important to gather the most informative set of features. The following are well suited to corner predictions: the number of shots on goal and off target (totalShotsOnGoal, shotsOffGoal), shots from inside the penalty area (totalShotsInsideBox), ball possession (ballPossession), the number of crosses (accurateCross), entries into the final third (finalThirdEntries), and the number of attacking and defensive duels. All these indicators are available in matchStatistics for the period ALL and, if necessary, with a breakdown by halves. Combining several seasons of top leagues yields a sample of thousands of matches, which is sufficient for building stable models.
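As a rough illustration of how these metrics could be turned into one row per match, the sketch below flattens the ALL period of matchStatistics into home_/away_ columns. It assumes the nested layout used elsewhere in this article (period, groups, statisticsItems, key, homeValue, awayValue); the function name flatten_statistics, the FEATURE_KEYS tuple, and the sample match are hypothetical and exist only for this example.

```python
from typing import Any, Dict, Iterable, Optional

# Metric keys named in the text; adjust to whatever your responses actually contain.
FEATURE_KEYS = (
    "cornerKicks", "totalShotsOnGoal", "shotsOffGoal",
    "totalShotsInsideBox", "ballPossession", "accurateCross",
    "finalThirdEntries",
)


def flatten_statistics(match: Dict[str, Any],
                       keys: Iterable[str] = FEATURE_KEYS,
                       period: str = "ALL") -> Optional[Dict[str, Any]]:
    """Return one flat row of home_/away_ values, or None if the period is missing."""
    wanted = set(keys)
    periods = match.get("matchStatistics") or []
    selected = next((p for p in periods if p.get("period") == period), None)
    if selected is None:
        return None
    row: Dict[str, Any] = {"match_id": match.get("id")}
    for group in selected.get("groups", []):
        for item in group.get("statisticsItems", []):
            key = item.get("key")
            if key in wanted:
                row[f"home_{key}"] = item.get("homeValue")
                row[f"away_{key}"] = item.get("awayValue")
    return row


# Hand-made sample that mimics the assumed response shape (not real API output).
sample_match = {
    "id": 101,
    "matchStatistics": [
        {"period": "ALL", "groups": [
            {"statisticsItems": [
                {"key": "cornerKicks", "homeValue": 6, "awayValue": 4},
                {"key": "ballPossession", "homeValue": 58, "awayValue": 42},
                {"key": "fouls", "homeValue": 11, "awayValue": 9},
            ]},
        ]},
    ],
}
row = flatten_statistics(sample_match)
```

Metrics that are absent from a response simply do not appear in the row, so missing columns can be detected and handled at the cleaning stage.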

Additionally, market expectations can be taken into account by using odds and lines on total corners from the bookmaker API available through the api-sport.ru infrastructure. A model that sees not only raw statistics but also the market assessment often provides more accurate predictions and helps identify inefficiencies in the odds. As a result, a complete dataset for corners consists of three levels of data: facts (actual corners and match statistics), team form and style (aggregated indicators from recent games), and market expectations (lines and odds), all of which can be collected automatically through HTTP requests to the API.
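One common way to turn market odds into a model feature is to convert them into implied probabilities and remove the bookmaker margin. The sketch below assumes decimal odds for the over and under sides of a corner total and uses simple proportional normalization, which is only one of several possible margin-removal methods. The function name and its arguments are hypothetical and are not tied to any particular API field.

```python
from typing import Tuple


def implied_probabilities(over_odds: float, under_odds: float) -> Tuple[float, float]:
    """Convert decimal over/under odds into margin-free probabilities."""
    if over_odds <= 1.0 or under_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    raw_over = 1.0 / over_odds
    raw_under = 1.0 / under_odds
    # The raw values sum to more than 1 because of the bookmaker margin;
    # dividing by that sum rescales them into a proper probability pair.
    overround = raw_over + raw_under
    return raw_over / overround, raw_under / overround


# Example: hypothetical odds of 1.85 (over 9.5) and 1.95 (under 9.5)
p_over, p_under = implied_probabilities(1.85, 1.95)
```

The resulting p_over value can be stored next to the statistical features as a single numeric column for the corresponding line.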

The best sports event APIs for corner statistics in football

When building models for predicting corners, it is important to rely on a reliable source of statistics rather than scraping websites or manual exports. A quality sports events API should provide broad league coverage, historical depth, and detailed post-match statistics, including the number of corners, shots, crosses, possession, fouls, and other metrics. Stable uptime, a predictable data structure, and clear documentation are also critical so that developers can easily integrate the service into their analytics systems, bots, and internal betting platforms.

The Sport Events API from api-sport.ru meets all these requirements and is aimed specifically at developers and analysts. Through a single interface, you can request matches for football and other sports, filter them by date, tournament, season, or team, and immediately extract matchStatistics with the cornerKicks indicator. In addition to corners, the API returns dozens of advanced metrics on shots, passes, duels, and defensive actions, allowing for the construction of complex multidimensional models. The oddsBase block adds bookmaker odds for various markets, including totals, to the match statistics, so the market assessment can be taken into account when training the model.

An additional advantage of the Sport Events API is the list of recommended tournaments, defaultTournaments, which makes it easier to get started if you want to quickly gather a data corpus on popular leagues. The service is under continuous development: support for WebSocket streaming of live events and integration of AI tools are planned for the near future, which should help accelerate the development of your own models. For users, this means that once the integration with the API is set up, you receive not only historical data on corners but also infrastructure for online analytics and hybrid solutions at the intersection of statistics and artificial intelligence.

Connecting to the API and extracting corner data: example requests

To start working with corner data, you need a personal API key. It can be obtained in your personal account at api-sport.ru after registration. The key is passed in the Authorization header with each request. The base URL for sports events is https://api.api-sport.ru, followed by the version and type of sport. For example, football uses the path /v2/football/matches, where you specify the date, tournaments, match statuses, and other filters through query parameters. The response contains a list of matches, each with a matchStatistics object from which the cornerKicks indicator is extracted.

Below is a basic example in Python that requests completed football matches for a selected date and extracts the number of corners for each game:

import requests
API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.api-sport.ru/v2/football/matches"
params = {
    "date": "2025-09-03",   # match date
    "status": "finished"    # only completed games are needed
}
headers = {
    "Authorization": API_KEY
}
response = requests.get(BASE_URL, params=params, headers=headers, timeout=30)
response.raise_for_status()
data = response.json()
for match in data.get("matches", []):
    # Take the statistics block that covers the whole match
    stats_periods = match.get("matchStatistics") or []
    all_period = next((p for p in stats_periods if p.get("period") == "ALL"), None)
    if not all_period:
        continue
    total_corners = None
    for group in all_period.get("groups", []):
        for item in group.get("statisticsItems", []):
            if item.get("key") == "cornerKicks":
                # Treat a missing or null value as zero corners
                home_corners = item.get("homeValue") or 0
                away_corners = item.get("awayValue") or 0
                total_corners = home_corners + away_corners
    if total_corners is not None:
        print(f"Match ID {match['id']}: total corners = {total_corners}")

Similarly, matches can be filtered by tournament (tournament_id), season (season_id) or team (team_id). The resulting set of matches with the number of corners and extended statistics can be conveniently saved to a database or files for further processing. These data are then used to calculate aggregates (average corners over the last N matches, attacking activity indicators, etc.) and to form a training sample for the model, which will be discussed in the following sections.
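As one possible way to persist the extracted values, the sketch below writes per-match corner counts into a local SQLite database using only the standard library. The table name match_corners, its columns, and the function save_corner_rows are assumptions made for this example; any other storage (CSV files, PostgreSQL, and so on) would work equally well.

```python
import sqlite3
from typing import Any, Dict, Iterable


def save_corner_rows(conn: sqlite3.Connection, rows: Iterable[Dict[str, Any]]) -> int:
    """Insert or update one row per match and return the number of rows written."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS match_corners (
               match_id     INTEGER PRIMARY KEY,
               match_date   TEXT    NOT NULL,
               home_corners INTEGER NOT NULL,
               away_corners INTEGER NOT NULL
           )"""
    )
    payload = [
        (r["match_id"], r["match_date"], r["home_corners"], r["away_corners"])
        for r in rows
    ]
    # "with conn" commits on success and rolls back if an insert fails.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO match_corners VALUES (?, ?, ?, ?)", payload
        )
    return len(payload)


# Usage with an in-memory database and hand-made rows
conn = sqlite3.connect(":memory:")
written = save_corner_rows(conn, [
    {"match_id": 1, "match_date": "2025-09-03", "home_corners": 6, "away_corners": 4},
    {"match_id": 2, "match_date": "2025-09-03", "home_corners": 3, "away_corners": 7},
])
```

Because match_id is the primary key, running the export again for the same date overwrites existing rows instead of duplicating them.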

Data preparation and feature selection for the corner prediction model

After exporting raw data from the API, the first step is cleaning and normalization. It is necessary to filter out matches without complete statistics, standardize date formats and tournament and team identifiers, and handle missing values in the metrics. Special attention should be paid to the reliability of tournament data: for very low leagues, where statistics may be incomplete or irregular, it is better either to train a separate model or to exclude them at the training stage. For each match, the target variable needs to be calculated explicitly, for example total_corners = home_corners + away_corners, where the values are taken from the cornerKicks metric for the period ALL.
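A minimal sketch of the cleaning step might look as follows. It assumes that each match has already been flattened into a dictionary and that some values may arrive as strings such as "7" or "55%"; that format is a guess and should be checked against real responses. The helper names to_number and clean_rows are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Optional, Sequence


def to_number(value: Any) -> Optional[float]:
    """Convert values such as 7, "7" or "55%" to float; return None if impossible."""
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return float(value)
    try:
        return float(str(value).strip().rstrip("%"))
    except ValueError:
        return None


def clean_rows(rows: Iterable[Dict[str, Any]],
               required: Sequence[str]) -> List[Dict[str, Any]]:
    """Keep only rows where every required metric is present and numeric."""
    cleaned = []
    for row in rows:
        numbers = {name: to_number(row.get(name)) for name in required}
        if any(value is None for value in numbers.values()):
            continue  # incomplete statistics: skip the match
        cleaned.append({**row, **numbers})
    return cleaned


raw_rows = [
    {"match_id": 1, "home_corners": "6", "away_corners": 4, "home_possession": "55%"},
    {"match_id": 2, "home_corners": None, "away_corners": 5, "home_possession": "48%"},
    {"match_id": 3, "home_corners": 7, "away_corners": "n/a", "home_possession": 51},
]
usable = clean_rows(raw_rows, ["home_corners", "away_corners", "home_possession"])
```

Dropping incomplete matches is the simplest policy; imputing missing values is an alternative when the sample is small.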

Next, features at the team and match level are formed. In practice, aggregates over the last 5 to 10 games work well: average number of corners per match, average shots on goal and from inside the penalty area, frequency of crosses (accurateCross), entries into the final third (finalThirdEntries), ball possession percentage (ballPossession), and number of attacking duels. These indicators are calculated separately for home and away teams, and then derived features are added: the difference in average corners between opponents, the difference in shots and possession, home/away indicators, and a flag for top leagues. As a result, each row of the dataset describes a match through dozens of numerical characteristics that reflect the teams’ style and attacking strength.

Below is an example of how to form basic features in pandas based on an already compiled list of matches, assuming that you have previously extracted numerical values of statistics from matchStatistics:

import pandas as pd
# matches_df is expected to contain the columns:
# match_id, home_team_id, away_team_id, date,
# home_corners, away_corners, home_shots, away_shots,
# home_shots_inside_box, home_possession, home_crosses, ... (and the away_ equivalents)
matches_df["total_corners"] = matches_df["home_corners"] + matches_df["away_corners"]
# Sort by date so that rolling averages follow chronological order
matches_df = matches_df.sort_values("date").reset_index(drop=True)
# Rolling averages per team, computed separately over home and away matches
rolling_window = 10
for side in ["home", "away"]:
    team_col = f"{side}_team_id"
    for metric in ["corners", "shots", "shots_inside_box", "possession", "crosses"]:
        col = f"{side}_{metric}"
        # shift(1) excludes the current match, so a feature never sees its own result
        rolling_mean = matches_df.groupby(team_col)[col].transform(
            lambda s: s.shift(1).rolling(rolling_window, min_periods=3).mean()
        )
        matches_df[f"{side}_{metric}_avg_{rolling_window}"] = rolling_mean
# Drop early matches where there is not enough history for any rolling feature
feature_cols = [c for c in matches_df.columns if c.endswith(f"_avg_{rolling_window}")]
train_df = matches_df.dropna(subset=feature_cols)

This approach transforms the stream of matches from the API into a structured training dataset, where each row contains both the actual corners and the context: team form, attacking activity, and other factors. Later, market features (lines and odds from bookmaker APIs), tournament indicators, and stages of the season can be added to this dataset, which can further improve the quality of the prediction.
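The derived features mentioned earlier (differences between the opponents’ averages) can be sketched as a small helper on top of the rolling columns. It assumes the column naming pattern side_metric_avg_window used in the pandas example above; the function name add_difference_features and the toy dataframe are hypothetical.

```python
import pandas as pd


def add_difference_features(df: pd.DataFrame, metrics, window: int = 10) -> pd.DataFrame:
    """Add home-minus-away differences for each rolling-average metric."""
    out = df.copy()  # leave the caller's dataframe untouched
    for metric in metrics:
        home_col = f"home_{metric}_avg_{window}"
        away_col = f"away_{metric}_avg_{window}"
        out[f"{metric}_avg_diff_{window}"] = out[home_col] - out[away_col]
    return out


# Toy data standing in for the rolling averages of two matches
toy_df = pd.DataFrame({
    "home_corners_avg_10": [5.5, 4.0],
    "away_corners_avg_10": [4.0, 6.5],
    "home_shots_avg_10": [14.0, 9.0],
    "away_shots_avg_10": [10.0, 12.0],
})
with_diffs = add_difference_features(toy_df, ["corners", "shots"])
```

A positive difference indicates that the home side has been producing more of the given metric in its recent matches than the away side.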

How to train a model to predict the number of corners in Python using data from the API

Once the data is prepared and the features are formed, you can proceed to model training. In practice, predicting the number of corners is conveniently solved as a regression task: a vector of match features is input, and the model outputs the expected number of corners. This approach then allows easily converting the result into probabilities of exceeding various totals (8.5, 9.5, 10.5, etc.). Algorithms like gradient boosting or random forests, which work well with nonlinear dependencies and heterogeneous features without complex manual normalization, are suitable for training.

Below is an example of a basic pipeline in Python using scikit-learn. It is assumed that you already have a dataframe train_df with the target variable total_corners and a set of numerical features formed in the previous step:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
# Feature list (an example; use your own columns)
FEATURES = [
    "home_corners_avg_10", "away_corners_avg_10",
    "home_shots_avg_10", "away_shots_avg_10",
    "home_shots_inside_box_avg_10", "away_shots_inside_box_avg_10",
    "home_possession_avg_10", "away_possession_avg_10",
    "home_crosses_avg_10", "away_crosses_avg_10",
]
X = train_df[FEATURES]
y = train_df["total_corners"]
# train_df is sorted by date, so shuffle=False keeps the latest 20% of matches for validation
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, shuffle=False
)
model = RandomForestRegressor(
    n_estimators=400,
    max_depth=8,
    random_state=42,
    n_jobs=-1
)
model.fit(X_train, y_train)
y_pred = model.predict(X_valid)
mae = mean_absolute_error(y_valid, y_pred)
# Take the square root explicitly: the squared=False argument was deprecated
# and later removed in newer scikit-learn versions
rmse = np.sqrt(mean_squared_error(y_valid, y_pred))
print(f"MAE: {mae:.3f}, RMSE: {rmse:.3f}")

In practical use, it is convenient to build derived metrics on top of such a regression forecast. For example, you can estimate the probability of exceeding a total of 9.5 corners by modeling the error distribution or by running simulations, and then compare these estimates with lines from bookmaker APIs. As new matches come in from the Sport Events API, the model should be regularly retrained on fresh data to account for changes in team styles, coaching changes, and other factors reflected in corner statistics.
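As a simple illustration of turning the regression output into a probability, the sketch below treats the total number of corners as Poisson-distributed around the predicted mean. This is a modeling assumption introduced for the example, not something the article prescribes: real corner counts are often more dispersed than a Poisson distribution allows, so the result should be checked against historical frequencies. The function names are hypothetical.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for X ~ Poisson(mean)."""
    if k < 0:
        return 0.0
    term = math.exp(-mean)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i    # P(X = i) from P(X = i - 1)
        total += term
    return min(total, 1.0)


def prob_over(line: float, expected_corners: float) -> float:
    """P(total corners > line); for line=9.5 this means 10 or more corners."""
    if expected_corners <= 0:
        raise ValueError("expected_corners must be positive")
    return 1.0 - poisson_cdf(math.floor(line), expected_corners)


# Probability of more than 9.5 corners when the model predicts 10.0 on average
p = prob_over(9.5, 10.0)
print(f"P(over 9.5) = {p:.3f}")  # prints "P(over 9.5) = 0.542"
```

The same function can be evaluated for several lines (8.5, 9.5, 10.5) to obtain a full set of probabilities from a single prediction.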

Evaluating the quality of the corner prediction model and improving prediction accuracy

Evaluating a corner prediction model should consider both the accuracy of the numerical prediction and its practical usefulness. For regression, MAE and RMSE are commonly used: MAE shows the average absolute error, while RMSE is the square root of the mean squared error and penalizes large misses more heavily. In the context of betting and risk management, it is important to understand how often the model is off by 1 or 2 corners and what the share of gross misses is. Because matches form a time series, it is preferable to split the data chronologically (for example, a time-based train/validation split) to avoid information leakage from the future and obtain an honest assessment of quality on unseen matches.
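The metrics described above can be collected in one small report. The sketch below uses only the standard library; the function name error_report and the gross-miss threshold of 4 corners are assumptions chosen for illustration and should be adapted to your own risk criteria.

```python
import math
from typing import Dict, Sequence


def error_report(y_true: Sequence[float], y_pred: Sequence[float],
                 gross_threshold: float = 4.0) -> Dict[str, float]:
    """Return MAE, RMSE and the shares of small and gross errors."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    abs_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    n = len(abs_errors)
    return {
        "mae": sum(abs_errors) / n,
        "rmse": math.sqrt(sum(e * e for e in abs_errors) / n),
        "within_1": sum(e <= 1 for e in abs_errors) / n,   # off by at most 1 corner
        "within_2": sum(e <= 2 for e in abs_errors) / n,   # off by at most 2 corners
        "gross_miss": sum(e > gross_threshold for e in abs_errors) / n,
    }


# Four validation matches: actual totals versus model predictions
report = error_report([10, 8, 12, 5], [9.0, 8.0, 7.0, 7.0])
```

For these four matches the absolute errors are 1, 0, 5 and 2, so the MAE is 2.0, half of the predictions fall within one corner, and one in four is a gross miss.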

In addition to general metrics, it makes sense to analyze model quality by slices: by individual tournaments, types of teams (favorite/underdog), total ranges, and seasons. It often turns out that the model works well in top leagues, where the statistics are richer and more stable, but worse in lower divisions. In such cases, one can either train separate models by league clusters or add indicators of tournament level to the features and calibrate predictions. It is useful to study feature importance in decision trees and boosting: this helps to understand which statistics (shots, crosses, possession, recent form) contribute the most to corner predictions and where else the signal can be strengthened.
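Slice-level analysis can be sketched as a grouped error calculation. The example assumes that each validation record carries the actual and predicted totals together with the attribute to slice by; the field names actual, predicted, and tournament, as well as the function mae_by_slice, are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def mae_by_slice(records: Iterable[Dict[str, Any]],
                 slice_key: str) -> Dict[Any, Dict[str, float]]:
    """Group absolute errors by a slice attribute and report MAE per group."""
    errors = defaultdict(list)
    for record in records:
        errors[record[slice_key]].append(abs(record["actual"] - record["predicted"]))
    return {
        name: {"matches": len(values), "mae": sum(values) / len(values)}
        for name, values in errors.items()
    }


# Hand-made validation records for two hypothetical tournaments
validation = [
    {"tournament": "top_league", "actual": 10, "predicted": 9.0},
    {"tournament": "top_league", "actual": 8, "predicted": 9.0},
    {"tournament": "lower_division", "actual": 14, "predicted": 10.0},
    {"tournament": "lower_division", "actual": 6, "predicted": 10.0},
]
by_tournament = mae_by_slice(validation, "tournament")
```

Reporting the number of matches next to each MAE helps avoid drawing conclusions from slices that contain only a handful of games.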

To further improve accuracy, several approaches can be used. First, add market information from bookmaker APIs to the model: lines and odds on total corners often contain concentrated market opinion and complement raw statistics well. Second, implement online model updates as new data comes in from the Sport Events API and, in the future, from the WebSocket stream when it becomes available in the api-sport.ru infrastructure. Third, consider more advanced algorithms (gradient boosting on decision trees, neural networks) and ensembles of several models. Combining quality corner data, rigorous evaluation, and thoughtful model improvements makes it possible to gradually reduce error and build a prediction system that is resilient to changes in the football landscape.