docs.unity3d.com
Search Results for

    Show / Hide Table of Contents

    ELO Rating System

    In adversarial games, the cumulative environment reward may not be a meaningful metric by which to track learning progress.

    This is because the cumulative reward is entirely dependent on the skill of the opponent.

    An agent at a particular skill level will get more or less reward against a worse or better agent, respectively.

    Instead, it's better to use ELO rating system, a method to calculate the relative skill level between two players in a zero-sum game.

    If the training performs correctly, this value should steadily increase.

    What is a zero-sum game?

    A zero-sum game is a game where each player's gain or loss of utility is exactly balanced by the gain or loss of the utility of the opponent.

    Simply explained, we face a zero-sum game when one agent gets +1.0, its opponent gets -1.0 reward.

    For instance, Tennis is a zero-sum game: if you win the point you get +1.0 and your opponent gets -1.0 reward.

    How works the ELO Rating System

    • Each player has an initial ELO score. It's defined in the initial_elo trainer config hyperparameter.

    • The difference in rating between the two players serves as the predictor of the outcomes of a match.

    Example Elo For instance, if player A has an Elo score of 2100 and player B has an ELO score of 1800 the chance that player A wins is 85% against 15% for player b.

    • We calculate the expected score of each player using this formula:

    Elo Expected Score Formula

    • At the end of the game, based on the outcome we update the player’s actual Elo score, we use a linear adjustment proportional to the amount by which the player over-performed or under-performed. The winning player takes points from the losing one:

      • If the higher-rated player wins → a few points will be taken from the lower-rated player.
      • If the lower-rated player wins → a lot of points will be taken from the high-rated player.
      • If it’s a draw → the lower-rated player gains a few points from higher.
    • We update players rating using this formula:

    Elo Score Update Formula

    The Tennis example

    • We start to train our agents.
    • Both of them have the same skills. So ELO score for each of them that we defined using parameter initial_elo = 1200.0.

    We calculate the expected score E: Ea = 0.5 and Eb = 0.5

    So it means that each player has 50% chances of winning the point.

    If A wins, the new rating R would be:

    Ra = 1200 + 16 * (1 - 0.5) → 1208

    Rb = 1200 + 16 * (0 - 0.5) → 1192

    Player A has now an ELO score of 1208 and Player B an ELO score of 1192. Therefore, Player A is now a little bit better than Player B.

    In This Article
    Back to top
    Copyright © 2025 Unity Technologies — Trademarks and terms of use
    • Legal
    • Privacy Policy
    • Cookie Policy
    • Do Not Sell or Share My Personal Information
    • Your Privacy Choices (Cookie Settings)