1.4 Building a Competitor Model: Glicko Scores

In the last post, we reviewed setting win-loss odds and betting prices for trading card games. In that post, we took the players’ win probabilities as a given.

In this post, we’ll take an important step (but not the last one) in generating those player win probabilities.

We’ll use a rating system called Glicko/Glicko2 to rate each player in a game. These ratings quantify each player’s skill and can be compared against one another to generate win/loss probabilities between any two players.

Glicko/Glicko2 Ratings System

Competitive games like chess have long used the Elo rating system, developed by Arpad Elo, to rate the skill level of players.

The Elo system assigns a numerical value to each player at the start of their careers. For Elo, new players start off with a rating of 1000. Each time a player wins or loses against another player, the winning player’s Elo rating increases while the losing player’s Elo rating decreases. This allows changes over time in a player’s skill level to be tracked.

The scale of the Elo rating system is logarithmic, meaning that a player with a rating of 1600 isn’t 60% better than a player with a rating of 1000; under the standard Elo formula, the higher-rated player’s expected win odds are roughly 32 to 1. This scale also has the advantage that if a weaker player beats a stronger player, the weaker player’s rating rises faster than it would for beating a player of similar skill, while if a stronger player beats a weaker player, the stronger player’s rating rises more slowly than it would for beating a player of similar skill.
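
As a sanity check, here’s the standard Elo expected-score formula in R (a quick illustration using Elo’s usual base-10, 400-point convention):

# Elo expected score for a player rated r1 against a player rated r2
elo_expected <- function(r1, r2) 1 / (1 + 10^(-(r1 - r2) / 400))

p <- elo_expected(1600, 1000)
p            # ~0.969: the 1600-rated player is expected to score ~97%
p / (1 - p)  # ~31.6: expected win odds of roughly 32 to 1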

Mark Glickman, Fellow of the American Statistical Association and Senior Lecturer on Statistics at the Harvard University Department of Statistics (among other accolades), invented a new spin on the traditional Elo system called Glicko (and later, Glicko2).

These systems make several improvements on the Elo methodology that are key for trading card games:

  1. Glicko/Glicko2 incorporates a rating “deviation” (σ). Player deviations begin at 350 and shrink or grow based on how often a player plays. The deviation measures how certain we are about a player’s rating, and it also provides a sort of “rust” factor: players who don’t play for a long time see their deviations drift back toward that of a new player (σ=350).
  2. With the deviations in mind, ratings can be read as credible intervals. For instance, a player with a rating of 1750 and σ=150 has a credible rating of ~1600 to 1900. The more the player plays, the more the deviation shrinks, the narrower the credible interval becomes, and the greater our certainty about that rating (a minimal sketch of this calculation appears after this list).
  3. The deviations are not just window dressing: they are an integral part of predicting the win/loss probabilities between two competitors. A player with a lower σ is predicted to win more reliably than one with a higher σ, all else being equal.
  4. Glicko uses a constant (“c” or the “c-value”) that sets how quickly (or slowly) an inactive player’s deviation drifts back up to its initial level (350). Because rating changes scale with a player’s deviation, the choice of c also limits wild fluctuations due to upset wins or losses.
  5. Glicko2 adds a “volatility” variable (τ). This variable captures how much luck is a factor in the game at hand and helps regulate probabilities between players with different levels of volatility. As with σ, a competitor with a high τ but similar skill will be predicted to perform worse against a player with a lower τ, all else being equal. Luck is accounted for.

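Here’s the minimal credible-interval sketch promised in point 2, using the 1750/σ=150 example (the ±1σ band matches the example above; a rough 95% band would use ±1.96σ):

# Credible interval around a Glicko rating
rating <- 1750
sigma  <- 150
rating + c(-1, 1) * sigma         # ~1600 to 1900 (the +/-1 sigma band)
rating + c(-1, 1) * 1.96 * sigma  # ~1456 to 2044 (an approximate 95% band)
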
For these reasons, we’ll use the Glicko2 system for rating competitors.

You can find an excellent paper on the Glicko System by Mark Glickman here, as well as his follow-up for Glicko2, here.

To apply the Glicko2 system, we’ll generate some synthetic (made up) player match data and use the PlayerRatings package in R, which incorporates the Glicko2 system in its functions.
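
If you don’t already have the package, it’s available on CRAN:

# Install the PlayerRatings package (needed once)
install.packages("PlayerRatings")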

Synthetic Tournament Data

A .csv file was set up to represent a tournament season.

The file has four fields:

  1. Period (numbered 1 through 12).
  2. Player (numbered 1 through 500).
  3. Opponent (numbered 1 through 500).
  4. Result (0, 0.5, or 1).

Each “Period” represents a single month during the tournament season (Period 1 = January, Period 2 = February, etc.). We assume that the tournament season runs from January to December of a single year.

Players and Opponents are numbered from 1 to 500; each number identifies a unique competitor. In each row of the file, the Player and Opponent fields identify the two competitors involved in that match.

The Result records a loss (0), a draw (0.5), or a win (1). These Results apply to the Player against the given Opponent. The first match in the file looks like this:

Period  Player  Opponent  Result
1       121     444       0

This means that in Period 1, Player 121 played Player 444 and Player 121 lost (thus, Player 444 won).

Each Period records 1000 matches between randomly selected Players and Opponents with random Results.

Thus, the .csv file records 12,000 randomly generated matches. All of this was done with Microsoft Excel.
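
The original file was built in Excel, but as a rough sketch, equivalent data could be generated in R along these lines (the 0/0.5/1 result probabilities below are illustrative assumptions, not the ones used for the actual file):

# Sketch: generate a synthetic season of 12 periods x 1000 random matches
set.seed(1)
season <- do.call(rbind, lapply(1:12, function(p) {
  player   <- sample(1:500, 1000, replace = TRUE)
  opponent <- sample(1:500, 1000, replace = TRUE)
  # Re-draw opponents until no one is matched against themselves
  while (any(player == opponent)) {
    clash <- player == opponent
    opponent[clash] <- sample(1:500, sum(clash), replace = TRUE)
  }
  data.frame(Period = p, Player = player, Opponent = opponent,
             Result = sample(c(0, 0.5, 1), 1000, replace = TRUE,
                             prob = c(0.475, 0.05, 0.475)))
}))
write.csv(season, "synthetic_season_1_match_data.csv", row.names = FALSE)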

The file can be found here.

Using R to Generate Glicko Ratings

Using the R programming language via RStudio, the following steps were performed:

  1. Load the PlayerRatings package.
  2. Load tournament data .csv.
  3. Set the tournament data as an R data frame properly reporting Period, Player, Opponent, and Result.
  4. Set starting player Ratings, Deviation, and Volatility.
  5. Run the Glicko2 algorithm for all players with the tournament data.
  6. Convert the output from the algorithm to a new data frame with each player’s final rating, deviation, volatility, and game history.
  7. Export the output to a new .csv file.

For this simulation, we’ve given all the players the following attributes:

  • Initial rating of 1500
  • Initial deviation of 350
  • Initial volatility of 0.6

The rating of 1500 and deviation of 350 are the values Mark Glickman recommends for new players. He suggests an initial volatility of between 0.3 (for low-randomness games, like chess) and as much as 1.2 (for high-randomness games). We’ve chosen 0.6 for a trading card game, due to its higher randomness factor (e.g., the order in which cards are drawn from decks).

For the ratings algorithm, we’ve chosen a constant (“cval”) of 60. This means, approximately, that an inactive player’s deviation would drift from its current level back up to the initial 350 in about three years of non-activity.
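
To see where the three-year figure comes from, here’s a back-of-the-envelope check using Glicko’s deviation-drift rule (the starting deviation of 50 is an assumed example for a very active player, not a value from the simulation):

# Glicko deviation drift without play: after t idle periods, a deviation
# grows from RD0 to sqrt(RD0^2 + c^2 * t), capped at 350
cval <- 60
rd0  <- 50                # assumed deviation for a very active player
(350^2 - rd0^2) / cval^2  # ~33 monthly periods, i.e. roughly three years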

The volatility and cval should be evaluated for any given trading card game and will be the subject of future studies to determine the appropriate levels for games like Pokémon, Magic: the Gathering, and Yu-Gi-Oh, separately. For now, we’ve settled on these values for this demonstration.

You can see the R code below.

# Step 1: Load PlayerRatings Package
library(PlayerRatings)

# Step 2: Load tournament data
tournamentdata <- read.csv("synthetic_season_1_match_data.csv",
                           header = TRUE, sep = ",")

# Step 3: Convert tournament data to data frame
season1matches <- data.frame(Period = tournamentdata$Period,
                             Player = tournamentdata$Player,
                             Opponent = tournamentdata$Opponent,
                             Result = tournamentdata$Result)

# Step 4: Set starting ratings, deviation, and volatility for all 
# players
startratings <- data.frame(Player = seq(1, 500, 1),
                           Rating = rep(1500, 500),
                           Deviation = rep(350, 500),
                           Volatility = rep(0.60, 500))

# Step 5: Run Glicko2 algorithm for all players with data
season1ratings <- glicko2(season1matches, status = startratings,
                          cval = 60, tau = 0.60)

# Step 6: Set results of the algorithm as a new data frame with 
# reported rating, deviation, volatility, and game history
season1finals <- data.frame(Player = season1ratings$ratings$Player,
                            Rating = season1ratings$ratings$Rating,
                            Deviation = season1ratings$ratings$Deviation,
                            Volatility = season1ratings$ratings$Volatility,
                            Games = season1ratings$ratings$Games,
                            Win = season1ratings$ratings$Win,
                            Loss = season1ratings$ratings$Loss,
                            Draw = season1ratings$ratings$Draw,
                            Lag = season1ratings$ratings$Lag)

# Step 7: Export results to a new .csv file
write.csv(season1finals, "season_one_final_ratings.csv")

You can replicate the steps shown above in R.

Final Ratings

Running the steps shown above, we get a neatly formatted .csv file that reports player data at the end of the season (i.e., after the 12,000 matches played over 12 months).
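
To peek at the top of the table directly in R, sort the final data frame by Rating (glicko2 already sorts its output by default, so this just makes the intent explicit):

# View the five highest-rated players at season's end
head(season1finals[order(-season1finals$Rating), ], 5)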

Looking at the first few entries in this output, we find that Player 42 came out on top with a Rating of ~1827, a Deviation of ~183, and a Volatility of ~0.62.

We can draw the following conclusions about Player 42:

  1. A Rating of ~1827 implies that Player 42’s expected win odds against an average player (Rating 1500) are roughly 5.5 to 1, ceteris paribus (given the same σ and τ for both players).
  2. Player 42’s σ has fallen from 350 to ~183. This is expected: the more games a player plays, the lower σ becomes, and the narrower the credible interval around the reported Rating. (Note that many other players ended with even lower σ, because they played more regularly throughout the season.)
  3. Player 42’s τ (~0.62) is nearly unchanged and matches that of the system as a whole. This is expected for a player with a top placement, whose wins/losses/draws have been due less to luck and more to skill over the course of the season.

Given the structure of the Glicko/Glicko2 system, we can confidently say that Player 42’s true skill level is somewhere between ~1644 and ~2009 (1827 ± 183, a ±1σ band). Given the player’s relatively high volatility, we should err on the side of saying that the player’s real skill is closer to the lower bound.

The completed output file can be found here.

Generating Win/Loss Probabilities

With these data, we can generate win/loss probabilities for future matchups, which is what we need for our TCG Sportsbook project.

Let’s pit the top two players, Player 42 and Player 67, against one another. This can be done easily in R with the predict function.

# Predict the probability of Player 42 winning against Player 67
predict(season1ratings, newdata = data.frame(Player = 42, Opponent = 67), tng = 1)

The output we receive is:

[1] 0.6926854

This means that Player 42 has a win probability of ~0.69 against Player 67 in a hypothetical match between them.
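
Tying back to the previous post on betting prices, this probability converts directly into fair decimal odds for each side (a simple illustration that ignores any bookmaker margin):

# Convert the predicted win probability into fair decimal odds
p42 <- 0.6926854
1 / p42        # ~1.44: fair decimal odds on Player 42
1 / (1 - p42)  # ~3.25: fair decimal odds on Player 67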

Next Steps: Deck Factors

We’ve demonstrated an ability to give players ratings based on their skill from tournament data.

The next issue we’ll have to address is that of deck strategy.

Glicko/Glicko2 (and its forebear, Elo) were made to gauge skill in low-randomness games like chess. In games like these, both players come to the table with identical game pieces: the same number of pawns, knights, rooks, bishops, and so on, all in the same starting positions.

Trading card games have a higher level of randomness due to the cards in use (which, in part, we addressed by setting the initial Volatility at 0.6 for the algorithm). Each competitor could have a very different deck of cards, or maybe even the same theme of deck, but with a different card list.

All decks don’t perform equally well against one another in every match up. Some decks are simply superior to others, or at least, have very lopsided matchups (where Deck A is three times as likely to win against Deck B, for example), ceteris paribus.

The predict function in R gives us the ability to take such factors into account via a gamma variable (Γ). We’ll use this in the next phase of the project: Γ will stand in for the decks in use by either player and allow us to account for how well those decks match up against one another.
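
As a preview, here’s a minimal sketch of how that might look (the +30 rating-point deck advantage for Player 42 is a hypothetical, made-up value, not a measured one):

# Sketch: predict's gamma argument shifts the prediction in favor of
# player one; here, a hypothetical +30 rating-point deck advantage
predict(season1ratings,
        newdata = data.frame(Player = 42, Opponent = 67),
        tng = 1, gamma = 30)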
