1.5 Building a Strategy Model: Deck Quality and the Metagame

In the last post, we discussed assigning skill ratings to competitors in trading card games. These ratings can be used to predict which competitor we believe would win a given match.

This time, we take deck strategy into account.

Unlike games such as chess, trading card games are asymmetric in that each competitor brings different cards to the match. Some strategies have an edge over others. We’ll have to take this into account in our working model.

Why Deck Strategy Matters

In games like chess, both players sit down with the same game pieces. All the game pieces are completely symmetrical in their game abilities and positional states. There are no surprises that can’t already be seen from the start of the game. A player can’t pull out a special piece from a hidden part of the board, or sacrifice three pieces to give another piece some special, hidden power until the end of the turn.

Trading card games, however, allow exactly this sort of thing.

They are asymmetrical. Your deck isn’t like my deck, and even if it is, the exact cards and the order in which they are drawn are different.

Chess is a game where both players show up with pistols. It’s fair in this way.

Trading card games allow players to pick different kinds of weapons: swords, baseball bats, pistols, explosives, etc.

Some weapons do well against others (a sword vs. a baseball bat, for example), while some do far less well (a sword vs. an automatic rifle).

Trading card games are unfair in this way, but it is something that deck building tries to take into account: if I can’t beat another strategy outright, how can I upset or undermine the opponent’s strategy to make my win more likely?

Ultimately, all trading card game players know that the deck they choose has some great matchups and some bad ones. We need a way to take this reality into account, something that models built around games like chess, on which so much of this probability modelling is based, don’t provide.

Adding Deck Detail to Match Data

In the last post, we evaluated a .csv file that contained synthetic data on 500 competitors playing a total of 12,000 games over a tournament season.

To this data, we now add the deck used by each competitor in each match, as shown below:

You can see the revised .csv file here.

There are a total of 15 decks, numbered 1 through 15, randomly assigned to each competitor in each match.

The randomization is such that as the tournament season continues, older decks fall out of use as new decks come into use. This simulates a change in metagame over the course of the season.
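
For readers who want to reproduce this in R rather than Excel, here is one way such a drifting metagame could be simulated. This is only a sketch under assumed popularity weights; the deck_for_match helper and its Gaussian weighting are illustrative, not the randomization actually used to build the file above.

# One way to simulate a drifting metagame: each deck is most popular around
# its own "peak" month and fades before and after it, so older decks fall
# out of use as newer ones arrive.
set.seed(42)
n_decks <- 15

deck_for_match <- function(period, n = 1) {
  peak    <- seq(1, 12, length.out = n_decks)      # hypothetical peak month for each deck
  weights <- dnorm(peak, mean = period, sd = 2.5)  # popularity weights for this month
  sample(1:n_decks, n, replace = TRUE, prob = weights)
}

deck_for_match(1, 5)    # decks seen in Period 1 skew toward the low numbers
deck_for_match(12, 5)   # decks seen in Period 12 skew toward the high numbers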

Deck Matchup Probabilities

With the deck detail added to the seasonal tournament data, we can assess how well each deck does against each other deck, independent of the competitors that use these decks.

Comparing decks in Excel, we get the following:

Deck |      1 |      2 |      3 |      4 |      5 |      6 |      7 |      8 |      9 |     10 |     11 |     12 |     13 |     14 |     15
   1 | 0.5000 | 0.5618 | 0.5434 | 0.5050 | 0.6029 |      - |      - |      - |      - |      - |      - |      - |      - |      - |      -
   2 | 0.4382 | 0.5000 | 0.5047 | 0.4969 | 0.5288 | 0.5271 | 0.4783 |      - |      - |      - |      - |      - |      - |      - |      -
   3 | 0.4566 | 0.4953 | 0.5000 | 0.5326 | 0.4907 | 0.4961 | 0.5216 | 0.5245 | 0.5357 |      - |      - |      - |      - |      - |      -
   4 | 0.4950 | 0.5031 | 0.4674 | 0.5000 | 0.4736 | 0.5053 | 0.4793 | 0.4806 | 0.4330 | 0.5063 | 0.4286 |      - |      - |      - |      -
   5 | 0.3971 | 0.4712 | 0.5093 | 0.5264 | 0.5000 | 0.5111 | 0.4547 | 0.4832 | 0.5253 | 0.4760 | 0.5595 | 0.4821 | 0.6136 |      - |      -
   6 |      - | 0.4729 | 0.5039 | 0.4947 | 0.4889 | 0.5000 | 0.5095 | 0.5241 | 0.5433 | 0.5085 | 0.5224 | 0.5045 | 0.5137 | 0.4286 | 0.7105
   7 |      - | 0.5217 | 0.4784 | 0.5207 | 0.5453 | 0.4905 | 0.5000 | 0.5116 | 0.4036 | 0.5452 | 0.5891 | 0.4494 | 0.4519 | 0.5306 | 0.4773
   8 |      - |      - | 0.4755 | 0.5194 | 0.5168 | 0.4759 | 0.4884 | 0.5000 | 0.5503 | 0.5444 | 0.4795 | 0.5804 | 0.5667 | 0.3684 | 0.6389
   9 |      - |      - | 0.4643 | 0.5670 | 0.4747 | 0.4567 | 0.5964 | 0.4497 | 0.5000 | 0.5402 | 0.5039 | 0.4491 | 0.4865 | 0.5439 | 0.7292
  10 |      - |      - |      - | 0.4938 | 0.5240 | 0.4915 | 0.4548 | 0.4556 | 0.4598 | 0.5000 | 0.5000 | 0.4592 | 0.5565 | 0.5000 | 0.4412
  11 |      - |      - |      - | 0.5714 | 0.4405 | 0.4776 | 0.4109 | 0.5205 | 0.4961 | 0.5000 | 0.5000 | 0.4745 | 0.4722 | 0.3909 | 0.6250
  12 |      - |      - |      - |      - | 0.5179 | 0.4955 | 0.5506 | 0.4196 | 0.5509 | 0.5408 | 0.5255 | 0.5000 | 0.5083 | 0.6538 | 0.4412
  13 |      - |      - |      - |      - | 0.3864 | 0.4863 | 0.5481 | 0.4333 | 0.5135 | 0.4435 | 0.5278 | 0.4917 | 0.5000 | 0.4649 | 0.3235
  14 |      - |      - |      - |      - |      - | 0.5714 | 0.4694 | 0.6316 | 0.4561 | 0.5000 | 0.6091 | 0.3462 | 0.5351 | 0.5000 | 0.5833
  15 |      - |      - |      - |      - |      - | 0.2895 | 0.5227 | 0.3611 | 0.2708 | 0.5588 | 0.3750 | 0.5588 | 0.6765 | 0.4167 | 0.5000

Here we assume that ties are valued at 0.5 wins.

Taking an example from the table, we can see that Deck 8 has a historical win percentage against Deck 10 of ~54%. Likewise, Deck 10 won against Deck 8 ~46% of the time.

And as expected, each deck has precisely a 50% win probability against itself. In a mirror match, whatever result one copy of the deck records (win = 1, loss = 0, or draw = 0.5), the other copy records the opposite, so the two sides’ results always sum to 1. Averaged over both sides of every mirror match, that works out to exactly 0.5, i.e., a 50% win probability.
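
The same matrix can also be computed in R instead of Excel. The sketch below assumes the revised file has deck columns named PlayerDeck and OpponentDeck and uses a made-up filename; both are assumptions you would adjust to match the actual file.

# Sketch: build the deck-vs-deck win matrix from the match data.
# Filename and deck column names are hypothetical.
matches <- read.csv("synthetic_season_1_match_data_with_decks.csv", header=TRUE)

decks <- sort(unique(c(matches$PlayerDeck, matches$OpponentDeck)))
matchup <- matrix(NA_real_, nrow=length(decks), ncol=length(decks),
                  dimnames=list(decks, decks))

for (a in decks) {
  for (b in decks) {
    # Results for deck a vs. deck b, from whichever side deck a was played on.
    # Draws count as 0.5 automatically, since Result is already 0, 0.5, or 1.
    as_player   <- matches$Result[matches$PlayerDeck == a & matches$OpponentDeck == b]
    as_opponent <- 1 - matches$Result[matches$PlayerDeck == b & matches$OpponentDeck == a]
    res <- c(as_player, as_opponent)
    if (length(res) > 0) matchup[as.character(a), as.character(b)] <- mean(res)
  }
}

round(matchup, 4)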

Gamma (Γ) & the Gamma Curve

The deck matchup win/loss percentages help us determine how much of an edge to assign to a competitor based on the deck he or she brings to a match.

The PlayerRatings package in R (which we’ve been using to calculate Glicko2 ratings and predict win/loss probabilities) provides a parameter called gamma (Γ) that represents an advantage to one side of a match.

  • Assigning Γ=0 gives neither competitor an edge.
  • If Γ<0, the “Player” (vs. “Opponent”) suffers a negative edge, one that subtracts from his or her probability of winning.
  • If Γ>0, the “Player” (vs. “Opponent”) gets a positive edge, one that adds to his or her probability of winning.

But how much Γ should be applied between competitors in a given match?

To help answer this, let’s turn to R with the following code:

# Step 1: Load the PlayerRatings package
library(PlayerRatings)

# Step 2: Set up two competitors with the same rating, deviation, and volatility
startrate <- data.frame(Player=c("A", "B"), Rating=c(1500,1500), 
Deviation=c(350,350), Volatility=c(0.6,0.6))

# Step 3: Set up a match between these two equivalent players
samplematch <- data.frame(Period=1, Player="A", Opponent="B", Result=0.5)

# Step 4: Determine final ratings for both players  in this match
samplefinals <- glicko2(samplematch, status=startrate, tau=0.6)

# Step 5: Predict the win probabilities for the first player
# with gamma from -1000 to 1000 in 0.1 increments
gammacurve <- predict(samplefinals, newdata=data.frame(Period=2, Player="A", Opponent="B"), tng=1, gamma=seq(-1000, 1000, 0.1))

# Step 6: Store the output from Step 5 as a data frame pairing each gamma
# value with its predicted win probability (this will be useful later)
gammacurve2 <- data.frame(Gamma=seq(-1000, 1000, 0.1), Win_Prob=gammacurve)

What we’ve done, highlighted in Step 5, above, is predict the win probability between two evenly matched competitors.

We didn’t do this just once, but 20,001 times.

Each time we’ve used a different Γ, starting at -1000 and building to 1000 in increments of 0.1. In this range is also included Γ=0, which favors neither side.

We can now visualize this on a plot using R and the ggplot2 package.

# Plot the gamma curve from -1000 to 1000 with the ggplot2 package.

library(ggplot2)
ggplot(data=gammacurve2, mapping=aes(x=Gamma, y=Win_Prob)) +
  geom_line(color="red", linewidth=1.25) +
  geom_vline(xintercept=0) +
  labs(title="Gamma & Win Probability",
       subtitle="NicholasABeaver.com",
       caption="Note: Assumes both players have identical Glicko2 Ratings, Deviations, and Volatility.")

We get the following plot:

As expected, Γ=0 does not favor one competitor or another. The lower Γ gets, the more the “opponent” is favored. The higher Γ gets, the more the “player” is favored.

If we look carefully (and think about what Γ is measuring), we see that the win probability can never reach 0 or 1, and that a change in Γ near 0 has a larger effect than the same change farther away from it.

The curve is logistic (S-shaped), much like the way rating differences map to win probabilities in Glicko2. Each increment of Γ near 0 shifts the win probability more than the same increment farther from 0, and the curve flattens toward the extremes, so the win probability approaches, but never reaches, 0 or 1.
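
To see why the curve looks this way, here is a bare-bones logistic sketch in R. This is the plain Elo/Glicko-style form, not the package’s exact internals (the real prediction also scales by the players’ deviations, so its numbers differ from these).

# A rough sketch of the S-shape (not the package's exact formula)
logistic_win_prob <- function(gamma) 1 / (1 + 10^(-gamma / 400))

logistic_win_prob(c(-1000, -100, 0, 100, 1000))
# ~0.003, ~0.36, 0.50, ~0.64, ~0.997: steps near 0 move the probability far
# more than the same steps out in the tails, and it never hits 0 or 1.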

We can export this data as a .csv file, which can serve us as a useful table.

To do that in R, we use the following code:

write.csv(gammacurve2, "gamma_curve_1_to_1000_by_0.1.csv")

We can see the output .csv file here.

We’ll use this in the next section to illustrate how Γ helps us account for deck-vs-deck quality.

Putting Glicko2 and Deck Gamma Together

Let’s tie win probabilities and deck quality together to illustrate how they work.

We’ll make the following assumptions:

  • Player A uses Deck 9
  • Player B uses Deck 7
  • Both players have Glicko2 Ratings of 1500
  • Both players have Deviations (σ) of 350
  • Both players have Volatility (τ) of 0.6

Using our Deck Matchup Probabilities table, we can see that Player A’s Deck 9 has a 59.64% probability of beating Player B’s Deck 7, as shown below:

 

Looking up a win probability of ~0.5964 on our Γ output table from R, we see the following:

For the matchup of these two decks (9 vs. 7), our Γ = 113.5.

We can now use this in the predict function in R, setting “gamma” equal to 113.5.

predict(samplefinals, newdata=data.frame(Period=2, Player="A", Opponent="B"), tng=1, gamma=113.5)

The output is:

[1] 0.5964414

This is what we expect, because both players have exactly the same skill rating, the same deviation, and the same volatility.

The only variable that differs is the deck used by each player (Deck 9 vs. Deck 7). Since Deck 9 has a ~59.64% win probability against Deck 7, it makes perfect sense that in this matchup, the probability of Player A beating Player B is ~59.64%. Everything else about the two competitors is the same.

We can carry out this same process for any two competitors using any two decks by doing the following:

  1. Find the Ratings, deviation (σ), and volatility (τ) for two given players.
  2. Find the Decks to be used by each player and consult the Deck Matchup Probability Chart for the decks’ win probabilities.
  3. Use the decks’ win probabilities to consult the Gamma (Γ) Chart and find the correct Γ to apply to the match.
  4. Set the predict function with the players’ skill details and correct Γ to find the win probability.

This is a somewhat manual process, which could be automated with software.
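
As a rough illustration of what that automation could look like, here is a sketch of a helper that performs steps 2 through 4. The function predict_with_decks is hypothetical; it assumes a rating object like samplefinals, a deck matchup matrix (such as the matchup matrix sketched earlier), and the exported gamma table with Gamma and Win_Prob columns.

# Hypothetical helper: look up the deck matchup probability, find the closest
# gamma on the gamma curve, then run the usual PlayerRatings prediction.
predict_with_decks <- function(ratings, player, opponent,
                               player_deck, opponent_deck,
                               deck_matchups, gamma_curve, period = 2) {
  # Step 2: historical win probability of the player's deck vs. the opponent's deck
  deck_prob <- deck_matchups[player_deck, opponent_deck]

  # Step 3: the gamma whose predicted win probability (for evenly matched
  # players) is closest to that deck matchup probability
  gamma_est <- gamma_curve$Gamma[which.min(abs(gamma_curve$Win_Prob - deck_prob))]

  # Step 4: the PlayerRatings prediction, with the deck edge applied via gamma
  predict(ratings,
          newdata = data.frame(Period = period, Player = player, Opponent = opponent),
          tng = 1, gamma = gamma_est)
}

# Example: the Deck 9 vs. Deck 7 matchup from above, using the matchup matrix
# and gammacurve2 data frame built earlier in this post
predict_with_decks(samplefinals, "A", "B", 9, 7, matchup, gammacurve2)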

But this is another important step in our proof-of-concept.

Next, we’ll add some fine-tuning to our basic model, putting it’s various parts together into a cohesive whole.

1.4 Building a Competitor Model: Glicko Scores

In the last post, we reviewed setting win/loss odds and betting prices for trading card games. In that discussion, the players’ win probabilities were taken as a given.

In this post, we’ll take an important step (but not the last one) in generating those player win probabilities.

We’ll use a ratings system, called Glicko/Glicko2, to rate each player in a game. These ratings give a value to each player’s skill, and can be used to compare one against another to generate win/loss probabilities between them.

Glicko/Glicko2 Ratings System

Competitive games like chess have long used the Elo rating system, developed by Arpad Elo, to rate the skill level of players.

The Elo system assigns a numerical value to each player at the start of their careers. For Elo, new players start off with a rating of 1000. Each time a player wins or loses against another player, the winning player’s Elo rating increases while the losing player’s Elo rating decreases. This allows changes over time in a player’s skill level to be tracked.

The scale of the Elo rating system is logarithmic, meaning that a player rated 1600 isn’t “60% better” than a player rated 1000; under the standard Elo formula, that 600-point gap translates to odds of roughly 32:1 in the stronger player’s favor. The logarithmic scale also has the advantage that if a weaker player beats a stronger player, the weaker player’s rating rises more than it would for beating a player of similar skill, while if a stronger player beats a weaker player, the stronger player’s rating rises less than it would for beating a player of similar skill.
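
As a quick check on that figure, here is the standard Elo expected-score formula in R (the 1600 and 1000 ratings are just the example numbers above).

# Expected score under the standard Elo formula for a 600-point rating gap
elo_expected <- function(r_a, r_b) 1 / (1 + 10^((r_b - r_a) / 400))

elo_expected(1600, 1000)                              # ~0.969 expected score
elo_expected(1600, 1000) / elo_expected(1000, 1600)   # odds of roughly 32:1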

Mark Glickman, Fellow of the American Statistical Association and Senior Lecturer on Statistics at the Harvard University Department of Statistics (among other accolades), invented a new spin on the traditional Elo system called Glicko (and later, Glicko2).

These systems make several improvements on the Elo methodology that are key for trading card games:

  1. Glicko/Glicko2 incorporates a rating “deviation” (σ). Player deviations begin at 350 and change based on how often a player plays. This helps us measure our certainty about how accurate a player’s rating is as well as allows for a sort of “rust” factor to occur, whereby players who don’t play for a long time have their deviations drift back to that of a new player (at the σ=350 level).
  2. With the deviations in mind, ratings can be read as credible intervals. For instance, a player with a rating of 1750 and σ=150 has a credible rating range of roughly 1600 to 1900. The more the player plays, the more the deviation shrinks and the narrower the credible interval becomes (and thus, the greater our certainty about that rating).
  3. The deviations are not just window dressing: they are an integral part of predicting the win/loss probabilities between two competitors. A player with a lower σ is predicted to win more reliably than one with a higher σ, all else being equal.
  4. Glicko uses a constant variable (“c” or the “c-value”) which limits how much a player’s rating can change due to the outcome of a single match; this limits wild fluctuations due to upset wins or losses, and it also sets how quickly (or slowly) an inactive player’s deviation drifts back up to its initial level (350).
  5. Glicko2 takes a “volatility” variable (τ) into account. This variable reflects how much luck is a factor in the game at hand and helps regulate probabilities between players with different levels of volatility. Similar to σ, a competitor with a higher τ but similar skill will be predicted to perform worse against a player with a lower τ, all else being equal. Luck is accounted for.

For these reasons, we’ll use the Glicko2 system for rating competitors.

You can find an excellent paper on the Glicko System by Mark Glickman here, as well as his follow-up for Glicko2, here.

To apply the Glicko2 system, we’ll generate some synthetic (made up) player match data and use the PlayerRatings package in R, which incorporates the Glicko2 system in its functions.

Synthetic Tournament Data

A .csv file was set up to represent a tournament season.

The file has four fields:

  1. Period (numbered 1 through 12)
  2. Player (numbered 1 through 500).
  3. Opponent (numbered 1 through 500).
  4. Result (0, 0.5, or 1).

Each “Period” represents a single month during the tournament season (Period 1 = January, Period 2 = February, etc.). We assume that the tournament season runs from January to December of a single year.

Players and Opponents are numbered from 1 to 500. Each of these is a unique competitor. For each matchup in the following steps, the Player/Opponent represents the two competitors involved in each match.

The Result records a loss (0), a draw (0.5), or a win (1). These Results apply to the Player vs. the given Opponent. In the first match in the file, we see the following:
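
In raw .csv form (using the field list above), that first row reads:

Period,Player,Opponent,Result
1,121,444,0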

This means that in Period 1, Player 121 played Player 444 and Player 121 lost (thus, Player 444 won).

Each Period records 1000 matches between randomly selected Players and Opponents with random Results.

Thus, the .csv file records 12,000 randomly generated matches. All of this was done with Microsoft Excel.
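
If you prefer to skip Excel, an equivalent file could be generated directly in R. This is only a sketch (random pairings and results), not the exact spreadsheet method used here.

# Sketch: 12 periods of 1000 random pairings with random results
set.seed(1)
season <- do.call(rbind, lapply(1:12, function(p) {
  pairs <- t(replicate(1000, sample(1:500, 2)))   # two distinct competitors per match
  data.frame(Period = p,
             Player = pairs[, 1],
             Opponent = pairs[, 2],
             Result = sample(c(0, 0.5, 1), 1000, replace = TRUE))
}))

write.csv(season, "synthetic_season_1_match_data.csv", row.names = FALSE)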

The file can be found here.

Using R to Generate Glicko Ratings

Using the R programming language via R Studio, the following steps were performed:

  1. Load the PlayerRatings package.
  2. Load tournament data .csv.
  3. Set the tournament data as an R data frame properly reporting Period, Player, Opponent, and Result.
  4. Set starting player Ratings, Deviation, and Volatility.
  5. Run the Glicko2 algorithm for all players with the tournament data.
  6. Convert the output from the algorithm to a new data frame with each player’s final ratings, deviation, volatility, and game history
  7. Export the output to a new .csv file.

For this simulation, we’ve given all the players the following attributes:

  • Initial rating of 1500
  • Initial deviation of 350
  • Initial volatility of 0.6

The rating of 1500 and deviation of 350 are recommended by Mark Glickman for initial ratings. He suggests an initial volatility of anywhere from 0.3 (for low-randomness games, like chess) to as much as 1.2 (for high-randomness games). We’ve chosen 0.6 for a trading card game, due to its higher randomness factor (i.e., the order in which cards are drawn from decks).

For the ratings algorithm, we’ve chosen a constant (“cval”) of 60. This means that an inactive player’s deviation would climb from its current level back to the initial 350 in roughly three years of non-activity.
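
We can sanity-check that three-year figure with the deviation drift formula from Glickman’s Glicko paper, RD_new = min(sqrt(RD_old^2 + c^2 * t), 350), where t is the number of inactive rating periods (months, in our setup). The helper below is just a quick back-of-the-envelope calculation.

# How many inactive months until a player's deviation drifts back to 350?
periods_until_reset <- function(rd_current, cval = 60, rd_max = 350) {
  (rd_max^2 - rd_current^2) / cval^2
}

periods_until_reset(50)   # ~33 months for a well-established (low-deviation) player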

The volatility and cval should be evaluated for any given trading card game and will be the subject of future studies to determine the appropriate levels for games like Pokémon, Magic: the Gathering, and Yu-Gi-Oh, separately. For now, we’ve settled on these values for this demonstration.

You can see the R code, below.

# Step 1: Load PlayerRatings Package
library(PlayerRatings)

# Step 2: Load tournament data
tournamentdata <- read.csv("synthetic_season_1_match_data.csv", 
header=TRUE, sep=",")

# Step 3: Convert tournament data to data frame
season1matches <- data.frame(Period=tournamentdata$Period, 
Player=tournamentdata$Player, 
Opponent=tournamentdata$Opponent,
Result=tournamentdata$Result)

# Step 4: Set starting ratings, deviation, and volatility for all 
# players
startratings <- data.frame(Player=seq(1, 500, 1), Rating=rep(1500,500), 
Deviation=rep(350,500), Volatility=rep(0.60,500))

# Step 5: Run Glicko2 algorithm for all players with data
season1ratings <- glicko2(season1matches, status=startratings, cval=60, tau=0.60)

# Step 6: Set results of the algorithm as a new data frame with 
# reported rating, deviation, volatility, and game history
season1finals <- data.frame(Player=season1ratings$ratings$Player, 
Rating=season1ratings$ratings$Rating, 
Deviation=season1ratings$ratings$Deviation,
Volatility=season1ratings$ratings$Volatility,
Games=season1ratings$ratings$Games, 
Win=season1ratings$ratings$Win,
Loss=season1ratings$ratings$Loss,
Draw=season1ratings$ratings$Draw,
Lag=season1ratings$ratings$Lag)

# Step 7: Export results to a new .csv file
write.csv(season1finals, "season_one_final_ratings.csv")

You can replicate the steps shown above in R.

Final Ratings

Running the steps shown above, we get a neatly formatted .csv file that reports player data at the end of the season (i.e., after the 12,000 games played over the 12 months).

Looking at the first few entries in this output, we see the following:

 

We find that Player 42 came out on top with a Rating of ~1827, a Deviation of ~183, and a Volatility of ~0.62.

We can draw the following conclusions about Player 42:

  1. A Rating of ~1827 implies that Player 42 is ~5.5 times more skilled than an average player (Rating 1500), ceteris paribus (given the same σ and τ for both players).
  2. Player 42’s σ has fallen from 350 to ~183. This is expected, as the more games a player plays, the lower σ becomes, as we can assume the credible interval of the player’s skill is closer and closer to the reported Rating. (Note that many other players have even lower σ, because they have played more reliably throughout the season).
  3. Player 137’s τ  (~0.62) is about unchanged and matches that of the system of a whole. This is expected for a player with top placement who’s wins/losses/draws have been less due to luck and more due to skill over the course of the season.

Given the structure of the Glicko/Glicko2 system, we can confidently say that Player 42’s true skill level is somewhere between ~1644 and ~2009. Given the relatively high volatility (τ ≈ 0.62), we should err toward saying that the player’s real skill is closer to this lower bound.

The completed output file can be found here.

Generating Win/Loss Probabilities

With these data, we can generate win/loss probabilities for future matchups, which is what we need for our TCG Sportsbook project.

Let’s pit the top two players against one another:

This can be done easily in R with the predict function.

# Predict the probability of Player 42 winning against Player 67
predict(season1ratings, newdata=data.frame(Player=42, Opponent=67), tng=1)

The output we receive is:

[1] 0.6926854

This means that Player 42 has a win probability of ~0.69 against Player 67 in a hypothetical match between them.

Next Steps: Deck Factors

We’ve demonstrated an ability to give players ratings based on their skill from tournament data.

The next issue we’ll have to address is that of deck strategy.

Glicko/Glicko2 (and their forebear, Elo) were made to gauge skill in low-randomness games like chess. In games like these, both players come to the table with identical game pieces. Both sides have the same number of pawns, knights, rooks, bishops, etc. Both sides have these pieces in the same starting positions.

Trading card games have a higher level of randomness due to the cards in use (which, in part, we addressed by setting the initial Volatility at 0.6 for the algorithm). Each competitor could have a very different deck of cards, or maybe even the same theme of deck, but with a different card list.

All decks don’t perform equally well against one another in every match up. Some decks are simply superior to others, or at least, have very lopsided matchups (where Deck A is three times as likely to win against Deck B, for example), ceteris paribus.

The predict function in R gives us the ability to take such factors into account via a gamma variable (Γ). We’ll use this in the next phase of the project. Γ will be the stand in for the decks in use by either player and allow us to account for how well those decks match up against one another.