Development Blog

1.6 Fine-Tuning the TCG Sportsbook Model

Before we move on to simulating a TCG world tournament, there are a few issues with our working model that need to be addressed.

These issues are:

  1. How to account for new decks that enter the competitive format for which we have no previous data.
  2. How the sportsbook will manage the liabilities it owes to winning bettors on either side of a match.
  3. How to predict the winner of the entire tournament from the outset and how bet prices are placed on this outcome.

These three issues will become more central when we move into the next post in this series. For now, let’s grapple with how to deal with these issues.

“Rogue” Decks

In the previous post, we determined how much a competitor’s choice of deck contributes to his or her win probability. This parameter, called “gamma” (represented by the Greek letter of the same name: Γ), works with our Glicko2 algorithm to make that determination.

We noted that over time, the metagame (or the competitive environment) evolves. New decks emerge to beat existing decks, making older decks obsolete as time goes on. This is a natural process in trading card games and a big part of their draw and fun.

But what gamma do we assign to a brand new deck that’s never been seen before? If the current metagame supports 15 decks, and a 16th deck enters the field, how will that deck compare to the other 15, and vice versa?

Consider the following:

The table above shows which decks existed in the format during each period, marking those used in a period with a “Yes” and those that weren’t with a “No”. For example, we can see that in Period 1, only Decks 1, 2, 3, and 4 existed. In Period 2, Deck 5 entered the format. In Period 3, Deck 6 entered the format, but Deck 1 left, etc.

The World Tournament will occur in a future period, Period 13. What if a new deck, Deck 16, which no one has ever seen before in a tournament setting and for which we have no data, enters the format?

Certainly, given time, we’ll learn how such a deck performs. But as the sportsbook, we have to be ready to quote bet prices (and rate the underlying probabilities) before this data is available.

How do we rate these newcomers, or “rogue” decks?

Rating New Decks

To the synthetic tournament data from the regular season, we’ve added new columns to indicate whether a deck used by a competitor is “Old” (meaning it existed prior to that period) or “New” (meaning it debuted during that period).

To keep things consistent, all decks (Decks 1 through 4) that existed in Period 1 were rated as “Old”, since these decks are presumed to have existed before the start of the season.

The added detail looks like this:

The revised .csv file can be seen here.

Using this additional detail, we can see how each of the 15 unique decks used during the tournament season fared against any new deck. This is summarized in the table below, where a new row and column titled “New” have been added.

Deck | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | New
1 | 0.5000 | 0.5618 | 0.5434 | 0.5050 | 0.6029 | - | - | - | - | - | - | - | - | - | - | 0.6029
2 | 0.4382 | 0.5000 | 0.5047 | 0.4969 | 0.5288 | 0.5271 | 0.4783 | - | - | - | - | - | - | - | - | 0.5088
3 | 0.4566 | 0.4953 | 0.5000 | 0.5326 | 0.4907 | 0.4961 | 0.5216 | 0.5245 | 0.5357 | - | - | - | - | - | - | 0.5121
4 | 0.4950 | 0.5031 | 0.4674 | 0.5000 | 0.4736 | 0.5053 | 0.4793 | 0.4806 | 0.4330 | 0.5063 | 0.4286 | - | - | - | - | 0.4268
5 | 0.3971 | 0.4712 | 0.5093 | 0.5264 | 0.5000 | 0.5111 | 0.4547 | 0.4832 | 0.5253 | 0.4760 | 0.5595 | 0.4821 | 0.6136 | - | - | 0.5227
6 | - | 0.4729 | 0.5039 | 0.4947 | 0.4889 | 0.5000 | 0.5095 | 0.5241 | 0.5433 | 0.5085 | 0.5224 | 0.5045 | 0.5137 | 0.4286 | 0.7105 | 0.5368
7 | - | 0.5217 | 0.4784 | 0.5207 | 0.5453 | 0.4905 | 0.5000 | 0.5116 | 0.4036 | 0.5452 | 0.5891 | 0.4494 | 0.4519 | 0.5306 | 0.4773 | 0.5392
8 | - | - | 0.4755 | 0.5194 | 0.5168 | 0.4759 | 0.4884 | 0.5000 | 0.5503 | 0.5444 | 0.4795 | 0.5804 | 0.5667 | 0.3684 | 0.6389 | 0.5273
9 | - | - | 0.4643 | 0.5670 | 0.4747 | 0.4567 | 0.5964 | 0.4497 | 0.5000 | 0.5402 | 0.5039 | 0.4491 | 0.4865 | 0.5439 | 0.7292 | 0.5364
10 | - | - | - | 0.4938 | 0.5240 | 0.4915 | 0.4548 | 0.4556 | 0.4598 | 0.5000 | 0.5000 | 0.4592 | 0.5565 | 0.5000 | 0.4412 | 0.4970
11 | - | - | - | 0.5714 | 0.4405 | 0.4776 | 0.4109 | 0.5205 | 0.4961 | 0.5000 | 0.5000 | 0.4745 | 0.4722 | 0.3909 | 0.6250 | 0.4964
12 | - | - | - | - | 0.5179 | 0.4955 | 0.5506 | 0.4196 | 0.5509 | 0.5408 | 0.5255 | 0.5000 | 0.5083 | 0.6538 | 0.4412 | 0.5421
13 | - | - | - | - | 0.3864 | 0.4863 | 0.5481 | 0.4333 | 0.5135 | 0.4435 | 0.5278 | 0.4917 | 0.5000 | 0.4649 | 0.3235 | 0.4667
14 | - | - | - | - | - | 0.5714 | 0.4694 | 0.6316 | 0.4561 | 0.5000 | 0.6091 | 0.3462 | 0.5351 | 0.5000 | 0.5833 | 0.5341
15 | - | - | - | - | - | 0.2895 | 0.5227 | 0.3611 | 0.2708 | 0.5588 | 0.3750 | 0.5588 | 0.6765 | 0.4167 | 0.5000 | 0.5000
New | 0.3971 | 0.4912 | 0.4879 | 0.5732 | 0.4773 | 0.4632 | 0.4608 | 0.4727 | 0.4636 | 0.5030 | 0.5036 | 0.4579 | 0.5333 | 0.4659 | 0.5000 | 0.5000
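
Incidentally, a grid like this can also be computed directly from the revised .csv in R rather than Excel. Here’s a minimal sketch; the file name and column names (Player_Deck and Opponent_Age) are illustrative assumptions, not the actual headers of the file above.

# Sketch: average result (win=1, draw=0.5, loss=0) of each deck
# against brand-new entrants (file and column names are assumptions)
matches <- read.csv("season_matches_with_deck_age.csv")

# Keep only the matches where the opponent's deck debuted that period,
# then average the Result by the player's deck
new_opponents <- subset(matches, Opponent_Age == "New")
aggregate(Result ~ Player_Deck, data=new_opponents, FUN=mean)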

This seems, however, an unsatisfactory solution. Yes, we know how each deck fared against new entrants on average, but are all new entrants alike?

Does every new deck have the same potential against every existing deck? Intuition tells us that this isn’t correct: some deck types or strategies fare better against others because of their qualities. For instance, an “Aggro” deck may do very well against a “Mid Range” deck, but fare poorly against a “Control” deck.

We need to consider these “deck styles”.

Rating Deck Styles

Our season tournament data contains fifteen unique decks, numbered 1 through 15.

To these we added one of five deck styles: Aggro, Combo, Control, Mid Range, and Mill. These were assigned randomly.

The outcome of this assignment is as follows:

Deck Style
1 Control
2 Combo
3 Aggro
4 Combo
5 Mid Range
6 Mid Range
7 Control
8 Combo
9 Control
10 Control
11 Mill
12 Mill
13 Combo
14 Mid Range
15 Aggro

To the same season tournament data, we add the style detail to each deck for each matchup.

Those added details look like this:

The revised .csv file can be seen here.

Using this added detail, we can now see how well each deck style does against each deck style and vice versa. We’ve also kept the detail for matchups against “new” decks.

The data are summarized in the table below:

Deck | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | New | Aggro | Combo | Control | Mid Range | Mill
1 | 0.5000 | 0.5618 | 0.5434 | 0.5050 | 0.6029 | - | - | - | - | - | - | - | - | - | - | 0.6029 | 0.5434 | 0.5324 | 0.5000 | 0.6029 | -
2 | 0.4382 | 0.5000 | 0.5047 | 0.4969 | 0.5288 | 0.5271 | 0.4783 | - | - | - | - | - | - | - | - | 0.5088 | 0.5047 | 0.4985 | 0.4490 | 0.5282 | -
3 | 0.4566 | 0.4953 | 0.5000 | 0.5326 | 0.4907 | 0.4961 | 0.5216 | 0.5245 | 0.5357 | - | - | - | - | - | - | 0.5121 | 0.5000 | 0.5184 | 0.4913 | 0.4931 | -
4 | 0.4950 | 0.5031 | 0.4674 | 0.5000 | 0.4736 | 0.5053 | 0.4793 | 0.4806 | 0.4330 | 0.5063 | 0.4286 | - | - | - | - | 0.4268 | 0.4674 | 0.4981 | 0.4795 | 0.4876 | 0.4286
5 | 0.3971 | 0.4712 | 0.5093 | 0.5264 | 0.5000 | 0.5111 | 0.4547 | 0.4832 | 0.5253 | 0.4760 | 0.5595 | 0.4821 | 0.6136 | - | - | 0.5227 | 0.5093 | 0.5025 | 0.4738 | 0.5052 | 0.5286
6 | - | 0.4729 | 0.5039 | 0.4947 | 0.4889 | 0.5000 | 0.5095 | 0.5241 | 0.5433 | 0.5085 | 0.5224 | 0.5045 | 0.5137 | 0.4286 | 0.7105 | 0.5368 | 0.5183 | 0.5033 | 0.5192 | 0.4911 | 0.5142
7 | - | 0.5217 | 0.4784 | 0.5207 | 0.5453 | 0.4905 | 0.5000 | 0.5116 | 0.4036 | 0.5452 | 0.5891 | 0.4494 | 0.4519 | 0.5306 | 0.4773 | 0.5392 | 0.4783 | 0.5109 | 0.4865 | 0.5171 | 0.5361
8 | - | - | 0.4755 | 0.5194 | 0.5168 | 0.4759 | 0.4884 | 0.5000 | 0.5503 | 0.5444 | 0.4795 | 0.5804 | 0.5667 | 0.3684 | 0.6389 | 0.5273 | 0.5000 | 0.5133 | 0.5235 | 0.4845 | 0.5233
9 | - | - | 0.4643 | 0.5670 | 0.4747 | 0.4567 | 0.5964 | 0.4497 | 0.5000 | 0.5402 | 0.5039 | 0.4491 | 0.4865 | 0.5439 | 0.7292 | 0.5364 | 0.5606 | 0.4909 | 0.5407 | 0.4752 | 0.4788
10 | - | - | - | 0.4938 | 0.5240 | 0.4915 | 0.4548 | 0.4556 | 0.4598 | 0.5000 | 0.5000 | 0.4592 | 0.5565 | 0.5000 | 0.4412 | 0.4970 | 0.4412 | 0.4845 | 0.4730 | 0.5042 | 0.4828
11 | - | - | - | 0.5714 | 0.4405 | 0.4776 | 0.4109 | 0.5205 | 0.4961 | 0.5000 | 0.5000 | 0.4745 | 0.4722 | 0.3909 | 0.6250 | 0.4964 | 0.6250 | 0.5138 | 0.4694 | 0.4487 | 0.4893
12 | - | - | - | - | 0.5179 | 0.4955 | 0.5506 | 0.4196 | 0.5509 | 0.5408 | 0.5255 | 0.5000 | 0.5083 | 0.6538 | 0.4412 | 0.5421 | 0.4412 | 0.4506 | 0.5474 | 0.5386 | 0.5129
13 | - | - | - | - | 0.3864 | 0.4863 | 0.5481 | 0.4333 | 0.5135 | 0.4435 | 0.5278 | 0.4917 | 0.5000 | 0.4649 | 0.3235 | 0.4667 | 0.3235 | 0.4725 | 0.5000 | 0.4638 | 0.5114
14 | - | - | - | - | - | 0.5714 | 0.4694 | 0.6316 | 0.4561 | 0.5000 | 0.6091 | 0.3462 | 0.5351 | 0.5000 | 0.5833 | 0.5341 | 0.5833 | 0.5737 | 0.4745 | 0.5349 | 0.4813
15 | - | - | - | - | - | 0.2895 | 0.5227 | 0.3611 | 0.2708 | 0.5588 | 0.3750 | 0.5588 | 0.6765 | 0.4167 | 0.5000 | 0.5000 | 0.5000 | 0.5143 | 0.4365 | 0.3514 | 0.4595
New | 0.3971 | 0.4912 | 0.4879 | 0.5732 | 0.4773 | 0.4632 | 0.4608 | 0.4727 | 0.4636 | 0.5030 | 0.5036 | 0.4579 | 0.5333 | 0.4659 | 0.5000 | 0.5000 | 0.4887 | 0.5208 | 0.4650 | 0.4703 | 0.4850
Aggro | 0.4566 | 0.4953 | 0.5000 | 0.5326 | 0.4907 | 0.4817 | 0.5217 | 0.5000 | 0.4394 | 0.5588 | 0.3750 | 0.5588 | 0.6765 | 0.4167 | 0.5000 | 0.5113 | 0.5000 | 0.5182 | 0.4838 | 0.4845 | 0.4595
Combo | 0.4676 | 0.5015 | 0.4816 | 0.5019 | 0.4975 | 0.4967 | 0.4891 | 0.4867 | 0.5091 | 0.5155 | 0.4862 | 0.5494 | 0.5275 | 0.4263 | 0.4857 | 0.4792 | 0.4818 | 0.5000 | 0.4938 | 0.4931 | 0.5118
Control | 0.5000 | 0.5510 | 0.5088 | 0.5205 | 0.5262 | 0.4808 | 0.5135 | 0.4765 | 0.4593 | 0.5270 | 0.5306 | 0.4526 | 0.5000 | 0.5255 | 0.5635 | 0.5350 | 0.5162 | 0.5062 | 0.5000 | 0.5052 | 0.4978
Mid Range | 0.3971 | 0.4718 | 0.5069 | 0.5124 | 0.4948 | 0.5089 | 0.4829 | 0.5155 | 0.5248 | 0.4958 | 0.5513 | 0.4614 | 0.5362 | 0.4651 | 0.6486 | 0.5297 | 0.5155 | 0.5069 | 0.4948 | 0.5000 | 0.5112
Mill | - | - | - | 0.5714 | 0.4714 | 0.4858 | 0.4639 | 0.4767 | 0.5212 | 0.5172 | 0.5107 | 0.4871 | 0.4886 | 0.5187 | 0.5405 | 0.5150 | 0.5405 | 0.4882 | 0.5022 | 0.4888 | 0.5000

This is more satisfactory. We can theorize which new decks might enter the format and use these styles as comparisons for our gamma parameter.

For instance, looking at Period 12, which precedes the upcoming Period 13 in which the world tournament will occur, we can see only the following decks in the format (with their corresponding styles):

Deck Style
6 Mid Range
7 Control
8 Combo
9 Control
10 Control
11 Mill
12 Mill
13 Combo
14 Mid Range
15 Aggro

We can theorize about which of the older decks might drop out of the format by the time the world tournament occurs, and about which new decks and deck styles will enter to fill the vacuum.

If, by the time of the world tournament, Decks 6 through 10 drop out (perhaps because they are uniquely weak to Deck 15, the latest entrant, which was designed to beat the old “best” decks), our format would look like this:

Deck Style
11 Mill
12 Mill
13 Combo
14 Mid Range
15 Aggro

What new decks will emerge to exploit the power vacuum left in such a competitive environment?

We now have the tools to consider this and give probabilities.

Sportsbook Risk Management

If the house does not carefully manage its risk, an upset outcome of an event can ruin it.

Bookmakers try to keep the liability, that is, the amount of money they will pay out to winners on one side, as equal as possible on both sides of an event.

Three Scenarios & Six Outcomes

Consider the following:

We have Player A vs. Opponent B in a match under three different scenarios.

  1. In Scenario 1, the liability for both players is completely independent.
  2. In Scenario 2, the liability for both players is identical.
  3. In Scenario 3, the liability for both players is managed to be within a narrow margin of one another.

As can be seen in the outcomes in the lower part of the table, unmanaged risk can ruin the house. A loss of 53.7% in outcome 1B (that is, in Scenario 1 if Player B wins) is simply catastrophic and absolutely unacceptable. This potential loss, no matter how improbable, is not offset by the potential upside (that is, in Scenario 1 if Player A wins), which yields a GGR margin of 37.3%. Either way, we should expect a long-run margin of -8.2%. Also unacceptable.

In Scenario 2, the liabilities on both sides are identical, and so, too are the payouts to players. This is ideal and lands us a tidy profit. But reality is never so good to us.

In Scenario 3, the sportsbook manages its risk by limiting bets on either side so that the liabilities stay within some close margin of one another. The fact that the profit in Scenario 3 is higher than in Scenario 2 is the result of randomness; we should expect perfect parity to be the best option, and with risk management, we’re trying to get as close to perfect parity as possible.

If the liability is equal on both sides, the house is indifferent to the outcome of the game. No matter which side wins, the house gets its cut. We’re happy with that. It is this ideal, perfect parity of liability, that we’re seeking.

How We’ll Model Risk Management

So how do we put this into practice in our model?

In our next post in which we model the outcome of a world tournament, we’ll assume that our traders are managing risk by limiting bets on either side so that they are roughly equal.

In each simulated game, we will apply the following rules:

  1. The handle on “Player A” will be a random dollar amount between $5,000 and $10,000. This creates an independent variable.
  2. The handle on “Player B” will be based on the handle for Player A as follows:
      • The handle will be randomly set within -10% to +10% of Player A’s handle.
      • If our calculated win probability for Player B is less than 0.5, we will apply a divisor to the handle we take for that player (see below).
      • If our calculated win probability for Player B is greater than 0.5, we will not further modify the handle for Player B.

The divisor applied to a Player B with a less than 0.5 win probability is:

[math] Divisor = \frac{\text{Moneyline}(\text{Player A})}{100} [/math]

This risk management model can thus be summarized as follows:

The more that Player A is favored over Player B (or the more that Player B is an underdog), the more the handle is limited for Player B. This is because, as in our example scenarios and outcomes above, an upset win by an underdog can wipe out the house.

The noise in the model (the -10% to +10% differential) keeps things from being perfect. We shouldn’t expect perfect liability parity. This model brings us within striking distance of it and, I think, is reasonable for a real-world application.
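
As a preview of those rules in code, here’s a minimal R sketch. The function name is mine, p_b is our win probability for Player B, and moneyline_a is assumed to be the magnitude of Player A’s quoted moneyline (a favorite’s quote is negative, so we use its absolute size).

# Sketch of the handle rules above (names are illustrative)
take_handles <- function(p_b, moneyline_a) {
  handle_a <- runif(1, 5000, 10000)          # Rule 1: random handle on A
  handle_b <- handle_a * runif(1, 0.9, 1.1)  # Rule 2: within -10%/+10% of A
  if (p_b < 0.5) {
    # Rule 2: underdog divisor = Moneyline(Player A) / 100
    handle_b <- handle_b / (moneyline_a / 100)
  }
  c(A = handle_a, B = handle_b)
}

take_handles(p_b = 0.35, moneyline_a = 250)  # e.g., Player A favored at -250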

Predicting the Winner of a World Tournament

What is the probability that a given player invited to the world tournament will win the entire event? What place do we think each competitor will get? Can we set bet prices for these outcomes?

The Bradley-Terry Model (BTM) allows us to find reasonable outcomes.

BTM uses a “preference” algorithm that compares competitors based on their relative strengths and yields a win probability for each. These probabilities sum to 1, which means they are the probabilities for each competitor to win the whole shebang.
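
In its textbook form, the model assigns each competitor a strength parameter π and sets each head-to-head probability as:

[math] P(i \text{ beats } j) = \frac{\pi_i}{\pi_i + \pi_j} [/math]

Normalizing the fitted strengths so they sum to 1 is what lets us read them as probabilities of winning the whole event.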

Without giving away too much about the next post, in which we simulate a world tournament, assume we have eight players, A through H, for whom we wish to compute a probability of winning the entire event.

For all players, we calculate their win probabilities against one another (arbitrarily chosen for the sake of this example):

Player | A | B | C | D | E | F | G | H
A | 0.5000 | 0.5015 | 0.5175 | 0.6842 | 0.5153 | 0.6529 | 0.5701 | 0.6023
B | 0.4986 | 0.5000 | 0.4582 | 0.6861 | 0.5141 | 0.6001 | 0.5701 | 0.6028
C | 0.4826 | 0.5418 | 0.5000 | 0.6696 | 0.5555 | 0.6375 | 0.5528 | 0.5860
D | 0.3158 | 0.3140 | 0.3305 | 0.5000 | 0.3262 | 0.4663 | 0.3751 | 0.4124
E | 0.4848 | 0.4859 | 0.4445 | 0.6739 | 0.5000 | 0.5867 | 0.5561 | 0.5895
F | 0.3472 | 0.3999 | 0.3625 | 0.5339 | 0.4133 | 0.5000 | 0.4093 | 0.4461
G | 0.4299 | 0.4301 | 0.4473 | 0.6249 | 0.4441 | 0.5909 | 0.5000 | 0.5361
H | 0.3977 | 0.3973 | 0.4141 | 0.5876 | 0.4106 | 0.5540 | 0.4639 | 0.5000

To make life easy for us, we’ll employ the BTM with an excellent Excel plugin from Charles Zaiontz at Real Statistics.

With this table and the =BT_MODEL() function from this plug-in, we get the following:

Player Probability
A 0.1420
B 0.1384
C 0.1414
D 0.0950
E 0.1350
F 0.1066
G 0.1251
H 0.1164

This means, for example, that Player A is estimated to have a 14.2% probability of winning the tournament between these 8 players. Likewise, Player B has a 13.84% probability.
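
If you’d rather stay in R than Excel, here’s a rough sketch: average each player’s log-odds across opponents to estimate a Bradley-Terry strength, then normalize. This is a quick approximation, not the Real Statistics plugin’s exact fitting procedure; in this case it preserves the ordering of the players, but the exact values differ.

# The pairwise win-probability matrix above, rows/columns A through H
p <- matrix(c(
  0.5000, 0.5015, 0.5175, 0.6842, 0.5153, 0.6529, 0.5701, 0.6023,
  0.4986, 0.5000, 0.4582, 0.6861, 0.5141, 0.6001, 0.5701, 0.6028,
  0.4826, 0.5418, 0.5000, 0.6696, 0.5555, 0.6375, 0.5528, 0.5860,
  0.3158, 0.3140, 0.3305, 0.5000, 0.3262, 0.4663, 0.3751, 0.4124,
  0.4848, 0.4859, 0.4445, 0.6739, 0.5000, 0.5867, 0.5561, 0.5895,
  0.3472, 0.3999, 0.3625, 0.5339, 0.4133, 0.5000, 0.4093, 0.4461,
  0.4299, 0.4301, 0.4473, 0.6249, 0.4441, 0.5909, 0.5000, 0.5361,
  0.3977, 0.3973, 0.4141, 0.5876, 0.4106, 0.5540, 0.4639, 0.5000
), nrow=8, byrow=TRUE, dimnames=list(LETTERS[1:8], LETTERS[1:8]))

# Average each player's log-odds over all opponents, exponentiate to get
# a strength, then normalize the strengths to sum to 1
strength <- exp(rowMeans(log(p / (1 - p))))
round(strength / sum(strength), 4)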

We can assign bet prices to these probabilities using our previously established methods. We’ll assume an “overround” of 10% on the real computed probabilities of winning to bake in our bookmaker’s profit.

Doing so, we get the following moneyline odds:

Player | Probability | Overround | Moneyline
A | 0.1420 | 0.16 | +540
B | 0.1384 | 0.15 | +557
C | 0.1414 | 0.16 | +543
D | 0.0950 | 0.10 | +857
E | 0.1350 | 0.15 | +573
F | 0.1066 | 0.12 | +753
G | 0.1251 | 0.14 | +627
H | 0.1164 | 0.13 | +681

This means that, for example, a bet placed on Player A to win the tournament would return a total of $640 on a $100 stake (the $100 stake plus $540 in winnings).

These probabilities, and their corresponding prices, relate to each player at the outset of the tournament, before any games have been played. After each update (e.g., after games occur and players either win or lose), the field will shrink as players are eliminated and any new bets placed on the winner of the whole event will need updated prices. The BTM can still do this for us.

Suppose, after Round One, Players C, D, F, and H are eliminated, leaving only Players A, B, E, and G.

Any new bets placed on the ultimate winner would be based on these matchup probabilities:

Player | A | B | E | G
A | 0.5000 | 0.5015 | 0.5153 | 0.5701
B | 0.4986 | 0.5000 | 0.5141 | 0.5701
E | 0.4848 | 0.4859 | 0.5000 | 0.5561
G | 0.4299 | 0.4301 | 0.4441 | 0.5000

Applying the same methods, we come to updated moneyline odds of:

Player | Probability | Overround | Moneyline
A | 0.2608 | 0.2869 | +249
B | 0.2603 | 0.2864 | +249
E | 0.2533 | 0.2787 | +259
G | 0.2255 | 0.2480 | +303

A $100 bet placed now on Player A to win the tournament would return about $349 in total. As the outcomes become more certain, the odds become shorter, and the payouts smaller.

We’ll quote prices like these and track their profitability for the house in our simulation of the world tournament.

Next: Simulating the World Tournament

In the next post, we’ll select the top 32 players from the tournament season and invite them to play in a World Tournament.

These players will go head-to-head in a single-elimination tournament that will last five rounds until a winner is declared.

We’ll take bets on each match and track our profitability along the way.

We’ll simulate this 1,000 times and analyze the results.

Finally, we’ll put everything we’ve discussed in this project together and see if our model proves viable and where there might be opportunities for improvement.

1.5 Building a Strategy Model: Deck Quality and the Metagame

In the last post, we discussed assigning skill ratings to competitors in trading card games. These ratings can be used to make predictions between competitors about which side we believe would win or lose in a match.

This time, we take deck strategy into account.

Unlike games such as chess, trading card games are asymmetric: each competitor brings different cards to the match. Some strategies have an edge over others. We’ll have to take this into account in our working model.

Why Deck Strategy Matters

In games like chess, both players sit down with the same game pieces. All the pieces are completely symmetrical in their abilities and positional states. There are no surprises that can’t already be seen from the start of the game. A player can’t pull out a special piece from a hidden part of the board, or sacrifice three pieces to give another piece some special, hidden power until the end of turn.

Trading card games, however, are exactly like this: they are built on hidden information and surprises.

They are asymmetrical. Your deck isn’t like my deck, and even if it is, the exact cards and the order in which they are drawn are different.

Chess is a game that sees both players come to the game with pistols. It’s fair in this way.

Trading card games allow players to pick different kinds of weapons: swords, baseball bats, pistols, explosives, etc.

Some weapons do well against another (a sword vs. a baseball bat, for example), while others do less well (a sword vs. an automatic rifle).

Trading card games are unfair in this way, but it is something that deck building tries to take into account: if I can’t beat another strategy outright, how can I upset or undermine the opponent’s strategy to make my win more likely?

Ultimately, all trading card game players know that the deck they choose has some great matchups and some bad ones. We need a way to take this reality into account, something that games like chess, on which so much probability modelling is based, don’t require.

Adding Deck Detail to Match Data

In the last post, we evaluated a .csv file that contained synthetic data on 500 competitors playing a total of 12,000 games over a tournament season.

To this data, we now add the deck used by each competitor in each match, as shown below:

You can see the revised .csv file here.

There are a total of 15 decks, numbered 1 through 15, randomly assigned to each competitor in each match.

The randomization is such that as the tournament season continues, older decks fall out of use as new decks come into use. This simulates a change in metagame over the course of the season.

Deck Matchup Probabilities

With the deck detail added to the seasonal tournament data, we can assess how well each deck does against each other deck, independent of the competitors that use these decks.

Comparing decks in Excel, we get the following:

Deck | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
1 | 0.5000 | 0.5618 | 0.5434 | 0.5050 | 0.6029 | - | - | - | - | - | - | - | - | - | -
2 | 0.4382 | 0.5000 | 0.5047 | 0.4969 | 0.5288 | 0.5271 | 0.4783 | - | - | - | - | - | - | - | -
3 | 0.4566 | 0.4953 | 0.5000 | 0.5326 | 0.4907 | 0.4961 | 0.5216 | 0.5245 | 0.5357 | - | - | - | - | - | -
4 | 0.4950 | 0.5031 | 0.4674 | 0.5000 | 0.4736 | 0.5053 | 0.4793 | 0.4806 | 0.4330 | 0.5063 | 0.4286 | - | - | - | -
5 | 0.3971 | 0.4712 | 0.5093 | 0.5264 | 0.5000 | 0.5111 | 0.4547 | 0.4832 | 0.5253 | 0.4760 | 0.5595 | 0.4821 | 0.6136 | - | -
6 | - | 0.4729 | 0.5039 | 0.4947 | 0.4889 | 0.5000 | 0.5095 | 0.5241 | 0.5433 | 0.5085 | 0.5224 | 0.5045 | 0.5137 | 0.4286 | 0.7105
7 | - | 0.5217 | 0.4784 | 0.5207 | 0.5453 | 0.4905 | 0.5000 | 0.5116 | 0.4036 | 0.5452 | 0.5891 | 0.4494 | 0.4519 | 0.5306 | 0.4773
8 | - | - | 0.4755 | 0.5194 | 0.5168 | 0.4759 | 0.4884 | 0.5000 | 0.5503 | 0.5444 | 0.4795 | 0.5804 | 0.5667 | 0.3684 | 0.6389
9 | - | - | 0.4643 | 0.5670 | 0.4747 | 0.4567 | 0.5964 | 0.4497 | 0.5000 | 0.5402 | 0.5039 | 0.4491 | 0.4865 | 0.5439 | 0.7292
10 | - | - | - | 0.4938 | 0.5240 | 0.4915 | 0.4548 | 0.4556 | 0.4598 | 0.5000 | 0.5000 | 0.4592 | 0.5565 | 0.5000 | 0.4412
11 | - | - | - | 0.5714 | 0.4405 | 0.4776 | 0.4109 | 0.5205 | 0.4961 | 0.5000 | 0.5000 | 0.4745 | 0.4722 | 0.3909 | 0.6250
12 | - | - | - | - | 0.5179 | 0.4955 | 0.5506 | 0.4196 | 0.5509 | 0.5408 | 0.5255 | 0.5000 | 0.5083 | 0.6538 | 0.4412
13 | - | - | - | - | 0.3864 | 0.4863 | 0.5481 | 0.4333 | 0.5135 | 0.4435 | 0.5278 | 0.4917 | 0.5000 | 0.4649 | 0.3235
14 | - | - | - | - | - | 0.5714 | 0.4694 | 0.6316 | 0.4561 | 0.5000 | 0.6091 | 0.3462 | 0.5351 | 0.5000 | 0.5833
15 | - | - | - | - | - | 0.2895 | 0.5227 | 0.3611 | 0.2708 | 0.5588 | 0.3750 | 0.5588 | 0.6765 | 0.4167 | 0.5000

Here we assume that ties are valued at 0.5 wins.
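
The same grid can be built in R instead of Excel. A minimal sketch, assuming deck columns named Player_Deck and Opponent_Deck (assumed names); each match is mirrored so that both decks take a turn on the “player” side, which also carries the ties-as-half-wins convention through automatically:

# Sketch: pairwise deck win rates from the season data
# (file and column names are assumptions)
matches <- read.csv("season_matches_with_decks.csv")

# Mirror each match so both decks appear in the "player" seat;
# a draw (0.5) stays 0.5 from both sides
both <- rbind(
  data.frame(Deck = matches$Player_Deck,
             Opp  = matches$Opponent_Deck,
             R    = matches$Result),
  data.frame(Deck = matches$Opponent_Deck,
             Opp  = matches$Player_Deck,
             R    = 1 - matches$Result)
)

# Average result of each deck (rows) against each other deck (columns)
round(tapply(both$R, list(both$Deck, both$Opp), mean), 4)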

Taking an example from the table, we can see that Deck 8 has a historical win percentage of ~55% against Deck 10. Likewise, Deck 10 won against Deck 8 ~45% of the time.

And as expected, each deck has precisely a 50% win probability against itself. A deck playing against itself will either win, lose, or draw, meaning that the opposing deck (itself) has the opposite outcome.

In a mirror match, every win (1) on one side is a loss (0) on the other, and a draw (0.5) counts the same for both, so the results average out to exactly half of all outcomes. Thus, a 50% win probability.

Gamma (Γ) & the Gamma Curve

The deck matchup win/loss percentages can serve in helping us determine how much of an edge to assign to competitors that use these decks.

The PlayerRatings package in R (which we’ve been using to calculate Glicko2 scores and predict win/loss probabilities) provides an open variable called gamma (represented by Γ).

  • Assigning Γ=0 gives neither competitor an edge.
  • If Γ<0, the “Player” (vs. “Opponent”) suffers a negative edge, one that subtracts from his or her probability of winning.
  • If Γ>0, the “Player” (vs. “Opponent”) gets a positive edge, one that adds to his or her probability of winning.

But how much Γ should be applied between competitors in a given match?

To help answer this, let’s turn to R with the following code:

# Step 1: Load the PlayerRatings package
library(PlayerRatings)

# Step 2: Set up two competitors with the same rating, deviation, and
# volatility
startrate <- data.frame(Player=c("A", "B"), Rating=c(1500,1500), 
Deviation=c(350,350), Volatility=c(0.6,0.6))

# Step 3: Set up a match between these two equivalent players
samplematch <- data.frame(Period=1, Player="A", Opponent="B", Result=0.5)

# Step 4: Determine final ratings for both players in this match
samplefinals <- glicko2(samplematch, status=startrate, tau=0.6)

# Step 5: Predict the win probabilities for the first player
# with gamma from -1000 to 1000 in 0.1 increments
gammacurve <- predict(samplefinals, newdata=data.frame(Period=2, Player="A", Opponent="B"), tng=1, gamma=seq(-1000, 1000, 0.1))

# Step 6: Store the output from Step 5 in a data frame, together with
# the gamma that produced each prediction (this will be useful later)
gammacurve2 <- data.frame(Gamma=seq(-1000, 1000, 0.1), Win_Prob=gammacurve)

What we’ve done, highlighted in Step 5, above, is predict the win probability between two evenly matched competitors.

We didn’t do this just once, but 20,001 times.

Each time we’ve used a different Γ, starting at -1000 and building to 1000 in increments of 0.1. This range also includes Γ=0, which favors neither side.

We can now visualize this on a plot using R and the ggplot2 package.

# Plot the gamma curve from -1000 to 1000 with the ggplot2 package.

library(ggplot2)
ggplot(data=gammacurve2, mapping=aes(x=Gamma, y=Win_Prob)) +
  geom_line(color="red", linewidth=1.25) +
  geom_vline(xintercept=0) +
  labs(title="Gamma & Win Probability",
       subtitle="NicholasABeaver.com",
       caption="Note: Assumes both players have identical Glicko2 Ratings, Deviations, and Volatility.")

We get the following plot:

As expected, Γ=0 does not favor one competitor or another. The lower Γ gets, the more the “opponent” is favored. The higher Γ gets, the more the “player” is favored.

If we look carefully (and think about what Γ is measuring), we see that the win probability can never reach as low as 0 or as high as 1, and that a change in Γ close to 0 has a larger effect than the same change farther away from it.

Γ is logarithmic in a way similar to Glicko2 ratings. Each increment of Γ near 0 moves the win probability more than the same increment farther from 0, and ever less so as Γ grows in magnitude, such that the resulting win probability approaches, but never reaches, 0 or 1, just as win probabilities in general never do.

We can export this data as a .csv file, which can serve us as a useful table.

To do that in R, we use the following code:

write.csv(gammacurve2, "gamma_curve_1_to_1000_by_0.1.csv")

We can see the output .csv file here.

We’ll use this in the next section to illustrate how Γ helps us account for deck-vs-deck quality.

Putting Glicko2 and Deck Gamma Together

Let’s tie win probabilities and deck quality together to illustrate how they work.

We’ll make the following assumptions:

  • Player A uses Deck 9
  • Player B uses Deck 7
  • Both players have Glicko2 Ratings of 1500
  • Both players have Deviations (σ) of 350
  • Both players have Volatility (τ) of 0.6

Using our Deck Matchup Probabilities table, we can see that Player A’s Deck 9 has a 59.64% probability of beating Player B’s Deck 7, as shown below:

 

Looking up a win probability of ~0.5964 on our Γ output table from R, we see the following:

For the matchup of these two decks (9 vs. 7), our Γ = 113.5.

We can now use this in the predict function in R, setting “gamma” equal to 113.5.

predict(samplefinals, newdata=data.frame(Period=2, Player="A", Opponent="B"), tng=1, gamma=113.5)

The output is:

[1] 0.5964414

This is what we expect, because both players have exactly the same skill, the same deviation, and the same volatility.

The only variable that differs is the deck in use by each player (Deck 9 vs. Deck 7). Since Deck 9 has a ~59.64% win probability against Deck 7, it makes perfect sense that, given this matchup, the probability for Player A to beat Player B is ~59.64%. Everything else about the two competitors is the same.

We can carry out this same process for any two competitors using any two decks by doing the following:

  1. Find the Ratings, deviation (σ), and volatility (τ) for two given players.
  2. Find the Decks to be used by each player and consult the Deck Matchup Probability Chart for the decks’ win probabilities.
  3. Use the decks’ win probabilities to consult the Gamma (Γ) Chart and find the correct Γ to apply to the match.
  4. Set the predict function with the players’ skill details and correct Γ to find the win probability.

This is a somewhat manual process, which could be automated with software.
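
Here’s a sketch of what that automation might look like in R, reusing the objects from earlier in this post (samplefinals and the gammacurve2 data frame of Gamma and Win_Prob values); the function name is mine, and deck_prob is the matchup probability from step 2:

# Sketch: steps 2 through 4 wrapped into one function
predict_with_decks <- function(ratings, player, opponent, deck_prob) {
  # Step 3: find the gamma whose equal-players win probability is
  # closest to the deck-vs-deck probability
  g <- gammacurve2$Gamma[which.min(abs(gammacurve2$Win_Prob - deck_prob))]
  # Step 4: predict with the players' skill details and that gamma
  predict(ratings,
          newdata=data.frame(Period=2, Player=player, Opponent=opponent),
          tng=1, gamma=g)
}

# Deck 9 vs. Deck 7 between our two equally rated players
predict_with_decks(samplefinals, "A", "B", deck_prob=0.5964)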

But this is another important step in our proof-of-concept.

Next, we’ll add some fine-tuning to our basic model, putting its various parts together into a cohesive whole.

1.4 Building a Competitor Model: Glicko Scores

In the last post, we reviewed setting win-loss odds and betting prices for trading card games. In that last segment, we were given the players’ win probabilities.

In this post, we’ll take an important step (but not the last one) in generating those player win probabilities.

We’ll use a ratings system, called Glicko/Glicko2, to rate each player in a game. These ratings give a value to each player’s skill, and can be used to compare one against another to generate win/loss probabilities between them.

Glicko/Glicko2 Ratings System

Competitive games like chess have long used the Elo rating system, developed by Arpad Elo, to rate the skill level of players.

The Elo system assigns a numerical value to each player at the start of their careers. For Elo, new players start off with a rating of 1000. Each time a player wins or loses against another player, the winning player’s Elo rating increases while the losing player’s Elo rating decreases. This allows changes over time in a player’s skill level to be tracked.

The scale of the Elo rating system is logarithmic. A player rated 1600 isn’t 60% better than a player rated 1000; under the Elo formula, each 400-point gap multiplies the odds of winning by ten, so a 600-point gap implies odds of roughly 32:1. This logarithmic scale also has the advantage that when a weaker player beats a stronger player, the weaker player’s rating rises faster than it would after beating a player of similar skill, while when a stronger player beats a weaker player, the stronger player’s rating rises more slowly than it would after beating a player of similar skill.
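
For reference, the standard Elo expected score formula for Player A against Player B is:

[math] E_A = \frac{1}{1+10^{(R_B-R_A)/400}} [/math]

With a 600-point rating gap, E_A ≈ 0.969, or odds of about 31.6:1 in the stronger player’s favor.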

Mark Glickman, Fellow of the American Statistical Association and Senior Lecturer on Statistics at the Harvard University Department of Statistics (among other accolades), invented a new spin on the traditional Elo system called Glicko (and later, Glicko2).

These systems make several improvements on the Elo methodology that are key for trading card games:

  1. Glicko/Glicko2 incorporates a rating “deviation” (σ). Player deviations begin at 350 and change based on how often a player plays. This helps us measure our certainty about how accurate a player’s rating is, and it allows for a sort of “rust” factor, whereby players who don’t play for a long time have their deviations drift back toward that of a new player (the σ=350 level).
  2. With the deviations in mind, ratings are best read as credible intervals. For instance, a player with a rating of 1750 and σ=150 has a credible rating of ~1600 to 1900. The more the player plays, the more the deviation shrinks and the narrower the credible interval becomes (and thus, the greater our certainty about that rating).
  3. The deviations are not just window dressing: they are an integral part of predicting the win/loss probabilities between two competitors. A player with a lower σ is predicted to win more reliably than one with a higher σ, all else being equal.
  4. Glicko uses a constant variable (“c” or the “c-value”) which limits how much a player’s rating can change due to the outcome of a single match. This prevents wild fluctuations due to upset wins or losses, and it also sets how quickly (or slowly) an inactive player’s deviation drifts back to its initial level (350).
  5. Glicko2 takes a “volatility” variable (τ) into account. This variable rates how much luck is a factor in the game at hand and helps regulate probabilities between players with different levels of volatility. Similar to σ, a competitor with a higher τ but similar skill will be predicted to perform worse against a player with a lower τ, all else being equal. Luck is accounted for.

For these reasons, we’ll use the Glicko2 system for rating competitors.

You can find an excellent paper on the Glicko System by Mark Glickman here, as well as his follow-up for Glicko2, here.

To apply the Glicko2 system, we’ll generate some synthetic (made up) player match data and use the PlayerRatings package in R, which incorporates the Glicko2 system in its functions.

Synthetic Tournament Data

A .csv file was set up to represent a tournament season.

The file has four fields:

  1. Period (numbered 1 through 12)
  2. Player (numbered 1 through 500).
  3. Opponent (numbered 1 through 500).
  4. Result (0, 0.5, or 1).

Each “Period” represents a single month during the tournament season (Period 1 = January, Period 2 = February, etc.). We assume that the tournament season runs from January to December of a single year.

Players and Opponents are numbered from 1 to 500. Each of these is a unique competitor. For each matchup in the following steps, the Player/Opponent represents the two competitors involved in each match.

The Result records a loss (0), a draw (0.5), or a win (1). These Results apply to the Player vs. the given Opponent. In the first match in the file, we see the following:

This means that in Period 1, Player 121 played Player 444 and Player 121 lost (thus, Player 444 won).

Each Period records 1000 matches between randomly selected Players and Opponents with random Results.

Thus, the .csv file records 12,000 randomly generated matches. All of this was done with Microsoft Excel.

The file can be found here.
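
(The file was built in Excel, but for anyone following along in R, an equivalent sketch is below; the seed and output file name are arbitrary, and a real generator would also prevent a player from being drawn against themselves.)

# Sketch: 12 periods x 1,000 random matches among 500 players
set.seed(42)
n <- 12000
synthetic <- data.frame(
  Period   = rep(1:12, each = 1000),
  Player   = sample(1:500, n, replace = TRUE),
  Opponent = sample(1:500, n, replace = TRUE),
  Result   = sample(c(0, 0.5, 1), n, replace = TRUE)
)
write.csv(synthetic, "my_synthetic_season_match_data.csv", row.names = FALSE)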

Using R to Generate Glicko Ratings

Using the R programming language via R Studio, the following steps were performed:

  1. Load the PlayerRatings package.
  2. Load tournament data .csv.
  3. Set the tournament data as an R data frame properly reporting Period, Player, Opponent, and Result.
  4. Set starting player Ratings, Deviation, and Volatility.
  5. Run the Glicko2 algorithm for all players with the tournament data.
  6. Convert the output from the algorithm to a new data frame with each player’s final rating, deviation, volatility, and game history.
  7. Export the output to a new .csv file.

For this simulation, we’ve given all the players the following attributes:

  • Initial rating of 1500
  • Initial deviation of 350
  • Initial volatility of 0.6

The rating of 1500 and deviation of 350 are recommended by Mark Glickman for initial ratings. He suggests an initial volatility of between 0.3 (for low-randomness games, like chess) and as much as 1.2 (for high-randomness games). We’ve chosen 0.6 for a trading card game, due to its higher randomness (i.e., the order in which cards are drawn from decks).

For the ratings algorithm, we’ve chosen a constant (“cval”) of 60. This means that a player who stops playing would see his or her deviation drift from its current level back to the initial 350 over roughly three years of inactivity.

The volatility and cval should be evaluated for any given trading card game and will be the subject of future studies to determine the appropriate levels for games like Pokémon, Magic: the Gathering, and Yu-Gi-Oh, separately. For now, we’ve settled on these values for this demonstration.

You can see the R code, below.

# Step 1: Load PlayerRatings Package
library(PlayerRatings)

# Step 2: Load tournament data
tournamentdata <- read.csv("synthetic_season_1_match_data.csv", 
header=TRUE, sep=",")

# Step 3: Convert tournament data to data frame
season1matches <- data.frame(Period=tournamentdata$Period, 
Player=tournamentdata$Player, 
Opponent=tournamentdata$Opponent,
Result=tournamentdata$Result)

# Step 4: Set starting ratings, deviation, and volatility for all 
# players
startratings <- data.frame(Player=seq(1, 500, 1), Rating=rep(1500,500), 
Deviation=rep(350,500), Volatility=rep(0.60,500))

# Step 5: Run Glicko2 algorithm for all players with data
season1ratings <- glicko2(season1matches, status=startratings, cval=60, tau=0.60)

# Step 6: Set results of the algorithm as a new data frame with 
# reported rating, deviation, volatility, and game history
season1finals <- data.frame(Player=season1ratings$ratings$Player, 
Rating=season1ratings$ratings$Rating, 
Deviation=season1ratings$ratings$Deviation,
Volatility=season1ratings$ratings$Volatility,
Games=season1ratings$ratings$Games, 
Win=season1ratings$ratings$Win,
Loss=season1ratings$ratings$Loss,
Draw=season1ratings$ratings$Draw,
Lag=season1ratings$ratings$Lag)

# Step 7: Export results to a new .csv file
write.csv(season1finals, "season_one_final_ratings.csv")

You can replicate the steps shown above in R.

Final Ratings

Running the steps shown above, we get a neatly formatted .csv file that reports player data at the end of the season (e.g., at the end of the 12,000 games played over 12 months).

Looking at the first few entries in this output, we see the following:

 

We find that Player 42 came out on top with a Rating of ~1827, a Deviation of ~183, and a Volatility of ~0.62.

We can draw the following conclusions about Player 42:

  1. A Rating of ~1827 implies winning odds of roughly 6.6:1 for Player 42 against an average player (Rating 1500), ceteris paribus (given the same σ and τ for both players).
  2. Player 42’s σ has fallen from 350 to ~183. This is expected: the more games a player plays, the lower σ becomes, as the credible interval around the player’s skill closes in on the reported Rating. (Note that many other players have even lower σ, because they have played more regularly throughout the season.)
  3. Player 42’s τ (~0.62) is about unchanged and matches that of the system as a whole. This is expected for a player with a top placement, whose wins, losses, and draws have been due less to luck and more to skill over the course of the season.

Given the structure of the Glicko/Glicko2 system, we can confidently say that Player 42’s true skill level is somewhere between ~1644 and ~2009. Given the player’s high volatility, we should err toward saying that the player’s real skill is closer to the lower bound.

The completed output file can be found here.

Generating Win/Loss Probabilities

With these data, we can generate win/loss probabilities for future matchups, which is what we need for our TCG Sportsbook project.

Let’s pit the top two players against one another:

This can be done easily in R with the predict function.

# Predict the probability of Player 42 winning against Player 67
# (Period 13 is the first period after the season's final games)
predict(season1ratings, newdata=data.frame(Period=13, Player=42, Opponent=67), tng=1)

The output we receive is:

[1] 0.6926854

This means that Player 42 has a win probability of ~0.69 against Player 67 in a hypothetical match between them.

Next Steps: Deck Factors

We’ve demonstrated an ability to give players ratings based on their skill from tournament data.

The next issue we’ll have to address is that of deck strategy.

Glicko/Glicko2 (and their forebear, Elo) were made to gauge skill in low-randomness games like chess. In games like these, both players come to the table with identical game pieces. Both sides have the same number of pawns, knights, rooks, bishops, etc. Both sides have these pieces in the same starting positions.

Trading card games have a higher level of randomness due to the cards in use (which, in part, we addressed by setting the initial Volatility at 0.6 for the algorithm). Each competitor could have a very different deck of cards, or maybe even the same theme of deck, but with a different card list.

All decks don’t perform equally well against one another in every match up. Some decks are simply superior to others, or at least, have very lopsided matchups (where Deck A is three times as likely to win against Deck B, for example), ceteris paribus.

The predict function in R gives us the ability to take such factors into account via a gamma variable (Γ). We’ll use this in the next phase of the project. Γ will be the stand-in for the decks in use by either player and will allow us to account for how well those decks match up against one another.

1.3 Win/Loss Odds & Betting Prices in Trading Card Games

Given our conditional probabilities of unique Competitor + Strategy (Cn+Sn) combinations from the last post, in this post we’ll look at pricing these probabilities as bets.

We are, after all, trying to model a sportsbook that would take bets from bettors on trading card game events.

Let’s now look at how we’d make money doing this.

Differences Between Odds and Probabilities

Probabilities, as discussed previously, are values we give to uncertain outcomes.

If we say that Player A has a “probability of 0.492 to beat Player B”, we mean that we lend 49.2% of our belief to the outcome that Player A will, in fact, beat Player B. This leaves 50.8% of our belief outside of this outcome, meaning that this amount of our belief is placed with Player B winning, the two players drawing, or something else (a player is disqualified? the event is shut down due to an emergency?).

“Odds” and “probability” are often used interchangeably in everyday speech. But for our purposes, they mean similar, but different, things.

To say that Player A has 4:5 odds to beat Player B means that we, in fact, assign a 44.4% probability that Player A beats Player B.

This is because, as we saw with conditional probabilities and the concept of “limiting the probability space”, the total space of the 4:5 odds is 9.

In other words, when we say that Player A has 4:5 odds to beat Player B, we are saying that in a hypothetical 9 matches between these two players, we expect Player A to win 4 matches and Player B to win 5 matches.

That’s where the 4:5 comes from.

A little simple algebra does the trick:

[math] 4:5 \:odds = \frac{4}{4+5} = 0.4444 [/math]

If this looks like the probability formula we used in Excel to find the conditional probabilities of one Competitor + Strategy combination against another… it is!

We’re going to see this kind of conditional probability formulation show up quite a bit throughout this project.

Converting Probability to Odds

To convert a probability to odds, we use this simple formula:

[math]Odds = \frac{Probability}{1-Probability}[/math]

For example, given our above example, if Player A was to have a 0.492 probability to win against Player B, we convert this probability to odds like so:

[math]Odds \: (A|B) = \frac{.492}{1-0.492}=\frac{.492}{.508}=\frac{123}{127}[/math]

The resulting fraction, [math]\frac{123}{127}[/math] is pretty ugly, so we can “normalize” it by setting the denominator (127) equal to 1:

[math]\frac{123}{127}=\frac{x}{1}≈\frac{0.9685}{1}[/math]

The odds that Player A beats Player B are 0.9685:1, 9.685:10, 96.85:100, 968.5:1000, etc.

This means that given, say, 1,969 games between Players A and B, and given the present state of our knowledge, belief, and the circumstances of them playing against one another, we believe that Player A would win about 969 of those games.

Converting Odds to Probability

To convert odds to probability, we perform the opposite operation:

[math]Probability(A|B)=\frac{Odds(A)}{Odds(A)+Odds(B)}[/math]

To reconvert our above odds of 0.9685:1, we plug these values into our formula and find:

[math]Probability(A|B)=\frac{0.9685}{0.9685+1}≈0.492[/math]

Given the calculations we’ve done, we’ll have to settle for very approximate equivalency (≈), or the number of decimal places we carry these calculations out will become unmanageable! (This is much less of an issue when using Excel or R to do the calculations, as we’ll see soon enough.)
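
In R, both conversions are one-liners, which sidesteps the rounding headache entirely (the function names here are mine):

# Probability -> odds (returned in "x : 1" form)
prob_to_odds <- function(p) p / (1 - p)

# Odds a:b -> probability
odds_to_prob <- function(a, b = 1) a / (a + b)

prob_to_odds(0.492)   # ~0.9685, i.e., odds of 0.9685:1
odds_to_prob(4, 5)    # 4:5 odds -> ~0.4444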

The “Vig” (or, Sportsbook Odds are Not Fair Odds)

Sportsbooks are in business. This means that they are out to make a profit from the service (betting prices) that they offer to their customers.

Bookmakers bake their profits into the betting prices they offer.

As the saying goes, “you can’t beat the house.” (I live in Las Vegas, a town built on this simple, but largely disregarded, truth.)

The vig or vigorish is the marking up (or “over rounding”) of betting prices. Bookmakers intentionally set the probabilities on either side of a bet higher than the real probability when setting these prices. The result is that the prices, when combined, sum to more than a probability of 1.

As we discussed previously, it’s not possible for probabilities to sum to more than 1. The entire concept of probability is that 1 means absolute certainty the outcome will happen and 0 means absolute certainty the outcome will not happen.

Sportsbooks ignore this rule, and that is how we intend to make money from the bets bettors place. Regardless of which individual bettors win or lose any given bet, as the house, we’re out to always win in the long run.

When we set our betting prices, we’re going to explore two methods to “over round” the probabilities of a given match.

Setting Bet Prices

Let’s start by returning to a previous example: we pit two player and deck combinations—C3+S5 and C7+S3—against one another.

We previously determined that C3+S5 had a 0.733 probability of beating C7+S3, which means that C7+S3 has a 0.267 probability of beating C3+S5.

Competitor | Win Probability
C3+S5 | 0.733
C7+S3 | 0.267
TOTAL | 1.000

So far, so good.

As the bookmaker, we’re going to over round these to bake in our expected profit.

Proportional Vig

The easiest thing to do is to increase both sides of the contest by a proportional amount.

Let’s say we increase both sides by 10%. We just multiply each probability by 1.1, like so:

Competitor | Win Probability
C3+S5 | 0.8063
C7+S3 | 0.2937
TOTAL | 1.1000

Easy. Now our betting prices are overpriced on both sides, equally.

But there’s another way.

Disproportional Vig

If we want one side to be priced higher than the other (perhaps we have too much bet liability on one side and we want to make the other more attractive), we can apply the over round in a disproportional way.

Let’s say that C7+S3 is a popular underdog, and lots of bettors are placing money on that side. If C7+S3 wins, we could get wiped out as the sportsbook on this one game, so we hedge by weighting the 10% vig 80% onto the side of C3+S5 and 20% onto the side of C7+S3, like so:

Competitor | Win Probability
C3+S5 | 0.813
C7+S3 | 0.287
TOTAL | 1.100

As we’ll see later, this can drastically affect our profitability, depending on how the game concludes.

From Vig to Prices

Once we’ve set up our vig (either proportional or disproportional), we can turn these new (unfair) probabilities into prices.

For these examples, and the examples used throughout the entire project, we’ll be using American moneyline odds, where a positive (+) price means that the quoted side is an underdog and a negative price (-) means that the quoted side is a favorite.

In the American system, a +price means that you will win this amount for every $100 bet, while a -price means you must bet this amount to win $100 (more about that in a bit).

There are two equations we’ll use to set bet prices for two outcomes.

P(C7+S3|C3+S5)

Since C7+S3 is our underdog, we’ll calculate the probability of this player + deck combination winning, first.

The moneyline for this will be positive (+), so we use the following formula:

[math]x=\frac{100}{Probability}-100[/math]

“X”, in this case, is the price we’ll quote to bettors.

Plugging in our probability that C7+S3 win, we get for the proportional vig:

[math]\frac{100}{0.2937}-100 ≈+240[/math]

Or for the disproportional vig:

[math]\frac{100}{0.287}-100 ≈+249[/math]

This means that, given a $100 bet for C7+S3 to win, a player would win either $340 total (including the $100 stake) if the sportsbook uses the proportional vig, or $349 total (including the $100 stake) if the sportsbook uses the disproportional vig.

P(C3+S5|C7+S3)

Now we turn to pricing bets for the favorite, C3+S5, who will have negative (-) moneyline prices.

For a favorite with a negative (-) moneyline price, we use the following formula:

[math]x=-\frac{100*Probability}{1-Probability}[/math]

“X”, again, is our desired bet price.

Plugging in our probability variable for a proportional vig, we find:

[math]-\frac{100*0.8063}{1-0.8063}≈-416[/math]

For the disproportional vig, we get:

[math]-\frac{100*0.813}{1-0.813}≈-435[/math]

This means that a bettor must stake $416 to win $100 under the proportional vig, or $435 to win $100 under the disproportional vig.
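
Both pricing formulas can be wrapped into a single helper in R (a sketch; the function name is mine):

# American moneyline price from an (over rounded) win probability
moneyline <- function(p) {
  if (p < 0.5) round(100 / p - 100)   # underdog: positive price
  else round(-100 * p / (1 - p))      # favorite: negative price
}

moneyline(0.2937)  # +240, the underdog under the proportional vig
moneyline(0.8063)  # -416, the favorite under the proportional vig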

We can see the effects on both sides between the proportional and disproportional vig variations.

Long Run Example of Profitability

Let’s put all of this together and find out how much money we expect to make.

Let’s imagine four different outcomes:

  1. C3+S5 (the favorite) beats C7+S3 (underdog) with a proportional vig.
  2. C7+S3 (underdog) beats C3+S5 (the favorite) with a proportional vig.
  3. C3+S5 (the favorite) beats C7+S3 (underdog) with a disproportional vig.
  4. C7+S3 (underdog) beats C3+S5 (the favorite) with a disproportional vig.

To keep things simple, we’ll imagine that in each case, 100 bettors each place a moneyline bet on each side of the match.

This means that in every case we’ll examine below, there will be 100 bets on each side, regardless of price.

C3+S5 Wins, Proportional Vig

C7+S3 Wins, Proportional Vig

C3+S5 Wins, Disproportional Vig

C7+S3 Wins, Disproportional Vig

Summary of Profitability

With these simple examples, we can see that the bookmaker’s unfair pricing has given it quite a cushion.

If C3+S5 wins in either case, the sportsbook loses no money, as shown above.

If, however, the underdog C7+S3 wins in either case, the sportsbook makes money.

The moneyline is priced in such a way that a bettor on the winning underdog wins far less than what the bettors on the favorite paid for their “sure win”. And the sportsbook is indifferent to a bettor winning on the favorite because it’s just a wash.

Given 1000 games with the outcomes above, we’d expect the following:

For a proportional vig:

For a disproportional vig:

Notice that in every case, our disproportional vig makes us (a little) more money.

While the pricing mechanics of real events are much more complex, this simple illustration tells us that we have the right stuff, in theory, to make money on trading card game bets.

Project 2: Trading Card Game Sportsbook Financial Calculator

Websites abound with information about how to translate moneyline prices into the bookmaker’s implied probabilities.

These webpages also include some discussion about how moneyline prices are read and what they mean. A few also have discussions about the concept of a vig or vigorish (the “over rounding” that a bookmaker does to the probabilities to bake in a profit for itself).

All of these websites speak to bettors.

In keeping with the project to theoretically model a sportsbook that takes bets on trading card game events, we want to model such variables as the house.

If we’re the sportsbook, we want to know the probable outcomes of different over round percentages, splits in the over round, competitor win probabilities, total money wagered on either side, and what our expected financial outcomes should be.

In this project, we accomplish all of these things with an Excel Calculator.

The Excel TCG Sportsbook Financial Calculator

If you’d prefer to see the Excel calculator first and skip (or save for later) the discussion on how it works and why, you can find it below:

Excel TCG Sportsbook Financial Calculator

Breaking Down the Tunable Parameters

The Excel Calculator gives us five parameters that can be changed. These are found under the Assumptions heading.

Let’s take a look at each of them.

Vig

The vigorish, or “over round”, is the markup the house puts on the probabilities it uses to quote the bettors its bet prices.

This can be viewed as the long term profit margin the house expects on the outcomes of events with similar probabilities.

For most popular sporting events, the over round hovers a bit below 5%.

The Excel Calculator allows a vig of between 0 and 25%.

(We should expect that if trading card game bet prices were set, the vig would be on the higher side, as they are more thinly traded.)

Matches Played

Here, the user can set the number of matches played between competitors.

This is a simplification, for illustrative purposes, because the probabilities are identical for each match played (whereas, in real life, we’d wish to go further into developing a Bayesian predictive model to account for changes in win probabilities for either side given a series of matches).

A more complex model deserves its own project, which will come soon enough.

These are tunable between 1 and 1,000,000 matches.

Vig Split

This allows the user to split the vig between the two competitors.

Oftentimes, the bookmaker will not apply the vig equally to both sides, so as to help limit liability on one side of the event. Placing more of the over round weight on one side (one competitor) can make that side seem less attractive than the other, enticing bettors to place their money elsewhere.

Only the vig split on Player A is tunable. The vig split on Player B is automatically updated based on the input for Player A.

The Vig split for Player A can be between 1 and 100% (with Player B having the remainder).

Win Probability

The Calculator assumes that we know the win probabilities for either player.

How to arrive at these win probabilities in trading card games, at least, is the subject of another project. Here, we assume that we know them.

Only the win probability for Player A is tunable. The win probability for Player B is automatically updated based on the input for Player A.

The win probability for Player A can be between 0 and 1, inclusive (with Player B having the remainder).

Total Money Wagered

The parameters for total money wagered for either side of the matches played can be set to any amount between $1 and $1,000,000 in even dollar increments.

As noted in the Calculator, the total money wagered on either side is per match played. 

Reading Financial Outcomes

After the five tunable parameters have been set, we can see the financial outcomes of the selected series of matches.

Let’s assume we set our assumptions as follows:

  • Vig: 10%
  • Matches Played: 50
  • Vig Split: 70%/30%
  • Win Probability: 0.674/0.326
  • Total Money Wagered: $24,500/$47,850

Fair Probability & Over Round Probability

The fair probabilities are carried over from the Assumptions we placed in the Calculator.

Note that these probabilities will always sum to 1. These are fair probabilities, because they reflect our true beliefs about the winner of the matches.

The over round probabilities apply the vig and the vig split to each probability.

Since we placed the vig at 10% and weighted 70% of that vig on Player A and 30% on Player B, the Calculator applies those figures to each side accordingly.

Note that the probabilities sum to 1.1, meaning the fair probability sum of 1, plus the vig of 10%.

Moneyline

The Calculator gives us the moneyline for the players.

As we discussed in the post about setting odds and betting prices for trading card games, these prices use the American moneyline system for bet prices.

We see that Player A has a price of -291 (meaning that a bettor must bet $291 to win $100), while Player B has a price of +181 (meaning that betting $100 will win the player $181; plus the staked $100, in both cases).

Player A, as we should expect from our probabilities, is the favorite (with a negative quoted price), and Player B is our underdog (with a positive quoted price).
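
As a quick check, these prices follow from the formulas in the earlier post on win/loss odds and betting prices, applied to the over round probabilities (0.674 + 0.07 = 0.744 for Player A and 0.326 + 0.03 = 0.356 for Player B):

[math] -\frac{100 \times 0.744}{1-0.744} \approx -291 \qquad \frac{100}{0.356}-100 \approx +181 [/math]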

Money Wagered & Bettors to Win

The Calculator gives us the total money wagered on both sides for all events (remember, we put $24,500 on Player A and $47,850 on Player B on each of 50 matches).

It also gives us the bet liability for each side, or what bettors stand to win if they’re right.

The Bettors to Win calculation takes the total money wagered for each side and applies the moneyline price for each side to arrive at the liability figures.

Wins

The wins for each side simply apply the fair probability for each player as a proportion of the total number of matches we input.

Since we selected 50 matches, given probabilities of 0.674 for Player A and 0.326 for Player B, we expect Player A to win 34 matches and Player B to win 16 matches.

Financial Outcomes

Finally, given all of our assumptions, we have the expected financial outcomes for our venture.

With 50 matches, assuming our probabilities are correct, we expect to pay out the bettors on Player A a total of $1,109,379 and the winners on Player B a total of $2,191,674.

Our total handle, or the total money wagered by bettors, came to $3,617,500, on which we paid out $3,301,052.

That leaves us, the sportsbook, with a gross gaming revenue (GGR) of $316,448 for a profit margin of 8.7%.

Not too shabby!

Conclusions

Our Calculator allows us to model some basic assumptions about the financial viability of our sportsbook.

We can tune a number of parameters about each competitor and how we choose to price the bets we offer to bettors. We can experiment with how much money we’d need on either side to maintain profitability given these assumptions.

We’ve seen that, if it all works out more-or-less according to plan, the bookmaking business is good to us.

Please let me know if you have any questions or comments in the comments section below!

————

You can find a link to the completed Excel TCG Sportsbook Financial Calculator below:

Excel TCG Sportsbook Financial Calculator