1.6 Fine-Tuning the TCG Sportsbook Model

Before we move on to simulating a TCG world tournament, there are a few issues with our working model that need to be addressed.

These issues are:

  1. How to account for new decks that enter the competitive format for which we have no previous data.
  2. How the sportsbook will manage the liabilities it owes to winning bettors on either side of a match.
  3. How to predict the winner of the entire tournament from the outset and how bet prices are placed on this outcome.

These three issues will become more central in the next post in this series. For now, let’s grapple with each in turn.

“Rogue” Decks

In the previous post, we determined how much a competitor’s selection of deck contributes to his or her win probability. This parameter, called “gamma” (represented by the Greek letter of the same name: Γ), works together with our Glicko2 algorithm to account for the deck matchup.

We noted that over time, the metagame (or the competitive environment) evolves. New decks emerge to beat existing decks, making older decks obsolete as time goes on. This is a natural process in trading card games and a big part of their draw and fun.

But what gamma do we assign to a brand new deck that’s never been seen before? If the current metagame landscape supports 15 decks, and a 16th deck enters the field, how will that deck compare to the other 15 (and vice versa)?

Consider the following:

The table above shows which decks existed in the format during each period by marking those that were used in each period with a “Yes” (and those that weren’t used in a period with a “No”). For example, we can see that in Period 1, only Decks 1, 2, 3, and 4 existed. In Period 2, Deck 5 entered the format. In Period 3, Deck 6 entered the format, but Deck 1 left, etc.

The World Tournament will occur in a future period, Period 13. What if a new deck, Deck 16, which no one has ever seen before in a tournament setting and for which we have no data, enters the format?

Certainly, given time, we’ll gather that data. But as the sportsbook, we have to be ready to give bet prices (and rate the underlying probabilities) before this data is available.

How do we rate these newcomers, or “rogue” decks?

Rating New Decks

To the synthetic tournament data from the regular season, we’ve added new columns to indicate whether a deck used by a competitor is “Old” (meaning existing prior to that period) or “New” (meaning that it debuted during that period).

For consistency, all decks (Decks 1 through 4) that existed in Period 1 were rated as “Old”, since these decks would have been presumed to exist before the start of the season.

Those added details look like the following:

The revised .csv file can be seen here.

Using this additional detail, we can see how well each of the 15 unique decks used during the tournament season fared against any new deck. This is summarized in the table below, where a new row and column titled “New” have been added.

Deck | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | New
1 | 0.5000 | 0.5618 | 0.5434 | 0.5050 | 0.6029 | - | - | - | - | - | - | - | - | - | - | 0.6029
2 | 0.4382 | 0.5000 | 0.5047 | 0.4969 | 0.5288 | 0.5271 | 0.4783 | - | - | - | - | - | - | - | - | 0.5088
3 | 0.4566 | 0.4953 | 0.5000 | 0.5326 | 0.4907 | 0.4961 | 0.5216 | 0.5245 | 0.5357 | - | - | - | - | - | - | 0.5121
4 | 0.4950 | 0.5031 | 0.4674 | 0.5000 | 0.4736 | 0.5053 | 0.4793 | 0.4806 | 0.4330 | 0.5063 | 0.4286 | - | - | - | - | 0.4268
5 | 0.3971 | 0.4712 | 0.5093 | 0.5264 | 0.5000 | 0.5111 | 0.4547 | 0.4832 | 0.5253 | 0.4760 | 0.5595 | 0.4821 | 0.6136 | - | - | 0.5227
6 | - | 0.4729 | 0.5039 | 0.4947 | 0.4889 | 0.5000 | 0.5095 | 0.5241 | 0.5433 | 0.5085 | 0.5224 | 0.5045 | 0.5137 | 0.4286 | 0.7105 | 0.5368
7 | - | 0.5217 | 0.4784 | 0.5207 | 0.5453 | 0.4905 | 0.5000 | 0.5116 | 0.4036 | 0.5452 | 0.5891 | 0.4494 | 0.4519 | 0.5306 | 0.4773 | 0.5392
8 | - | - | 0.4755 | 0.5194 | 0.5168 | 0.4759 | 0.4884 | 0.5000 | 0.5503 | 0.5444 | 0.4795 | 0.5804 | 0.5667 | 0.3684 | 0.6389 | 0.5273
9 | - | - | 0.4643 | 0.5670 | 0.4747 | 0.4567 | 0.5964 | 0.4497 | 0.5000 | 0.5402 | 0.5039 | 0.4491 | 0.4865 | 0.5439 | 0.7292 | 0.5364
10 | - | - | - | 0.4938 | 0.5240 | 0.4915 | 0.4548 | 0.4556 | 0.4598 | 0.5000 | 0.5000 | 0.4592 | 0.5565 | 0.5000 | 0.4412 | 0.4970
11 | - | - | - | 0.5714 | 0.4405 | 0.4776 | 0.4109 | 0.5205 | 0.4961 | 0.5000 | 0.5000 | 0.4745 | 0.4722 | 0.3909 | 0.6250 | 0.4964
12 | - | - | - | - | 0.5179 | 0.4955 | 0.5506 | 0.4196 | 0.5509 | 0.5408 | 0.5255 | 0.5000 | 0.5083 | 0.6538 | 0.4412 | 0.5421
13 | - | - | - | - | 0.3864 | 0.4863 | 0.5481 | 0.4333 | 0.5135 | 0.4435 | 0.5278 | 0.4917 | 0.5000 | 0.4649 | 0.3235 | 0.4667
14 | - | - | - | - | - | 0.5714 | 0.4694 | 0.6316 | 0.4561 | 0.5000 | 0.6091 | 0.3462 | 0.5351 | 0.5000 | 0.5833 | 0.5341
15 | - | - | - | - | - | 0.2895 | 0.5227 | 0.3611 | 0.2708 | 0.5588 | 0.3750 | 0.5588 | 0.6765 | 0.4167 | 0.5000 | 0.5000
New | 0.3971 | 0.4912 | 0.4879 | 0.5732 | 0.4773 | 0.4632 | 0.4608 | 0.4727 | 0.4636 | 0.5030 | 0.5036 | 0.4579 | 0.5333 | 0.4659 | 0.5000 | 0.5000

This seems, however, an unsatisfactory solution. Yes, we know how each deck fared against a new entrant, but are all new entrants alike?

Does each new deck have the same potential against each existing deck? Intuition tells us that this isn’t correct: some deck types or strategies fare better against others because of their qualities. For instance, an “Aggro” deck may do very well against a “Mid Range” deck, but fare poorly against a “Control” deck.

We need to consider these “deck styles”.
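Incidentally, a deck-vs-deck win-rate table like the one above can be computed directly from the match data rather than by hand. Here is a small sketch using pandas; the column names and sample rows are hypothetical stand-ins, not the actual contents of the revised .csv:

```python
import pandas as pd

# Hypothetical match rows mirroring the season file (assumed column names):
# one row per match, with each side's deck and the result from the
# Player's perspective (1 = win, 0 = loss, 0.5 = draw).
matches = pd.DataFrame({
    "PlayerDeck":   [1, 1, 2, 3, 2, 1],
    "OpponentDeck": [2, 3, 3, 1, 1, 2],
    "Result":       [1, 0.5, 0, 1, 0, 1],
})

# Mirror each match so both perspectives are counted, then average the
# results to get each deck's win rate versus each other deck.
mirrored = pd.concat([
    matches,
    matches.rename(columns={"PlayerDeck": "OpponentDeck",
                            "OpponentDeck": "PlayerDeck"})
           .assign(Result=lambda d: 1 - d["Result"]),
])
winrates = mirrored.pivot_table(index="PlayerDeck", columns="OpponentDeck",
                                values="Result", aggfunc="mean")
print(winrates)
```

The same pivot, run over the full 12,000-match file with a "New" indicator column, would produce the cross-tab shown above.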

Rating Deck Styles

Our season tournament data contains fifteen unique decks, numbered 1 through 15.

To these we added one of five deck styles: Aggro, Combo, Control, Mid Range, and Mill. These were assigned randomly.

The outcome of this assignment is as follows:

Deck Style
1 Control
2 Combo
3 Aggro
4 Combo
5 Mid Range
6 Mid Range
7 Control
8 Combo
9 Control
10 Control
11 Mill
12 Mill
13 Combo
14 Mid Range
15 Aggro

To the same season tournament data, we add the style detail to each deck for each matchup.

Those details look like the following:

The revised .csv file can be seen here.

Using this added detail, we can now see how well each deck style does against each deck style and vice versa. We’ve also kept the detail for matchups against “new” decks.

The data are summarized in the table below:

Deck | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | New | Aggro | Combo | Control | Mid Range | Mill
1 | 0.5000 | 0.5618 | 0.5434 | 0.5050 | 0.6029 | - | - | - | - | - | - | - | - | - | - | 0.6029 | 0.5434 | 0.5324 | 0.5000 | 0.6029 | -
2 | 0.4382 | 0.5000 | 0.5047 | 0.4969 | 0.5288 | 0.5271 | 0.4783 | - | - | - | - | - | - | - | - | 0.5088 | 0.5047 | 0.4985 | 0.4490 | 0.5282 | -
3 | 0.4566 | 0.4953 | 0.5000 | 0.5326 | 0.4907 | 0.4961 | 0.5216 | 0.5245 | 0.5357 | - | - | - | - | - | - | 0.5121 | 0.5000 | 0.5184 | 0.4913 | 0.4931 | -
4 | 0.4950 | 0.5031 | 0.4674 | 0.5000 | 0.4736 | 0.5053 | 0.4793 | 0.4806 | 0.4330 | 0.5063 | 0.4286 | - | - | - | - | 0.4268 | 0.4674 | 0.4981 | 0.4795 | 0.4876 | 0.4286
5 | 0.3971 | 0.4712 | 0.5093 | 0.5264 | 0.5000 | 0.5111 | 0.4547 | 0.4832 | 0.5253 | 0.4760 | 0.5595 | 0.4821 | 0.6136 | - | - | 0.5227 | 0.5093 | 0.5025 | 0.4738 | 0.5052 | 0.5286
6 | - | 0.4729 | 0.5039 | 0.4947 | 0.4889 | 0.5000 | 0.5095 | 0.5241 | 0.5433 | 0.5085 | 0.5224 | 0.5045 | 0.5137 | 0.4286 | 0.7105 | 0.5368 | 0.5183 | 0.5033 | 0.5192 | 0.4911 | 0.5142
7 | - | 0.5217 | 0.4784 | 0.5207 | 0.5453 | 0.4905 | 0.5000 | 0.5116 | 0.4036 | 0.5452 | 0.5891 | 0.4494 | 0.4519 | 0.5306 | 0.4773 | 0.5392 | 0.4783 | 0.5109 | 0.4865 | 0.5171 | 0.5361
8 | - | - | 0.4755 | 0.5194 | 0.5168 | 0.4759 | 0.4884 | 0.5000 | 0.5503 | 0.5444 | 0.4795 | 0.5804 | 0.5667 | 0.3684 | 0.6389 | 0.5273 | 0.5000 | 0.5133 | 0.5235 | 0.4845 | 0.5233
9 | - | - | 0.4643 | 0.5670 | 0.4747 | 0.4567 | 0.5964 | 0.4497 | 0.5000 | 0.5402 | 0.5039 | 0.4491 | 0.4865 | 0.5439 | 0.7292 | 0.5364 | 0.5606 | 0.4909 | 0.5407 | 0.4752 | 0.4788
10 | - | - | - | 0.4938 | 0.5240 | 0.4915 | 0.4548 | 0.4556 | 0.4598 | 0.5000 | 0.5000 | 0.4592 | 0.5565 | 0.5000 | 0.4412 | 0.4970 | 0.4412 | 0.4845 | 0.4730 | 0.5042 | 0.4828
11 | - | - | - | 0.5714 | 0.4405 | 0.4776 | 0.4109 | 0.5205 | 0.4961 | 0.5000 | 0.5000 | 0.4745 | 0.4722 | 0.3909 | 0.6250 | 0.4964 | 0.6250 | 0.5138 | 0.4694 | 0.4487 | 0.4893
12 | - | - | - | - | 0.5179 | 0.4955 | 0.5506 | 0.4196 | 0.5509 | 0.5408 | 0.5255 | 0.5000 | 0.5083 | 0.6538 | 0.4412 | 0.5421 | 0.4412 | 0.4506 | 0.5474 | 0.5386 | 0.5129
13 | - | - | - | - | 0.3864 | 0.4863 | 0.5481 | 0.4333 | 0.5135 | 0.4435 | 0.5278 | 0.4917 | 0.5000 | 0.4649 | 0.3235 | 0.4667 | 0.3235 | 0.4725 | 0.5000 | 0.4638 | 0.5114
14 | - | - | - | - | - | 0.5714 | 0.4694 | 0.6316 | 0.4561 | 0.5000 | 0.6091 | 0.3462 | 0.5351 | 0.5000 | 0.5833 | 0.5341 | 0.5833 | 0.5737 | 0.4745 | 0.5349 | 0.4813
15 | - | - | - | - | - | 0.2895 | 0.5227 | 0.3611 | 0.2708 | 0.5588 | 0.3750 | 0.5588 | 0.6765 | 0.4167 | 0.5000 | 0.5000 | 0.5000 | 0.5143 | 0.4365 | 0.3514 | 0.4595
New | 0.3971 | 0.4912 | 0.4879 | 0.5732 | 0.4773 | 0.4632 | 0.4608 | 0.4727 | 0.4636 | 0.5030 | 0.5036 | 0.4579 | 0.5333 | 0.4659 | 0.5000 | 0.5000 | 0.4887 | 0.5208 | 0.4650 | 0.4703 | 0.4850
Aggro | 0.4566 | 0.4953 | 0.5000 | 0.5326 | 0.4907 | 0.4817 | 0.5217 | 0.5000 | 0.4394 | 0.5588 | 0.3750 | 0.5588 | 0.6765 | 0.4167 | 0.5000 | 0.5113 | 0.5000 | 0.5182 | 0.4838 | 0.4845 | 0.4595
Combo | 0.4676 | 0.5015 | 0.4816 | 0.5019 | 0.4975 | 0.4967 | 0.4891 | 0.4867 | 0.5091 | 0.5155 | 0.4862 | 0.5494 | 0.5275 | 0.4263 | 0.4857 | 0.4792 | 0.4818 | 0.5000 | 0.4938 | 0.4931 | 0.5118
Control | 0.5000 | 0.5510 | 0.5088 | 0.5205 | 0.5262 | 0.4808 | 0.5135 | 0.4765 | 0.4593 | 0.5270 | 0.5306 | 0.4526 | 0.5000 | 0.5255 | 0.5635 | 0.5350 | 0.5162 | 0.5062 | 0.5000 | 0.5052 | 0.4978
Mid Range | 0.3971 | 0.4718 | 0.5069 | 0.5124 | 0.4948 | 0.5089 | 0.4829 | 0.5155 | 0.5248 | 0.4958 | 0.5513 | 0.4614 | 0.5362 | 0.4651 | 0.6486 | 0.5297 | 0.5155 | 0.5069 | 0.4948 | 0.5000 | 0.5112
Mill | - | - | - | 0.5714 | 0.4714 | 0.4858 | 0.4639 | 0.4767 | 0.5212 | 0.5172 | 0.5107 | 0.4871 | 0.4886 | 0.5187 | 0.5405 | 0.5150 | 0.5405 | 0.4882 | 0.5022 | 0.4888 | 0.5000

This is more satisfactory. We can theorize which new decks might enter the format and use these styles as comparisons for our gamma parameter.

For instance, looking at Period 12, which precedes the upcoming Period 13 in which the world tournament will occur, we can see only the following decks in the format (with their corresponding styles):

Deck Style
6 Mid Range
7 Control
8 Combo
9 Control
10 Control
11 Mill
12 Mill
13 Combo
14 Mid Range
15 Aggro

We can theorize that some of the older decks might drop out of the format by the time the world tournament occurs, and that new decks and deck styles will enter to fill the vacuum.

If, by the time of the world tournament, Decks 6 through 10 drop out (perhaps because they are uniquely weak to Deck 15, the latest entrant, which was designed to beat the old “best” decks), our format would look like this:

Deck Style
11 Mill
12 Mill
13 Combo
14 Mid Range
15 Aggro

What new decks will emerge to exploit the power vacuum left in such a competitive environment?

We now have the tools to consider this and give probabilities.
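As a small sketch of how this could work (our own illustration, not a method from the post): take the “Aggro” row of the style table for the five remaining decks as the matchup prior for a hypothesized new Aggro deck, and fall back to a coin flip where we have no data at all.

```python
# Win-rate priors for a hypothetical new Aggro deck against the decks left
# in the Period 13 format, taken from the "Aggro" row of the style table.
aggro_vs_deck = {11: 0.3750, 12: 0.5588, 13: 0.6765, 14: 0.4167, 15: 0.5000}

def prior_winrate(style_row: dict, deck: int) -> float:
    """Prior win probability of the hypothesized rogue deck against an
    existing deck; assume an even match (0.5) when no style data exists."""
    return style_row.get(deck, 0.5)

print(prior_winrate(aggro_vs_deck, 13))  # 0.6765
print(prior_winrate(aggro_vs_deck, 16))  # 0.5
```

These priors could then inform the gamma parameter for matches involving the rogue deck until real match data accumulates.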

Sportsbook Risk Management

If the house does not carefully manage its risk, an upset outcome of an event can ruin it.

Bookmakers try to keep the liability, that is, the amount of money the house will pay out to the winners on one side, as equal as possible on both sides of an event.

Three Scenarios & Six Outcomes

Consider the following:

We have Player A vs. Opponent B in a match under three different scenarios.

  1. In Scenario 1, the liability for both players is completely independent.
  2. In Scenario 2, the liability for both players is identical.
  3. In Scenario 3, the liability for both players is managed to be within a narrow margin of one another.

As can be seen in the outcomes in the lower part of the table, unmanaged risk can ruin the house. A loss of 53.7% is simply catastrophic and absolutely unacceptable in outcome 1B (that is, in Scenario 1 if Player B wins). This potential loss, no matter how improbable, is not evenly offset by the potential upside (that is, in Scenario 1 if Player A wins), which earns a gross gaming revenue (GGR) margin of 37.3%. Across the two outcomes, we should expect a long-run margin of -8.2%, which is also unacceptable.

In Scenario 2, the liabilities on both sides are identical, and so, too are the payouts to players. This is ideal and lands us a tidy profit. But reality is never so good to us.

In Scenario 3, the sportsbook is managing its risk by limiting bets on either side so that they stay within some close margin of one another. The fact that the profit in Scenario 3 is higher than in Scenario 2 is the result of randomness; we should expect perfect parity to be the best option, and with risk management, we’re trying to get as close to perfect parity as possible.

If the liability is equal on both sides, the house is indifferent to the outcome of the game. No matter which side wins, the house gets its cut. We’re happy with that. It is this ideal (perfect parity of liability) that we’re seeking.
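To make the liability logic concrete, here is a small sketch (with hypothetical handles and decimal payout multipliers, not the actual figures from the scenario table) showing that when liabilities are equal, the house profit is identical whichever side wins:

```python
def house_profit(handle_a: float, handle_b: float,
                 payout_a: float, payout_b: float, winner: str) -> float:
    """Total handle minus the liability paid out to the winning side.
    payout_a / payout_b are decimal payout multipliers (stake included)."""
    total_handle = handle_a + handle_b
    liability = handle_a * payout_a if winner == "A" else handle_b * payout_b
    return total_handle - liability

# Equal liabilities: $10,000 on each side, both at 1.90 decimal odds.
print(house_profit(10_000, 10_000, 1.90, 1.90, "A"))  # 1000.0
print(house_profit(10_000, 10_000, 1.90, 1.90, "B"))  # 1000.0
```

With parity, the house locks in its cut regardless of the result; any imbalance between the two liabilities turns that cut into a bet on the outcome.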

How We’ll Model Risk Management

So how do we put this into practice in our model?

In our next post in which we model the outcome of a world tournament, we’ll assume that our traders are managing risk by limiting bets on either side so that they are roughly equal.

In each simulated game, we will apply the following rules:

  1. The handle on “Player A” will be a random dollar amount between $5,000 and $10,000. This creates an independent variable.
  2. The handle on “Player B” will be based on the handle for Player A as follows:
      • The handle will be randomly set within -10% to +10% of Player A’s handle.
      • If our calculated win probability for Player B is less than 0.5, we will apply a divisor to the handle we take for the player (see below).
      • If our calculated win probability for Player B is greater than 0.5, we will not further modify the handle for Player B.

The divisor applied to a Player B with a less than 0.5 win probability is:

[math] \text{Divisor} = \frac{\text{Moneyline}(\text{Player A})}{100} [/math]

This risk management model can thus be summarized as follows:

The more Player A is favored over Player B (or the more Player B is an underdog), the more the handle taken on Player B is limited. This is because, as in our example scenarios and outcomes above, an upset win by an underdog can wipe out the house.

The noise in the model (the -10 to +10% differential) helps keep things from being perfect. We shouldn’t expect perfect liability parity. This model helps bring us within striking distance of perfection and, I think, is reasonable for a real world application.
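The rules above can be sketched in a few lines. One note: a favorite’s moneyline is negative (e.g., -250), so we read the divisor formula as using the absolute value of Player A’s moneyline; that reading is our assumption.

```python
import random

def simulate_handles(p_b: float, moneyline_a: int,
                     rng: random.Random) -> tuple:
    """Simulate one match's handles under the risk-management rules above.
    p_b is our win probability for Player B; moneyline_a is Player A's
    moneyline (negative when A is the favorite)."""
    handle_a = rng.uniform(5_000, 10_000)        # Rule 1: independent handle on A
    handle_b = handle_a * rng.uniform(0.9, 1.1)  # Rule 2: within +/-10% of A
    if p_b < 0.5:                                # B is the underdog:
        divisor = abs(moneyline_a) / 100         # throttle B's handle
        handle_b /= divisor
    return handle_a, handle_b

rng = random.Random(7)
a, b = simulate_handles(p_b=0.30, moneyline_a=-250, rng=rng)
```

With A at -250 and B a clear underdog, B’s handle is cut to roughly 40% of A’s, keeping the house’s exposure to an upset small.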

Predicting the Winner of a World Tournament

What is the probability that a given player invited to the world tournament will win the entire event? What place do we think each competitor will get? Can we set bet prices for these outcomes?

The Bradley-Terry Model (BTM) allows us to find reasonable outcomes.

BTM uses a “preference” algorithm that allows for a comparison between each competitor based on their relative strengths, then gives us a win probability (all of which, by the way, sum to 1, which means that these are probabilities for each competitor to win the whole shebang).

Without giving away too much about the next post, in which we simulate a world tournament, we can assume that we have the following players for which we wish to compute a win probability to win the entire event: Players A through H.

For all players, we calculate their win probabilities against one another (arbitrarily chosen for the sake of this example):

 | A | B | C | D | E | F | G | H
A | 0.5000 | 0.5015 | 0.5175 | 0.6842 | 0.5153 | 0.6529 | 0.5701 | 0.6023
B | 0.4986 | 0.5000 | 0.4582 | 0.6861 | 0.5141 | 0.6001 | 0.5701 | 0.6028
C | 0.4826 | 0.5418 | 0.5000 | 0.6696 | 0.5555 | 0.6375 | 0.5528 | 0.5860
D | 0.3158 | 0.3140 | 0.3305 | 0.5000 | 0.3262 | 0.4663 | 0.3751 | 0.4124
E | 0.4848 | 0.4859 | 0.4445 | 0.6739 | 0.5000 | 0.5867 | 0.5561 | 0.5895
F | 0.3472 | 0.3999 | 0.3625 | 0.5339 | 0.4133 | 0.5000 | 0.4093 | 0.4461
G | 0.4299 | 0.4301 | 0.4473 | 0.6249 | 0.4441 | 0.5909 | 0.5000 | 0.5361
H | 0.3977 | 0.3973 | 0.4141 | 0.5876 | 0.4106 | 0.5540 | 0.4639 | 0.5000

To make life easy for us, we’ll employ the BTM with an excellent Excel plugin from Charles Zaiontz at Real Statistics.

With this table and the =BT_MODEL() function from this plugin, we get the following:

Player Probability
A 0.1420
B 0.1384
C 0.1414
D 0.0950
E 0.1350
F 0.1066
G 0.1251
H 0.1164

This means, for example, that Player A is estimated to have a 14.2% probability of winning the tournament between these 8 players. Likewise, Player B has a 13.84% probability.
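The idea behind the model can be sketched in code. Under Bradley-Terry, each player has a strength s, and p_ij = s_i / (s_i + s_j), so log(s_i) - log(s_j) = logit(p_ij). Averaging logits across each row gives a simple estimate of each log-strength; normalizing the strengths yields event-win probabilities that sum to 1. This is a rough sketch of the technique, not the exact algorithm behind =BT_MODEL(): it reproduces the same ordering of players, though not the plugin’s exact values.

```python
import math

players = ["A", "B", "C", "D", "E", "F", "G", "H"]

# Pairwise win probabilities from the table above (row player vs. column player).
P = [
    [0.5000, 0.5015, 0.5175, 0.6842, 0.5153, 0.6529, 0.5701, 0.6023],
    [0.4986, 0.5000, 0.4582, 0.6861, 0.5141, 0.6001, 0.5701, 0.6028],
    [0.4826, 0.5418, 0.5000, 0.6696, 0.5555, 0.6375, 0.5528, 0.5860],
    [0.3158, 0.3140, 0.3305, 0.5000, 0.3262, 0.4663, 0.3751, 0.4124],
    [0.4848, 0.4859, 0.4445, 0.6739, 0.5000, 0.5867, 0.5561, 0.5895],
    [0.3472, 0.3999, 0.3625, 0.5339, 0.4133, 0.5000, 0.4093, 0.4461],
    [0.4299, 0.4301, 0.4473, 0.6249, 0.4441, 0.5909, 0.5000, 0.5361],
    [0.3977, 0.3973, 0.4141, 0.5876, 0.4106, 0.5540, 0.4639, 0.5000],
]

# Estimate log-strengths by averaging the logits in each row, then
# normalize the strengths into probabilities that sum to 1.
log_s = [sum(math.log(p / (1 - p)) for p in row) / len(row) for row in P]
strengths = [math.exp(v) for v in log_s]
total = sum(strengths)
win_prob = {name: s / total for name, s in zip(players, strengths)}
```

Running this gives Player A the highest event-win probability and Player D the lowest, matching the ordering of the =BT_MODEL() output above.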

We can assign bet prices to these probabilities using our previously established methods. We’ll assume an “overround” of 10% on the real computed probabilities of winning to bake in our bookmaker’s profit.

Doing so, we get the following moneyline odds:

Player | Probability | Overround | Moneyline
A | 0.1420 | 0.16 | +540
B | 0.1384 | 0.15 | +557
C | 0.1414 | 0.16 | +543
D | 0.0950 | 0.10 | +857
E | 0.1350 | 0.15 | +573
F | 0.1066 | 0.12 | +753
G | 0.1251 | 0.14 | +627
H | 0.1164 | 0.13 | +681

This means that, for example, a bet placed on Player A to win the tournament would return a total of $640 on a $100 stake ($540 in winnings plus the returned stake).
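The probability-to-moneyline conversion can be sketched as follows; this reproduces the futures prices in the table above from the Bradley-Terry probabilities and the 10% overround.

```python
def to_moneyline(p_true: float, overround: float = 0.10) -> int:
    """Convert a win probability into American moneyline odds after
    inflating the probability by the bookmaker's overround."""
    q = p_true * (1 + overround)       # price-implied probability
    if q >= 0.5:                       # favorite: negative moneyline
        return -round(100 * q / (1 - q))
    return round(100 * (1 - q) / q)    # underdog: positive moneyline

print(to_moneyline(0.1420))  # 540, quoted as +540 (Player A)
print(to_moneyline(0.0950))  # 857, quoted as +857 (Player D)
```

The favorite branch (negative moneyline) isn’t exercised by this futures table, since no single player is priced above a 50% chance of winning the whole event.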

These probabilities, and their corresponding prices, relate to each player at the outset of the tournament, before any games have been played. After each update (e.g., after games occur and players either win or lose), the field will shrink as players are eliminated and any new bets placed on the winner of the whole event will need updated prices. The BTM can still do this for us.

Suppose, after Round One, Players C, D, F, and H are eliminated, leaving only Players A, B, E, and G.

Any new bets placed on the ultimate winner would be based on these matchup probabilities:

 | A | B | E | G
A | 0.5 | 0.501526091 | 0.515309726 | 0.570093835
B | 0.498600849 | 0.5 | 0.514052912 | 0.570052391
E | 0.484817054 | 0.485947088 | 0.5 | 0.556053431
G | 0.429906165 | 0.430075784 | 0.44407562 | 0.5

Applying the same methods, we come to updated moneyline odds of:

Player | Probability | Overround | Moneyline
A | 0.2608 | 0.2869 | +283
B | 0.2603 | 0.2864 | +284
E | 0.2533 | 0.2787 | +295
G | 0.2255 | 0.2480 | +344

A bet on Player A to win the tournament now returns a total of $383 on a $100 stake. As the outcomes become more certain, the odds become shorter and the payouts smaller.

We’ll quote prices like these and track their profitability for the house in our simulation of the world tournament.

Next: Simulating the World Tournament

In the next post, we’ll select the top 32 players from the tournament season and invite them to play in a World Tournament.

These players will go head-to-head in a single-elimination tournament that will last five rounds until a winner is declared.

We’ll take bets on each match and track our profitability along the way.

We’ll simulate this 1,000 times and analyze the results.

Finally, we’ll put everything we’ve discussed in this project together and see if our model proves viable and where there might be opportunities for improvement.

1.4 Building a Competitor Model: Glicko Scores

In the last post, we reviewed setting win/loss odds and betting prices for trading card games. There, the players’ win probabilities were taken as given.

In this post, we’ll take an important step (but not the last one) in generating those player win probabilities.

We’ll use a ratings system, called Glicko/Glicko2, to rate each player in a game. These ratings give a value to each player’s skill, and can be used to compare one against another to generate win/loss probabilities between them.

Glicko/Glicko2 Ratings System

Competitive games like chess have long used the Elo rating system, developed by Arpad Elo, to rate the skill level of players.

The Elo system assigns a numerical value to each player at the start of their careers. For Elo, new players start off with a rating of 1000. Each time a player wins or loses against another player, the winning player’s Elo rating increases while the losing player’s Elo rating decreases. This allows changes over time in a player’s skill level to be tracked.

The scale of the Elo rating system is logarithmic: a player with a rating of 1600 isn’t 60% better than a player with a rating of 1000; under the Elo formula, a 600-point gap implies a strength ratio of 10^(600/400), or roughly 32 to 1. This logarithmic scale also has the advantage that if a weaker player beats a stronger player, the weaker player’s rating rises more than it would for beating a player of similar skill, while if a stronger player beats a weaker player, the stronger player’s rating rises less than it would for beating a player of similar skill.
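The standard Elo expected-score formula makes the scale concrete:

```python
def elo_expected(r_a: float, r_b: float) -> float:
    """Expected score (win probability, treating a draw as half a win)
    for a player rated r_a against a player rated r_b under Elo."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

# A 600-point gap: implied strength ratio and expected score.
ratio = 10 ** ((1600 - 1000) / 400)  # ~31.6 to 1
p = elo_expected(1600, 1000)         # ~0.97 expected score
```

So the 1600-rated player is expected to score about 97% against the 1000-rated player, reflecting that roughly 32:1 implied strength ratio.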

Mark Glickman, Fellow of the American Statistical Association and Senior Lecturer on Statistics at the Harvard University Department of Statistics (among other accolades), invented a new spin on the traditional Elo system called Glicko (and later, Glicko2).

These systems make several improvements on the Elo methodology that are key for trading card games:

  1. Glicko/Glicko2 incorporates a rating “deviation” (σ). Player deviations begin at 350 and change based on how often a player plays. This helps us measure our certainty about how accurate a player’s rating is as well as allows for a sort of “rust” factor to occur, whereby players who don’t play for a long time have their deviations drift back to that of a new player (at the σ=350 level).
  2. With the deviations in mind, ratings are typified as credible intervals. For instance, a player with a rating of 1750 and σ=150 has a credible rating of ~1600 to 1900. The more the player plays, the more the deviation shrinks and the narrower and narrower the credible interval becomes (and thus, the greater and greater our certainty about that rating).
  3. The deviations are not just window dressing: they are an integral part of predicting the win/loss probabilities between two competitors. A player with a lower σ is predicted to win more reliably than one with a higher σ, all else being equal.
  4. Glicko uses a constant variable (“c” or the “c-value”) which limits how much a player’s rating can change due to the outcome of a single match; this limits wild fluctuations due to upset wins or losses, as well as sets how quickly (or slowly) a player without play has his or her deviation drift back up to its initial level (350).
  5. Glicko2 takes a “volatility” variable (τ) into account. This variable rates how much luck is a factor in the game at hand, and helps regulate probabilities between players with different levels of volatility. Similar to σ, a competitor with a high τ but similar skill will be predicted to perform worse against another player with a lower τ, all else being equal. Luck is accounted for.

For these reasons, we’ll use the Glicko2 system for rating competitors.

You can find an excellent paper on the Glicko System by Mark Glickman here, as well as his follow-up for Glicko2, here.

To apply the Glicko2 system, we’ll generate some synthetic (made up) player match data and use the PlayerRatings package in R, which incorporates the Glicko2 system in its functions.

Synthetic Tournament Data

A .csv file was set up to represent a tournament season.

The file has four fields:

  1. Period (numbered 1 through 12)
  2. Player (numbered 1 through 500).
  3. Opponent (numbered 1 through 500).
  4. Result (0, 0.5, or 1).

Each “Period” represents a single month during the tournament season (Period 1 = January, Period 2 = February, etc.). We assume that the tournament season runs from January to December of a single year.

Players and Opponents are numbered from 1 to 500. Each of these is a unique competitor. For each matchup in the following steps, the Player/Opponent represents the two competitors involved in each match.

The Result records a loss (0), a draw (0.5), or a win (1). These Results apply to the Player vs. the given Opponent. In the first match in the file, we see the following:

This means that in Period 1, Player 121 played Player 444 and Player 121 lost (thus, Player 444 won).

Each Period records 1000 matches between randomly selected Players and Opponents with random Results.

Thus, the .csv file records 12,000 randomly generated matches. All of this was done with Microsoft Excel.
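The post built this file in Excel, but an equivalent file can be generated programmatically. A sketch (the filename matches the one loaded by the R code below; the uniform draw over loss/draw/win is a simplification, since real TCG draws are rarer than wins or losses):

```python
import csv
import random

rng = random.Random(42)
rows = []
for period in range(1, 13):          # 12 monthly periods
    for _ in range(1000):            # 1,000 matches per period
        # Two distinct competitors drawn from the pool of 500.
        player, opponent = rng.sample(range(1, 501), 2)
        result = rng.choice([0, 0.5, 1])   # loss / draw / win for Player
        rows.append((period, player, opponent, result))

with open("synthetic_season_1_match_data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Period", "Player", "Opponent", "Result"])
    writer.writerows(rows)
```

This yields the same 12,000-row structure: Period, Player, Opponent, Result.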

The file can be found here.

Using R to Generate Glicko Ratings

Using the R programming language via R Studio, the following steps were performed:

  1. Load the PlayerRatings package.
  2. Load tournament data .csv.
  3. Set the tournament data as an R data frame properly reporting Period, Player, Opponent, and Result.
  4. Set starting player Ratings, Deviation, and Volatility.
  5. Run the Glicko2 algorithm for all players with the tournament data.
  6. Convert the output from the algorithm to a new data frame with each player’s final ratings, deviation, volatility, and game history
  7. Export the output to a new .csv file.

For this simulation, we’ve given all the players the following attributes:

  • Initial rating of 1500
  • Initial deviation of 350
  • Initial volatility of 0.6

The rating of 1500 and deviation of 350 are recommended by Mark Glickman for initial ratings. He suggests an initial volatility of between 0.3 (for low randomness games, like chess) to as much as 1.2 for high randomness games. We’ve chosen 0.6 for a trading card game, due to its higher randomness factor (i.e., the order in which cards are drawn from decks).

For the ratings algorithm, we’ve chosen a constant (“cval”) of 60. This means that a player who stops playing would see his or her deviation drift back from its current level to the initial deviation of 350 over approximately three years of inactivity.

The volatility and cval should be evaluated for any given trading card game and will be the subject of future studies to determine the appropriate levels for games like Pokémon, Magic: the Gathering, and Yu-Gi-Oh, separately. For now, we’ve settled on these values for this demonstration.

You can see the R code, below.

# Step 1: Load PlayerRatings Package
library(PlayerRatings)

# Step 2: Load tournament data
tournamentdata <- read.csv("synthetic_season_1_match_data.csv", 
header=TRUE, sep=",")

# Step 3: Convert tournament data to data frame
season1matches <- data.frame(Period=tournamentdata$Period, 
Player=tournamentdata$Player, 
Opponent=tournamentdata$Opponent,
Result=tournamentdata$Result)

# Step 4: Set starting ratings, deviation, and volatility for all 
# players
startratings <- data.frame(Player=seq(1, 500, 1), Rating=rep(1500,500), 
Deviation=rep(350,500), Volatility=rep(0.60,500))

# Step 5: Run Glicko2 algorithm for all players with data
season1ratings <- glicko2(season1matches, status=startratings, cval=60, tau=0.60)

# Step 6: Set results of the algorithm as a new data frame with 
# reported rating, deviation, volatility, and game history
season1finals <- data.frame(Player=season1ratings$ratings$Player, 
Rating=season1ratings$ratings$Rating, 
Deviation=season1ratings$ratings$Deviation,
Volatility=season1ratings$ratings$Volatility,
Games=season1ratings$ratings$Games, 
Win=season1ratings$ratings$Win,
Loss=season1ratings$ratings$Loss,
Draw=season1ratings$ratings$Draw,
Lag=season1ratings$ratings$Lag)

# Step 7: Export results to a new .csv file
write.csv(season1finals, "season_one_final_ratings.csv")

You can replicate the steps shown above in R.

Final Ratings

Running the steps shown above, we get a neatly formatted .csv file that reports player data at the end of the season (i.e., after the 12,000 games played over 12 months).

Looking at the first few entries in this output, we see the following:


We find that Player 42 came out on top with a Rating of ~1827, a Deviation of ~183, and a Volatility of ~0.62.

We can draw the following conclusions about Player 42:

  1. A Rating of ~1827 implies that Player 42 is roughly 6.6 times more skilled than an average player (Rating 1500), per the rating scale’s implied strength ratio of 10^(327/400), ceteris paribus (given the same σ and τ for both players).
  2. Player 42’s σ has fallen from 350 to ~183. This is expected, as the more games a player plays, the lower σ becomes, as we can assume the credible interval of the player’s skill is closer and closer to the reported Rating. (Note that many other players have even lower σ, because they have played more reliably throughout the season).
  3. Player 42’s τ (~0.62) is about unchanged and matches that of the system as a whole. This is expected for a player with top placement whose wins/losses/draws have been less due to luck and more due to skill over the course of the season.

Given the structure of the Glicko/Glicko2 system, we can confidently say that Player 42’s true skill level is somewhere between ~1,644 and ~2,009. Given the player’s volatility, we should err toward saying that the player’s real skill is closer to the lower bound.

The completed output file can be found here.

Generating Win/Loss Probabilities

With these data, we can generate win/loss probabilities for future matchups, which is what we need for our TCG Sportsbook project.

Let’s pit the top two players against one another:

This can be done easily in R with the predict function.

# Predict the probability of Player 42 winning against Player 67
predict(season1ratings, newdata=data.frame(Player=42, Opponent=67, tng=1))

The output we receive is:

[1] 0.6926854

This means that Player 42 has a win probability of ~0.69 against Player 67 in a hypothetical match between them.
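The formula behind such a prediction can be sketched from Glickman’s Glicko paper: both players’ deviations feed an attenuation factor g() that pulls the prediction toward 0.5 when we are less certain about the ratings. (The PlayerRatings predict function may differ in details; the ratings below are hypothetical, not the actual Player 42/67 values from the file.)

```python
import math

Q = math.log(10) / 400

def g(rd: float) -> float:
    """Glicko attenuation factor: larger deviations shrink the effective
    rating gap, pulling predictions toward a coin flip."""
    return 1 / math.sqrt(1 + 3 * (Q ** 2) * (rd ** 2) / math.pi ** 2)

def win_prob(r1: float, rd1: float, r2: float, rd2: float) -> float:
    """Predicted probability that player 1 beats player 2, combining both
    players' deviations (per Glickman's Glicko paper)."""
    combined_rd = math.hypot(rd1, rd2)
    return 1 / (1 + 10 ** (-g(combined_rd) * (r1 - r2) / 400))

# Hypothetical matchup: a 1827/183 player against a 1700/150 player.
p = win_prob(1827, 183, 1700, 150)
```

Note how a rating edge counts for less when either deviation is large: the same 127-point gap yields a probability closer to 0.5 than the pure Elo formula would give.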

Next Steps: Deck Factors

We’ve demonstrated an ability to give players ratings based on their skill from tournament data.

The next issue we’ll have to address is that of deck strategy.

Glicko/Glicko2 (and their forebear, Elo) were made to gauge skill in low-randomness games like chess. In games like these, both players come to the table with identical game pieces. Both sides have the same number of pawns, knights, rooks, bishops, etc. Both sides have these pieces in the same starting positions.

Trading card games have a higher level of randomness due to the cards in use (which, in part, we addressed by setting the initial Volatility at 0.6 for the algorithm). Each competitor could have a very different deck of cards, or maybe even the same theme of deck, but with a different card list.

All decks don’t perform equally well against one another in every match up. Some decks are simply superior to others, or at least, have very lopsided matchups (where Deck A is three times as likely to win against Deck B, for example), ceteris paribus.

The predict function in R gives us the ability to take such factors into account via a gamma variable (Γ). We’ll use this in the next phase of the project. Γ will be the stand-in for the decks in use by either player and will allow us to account for how well those decks match up against one another.

Project 1: What if a Sportsbook Offered Odds on Trading Card Games?

Background

I’ve been a lifelong fan of trading card games.

Ever since the Star Wars CCG (Customizable Card Game) in 1995 and later, Pokémon TCG (Trading Card Game) in 1999 (in the U.S.), I’ve been hooked.

Trading card games are games of skill where two competitors construct decks of cards from those available in the game and play against one another.

One player wins and another loses. (Sometimes, there is a draw.) These games are “zero-sum” in this way.

Working in the gambling industry, as I have for 10 years now, led me to ask: “What if a sportsbook placed betting prices on the outcomes of trading card game events like they do for professional sports events?”

Basically: what if you could bet on games like Pokémon, Magic: the Gathering, or Yu-Gi-Oh?

What would it take to make this work? What are the theoretical concepts that underpin such an endeavor? What kind of profit could the sportsbook expect?

I attempt to answer these and more during the course of this project.

I aim for this project to change and evolve as it proceeds, knowing that the final conclusions I draw may be very different from my starting assumptions.

I hope also to get some comment from readers to help improve what’s being done here.

This project is both a demonstration and also some food for thought.

Objectives

My objectives for this project are:

  1. Demonstrate how Bayesian Inference can help us construct a predictive model for two-player, winner-take-all events (card games).
  2. Demonstrate how, given the probabilities assumed by these inferences, odds and betting prices by a fictional sportsbook (“TCGBook”) can be set.
  3. Model the outcomes of fictional and real matchups in a trading card game tournament setting.
  4. Model the profit and loss of our fictional sportsbook (“TCGBook”).
  5. Open these ideas to the public for comment, critique, and improvement.

Limitations

Before starting on this quest to model our TCGBook endeavor, it is important that I acknowledge a few key limitations.

We compare our subject, trading card games, to the tried-and-true professional sports leagues on which our sports betting idea and models are largely based.

Data Availability

The data for card game events can be very hard to come by.

Most of the data sources are compiled by fans of the games and not the hosts or producers of the games themselves. The “big dogs”, as it were, do not wish to disclose their proprietary information, or at least not all of it. Perhaps they never thought to, or they are not in a position to do this regularly.

The fans that do this tireless service for us should be acknowledged for their efforts, both for this project, and more importantly, for the fandom and playerbases of these games.

That being said, much of the data that we would like to have is simply unavailable or is, at best, incomplete.

In real sports betting, sportsbooks are able to rely heavily on data aggregators to compile every conceivable bit of data about sports, events, scores, goals, fouls, players, training, coaches, etc. This isn’t the case for trading card games. The interest and size of the market just isn’t the same. It’s much smaller.

We would love to see data on each major tournament, broken down by each round. We would love to see player data reported with unique ID keys to keep variations of a player’s name or misspellings from confusing the data. We’d love to see local, sanctioned tournament data, too. But these are not realities.

We will work within these limitations and show that, at least conceptually, our idea is possible.

We’ll focus only on the widely available data, namely that from major tournaments, the highest-ranked players, and the best-known strategies.

Nature of Trading Card Game Events

Trading card game events don’t work like professional sports matches.

In professional sports matches, we know which team will play against which team and on what date. This gives the sportsbook advance knowledge of these events and time to compute odds and set prices. Season schedules for any major sport are announced well ahead of time.

This is not the case for card game tournaments.

At local tournaments, anyone can show up with a deck to sign up to play. At major events, any number of qualified players can show up (or not show up). Add to this the possibility of any given strategy (i.e. deck of cards) being used by any competitor, and the matchups are simply unknowable ahead of time.

In this project we will simply ignore this as a problem. We will make the assumption that the odds are set sometime in advance of the event taking place (maybe just minutes before). Making this assumption allows us to proceed to demonstrate our ideas.

Feasibility of Taking Bets

This project isn’t a serious attempt to find a way to start taking bets on trading card games.

This may or may not be legal in any jurisdiction, and what is proposed in this project is not legal advice nor an inducement to try and make this work outside of the law.

To complicate matters, the participants of many card game events are under the legal age to gamble in many places.

Nowadays, most jurisdictions (at least in the U.S.) allow betting on college sports, where the expectation is that competitors are at least 18 years of age.

Whether or not taking bets on such events would fly with gaming regulators is not considered here. This is about proving a concept (and having fun while doing it).

Don’t take anything in this project too seriously as far as making money at gambling on trading card games goes.

This is a big “what if” sort of project.

Assumptions

With our objectives in mind, and our limitations outlined, we’ll make the following assumptions for this project:

  1. All probabilistic modelling will be based on Bayesian (not Frequentist) inference.
  2. We will briefly discuss, but largely ignore, the outcome of ties. We care only about win probabilities (and, by complement, not-win probabilities).
  3. Win probabilities are expected to describe the win probability of matches; that is, “best two-out-of-three” matches in which the first competitor to win two games wins the match. (This is the circumstance that often produces a draw between players: the match's time limit is met with neither player having a decisive, tie-breaking win.)
  4. The outcomes we seek are not only probabilistic, but also commercial: this is about setting bet prices for potential bettors. As “the house”, we expect to make money in the long run. Our models, odds, and prices will reflect that desire.
  5. As mentioned previously, we assume we know which players are competing, and which decks they are using, before the match begins. This knowledge gives rise to our win probability for each player and the consequent bet prices for each side of the match.
  6. While I will take time to explain many of the theories and logic behind each step we take in this project, I will assume that readers have some familiarity with the mathematics of probability, statistical inference, the software systems we’ll use, and the games we are speaking about. Feel free to ask in the comments if you’re unsure about something!
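Assumption 3 above deserves a small worked example. If we had only a single-game win probability p and treated games as independent (an additional assumption, with ties ignored per assumption 2), the match win probability follows from enumerating the winning sequences WW, WLW, and LWW:

```python
def match_win_prob(p_game):
    """Best-two-of-three match win probability from a single-game
    win probability, assuming independent games and no ties.

    Winning sequences: WW, WLW, LWW
      P(match) = p^2 + 2 * p^2 * (1 - p) = p^2 * (3 - 2p)
    """
    return p_game ** 2 * (3 - 2 * p_game)

# A 55% per-game favorite is a ~57.5% match favorite:
# best-of-three play amplifies small per-game edges.
```

This is a sketch under its stated independence assumption; the models in later posts work with match-level probabilities directly rather than deriving them from game-level ones.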

Segments

The project is broken down into the following segments, each with its own dedicated page: