Simulated Games, Part 1

"Estimated probability of victory in one-to-one combat: 48.027 %"


- Computron, mainly to himself, Money is Everything

This is the first in a new series of notes in which we'll try to understand some general properties of the Transformers TCG metagame through the use of simulated games. Previously, we've used simulated flips to predict attack and defense bonuses, as well as the probabilities of pattern-matching, mainly as a tool to optimize our deck composition. In this new series we'll adopt a more drastic approach and simulate entire games. Both teams will swing back and forth until one of them is knocked out. Cards will be flipped during each battle, attack and defense totals calculated, damage dealt, etc.

To reach this goal, many basic assumptions need to be made. We can adapt our simulations over time to keep or drop these assumptions as needed. Designing these simulations requires us to understand general game strategies well enough to encode them. We will need to quantify concepts like the risk we're willing to take in choosing our attacker, the expendability of our characters, how aggressively we intend to play, etc. We won't start by including all these factors at once, and we won't attempt to encode human mistakes or unusually clever play. Instead, we'll try to describe the more deterministic aspects of a Transformers TCG match, as they represent the benchmark against which we'll compare the effect of exceptional plays, human mistakes, and card effects.

We've said that our main goal is to figure out the most general properties of the metagame, to better understand the advantage or disadvantage that different game strategies have when facing one another. In the Transformers TCG these game strategies are often represented, at the most simplistic level, by the combination of our team size and the color composition of our deck. Of course, the specific members of our team will play a major role in a real game, as will our card selection. All these factors are intentionally removed from the initial scenario we're going to describe.

Additionally, what kind of advantage do we gain, or how much of a disadvantage do we suffer, when the size of our team changes over the course of a game? Even when we're not combining five Aerialbots into a Superion, characters still get knocked out. Metroplex starts 1-tall, widens his team, then shrinks it again. Battle Masters shrink their team to a narrower lineup of bots on steroids, etc. We can even change our team size through sideboarding. Understanding the effect these changes have on the wheel turns is paramount.

Finally, given the option, and after a first look at our opponent's lineup, should we choose to go first? This decision alone can often sway a matchup in our favor or greatly decrease our chances of victory. These questions, their approximate answers, and further questions arising from these answers will be some of the subjects of this group of notes on simulated games.

Part I - Homogeneous Teams

Our first simulations will assume homogeneous teams, i.e. lineups of identical characters. We'll see that this assumption greatly simplifies the task of determining the optimal sequence of attacks, which makes it a good place to start. We'll consider five different team sizes, ranging from 1 to 5 characters, denoted 1-tall, 2-tall, 3-wide, 4-wide and 5-wide. The game statistics of these characters are listed below, as well as five different supporting decks: mono blue, mostly blue, balanced, mostly orange, and mono orange. Every team-deck combination is considered, for a total of 25 different archetypes. For each possible matchup, i.e. archetype A vs. archetype B, 20'000 entire games are simulated, and both their outcome and their duration in turns are recorded. In 10'000 of these games A plays first (A is player 1, or P1) while B goes second (B is player 2, or P2 in the following). In the remaining 10'000 games B is P1 and A is P2. This accounts for 600 different matches, and 6 million simulated games.
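The counts above can be reproduced with a few lines of Python. This is only a sketch of the bookkeeping, not the simulation code itself. The names `TEAM_SIZES`, `DECKS` and `GAMES_PER_SEATING` are ours, and the figure of 600 comes out only if mirror matches (an archetype against itself) are left out, which is an inference from the totals rather than something stated explicitly.

```python
from itertools import permutations, product

TEAM_SIZES = ["1-tall", "2-tall", "3-wide", "4-wide", "5-wide"]
DECKS = ["mono blue", "mostly blue", "balanced", "mostly orange", "mono orange"]
GAMES_PER_SEATING = 10_000  # games simulated for each (P1, P2) assignment

# Every team-deck combination is one archetype.
archetypes = list(product(TEAM_SIZES, DECKS))

# Ordered pairs (P1, P2) of distinct archetypes.
# ASSUMPTION: mirror matches are excluded; this is what yields 600.
seatings = list(permutations(archetypes, 2))

total_games = len(seatings) * GAMES_PER_SEATING
print(len(archetypes), len(seatings), total_games)
```

Each unordered pair A vs. B appears twice in `seatings`, once with A as P1 and once with B as P1, which matches the 20'000 games per matchup.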

Team Size

As of wave 3, a look at how base attack and defense vary with the star cost of a character doesn't show any clear scaling. Health, on the other hand, correlates well with star cost, which allows us to split a total health of about 30 to 35 evenly among the characters on our team. We'll keep the base defense constant and equal to 2, and assume a mild increase of the base attack value from 2 to 6. Therefore, for each team size, we'll assume the character stats listed in table 1. Setting an upper bound of 6 on the base attack value (as well as other assumptions we're making here) will have important effects on the outcomes of our simulations. It's worth keeping these assumptions about the relationship between team size, health, and attack value in mind when evaluating our results.

Deck Composition

Choosing archetypal deck compositions is even more ambiguous. In our first simulations, we'll start with the options listed in table 2. Notice that we're not simulating the specific effect of battle cards. Instead, we'll describe their effect in average terms. Each deck provides its team with average values of Bold and Tough as listed in the table. We'll assume that these values are also granted on turn 1. This is done for several reasons: 1) In a very aggressive/defensive strategy, we might choose characters with innate values of Bold/Tough; 2) This assumption makes for an easier simulation (and simplicity is all this first scan is going to be about); 3) We'll remove this turn 1 bonus later on, as a tool to discuss the advantage/disadvantage of choosing to go first. We should keep all these assumptions in mind when interpreting the results we're going to get.
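To give a feel for what "average terms" means, here is a minimal sketch that computes the average number of orange, blue and white pips per card for the five 40-card deck lists printed with figures 1 to 5. It is an illustration only. It does not model Bold, Tough, or the extra flips triggered by white pips, and it is not necessarily how the bonuses in these simulations were computed. We assume that each 🅆 card carries a single white pip and nothing else, and that 🄾🄱 is a single card with one orange and one blue pip.

```python
# Deck lists as printed with figures 1 to 5.
# Key: (orange pips, blue pips, white pips) on a card; value: number of copies.
# ASSUMPTION: white cards carry exactly one white pip and no other pips.
DECK_LISTS = {
    "mono blue":     {(0, 2, 0): 6, (0, 1, 0): 28, (0, 0, 1): 6},
    "mostly blue":   {(0, 2, 0): 3, (0, 1, 0): 23, (1, 1, 0): 3,
                      (1, 0, 0): 3, (0, 0, 1): 8},
    "balanced":      {(1, 1, 0): 6, (1, 0, 0): 12, (0, 1, 0): 12, (0, 0, 1): 10},
    "mostly orange": {(2, 0, 0): 3, (1, 0, 0): 23, (1, 1, 0): 3,
                      (0, 1, 0): 3, (0, 0, 1): 8},
    "mono orange":   {(2, 0, 0): 6, (1, 0, 0): 28, (0, 0, 1): 6},
}

def pips_per_card(deck):
    """Return the average (orange, blue, white) pips on one card of the deck."""
    size = sum(deck.values())
    totals = [sum(pips[i] * copies for pips, copies in deck.items())
              for i in range(3)]
    return tuple(total / size for total in totals)

for name, deck in DECK_LISTS.items():
    orange, blue, white = pips_per_card(deck)
    print(f"{name:13s} orange={orange:.2f} blue={blue:.2f} white={white:.2f}")
```

Multiplying these per-card averages by the number of cards flipped in a battle gives a first estimate of the attack or defense bonus a deck provides.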

Table 1

Table 2

Simulated Strategies (or lack thereof)

In a real game, it's reasonable to assume that both players have a good idea of both their own average attack bonus and their opponent's average defense bonus. Even when facing an unknown opponent, a quick check of their scrap pile should provide us with enough information to make an educated guess of their deck's average defense. Therefore, it's reasonable to assume that we always have at least an approximate knowledge of the probability of knocking out a defending character, given the attacker we choose. We also know how much this will expose our attacking character to the possibility of being knocked out later on, and this is where the expendability of our characters comes into play. At the same time, we often prioritize knocking out the least expendable of our opponent's characters. These objectives are often at odds with each other. Therefore, in simulated strategies, priorities need to be quantified in terms of their relative weights. These weights will be the in silico version of what we'd call a play style.

Combining these and other criteria in programming the attack sequence will be the subject of future simulations. For now, we'll consider the simplest scenario, which arises from considering homogeneous teams and only the averaged effects of battle cards. In this case, the previous criteria lead to one simple rule: the least damaged untapped character always attacks the most damaged viable defender. As this initial set of simulations serves as a proof of principle for this modeling system, we'll start with this very simple case, and leave more diverse combat scenarios for later.
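The attack rule can be sketched in a few lines. This is a hedged illustration, not the code behind the charts. The `Character` class and the helpers `choose_attacker` and `choose_defender` are hypothetical names, the health value of 8 is a placeholder, and we read "viable defender" as "a tapped opposing character if there is one, otherwise any opposing character".

```python
from dataclasses import dataclass

@dataclass
class Character:
    name: str
    health: int
    damage: int = 0
    tapped: bool = False

    @property
    def alive(self) -> bool:
        return self.damage < self.health

def choose_attacker(team):
    """Least damaged untapped character, or None if nobody can attack."""
    ready = [c for c in team if c.alive and not c.tapped]
    return min(ready, key=lambda c: c.damage, default=None)

def choose_defender(team):
    """Most damaged viable defender, or None if the team is knocked out.

    ASSUMPTION: tapped characters must be attacked first when any exist;
    otherwise every living character is a viable target.
    """
    living = [c for c in team if c.alive]
    tapped = [c for c in living if c.tapped]
    return max(tapped or living, key=lambda c: c.damage, default=None)

# Placeholder teams: health 8 is illustrative, not a value from table 1.
ours = [Character("A1", 8, damage=3),
        Character("A2", 8, damage=1),
        Character("A3", 8, tapped=True)]
theirs = [Character("B1", 8, damage=5),
          Character("B2", 8, damage=2, tapped=True)]

attacker = choose_attacker(ours)
defender = choose_defender(theirs)
print(attacker.name, "attacks", defender.name)
```

Here A2 attacks because it is the least damaged untapped character. B2 is the target even though B1 carries more damage, since B2 is the only tapped defender. With homogeneous teams, ties are harmless: any of the tied characters is equivalent.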

Simulated Games

Without further ado, here are the results of our first set of simulations. Matches are listed according to the deck composition of P1 (listed from mono blue to mono orange). They are sorted together for different team sizes (from 1-tall to 5-wide, click on the figures to scroll through the results). In each case, two panels are shown. The left panel shows the winning rate of P1 for all possible matchups. The possible team sizes of P2 are sorted by rows, their possible deck compositions by columns. The corresponding entry in the table shows the winning rate (%) of P1 vs. P2. The right panel is analogous, but it shows the average duration (in turns) of the same matches.

Mono Blue (from 1-tall to 5-wide)

6 🄱🄱, 28 🄱, 6 🅆

Figure 1

P1 is playing the mono blue deck in these charts. P2 plays the archetype identified by the corresponding team size row and deck composition column. Click on these charts to increase the team size of P1. The first group of charts shows the winning rate of P1 after 10'000 simulated games. The second group shows the average duration in turns of the simulated games. A gray circle denotes that some games might not end within 50 turns. No statistics are listed for these matches. A red circle corresponds to matches that never end within 50 turns.

Mostly Blue (from 1-tall to 5-wide)

3 🄱🄱, 23 🄱, 3 🄾🄱, 3 🄾, 8 🅆

Figure 2

P1 is playing the mostly blue deck in these charts. P2 plays the archetype identified by the corresponding team size row and deck composition column. Click on these charts to increase the team size of P1. The first group of charts shows the winning rate of P1 after 10'000 simulated games. The second group shows the average duration in turns of the simulated games. A gray circle denotes that some games might not end within 50 turns. No statistics are listed for these matches. A red circle corresponds to matches that never end within 50 turns.

Balanced (from 1-tall to 5-wide)

6 🄾🄱, 12 🄾, 12 🄱, 10 🅆

Figure 3

P1 is playing the balanced (blue and orange) deck in these charts. P2 plays the archetype identified by the corresponding team size row and deck composition column. Click on these charts to increase the team size of P1. The first group of charts shows the winning rate of P1 after 10'000 simulated games. The second group shows the average duration in turns of the simulated games. A gray circle denotes that some games might not end within 50 turns. No statistics are listed for these matches. A red circle corresponds to matches that never end within 50 turns.

Mostly Orange (from 1-tall to 5-wide)

3 🄾🄾, 23 🄾, 3 🄾🄱, 3 🄱, 8 🅆

Figure 4

P1 is playing the mostly orange deck in these charts. P2 plays the archetype identified by the corresponding team size row and deck composition column. Click on these charts to increase the team size of P1. The first group of charts shows the winning rate of P1 after 10'000 simulated games. The second group shows the average duration in turns of the simulated games. A gray circle denotes that some games might not end within 50 turns. No statistics are listed for these matches.

Mono Orange (from 1-tall to 5-wide)

6 🄾🄾, 28 🄾, 6 🅆

Figure 5

P1 is playing the mono orange deck in these charts. P2 plays the archetype identified by the corresponding team size row and deck composition column. Click on these charts to increase the team size of P1. The first group of charts shows the winning rate of P1 after 10'000 simulated games. The second group shows the average duration in turns of the simulated games. A gray circle denotes that some games might not end within 50 turns. No statistics are listed for these matches.

Given the amount of information already contained in these charts, the analysis of this dataset will be left to forthcoming notes. But there are some features we can already start recognizing. For example, the well-known fact that wider aggro archetypes beat tall aggro archetypes is already recovered, even under our initial and very rudimentary conditions and assumptions. Other times these numbers show predictions that seem to contradict our real-life experience. Anyone who has played one of those games with Metroplex in which they felt like a humongous punching bag will recognize that there's no room for those losses in these charts. 1-tall, balanced archetypes would give us the upper hand in almost any matchup under our current assumptions. What we're seeing at work in this case is the effect of having set a fairly low upper bound on the highest attack total, which is not very representative of characters with higher than normal stats, or Bold counts greater than 3. (A similar argument holds for defense and Tough.) The analysis of these extreme scenarios will be the subject of more detailed simulations, especially because these extreme cases are exactly the ones we often try to achieve. The shift induced by incremental changes in our parameters should prove very informative about the key features that may determine a real advantage in a given matchup.

But the most evident feature in these charts is probably the high number of games that don't end within 50 turns (gray and red circles, as discussed in the captions of our figures). This is especially common in blue vs. blue matchups, and reflects the known risk of boosting defense values too much. Pierce is clearly the way around this problem, and will be the subject of more sophisticated future simulations.
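The gray and red markers amount to a simple classification of each matchup. It can be sketched as follows, assuming each simulated game is stored as a `(p1_won, turns)` pair with `turns` set to `None` when the game is still running at the 50-turn cap. This encoding and the `summarize` helper are hypothetical, chosen only to illustrate the convention used in the captions.

```python
from statistics import mean

TURN_CAP = 50  # games still running at this point count as unfinished

def summarize(results):
    """Summarize one matchup.

    results: list of (p1_won, turns) pairs; turns is None when the game
    did not end within TURN_CAP turns (hypothetical encoding).
    """
    finished = [(won, turns) for won, turns in results if turns is not None]
    if not finished:
        return {"marker": "red"}    # no game ended within the cap
    if len(finished) < len(results):
        return {"marker": "gray"}   # some games did not end: no statistics
    return {
        "marker": None,
        "p1_win_rate": 100 * mean(1 if won else 0 for won, _ in finished),
        "avg_turns": mean(turns for _, turns in finished),
    }

print(summarize([(True, 10), (False, 20)]))
print(summarize([(True, 10), (None, None)]))
print(summarize([(None, None), (None, None)]))
```

Withholding statistics for gray matchups avoids quoting a winning rate that only reflects the games that happened to finish.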