Friday, Apr 18, 2014

Game Balance And Yomi

To celebrate the release of Yomi on iPad, I'll tell you some stories about balancing Yomi. First I'll give you two myths about game balance, then I'll tell you about tier lists and matchup charts, and then a bunch of specific balance problems we had to solve in Yomi.

Game Balance Myth 1: If it's too well balanced, it's boring.

I understand where this one comes from. Game balance is really hard, so if you had a cast of characters (or RTS races, or card decks, or whatever) and some of them were too good vs other ones, what should you do? The easiest thing is to smooth out anything that's too different. Make things more and more homogeneous until it's fair. Yeah that's one approach, but it makes things boring because you're losing out on the fun that asymmetry is supposed to offer. The harder way is to try to preserve as much asymmetry as possible AND to make it fair. When we do things the hard way, the good way, it doesn't make things boring. Furthermore, balanced just means the matchups are fair. It doesn't say anything about the dynamics of how interesting the game is. A balanced game could be boring or interesting, and if you had a really interesting game, it's better if it's balanced well than if it isn't.

Game Balance Myth 2: Sirlin only cares about balance.

From the outside, I can see why someone would think that because I work on games that require a lot of balance work. But the testers who work with me would laugh at this. I'm the one always pushing back on balance changes because other things are more important: good flavor (mechanics expressing the right personality), good dynamics, and elegance. I want fewer words, fewer elements, things to be as simple as we can get away with, and for characters to feel right. If you allow balance to rank higher than those things, you get a terrible feeling game. You get stuff like the huge guy made of rock having fewer hit points than the young ninja girl. If you make only balance changes that respect all the constraints I mentioned, it's hard work but you can still have a balanced game.

Measuring Balance

At first, I think it's best to get tier lists from testers. That's where they put all the characters in a few tiers (groups) to say which characters are all pretty much tied for strongest, which are tied for next strongest, etc. The goal isn't to eliminate tiers, because even if you had a 100.00% perfectly magically balanced game, testers would still say there are tiers because of their imperfect perceptions, and that's fine. Tiers help you get a sense of what's going on with balance though.

A helpful format is:

God Tier (S rank). Any character here is brokenly good, above the maximum level that should be allowed, and obsoletes the other characters.
Top Tier (A rank). The group of strongest characters. Being here doesn't mean there's any problem.
Mid Tier (B rank). These characters are noticeably weaker than the top tier, but still very useable.
Bottom Tier (C rank). These characters are noticeably weaker than the mid tier. They are still useable.
Garbage Tier (F rank). Any character here is too weak to bother with. Something really went wrong and they need a boost to become a real part of the game again.

Players are going to disagree and argue, but there will also be some low-hanging fruit here. Even if everyone is arguing about whether CharacterX is high or mid, they might pretty much all agree that CharacterY is garbage or CharacterZ is God tier. The first thing to fix here is to nerf anything in God tier (since even a single thing there ruins the game). The next thing is to buff anything in garbage tier. After that, try to compress the tiers so that being a tier below only means you're barely worse, not like hugely worse.

Matchup Charts

The next level of zooming in on balance is a matchup chart. (I think many asymmetric games don't even do this part?) That's where you create a grid of every character vs every character and then give a rating to how difficult the matchup is. The notation is stuff like 6-4 or 7-3. For example, 6-4 means that if two experts played 10 games, we expect the expert using CharacterA to win 6 and the opponent using CharacterX to win 4.

It's actually best not to use numerical data to determine these numbers. Yes, really. It's faster and more accurate to get to the bottom of things by relying on expert opinions, and then having those experts argue and then play each other to sort out disagreements. Think of matchup chart numbers as a kind of shorthand for this:

10-0. Not possible to lose when you play how you should, which you can always do.
9-1. Horrifically bad matchup. Impossible to lose unless something very unlucky happens.
8-2. Really hard for the other player. Multiple "miracles" required each game for the disadvantaged player to win.
7-3. Very hard for the other player. Clear disadvantage for them, but they can still win.
6-4. Some advantage for you. Pretty close overall.
5.5-4.5. Very close match, but you can slightly detect an advantage.
5-5. No advantage to either character.

I want to emphasize just how important it is to get expert opinions on this, rather than adding up numbers from matches. Experts can get a good sense of what's going on in a match much, much sooner than data will reflect. I mean like months or years sooner, even. Imagine two experts played a certain matchup 20 times and the more they played, the more unfair it got. In our example, there is a certain way of playing that the other character just can't deal with and both players are coming to realize that truth more and more. It's entirely possible that they (correctly!) declare it an 8-2 matchup even though their results are nowhere near that bad. Lots of their games were played before they fully understood what's going on. And if we lump in the data from anyone other than experts, it's likely to be worse than ignoring it because they probably aren't playing the match well enough.

With 20 characters, that's 210 matchups (190 non-mirror matchups) so if every non-mirror matchup was played 20 times, that's 3,800 games. Wow is that a lot to even do a first pass with the numerical method. And you get extremely bad data if you do. Let's say a matchup is really 5-5 and you're lucky enough to have found two expert players of those characters with about equal skill. The chance that result will be 10 games to 10 is just 18%. Finding catastrophically wrong results (the chance of a player winning 14 games or more, indicating a 7-3 MU or worse) is 12%. You're really better off just asking the experts, letting them argue, and letting them sort it out by playtesting, and that's what we do. Also, game balance changes a lot during development and the intuition of experts can keep up with that, but a numerical method would need to keep starting from scratch after every change. That happens hundreds of times.
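
The arithmetic in that paragraph can be checked with a few lines of Python. This is only a minimal sketch, assuming each game is an independent coin flip between two equally skilled experts (a simplification of real play); the function names are made up for illustration.

```python
from math import comb

def matchup_count(n_chars, include_mirrors=True):
    """Number of distinct pairings among n_chars characters."""
    non_mirror = comb(n_chars, 2)
    return non_mirror + n_chars if include_mirrors else non_mirror

def prob_exact(n_games, wins, p=0.5):
    """Binomial chance that one player wins exactly `wins` of n_games."""
    return comb(n_games, wins) * p**wins * (1 - p) ** (n_games - wins)

def prob_either_wins_at_least(n_games, threshold, p=0.5):
    """Chance that either player reaches `threshold` wins.

    Assumes threshold > n_games / 2, so both players can't reach it at once.
    """
    first = sum(prob_exact(n_games, k, p) for k in range(threshold, n_games + 1))
    second = sum(prob_exact(n_games, k, 1 - p) for k in range(threshold, n_games + 1))
    return first + second

print(matchup_count(20))                              # 210
print(matchup_count(20, include_mirrors=False) * 20)  # 3800 games at 20 per matchup
print(round(prob_exact(20, 10), 3))                   # 0.176, about 18%
print(round(prob_either_wins_at_least(20, 14), 3))    # 0.115, about 12%
```

So even under these idealized assumptions, a truly even matchup shows up as exactly 10-10 less than one time in five.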

That said, if you like numerical analysis of balance, there's like 35,000 games played in the dataset for the posts about Yomi Season 1 and Season 2 online rankings. (And many more than that since then.)

Here's Yomi's matchup chart as of today. Of course it slightly changes as players gain more and more understanding, but it's fairly stable:

This chart currently has 0 matchups of 7-3 or worse anywhere in the 210 matchups. The highest point total of any character is only +6.5 with a cast of 20, and the lowest is only -6. To put it into perspective, I'm not aware of any asymmetric game with 10+ sides that is even close to that. Let's develop some perspective about these kinds of numbers by looking at other games.

Japanese Super Turbo Street Fighter chart, Arcadia (source)

Note that this chart doesn't even have the character O.Sagat in it. It really should to give a more accurate picture, because O.Sagat is undeniably top tier and has a ton of extremely lopsided matchups. The US version of this chart puts him at +30. Even without those lopsided matchups that aren't in the chart, the top character here has a whopping +27 point total advantage, and the bottom character has a huge -22.5 point total disadvantage. That's with a game with 16 characters, so having more than +16 or -16 is pretty big. This is from a beloved tournament game that's been played almost 20 years now and is considered pretty balanced. Balance is hard.

SF HD Remix Consensus Chart (source)

Here's a matchup chart for SF HD Remix. I was in charge of that game, and we improved on the balance of Super Turbo Street Fighter by making very specific tweaks based on the decade+ of tournaments for ST.

So here, the top characters are +10 overall while the bottom is still -22.5. (I actually really disagree with the bottom 5 characters' matchups there. All of them are generally more favorable than that chart says, and I think the players polled there had not yet adapted to the many new tools of those characters.)

Here's Street Fighter 3: 3rd Strike's matchup chart, according to Japanese players (source). (It's very similar to the US matchup chart). 

Oh my, the top character is +32 points and bottom is -36 in a game with only 19 characters. Well that's incredibly bad.

More Than Just The Total

Even though so far I've only mentioned the overall point spread, it's important to point out that's not actually a good indicator. That just gives you a really zoomed out view of the whole thing, so a very bad overall spread does tell you something, but looking more closely can tell you a lot more. Do any characters have no bad matchups? What percentage of the chart is made up of lopsided matchups? Like imagine a game where the top character is +0 and bottom character is -0, but all matchups are 8-2. That's a pretty badly balanced game because every single matchup is incredibly unfair, lol. So you have to look past the overall totals. Lots of effort in the development of Street Fighter HD Remix was to specifically correct many of the bad matchups, and it worked out very well (example improved matchups: Guile vs. Dhalsim, Cammy vs. Blanka, Cammy or Fei Long vs Honda, Zangief vs. Vega, Everyone vs. O.Sagat.)
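
To make that thought experiment concrete, here is a small sketch with a made-up three-character chart where every matchup is 8-2 in a rock-paper-scissors loop. It assumes a character's point total is the sum of (rating minus 5) over its non-mirror matchups, which is consistent with the totals quoted above but is still an assumption. The character names, helper names, and the 7-3 threshold parameter are all hypothetical.

```python
# Hypothetical chart: each value is the first character's expected wins
# out of 10 against the second character.
chart = {
    ("Rock", "Scissors"): 8.0,
    ("Scissors", "Paper"): 8.0,
    ("Paper", "Rock"): 8.0,
}

def full_chart(chart):
    """Add the reversed entry (10 - rating) for each listed matchup."""
    full = dict(chart)
    for (a, b), rating in chart.items():
        full[(b, a)] = 10.0 - rating
    return full

def point_totals(chart):
    """Assumed definition: sum of (rating - 5) over a character's matchups."""
    totals = {}
    for (a, _b), rating in full_chart(chart).items():
        totals[a] = totals.get(a, 0.0) + (rating - 5.0)
    return totals

def lopsided_fraction(chart, threshold=7.0):
    """Share of non-mirror matchups at `threshold` or worse for one side."""
    lopsided = sum(1 for r in chart.values() if max(r, 10.0 - r) >= threshold)
    return lopsided / len(chart)

def characters_without_bad_matchups(chart):
    """Characters whose every matchup is 5-5 or better."""
    full = full_chart(chart)
    names = {a for a, _b in full}
    return sorted(
        n for n in names
        if all(r >= 5.0 for (a, _b), r in full.items() if a == n)
    )

print(point_totals(chart))                     # every total is 0.0
print(lopsided_fraction(chart))                # 1.0, every matchup is lopsided
print(characters_without_bad_matchups(chart))  # []
```

The totals look perfect while the other two measures show how unfair every individual game is, which is why the overall spread alone isn't enough.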

For another example of looking at specific matchups, check out GGXX Accent Core. Here's Ogawa's ratings for the character Eddie as of July 2009 in Arcadia Magazine. Ogawa is the best Eddie player in the world.

6:4 – Sol, Ky, May, Millia, Chipp, Faust, Baiken, Axl, Venom, Testament, Slayer, I-No, Bridget, Robo, Anji, Jam
6.5:3.5 – Zappa, Dizzy
7:3 – Potemkin, Johnny, A.B.A, Order-Sol

In other words, out of 22 non-mirror matchups, Eddie has four 7-3s and no matchup at all worse than 6-4. That's considered "way way too powerful."
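
As a rough illustration, Ogawa's Eddie ratings can be tallied the same way as the overall point totals, again assuming a point total is the sum of (rating minus 5) over the listed opponents. The variable names are just for this sketch; the ratings and names are the ones in the list as printed above.

```python
# Eddie's expected wins out of 10 against each listed opponent.
eddie_ratings = {
    6.0: ["Sol", "Ky", "May", "Millia", "Chipp", "Faust", "Baiken", "Axl",
          "Venom", "Testament", "Slayer", "I-No", "Bridget", "Robo",
          "Anji", "Jam"],
    6.5: ["Zappa", "Dizzy"],
    7.0: ["Potemkin", "Johnny", "A.B.A", "Order-Sol"],
}

# Count the listed opponents, sum the advantage, and find the worst rating.
matchups = sum(len(opponents) for opponents in eddie_ratings.values())
total = sum((rating - 5.0) * len(opponents)
            for rating, opponents in eddie_ratings.items())
worst = min(eddie_ratings)

print(matchups)  # 22 listed opponents
print(total)     # 27.0 under the assumed definition
print(worst)     # 6.0, so nothing worse than 6-4
```

Under that assumed definition, one character with zero losing matchups ends up around +27 on his own.
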
"But these are all fighting games," you say. Fighting games are a good source of data for asymmetric games with lots of characters. They give you a sense of what kind of balance or imbalance is still ok. But let's look at some card games now.

Summoner Wars. This chart is only for the first 10 factions, compiled from tournament players (source):

This chart used to have 36% of non-mirror matchups as 7-3 or worse. As of today, it's only 22% of non-mirror matchups at 7-3 or worse, so I guess some new gameplay tech was figured out in some matchups. That's sort of an unfortunate number of bad matchups still though. And +13 is pretty high for only 10 sides, but it's not as bad as +30 with 16 characters or +32 with 19 characters that we saw in other games, above.

Magic: the Gathering. I actually can't find data on this, but I would LOVE to see a matchup chart of the top 10 decks in a given type 2 format. Yeah I know what you're going to say to that. First, it's "not fair" to pick 10 decks when a format can really only support like 4 or 5 decks that have a chance. And second, that even if you were to restrict the chart to those 4 or 5, that of course they have lots of bad matchups against each other because it's the nature of customizable card games to have bad matchups.

The thing is, it's still fair to measure. If you sit down to the table in a competitive card game, or in a fighting game, or any game really and you already have a huge 8-2 disadvantage then that sucks. And it's nice to know just how much of that there is in a game. Furthermore, it actually really is fair to pick the top 10 decks. Yomi has 20 characters, but think about back when it had 10. That's 55 cards per character (or should we only count it as 14 different cards? I don't know, but that would make this even more favorable.). Anyway that's 550 cards total for 10 characters. If it's basically impossible to make a cardpool of a customizable game of 550 cards that results in 10 decks, NONE of which have 7-3 matchups against each other...then you can see why I made Yomi a fixed deck game. It's precisely to get such a large number of distinct decks that are all fair against each other.

So if someone can create an MtG matchup chart of the top 10 decks in a format, it would be really interesting to see. I actually don't know how many lopsided matchups it would have, so I'm curious. (Goblins vs a deck with protection from red that existed in the same format?) And incidentally, this is why I had to make Codex work differently than any other CCG, specifically to prevent the tons of unfair matchups that are common in the genre.

BattleCON. It's cool that there's another fighting game inspired card game out there. There's a whole lot of characters in BattleCON, almost 50. I was not able to find a consensus matchup chart for the game though. I did find this attempt by the player community to create one based on numerical data. It actually illustrates my point very well about why it's bad to add up match results rather than get the experts to come together. That chart has THE MAJORITY of matchups listed as 100% win or 0% win. Yes, really.

That doesn't speak to the balance being catastrophically bad though, it just means that mountains of more data would be needed to get anywhere with that method. (And even then, how much would be players playing those matches suboptimally?) So how many 7-3 matchups does BattleCON have? I honestly have no idea and I wasn't able to find information on this. Thing is, the only way to eliminate those is to have a whole lot of experts develop that chart and then start iterating on improvements. I can't tell you if that's a priority of BattleCON or not, but it seems like this data should be out there if it was? shrug.

I don't want to speak on behalf of what BattleCON's priorities are or aren't, because I honestly don't know. But lots of games, such as League of Legends, put releasing tons of characters at a much higher priority than making sure all the characters and matchups they currently have are fair against each other. Is that "better"? That's up to you, but it practically guarantees lots of matchup problems and past a certain point, doesn't exactly improve things, in my opinion.

Back to Yomi. Check out that matchup chart again:

No 7-3 matchups, in a 20 character game with 210 matchups. (Maybe a few players think there's 1 or 2? Even in that case, it's just very, very few and would still be fewer than in other games I'm aware of.) This doesn't happen by magic or even luck. It takes...years. We started testing the Yomi expansion characters online on fantasystrike.com over 3.5 *years* ago. We added rules enforcement for them about a year ago. The matchup chart had a lot of bad matchups that we addressed one by one over all this time. Here's some stories so you can see some of the specifics.

Yomi Balance Stories

Bal-Bas-Beta, Pesky Balance Robot. BBB is a really interesting character in that he's so different from all the others. He has a mechanic called Long Range that lets him push the opponent far away and then force the opponent to guess right just to "get back in" and be able to deal damage again. Meanwhile, BBB can still damage the opponent with a few moves from Long Range. Think of Dhalsim keeping you out in Street Fighter. All of this makes BBB the most complicated character to understand, but he's not actually that difficult to play once you understand him. That said, he had various unfair matchups over the years and it's difficult to fix them because any change to how Long Range works affects a lot of stuff. Here's the current version:

  

There have been many versions of Long Range, different end conditions and timings for it. I'll skip the details of all that, but each one solved more problems than the previous one at least. But...

At one point, Troq (a grappler) had way too much trouble getting in. By giving him a new property called "Troq Armor" on his Jack which acts like "super armor" in a fighting game, it gave him one more answer to being at range. It turns out this makes a big difference and helps that match a ton.


 

On the flipside, Jaina completely wrecked BBB. Specifically, her ability to play her 2 over and over forever while buying it back left BBB with not nearly enough options. It cost her the same amount of life to buy back her 2 as the block damage it deals. In the other 19 matchups in the game, this is pretty much ok, but not vs BBB. We had to change her block damage numbers just for this matchup.

Oh and also Long Range used to end if you dealt blocked damage to BBB. That was ok in pretty much all matchups except Jaina too. Now BBB simply doesn't take any blocked damage at range. And one more thing: Long Range used to end if BBB took damage from abilities, which means Jaina's Smoldering Embers ability wrecked him too. Keep-away robot hates fire girl. Or at least he did for a long time.

Zane

Zane is sort of like Bison in Street Fighter: all offense and bad defense. For this reason, he didn't have a fast reversal move like a dragon punch. In the Yomi game system, he really needs SOMETHING though to deal with fast attacks, so he has a move called Crash Bomb that can still hit the opponent even if the opponent does a fast move. It's a bad reversal basically. Except, it was accidentally the best reversal in the game because of how it worked.

That was a total failure of design right there. I fixed that to work how it should and be a bad reversal that you use if you need to, not your main source of damage every game. Wow did Zane players hate that. But Zane players didn't really understand that it's a bad idea to improve balance when it's based on a wrong thing to begin with. If you do that, you end up with a well balanced game full of really bad flavor (or bad dynamics). So step 1 is correct the design issue, which I did. Here's the current version:

 

 

In order to give some power back, I then gave him "Meaty Attacks." That's a dumb term from fighting games that means attacks you do against an opponent who is getting up from a knockdown. Think of it this way: if you have a really slow move, usually you'd get hit out of it. But if you do that slow move as a "meaty attack" (while the opponent is knocked down) then you can get past the startup of it while the opponent is still on the ground. By the time they stand up, you're already to the point in your attack where you can actually hit them, so your attack FEELS fast to the opponent. They're forced to stand up right as your punch hits. Zane's meaty attacks ability speeds up his normal moves to a very fast speed 1.0.

 

Here's the problem: how easily Zane can knock you down becomes hugely important. The more he can, the more he can do speed 1.0 attacks against you. Tiny changes in how much or little he can knockdown result in large swings in power. It was difficult to figure out the amount of knockdown he needs to be fair. (Answer: very little! It's super strong.)

Another problem: Zane meaty attacks vs. whoever is the slowest in the game. It turns out Gloria and Quince generally have the slowest moves. Gloria (a healing character) is one of the most complicated characters in the game. She needs to be slow to make up for the many various tricks she has. We tried some structural changes to help this matchup, but in the end, we just had to make her Queen speed 1.0 specifically for this matchup.

Quince is the other slow character. Like Gloria, he is complicated and has tricks to make up for that speed. It turns out that vs Zane, he can just use his tricks. Quince can literally do unavoidable unblockables against Zane because of Zane's bad defense, but it costs a lot and isn't that frequent. Zane can more frequently do fast attacks vs Quince that Quince has a lot of trouble with. That kind of evens out so no change was needed there.

[Image: Persephone's Mistress's Command in Puzzle Strike]

Persephone. Persephone is one of the other most complicated characters to play (3 hardest are Persephone, Gloria, and Quince imo). Her Mistress's Command move was very, very tricky to get right in Puzzle Strike. It had more changes than most chips and was still changing right up until shipping. In Yomi, it started out controlling the opponent's entire turn when it hit. Over time, it controlled less and less. Now it controls just up to the point where they reveal their combat card; you can choose that for the controlled opponent, but you can't make them play a bluff card or a whole useless combo, or power-up for Aces for them. It turns out that despite all these nerfs, the move is still incredibly powerful.

The more powerful older version of the move cost two Aces to use. These days it only costs one Ace. She can do it more often, but it's not as crushing when she does. Also, there is kind of a neat property it has where if you happen to land it, yeah that's good, but if you really set things up exactly right (which takes work and requires backup from another ability) then you can completely lock them down. It's hard to really pull that off, but it's possible, and it seems fitting for a dominatrix. Her matchups are still pretty fair, it's just that when you lose to her, you REALLY lose hard. Here's her current Mistress's Command, by the way:


 

Vendetta

Here's a simpler story. Vendetta's innate was kind of complicated. Then it was kind of complicated in a different way. Then in yet a different way. Then it was just too junky so I made it very simple by deleting most of it but keeping the part that lets him poke a lot because that's the point of his character. Then he was too weak for a long time. So we increased his damage, and now he's fine. Done.

Midori

Midori can transform into a Dragon, and when he does, his Dragon attacks beat all dodges. That was always fine until Quince came along and had special dodges that he relies on to even function. Remember, Quince is slow, so that means Midori's Dragon attack Queen being fast means "Quince: you basically can't do anything." Dragon Form now only turns off normal dodges, so it doesn't turn off Quince's special dodges. This affects a couple other matchups, but the main thing is it fixes the problem in Midori vs. Quince.

More On Gloria

The healing character again. This time, it's her move called Healing Sphere. This is an incredibly strong ability that buffs Gloria as long as she can keep her Sphere going. It's inspired by Dark Phoenix's similar move in Marvel vs. Capcom 3. The ability lets her draw extra cards, but you should keep in mind that she can use those cards as part of her "engine" to recur previously used cards and to heal. Here's Healing Sphere:


 

Gloria used to lose her sphere when she was knocked down. I fought long and hard to keep it that way. Some characters can knock her down more easily than others, which is exactly WHY I wanted it that way. It's good when matchups are different and things are more diverse rather than more homogeneous. There's a poisonous word in our testing community called "variance" though where testers frame boring homogeneity as a desirable property. Wow is that not the goal.

Ideally, some characters really are worse at removing her sphere but they are able to make up for it in different ways. In practice, this caused unfair matchups that we just weren't able to fix. So of the 100 times people wanted to make the game more samey, this is the one I had to give in on. It's now removed when she's thrown. Characters do vary quite a lot in how good their throws are so there is still texture that's different across matchups, but all characters can throw her one way or another. Some characters just had an awful time knocking her down. Also, after this was changed Gloria's Queen throw became absurdly too good. It was a fast throw, which is normally "very good" but in the hands of a character you need to throw to disable her incredible Healing Sphere buff, Gloria having a throw that always beats yours was crazy. So that's fixed too.

Lum / Argagarg / Persephone / Gloria

Here's a tangled mess, check this out. Argagarg's main thing is blocking. Even when he blocks, you slowly die. He has a super block that powers him up so you die even faster when he blocks some more. Lum is Gambling Panda who likes to attack a lot. Blocking beats attacking. That's a tough match. Lum can also do out-of-combat damage with his poker tricks, so maybe that can make up for the difference? Usually the answer would be yes, but Argagarg also has a counter. He can use that to prevent Lum from doing much with his poker abilities.

This was claimed to be one of the most unfair matchups in the game. Maybe we can buff Lum? Actually no. Lum is one of the best characters now. All characters are almost an identical power level because the tiers are soooo compressed, but Lum happens to be near the top of that very small range of power. So buffing him is kind of bad. Can we nerf Argagarg? His other matchups are currently fair, so nerfing him would make him too weak. So now we have to make Lum better in one match, but not better in any other? That's pretty hard to do.

Many things were proposed. Many problems with all of them (sometimes flavor problems, or elegance problems, or logistics, or balance problems in other matchups). Then more problems came up with Argagarg. People started claiming that Argagarg was too weak vs Gloria and possibly too weak vs Persephone. Gloria wants to build up a card engine that lets her heal a lot and then convert that to a win with her Overdose move. That's a super effective strategy against a stalling character like Argagarg. Remember, Gloria has a powerful Healing Sphere ability, so Argagarg wants to counter that. Except...even if he does, Gloria has the ability to fetch cards from her discard pile to get it back anyway. Persephone also has the ability to fetch things from her discard pile, making Argagarg's counter unusually weak in that match.

That's now THREE matchups that are all in trouble, all because of Argagarg's counter. But in one of the three (Lum vs Argagarg) we want the counter to be WEAKER. In the other two matchups (vs Gloria and vs Persephone), we want the counter to be STRONGER. This is the kind of thing that makes game balance truly tricky. It looks pretty much unsolvable. This constellation of issues came up for weeks and I kept reminding everyone, "currently, there is literally no workable suggestion on the table." When we finally had an idea that worked at all, it was immediately better than the literally nothing else we had.

 

If Argagarg's counter sends the countered card to the bottom of their deck, that makes it much stronger vs Gloria and vs Persephone. They can no longer immediately recur that countered card to their hand, so that helps a lot. What about Lum? Well, it's actually barely weaker against Lum, which is what we want. Lum can't recur cards from the discard pile, so it's not about that. It's that he might draw that card again. It goes to the bottom of the deck, but he can reshuffle any time from powering up and he draws more cards than most other characters, so he has a better chance than most would of running Argagarg out of counters.

Lum could also choose not to power up and still get that card back. I changed his innate slightly so that sometimes it lets him draw the bottom card of his deck. Lum players can try to maximize that to get his previously countered card back in their hand again.

We wanted another slight buff though, something that buffs Lum in a way that doesn't matter in any matchup except when he fights Argagarg. Does such a thing exist? Yes it does. Lum's 10 throw also has his poker ability on it, the one we just talked about. He doesn't really want to use it as a throw, but vs Argagarg he is ok with doing that. So we buffed the damage on the 10 throw only, giving it the name Extra Juice and allowing him to pump it up a bit more, so that he can sneak more damage in just for that matchup. So a player can try for either playing too many copies of the card per game for Argagarg to counter them all, or try for just more throw damage to make Argagarg block less, and to threaten dealing lethal damage a little bit earlier.

Conclusion

Game balance is hard, and it takes a long time. From tier lists to matchup charts, you have to iterate on it a lot to really have any hope of good balance. In Yomi, we did all that work and I don't really know of other games (with 10+ sides) with a matchup chart that close, though maybe there is one somewhere? I'm really happy with what we've been able to achieve, in any case, and I hope you all enjoy the results.

Balance itself doesn't make a game fun, though. We actually didn't talk at all about why Yomi is fun or strategic. I think it has a lot going for it in both those areas, and that it's an excellent strategy game for tournaments and casual play, but it took all these words just to cover the balance part! We'll have to cover the strategy stuff another time. In the meantime, you could check out the hours of streaming of the finals of our online tournament last January. There were 18 weeks of qualifying tournaments to compete in these finals!

Again, I hope you all enjoy Yomi. It's available for iPad right now, and in print-and-play form as well. If you get the new iPad version, remember to rate the app! Thanks.