Hacker News | mikebjohanson's comments

There has been quite a bit of work (and several papers) on using CFR for games with >2 players. It is not theoretically guaranteed to converge to a Nash equilibrium, and usually doesn't in practice either (there's one known case, the toy game of 3p Kuhn poker, where it does converge). However, even though it doesn't converge to Nash, the algorithm is still well defined, and the resulting strategies have proven strong against other computer agents in the Annual Computer Poker Competition.


Nash equilibria are still guaranteed to exist. But it's only in the 2p zero-sum perfect-recall case that an equilibrium has useful properties, like being robust against any opponent strategy, including a worst-case opponent who knows your strategy. In a multiplayer (> 2 players) game, opponents can collude against you, and playing your part of a Nash equilibrium gives no useful guarantee on performance. Even if they aren't colluding, in poker games the presence of a bad player just before you in the turn order can hurt your EV more than it hurts theirs. And even if all players independently compute their own Nash equilibria (there can be many) and each uses the piece for their own position, that combination of strategies may not itself be a Nash equilibrium.


According to Wikipedia, and my memory from school, a Nash equilibrium doesn't exist unless all players are aware of the optimal strategy and are seeking it.


Thanks for the explanation. I'd also appreciate a reference to a good survey paper, if you're in the mood :)


I don't know of a survey paper on CFR for multiplayer, but it's been showing up in conference papers and theses.

Here's a link to a shorter conference paper where CFR does converge to a Nash equilibrium, in 3p Kuhn poker. It describes a family of equilibria where one player (the second to act, IIRC) has a parameter that can't affect their own EV (...or else it wouldn't be a Nash), but does determine how much the other two players win/lose from each other. This illustrates the problem with equilibria in multiplayer games: if you're player 1 or 3, then even if you're playing a Nash and everyone else is too (albeit different equilibria), you can still lose. https://webdocs.cs.ualberta.ca/~games/poker/publications/AAM...

For a longer read, the best I know of is probably Rich Gibson's PhD thesis. He focussed on CFR for multiplayer games.

https://webdocs.cs.ualberta.ca/~games/poker/publications/gib...


Thanks again. So I guess for multiplayer games a better approach would be reinforcement learning with no game theoretic heuristics?


Game theory is always applicable to games :-)

Nash equilibrium is just one aspect of some games.


Game theory is domain specific. Generic methods in AI tend to dominate domain knowledge over time. Although I agree that other game-theoretic techniques might help here.


Game theory is specific to the domain of agents optimizing outcome in adversarial, cooperative, or (rarely) solitary systems. That's a pretty big domain.


Hi - I'm one of the authors on this paper. We had professional designers do the figures for this paper; Fig. 3 in particular came out far better than anything we could have done ourselves.


Just working my way through the pseudocode to try and gain a little insight. I think there may be a small typo: on line 29, should VALUE be VALUES?


Sure. Like I mentioned in a later post, I'm an author on several of the CFR papers. It is related to RL, and there are a few ways of interpreting CFR. If you have an RL background, then CFR is kind of "RL using an advantage function, but it converges in adversarial games." If you have a multi-armed bandits background, then CFR is kind of "an independent multi-armed bandit at each decision point, all learning at the same time."
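To make the bandit view concrete, here's a rough sketch of the regret-matching rule at a single decision point. This is toy Python I'm writing for this comment, not code from any of the papers, and the names (RegretMatcher, update, etc.) are just illustrative:

    class RegretMatcher:
        # Regret matching at one decision point: play each action in
        # proportion to its positive cumulative regret.
        def __init__(self, num_actions):
            self.regret_sum = [0.0] * num_actions    # cumulative regret per action
            self.strategy_sum = [0.0] * num_actions  # accumulates the average strategy

        def strategy(self):
            # Current strategy: normalized positive regrets, falling back
            # to uniform if nothing has positive regret yet.
            pos = [max(r, 0.0) for r in self.regret_sum]
            total = sum(pos)
            n = len(pos)
            return [x / total for x in pos] if total > 0 else [1.0 / n] * n

        def update(self, action_values, opp_reach=1.0, my_reach=1.0):
            # action_values[a] is the counterfactual value of taking action a here.
            strat = self.strategy()
            node_value = sum(p * v for p, v in zip(strat, action_values))
            for a, v in enumerate(action_values):
                # Regrets are weighted by the opponents' probability of reaching
                # this decision -- the "counterfactual" weighting described below.
                self.regret_sum[a] += opp_reach * (v - node_value)
                # The average strategy is weighted by our own reach probability.
                self.strategy_sum[a] += my_reach * strat[a]

        def average_strategy(self):
            total = sum(self.strategy_sum)
            n = len(self.strategy_sum)
            return ([s / total for s in self.strategy_sum]
                    if total > 0 else [1.0 / n] * n)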

CFR differs from standard RL in a few ways:

- Exploring the tree. CFR has many variants, many of which focus on using sampling to trade fast, narrow, noisy updates against slow, broad, precise updates. The most sampling-heavy variant, called Outcome Sampling, traces a single path through the tree, as you would in RL. However, it doesn't converge nearly as quickly as other variants, like Vanilla (explore the whole tree on each iteration) or Chance Sampling (sample the cards, explore the whole action space). Since those variants consider all paths through the tree, you don't have an explore/exploit tradeoff anymore, as you normally would in RL.

- The "Counterfactual" part. Regret updates are weighted by the other agents' probability of taking you to a decision point. Or, equivalently: updates are weighted by the probability that you reach a decision, if you are trying to reach that decision (and all of your action probabilities along the way are 1). If you use Outcome Sampling (described above) then by sampling all actions that weight becomes 1, and thus less important. But for fast learning, the weighting becomes important.

- The strategy doesn't converge, but the average strategy does. Even in the limit, in an adversarial setting where all players are learning, the strategy produced from the regret values doesn't converge to Nash. If it were a single-player problem (either no opponents, or static opponents that can be treated as part of the environment), then as in RL the strategy would converge to an optimal response to the environment. But in an adversarial setting, you also need to track the average strategy, which is the only part guaranteed to converge to optimal (there's a toy illustration of this just below).
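To see that last point in a toy setting, here's a self-play loop on matching pennies, reusing the RegretMatcher sketch from above (again, illustrative code only, not from the papers):

    # Matching pennies: the row player wins +1 on a match, -1 otherwise,
    # and the column player gets the negation (zero-sum).
    PAYOFF = [[1.0, -1.0],
              [-1.0, 1.0]]

    row, col = RegretMatcher(2), RegretMatcher(2)
    for _ in range(100000):
        p, q = row.strategy(), col.strategy()
        # Expected value of each pure action against the opponent's strategy.
        row_values = [sum(q[b] * PAYOFF[a][b] for b in range(2)) for a in range(2)]
        col_values = [sum(p[a] * -PAYOFF[a][b] for a in range(2)) for b in range(2)]
        row.update(row_values)
        col.update(col_values)

    print(row.strategy())          # current strategy: keeps oscillating
    print(row.average_strategy())  # average strategy: approaches (0.5, 0.5)

The current strategies chase each other around the equilibrium forever; only the averages settle down at the 50/50 Nash.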


Have you read the book https://en.wikipedia.org/wiki/Blondie24? It seems like both of your approaches share a similar high-level structure of self-play and learning, although yours is much more nuanced. They used a neural network, updated by an evolutionary algorithm based on which AIs win against the others in self-play.


Is there a version of CFR for differentiable learners like neural networks?


The averaging part makes it sound like the usual RL self-play against regular checkpoints of oneself.


Hey, thanks for the mention! I've read Hacker News for years, and this seems like a good reason to de-lurk. If anyone has questions, I'm happy to answer them. I wrote that summary for a pretty broad target audience, and the technical details are fun to explore.


Not quite: it really does only go to Nash in a 2p zero sum game.

In a multiplayer zero-sum game, there's no theoretical proof that it should go to Nash. In the tiny game of 3p Kuhn poker (3 players, 4 card deck, enough chips for one bet) we found that it does go to Nash (here's the paper: https://webdocs.cs.ualberta.ca/~games/poker/publications/AAM...). But in a slightly larger toy game, 3p Leduc poker, we empirically found that it does not. Like you suggested, we did this with best response calculations. If you take the strategy profile CFR produces and then compute a best response in each position, those best responses each improve over the strategy profile. If you plot that loss over time while CFR is running, it becomes clear that it does not converge to Nash: the loss eventually bottoms out and does not improve further.
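For a small matrix game that best response check is just a few lines; here's a sketch of the measurement (the exploitability name is mine, and in the real poker games this is a full best-response walk of the game tree, not this matrix-game shortcut):

    def exploitability(payoff, p, q):
        # For a 2p zero-sum matrix game: how much the two players gain in
        # total by best-responding to the profile (p, q). Zero iff (p, q)
        # is a Nash equilibrium.
        rows, cols = len(payoff), len(payoff[0])
        # Row player's best pure response to the column strategy q.
        row_br = max(sum(q[b] * payoff[a][b] for b in range(cols))
                     for a in range(rows))
        # Column player's best pure response to the row strategy p
        # (the column player's payoff is the negation).
        col_br = max(sum(p[a] * -payoff[a][b] for a in range(rows))
                     for b in range(cols))
        # Expected value of the profile itself, from the row player's side.
        row_ev = sum(p[a] * q[b] * payoff[a][b]
                     for a in range(rows) for b in range(cols))
        return (row_br - row_ev) + (col_br + row_ev)

Feeding in the average strategies from the matching pennies sketch above drives this toward zero over iterations; plotting it against time is exactly the kind of loss-over-time curve I mentioned, and in 3p Leduc that curve flattens out above zero.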

However - in practice, the strategies produced by CFR for multiplayer games still appear to be quite strong, even if not Nash. This was the approach used by the University of Alberta for years in the Annual Computer Poker Competition, and it crushed the competition there. In 3p Limit Hold'em, the CFR strategies also held up well against human pros in informal testing by the U of A.

Nash equilibrium isn't a compelling goal in a multiplayer game anyway. In a 2p game it theoretically guarantees that you earn at least the game value (i.e., do no worse than tie if you rotate seats every game). In a multiplayer game, even if you could compute one, there's no such theoretical guarantee when all other players at the table are independently choosing strategies. Even if everyone at the table uses an independently computed Nash equilibrium, that set of strategies is not necessarily itself a Nash equilibrium (in 2p zero-sum games, it is).

To summarize - for multiplayer, it doesn't necessarily go to Nash, but you might not care, because it works great in practice anyway.

In a non-zero-sum game, you're right that CFR converges to a correlated equilibrium. But we can also theoretically bound how close it gets to a Nash for the non-zero-sum game, and empirically measure how close it gets by using best response analysis. The answer is: if it's only slightly non-zero-sum (e.g., subtracting the rake in a poker game for a negative sum, or pretending to add money to the pot to encourage aggressive play), then CFR still gets very close to Nash.


Thanks!

