Publications


Best-response Play in Partially Observable Card Games

Frans A. Oliehoek, Matthijs T. J. Spaan, and Nikos Vlassis. Best-response Play in Partially Observable Card Games. In Benelearn 2005: Proceedings of the 14th Annual Machine Learning Conference of Belgium and the Netherlands, pp. 45–50, February 2005.

Download

pdf [115.6kB]  

Abstract

We address the problem of how to play optimally against a fixed opponent in a two-player card game with partial information like poker. A game theoretic approach to this problem would specify a pair of stochastic policies that are best-responses to each other, i.e., a Nash equilibrium. Although such a Nash-optimal policy guarantees a lower bound to the attainable payoff against any opponent, it may not necessarily be optimal against a fixed opponent. We show here that if the opponent's policy is fixed (either known or estimated by repeated play), then we can model the problem as a partially observable Markov decision process (POMDP) from the perspective of one agent, and solve it by dynamic programming. In particular, for a large class of card games including poker, the derived POMDP consists of a finite number of belief states and it can be solved exactly. The resulting policy is guaranteed to be optimal even against a Nash-optimal policy. We provide experimental results to support our claims, using a simplified 8-card poker game in which Nash-policies can be computed efficiently.
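The reduction described in the abstract — once the opponent's policy is fixed, the reachable belief states form a finite MDP that dynamic programming solves exactly — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation; the toy two-action game and all names are assumptions.

```python
# Illustrative sketch: with the opponent's policy fixed, a partially observable
# card game reduces to an MDP over a finite set of belief states, which value
# iteration solves exactly. The tiny game below is hypothetical.

# Transition model: trans[belief][action] -> list of (prob, next_belief, reward).
# Terminal beliefs map to an empty dict.
trans = {
    "start": {
        "bet":  [(0.5, "win", 1.0), (0.5, "lose", -1.0)],
        "fold": [(1.0, "lose", -0.5)],
    },
    "win": {},
    "lose": {},
}

def value_iteration(trans, gamma=1.0, eps=1e-9):
    """Exact dynamic programming over the finite belief-state MDP."""
    V = {b: 0.0 for b in trans}
    while True:
        delta = 0.0
        for b, actions in trans.items():
            if not actions:          # terminal belief: value stays 0
                continue
            best = max(
                sum(p * (r + gamma * V[nb]) for p, nb, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - V[b]))
            V[b] = best
        if delta < eps:
            return V

V = value_iteration(trans)
# At "start", betting has expected value 0.5*1 + 0.5*(-1) = 0, while folding
# yields -0.5, so the best response is to bet and V["start"] == 0.0.
```

Because the belief set is finite, the iteration terminates with the exact optimal values rather than an approximation, mirroring the paper's claim that the derived POMDP can be solved exactly.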

BibTeX Entry

@InProceedings{Oliehoek05Benelearn,
    author =       {Frans A. Oliehoek and Matthijs T. J. Spaan and Nikos Vlassis},
    title =        {Best-response Play in Partially Observable Card Games},
    booktitle =    Benelearn05,
    month =        feb,
    year =         2005,
    pages =        {45--50},
    abstract =     {
We address the problem of how to play optimally against a fixed
opponent in a two-player card game with partial information like
poker. A game theoretic approach to this problem would specify a pair
of stochastic policies that are best-responses to each other, i.e., a
Nash equilibrium. Although such a Nash-optimal policy guarantees a
lower bound to the attainable payoff against any opponent, it may not 
necessarily be optimal against a fixed opponent. We show here that if 
the opponent's policy is fixed (either known or estimated by repeated
play), then we can model the problem as a partially observable Markov
decision process (POMDP) from the perspective of one agent, and solve
it by dynamic programming. In particular, for a large class of card
games including poker, the derived POMDP consists of a finite number
of belief states and it can be solved exactly. The resulting policy is
guaranteed to be optimal even against a Nash-optimal policy. We
provide experimental results to support our claims, using a simplified
8-card poker game in which Nash-policies can be computed efficiently.}
}

Generated by bib2html.pl (written by Patrick Riley) on Sat Dec 09, 2017 12:30:44 UTC