Publications

Learning in POMDPs with Monte Carlo Tree Search

Sammie Katt, Frans A. Oliehoek, and Christopher Amato. Learning in POMDPs with Monte Carlo Tree Search. In Proceedings of the 34th International Conference on Machine Learning (ICML), pp. 1819–1827, August 2017.

Abstract: The POMDP is a powerful framework for reasoning under outcome and information uncertainty, but constructing an accurate POMDP model is difficult. Bayes-Adaptive Partially Observable Markov Decision Processes (BA-POMDPs) extend POMDPs to allow the model to be learned during execution. BA-POMDPs are a Bayesian RL approach that, in principle, allows for an optimal trade-off between exploitation and exploration. Unfortunately, BA-POMDPs are currently impractical to solve for any non-trivial domain. In this paper, we extend the Monte-Carlo Tree Search method POMCP to BA-POMDPs and show that the resulting method, which we call BA-POMCP, is able to tackle problems that previous solution methods have been unable to solve. Additionally, we introduce several techniques that exploit the BA-POMDP structure to improve the efficiency of BA-POMCP, along with proofs of their convergence.

BibTeX Entry:

@inproceedings{Katt17ICML,
  title     = {Learning in {POMDPs} with {Monte Carlo} Tree Search},
  author    = {Sammie Katt and Frans A. Oliehoek and Christopher Amato},
  booktitle = ICML17,
  OPTseries = {Proceedings of Machine Learning Research},
  year      = 2017,
  pages     = {1819--1827},
  month     = aug,
  wwwnote   = {Long version including full proofs available at arXiv: \url{http://arxiv.org/abs/1806.05631}},
  url       = {http://proceedings.mlr.press/v70/katt17a/katt17a.pdf},
  abstract  = {The POMDP is a powerful framework for reasoning under outcome and information uncertainty, but constructing an accurate POMDP model is difficult. Bayes-Adaptive Partially Observable Markov Decision Processes (BA-POMDPs) extend POMDPs to allow the model to be learned during execution. BA-POMDPs are a Bayesian RL approach that, in principle, allows for an optimal trade-off between exploitation and exploration. Unfortunately, BA-POMDPs are currently impractical to solve for any non-trivial domain. In this paper, we extend the Monte-Carlo Tree Search method POMCP to BA-POMDPs and show that the resulting method, which we call BA-POMCP, is able to tackle problems that previous solution methods have been unable to solve. Additionally, we introduce several techniques that exploit the BA-POMDP structure to improve the efficiency of BA-POMCP, along with proofs of their convergence.}
}
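The Bayes-adaptive idea in the abstract — Dirichlet counts over model outcomes become part of the planning state, and actions are chosen by Monte-Carlo simulation in the counted model — can be sketched in a few lines. This is an illustrative toy only, not the paper's BA-POMCP: the domain, names, and dynamics below are hypothetical, the toy state is fully observed (so it shows the Bayes-adaptive counts rather than the partial-observability machinery), and flat Monte-Carlo rollouts stand in for full POMCP tree search.

```python
import random
from collections import defaultdict

# Hypothetical two-state, two-action domain. The agent never sees these
# dynamics directly; it only keeps Dirichlet counts over observed outcomes.
STATES, ACTIONS = [0, 1], [0, 1]

def true_step(s, a):
    """Hidden ground truth: action 1 usually reaches state 1, which pays 1."""
    s2 = 1 if (a == 1 and random.random() < 0.9) else random.choice(STATES)
    return s2, (1.0 if s2 == 1 else 0.0)

class Counts:
    """Dirichlet counts chi(s, a, s') -- the 'Bayes-adaptive' part of the state."""
    def __init__(self):
        self.chi = defaultdict(lambda: [1.0] * len(STATES))  # uniform prior

    def sample_next(self, s, a):
        # Sampling s' in proportion to the counts simulates the learned model.
        return random.choices(STATES, weights=self.chi[(s, a)])[0]

    def update(self, s, a, s2):
        self.chi[(s, a)][s2] += 1.0  # posterior update after a real transition

def rollout_value(counts, s, a, depth=10, gamma=0.95):
    """Estimate Q(s, a) by a random rollout inside the count-based model."""
    total, discount = 0.0, 1.0
    for _ in range(depth):
        s = counts.sample_next(s, a)
        total += discount * (1.0 if s == 1 else 0.0)
        discount *= gamma
        a = random.choice(ACTIONS)
    return total

def plan(counts, s, n_sims=100):
    """Pick the action with the best average simulated return under the counts."""
    return max(ACTIONS, key=lambda a: sum(
        rollout_value(counts, s, a) for _ in range(n_sims)) / n_sims)

random.seed(0)
counts, s = Counts(), 0
for _ in range(200):            # act in the real world, then update the counts
    a = plan(counts, s)
    s2, r = true_step(s, a)
    counts.update(s, a, s2)
    s = s2
```

The exploration/exploitation trade-off the abstract mentions arises because the counts themselves are part of what planning reasons about: simulating an uncertain (s, a) pair both estimates its value and, in the full BA-POMDP treatment, accounts for the information gained by trying it.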
Generated by bib2html.pl (written by Patrick Riley) on Tue Nov 05, 2024 16:13:37 UTC