Publications

Leveraging diverse offline data in POMDPs with unobserved confounders

Oussama Azizi, Philip Boeken, Onno Zoeter, Frans A. Oliehoek, and Matthijs T. J. Spaan. Leveraging diverse offline data in POMDPs with unobserved confounders. In Seventeenth European Workshop on Reinforcement Learning (EWRL), October 2024.

Abstract
In many Reinforcement Learning (RL) applications, offline data is readily available before an algorithm is deployed. Often, however, data-collection policies have had access to information that is not recorded in the dataset, requiring the RL agent to take unobserved confounders into account. We focus on the setting where the confounders are i.i.d. and, without additional assumptions on the strength of the confounding, we derive tight bounds for the causal effects of the actions on the observations and reward. In particular, we show that these bounds are tight when we leverage multiple datasets collected from diverse behavioral policies. We incorporate these bounds into Posterior Sampling for Reinforcement Learning (PSRL) and demonstrate their efficacy experimentally.

BibTeX Entry
@inproceedings{Azizi24EWRL,
  title     = {Leveraging diverse offline data in {POMDP}s with unobserved confounders},
  author    = {Oussama Azizi and Philip Boeken and Onno Zoeter and Frans A. Oliehoek and Matthijs T. J. Spaan},
  booktitle = {Seventeenth European Workshop on Reinforcement Learning (EWRL)},
  year      = 2024,
  month     = oct,
  OPTurl    = {https://openreview.net/forum?id=uSheVlIgzc},
  keywords  = {refereed},
  abstract  = {In many Reinforcement Learning (RL) applications, offline data is readily available before an algorithm is deployed. Often, however, data-collection policies have had access to information that is not recorded in the dataset, requiring the RL agent to take unobserved confounders into account. We focus on the setting where the confounders are i.i.d. and, without additional assumptions on the strength of the confounding, we derive tight bounds for the causal effects of the actions on the observations and reward. In particular, we show that these bounds are tight when we leverage multiple datasets collected from diverse behavioral policies. We incorporate these bounds into Posterior Sampling for Reinforcement Learning (PSRL) and demonstrate their efficacy experimentally.}
}