AAMAS’22 paper: Bayesian RL to cooperate with humans

In our new paper, Best-Response Bayesian Reinforcement Learning with BA-POMDPs for Centaurs, we investigate a machine whose actions can be overridden by the human. We show how Bayesian RL can lead to quick adaptation to unknown human preferences, and can also aid the human in pursuing their true goals in the case of temporally inconsistent behavior. All credit to Mert for all the hard work!
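To make the adaptation idea concrete, here is a minimal sketch of the Bayesian mechanism: the machine keeps a posterior over a small set of candidate models of the human's preferences and updates it from the actions that actually get executed (which the human may have overridden). The softmax observation model, the candidate set, and all names below are my illustrative assumptions, not the algorithm from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions = 4
# Three hypothetical candidate models of the human's per-action reward.
candidate_rewards = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
belief = np.ones(len(candidate_rewards)) / len(candidate_rewards)

def action_likelihood(rewards, beta=3.0):
    """Assumed observation model: the executed action (the machine's proposal,
    possibly overridden by the human) follows a softmax over the human's reward."""
    p = np.exp(beta * rewards)
    return p / p.sum()

def update_belief(belief, executed_action):
    """Bayes rule: reweight each candidate model by how likely it makes
    the action that was actually executed."""
    likelihoods = np.array([action_likelihood(r)[executed_action]
                            for r in candidate_rewards])
    posterior = belief * likelihoods
    return posterior / posterior.sum()

true_model = 2  # the human actually prefers action 2
for step in range(20):
    # Observe which action ends up being executed (the human may have
    # overridden the machine); here we sample from the true human's model.
    executed = rng.choice(n_actions, p=action_likelihood(candidate_rewards[true_model]))
    belief = update_belief(belief, executed)

# Best response under the learned belief: maximise expected human reward.
print("posterior over candidate models:", np.round(belief, 3))
print("machine's best-response action:", int(np.argmax(belief @ candidate_rewards)))
```

After a handful of observations the posterior concentrates on the right model, which is the kind of quick adaptation the paper is after.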

Announcing the Mercury Machine Learning Lab

As one of the scientific directors, I am co-leading the Mercury Machine Learning Lab: a new ICAI lab in collaboration with the University of Amsterdam and Booking.com.

At Delft, we will be looking for two PhD students and a postdoc, so keep an eye out for the adverts or follow me on Twitter if you are interested in applying reinforcement learning in a real-world context!

AAMAS’21: Difference Rewards Policy Gradients

At the next AAMAS, Jacopo Castellini, Sam Devlin, Rahul Savani, and I will present our work on combining difference rewards and policy gradient methods.

Main idea: for differencing to work well, the function being differenced needs to be quite accurate, so differencing Q-functions (as COMA does) might not be ideal. We instead perform the differencing on the reward function, which may be known, and is otherwise easier to learn since it is stationary. Our results show potential for large improvements, especially for larger numbers of agents.
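As a toy illustration of differencing on the reward function rather than on Q-values, here is a sketch of the classic difference-reward signal, D_i = r(a) - r(a with agent i's action replaced by a default), which credits each agent with its marginal contribution to the team reward. The reward function, the default action, and the names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def global_reward(joint_action):
    """Toy team reward: pays off only when all agents coordinate on action 1."""
    return float(all(a == 1 for a in joint_action))

def difference_rewards(joint_action, default_action=0):
    """D_i = r(a) - r(a with agent i's action replaced by a default action),
    i.e. agent i's marginal contribution to the team reward."""
    r = global_reward(joint_action)
    d = np.zeros(len(joint_action))
    for i in range(len(joint_action)):
        counterfactual = list(joint_action)
        counterfactual[i] = default_action
        d[i] = r - global_reward(counterfactual)
    return d

print(difference_rewards([1, 1, 1]))  # every agent contributed: [1. 1. 1.]
print(difference_rewards([1, 0, 1]))  # no one gets spurious credit: [0. 0. 0.]
```

In a policy-gradient method, each agent's D_i can then serve as its learning signal, in place of a counterfactual baseline computed from a learned, and potentially inaccurate, Q-function.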

Are Multiple Agents the Solution, and not the Problem, to Non-Stationarity?

That is what we explore in our AAMAS’21 blue-sky paper.

The idea is to explicitly model non-stationarity as part of an environmental shift game (ESG). This enables us to predict and even steer the shifts that would occur, while dealing with epistemic uncertainty in a robust manner.
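As a loose illustration of treating the shift itself as another player, here is a hypothetical sketch: the agent keeps an empirical prediction of the environment's shifts and best-responds to it, while the shift best-responds to the agent, fictitious-play style. The payoff matrix, the adversarial shift model, and the dynamics are invented for illustration; the ESG formalism in the paper is more general.

```python
import numpy as np

# Agent payoff for each (agent action, environment shift) pair; the shift is
# modelled here as an adversarial second player minimising the same payoff.
payoff = np.array([
    [3.0, 0.0],
    [2.0, 2.0],
])

def agent_best_response(predicted_shift_dist):
    """Best response to a predicted distribution over shifts."""
    return int(np.argmax(payoff @ predicted_shift_dist))

def shift_best_response(agent_action):
    """The modelled shift reacts (here: adversarially) to the agent."""
    return int(np.argmin(payoff[agent_action]))

# Fictitious-play-style loop: the agent keeps empirical counts of observed
# shifts (a crude posterior under epistemic uncertainty) and best-responds
# to its prediction, while the shift best-responds to the agent.
shift_counts = np.ones(2)  # uniform prior over the two possible shifts
for t in range(100):
    prediction = shift_counts / shift_counts.sum()
    a = agent_best_response(prediction)
    shift_counts[shift_best_response(a)] += 1

print("predicted shift distribution:", np.round(shift_counts / shift_counts.sum(), 2))
# A robust (maximin) choice hedges against the remaining uncertainty.
print("maximin action:", int(np.argmax(payoff.min(axis=1))))
```

Making the shift an explicit player is what allows it to be predicted, and in richer formulations steered, rather than treated as unexplained non-stationarity.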