Jacopo, Rahul, Sam and I won the best paper award at ALA’21!
-> check out the paper here.
Tomorrow, Wednesday the 5th of May, I will lead an informal discussion on multiagent RL. Details can be found here: https://aamas2021.soton.ac.uk/programme/detailed-programme/#Wednesday-M-INF
Looking forward to discussing!
At the next AAMAS, Jacopo Castellini, Sam Devlin, Rahul Savani and I will present our work on combining difference rewards and policy gradient methods.
Main idea: for differencing, the function needs to be quite accurate. As such, differencing on Q-functions (as COMA does) might not be ideal. We instead perform the differencing on the reward function, which may be known and is otherwise easier to learn (it is stationary). Our results show potential for great improvements, especially for larger numbers of agents.
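To give a flavour of what differencing on the reward function means (a toy sketch; the reward function and default action below are illustrative, not the paper's setup):

```python
# Toy sketch of difference rewards computed on the reward function.
# Assumes access to a known (or learned) reward function R(state, joint_action)
# and a per-agent "default" action; names are illustrative.

def difference_rewards(R, state, joint_action, default_action):
    """One difference reward per agent:
    D_i = R(s, a) - R(s, (a_{-i}, c_i)),
    i.e. the drop in reward when agent i's action is replaced by the default c_i."""
    base = R(state, joint_action)
    diffs = []
    for i in range(len(joint_action)):
        counterfactual = list(joint_action)
        counterfactual[i] = default_action  # replace only agent i's action
        diffs.append(base - R(state, tuple(counterfactual)))
    return diffs


# Toy usage: the team reward is the number of distinct targets covered;
# the default action (None) covers nothing.
R = lambda s, a: len(set(x for x in a if x is not None))
print(difference_rewards(R, state=None, joint_action=(0, 1, 1), default_action=None))
# -> [1, 0, 0]: only agent 0 covers a target that nobody else covers.
```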
In our AAMAS’21 blue sky paper, we explore explicitly modeling non-stationarity as part of an environmental shift game (ESG). This enables us to predict and even steer the shifts that would occur, while dealing with epistemic uncertainty in a robust manner.
Our AAMAS’21 paper on loss bounds for influence-based abstraction is online.
In this paper, we derive conditions for ‘approximate influence predictors’ (AIPs) to give a small value loss when used in small (abstracted) MDPs. From these conditions we conclude that learning such AIPs with a cross-entropy loss seems sensible.
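For a rough idea of what such an AIP looks like in practice (a hedged sketch, not the paper's code; architecture, sizes and data pipeline are illustrative): it is a classifier that maps the local history to a distribution over the influence source variables and is trained with a standard cross-entropy loss.

```python
# Hedged sketch: training an approximate influence predictor (AIP) with a
# cross-entropy loss. All names and sizes here are illustrative.
import torch
import torch.nn as nn

HIST_DIM, NUM_SOURCE_VALUES = 32, 5  # local-history features, values of the influence source

aip = nn.Sequential(
    nn.Linear(HIST_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, NUM_SOURCE_VALUES),  # logits over the influence source variable
)
optimizer = torch.optim.Adam(aip.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

def train_step(local_histories, source_values):
    """local_histories: (B, HIST_DIM) float tensor of encoded local histories,
    source_values: (B,) long tensor of observed influence source values."""
    logits = aip(local_histories)
    loss = ce(logits, source_values)  # the cross-entropy loss that the bound suggests
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```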
Do you have experience in multiagent reinforcement learning, game theory and/or other forms of interactive learning? Then have a look at this vacancy and contact me!
In this work, we show how symmetries that occur in MDPs can be exploited for more efficient deep reinforcement learning.
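As a rough illustration of why such symmetries help (the paper builds the symmetry into the network itself; the data-augmentation view below, with a CartPole-style mirror symmetry as an assumed example, is only meant to convey the intuition):

```python
# Minimal sketch: exploiting a known left/right mirror symmetry (CartPole-style)
# by augmenting the replay buffer with mirrored transitions. This only illustrates
# the intuition that symmetric transitions carry the same information; the paper
# instead builds symmetries into the network architecture.
import numpy as np

def mirror_transition(state, action, reward, next_state):
    """Map a CartPole-style transition to its mirror image.
    Assumes state = (x, x_dot, theta, theta_dot) and actions {0: left, 1: right}."""
    flip = np.array([-1.0, -1.0, -1.0, -1.0])
    return flip * state, 1 - action, reward, flip * next_state

def augment(batch):
    """Double a batch of (s, a, r, s') transitions with their mirrored counterparts."""
    return batch + [mirror_transition(*t) for t in batch]
```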
This paper shows that “prediction rewards” for active perception can also be employed in decentralized multiagent settings. (Intuitively, this leads to a type of voting that we try to optimize.)
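A toy sketch of that intuition (the aggregation rule and names here are illustrative, not the paper's definition): each agent predicts a distribution over the hidden variable of interest from its own observations, the predictions are aggregated like soft votes, and the team is rewarded for the probability the aggregate assigns to the true value.

```python
# Toy sketch of a decentralized prediction reward (illustrative only): per-agent
# predictions are combined by summing log-probabilities (a soft "vote"), and the
# team reward is the log-probability the aggregate assigns to the true value.
import numpy as np

def prediction_reward(local_predictions, true_value):
    """local_predictions: list of per-agent probability vectors over the hidden variable."""
    log_votes = sum(np.log(p + 1e-12) for p in local_predictions)  # sum of log-prob "votes"
    aggregate = np.exp(log_votes - np.max(log_votes))
    aggregate /= aggregate.sum()  # renormalize into a joint prediction
    return float(np.log(aggregate[true_value]))

# Two agents, three possible values of the hidden variable:
print(prediction_reward([np.array([0.7, 0.2, 0.1]),
                         np.array([0.6, 0.3, 0.1])], true_value=0))
```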
The camera-ready version of Influence-Augmented Online Planning for Complex Environments is now available.
In this work, we show that by learning approximate representations of influence, we can speed up online planning (POMCP) sufficiently to get better performance when the time for online decision making is constrained.
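Roughly, instead of stepping a slow global simulator inside every POMCP rollout, only the local region is simulated and the external variables that influence it are sampled from the learned approximate influence representation. A hypothetical sketch of such a local simulator step (local_model, influence_predictor and their interfaces are assumptions, not the paper's API):

```python
# Hypothetical sketch of an influence-augmented local simulator step for use
# inside POMCP rollouts. `local_model` and `influence_predictor` (and their
# interfaces) are assumptions made for illustration.

def influence_augmented_step(local_state, action, history, local_model, influence_predictor):
    """One simulated step of the local region: sample the influence source
    variables from the learned predictor instead of simulating the full environment."""
    influence = influence_predictor.sample(history)  # cheap, learned approximation
    next_local_state, observation, reward = local_model.step(local_state, action, influence)
    return next_local_state, observation, reward
```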
Three of our papers were accepted at NeurIPS. For short descriptions, see my tweet.
(Updated) arXiv links will follow…