Below you will find vacancies in my research group.
I am truly humbled by the interest of many prospective students in working with me. However, apart from the vacancies listed below,
-> I have no other vacancies. <-
If you are interested in working with me, please follow me on Twitter or keep track of this website. Please do not email me; I am afraid I cannot respond to these requests.
PhD in (theoretical) reinforcement learning (deadline July 5th)
Together with Julia Olkhovskaia (who will be the primary supervisor), I am looking for a PhD student for a fully paid, 4-year position. This project focuses on advancing online learning theory, with a special emphasis on improving the scalability of algorithms that operate with limited feedback.
Online learning involves a series of interactions between a learner and an environment. In each round both choose an action, which yields a reward for the learner, who also receives some form of feedback. The learner's objective is to maximize the total reward over time. Although extensive research has established statistical performance limits in various online learning scenarios, these results rely on specific assumptions about the decision set and the feedback available to the learner. Current algorithms match or nearly match these limits, but a comprehensive understanding of how different decision sets and feedback mechanisms affect performance is still lacking.
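For readers new to the setting, the interaction protocol described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `run_online_learning`, the Bernoulli environment, and the epsilon-greedy learner are hypothetical placeholders chosen for brevity, not the algorithms studied in the project.

```python
import random


def run_online_learning(num_rounds, num_actions, seed=0, epsilon=0.1):
    """Simulate the learner-environment protocol with bandit feedback."""
    rng = random.Random(seed)
    # Assumed environment: action k pays reward 1 with probability means[k].
    means = [rng.random() for _ in range(num_actions)]
    counts = [0] * num_actions    # how often each action was played
    totals = [0.0] * num_actions  # reward collected per action
    total_reward = 0.0

    for _ in range(num_rounds):
        # Learner chooses an action (epsilon-greedy placeholder strategy);
        # it plays at random until every action has been tried once.
        if rng.random() < epsilon or 0 in counts:
            action = rng.randrange(num_actions)
        else:
            action = max(range(num_actions),
                         key=lambda k: totals[k] / counts[k])
        # Environment chooses this round's reward for every action.
        rewards = [1.0 if rng.random() < m else 0.0 for m in means]
        # Bandit feedback: only the chosen action's reward is observed.
        observed = rewards[action]
        counts[action] += 1
        totals[action] += observed
        total_reward += observed

    # Gap between the expected reward of the best fixed action and the
    # reward the learner actually collected.
    regret = num_rounds * max(means) - total_reward
    return total_reward, regret


if __name__ == "__main__":
    print(run_online_learning(num_rounds=1000, num_actions=3))
```

Swapping the feedback line for the full `rewards` list would give the full-information variant of the same protocol, which is one way the feedback mechanism changes the problem.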
The goal of this PhD project is to broaden our understanding and improve the methodologies in online learning in several key areas.
- One research direction involves the contextual bandit problem, where the context available for each decision may be incomplete or corrupted. This is a realistic scenario often encountered in practice, yet theoretical approaches to this problem are not well-developed. The approach includes competing with the optimal policy that operates with the same level of information, and also considering a scenario where the learner has enough data to predict the full context.
- Another area of investigation is online reinforcement learning in large state spaces with adversarial losses and bandit feedback. This problem assumes a linear Markov decision process (MDP) model, where each state-action pair is associated with a known feature representation that linearly determines the transitions and the losses. There remains a significant gap between the existing regret upper bounds and the lower bounds.
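As a rough guide to the second direction, the linear MDP assumption is commonly written as follows. The notation is an assumption introduced here for illustration (a known feature map $\varphi$, unknown measures $\mu$, unknown loss parameters $\theta_t$ that may change adversarially across episodes $t$, and an initial state $s_1$); the project description itself does not fix any notation.

```latex
% Transitions and losses are linear in the known features \varphi(s,a).
P(s' \mid s, a) = \langle \varphi(s, a), \mu(s') \rangle,
\qquad
\ell_t(s, a) = \langle \varphi(s, a), \theta_t \rangle .

% Regret after T episodes: the learner's expected cumulative loss under
% its policies \pi_t, compared with the best fixed policy in hindsight.
R_T = \sum_{t=1}^{T} V_t^{\pi_t}(s_1) - \min_{\pi} \sum_{t=1}^{T} V_t^{\pi}(s_1),
```

where $V_t^{\pi}(s_1)$ denotes the expected total loss of policy $\pi$ in episode $t$ under the losses $\ell_t$.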
Please apply here.
Student Projects
I am happy to supervise master's projects for current TU Delft students; please look here.
Internships
I am afraid that I cannot offer internships. Please do not email me requesting internships; I cannot respond to these requests.