Modern AI systems often need to make sequential decisions in an unknown, uncertain, and possibly hostile environment, actively interacting with it to collect relevant data. Reinforcement Learning (RL) is a general framework that captures this interactive learning setting and has been used to design intelligent agents that achieve strong performance in challenging applications such as Go, computer games, robotic manipulation, health care, and education.
This course provides an introduction to reinforcement learning, covering a range of problem formulations, algorithms, and theory. The three main themes of the course are (1) Markov decision processes (Bellman equations and optimality, planning, UCB-based exploration in unknown environments, linear quadratic control, and imitation learning), (2) bandits (epsilon-greedy, UCB, Thompson sampling, contextual bandits, linear bandits, and exploration in MDPs), and (3) methods for large-scale systems (policy gradient methods, deep RL, Monte Carlo tree search, and Q-learning). There will also be an Embedded Ethics lecture on ethical issues arising in reinforcement learning. The assignments will focus on a mix of algorithmic and statistical principles, along with their programming implementations.
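To give a flavor of the bandit algorithms named above, here is a minimal sketch of epsilon-greedy on a Bernoulli multi-armed bandit. The arm means, horizon, and epsilon value are illustrative placeholders, not taken from the course materials.

    import numpy as np

    def epsilon_greedy_bandit(true_means, n_rounds=10_000, epsilon=0.1, seed=0):
        """Epsilon-greedy on a Bernoulli bandit (illustrative example).

        true_means: success probability of each arm (unknown to the learner).
        Returns the empirical mean estimates and the total reward collected.
        """
        rng = np.random.default_rng(seed)
        n_arms = len(true_means)
        counts = np.zeros(n_arms)      # number of pulls per arm
        estimates = np.zeros(n_arms)   # running mean reward per arm
        total_reward = 0.0

        for _ in range(n_rounds):
            # Explore a uniformly random arm with probability epsilon,
            # otherwise exploit the arm with the best current estimate.
            if rng.random() < epsilon:
                arm = int(rng.integers(n_arms))
            else:
                arm = int(np.argmax(estimates))

            reward = float(rng.random() < true_means[arm])  # Bernoulli reward
            counts[arm] += 1
            # Incremental update of the running mean for the pulled arm.
            estimates[arm] += (reward - estimates[arm]) / counts[arm]
            total_reward += reward

        return estimates, total_reward

    if __name__ == "__main__":
        estimates, total = epsilon_greedy_bandit([0.3, 0.5, 0.7])
        print("estimated arm means:", np.round(estimates, 3))
        print("total reward:", total)

With a small constant epsilon the learner keeps exploring forever, which is exactly the exploration-exploitation trade-off that the UCB and Thompson sampling topics address with more adaptive strategies.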