Metadata
Title
Hado van Hasselt Seminar at the Department of Engineering Science, University of Oxford
Category
general
UUID
90317956eaed409faab5d448bf7fa96b
Source URL
https://aims.robots.ox.ac.uk/seminars/hado-van-hasselt-seminar
Parent URL
https://aims.robots.ox.ac.uk/seminars
Crawl Time
2026-03-23T02:54:31+00:00
Rendered Raw Markdown


AIMS Seminar - Friday 15th November @11:00am

Overconfidence in Model-based and Model-free Reinforcement Learning

Abstract:

In reinforcement learning, model-free algorithms directly learn values and/or policies, whereas model-based algorithms learn a model of the environment. I will discuss commonalities between model-based algorithms and algorithms that instead replay past experiences to be consumed by an otherwise model-free update rule. I will argue that model-based algorithms can be useful, but are perhaps not used optimally in the ways they are most commonly applied, and I will present several alternatives. Several of these insights relate to the idea of over-confidence, a theme that emerges more generally in reinforcement learning. I will draw parallels to model-free algorithms, which can similarly be over-confident in some cases, and discuss some solutions and directions.
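The over-confidence theme in model-free learning can be illustrated with one well-known remedy: tabular Double Q-learning (van Hasselt, 2010), which keeps two value tables so that action selection and action evaluation use different estimates, countering the overestimation bias of the single-table max in standard Q-learning. This is a minimal sketch, not the speaker's seminar material; the table layout and function names are illustrative.

```python
import random

def double_q_update(qa, qb, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Double Q-learning step.

    qa, qb: dicts mapping state -> {action: value}. On each step one
    table is chosen at random to be updated; it selects the greedy
    next action, while the *other* table evaluates that action.
    Decoupling selection from evaluation reduces the overestimation
    (over-confidence) caused by taking a max over noisy estimates.
    """
    if random.random() < 0.5:
        select, evaluate = qa, qb
    else:
        select, evaluate = qb, qa
    # Greedy next action according to the table being updated...
    a_star = max(select[s_next], key=select[s_next].get)
    # ...but its value is taken from the other table.
    target = r + gamma * evaluate[s_next][a_star]
    select[s][a] += alpha * (target - select[s][a])
```

In standard Q-learning the same noisy table both picks the maximising action and supplies its value, so estimation noise is systematically propagated upward; here the evaluating table's noise is independent of the selection, so the bias largely cancels.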

Bio:

Hado van Hasselt received his Ph.D. in AI from Utrecht University in the Netherlands. After a post-doc with Rich Sutton at the University of Alberta, he moved back to Europe to join DeepMind in 2014. Hado has been doing research in artificial intelligence and reinforcement learning, including "deep" reinforcement learning, for more than a decade. His goal is to create better general algorithms that can learn to solve sequential decision problems without relying on potentially brittle, domain-specific, hand-crafted solutions.