David Abel

Portrait
david_abel@brown.edu


Abstraction and RL: Literature Overview

03/09/2019

Abstraction is a pretty remarkable process; a fun example is given in the Pixar movie "Inside Out", in which the main characters find themselves in a room of abstraction (inside a character's head) where the objects in the room become more and more abstract, eventually leading to only simple 2d colored shapes (clip here):

But what exactly makes abstraction a useful practice? Why do our characters become these particular shapes, with these particular colors? Of course, abstraction is central to computer science: from Turing machines to object-oriented programming, abstraction is a necessary process that supports many of the foundational ideas of the field. From the perspective of reinforcement learning (RL), the more pressing question is: how can a learning agent discover and use abstractions to improve its capacity to make effective decisions in complex environments?

Algorithms for RL are well suited to benefit from abstraction. One route to confronting many of the central challenges of RL like exploration, credit assignment, and long horizon planning involves reasoning with a simpler, abstract model of the environment. For example, exploring "inside the cupboard", already requires the use of the abstract concepts "cupboard" and "inside of" that break the world into a cleanly explorable place. Planning in the right way usually involves making predictions at the right level of abstraction: if I were to throw a wine glass in a restaurant, I can predict that the glass will shatter (which is enough to convince myself I shouldn't throw it!). But, I likely cannot predict the precise location and number of shards of glass on the floor (and why would I need to?). In light of the promise of reasoning with a more compact model, RL research has long sought the appropriate algorithms for learning and using abstractions of different kinds. For an excellent recent overview of abstraction in RL, see On the Necessity of Abstraction by Konidaris (2018).

In this post, I'll give a birds eye view look at some of the literature on abstraction in RL (as of March 2019 or so). I tend to break abstraction (as studied in RL) into three categories:

  1. State Abstraction: How can we form abstract states that support effective decision making?
    Examples: State aggregation is the main method, though there are connections to feature discovery, basis function discovery, and certain kinds of value function approximation.
  2. Action Abstraction: How can we form abstract actions that support effective decision making?
    Examples: Options, macro-actions.
  3. Hierarchical Abstraction: How can we find the right hierarchical abstraction that can support effective decision making?
    Examples: MAXQ, HAMS, RMax-Q, Abstract MDPs.
For each category, I'll give a brief description, then highlight what I consider to 1) be the right papers to read to dive into each body of literature, and 2) a collection of other relevant work. I've almost certainly missed many important papers, so do reach out if you find that some are missing. It's worth noting that abstraction has been studied both for learning purposes and for planning, in which the full model of the world is given as input, and a policy/behavior is requested as output. Naturally, abstractions that are useful for planning can give rise to efficient methods for model-based RL, so some of the referenced work will be more focused on planning as it relates to RL. Lastly, lots of excellent work has focused on abstraction in multi-agent systems; this is not the focus here. Let's dive in!

State Abstraction

The traditional view of state abstraction is to define a function, \(\phi : S \rightarrow S_\phi\), that projects each state in the original MDPs state space to an abstract state (where usually \(|S_\phi| \ll |S|\)). The goal is to form a smaller state space that can still preserve the essential characteristics of the problem posed by the true MDP; the smaller state space, hopefully the lower the sample complexity/regret of RL.

The papers I would recommend starting with:

Chronology of other work to look at (shamelessly including a few of my recent papers in the area):


Action Abstraction

Action abstraction defines methods that enrich the action space of an MDP, typically by incorporating macro-actions that encode long horizon sequences of actions. The formalisms for encoding these higher level actions are diverse, though recently the options formalism [Sutton, Precup, Singh, 1999] has become most prominent. An option, \(o\), is a triple, \(I_o, \beta_o, \pi_o\), indicating:

  1. \(I_o \subset S\): An initation condition indicating the states in which the option can be executed.
  2. \(\beta_o \subset S\): A termination condition indicating where the option terminates.
  3. \(\pi_o : S \rightarrow A\): A policy.
An option effectively defines a skill that takes an agent from a precondition (\(I_o\)) to a post condition (\(\beta\)), according to \(\pi_o\). Typically \(\beta\) is presented as a conditional probability distribution, conditioned on \(S\), with support \([0,1]\), but we here take the deterministic view for simplicity. Much of recent work on action abstraction has concentrated on discovering useful options, clarifying how options impact learning and planning, and extending options to transfer across tasks.

The papers I would recommend starting with:

Chronology of other work to look at:


Hierarchical Abstraction

Hierarchical abstraction captures approaches that repeatedly abstract entities involved in RL, each time inducing a coarser entity than the previous iteration. The methods here are too diverse to exhaustively describe with one formalism, though again options have taken hold recently as the primary method for characterizing abstractions. In this case of a hierarchy, this means we define a list of options, \(O^{(n)} = \{O^1, O^2, \ldots, O^n\}\), where the options at level \(i\), denoted \(O^{i}\), take on policies that output options over level \(i-1\) options. Again, the formalisms are diverse, but this is the main idea: repeatedly abstract using whatever formalism we'd like, until all that's left is a simple characterization of the problem of relevance. The main focus of the literature has been, like state and action abstraction, better understanding how to discover and compute the right abstractions efficiently, clarify when and why abstraction helps, and identify hierarchies that preserve near-optimal behavior.

The papers I would recommend starting with:

Chronology of other work to look at:

I hope folks find this helpful. Let me know if you find errors or that the list is missing some papers (I'm sure I've missed some important ones!)