Abstraction and RL: Literature Overview
03/09/2019
Abstraction is a pretty remarkable process; a fun example is given in the Pixar movie "Inside Out", in which the main characters find themselves in a room of abstraction (inside a character's head) where the objects in the room become more and more abstract, eventually reducing to simple 2D colored shapes (clip here).
But what exactly makes abstraction a useful practice? Why do our characters become these particular shapes, with these particular colors? Of course, abstraction is central to computer science: from Turing machines to object-oriented programming, abstraction is a necessary process that supports many of the foundational ideas of the field. From the perspective of reinforcement learning (RL), the more pressing question is: how can a learning agent discover and use abstractions to improve its capacity to make effective decisions in complex environments?
Algorithms for RL are well suited to benefit from abstraction. One route to confronting many of RL's central challenges, like exploration, credit assignment, and long-horizon planning, involves reasoning with a simpler, abstract model of the environment. For example, exploring "inside the cupboard" already requires the abstract concepts "cupboard" and "inside of", which break the world into a cleanly explorable place. Planning in the right way usually involves making predictions at the right level of abstraction: if I were to throw a wine glass in a restaurant, I can predict that the glass will shatter (which is enough to convince myself I shouldn't throw it!). But I likely cannot predict the precise location and number of shards of glass on the floor (and why would I need to?). In light of the promise of reasoning with a more compact model, RL research has long sought the appropriate algorithms for learning and using abstractions of different kinds. For an excellent recent overview of abstraction in RL, see On the Necessity of Abstraction by Konidaris (2018).
In this post, I'll give a bird's-eye view of some of the literature on abstraction in RL (as of March 2019 or so). I tend to break abstraction (as studied in RL) into three categories:
- State Abstraction: How can we form abstract states that support effective decision making?
Examples: State aggregation is the main method, though there are connections to feature discovery, basis function discovery, and certain kinds of value function approximation.
- Action Abstraction: How can we form abstract actions that support effective decision making?
Examples: Options, macro-actions.
- Hierarchical Abstraction: How can we find the right hierarchical abstraction that can support effective decision making?
Examples: MAXQ, HAMs, R-MAXQ, Abstract MDPs.
For each category, I'll give a brief description, then highlight 1) the papers I consider the right ones to read to dive into that body of literature, and 2) a collection of other relevant work. I've almost certainly missed many important papers, so do reach out if you find that some are missing.
Samuel Laferriere pointed out that it can be unclear at times what precisely differentiates these three categories. Indeed, the boundaries between these three are blurry, and especially so if we account for the variance in their definitions/treatments in existing literature. Still, I tend to make the following distinction:
- State abstraction is defined in terms of functions that transform one state space into another.
- Action abstraction is defined as modifying the action space in some way.
- Hierarchical abstraction is the repetition of either or both of these processes to form multi-level structures.
It's worth noting that abstraction has been studied both for learning and for planning, in which the full model of the world is given as input and a policy/behavior is requested as output. Naturally, abstractions that are useful for planning can give rise to efficient methods for model-based RL, so some of the referenced work will be more focused on planning as it relates to RL. Lastly, lots of excellent work has focused on abstraction in multi-agent systems; that is not the focus here.
Let's dive in!
State Abstraction
The traditional view of state abstraction is to define a function, \(\phi : S \rightarrow S_\phi\), that projects each state in the original MDP's state space to an abstract state (where usually \(|S_\phi| \ll |S|\)). The goal is to form a smaller state space that still preserves the essential characteristics of the problem posed by the true MDP; the smaller the state space, the lower (we hope) the sample complexity/regret of RL.
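To make the interface concrete, here is a minimal sketch (not any particular paper's algorithm) of tabular Q-learning run over abstract states: the gym-style environment interface, the `env.actions` attribute, and the particular \(\phi\) passed in are all assumptions for illustration.

```python
import random
from collections import defaultdict

def q_learning_over_abstraction(env, phi, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning where every update is made over phi(s) rather than s.

    Assumed (hypothetical) interface: env.reset() -> state, env.step(a) ->
    (next_state, reward, done), env.actions is a finite list of actions, and
    phi maps ground states to hashable abstract states.
    """
    q = defaultdict(float)  # keys are (abstract_state, action) pairs

    def greedy(s_abs):
        return max(env.actions, key=lambda a: q[(s_abs, a)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            s_abs = phi(s)
            a = random.choice(env.actions) if random.random() < epsilon else greedy(s_abs)
            s_next, r, done = env.step(a)
            # Bootstrap from the abstract successor state.
            target = r if done else r + gamma * max(q[(phi(s_next), b)] for b in env.actions)
            q[(s_abs, a)] += alpha * (target - q[(s_abs, a)])
            s = s_next
    return q
```

If \(\phi\) is the identity, this is ordinary Q-learning; the papers below are largely about which choices of \(\phi\) shrink \(|S_\phi|\) while keeping the resulting behavior near optimal.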
The papers I would recommend starting with:
Chronology of other work to look at (shamelessly including a few of my recent papers in the area):
- Approximations of Dynamic Programs, I by Whitt (1978).
- Adaptive aggregation methods for infinite horizon dynamic programming by Bertsekas and Castanon (1988).
- Variable Resolution Dynamic Programming: Efficiently Learning Action Maps in Multivariate Real-valued State-spaces by Moore (1991)
- Reinforcement Learning with Soft State Aggregation by Singh, Jaakkola, and Jordan (1995).
- Instance-Based State Identification for Reinforcement Learning by McCallum (1995)
- Flexible decomposition algorithms for weakly coupled Markov decision problems by Parr (1998)
- The Complexity of Model Aggregation by Goldsmith and Sloan (2000).
- Q-Cut – Dynamic Discovery of Sub-goals in Reinforcement Learning by Menache, Mannor, and Shimkin (2002)
- Approximate Equivalence of Markov Decision Processes by Even-Dar and Mansour (2003)
- Approximate Homomorphisms: A framework for non-exact minimization in Markov Decision Processes by Ravindran and Barto (2003)
- Relational State Abstractions for Reinforcement Learning by Morales (2004).
- Adaptive state space partitioning for reinforcement learning by Lee and Lau (2004).
- Metrics for Finite Markov Decision Processes by Ferns, Panangaden, and Precup (2004).
- Metrics for Markov Decision Processes with Infinite State Spaces by Ferns, Panangaden, and Precup (2005).
- State Abstraction Discovery from Irrelevant State Variables by Jong and Stone (2005).
- Transferring State Abstractions Between MDPs by Li, Walsh, and Littman (2006).
- Performance Loss Bounds for Approximate Value Iteration with State Aggregation by Van Roy (2006).
- Automatic Task Decomposition and State Abstraction from Demonstration by Cobo, Isbell, and Thomaz (2008).
- Selecting the State-Representation in Reinforcement Learning by Maillard, Munos, and Ryabko (2014).
- State Aggregation in Monte Carlo Tree Search by Hostetler, Fern, and Dietterich (2014).
- Improving UCT Planning via Approximate Homomorphisms by Jiang, Singh, and Lewis (2014).
- Extreme State Aggregation Beyond MDPs by Hutter (2014).
- Abstraction Selection in Model-Based Reinforcement Learning by Jiang, Kulesza, and Singh (2015).
- Near Optimal Behavior via Approximate State Abstraction by Abel, Hershkowitz, and Littman (2016).
- Regularizing Reinforcement Learning with State Abstraction by Akrour, Veiga, Peters, and Neumann (2018).
- State Abstractions for Lifelong Reinforcement Learning by Abel et al. (2018).
- Approximate Exploration through State Abstraction by Taiga, Courville, and Bellemare (2018).
- State Abstraction as Compression in Apprenticeship Learning by Abel et al. (2019).
- Performance Guarantees for Homomorphisms Beyond Markov Decision Processes by Majeed and Hutter (2019).
Action Abstraction
Action abstraction refers to methods that enrich the action space of an MDP, typically by incorporating macro-actions that encode long-horizon sequences of actions. The formalisms for encoding these higher-level actions are diverse, though recently the options formalism [Sutton, Precup, Singh, 1999] has become the most prominent. An option, \(o\), is a triple, \((I_o, \beta_o, \pi_o)\), indicating:
- \(I_o \subset S\): An initiation condition indicating the states in which the option can be executed.
- \(\beta_o \subset S\): A termination condition indicating where the option terminates.
- \(\pi_o : S \rightarrow A\): A policy.
An option effectively defines a skill that takes an agent from a precondition (\(I_o\)) to a postcondition (\(\beta_o\)), according to \(\pi_o\). Typically \(\beta_o\) is presented as a state-conditioned termination probability, \(\beta_o : S \rightarrow [0,1]\), but we take the deterministic view here for simplicity. Much of the recent work on action abstraction has concentrated on discovering useful options, clarifying how options impact learning and planning, and extending options to transfer across tasks.
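To make the triple concrete, here is a minimal Python sketch of a deterministic option and of running it to termination; the `Option` class and the gym-style `env.step` interface are illustrative assumptions, not the formalism of any particular paper.

```python
from dataclasses import dataclass
from typing import Any, Callable, Set

@dataclass
class Option:
    """A deterministic option o = (I_o, beta_o, pi_o) over a ground MDP."""
    init_set: Set[Any]             # I_o: states where the option may be invoked
    term_set: Set[Any]             # beta_o: states where the option terminates
    policy: Callable[[Any], Any]   # pi_o: maps states to primitive actions

def run_option(env, option, s):
    """Execute an option from state s until it terminates or the episode ends.

    Assumes env.step(a) -> (next_state, reward, done). Returns the resulting
    state, the accumulated (undiscounted) reward, and the done flag.
    """
    assert s in option.init_set, "option invoked outside its initiation set"
    total_reward, done = 0.0, False
    while not done and s not in option.term_set:
        s, r, done = env.step(option.policy(s))
        total_reward += r
    return s, total_reward, done
```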
The papers I would recommend starting with:
Chronology of other work to look at:
- Roles of Macro-Actions in Accelerating Reinforcement Learning by McGovern, Sutton, and Fagg (1996).
- PolicyBlocks: An Algorithm for Creating Useful Macro-Actions in Reinforcement Learning by Pickett and Barto (2002).
- SMDP Homomorphisms: An Algebraic Approach to Abstraction in Semi-Markov Decision Processes by Ravindran and Barto (2003).
- Skill characterization based on betweenness by Simsek and Barto (2008).
- Efficient Skill Learning using Abstraction Selection by Konidaris and Barto (2009).
- Skill discovery in continuous reinforcement learning domains using skill chaining by Konidaris and Barto (2009).
- Learning Parameterized Skills by Da Silva, Konidaris, and Barto (2012).
- Time-Regularized Interrupting Options by Mankowitz, Mann, and Mannor (2014).
- PAC Inspired Option Discovery in Lifelong Reinforcement Learning by Brunskill and Li (2014).
- Portable Option Discovery for Automated Learning Transfer in Object-Oriented Markov Decision Processes by Topin et al. (2015).
- Adaptive Skills Adaptive Partitions (ASAP) by Mankowitz, Mann, and Mannor (2016).
- A Laplacian Framework for Option Discovery in Reinforcement Learning by Machado, Bellemare, and Bowling (2017).
- Exploration-Exploitation in MDPs with Options by Fruit and Lazaric (2017).
- Regret Minimization in MDPs with Options without Prior Knowledge by Fruit and Lazaric (2017).
- Learning with Options that Terminate Off-Policy by Harutyunyan et al. (2017).
- The Option Critic by Bacon, Harb, and Precup (2017).
- When Waiting Is Not an Option: Learning Options with a Deliberation Cost by Harb, Bacon, Klissarov, and Precup (2018).
- Natural Option Critic by Tiwari and Thomas (2019).
Hierarchical Abstraction
Hierarchical abstraction captures approaches that repeatedly abstract the entities involved in RL, each round producing a coarser entity than the last. The methods here are too diverse to describe exhaustively with one formalism, though again options have recently taken hold as the primary way of characterizing abstractions. In the case of a hierarchy, this means we define a list of option sets, \(O^{(n)} = \{O^1, O^2, \ldots, O^n\}\), where the policies of the options at level \(i\), denoted \(O^i\), select among the options at level \(i-1\). Again, the formalisms are diverse, but this is the main idea: repeatedly abstract, using whatever formalism we'd like, until all that's left is a simple characterization of the problem of relevance. As with state and action abstraction, the literature has mainly focused on understanding how to discover and compute the right abstractions efficiently, clarifying when and why abstraction helps, and identifying hierarchies that preserve near-optimal behavior.
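One way to picture the recursion is the sketch below, which extends the hypothetical `Option` class from the previous section: a level-\(i\) option's policy returns level-\(i-1\) options, and execution bottoms out at primitive actions. The environment interface is again an assumption for illustration.

```python
def run_hierarchical_option(env, option, s, level):
    """Execute one option drawn from a hierarchy O^(n) = {O^1, ..., O^n}.

    At level 1 the option's policy emits primitive actions; at level i > 1 it
    emits options from level i-1, each of which is run recursively to its own
    termination. Assumes the same gym-style env and Option class sketched above.
    """
    total_reward, done = 0.0, False
    while not done and s not in option.term_set:
        choice = option.policy(s)
        if level == 1:
            s, r, done = env.step(choice)  # primitive action
        else:
            s, r, done = run_hierarchical_option(env, choice, s, level - 1)
        total_reward += r
    return s, total_reward, done
```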
The papers I would recommend starting with:
Chronology of other work to look at:
- Hierarchical Learning in Stochastic Domains: Preliminary Results by Kaelbling (1993)
- HQ-Learning by Wiering and Schmidhuber (1997)
- Discovering Hierarchy in Reinforcement Learning with HEXQ by Hengst (2002)
- Model Minimization in Hierarchical Reinforcement Learning by Ravindran and Barto (2002).
- Hierarchical Reinforcement Learning Based on Subgoal Discovery and Subpolicy Specialization by Bakker and Schmidhuber (2004)
- Transfer in variable-reward hierarchical reinforcement learning by Mehta, Natarajan, Tadepalli, and Fern (2008)
- Hierarchical Model-Based Reinforcement Learning: R-Max + Max-Q by Jong and Stone (2008)
- Bayesian Hierarchical Reinforcement Learning by Cao and Ray (2012)
- Optimal Behavioral Hierarchy by Solway et al. (2014)
- Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation by Kulkarni, Narasimhan, Saeedi, and Tenenbaum (2016)
- Constructing Abstraction Hierarchies Using a Skill-Symbol Loop by Konidaris (2016)
- Markovian State and Action Abstractions for MDPs via Hierarchical MCTS by Bai, Srivastava, and Russell (2016).
- FeUdal Networks for Hierarchical Reinforcement Learning by Vezhnevets et al. (2017)
- Efficient Reinforcement Learning with Hierarchies of Machines by Leveraging Internal Transitions by Bai and Russell (2017)
- Stochastic Neural Networks for Hierarchical Reinforcement Learning by Florensa, Duan, and Abbeel (2017)
- From Skills to Symbols: Learning Symbolic Representations for Abstract High-Level Planning by Konidaris, Kaelbling, Lozano-Perez (2018)
- On the Necessity of Abstraction by Konidaris (2018).
- Hierarchical Imitation and Reinforcement Learning by Le et al. (2018)
- Data-Efficient Hierarchical Reinforcement Learning by Nachum et al. (2018).
- Learning Multi-Level Hierarchies with Hindsight by Levy et al. (2019).
- Combined Reinforcement Learning via Abstract Representations by Francois-Lavet et al. (2019).
- Near-Optimal Representation Learning for Hierarchical Reinforcement Learning by Nachum et al. (2019)
I hope folks find this helpful. Let me know if you find errors or if the list is missing some papers (I'm sure I've missed some important ones!)