simple_rl.planning package

Submodules

simple_rl.planning.BeliefSparseSamplingClass module

class simple_rl.planning.BeliefSparseSamplingClass.BeliefSparseSampling(gen_model, gamma, tol, max_reward, state, name='bss')[source]

Bases: object

A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes (Kearns et al.)

Assuming that you don't have access to the underlying transition dynamics, but do have access to a naive generative model of the underlying MDP, this algorithm performs online, near-optimal planning with a per-state running time that has no dependence on the number of states in the MDP.

plan_from_state(state)[source]
Args:
state (State): the current state in the MDP
Returns:
action (str): near-optimal action to perform from state
run(verbose=True)[source]
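
A minimal usage sketch follows. It assumes the generative model can be a standard simple_rl MDP such as GridWorldMDP from simple_rl.tasks; the exact object expected for gen_model, and the tol/max_reward settings below, are illustrative assumptions rather than the documented interface.

    # Illustrative sketch; the type expected for gen_model is assumed here.
    from simple_rl.tasks import GridWorldMDP
    from simple_rl.planning.BeliefSparseSamplingClass import BeliefSparseSampling

    mdp = GridWorldMDP(width=4, height=3, init_loc=(1, 1), goal_locs=[(4, 3)])

    bss = BeliefSparseSampling(gen_model=mdp,
                               gamma=mdp.get_gamma(),
                               tol=1.0,
                               max_reward=1.0,
                               state=mdp.get_init_state())

    # Online planning: only the generative model is queried, so the per-state
    # cost does not depend on the total number of states in the MDP.
    action = bss.plan_from_state(mdp.get_init_state())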

simple_rl.planning.BoundedRTDPClass module

BoundedRTDPClass.py: Contains the Bounded-RTDP solver class.

class simple_rl.planning.BoundedRTDPClass.BoundedRTDP(mdp, lower_values_init, upper_values_init, tau=10.0, name='BRTDP')[source]

Bases: simple_rl.planning.PlannerClass.Planner

Bounded Real-Time Dynamic Programming: RTDP with monotone upper bounds and performance guarantees (McMahan et al.)

The Bounded RTDP solver can produce partial policies with strong performance guarantees while only touching a fraction of the state space, even on problems where other algorithms would have to visit the full state space. To do so, Bounded RTDP maintains both upper and lower bounds on the optimal value function.

plan(state=None, horizon=100)[source]

Main function of the Planner class.

Args:
state (State)
horizon (int)
Returns:
policy (defaultdict)
policy(state)[source]
Args:
state (State)
Returns:
action (str)
run_sample_trial(verbose=False)[source]
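
A minimal construction sketch. The representation of lower_values_init and upper_values_init (state-to-value mappings that bracket the optimal value function) is an assumption here; check the class for the exact interface it expects.

    from collections import defaultdict
    from simple_rl.tasks import GridWorldMDP
    from simple_rl.planning.BoundedRTDPClass import BoundedRTDP

    mdp = GridWorldMDP(width=4, height=3, init_loc=(1, 1), goal_locs=[(4, 3)])

    # Assumed bound initializers: a pessimistic lower bound of 0 everywhere and
    # an optimistic upper bound of max_reward / (1 - gamma).
    lower_init = defaultdict(float)
    upper_init = defaultdict(lambda: 1.0 / (1.0 - mdp.get_gamma()))

    brtdp = BoundedRTDP(mdp, lower_values_init=lower_init,
                        upper_values_init=upper_init, tau=10.0)

    # plan() returns a (partial) policy as a defaultdict mapping states to actions.
    policy = brtdp.plan(mdp.get_init_state())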

simple_rl.planning.MCTSClass module

MCTSClass.py: Class for a basic Monte Carlo Tree Search Planner.

class simple_rl.planning.MCTSClass.MCTS(mdp, name='mcts', explore_param=1.4142135623730951, rollout_depth=20, num_rollouts_per_step=10)[source]

Bases: simple_rl.planning.PlannerClass.Planner

plan(cur_state, horizon=20)[source]
Args:
cur_state (State)
horizon (int)
Returns:
(list): List of actions
policy(state)[source]
Args:
state (State)
Returns:
(str)
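
A short usage sketch, assuming a GridWorldMDP from simple_rl.tasks as the underlying MDP; the exploration and rollout settings below are arbitrary illustrative choices.

    import math
    from simple_rl.tasks import GridWorldMDP
    from simple_rl.planning.MCTSClass import MCTS

    mdp = GridWorldMDP(width=4, height=3, init_loc=(1, 1), goal_locs=[(4, 3)])

    mcts = MCTS(mdp, explore_param=math.sqrt(2), rollout_depth=20,
                num_rollouts_per_step=50)

    # plan() returns the sequence of actions chosen while rolling the search forward.
    action_seq = mcts.plan(mdp.get_init_state(), horizon=20)

    # policy() returns a single action for the given state.
    first_action = mcts.policy(mdp.get_init_state())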

simple_rl.planning.PlannerClass module

class simple_rl.planning.PlannerClass.Planner(mdp, name='planner')[source]

Bases: object

Abstract class for a Planner.
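
Concrete planners in this package subclass Planner and add plan() and policy() methods on top of the stored MDP. The following is a purely illustrative, hypothetical subclass; it assumes the base class keeps the passed MDP as self.mdp, and RandomPlanner is not part of the package.

    import random
    from simple_rl.planning.PlannerClass import Planner

    class RandomPlanner(Planner):
        """Hypothetical example: chooses a random action in every state."""

        def __init__(self, mdp, name="random_planner"):
            Planner.__init__(self, mdp, name=name)

        def policy(self, state):
            # Pick uniformly among the MDP's actions.
            return random.choice(self.mdp.get_actions())

        def plan(self, state=None, horizon=100):
            state = state if state is not None else self.mdp.get_init_state()
            return [self.policy(state) for _ in range(horizon)]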

simple_rl.planning.ValueIterationClass module

class simple_rl.planning.ValueIterationClass.ValueIteration(mdp, name='value_iter', delta=0.0001, max_iterations=500, sample_rate=3)[source]

Bases: simple_rl.planning.PlannerClass.Planner

get_gamma()[source]
get_max_q_actions(state)[source]
Args:
state (State)
Returns:
(list): List of actions with the maximal Q-value in the given @state.
get_num_backups_in_recent_run()[source]
get_num_states()[source]
get_q_value(s, a)[source]
Args:
s (State)
a (str): action
Returns:
(float): The Q estimate given the current value function @self.value_func.
get_states()[source]
get_value(s)[source]
Args:
s (State)
Returns:
(float)
plan(state=None, horizon=100)[source]
Args:
state (State)
horizon (int)
Returns:
(list): List of actions
policy(state)[source]
Args:
state (State)
Returns:
(str): Action
Summary:
For use in a FixedPolicyAgent.
print_value_func()[source]
run_vi()[source]
Returns:
(tuple):
  1. (int): num iterations taken.
  2. (float): value.
Summary:
Runs ValueIteration and fills in the self.value_func.
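
A typical workflow (assuming a GridWorldMDP from simple_rl.tasks): call run_vi() to fill in the value function, then query policy(), get_value(), or plan().

    from simple_rl.tasks import GridWorldMDP
    from simple_rl.planning.ValueIterationClass import ValueIteration

    mdp = GridWorldMDP(width=4, height=3, init_loc=(1, 1), goal_locs=[(4, 3)])

    vi = ValueIteration(mdp, delta=0.0001, max_iterations=500, sample_rate=5)
    iters, value = vi.run_vi()

    init_state = mdp.get_init_state()
    print(iters, value)
    print(vi.get_value(init_state))         # value of the initial state
    print(vi.policy(init_state))            # greedy action at the initial state
    print(vi.plan(init_state, horizon=20))  # list of actions from the initial state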

Module contents

Implementations of standard planning algorithms:

PlannerClass: Abstract class for a planner.
ValueIterationClass: Value Iteration.
MCTSClass: Monte Carlo Tree Search.
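
As a final sketch, a planner's policy can be wrapped in a FixedPolicyAgent and run like any other agent; the agent and experiment imports below are assumptions based on the wider simple_rl API, not part of this module.

    from simple_rl.agents import FixedPolicyAgent
    from simple_rl.tasks import GridWorldMDP
    from simple_rl.run_experiments import run_agents_on_mdp
    from simple_rl.planning.ValueIterationClass import ValueIteration

    mdp = GridWorldMDP(width=4, height=3, init_loc=(1, 1), goal_locs=[(4, 3)])

    vi = ValueIteration(mdp)
    vi.run_vi()

    # Use the planner's policy as the fixed policy of an agent.
    vi_agent = FixedPolicyAgent(vi.policy, name="vi-policy")
    run_agents_on_mdp([vi_agent], mdp, instances=1, episodes=10, steps=50)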