simple_rl.planning package¶
Submodules¶
simple_rl.planning.BeliefSparseSamplingClass module¶
class simple_rl.planning.BeliefSparseSamplingClass.BeliefSparseSampling(gen_model, gamma, tol, max_reward, state, name='bss')[source]¶

Bases: object

A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes (Kearns et al.).

Assuming that you do not have access to the underlying transition dynamics, but do have access to a naive generative model of the underlying MDP, this algorithm performs online, near-optimal planning with a per-state running time that has no dependence on the number of states in the MDP.
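The per-state lookahead at the heart of the algorithm can be sketched as follows. This is a minimal illustration of the sparse sampling recursion from the paper, not the internals of this class; gen_model is assumed to be a callable that samples a (next_state, reward) pair for a (state, action) query, and actions is assumed to be the MDP's action list::

    # Illustrative sketch of the sparse sampling recursion (Kearns et al.),
    # not the internals of BeliefSparseSampling.
    def sparse_sample_q(gen_model, actions, state, gamma, depth, width):
        """Estimate Q(state, a) for every action with a depth-limited lookahead."""
        if depth == 0:
            return {a: 0.0 for a in actions}
        q_values = {}
        for a in actions:
            total = 0.0
            # Draw `width` samples per action from the generative model.
            for _ in range(width):
                next_state, reward = gen_model(state, a)
                next_q = sparse_sample_q(gen_model, actions, next_state,
                                         gamma, depth - 1, width)
                total += reward + gamma * max(next_q.values())
            q_values[a] = total / width
        return q_values

    # The greedy root action is max(q_values, key=q_values.get); the cost of
    # a query grows with depth and width but not with the number of states.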
simple_rl.planning.BoundedRTDPClass module¶
BoundedRTDPClass.py: Contains the Bounded-RTDP solver class.
class simple_rl.planning.BoundedRTDPClass.BoundedRTDP(mdp, lower_values_init, upper_values_init, tau=10.0, name='BRTDP')[source]¶

Bases: simple_rl.planning.PlannerClass.Planner

Bounded Real-Time Dynamic Programming: RTDP with monotone upper bounds and performance guarantees (McMahan et al.).

The Bounded RTDP solver can produce partial policies with strong performance guarantees while only touching a fraction of the state space, even on problems where other algorithms would have to visit the full state space. To do so, Bounded RTDP maintains both upper and lower bounds on the optimal value function.
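A minimal sketch of a single trial, illustrating how the two bounds interact and the role of tau; the tabular helpers (actions, transitions, reward) and the lower/upper dicts are assumptions made for illustration and are not part of this class's API::

    import random

    def brtdp_trial(s0, lower, upper, actions, transitions, reward,
                    gamma, tau=10.0):
        """One illustrative Bounded RTDP trial (McMahan et al.).

        `lower`/`upper` map states to value bounds, `actions(s)` lists the
        available actions, `transitions(s, a)` returns {next_state: prob},
        and `reward(s, a)` is the expected immediate reward.
        """
        def backup(values, s):
            # One-step Bellman backup of `s` under the given bound function.
            return max(reward(s, a) + gamma *
                       sum(p * values[sp] for sp, p in transitions(s, a).items())
                       for a in actions(s))

        s, trajectory = s0, []
        while True:
            trajectory.append(s)
            # Monotone backups: the upper bound only decreases and the lower
            # bound only increases, so the interval shrinks over time.
            upper[s] = backup(upper, s)
            lower[s] = backup(lower, s)
            # Act greedily with respect to the upper bound.
            a = max(actions(s),
                    key=lambda act: reward(s, act) + gamma * sum(
                        p * upper[sp] for sp, p in transitions(s, act).items()))
            # Weight successors by how much bound uncertainty they carry.
            gaps = {sp: p * (upper[sp] - lower[sp])
                    for sp, p in transitions(s, a).items()}
            # End the trial once the remaining uncertainty is small relative
            # to the gap at the start state; this is the role of `tau`.
            if sum(gaps.values()) <= (upper[s0] - lower[s0]) / tau:
                break
            s = random.choices(list(gaps), weights=list(gaps.values()))[0]
        # Back up the visited states in reverse order on the way out.
        for s in reversed(trajectory):
            upper[s] = backup(upper, s)
            lower[s] = backup(lower, s)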
simple_rl.planning.MCTSClass module¶
MCTSClass.py: Class for a basic Monte Carlo Tree Search Planner.
class simple_rl.planning.MCTSClass.MCTS(mdp, name='mcts', explore_param=1.4142135623730951, rollout_depth=20, num_rollouts_per_step=10)[source]¶
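A hedged usage sketch, assuming MCTS exposes the same plan(state) interface documented for ValueIteration below; GridWorldMDP is one of simple_rl's bundled example tasks::

    from simple_rl.tasks import GridWorldMDP
    from simple_rl.planning.MCTSClass import MCTS

    mdp = GridWorldMDP(width=4, height=3, init_loc=(1, 1), goal_locs=[(4, 3)])
    mcts = MCTS(mdp, rollout_depth=20, num_rollouts_per_step=10)
    # Assumed: plan(state) returns a list of actions from the given state.
    action_seq = mcts.plan(mdp.get_init_state())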
simple_rl.planning.PlannerClass module¶
simple_rl.planning.ValueIterationClass module¶
class simple_rl.planning.ValueIterationClass.ValueIteration(mdp, name='value_iter', delta=0.0001, max_iterations=500, sample_rate=3)[source]¶

Bases: simple_rl.planning.PlannerClass.Planner
get_max_q_actions(state)[source]¶

- Args:
  - state (State)
- Returns:
  - (list): List of actions with the max Q-value in the given @state.
get_q_value(s, a)[source]¶

- Args:
  - s (State)
  - a (str): action
- Returns:
  - (float): The Q-value estimate of (s, a) under the current value function @self.value_func.
plan(state=None, horizon=100)[source]¶

- Args:
  - state (State)
  - horizon (int)
- Returns:
  - (list): List of actions comprising the plan from @state.
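A usage sketch covering the three methods documented above. The run_vi() call, which runs the planning sweeps before the value function is queried, is assumed from the broader simple_rl codebase rather than documented in this section::

    from simple_rl.tasks import GridWorldMDP
    from simple_rl.planning.ValueIterationClass import ValueIteration

    mdp = GridWorldMDP(width=4, height=3, init_loc=(1, 1), goal_locs=[(4, 3)])
    vi = ValueIteration(mdp, delta=0.0001, max_iterations=500, sample_rate=3)
    vi.run_vi()  # assumed helper: run the sweeps before querying values

    state = mdp.get_init_state()
    action_seq = vi.plan(state, horizon=100)    # documented: list of actions
    best_actions = vi.get_max_q_actions(state)  # actions tied for the max Q-value
    q = vi.get_q_value(state, best_actions[0])  # float Q estimate for (s, a)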
Module contents¶
Implementations of standard planning algorithms:
- PlannerClass: Abstract class for a planner.
- ValueIterationClass: Value Iteration.
- MCTSClass: Monte Carlo Tree Search.