simple_rl.planning package¶
Submodules¶
simple_rl.planning.BeliefSparseSamplingClass module¶
class simple_rl.planning.BeliefSparseSamplingClass.BeliefSparseSampling(gen_model, gamma, tol, max_reward, state, name='bss')[source]¶

Bases: object

A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes (Kearns et al.).

Assuming that you do not have access to the underlying transition dynamics, but do have access to a naive generative model of the underlying MDP, this algorithm performs online, near-optimal planning with a per-state running time that has no dependence on the number of states in the MDP.
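The per-state lookahead at the heart of the algorithm can be sketched as follows. This is a minimal illustration of the sparse sampling recursion from the paper, not the internals of this class; gen_model is assumed to be a callable that samples a (next_state, reward) pair for a (state, action) query, and actions is assumed to be the MDP's action list::

    # Illustrative sketch of the sparse sampling recursion (Kearns et al.),
    # not the internals of BeliefSparseSampling.
    def sparse_sample_q(gen_model, actions, state, gamma, depth, width):
        """Estimate Q(state, a) for every action with a depth-limited lookahead."""
        if depth == 0:
            return {a: 0.0 for a in actions}
        q_values = {}
        for a in actions:
            total = 0.0
            # Draw `width` samples per action from the generative model.
            for _ in range(width):
                next_state, reward = gen_model(state, a)
                next_q = sparse_sample_q(gen_model, actions, next_state,
                                         gamma, depth - 1, width)
                total += reward + gamma * max(next_q.values())
            q_values[a] = total / width
        return q_values

    # The greedy root action is max(q_values, key=q_values.get); the cost of
    # a query grows with depth and width but not with the number of states.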
simple_rl.planning.BoundedRTDPClass module¶
BoundedRTDPClass.py: Contains the Bounded-RTDP solver class.
class simple_rl.planning.BoundedRTDPClass.BoundedRTDP(mdp, lower_values_init, upper_values_init, tau=10.0, name='BRTDP')[source]¶

Bases: simple_rl.planning.PlannerClass.Planner

Bounded Real-Time Dynamic Programming: RTDP with monotone upper bounds and performance guarantees (McMahan et al.).

The Bounded RTDP solver can produce partial policies with strong performance guarantees while only touching a fraction of the state space, even on problems where other algorithms would have to visit the full state space. To do so, Bounded RTDP maintains both upper and lower bounds on the optimal value function.
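A minimal sketch of a single trial, illustrating how the two bounds interact and the role of tau; the tabular helpers (actions, transitions, reward) and the lower/upper dicts are assumptions made for illustration and are not part of this class's API::

    import random

    def brtdp_trial(s0, lower, upper, actions, transitions, reward,
                    gamma, tau=10.0):
        """One illustrative Bounded RTDP trial (McMahan et al.).

        `lower`/`upper` map states to value bounds, `actions(s)` lists the
        available actions, `transitions(s, a)` returns {next_state: prob},
        and `reward(s, a)` is the expected immediate reward.
        """
        def backup(values, s):
            # One-step Bellman backup of `s` under the given bound function.
            return max(reward(s, a) + gamma *
                       sum(p * values[sp] for sp, p in transitions(s, a).items())
                       for a in actions(s))

        s, trajectory = s0, []
        while True:
            trajectory.append(s)
            # Monotone backups: the upper bound only decreases and the lower
            # bound only increases, so the interval shrinks over time.
            upper[s] = backup(upper, s)
            lower[s] = backup(lower, s)
            # Act greedily with respect to the upper bound.
            a = max(actions(s),
                    key=lambda act: reward(s, act) + gamma * sum(
                        p * upper[sp] for sp, p in transitions(s, act).items()))
            # Weight successors by how much bound uncertainty they carry.
            gaps = {sp: p * (upper[sp] - lower[sp])
                    for sp, p in transitions(s, a).items()}
            # End the trial once the remaining uncertainty is small relative
            # to the gap at the start state; this is the role of `tau`.
            if sum(gaps.values()) <= (upper[s0] - lower[s0]) / tau:
                break
            s = random.choices(list(gaps), weights=list(gaps.values()))[0]
        # Back up the visited states in reverse order on the way out.
        for s in reversed(trajectory):
            upper[s] = backup(upper, s)
            lower[s] = backup(lower, s)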
simple_rl.planning.MCTSClass module¶
MCTSClass.py: Class for a basic Monte Carlo Tree Search Planner.
class simple_rl.planning.MCTSClass.MCTS(mdp, name='mcts', explore_param=1.4142135623730951, rollout_depth=20, num_rollouts_per_step=10)[source]¶
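A hedged usage sketch, assuming MCTS exposes the same plan(state) interface documented for ValueIteration below; GridWorldMDP is one of simple_rl's bundled example tasks::

    from simple_rl.tasks import GridWorldMDP
    from simple_rl.planning.MCTSClass import MCTS

    mdp = GridWorldMDP(width=4, height=3, init_loc=(1, 1), goal_locs=[(4, 3)])
    mcts = MCTS(mdp, rollout_depth=20, num_rollouts_per_step=10)
    # Assumed: plan(state) returns a list of actions from the given state.
    action_seq = mcts.plan(mdp.get_init_state())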
simple_rl.planning.PlannerClass module¶
simple_rl.planning.ValueIterationClass module¶
class simple_rl.planning.ValueIterationClass.ValueIteration(mdp, name='value_iter', delta=0.0001, max_iterations=500, sample_rate=3)[source]¶

Bases: simple_rl.planning.PlannerClass.Planner
get_max_q_actions(state)[source]¶

- Args:
  - state (State)
- Returns:
  - (list): List of actions with the max Q-value in the given @state.
get_q_value(s, a)[source]¶

- Args:
  - s (State)
  - a (str): action
- Returns:
  - (float): The Q-value estimate of (s, a) under the current value function @self.value_func.
plan(state=None, horizon=100)[source]¶

- Args:
  - state (State)
  - horizon (int)
- Returns:
  - (list): List of actions comprising the plan from @state.
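A usage sketch covering the three methods documented above. The run_vi() call, which runs the planning sweeps before the value function is queried, is assumed from the broader simple_rl codebase rather than documented in this section::

    from simple_rl.tasks import GridWorldMDP
    from simple_rl.planning.ValueIterationClass import ValueIteration

    mdp = GridWorldMDP(width=4, height=3, init_loc=(1, 1), goal_locs=[(4, 3)])
    vi = ValueIteration(mdp, delta=0.0001, max_iterations=500, sample_rate=3)
    vi.run_vi()  # assumed helper: run the sweeps before querying values

    state = mdp.get_init_state()
    action_seq = vi.plan(state, horizon=100)    # documented: list of actions
    best_actions = vi.get_max_q_actions(state)  # actions tied for the max Q-value
    q = vi.get_q_value(state, best_actions[0])  # float Q estimate for (s, a)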
Module contents¶
Implementations of standard planning algorithms:
- PlannerClass: Abstract class for a planner.
- ValueIterationClass: Value Iteration.
- MCTSClass: Monte Carlo Tree Search.