simple_rl package

Subpackages

Submodules

simple_rl.run_experiments module

Code for running experiments where RL agents interact with an MDP.

Instructions:
  1. Create an MDP.
  2. Create agents.
  3. Set experiment parameters (instances, episodes, steps).
  4. Call run_agents_on_mdp(agents, mdp) (or the lifelong / Markov game equivalents).

-> Runs all experiments and opens a plot of the results when finished.
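For example, a minimal end-to-end run following these four steps might look like the sketch below (assuming the GridWorldMDP task and the Q-learning/random agents that ship with simple_rl):

    from simple_rl.agents import QLearningAgent, RandomAgent
    from simple_rl.tasks import GridWorldMDP
    from simple_rl.run_experiments import run_agents_on_mdp

    # 1. Create an MDP.
    mdp = GridWorldMDP(width=4, height=3, init_loc=(1, 1), goal_locs=[(4, 3)])

    # 2. Create agents.
    ql_agent = QLearningAgent(actions=mdp.get_actions())
    rand_agent = RandomAgent(actions=mdp.get_actions())

    # 3. & 4. Set experiment parameters and run; a plot opens when finished.
    run_agents_on_mdp([ql_agent, rand_agent], mdp, instances=5, episodes=100, steps=150)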

Author: David Abel (cs.brown.edu/~dabel/)

simple_rl.run_experiments.choose_mdp(mdp_name, env_name='Asteroids-v0')[source]
Args:
    mdp_name (str): One of {gym, grid, chain, taxi, ...}.
    env_name (str): Gym environment name, like 'CartPole-v0'.
Returns:
(MDP)
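A short usage sketch (the "grid" and "gym" keywords below follow the name set listed above; the gym case assumes OpenAI Gym is installed):

    from simple_rl.run_experiments import choose_mdp

    # Build a grid-world MDP by keyword.
    grid_mdp = choose_mdp("grid")

    # Or wrap a Gym environment by passing its name.
    gym_mdp = choose_mdp("gym", env_name="CartPole-v0")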
simple_rl.run_experiments.evaluate_agent(agent, mdp, instances=10)[source]
Args:
    agent (simple_rl.Agent)
    mdp (simple_rl.MDP)
    instances (int)
Returns:
(float): Avg. cumulative discounted reward.
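For instance, a hedged sketch of scoring a single agent on a small grid world:

    from simple_rl.agents import QLearningAgent
    from simple_rl.tasks import GridWorldMDP
    from simple_rl.run_experiments import evaluate_agent

    mdp = GridWorldMDP(width=4, height=3, goal_locs=[(4, 3)])
    agent = QLearningAgent(actions=mdp.get_actions())

    # Average cumulative discounted reward over 10 instances.
    avg_reward = evaluate_agent(agent, mdp, instances=10)
    print(avg_reward)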
simple_rl.run_experiments.main()[source]
simple_rl.run_experiments.parse_args()[source]
simple_rl.run_experiments.play_markov_game(agent_ls, markov_game_mdp, instances=10, episodes=100, steps=30, verbose=False, open_plot=True)[source]
Args:
    agent_ls (list of Agents): See agents/AgentClass.py (and friends).
    markov_game_mdp (MarkovGameMDP): See mdp/markov_games/MarkovGameMDPClass.py.
    instances (int): Number of times to run each agent (for confidence intervals).
    episodes (int): Number of episodes for each learning instance.
    steps (int): Number of steps per episode.
    verbose (bool)
    open_plot (bool): If true, opens the plot.
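A minimal two-player sketch, assuming the RockPaperScissorsMDP task included with simple_rl:

    from simple_rl.agents import QLearningAgent, RandomAgent
    from simple_rl.tasks import RockPaperScissorsMDP
    from simple_rl.run_experiments import play_markov_game

    markov_game = RockPaperScissorsMDP()
    ql_agent = QLearningAgent(actions=markov_game.get_actions())
    rand_agent = RandomAgent(actions=markov_game.get_actions())

    # Each agent plays the game; results are plotted per agent.
    play_markov_game([ql_agent, rand_agent], markov_game, instances=10, episodes=100, steps=30)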
simple_rl.run_experiments.reproduce_from_exp_file(exp_name, results_dir_name='results', open_plot=True)[source]
Args:
    exp_name (str)
    results_dir_name (str)
    open_plot (bool)
Summary:
Extracts the agents, MDP, and parameters from the file and runs the experiment. Stores data in "results_dir_name/exp_name/reproduce_i/*", where "i" is determined based on the existence of earlier "reproduce" files.
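A usage sketch; "my_experiment" below is a placeholder for the name of an experiment directory already present under results/:

    from simple_rl.run_experiments import reproduce_from_exp_file

    # Re-runs the stored experiment and writes its data to
    # results/my_experiment/reproduce_i/*.
    reproduce_from_exp_file("my_experiment", results_dir_name="results", open_plot=True)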
simple_rl.run_experiments.run_agents_lifelong(agents, mdp_distr, samples=5, episodes=1, steps=100, clear_old_results=True, open_plot=True, verbose=False, track_disc_reward=False, reset_at_terminal=False, resample_at_terminal=False, cumulative_plot=True, dir_for_plot='results')[source]
Args:
    agents (list)
    mdp_distr (MDPDistribution)
    samples (int)
    episodes (int)
    steps (int)
    clear_old_results (bool)
    open_plot (bool)
    verbose (bool)
    track_disc_reward (bool): If true, records and plots discounted reward, discounted over episodes. So, if each episode is 100 steps, then episode 2 will start discounting as though it's step 101.
    reset_at_terminal (bool)
    resample_at_terminal (bool)
    cumulative_plot (bool)
    dir_for_plot (str)

Summary:
Runs each agent on the MDP distribution according to the given parameters. If @mdp_distr has a non-zero horizon, then gamma is set to 1 and @steps is ignored.
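A minimal lifelong-learning sketch, assuming MDPDistribution (from simple_rl.mdp) can be built directly from a dict mapping MDPs to sampling probabilities:

    from simple_rl.agents import QLearningAgent
    from simple_rl.mdp import MDPDistribution
    from simple_rl.tasks import GridWorldMDP
    from simple_rl.run_experiments import run_agents_lifelong

    # Two grid worlds that differ only in goal location, sampled with equal probability.
    mdp_1 = GridWorldMDP(width=4, height=3, goal_locs=[(4, 3)])
    mdp_2 = GridWorldMDP(width=4, height=3, goal_locs=[(1, 3)])
    mdp_distr = MDPDistribution({mdp_1: 0.5, mdp_2: 0.5})

    ql_agent = QLearningAgent(actions=mdp_1.get_actions())
    run_agents_lifelong([ql_agent], mdp_distr, samples=10, episodes=1, steps=100)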
simple_rl.run_experiments.run_agents_on_mdp(agents, mdp, instances=5, episodes=100, steps=200, clear_old_results=True, rew_step_count=1, track_disc_reward=False, open_plot=True, verbose=False, reset_at_terminal=False, cumulative_plot=True, dir_for_plot='results', experiment_name_prefix='')[source]
Args:
    agents (list of Agents): See agents/AgentClass.py (and friends).
    mdp (MDP): See mdp/MDPClass.py for the abstract class. Specific MDPs are in tasks/*.
    instances (int): Number of times to run each agent (for confidence intervals).
    episodes (int): Number of episodes for each learning instance.
    steps (int): Number of steps per episode.
    clear_old_results (bool): If true, removes all results files in the relevant results dir.
    rew_step_count (int): Number of steps before recording reward.
    track_disc_reward (bool): If true, tracks (and plots) discounted reward.
    open_plot (bool): If true, opens the plot at the end.
    verbose (bool): If true, prints status bars per episode/instance.
    reset_at_terminal (bool): If true, sends the agent to the start state after reaching a terminal state.
    cumulative_plot (bool): If true, makes a cumulative plot; otherwise plots avg. reward per timestep.
    dir_for_plot (str): Path to the directory where results and plots are stored.
    experiment_name_prefix (str): Prepended to the usual experiment name.
Summary:
Runs each agent on the given mdp according to the given parameters. Stores results in results/<agent_name>.csv and automatically generates a plot and opens it.
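Beyond the defaults, the boolean flags above change what is plotted and how episodes end; for example, a sketch that plots average reward per timestep and resets the agent at terminal states (assuming the RMaxAgent class listed under agents/ below):

    from simple_rl.agents import QLearningAgent, RMaxAgent
    from simple_rl.tasks import GridWorldMDP
    from simple_rl.run_experiments import run_agents_on_mdp

    mdp = GridWorldMDP(width=4, height=3, goal_locs=[(4, 3)])
    agents = [QLearningAgent(actions=mdp.get_actions()),
              RMaxAgent(actions=mdp.get_actions())]

    # Non-cumulative plot; send the agent back to the start state after terminal states.
    run_agents_on_mdp(agents, mdp, instances=5, episodes=50, steps=100,
                      cumulative_plot=False, reset_at_terminal=True)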
simple_rl.run_experiments.run_single_agent_on_mdp(agent, mdp, episodes, steps, experiment=None, verbose=False, track_disc_reward=False, reset_at_terminal=False, resample_at_terminal=False)[source]
Summary:
Main loop of a single MDP experiment.
Returns:
(tuple): (bool:reached terminal, int: num steps taken, float: cumulative discounted reward)
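A sketch of calling the loop directly (variable names are illustrative; the tuple unpacks as documented above):

    from simple_rl.agents import QLearningAgent
    from simple_rl.tasks import GridWorldMDP
    from simple_rl.run_experiments import run_single_agent_on_mdp

    mdp = GridWorldMDP(width=4, height=3, goal_locs=[(4, 3)])
    agent = QLearningAgent(actions=mdp.get_actions())

    # No Experiment object: just run one agent and inspect the returned tuple.
    reached_terminal, num_steps, cum_disc_reward = run_single_agent_on_mdp(
        agent, mdp, episodes=100, steps=50)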
simple_rl.run_experiments.run_single_belief_agent_on_pomdp(belief_agent, pomdp, episodes, steps, experiment=None, verbose=False, track_disc_reward=False, reset_at_terminal=False, resample_at_terminal=False)[source]
Args:
    belief_agent:
    pomdp:
    episodes:
    steps:
    experiment:
    verbose:
    track_disc_reward:
    reset_at_terminal:
    resample_at_terminal:

Returns:

Module contents

simple_rl
    abstraction/
        action_abs/
        state_abs/
        ...
    agents/
        AgentClass.py
        QLearningAgentClass.py
        RandomAgentClass.py
        RMaxAgentClass.py
        ...
    experiments/
        ExperimentClass.py
        ExperimentParameters.py
    mdp/
        MDPClass.py
        StateClass.py
    planning/
        BeliefSparseSamplingClass.py
        MCTSClass.py
        PlannerClass.py
        ValueIterationClass.py
    pomdp/
        BeliefMDPClass.py
        BeliefStateClass.py
        BeliefUpdaterClass.py
        POMDPClass.py
    tasks/
        chain/
            ChainMDPClass.py
            ChainStateClass.py
        grid_world/
            GridWorldMDPClass.py
            GridWorldStateClass.py
        ...
    utils/
        chart_utils.py
        make_mdp.py
    run_experiments.py

Author and Maintainer: David Abel (david_abel.github.io)
Last Updated: August 27th, 2018
Contact: david_abel@brown.edu
License: Apache