simple_rl package¶
Subpackages¶
Submodules¶
simple_rl.run_experiments module¶
Code for running experiments where RL agents interact with an MDP.
- Instructions:
  - Create an MDP.
  - Create agents.
  - Set experiment parameters (instances, episodes, steps).
  - Call run_agents_on_mdp(agents, mdp) (or the lifelong / Markov game equivalents); see the sketch below.
    -> Runs all experiments and opens a plot with the results when finished.
Author: David Abel (cs.brown.edu/~dabel/)
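For example, a minimal end-to-end run might look like the sketch below. It assumes the standard simple_rl classes (GridWorldMDP from simple_rl.tasks, QLearningAgent and RandomAgent from simple_rl.agents); the grid dimensions and experiment parameters are illustrative choices, not values required by this module:

    from simple_rl.agents import QLearningAgent, RandomAgent
    from simple_rl.tasks import GridWorldMDP
    from simple_rl.run_experiments import run_agents_on_mdp

    # 1. Create an MDP (a small grid world; the size and goal location are arbitrary).
    mdp = GridWorldMDP(width=4, height=3, init_loc=(1, 1), goal_locs=[(4, 3)])

    # 2. Create agents that act over the MDP's action set.
    ql_agent = QLearningAgent(actions=mdp.get_actions())
    rand_agent = RandomAgent(actions=mdp.get_actions())

    # 3 & 4. Set experiment parameters and run; a plot opens when finished.
    run_agents_on_mdp([ql_agent, rand_agent], mdp, instances=5, episodes=100, steps=150)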
simple_rl.run_experiments.choose_mdp(mdp_name, env_name='Asteroids-v0')[source]¶
- Args:
  - mdp_name (str): one of {gym, grid, chain, taxi, ...}
  - env_name (str): Gym environment name, e.g. 'CartPole-v0'
- Returns:
  - (MDP)
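A short usage sketch; the names passed here come from the documented options, and building the 'gym' variant additionally requires the OpenAI Gym package to be installed:

    from simple_rl.run_experiments import choose_mdp

    # Build an MDP by name; "grid" is one of the documented name options.
    grid_mdp = choose_mdp("grid")

    # For a Gym-backed MDP, also pass the Gym environment name.
    cartpole_mdp = choose_mdp("gym", env_name="CartPole-v0")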
simple_rl.run_experiments.evaluate_agent(agent, mdp, instances=10)[source]¶
- Args:
  - agent (simple_rl.Agent)
  - mdp (simple_rl.MDP)
  - instances (int)
- Returns:
  - (float): Avg. cumulative discounted reward.
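A brief sketch of a typical call; the grid world and Q-learning agent are illustrative choices:

    from simple_rl.agents import QLearningAgent
    from simple_rl.tasks import GridWorldMDP
    from simple_rl.run_experiments import evaluate_agent

    mdp = GridWorldMDP(width=4, height=3, goal_locs=[(4, 3)])
    agent = QLearningAgent(actions=mdp.get_actions())

    # Average cumulative discounted reward over 10 independent instances.
    avg_reward = evaluate_agent(agent, mdp, instances=10)
    print(avg_reward)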
simple_rl.run_experiments.play_markov_game(agent_ls, markov_game_mdp, instances=10, episodes=100, steps=30, verbose=False, open_plot=True)[source]¶
- Args:
  - agent_ls (list of Agents): See agents/AgentClass.py (and friends).
  - markov_game_mdp (MarkovGameMDP): See mdp/markov_games/MarkovGameMDPClass.py.
  - instances (int): Number of times to run each agent (for confidence intervals).
  - episodes (int): Number of episodes for each learning instance.
  - steps (int): Number of steps per episode.
  - verbose (bool)
  - open_plot (bool): If true, opens the plot.
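A sketch of a two-player run. It assumes RockPaperScissorsMDP is importable from simple_rl.tasks (the repository ships a rock_paper_scissors task); any MarkovGameMDP and pair of agents would work the same way:

    from simple_rl.agents import QLearningAgent, RandomAgent
    from simple_rl.tasks import RockPaperScissorsMDP
    from simple_rl.run_experiments import play_markov_game

    markov_game = RockPaperScissorsMDP()

    # Two players with distinct agent types, learning against each other.
    ql_agent = QLearningAgent(actions=markov_game.get_actions())
    rand_agent = RandomAgent(actions=markov_game.get_actions())

    play_markov_game([ql_agent, rand_agent], markov_game, instances=5, episodes=100, steps=30)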
simple_rl.run_experiments.reproduce_from_exp_file(exp_name, results_dir_name='results', open_plot=True)[source]¶
- Args:
  - exp_name (str)
  - results_dir_name (str)
  - open_plot (bool)
- Summary:
  - Extracts the agents, MDP, and parameters from the experiment file and reruns the experiment. Stores data in "results_dir_name/exp_name/reproduce_i/*", where "i" is determined by the existence of earlier "reproduce" files.
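A sketch of reproducing a previous run; the experiment name here is a hypothetical placeholder for a directory that an earlier run wrote under results/:

    from simple_rl.run_experiments import reproduce_from_exp_file

    # "my_gridworld_experiment" is a placeholder, not a real experiment name.
    reproduce_from_exp_file("my_gridworld_experiment", results_dir_name="results", open_plot=True)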
simple_rl.run_experiments.run_agents_lifelong(agents, mdp_distr, samples=5, episodes=1, steps=100, clear_old_results=True, open_plot=True, verbose=False, track_disc_reward=False, reset_at_terminal=False, resample_at_terminal=False, cumulative_plot=True, dir_for_plot='results')[source]¶
- Args:
  - agents (list)
  - mdp_distr (MDPDistribution)
  - samples (int)
  - episodes (int)
  - steps (int)
  - clear_old_results (bool)
  - open_plot (bool)
  - verbose (bool)
  - track_disc_reward (bool): If true, records and plots discounted reward, with the discount carried across episodes. So, if each episode is 100 steps, then episode 2 starts discounting as though it were step 101.
  - reset_at_terminal (bool)
  - resample_at_terminal (bool)
  - cumulative_plot (bool)
  - dir_for_plot (str)
- Summary:
  - Runs each agent on the MDP distribution according to the given parameters. If @mdp_distr has a non-zero horizon, then gamma is set to 1 and @steps is ignored.
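A sketch of a lifelong run over a small hand-built task distribution. It assumes MDPDistribution can be imported from simple_rl.mdp and constructed from a dict mapping MDPs to sampling probabilities; the two grids and their weights are arbitrary:

    from simple_rl.agents import QLearningAgent
    from simple_rl.mdp import MDPDistribution
    from simple_rl.tasks import GridWorldMDP
    from simple_rl.run_experiments import run_agents_lifelong

    # Two related grid tasks, each drawn with probability 0.5 per sample.
    mdp_one = GridWorldMDP(width=4, height=3, goal_locs=[(4, 3)])
    mdp_two = GridWorldMDP(width=4, height=3, goal_locs=[(1, 3)])
    mdp_distr = MDPDistribution({mdp_one: 0.5, mdp_two: 0.5})

    # Both grids share the same action set.
    ql_agent = QLearningAgent(actions=mdp_one.get_actions())

    run_agents_lifelong([ql_agent], mdp_distr, samples=10, episodes=1, steps=100)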
simple_rl.run_experiments.run_agents_on_mdp(agents, mdp, instances=5, episodes=100, steps=200, clear_old_results=True, rew_step_count=1, track_disc_reward=False, open_plot=True, verbose=False, reset_at_terminal=False, cumulative_plot=True, dir_for_plot='results', experiment_name_prefix='')[source]¶
- Args:
  - agents (list of Agents): See agents/AgentClass.py (and friends).
  - mdp (MDP): See mdp/MDPClass.py for the abstract class. Specific MDPs are in tasks/*.
  - instances (int): Number of times to run each agent (for confidence intervals).
  - episodes (int): Number of episodes for each learning instance.
  - steps (int): Number of steps per episode.
  - clear_old_results (bool): If true, removes all results files in the relevant results dir.
  - rew_step_count (int): Number of steps before recording reward.
  - track_disc_reward (bool): If true, tracks (and plots) discounted reward.
  - open_plot (bool): If true, opens the plot at the end.
  - verbose (bool): If true, prints status bars per episode/instance.
  - reset_at_terminal (bool): If true, sends the agent back to the start state after reaching a terminal state.
  - cumulative_plot (bool): If true, makes a cumulative plot; otherwise plots avg. reward per timestep.
  - dir_for_plot (str): Path for storing results and plots.
  - experiment_name_prefix (str): Prefix added to the usual experiment name.
- Summary:
  - Runs each agent on the given MDP according to the given parameters. Stores results in results/<agent_name>.csv and automatically generates a plot, opening it if open_plot is true.
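Building on the quick-start sketch near the top of this page, a run with a few non-default options might look like this (all values are arbitrary illustrations):

    from simple_rl.agents import QLearningAgent, RandomAgent
    from simple_rl.tasks import GridWorldMDP
    from simple_rl.run_experiments import run_agents_on_mdp

    mdp = GridWorldMDP(width=6, height=6, goal_locs=[(6, 6)])
    agents = [QLearningAgent(actions=mdp.get_actions()), RandomAgent(actions=mdp.get_actions())]

    # Track discounted reward, plot average reward per timestep instead of a
    # cumulative plot, and write results under a custom directory.
    run_agents_on_mdp(agents, mdp, instances=10, episodes=50, steps=40,
                      track_disc_reward=True, cumulative_plot=False,
                      dir_for_plot="my_results", experiment_name_prefix="gridworld_")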
simple_rl.run_experiments.run_single_agent_on_mdp(agent, mdp, episodes, steps, experiment=None, verbose=False, track_disc_reward=False, reset_at_terminal=False, resample_at_terminal=False)[source]¶
- Summary:
  - Main loop of a single MDP experiment.
- Returns:
  - (tuple): (bool: reached terminal, int: num steps taken, float: cumulative discounted reward)
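This is the inner loop that the run_agents_* functions call; it can also be invoked directly for a quick, plot-free run. A sketch, assuming the return tuple follows the order documented above:

    from simple_rl.agents import QLearningAgent
    from simple_rl.tasks import GridWorldMDP
    from simple_rl.run_experiments import run_single_agent_on_mdp

    mdp = GridWorldMDP(width=4, height=3, goal_locs=[(4, 3)])
    agent = QLearningAgent(actions=mdp.get_actions())

    # No Experiment object is passed, so nothing is written to disk or plotted.
    reached_terminal, steps_taken, disc_reward = run_single_agent_on_mdp(agent, mdp, episodes=20, steps=50)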
simple_rl.run_experiments.run_single_belief_agent_on_pomdp(belief_agent, pomdp, episodes, steps, experiment=None, verbose=False, track_disc_reward=False, reset_at_terminal=False, resample_at_terminal=False)[source]¶
- Args:
  - belief_agent:
  - pomdp:
  - episodes:
  - steps:
  - experiment:
  - verbose:
  - track_disc_reward:
  - reset_at_terminal:
  - resample_at_terminal:
- Returns:
Module contents¶
- simple_rl
  - abstraction/
    - action_abs/  state_abs/  ...
  - agents/
    - AgentClass.py  QLearningAgentClass.py  RandomAgentClass.py  RMaxAgentClass.py  ...
  - experiments/
    - ExperimentClass.py  ExperimentParameters.py
  - mdp/
    - MDPClass.py  StateClass.py
  - planning/
    - BeliefSparseSamplingClass.py  MCTSClass.py  PlannerClass.py  ValueIterationClass.py
  - pomdp/
    - BeliefMDPClass.py  BeliefStateClass.py  BeliefUpdaterClass.py  POMDPClass.py
  - tasks/
    - chain/
      - ChainMDPClass.py  ChainStateClass.py
    - grid_world/
      - GridWorldMDPClass.py  GridWorldStateClass.py
    - ...
  - utils/
    - chart_utils.py  make_mdp.py
  - run_experiments.py
Author and Maintainer: David Abel (david_abel.github.io)
Last Updated: August 27th, 2018
Contact: david_abel@brown.edu
License: Apache