simple_rl package¶
Subpackages¶
Submodules¶
simple_rl.run_experiments module¶
Code for running experiments where RL agents interact with an MDP.
- Instructions:
  - Create an MDP.
  - Create agents.
  - Set experiment parameters (instances, episodes, steps).
  - Call run_agents_on_mdp(agents, mdp) (or the lifelong / Markov game equivalents); see the sketch below.
    -> Runs all experiments and opens a plot with the results when finished.
Author: David Abel (cs.brown.edu/~dabel/)
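For example, a minimal end-to-end run might look like the sketch below. It assumes the standard simple_rl classes (GridWorldMDP from simple_rl.tasks, QLearningAgent and RandomAgent from simple_rl.agents); the grid dimensions and experiment parameters are illustrative choices, not values required by this module:

    from simple_rl.agents import QLearningAgent, RandomAgent
    from simple_rl.tasks import GridWorldMDP
    from simple_rl.run_experiments import run_agents_on_mdp

    # 1. Create an MDP (a small grid world; the size and goal location are arbitrary).
    mdp = GridWorldMDP(width=4, height=3, init_loc=(1, 1), goal_locs=[(4, 3)])

    # 2. Create agents that act over the MDP's action set.
    ql_agent = QLearningAgent(actions=mdp.get_actions())
    rand_agent = RandomAgent(actions=mdp.get_actions())

    # 3 & 4. Set experiment parameters and run; a plot opens when finished.
    run_agents_on_mdp([ql_agent, rand_agent], mdp, instances=5, episodes=100, steps=150)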
simple_rl.run_experiments.choose_mdp(mdp_name, env_name='Asteroids-v0')[source]¶
- Args:
  - mdp_name (str): one of {gym, grid, chain, taxi, ...}
  - env_name (str): Gym environment name, e.g. 'CartPole-v0'
- Returns:
  - (MDP)
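A short usage sketch; the names passed here come from the documented options, and building the 'gym' variant additionally requires the OpenAI Gym package to be installed:

    from simple_rl.run_experiments import choose_mdp

    # Build an MDP by name; "grid" is one of the documented name options.
    grid_mdp = choose_mdp("grid")

    # For a Gym-backed MDP, also pass the Gym environment name.
    cartpole_mdp = choose_mdp("gym", env_name="CartPole-v0")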
simple_rl.run_experiments.evaluate_agent(agent, mdp, instances=10)[source]¶
- Args:
  - agent (simple_rl.Agent)
  - mdp (simple_rl.MDP)
  - instances (int)
- Returns:
  - (float): Avg. cumulative discounted reward.
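A brief sketch of a typical call; the grid world and Q-learning agent are illustrative choices:

    from simple_rl.agents import QLearningAgent
    from simple_rl.tasks import GridWorldMDP
    from simple_rl.run_experiments import evaluate_agent

    mdp = GridWorldMDP(width=4, height=3, goal_locs=[(4, 3)])
    agent = QLearningAgent(actions=mdp.get_actions())

    # Average cumulative discounted reward over 10 independent instances.
    avg_reward = evaluate_agent(agent, mdp, instances=10)
    print(avg_reward)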
simple_rl.run_experiments.play_markov_game(agent_ls, markov_game_mdp, instances=10, episodes=100, steps=30, verbose=False, open_plot=True)[source]¶
- Args:
  - agent_ls (list of Agents): See agents/AgentClass.py (and friends).
  - markov_game_mdp (MarkovGameMDP): See mdp/markov_games/MarkovGameMDPClass.py.
  - instances (int): Number of times to run each agent (for confidence intervals).
  - episodes (int): Number of episodes for each learning instance.
  - steps (int): Number of steps per episode.
  - verbose (bool)
  - open_plot (bool): If true, opens the plot.
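A sketch of a two-player run. It assumes RockPaperScissorsMDP is importable from simple_rl.tasks (the repository ships a rock_paper_scissors task); any MarkovGameMDP and pair of agents would work the same way:

    from simple_rl.agents import QLearningAgent, RandomAgent
    from simple_rl.tasks import RockPaperScissorsMDP
    from simple_rl.run_experiments import play_markov_game

    markov_game = RockPaperScissorsMDP()

    # Two players with distinct agent types, learning against each other.
    ql_agent = QLearningAgent(actions=markov_game.get_actions())
    rand_agent = RandomAgent(actions=markov_game.get_actions())

    play_markov_game([ql_agent, rand_agent], markov_game, instances=5, episodes=100, steps=30)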
simple_rl.run_experiments.reproduce_from_exp_file(exp_name, results_dir_name='results', open_plot=True)[source]¶
- Args:
  - exp_name (str)
  - results_dir_name (str)
  - open_plot (bool)
- Summary:
  - Extracts the agents, MDP, and parameters from the experiment file and reruns the experiment. Stores data in "results_dir_name/exp_name/reproduce_i/*", where "i" is determined by the existence of earlier "reproduce" files.
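A sketch of reproducing a previous run; the experiment name here is a hypothetical placeholder for a directory that an earlier run wrote under results/:

    from simple_rl.run_experiments import reproduce_from_exp_file

    # "my_gridworld_experiment" is a placeholder, not a real experiment name.
    reproduce_from_exp_file("my_gridworld_experiment", results_dir_name="results", open_plot=True)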
simple_rl.run_experiments.run_agents_lifelong(agents, mdp_distr, samples=5, episodes=1, steps=100, clear_old_results=True, open_plot=True, verbose=False, track_disc_reward=False, reset_at_terminal=False, resample_at_terminal=False, cumulative_plot=True, dir_for_plot='results')[source]¶
- Args:
  - agents (list)
  - mdp_distr (MDPDistribution)
  - samples (int)
  - episodes (int)
  - steps (int)
  - clear_old_results (bool)
  - open_plot (bool)
  - verbose (bool)
  - track_disc_reward (bool): If true, records and plots discounted reward, with the discount carried across episodes. So, if each episode is 100 steps, then episode 2 starts discounting as though it were step 101.
  - reset_at_terminal (bool)
  - resample_at_terminal (bool)
  - cumulative_plot (bool)
  - dir_for_plot (str)
- Summary:
  - Runs each agent on the MDP distribution according to the given parameters. If @mdp_distr has a non-zero horizon, then gamma is set to 1 and @steps is ignored.
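A sketch of a lifelong run over a small hand-built task distribution. It assumes MDPDistribution can be imported from simple_rl.mdp and constructed from a dict mapping MDPs to sampling probabilities; the two grids and their weights are arbitrary:

    from simple_rl.agents import QLearningAgent
    from simple_rl.mdp import MDPDistribution
    from simple_rl.tasks import GridWorldMDP
    from simple_rl.run_experiments import run_agents_lifelong

    # Two related grid tasks, each drawn with probability 0.5 per sample.
    mdp_one = GridWorldMDP(width=4, height=3, goal_locs=[(4, 3)])
    mdp_two = GridWorldMDP(width=4, height=3, goal_locs=[(1, 3)])
    mdp_distr = MDPDistribution({mdp_one: 0.5, mdp_two: 0.5})

    # Both grids share the same action set.
    ql_agent = QLearningAgent(actions=mdp_one.get_actions())

    run_agents_lifelong([ql_agent], mdp_distr, samples=10, episodes=1, steps=100)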
simple_rl.run_experiments.run_agents_on_mdp(agents, mdp, instances=5, episodes=100, steps=200, clear_old_results=True, rew_step_count=1, track_disc_reward=False, open_plot=True, verbose=False, reset_at_terminal=False, cumulative_plot=True, dir_for_plot='results', experiment_name_prefix='')[source]¶
- Args:
  - agents (list of Agents): See agents/AgentClass.py (and friends).
  - mdp (MDP): See mdp/MDPClass.py for the abstract class. Specific MDPs are in tasks/*.
  - instances (int): Number of times to run each agent (for confidence intervals).
  - episodes (int): Number of episodes for each learning instance.
  - steps (int): Number of steps per episode.
  - clear_old_results (bool): If true, removes all results files in the relevant results dir.
  - rew_step_count (int): Number of steps before recording reward.
  - track_disc_reward (bool): If true, tracks (and plots) discounted reward.
  - open_plot (bool): If true, opens the plot at the end.
  - verbose (bool): If true, prints status bars per episode/instance.
  - reset_at_terminal (bool): If true, sends the agent back to the start state after reaching a terminal state.
  - cumulative_plot (bool): If true, makes a cumulative plot; otherwise plots avg. reward per timestep.
  - dir_for_plot (str): Path for storing results and plots.
  - experiment_name_prefix (str): Prefix added to the usual experiment name.
- Summary:
  - Runs each agent on the given MDP according to the given parameters. Stores results in results/<agent_name>.csv and automatically generates a plot, opening it if open_plot is true.
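Building on the quick-start sketch near the top of this page, a run with a few non-default options might look like this (all values are arbitrary illustrations):

    from simple_rl.agents import QLearningAgent, RandomAgent
    from simple_rl.tasks import GridWorldMDP
    from simple_rl.run_experiments import run_agents_on_mdp

    mdp = GridWorldMDP(width=6, height=6, goal_locs=[(6, 6)])
    agents = [QLearningAgent(actions=mdp.get_actions()), RandomAgent(actions=mdp.get_actions())]

    # Track discounted reward, plot average reward per timestep instead of a
    # cumulative plot, and write results under a custom directory.
    run_agents_on_mdp(agents, mdp, instances=10, episodes=50, steps=40,
                      track_disc_reward=True, cumulative_plot=False,
                      dir_for_plot="my_results", experiment_name_prefix="gridworld_")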
simple_rl.run_experiments.run_single_agent_on_mdp(agent, mdp, episodes, steps, experiment=None, verbose=False, track_disc_reward=False, reset_at_terminal=False, resample_at_terminal=False)[source]¶
- Summary:
  - Main loop of a single MDP experiment.
- Returns:
  - (tuple): (bool: reached terminal, int: num steps taken, float: cumulative discounted reward)
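This is the inner loop that the run_agents_* functions call; it can also be invoked directly for a quick, plot-free run. A sketch, assuming the return tuple follows the order documented above:

    from simple_rl.agents import QLearningAgent
    from simple_rl.tasks import GridWorldMDP
    from simple_rl.run_experiments import run_single_agent_on_mdp

    mdp = GridWorldMDP(width=4, height=3, goal_locs=[(4, 3)])
    agent = QLearningAgent(actions=mdp.get_actions())

    # No Experiment object is passed, so nothing is written to disk or plotted.
    reached_terminal, steps_taken, disc_reward = run_single_agent_on_mdp(agent, mdp, episodes=20, steps=50)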
simple_rl.run_experiments.run_single_belief_agent_on_pomdp(belief_agent, pomdp, episodes, steps, experiment=None, verbose=False, track_disc_reward=False, reset_at_terminal=False, resample_at_terminal=False)[source]¶
- Args:
  - belief_agent:
  - pomdp:
  - episodes:
  - steps:
  - experiment:
  - verbose:
  - track_disc_reward:
  - reset_at_terminal:
  - resample_at_terminal:
- Returns:
Module contents¶
- simple_rl
  - abstraction/
    - action_abs/  state_abs/  ...
  - agents/
    - AgentClass.py  QLearningAgentClass.py  RandomAgentClass.py  RMaxAgentClass.py  ...
  - experiments/
    - ExperimentClass.py  ExperimentParameters.py
  - mdp/
    - MDPClass.py  StateClass.py
  - planning/
    - BeliefSparseSamplingClass.py  MCTSClass.py  PlannerClass.py  ValueIterationClass.py
  - pomdp/
    - BeliefMDPClass.py  BeliefStateClass.py  BeliefUpdaterClass.py  POMDPClass.py
  - tasks/
    - chain/
      - ChainMDPClass.py  ChainStateClass.py
    - grid_world/
      - GridWorldMDPClass.py  GridWorldStateClass.py
    - ...
  - utils/
    - chart_utils.py  make_mdp.py
  - run_experiments.py
Author and Maintainer: David Abel (david_abel.github.io)
Last Updated: August 27th, 2018
Contact: david_abel@brown.edu
License: Apache