simple_rl.agents package

Subpackages

Submodules

simple_rl.agents.AgentClass module

AgentClass.py: Class for a basic RL Agent

class simple_rl.agents.AgentClass.Agent(name, actions, gamma=0.99)[source]

Bases: object

Abstract Agent class.

act(state, reward)[source]
Args:
state (State): see StateClass.py
reward (float): the reward associated with arriving in state @state.
Returns:
(str): action.
end_of_episode()[source]
Summary:
Resets the agent's pointers to the previous state and action.
get_name()[source]
get_parameters()[source]
Returns:
(dict) key=param_name (str) --> val=param_val (object).
policy(state)[source]
reset()[source]
Summary:
Resets the agent back to its tabula rasa config.
set_name(name)[source]
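A new agent is typically written by subclassing Agent and overriding act(). The sketch below is illustrative only (GreedyFirstActionAgent is not part of the package) and assumes the module path documented above:

    from simple_rl.agents.AgentClass import Agent

    class GreedyFirstActionAgent(Agent):
        """Toy agent that always returns the first action in its action list."""

        def __init__(self, actions, name="greedy-first"):
            Agent.__init__(self, name=name, actions=actions, gamma=0.99)

        def act(self, state, reward):
            # A real agent would use @reward here to update its estimates.
            return self.actions[0]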

simple_rl.agents.BeliefAgentClass module

class simple_rl.agents.BeliefAgentClass.BeliefAgent(name, actions, gamma=0.99)[source]

Bases: simple_rl.agents.AgentClass.Agent

act(belief_state, reward)[source]
Args:
belief_state (BeliefState)
reward (float)
Returns:
action (str)
policy(belief_state)[source]
Args:
belief_state (BeliefState)
Returns:
action (str)

simple_rl.agents.DelayedQAgentClass module

DelayedQAgentClass.py: Class for Delayed Q-Learning from [Strehl et al. 2006].

Author: Yuu Jinnai (ddyuudd@gmail.com)

class simple_rl.agents.DelayedQAgentClass.DelayedQAgent(actions, init_q=None, name='Delayed-Q', gamma=0.99, m=5, epsilon1=0.1)[source]

Bases: simple_rl.agents.AgentClass.Agent

Delayed-Q Learning Agent (Strehl, A.L., Li, L., Wiewiora, E., Langford, J. and Littman, M.L., 2006. PAC model-free reinforcement learning).

act(state, reward, learning=True)[source]
Args:
state (State)
reward (float)
Summary:
The central method called during each time step. Retrieves the action according to the current policy and performs updates given (s=self.prev_state, a=self.prev_action, r=reward, s'=state)
end_of_episode()[source]
Summary:
Resets the agent's pointers to the previous state and action.
get_action_distr(state, beta=0.2)[source]
Args:
state (State)
beta (float): Softmax temperature parameter.
Returns:
(list of floats): The i-th float corresponds to the probability mass associated with the i-th action (indexing into self.actions)
get_max_q_action(state)[source]
Args:
state (State)
Returns:
(str): denoting the action with the max q value in the given @state.
get_max_q_value(state)[source]
Args:
state (State)
Returns:
(float): denoting the max q value in the given @state.
get_parameters()[source]
Returns:
(dict) key=param_name (str) --> val=param_val (object).
get_q_value(state, action)[source]
Args:
state (State)
action (str)
Returns:
(float): denoting the q value of the (@state, @action) pair.
greedy_q_policy(state)[source]
Args:
state (State)
Returns:
(str): action.
reset()[source]

Summary: Resets the agent back to its tabula rasa config.

set_q_function(q_func)[source]

Sets the initial Q-function. For the PAC-MDP guarantee to hold, the initial Q(s, a) should be an upper bound of Q*(s, a).

set_vmax()[source]

Initialize Q-values to be Vmax.

update(state, action, reward, next_state)[source]
Args:
state (State)
action (str)
reward (float)
next_state (State)
Summary:
Updates the internal Q function according to the Delayed Q-Learning update rule.
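A minimal construction sketch for this class. It assumes DelayedQAgent is importable from the package root (otherwise import it from the module above); the action names are hypothetical:

    from simple_rl.agents import DelayedQAgent

    actions = ["up", "down", "left", "right"]

    # m: number of samples gathered before an attempted Q update;
    # epsilon1: required magnitude of that update for it to be applied.
    agent = DelayedQAgent(actions=actions, gamma=0.95, m=5, epsilon1=0.1)

    # For the PAC-MDP guarantee, Q should start at an upper bound of Q*;
    # set_vmax() initializes Q-values to Vmax (1 / (1 - gamma) for rewards in [0, 1]).
    agent.set_vmax()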

simple_rl.agents.DoubleQAgentClass module

DoubleQAgentClass.py: Class for an RL Agent acting according to Double Q Learning from:

van Hasselt, H. (2010). Double Q-learning. In Advances in Neural Information Processing Systems (pp. 2613-2621).

Author: David Abel

class simple_rl.agents.DoubleQAgentClass.DoubleQAgent(actions, name='Double-Q', alpha=0.05, gamma=0.99, epsilon=0.1, explore='uniform', anneal=False)[source]

Bases: simple_rl.agents.QLearningAgentClass.QLearningAgent

Class for an agent using Double Q Learning.

act(state, reward)[source]
Args:
state (State)
reward (float)
Summary:
The central method called during each time step. Retrieves the action according to the current policy and performs updates.
get_avg_q_value(state, action)[source]
Args:
state (State)
action (str)
Returns:
(float): the q value of the (@state, @action) pair, averaged over the two internal Q functions.
get_max_q_action(state, q_func_id=None)[source]
Args:
state (State)
q_func_id (str): either "A" or "B"
Returns:
(str): denoting the action with the max q value in the given @state.
get_max_q_value(state, q_func_id=None)[source]
Args:
state (State)
q_func_id (str): either "A" or "B"
Returns:
(float): denoting the max q value in the given @state.
get_q_value(state, action, q_func_id=None)[source]
Args:
state (State)
action (str)
q_func_id (str): either "A", "B", or None (defaults to taking the average).
Returns:
(float): denoting the q value of the (@state, @action) pair relative to the specified q function.
reset()[source]

Summary: Resets the agent back to its tabula rasa config.

update(state, action, reward, next_state)[source]
Args:
state (State)
action (str)
reward (float)
next_state (State)
Summary:
Updates the internal Q functions according to the Double Q update (sketched below).
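A rough sketch of that update rule (van Hasselt, 2010), not the class's internal code; q_A and q_B stand for the two tabular Q functions, e.g. defaultdict(float) keyed by (state, action):

    import random

    def double_q_update(q_A, q_B, s, a, r, s_prime, actions, alpha=0.05, gamma=0.99):
        # With probability 0.5 update Q_A, evaluating its greedy next action
        # with Q_B's estimate; otherwise perform the symmetric update of Q_B.
        if random.random() < 0.5:
            a_star = max(actions, key=lambda ap: q_A[(s_prime, ap)])
            q_A[(s, a)] += alpha * (r + gamma * q_B[(s_prime, a_star)] - q_A[(s, a)])
        else:
            b_star = max(actions, key=lambda ap: q_B[(s_prime, ap)])
            q_B[(s, a)] += alpha * (r + gamma * q_A[(s_prime, b_star)] - q_B[(s, a)])

Decoupling action selection (the argmax under one Q function) from action evaluation (the other Q function's estimate) is what reduces the overestimation bias of standard Q-learning.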

simple_rl.agents.FixedPolicyAgentClass module

FixedPolicyAgentClass.py: Class for an RL Agent that acts according to a fixed policy

class simple_rl.agents.FixedPolicyAgentClass.FixedPolicyAgent(policy, name='fixed-policy')[source]

Bases: simple_rl.agents.AgentClass.Agent

Agent Class with a fixed policy.

NAME = 'fixed-policy'
act(state, reward)[source]
Args:
state (State): see StateClass.py
reward (float): the reward associated with arriving in state @state.
Returns:
(str): action.
set_policy(new_policy)[source]
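A minimal usage sketch: the policy argument is any callable mapping a State to an action string. It assumes FixedPolicyAgent is importable from the package root; the action names are hypothetical:

    from simple_rl.agents import FixedPolicyAgent

    def always_forward(state):
        # Ignores the state and always returns the same (hypothetical) action.
        return "forward"

    agent = FixedPolicyAgent(policy=always_forward)

    # The policy can later be swapped without constructing a new agent.
    agent.set_policy(lambda state: "right")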

simple_rl.agents.QLearningAgentClass module

QLearningAgentClass.py: Class for a basic QLearningAgent

class simple_rl.agents.QLearningAgentClass.QLearningAgent(actions, name='Q-learning', alpha=0.1, gamma=0.99, epsilon=0.1, explore='uniform', anneal=False)[source]

Bases: simple_rl.agents.AgentClass.Agent

Implementation for a Q Learning Agent

act(state, reward, learning=True)[source]
Args:
state (State)
reward (float)
Returns:
(str)
Summary:
The central method called during each time step. Retrieves the action according to the current policy and performs updates given (s=self.prev_state, a=self.prev_action, r=reward, s'=state)
end_of_episode()[source]
Summary:
Resets the agent's pointers to the previous state and action.
epsilon_greedy_q_policy(state)[source]
Args:
state (State)
Returns:
(str): action.
get_action_distr(state, beta=0.2)[source]
Args:
state (State)
beta (float): Softmax temperature parameter.
Returns:
(list of floats): The i-th float corresponds to the probability mass associated with the i-th action (indexing into self.actions)
get_max_q_action(state)[source]
Args:
state (State)
Returns:
(str): denoting the action with the max q value in the given @state.
get_max_q_value(state)[source]
Args:
state (State)
Returns:
(float): denoting the max q value in the given @state.
get_parameters()[source]
Returns:
(dict) key=param_name (str) --> val=param_val (object).
get_q_value(state, action)[source]
Args:
state (State)
action (str)
Returns:
(float): denoting the q value of the (@state, @action) pair.
get_value(state)[source]
Args:
state (State)
Returns:
(float)
reset()[source]

Summary: Resets the agent back to its tabula rasa config.

soft_max_policy(state)[source]
Args:
state (State): Contains relevant state information.
Returns:
(str): action.
update(state, action, reward, next_state)[source]
Args:
state (State)
action (str)
reward (float)
next_state (State)
Summary:
Updates the internal Q Function according to the Bellman Equation. (Classic Q Learning update)
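For reference, a sketch of that classic update (not the class's internal code), with q a table such as collections.defaultdict(float) keyed by (state, action):

    def q_learning_update(q, s, a, r, s_prime, actions, alpha=0.1, gamma=0.99):
        # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
        best_next = max(q[(s_prime, ap)] for ap in actions)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])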

simple_rl.agents.RMaxAgentClass module

RMaxAgentClass.py: Class for an RMaxAgent from [Brafman and Tennenholtz 2003].

Notes:
  • Assumes WLOG reward function codomain is [0,1] (so RMAX is 1.0)
class simple_rl.agents.RMaxAgentClass.RMaxAgent(actions, gamma=0.95, horizon=4, s_a_threshold=1, name='RMax-h')[source]

Bases: simple_rl.agents.AgentClass.Agent

Implementation for an R-Max Agent [Brafman and Tennenholtz 2003]

act(state, reward)[source]
Args:
state (State): see StateClass.py
reward (float): the reward associated with arriving in state @state.
Returns:
(str): action.
get_max_q_action(state, horizon=None)[source]
Args:
state (State)
horizon (int): Indicates the level of recursion depth for computing Q.
Returns:
(str): The string associated with the action with highest Q value.
get_max_q_value(state, horizon=None)[source]
Args:
state (State)
horizon (int): Indicates the level of recursion depth for computing Q.
Returns:
(float): The Q value of the best action in this state.
get_num_known_sa()[source]
get_q_value(state, action, horizon=None)[source]
Args:
state (State)
action (str)
horizon (int): Indicates the level of recursion depth for computing Q.
Returns:
(float)
is_known(s, a)[source]
reset()[source]
Summary:
Resets the agent back to its tabula rasa config.
update(state, action, reward, next_state)[source]
Args:
state (State)
action (str)
reward (float)
next_state (State)
Summary:
Updates the agent's empirical estimates of the transition function T and reward function R.
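Conceptually, R-Max counts visits to each (state, action) pair, treats a pair as "known" once it has been tried s_a_threshold times, and assigns the optimistic value RMAX to unknown pairs, which drives systematic exploration. A rough sketch of that bookkeeping (not the class's code):

    from collections import defaultdict

    RMAX = 1.0              # reward codomain assumed to be [0, 1]
    s_a_threshold = 1

    visit_counts = defaultdict(int)

    def is_known(s, a):
        return visit_counts[(s, a)] >= s_a_threshold

    def optimistic_reward(s, a, empirical_mean_reward):
        # Unknown pairs look maximally rewarding, so the agent is drawn to them.
        return empirical_mean_reward if is_known(s, a) else RMAX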

simple_rl.agents.RandomAgentClass module

RandomAgentClass.py: Class for a randomly acting RL Agent

class simple_rl.agents.RandomAgentClass.RandomAgent(actions, name='')[source]

Bases: simple_rl.agents.AgentClass.Agent

Class for a random decision maker.

act(state, reward)[source]
Args:
state (State): see StateClass.py
reward (float): the reward associated with arriving in state @state.
Returns:
(str): action.

Module contents

Implementations of standard RL agents:

AgentClass: Contains the basic skeleton of an RL Agent.
QLearningAgentClass: Q-Learning.
LinearQAgentClass: Q-Learning with a Linear Approximator.
RandomAgentClass: Random actor.
RMaxAgentClass: R-Max.
LinUCBAgentClass: Contextual Bandit Algorithm.
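A typical workflow compares several of these agents on one task. The sketch below assumes the GridWorldMDP task and the run_agents_on_mdp helper found elsewhere in simple_rl (simple_rl.tasks and simple_rl.run_experiments); adjust the grid parameters as needed:

    from simple_rl.agents import QLearningAgent, RMaxAgent, RandomAgent
    from simple_rl.tasks import GridWorldMDP
    from simple_rl.run_experiments import run_agents_on_mdp

    mdp = GridWorldMDP(width=4, height=3, init_loc=(1, 1), goal_locs=[(4, 3)])
    agents = [
        QLearningAgent(actions=mdp.get_actions()),
        RMaxAgent(actions=mdp.get_actions()),
        RandomAgent(actions=mdp.get_actions()),
    ]

    # Runs each agent for several independent instances and records the results.
    run_agents_on_mdp(agents, mdp, instances=5, episodes=100, steps=50)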