simple_rl.agents package

Subpackages

Submodules

simple_rl.agents.AgentClass module

AgentClass.py: Class for a basic RL Agent

class simple_rl.agents.AgentClass.Agent(name, actions, gamma=0.99)[source]

Bases: object

Abstract Agent class.

act(state, reward)[source]
Args:
state (State): see StateClass.py
reward (float): the reward associated with arriving in state @state.
Returns:
(str): action.
end_of_episode()[source]
Summary:
Resets the agent's pointers to the previous state and action.
get_name()[source]
get_parameters()[source]
Returns:
(dict) key=param_name (str) --> val=param_val (object).
policy(state)[source]
reset()[source]
Summary:
Resets the agent back to its tabula rasa config.
set_name(name)[source]
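A new agent is typically written by subclassing Agent and overriding act(). The sketch below is illustrative only (GreedyFirstActionAgent is not part of the package) and assumes the module path documented above:

    from simple_rl.agents.AgentClass import Agent

    class GreedyFirstActionAgent(Agent):
        """Toy agent that always returns the first action in its action list."""

        def __init__(self, actions, name="greedy-first"):
            Agent.__init__(self, name=name, actions=actions, gamma=0.99)

        def act(self, state, reward):
            # A real agent would use @reward here to update its estimates.
            return self.actions[0]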

simple_rl.agents.BeliefAgentClass module

class simple_rl.agents.BeliefAgentClass.BeliefAgent(name, actions, gamma=0.99)[source]

Bases: simple_rl.agents.AgentClass.Agent

act(belief_state, reward)[source]
Args:
belief_state (BeliefState)
reward (float)
Returns:
action (str)
policy(belief_state)[source]
Args:
belief_state (BeliefState)
Returns:
action (str)

simple_rl.agents.DelayedQAgentClass module

DelayedQAgentClass.py: Class for Delayed Q-Learning from [Strehl et al. 2006].

Author: Yuu Jinnai (ddyuudd@gmail.com)

class simple_rl.agents.DelayedQAgentClass.DelayedQAgent(actions, init_q=None, name='Delayed-Q', gamma=0.99, m=5, epsilon1=0.1)[source]

Bases: simple_rl.agents.AgentClass.Agent

Delayed-Q Learning Agent (Strehl, A.L., Li, L., Wiewiora, E., Langford, J. and Littman, M.L., 2006. PAC model-free reinforcement learning).

act(state, reward, learning=True)[source]
Args:
state (State)
reward (float)
Summary:
The central method called during each time step. Retrieves the action according to the current policy and performs updates given (s=self.prev_state, a=self.prev_action, r=reward, s'=state)
end_of_episode()[source]
Summary:
Resets the agent's pointers to the previous state and action.
get_action_distr(state, beta=0.2)[source]
Args:
state (State)
beta (float): Softmax temperature parameter.
Returns:
(list of floats): The i-th float corresponds to the probability mass associated with the i-th action (indexing into self.actions)
get_max_q_action(state)[source]
Args:
state (State)
Returns:
(str): denoting the action with the max q value in the given @state.
get_max_q_value(state)[source]
Args:
state (State)
Returns:
(float): denoting the max q value in the given @state.
get_parameters()[source]
Returns:
(dict) key=param_name (str) --> val=param_val (object).
get_q_value(state, action)[source]
Args:
state (State)
action (str)
Returns:
(float): denoting the q value of the (@state, @action) pair.
greedy_q_policy(state)[source]
Args:
state (State)
Returns:
(str): action.
reset()[source]

Summary: Resets the agent back to its tabula rasa config.

set_q_function(q_func)[source]

Sets the initial Q-function. For the PAC-MDP guarantee to hold, the initial Q(s, a) should be an upper bound of Q*(s, a).

set_vmax()[source]

Initialize Q-values to be Vmax.

update(state, action, reward, next_state)[source]
Args:
state (State)
action (str)
reward (float)
next_state (State)
Summary:
Updates the internal Q function according to the Delayed Q-Learning update rule.
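A minimal construction sketch for this class. It assumes DelayedQAgent is importable from the package root (otherwise import it from the module above); the action names are hypothetical:

    from simple_rl.agents import DelayedQAgent

    actions = ["up", "down", "left", "right"]

    # m: number of samples gathered before an attempted Q update;
    # epsilon1: required magnitude of that update for it to be applied.
    agent = DelayedQAgent(actions=actions, gamma=0.95, m=5, epsilon1=0.1)

    # For the PAC-MDP guarantee, Q should start at an upper bound of Q*;
    # set_vmax() initializes Q-values to Vmax (1 / (1 - gamma) for rewards in [0, 1]).
    agent.set_vmax()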

simple_rl.agents.DoubleQAgentClass module

DoubleQAgentClass.py: Class for an RL Agent acting according to Double Q Learning from:

van Hasselt, H. (2010). Double Q-learning. In Advances in Neural Information Processing Systems (pp. 2613-2621).

Author: David Abel

class simple_rl.agents.DoubleQAgentClass.DoubleQAgent(actions, name='Double-Q', alpha=0.05, gamma=0.99, epsilon=0.1, explore='uniform', anneal=False)[source]

Bases: simple_rl.agents.QLearningAgentClass.QLearningAgent

Class for an agent using Double Q Learning.

act(state, reward)[source]
Args:
state (State)
reward (float)
Summary:
The central method called during each time step. Retrieves the action according to the current policy and performs updates.
get_avg_q_value(state, action)[source]
Args:
state (State)
action (str)
Returns:
(float): the q value of the (@state, @action) pair, averaged over the two internal Q functions.
get_max_q_action(state, q_func_id=None)[source]
Args:
state (State)
q_func_id (str): either "A" or "B"
Returns:
(str): denoting the action with the max q value in the given @state.
get_max_q_value(state, q_func_id=None)[source]
Args:
state (State)
q_func_id (str): either "A" or "B"
Returns:
(float): denoting the max q value in the given @state.
get_q_value(state, action, q_func_id=None)[source]
Args:
state (State)
action (str)
q_func_id (str): either "A", "B", or None (defaults to taking the average).
Returns:
(float): denoting the q value of the (@state, @action) pair relative to the specified q function.
reset()[source]

Summary: Resets the agent back to its tabula rasa config.

update(state, action, reward, next_state)[source]
Args:
state (State)
action (str)
reward (float)
next_state (State)
Summary:
Updates the internal Q functions according to the Double Q update (sketched below).
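A rough sketch of that update rule (van Hasselt, 2010), not the class's internal code; q_A and q_B stand for the two tabular Q functions, e.g. defaultdict(float) keyed by (state, action):

    import random

    def double_q_update(q_A, q_B, s, a, r, s_prime, actions, alpha=0.05, gamma=0.99):
        # With probability 0.5 update Q_A, evaluating its greedy next action
        # with Q_B's estimate; otherwise perform the symmetric update of Q_B.
        if random.random() < 0.5:
            a_star = max(actions, key=lambda ap: q_A[(s_prime, ap)])
            q_A[(s, a)] += alpha * (r + gamma * q_B[(s_prime, a_star)] - q_A[(s, a)])
        else:
            b_star = max(actions, key=lambda ap: q_B[(s_prime, ap)])
            q_B[(s, a)] += alpha * (r + gamma * q_A[(s_prime, b_star)] - q_B[(s, a)])

Decoupling action selection (the argmax under one Q function) from action evaluation (the other Q function's estimate) is what reduces the overestimation bias of standard Q-learning.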

simple_rl.agents.FixedPolicyAgentClass module

FixedPolicyAgentClass.py: Class for an RL Agent that acts according to a fixed policy

class simple_rl.agents.FixedPolicyAgentClass.FixedPolicyAgent(policy, name='fixed-policy')[source]

Bases: simple_rl.agents.AgentClass.Agent

Agent Class with a fixed policy.

NAME = 'fixed-policy'
act(state, reward)[source]
Args:
state (State): see StateClass.py
reward (float): the reward associated with arriving in state @state.
Returns:
(str): action.
set_policy(new_policy)[source]
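A minimal usage sketch: the policy argument is any callable mapping a State to an action string. It assumes FixedPolicyAgent is importable from the package root; the action names are hypothetical:

    from simple_rl.agents import FixedPolicyAgent

    def always_forward(state):
        # Ignores the state and always returns the same (hypothetical) action.
        return "forward"

    agent = FixedPolicyAgent(policy=always_forward)

    # The policy can later be swapped without constructing a new agent.
    agent.set_policy(lambda state: "right")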

simple_rl.agents.QLearningAgentClass module

QLearningAgentClass.py: Class for a basic QLearningAgent

class simple_rl.agents.QLearningAgentClass.QLearningAgent(actions, name='Q-learning', alpha=0.1, gamma=0.99, epsilon=0.1, explore='uniform', anneal=False)[source]

Bases: simple_rl.agents.AgentClass.Agent

Implementation for a Q Learning Agent

act(state, reward, learning=True)[source]
Args:
state (State)
reward (float)
Returns:
(str)
Summary:
The central method called during each time step. Retrieves the action according to the current policy and performs updates given (s=self.prev_state, a=self.prev_action, r=reward, s'=state)
end_of_episode()[source]
Summary:
Resets the agent's pointers to the previous state and action.
epsilon_greedy_q_policy(state)[source]
Args:
state (State)
Returns:
(str): action.
get_action_distr(state, beta=0.2)[source]
Args:
state (State)
beta (float): Softmax temperature parameter.
Returns:
(list of floats): The i-th float corresponds to the probability mass associated with the i-th action (indexing into self.actions)
get_max_q_action(state)[source]
Args:
state (State)
Returns:
(str): denoting the action with the max q value in the given @state.
get_max_q_value(state)[source]
Args:
state (State)
Returns:
(float): denoting the max q value in the given @state.
get_parameters()[source]
Returns:
(dict) key=param_name (str) --> val=param_val (object).
get_q_value(state, action)[source]
Args:
state (State)
action (str)
Returns:
(float): denoting the q value of the (@state, @action) pair.
get_value(state)[source]
Args:
state (State)
Returns:
(float)
reset()[source]

Summary: Resets the agent back to its tabula rasa config.

soft_max_policy(state)[source]
Args:
state (State): Contains relevant state information.
Returns:
(str): action.
update(state, action, reward, next_state)[source]
Args:
state (State)
action (str)
reward (float)
next_state (State)
Summary:
Updates the internal Q Function according to the Bellman Equation. (Classic Q Learning update)
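For reference, a sketch of that classic update (not the class's internal code), with q a table such as collections.defaultdict(float) keyed by (state, action):

    def q_learning_update(q, s, a, r, s_prime, actions, alpha=0.1, gamma=0.99):
        # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
        best_next = max(q[(s_prime, ap)] for ap in actions)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])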

simple_rl.agents.RMaxAgentClass module

RMaxAgentClass.py: Class for an RMaxAgent from [Brafman and Tennenholtz 2003].

Notes:
  • Assumes WLOG reward function codomain is [0,1] (so RMAX is 1.0)
class simple_rl.agents.RMaxAgentClass.RMaxAgent(actions, gamma=0.95, horizon=4, s_a_threshold=1, name='RMax-h')[source]

Bases: simple_rl.agents.AgentClass.Agent

Implementation for an R-Max Agent [Brafman and Tennenholtz 2003]

act(state, reward)[source]
Args:
state (State): see StateClass.py
reward (float): the reward associated with arriving in state @state.
Returns:
(str): action.
get_max_q_action(state, horizon=None)[source]
Args:
state (State)
horizon (int): Indicates the level of recursion depth for computing Q.
Returns:
(str): The string associated with the action with highest Q value.
get_max_q_value(state, horizon=None)[source]
Args:
state (State)
horizon (int): Indicates the level of recursion depth for computing Q.
Returns:
(float): The Q value of the best action in this state.
get_num_known_sa()[source]
get_q_value(state, action, horizon=None)[source]
Args:
state (State)
action (str)
horizon (int): Indicates the level of recursion depth for computing Q.
Returns:
(float)
is_known(s, a)[source]
reset()[source]
Summary:
Resets the agent back to its tabula rasa config.
update(state, action, reward, next_state)[source]
Args:
state (State)
action (str)
reward (float)
next_state (State)
Summary:
Updates the agent's empirical estimates of the transition function T and reward function R.
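Conceptually, R-Max counts visits to each (state, action) pair, treats a pair as "known" once it has been tried s_a_threshold times, and assigns the optimistic value RMAX to unknown pairs, which drives systematic exploration. A rough sketch of that bookkeeping (not the class's code):

    from collections import defaultdict

    RMAX = 1.0              # reward codomain assumed to be [0, 1]
    s_a_threshold = 1

    visit_counts = defaultdict(int)

    def is_known(s, a):
        return visit_counts[(s, a)] >= s_a_threshold

    def optimistic_reward(s, a, empirical_mean_reward):
        # Unknown pairs look maximally rewarding, so the agent is drawn to them.
        return empirical_mean_reward if is_known(s, a) else RMAX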

simple_rl.agents.RandomAgentClass module

RandomAgentClass.py: Class for a randomly acting RL Agent

class simple_rl.agents.RandomAgentClass.RandomAgent(actions, name='')[source]

Bases: simple_rl.agents.AgentClass.Agent

Class for a random decision maker.

act(state, reward)[source]
Args:
state (State): see StateClass.py
reward (float): the reward associated with arriving in state @state.
Returns:
(str): action.

Module contents

Implementations of standard RL agents:

AgentClass: Contains the basic skeleton of an RL Agent.
QLearningAgentClass: Q-Learning.
LinearQAgentClass: Q-Learning with a Linear Approximator.
RandomAgentClass: Random actor.
RMaxAgentClass: R-Max.
LinUCBAgentClass: Contextual Bandit Algorithm.
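A typical workflow compares several of these agents on one task. The sketch below assumes the GridWorldMDP task and the run_agents_on_mdp helper found elsewhere in simple_rl (simple_rl.tasks and simple_rl.run_experiments); adjust the grid parameters as needed:

    from simple_rl.agents import QLearningAgent, RMaxAgent, RandomAgent
    from simple_rl.tasks import GridWorldMDP
    from simple_rl.run_experiments import run_agents_on_mdp

    mdp = GridWorldMDP(width=4, height=3, init_loc=(1, 1), goal_locs=[(4, 3)])
    agents = [
        QLearningAgent(actions=mdp.get_actions()),
        RMaxAgent(actions=mdp.get_actions()),
        RandomAgent(actions=mdp.get_actions()),
    ]

    # Runs each agent for several independent instances and records the results.
    run_agents_on_mdp(agents, mdp, instances=5, episodes=100, steps=50)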