simple_rl.agents package

Subpackages

Submodules

simple_rl.agents.AgentClass module

AgentClass.py: Class for a basic RL Agent.

simple_rl.agents.BeliefAgentClass module
class simple_rl.agents.BeliefAgentClass.BeliefAgent(name, actions, gamma=0.99)
simple_rl.agents.DelayedQAgentClass module

DelayedQAgentClass.py: Class for Delayed Q-Learning from [Strehl et al. 2006].
Author: Yuu Jinnai (ddyuudd@gmail.com)

class simple_rl.agents.DelayedQAgentClass.DelayedQAgent(actions, init_q=None, name='Delayed-Q', gamma=0.99, m=5, epsilon1=0.1)
    Bases: simple_rl.agents.AgentClass.Agent

    Delayed-Q Learning Agent (Strehl, A.L., Li, L., Wiewiora, E., Langford, J. and Littman, M.L., 2006. PAC model-free reinforcement learning).
    act(state, reward, learning=True)
        Args:
            state (State)
            reward (float)
        Summary:
            The central method called during each time step. Retrieves the action according to the current policy and performs updates given (s=self.prev_state, a=self.prev_action, r=reward, s'=state).

    get_action_distr(state, beta=0.2)
        Args:
            state (State)
            beta (float): Softmax temperature parameter.
        Returns:
            (list of floats): The i-th float is the probability mass assigned to the i-th action (indexing into self.actions).

    get_max_q_action(state)
        Args:
            state (State)
        Returns:
            (str): The action with the maximum Q-value in the given @state.

    get_max_q_value(state)
        Args:
            state (State)
        Returns:
            (float): The maximum Q-value in the given @state.

    get_q_value(state, action)
        Args:
            state (State)
            action (str)
        Returns:
            (float): The Q-value of the (@state, @action) pair.
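The get_action_distr entry above describes a softmax (Boltzmann) distribution over Q-values. A minimal self-contained sketch of that idea follows; the exact role of beta (here a temperature dividing the Q-values) is an assumption, not necessarily how simple_rl applies it internally:

```python
import math

def softmax_action_distr(q_values, beta=0.2):
    """Sketch of a softmax distribution over a list of Q-values.

    Assumes probabilities proportional to exp(q / beta), with beta as a
    temperature: small beta -> near-greedy, large beta -> near-uniform.
    """
    # Subtract the max before exponentiating, for numerical stability.
    max_q = max(q_values)
    exps = [math.exp((q - max_q) / beta) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]
```

The returned list lines up with the input: the i-th probability corresponds to the i-th Q-value, mirroring how the documented method indexes into self.actions.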
simple_rl.agents.DoubleQAgentClass module

DoubleQAgentClass.py: Class for an RL Agent acting according to Double Q-Learning from:
Hasselt, H. V. (2010). Double Q-learning. In Advances in Neural Information Processing Systems (pp. 2613-2621).
Author: David Abel

class simple_rl.agents.DoubleQAgentClass.DoubleQAgent(actions, name='Double-Q', alpha=0.05, gamma=0.99, epsilon=0.1, explore='uniform', anneal=False)
    Bases: simple_rl.agents.QLearningAgentClass.QLearningAgent

    Class for an agent using Double Q-Learning.
    act(state, reward)
        Args:
            state (State)
            reward (float)
        Summary:
            The central method called during each time step. Retrieves the action according to the current policy and performs updates.

    get_avg_q_value(state, action)
        Args:
            state (State)
            action (str)
        Returns:
            (float): The average Q-value of the (@state, @action) pair across the two Q functions.

    get_max_q_action(state, q_func_id=None)
        Args:
            state (State)
            q_func_id (str): Either "A" or "B".
        Returns:
            (str): The action with the maximum Q-value in the given @state.

    get_max_q_value(state, q_func_id=None)
        Args:
            state (State)
            q_func_id (str): Either "A" or "B".
        Returns:
            (float): The maximum Q-value in the given @state.
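Double Q-learning keeps two Q functions ("A" and "B", as the q_func_id parameter above suggests) and, on each step, updates one using the other to evaluate its greedy next action, which reduces the maximization bias of standard Q-learning. A minimal tabular sketch of one such step, with assumed names (dict-based tables q_a/q_b; not simple_rl's internal code):

```python
import random
from collections import defaultdict

def double_q_update(q_a, q_b, state, action, reward, next_state,
                    actions, alpha=0.05, gamma=0.99, rng=random):
    """One double Q-learning step (Hasselt, 2010) on tabular dicts.

    q_a and q_b map (state, action) -> float; missing entries default
    to 0.0 when defaultdict(float) is used.
    """
    # Flip a fair coin to decide which table learns this step.
    if rng.random() < 0.5:
        learner, evaluator = q_a, q_b
    else:
        learner, evaluator = q_b, q_a
    # Greedy next action according to the table being updated...
    a_star = max(actions, key=lambda a: learner[(next_state, a)])
    # ...but evaluated by the other table (this breaks the max bias).
    target = reward + gamma * evaluator[(next_state, a_star)]
    learner[(state, action)] += alpha * (target - learner[(state, action)])

q_a = defaultdict(float)
q_b = defaultdict(float)
```

Averaging the two tables, as get_avg_q_value describes, gives a less biased value estimate than either table alone.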
simple_rl.agents.FixedPolicyAgentClass module

FixedPolicyAgentClass.py: Class for an RL Agent with a fixed policy.

class simple_rl.agents.FixedPolicyAgentClass.FixedPolicyAgent(policy, name='fixed-policy')
    Bases: simple_rl.agents.AgentClass.Agent

    Agent class with a fixed policy.

    NAME = 'fixed-policy'
simple_rl.agents.QLearningAgentClass module

QLearningAgentClass.py: Class for a basic QLearningAgent.

class simple_rl.agents.QLearningAgentClass.QLearningAgent(actions, name='Q-learning', alpha=0.1, gamma=0.99, epsilon=0.1, explore='uniform', anneal=False)
    Bases: simple_rl.agents.AgentClass.Agent

    Implementation for a Q-Learning Agent.
    act(state, reward, learning=True)
        Args:
            state (State)
            reward (float)
        Returns:
            (str)
        Summary:
            The central method called during each time step. Retrieves the action according to the current policy and performs updates given (s=self.prev_state, a=self.prev_action, r=reward, s'=state).

    get_action_distr(state, beta=0.2)
        Args:
            state (State)
            beta (float): Softmax temperature parameter.
        Returns:
            (list of floats): The i-th float is the probability mass assigned to the i-th action (indexing into self.actions).

    get_max_q_action(state)
        Args:
            state (State)
        Returns:
            (str): The action with the maximum Q-value in the given @state.

    get_max_q_value(state)
        Args:
            state (State)
        Returns:
            (float): The maximum Q-value in the given @state.

    get_q_value(state, action)
        Args:
            state (State)
            action (str)
        Returns:
            (float): The Q-value of the (@state, @action) pair.
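The act method above pairs action selection with the standard Q-learning update on (s, a, r, s'). A minimal tabular sketch of that update rule, with assumed names (a dict-based table; not simple_rl's internal code):

```python
from collections import defaultdict

def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

    q maps (state, action) -> float; missing entries default to 0.0
    when defaultdict(float) is used.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
```

With alpha=0.1 and gamma=0.99 (the class defaults above), repeated calls along experienced transitions drive Q toward the optimal action values.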
simple_rl.agents.RMaxAgentClass module

RMaxAgentClass.py: Class for an RMaxAgent from [Brafman and Tennenholtz 2003].

Notes:
    Assumes WLOG that the reward function's codomain is [0, 1] (so RMAX is 1.0).

class simple_rl.agents.RMaxAgentClass.RMaxAgent(actions, gamma=0.95, horizon=4, s_a_threshold=1, name='RMax-h')
    Bases: simple_rl.agents.AgentClass.Agent

    Implementation for an R-Max Agent [Brafman and Tennenholtz 2003].
    act(state, reward)
        Args:
            state (State): See StateClass.py.
            reward (float): The reward associated with arriving in state @state.
        Returns:
            (str): Action.

    get_max_q_action(state, horizon=None)
        Args:
            state (State)
            horizon (int): Indicates the recursion depth for computing Q.
        Returns:
            (str): The action with the highest Q-value.

    get_max_q_value(state, horizon=None)
        Args:
            state (State)
            horizon (int): Indicates the recursion depth for computing Q.
        Returns:
            (float): The Q-value of the best action in this state.
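R-Max explores through optimism: any (state, action) pair observed fewer than s_a_threshold times is treated as maximally rewarding (RMAX = 1.0, per the note above), while known pairs use empirical estimates, and Q-values are computed by recursing to the given horizon. A simplified sketch under assumed names, with the empirical model held in plain dicts and transitions taken as deterministic for brevity (the real algorithm uses estimated transition distributions):

```python
def rmax_q(state, action, horizon, counts, r_hat, next_s, actions,
           gamma=0.95, s_a_threshold=1, rmax=1.0):
    """Finite-horizon optimistic Q-value in the spirit of R-Max.

    counts[(s, a)]: observed visit count; r_hat[(s, a)]: empirical mean
    reward; next_s[(s, a)]: observed next state (deterministic here).
    Unknown pairs get the optimistic value rmax / (1 - gamma).
    """
    if counts.get((state, action), 0) < s_a_threshold:
        # Unknown pair: assume the best possible return from here on.
        return rmax / (1.0 - gamma)
    if horizon <= 1:
        return r_hat[(state, action)]
    s_next = next_s[(state, action)]
    # Bootstrap through the best next action, one level shallower.
    best_next = max(rmax_q(s_next, a, horizon - 1, counts, r_hat, next_s,
                           actions, gamma, s_a_threshold, rmax)
                    for a in actions)
    return r_hat[(state, action)] + gamma * best_next
```

Because unknown pairs look maximally valuable, a greedy agent over these Q-values is steered toward under-visited parts of the MDP until every pair crosses s_a_threshold.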
simple_rl.agents.RandomAgentClass module

RandomAgentClass.py: Class for a randomly acting RL Agent.

class simple_rl.agents.RandomAgentClass.RandomAgent(actions, name='')
    Bases: simple_rl.agents.AgentClass.Agent

    Class for a random decision maker.
Module contents

Implementations of standard RL agents:

    AgentClass: Contains the basic skeleton of an RL Agent.
    QLearningAgentClass: Q-Learning.
    LinearQAgentClass: Q-Learning with a Linear Approximator.
    RandomAgentClass: Random actor.
    RMaxAgentClass: R-Max.
    LinUCBAgentClass: Contextual Bandit Algorithm.