epsilon_greedy_policy

Module epsilon_greedy_policy. Implements an epsilon-greedy policy with several options for decaying epsilon.

class epsilon_greedy_policy.EpsilonDecayOption(value)

Options for reducing epsilon

class epsilon_greedy_policy.EpsilonGreedyConfig(eps: float = 1.0, n_actions: int = 1, decay_op: EpsilonDecayOption = EpsilonDecayOption.NONE, max_eps: float = 1.0, min_eps: float = 0.001, epsilon_decay_factor: float = 0.01, user_defined_decrease_method: Optional[UserDefinedDecreaseMethod] = None)

Configuration class for EpsilonGreedyPolicy

class epsilon_greedy_policy.EpsilonGreedyPolicy(eps: float, n_actions: int, decay_op: EpsilonDecayOption, max_eps: float = 1.0, min_eps: float = 0.001, epsilon_decay_factor: float = 0.01, user_defined_decrease_method: Optional[UserDefinedDecreaseMethod] = None)

Epsilon-greedy policy implementation

__call__(q_table: QTable, state: State) → int

Execute the policy

Parameters

q_table – The q-table to use
state – The state observed

Returns

An integer representing the action index
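
The selection rule behind an epsilon-greedy policy can be sketched as follows. This is a minimal, standalone illustration and not the library's implementation: the function `select_action`, the dictionary-based q-table keyed by `(state, action)`, and the `rng` argument are all assumptions made for the example.

```python
import random
from typing import Dict, Hashable, Optional, Tuple

# Hypothetical q-table layout for this sketch: (state, action) -> value.
QTableSketch = Dict[Tuple[Hashable, int], float]


def select_action(q_table: QTableSketch, state: Hashable, eps: float,
                  n_actions: int,
                  rng: Optional[random.Random] = None) -> int:
    """Return an action index chosen with the epsilon-greedy rule."""
    if rng is None:
        rng = random.Random()
    # Explore: with probability eps pick a uniformly random action.
    if rng.random() < eps:
        return rng.randrange(n_actions)
    # Exploit: otherwise pick the action with the largest q-value.
    # Missing entries count as 0.0; ties resolve to the lowest index.
    return max(range(n_actions), key=lambda a: q_table.get((state, a), 0.0))
```

With `eps=0.0` the sketch is purely greedy, and with `eps=1.0` it always explores, which is why epsilon is usually started high and reduced over training.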

__init__(eps: float, n_actions: int, decay_op: EpsilonDecayOption, max_eps: float = 1.0, min_eps: float = 0.001, epsilon_decay_factor: float = 0.01, user_defined_decrease_method: Optional[UserDefinedDecreaseMethod] = None)

Constructor. Initialize a policy with the given options

Parameters

eps – The initial epsilon
n_actions – How many actions the environment assumes
decay_op – How to decay epsilon
max_eps – The maximum epsilon
min_eps – The minimum epsilon
epsilon_decay_factor – A decay factor used when decay_op = CONSTANT_RATE
user_defined_decrease_method – A user-defined callable to decay epsilon

__str__() → str

Returns the name of the policy

Returns: A string representing the name of the policy

actions_after_episode(episode_idx: int, **options) → None

Any actions the policy should execute after the episode ends

Parameters

episode_idx – The episode index
options – Any options passed by the client code

Return type

None
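
One plausible use of this hook is to decay epsilon once per episode. The sketch below is an assumption about that behaviour, not the library's code: it treats CONSTANT_RATE as subtracting epsilon_decay_factor from epsilon and clamps the result to the range [min_eps, max_eps]. The `DecaySketch` enum, its `USER_DEFINED` member, the `decay_epsilon` function, and the `(eps, episode_idx)` signature of the user callable are all hypothetical.

```python
from enum import Enum
from typing import Callable, Optional


class DecaySketch(Enum):
    """Hypothetical stand-in for EpsilonDecayOption."""
    NONE = 0
    CONSTANT_RATE = 1
    USER_DEFINED = 2  # assumed member, not confirmed by the documentation


def decay_epsilon(
    eps: float,
    episode_idx: int,
    decay_op: DecaySketch,
    max_eps: float = 1.0,
    min_eps: float = 0.001,
    epsilon_decay_factor: float = 0.01,
    user_defined_decrease_method: Optional[Callable[[float, int], float]] = None,
) -> float:
    """Return the epsilon to use after the episode `episode_idx` ends."""
    if decay_op is DecaySketch.NONE:
        # No decay requested: epsilon stays as it is.
        return eps
    if decay_op is DecaySketch.CONSTANT_RATE:
        # Assumed meaning: subtract a fixed amount after every episode.
        eps = eps - epsilon_decay_factor
    elif decay_op is DecaySketch.USER_DEFINED:
        if user_defined_decrease_method is None:
            raise ValueError("USER_DEFINED decay requires a callable")
        eps = user_defined_decrease_method(eps, episode_idx)
    # Keep epsilon inside [min_eps, max_eps].
    return min(max_eps, max(min_eps, eps))
```

Under these assumptions, a training loop would call the hook once at the end of each episode, so epsilon falls from its initial value toward min_eps and never drops below it.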

classmethod from_config(config: EpsilonGreedyConfig)

Construct a policy from the given configuration

Parameters: config – The configuration to use
Returns: An instance of the EpsilonGreedyPolicy class
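
The pattern of pairing a configuration object with an alternative constructor can be sketched as follows. The classes `ConfigSketch` and `PolicySketch` are hypothetical stand-ins that mirror the documented defaults with a reduced set of fields (the decay options are omitted); this is not the library's implementation.

```python
from dataclasses import dataclass


@dataclass
class ConfigSketch:
    """Hypothetical mirror of EpsilonGreedyConfig (decay options omitted)."""
    eps: float = 1.0
    n_actions: int = 1
    max_eps: float = 1.0
    min_eps: float = 0.001
    epsilon_decay_factor: float = 0.01


class PolicySketch:
    """Hypothetical stand-in for EpsilonGreedyPolicy."""

    def __init__(self, eps: float, n_actions: int, max_eps: float = 1.0,
                 min_eps: float = 0.001,
                 epsilon_decay_factor: float = 0.01) -> None:
        self.eps = eps
        self.n_actions = n_actions
        self.max_eps = max_eps
        self.min_eps = min_eps
        self.epsilon_decay_factor = epsilon_decay_factor

    @classmethod
    def from_config(cls, config: ConfigSketch) -> "PolicySketch":
        # Forward every configuration field to the regular constructor.
        return cls(
            eps=config.eps,
            n_actions=config.n_actions,
            max_eps=config.max_eps,
            min_eps=config.min_eps,
            epsilon_decay_factor=config.epsilon_decay_factor,
        )


# Usage: override only the fields that differ from the defaults.
policy = PolicySketch.from_config(ConfigSketch(eps=0.5, n_actions=4))
```

Keeping the settings in one object makes them easy to store, log, or reuse across experiments, while the regular constructor remains available for direct use.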

on_state(state: State) → int

Returns the optimal action for the given state

Parameters: state – The state observed
Returns: An integer representing the action index