epsilon_greedy_policy

Module epsilon_greedy_policy. Implements an epsilon-greedy policy with several options for decaying epsilon.

class epsilon_greedy_policy.EpsilonDecayOption(value)

Options for reducing epsilon

class epsilon_greedy_policy.EpsilonGreedyConfig(eps: float = 1.0, n_actions: int = 1, decay_op: EpsilonDecayOption = EpsilonDecayOption.NONE, max_eps: float = 1.0, min_eps: float = 0.001, epsilon_decay_factor: float = 0.01, user_defined_decrease_method: Optional[UserDefinedDecreaseMethod] = None)

Configuration class for EpsilonGreedyPolicy

class epsilon_greedy_policy.EpsilonGreedyPolicy(eps: float, n_actions: int, decay_op: EpsilonDecayOption, max_eps: float = 1.0, min_eps: float = 0.001, epsilon_decay_factor: float = 0.01, user_defined_decrease_method: Optional[UserDefinedDecreaseMethod] = None)

Epsilon-greedy policy implementation

__call__(q_table: QTable, state: State) → int

Execute the policy: select an action for the given state using the Q-table

Parameters
  • q_table – The Q-table to use

  • state – The state observed

Returns

An integer representing the action index
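
The selection rule behind an epsilon-greedy policy can be sketched as follows. This is a minimal, self-contained illustration, not the module's implementation: `select_action`, `QTableSketch`, and the dictionary layout keyed by `(state, action)` are assumptions made for the example, and the real `QTable` and `State` types may look different.

```python
import random
from typing import Dict, Hashable, Tuple

# Hypothetical stand-in for the module's QTable: maps (state, action) -> value.
QTableSketch = Dict[Tuple[Hashable, int], float]


def select_action(q_table: QTableSketch, state: Hashable, eps: float,
                  n_actions: int, rng: random.Random) -> int:
    """Return a random action with probability eps, else the greedy action."""
    if rng.random() < eps:
        # Explore: pick any action index uniformly at random.
        return rng.randrange(n_actions)
    # Exploit: pick the action with the highest Q-value
    # (state-action pairs missing from the table count as 0.0).
    return max(range(n_actions), key=lambda a: q_table.get((state, a), 0.0))


q = {("s0", 0): 0.1, ("s0", 1): 0.7, ("s0", 2): 0.3}
rng = random.Random(0)
greedy = select_action(q, "s0", eps=0.0, n_actions=3, rng=rng)   # never explores
explore = select_action(q, "s0", eps=1.0, n_actions=3, rng=rng)  # always explores
```

With `eps=0.0` the sketch always returns the highest-valued action, and with `eps=1.0` it always samples uniformly; values in between trade exploration against exploitation.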

__init__(eps: float, n_actions: int, decay_op: EpsilonDecayOption, max_eps: float = 1.0, min_eps: float = 0.001, epsilon_decay_factor: float = 0.01, user_defined_decrease_method: Optional[UserDefinedDecreaseMethod] = None)

Constructor. Initialize a policy with the given options

Parameters
  • eps – The initial epsilon

  • n_actions – The number of actions available in the environment

  • decay_op – How to decay epsilon

  • max_eps – The maximum epsilon

  • min_eps – The minimum epsilon

  • epsilon_decay_factor – The decay factor used when decay_op is CONSTANT_RATE

  • user_defined_decrease_method – A user-defined callable used to decay epsilon
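
How the decay parameters might interact can be sketched as below. The documentation does not state the exact update rule, so this example assumes one plausible reading: with CONSTANT_RATE, epsilon is reduced by `epsilon_decay_factor` on each update and clamped to the range `[min_eps, max_eps]`. `DecaySketch` and `decay_epsilon` are hypothetical names, and the real `EpsilonDecayOption` may define more members than the two shown.

```python
from enum import Enum


class DecaySketch(Enum):
    """Hypothetical stand-in; only NONE and CONSTANT_RATE appear in the text above."""
    NONE = 0
    CONSTANT_RATE = 1


def decay_epsilon(eps: float, decay_op: DecaySketch, min_eps: float = 0.001,
                  max_eps: float = 1.0, epsilon_decay_factor: float = 0.01) -> float:
    """Return the next epsilon, clamped to [min_eps, max_eps]."""
    if decay_op is DecaySketch.CONSTANT_RATE:
        # Assumed rule: subtract a fixed amount on every call.
        eps -= epsilon_decay_factor
    # NONE leaves epsilon unchanged apart from the clamping below.
    return min(max_eps, max(min_eps, eps))


eps = 1.0
for _ in range(200):
    eps = decay_epsilon(eps, DecaySketch.CONSTANT_RATE)
# After enough updates epsilon settles at min_eps and stays there.
```

A multiplicative rule (scaling epsilon by a factor each update) would be an equally reasonable interpretation; the clamping step would stay the same.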

__str__() → str

Returns the name of the policy

Returns

A string representing the name of the policy

actions_after_episode(episode_idx: int, **options) → None

Any actions the policy should execute after the episode ends

Parameters
  • episode_idx – The episode index

  • options – Any options passed by the client code

Return type

None

classmethod from_config(config: EpsilonGreedyConfig)

Construct a policy from the given configuration

Parameters

config – The configuration to use

Returns

An instance of the EpsilonGreedyPolicy class
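
The configuration-to-policy pattern can be illustrated with a trimmed-down sketch. `ConfigSketch` and `PolicySketch` are hypothetical classes that carry only three of the documented fields. Whether `EpsilonGreedyConfig` is actually a dataclass, and how `from_config` forwards its fields, is not stated here and is assumed for the example.

```python
from dataclasses import asdict, dataclass


@dataclass
class ConfigSketch:
    """Hypothetical, trimmed-down analogue of EpsilonGreedyConfig."""
    eps: float = 1.0
    n_actions: int = 1
    min_eps: float = 0.001


class PolicySketch:
    """Hypothetical analogue of EpsilonGreedyPolicy with only three fields."""

    def __init__(self, eps: float, n_actions: int, min_eps: float = 0.001):
        self.eps = eps
        self.n_actions = n_actions
        self.min_eps = min_eps

    @classmethod
    def from_config(cls, config: ConfigSketch) -> "PolicySketch":
        # Forward every config field to the constructor by keyword.
        return cls(**asdict(config))


policy = PolicySketch.from_config(ConfigSketch(eps=0.5, n_actions=4))
```

Keeping the defaults in the configuration object means client code only needs to override the fields that differ, as with `eps` and `n_actions` above.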

on_state(state: State) → int

Returns the optimal action for the given state

Parameters

state – The state observed

Returns

An integer representing the action index