epsilon_greedy_policy
Module epsilon_greedy_policy. Implements an epsilon-greedy policy with several options for decaying epsilon.
- class epsilon_greedy_policy.EpsilonDecayOption(value)
Options for reducing epsilon
- class epsilon_greedy_policy.EpsilonGreedyConfig(eps: float = 1.0, n_actions: int = 1, decay_op: EpsilonDecayOption = EpsilonDecayOption.NONE, max_eps: float = 1.0, min_eps: float = 0.001, epsilon_decay_factor: float = 0.01, user_defined_decrease_method: Optional[UserDefinedDecreaseMethod] = None)
Configuration class for EpsilonGreedyPolicy
- class epsilon_greedy_policy.EpsilonGreedyPolicy(eps: float, n_actions: int, decay_op: EpsilonDecayOption, max_eps: float = 1.0, min_eps: float = 0.001, epsilon_decay_factor: float = 0.01, user_defined_decrease_method: Optional[UserDefinedDecreaseMethod] = None)
Epsilon-greedy policy implementation
- __call__(q_table: QTable, state: State) → int
Execute the policy and select an action for the given state
- Parameters
q_table – The q-table to use
state – The state observed
- Return type
An integer representing the action index
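For orientation, the selection rule that an epsilon-greedy policy applies can be sketched as follows. This is a minimal illustration, not the module's implementation: the function name `select_action`, the dictionary-based q-table layout keyed by `(state, action)`, and the explicit `rng` argument are all assumptions made for the example.

```python
import random
from typing import Dict, Hashable, Tuple

# Hypothetical stand-in: a q-table mapping (state, action) pairs to values.
# The real QTable type used by the module may look different.
QTable = Dict[Tuple[Hashable, int], float]


def select_action(q_table: QTable, state: Hashable, eps: float,
                  n_actions: int, rng: random.Random) -> int:
    """Return an action index chosen epsilon-greedily."""
    # Explore: with probability eps, pick a uniformly random action.
    if rng.random() < eps:
        return rng.randrange(n_actions)
    # Exploit: pick the action with the highest q-value for this state.
    # Missing entries default to 0.0; ties go to the lowest action index.
    values = [q_table.get((state, a), 0.0) for a in range(n_actions)]
    return max(range(n_actions), key=values.__getitem__)


q = {("s", 0): 0.1, ("s", 1): 0.9, ("s", 2): 0.3}
greedy = select_action(q, "s", eps=0.0, n_actions=3, rng=random.Random(0))
```

With `eps=0.0` the call is purely greedy, and with `eps=1.0` it is purely random; values in between trade exploration against exploitation.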
- __init__(eps: float, n_actions: int, decay_op: EpsilonDecayOption, max_eps: float = 1.0, min_eps: float = 0.001, epsilon_decay_factor: float = 0.01, user_defined_decrease_method: Optional[UserDefinedDecreaseMethod] = None)
Constructor. Initialize a policy with the given options
- Parameters
eps – The initial epsilon
n_actions – The number of actions available in the environment
decay_op – How to decay epsilon
max_eps – The maximum epsilon
min_eps – The minimum epsilon
epsilon_decay_factor – The decay factor used when decay_op is EpsilonDecayOption.CONSTANT_RATE
user_defined_decrease_method – A user-defined callable used to decay epsilon
- __str__() → str
Returns the name of the policy
- Return type
A string representing the name of the policy
- actions_after_episode(episode_idx: int, **options) → None
Any actions the policy should execute after the episode ends
- Parameters
episode_idx – The episode index
options – Any options passed by the client code
- Return type
None
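A common use of an end-of-episode hook in epsilon-greedy policies is to decay epsilon. The sketch below shows one way a constant-rate decay could work. It is an assumption, not the module's code: it subtracts `epsilon_decay_factor` once per call and clamps the result to `[min_eps, max_eps]`, whereas the actual update rule may differ (for example, it could be multiplicative). The names `DecayOption` and `decay_epsilon` are hypothetical, and only the `NONE` and `CONSTANT_RATE` members mentioned in this reference are modelled.

```python
from enum import Enum


class DecayOption(Enum):
    """Hypothetical stand-in for EpsilonDecayOption (two members only)."""
    NONE = 0
    CONSTANT_RATE = 1


def decay_epsilon(eps: float, decay_op: DecayOption,
                  min_eps: float = 0.001, max_eps: float = 1.0,
                  epsilon_decay_factor: float = 0.01) -> float:
    """Return the epsilon to use for the next episode."""
    # NONE: epsilon stays fixed for the whole training run.
    if decay_op is DecayOption.NONE:
        return eps
    # CONSTANT_RATE (assumed subtractive): reduce by a fixed amount,
    # then clamp so epsilon never leaves [min_eps, max_eps].
    eps = eps - epsilon_decay_factor
    return min(max(eps, min_eps), max_eps)


eps = 1.0
for episode_idx in range(3):
    # ... run one episode here, then decay epsilon afterwards ...
    eps = decay_epsilon(eps, DecayOption.CONSTANT_RATE)
```

Clamping at `min_eps` keeps a small amount of exploration alive even late in training.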
- classmethod from_config(config: EpsilonGreedyConfig)
Construct a policy from the given configuration
- Parameters
config – The configuration to use
- Return type
An instance of EpsilonGreedyPolicy class
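The configuration-to-policy pattern can be illustrated with stand-in classes. This sketch assumes the configuration is a flat dataclass whose field names match the constructor's parameter names, which is consistent with the signatures listed here but not confirmed by them. `SketchConfig` and `SketchPolicy` are hypothetical and carry only a subset of the real fields.

```python
from dataclasses import asdict, dataclass


@dataclass
class SketchConfig:
    """Hypothetical stand-in for EpsilonGreedyConfig (subset of fields)."""
    eps: float = 1.0
    n_actions: int = 1
    min_eps: float = 0.001


class SketchPolicy:
    """Hypothetical stand-in for EpsilonGreedyPolicy."""

    def __init__(self, eps: float, n_actions: int, min_eps: float = 0.001):
        self.eps = eps
        self.n_actions = n_actions
        self.min_eps = min_eps

    @classmethod
    def from_config(cls, config: SketchConfig) -> "SketchPolicy":
        # Field names mirror the constructor parameters, so the config
        # can be unpacked directly into keyword arguments.
        return cls(**asdict(config))


policy = SketchPolicy.from_config(SketchConfig(eps=0.5, n_actions=4))
```

Keeping the options in a separate configuration object lets client code store, log, or reuse the settings independently of the policy instance.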
- on_state(state: State) → int
Returns the optimal action for the given state
- Parameters
state – The state observed
- Return type
An integer representing the action index