epsilon_greedy_q_estimator

Module epsilon_greedy_q_estimator. Implements a Q-function estimator using linear function approximation

class epsilon_greedy_q_estimator.EpsilonGreedyQEstimatorConfig(eps: float = 1.0, n_actions: int = 1, decay_op: EpsilonDecayOption = EpsilonDecayOption.NONE, max_eps: float = 1.0, min_eps: float = 0.001, epsilon_decay_factor: float = 0.01, user_defined_decrease_method: Optional[UserDefinedDecreaseMethod] = None, gamma: float = 1.0, alpha: float = 1.0, env: Optional[Env] = None)

Configuration class for EpsilonGreedyQEstimator
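The documented constructor signature can be mirrored with a plain dataclass. This is a hypothetical stand-in sketch, not the library's actual code: the EpsilonDecayOption enum members other than NONE, the Callable type for user_defined_decrease_method, and the Any type for env are assumptions made so the example is self-contained.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Optional

class EpsilonDecayOption(Enum):
    # Minimal stand-in; the real enum likely defines more decay schedules.
    NONE = 0

@dataclass
class EpsilonGreedyQEstimatorConfig:
    # Fields mirror the documented constructor signature and defaults.
    eps: float = 1.0
    n_actions: int = 1
    decay_op: EpsilonDecayOption = EpsilonDecayOption.NONE
    max_eps: float = 1.0
    min_eps: float = 0.001
    epsilon_decay_factor: float = 0.01
    user_defined_decrease_method: Optional[Callable] = None
    gamma: float = 1.0   # discount factor
    alpha: float = 1.0   # learning rate
    env: Optional[Any] = None

# Typical usage: override only the fields that matter for the task.
config = EpsilonGreedyQEstimatorConfig(eps=0.1, n_actions=4, gamma=0.99, alpha=0.05)
```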

class epsilon_greedy_q_estimator.EpsilonGreedyQEstimator(config: EpsilonGreedyQEstimatorConfig)

Q-function estimator using an epsilon-greedy policy for action selection

__init__(config: EpsilonGreedyQEstimatorConfig)

Constructor. Initialize the estimator with a given configuration

Parameters

config (The instance configuration) –

initialize() None

Initialize the underlying weights

Return type

None
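Since the estimator uses linear function approximation, initialize() plausibly sets up the weight vector w of q̂(s, a) = wᵀx(s, a). A minimal sketch of that idea, assuming zero initialization and an assumed feature-vector size (the real implementation may use a different scheme):

```python
import numpy as np

def initialize_weights(n_features: int) -> np.ndarray:
    # Zero-initialized weights: every q_hat value starts at 0 before learning.
    return np.zeros(n_features)

weights = initialize_weights(8)  # 8 is an assumed tiled-feature dimension
```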

on_state(state: State) Action

Returns the action for the given state, selected according to the epsilon-greedy policy

Parameters

state (The state observed) –

Return type

An environment specific Action type
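The epsilon-greedy selection behind on_state can be sketched as follows. This is an illustrative stand-alone function, not the library's implementation; it assumes the per-action q̂ values have already been computed for the observed state:

```python
import random

def epsilon_greedy_action(q_values, eps, rng=random):
    # With probability eps explore: pick a uniformly random action.
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    # Otherwise exploit: pick the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])

# With eps=0.0 the choice is purely greedy.
action = epsilon_greedy_action([0.1, 0.9, 0.3], eps=0.0)  # -> 1
```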

q_hat_value(state_action_vec: StateActionVec) float

Returns the approximate value :math:`\hat{q}` for the given state-action vector

Parameters

state_action_vec (The state-action tiled vector) –

Return type

float
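Under linear function approximation, q̂(s, a) is the inner product of the weight vector with the tiled state-action feature vector. A minimal sketch of that computation (the function name and arrays here are illustrative, not the library's internals):

```python
import numpy as np

def q_hat_value(weights: np.ndarray, state_action_vec: np.ndarray) -> float:
    # Linear approximation: q_hat(s, a) = w^T x(s, a)
    return float(np.dot(weights, state_action_vec))

w = np.array([0.5, -1.0, 2.0])   # learned weights
x = np.array([1.0, 0.0, 1.0])    # tiled state-action features
value = q_hat_value(w, x)        # -> 2.5
```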