epsilon_greedy_q_estimator¶
Module epsilon_greedy_q_estimator. Implements a Q-estimator that assumes linear function approximation
- class epsilon_greedy_q_estimator.EpsilonGreedyQEstimatorConfig(eps: float = 1.0, n_actions: int = 1, decay_op: EpsilonDecayOption = EpsilonDecayOption.NONE, max_eps: float = 1.0, min_eps: float = 0.001, epsilon_decay_factor: float = 0.01, user_defined_decrease_method: Optional[UserDefinedDecreaseMethod] = None, gamma: float = 1.0, alpha: float = 1.0, env: Optional[Env] = None)¶
Configuration class for EpsilonGreedyQEstimator
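The configuration couples the exploration parameters (eps, min_eps, max_eps, epsilon_decay_factor, decay_op) with the learning parameters (gamma, alpha). As a minimal sketch, one plausible decay scheme controlled by such parameters might look like the following; the function name and the exact decay rule are assumptions, since the actual behavior depends on the configured EpsilonDecayOption:

```python
def decay_epsilon(eps: float, min_eps: float, epsilon_decay_factor: float) -> float:
    # Hypothetical multiplicative decay: shrink eps by a fixed fraction per step,
    # never going below min_eps. The real scheme is selected via decay_op.
    return max(min_eps, eps - epsilon_decay_factor * eps)
```

Starting from eps = 1.0 with epsilon_decay_factor = 0.01, repeated calls gradually reduce exploration toward min_eps.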
- class epsilon_greedy_q_estimator.EpsilonGreedyQEstimator(config: EpsilonGreedyQEstimatorConfig)¶
Q-function estimator using an epsilon-greedy policy for action selection
- __init__(config: EpsilonGreedyQEstimatorConfig)¶
Constructor. Initialize the estimator with a given configuration
- Parameters
config (EpsilonGreedyQEstimatorConfig) – The instance configuration
- initialize() None ¶
Initialize the underlying weights
- Return type
None
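For a linear approximator, initializing the underlying weights commonly means allocating one weight per feature of the tiled state-action vector. A minimal sketch, assuming zero initialization (the helper name and signature are hypothetical, not part of the documented API):

```python
def initialize_weights(n_features: int) -> list[float]:
    # Zero-initialized weights: every initial q_hat estimate is 0,
    # which is a common neutral starting point for linear approximators.
    return [0.0] * n_features
```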
- on_state(state: State) Action ¶
Returns the action to take in the given state
- Parameters
state (State) – The observed state
- Return type
An environment specific Action type
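Epsilon-greedy action selection explores with probability eps and otherwise exploits the current Q-estimates. A minimal self-contained sketch of the selection rule (the function name and the list-of-values interface are assumptions, not the estimator's actual internals):

```python
import random

def epsilon_greedy_action(q_values: list[float], eps: float) -> int:
    # With probability eps, explore: pick a uniformly random action index.
    if random.random() < eps:
        return random.randrange(len(q_values))
    # Otherwise, exploit: pick the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With eps = 0 this is purely greedy; with eps = 1 it is purely random, matching the role of the eps, min_eps, and max_eps configuration fields.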
- q_hat_value(state_action_vec: StateActionVec) float ¶
Returns the approximate value :math:`\hat{q}` for the given state-action vector
- Parameters
state_action_vec (StateActionVec) – The tiled state-action vector
- Return type
float
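Under linear function approximation, the approximate value is the dot product of the weight vector with the tiled state-action feature vector. A minimal sketch of that computation (the function name and plain-list representation are assumptions):

```python
def q_hat(weights: list[float], state_action_vec: list[float]) -> float:
    # Linear function approximation: q_hat(s, a) = w . x(s, a),
    # where x(s, a) is the tiled state-action feature vector.
    return sum(w * x for w, x in zip(weights, state_action_vec))
```

With binary tile-coded features, this reduces to summing the weights of the active tiles.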