epsilon_greedy_q_estimator

Module epsilon_greedy_q_estimator. Implements a Q-function estimator using linear function approximation

class epsilon_greedy_q_estimator.EpsilonGreedyQEstimatorConfig(eps: float = 1.0, n_actions: int = 1, decay_op: EpsilonDecayOption = EpsilonDecayOption.NONE, max_eps: float = 1.0, min_eps: float = 0.001, epsilon_decay_factor: float = 0.01, user_defined_decrease_method: Optional[UserDefinedDecreaseMethod] = None, gamma: float = 1.0, alpha: float = 1.0, env: Optional[Env] = None)

Configuration class for EpsilonGreedyQEstimator
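The documented constructor signature can be mirrored with a plain dataclass. This is a hypothetical stand-in sketch, not the library's actual code: the EpsilonDecayOption enum members other than NONE, the Callable type for user_defined_decrease_method, and the Any type for env are assumptions made so the example is self-contained.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Optional

class EpsilonDecayOption(Enum):
    # Minimal stand-in; the real enum likely defines more decay schedules.
    NONE = 0

@dataclass
class EpsilonGreedyQEstimatorConfig:
    # Fields mirror the documented constructor signature and defaults.
    eps: float = 1.0
    n_actions: int = 1
    decay_op: EpsilonDecayOption = EpsilonDecayOption.NONE
    max_eps: float = 1.0
    min_eps: float = 0.001
    epsilon_decay_factor: float = 0.01
    user_defined_decrease_method: Optional[Callable] = None
    gamma: float = 1.0   # discount factor
    alpha: float = 1.0   # learning rate
    env: Optional[Any] = None

# Typical usage: override only the fields that matter for the task.
config = EpsilonGreedyQEstimatorConfig(eps=0.1, n_actions=4, gamma=0.99, alpha=0.05)
```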

class epsilon_greedy_q_estimator.EpsilonGreedyQEstimator(config: EpsilonGreedyQEstimatorConfig)

Q-function estimator using an epsilon-greedy policy for action selection

__init__(config: EpsilonGreedyQEstimatorConfig)

Constructor. Initialize the estimator with a given configuration

Parameters

config (The instance configuration) –

initialize() None

Initialize the underlying weights

Return type

None
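Since the estimator uses linear function approximation, initialize() plausibly sets up the weight vector w of q̂(s, a) = wᵀx(s, a). A minimal sketch of that idea, assuming zero initialization and an assumed feature-vector size (the real implementation may use a different scheme):

```python
import numpy as np

def initialize_weights(n_features: int) -> np.ndarray:
    # Zero-initialized weights: every q_hat value starts at 0 before learning.
    return np.zeros(n_features)

weights = initialize_weights(8)  # 8 is an assumed tiled-feature dimension
```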

on_state(state: State) Action

Returns the action for the given state, selected according to the epsilon-greedy policy

Parameters

state (The state observed) –

Return type

An environment specific Action type
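The epsilon-greedy selection behind on_state can be sketched as follows. This is an illustrative stand-alone function, not the library's implementation; it assumes the per-action q̂ values have already been computed for the observed state:

```python
import random

def epsilon_greedy_action(q_values, eps, rng=random):
    # With probability eps explore: pick a uniformly random action.
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    # Otherwise exploit: pick the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])

# With eps=0.0 the choice is purely greedy.
action = epsilon_greedy_action([0.1, 0.9, 0.3], eps=0.0)  # -> 1
```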

q_hat_value(state_action_vec: StateActionVec) float

Returns the approximate value :math:`\hat{q}` for the given state-action vector

Parameters

state_action_vec (The state-action tiled vector) –

Return type

float
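Under linear function approximation, q̂(s, a) is the inner product of the weight vector with the tiled state-action feature vector. A minimal sketch of that computation (the function name and arrays here are illustrative, not the library's internals):

```python
import numpy as np

def q_hat_value(weights: np.ndarray, state_action_vec: np.ndarray) -> float:
    # Linear approximation: q_hat(s, a) = w^T x(s, a)
    return float(np.dot(weights, state_action_vec))

w = np.array([0.5, -1.0, 2.0])   # learned weights
x = np.array([1.0, 0.0, 1.0])    # tiled state-action features
value = q_hat_value(w, x)        # -> 2.5
```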