semi_gradient_sarsa¶
Module semi_gradient_sarsa. Implements episodic semi-gradient SARSA for estimating the state-action value function. The implementation follows the algorithm on page 244 of Sutton and Barto, Reinforcement Learning: An Introduction, second edition, 2018
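The algorithm referenced above updates a parameterized estimate q̂(s, a, w) along the gradient of the SARSA TD error. The following is a minimal self-contained sketch of that loop with linear function approximation on a toy chain MDP; every name here (features, step, train, the chain environment) is illustrative and not part of this module's API.

```python
import numpy as np

def q_hat(w, phi):
    """Linear action-value estimate: q(s, a) = w . phi(s, a)."""
    return w @ phi

def features(state, action, n_states, n_actions):
    """One-hot feature vector over (state, action) pairs."""
    phi = np.zeros(n_states * n_actions)
    phi[state * n_actions + action] = 1.0
    return phi

def step(state, action, n_states):
    """Toy chain MDP: action 1 moves right, action 0 moves left.
    Reward is -1 per step; the episode ends at the rightmost state."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), n_states - 1)
    done = next_state == n_states - 1
    return next_state, -1.0, done

def epsilon_greedy(w, state, n_states, n_actions, eps, rng):
    """Pick a random action with probability eps, else the greedy one."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    qs = [q_hat(w, features(state, a, n_states, n_actions))
          for a in range(n_actions)]
    return int(np.argmax(qs))

def train(n_episodes=200, n_states=6, n_actions=2,
          alpha=0.1, gamma=1.0, eps=0.1, seed=0):
    """Episodic semi-gradient SARSA (Sutton & Barto, 2nd ed., p. 244)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(n_states * n_actions)
    for _ in range(n_episodes):
        state = 0
        action = epsilon_greedy(w, state, n_states, n_actions, eps, rng)
        while True:
            next_state, reward, done = step(state, action, n_states)
            phi = features(state, action, n_states, n_actions)
            if done:
                # Terminal transition: the target is the reward alone.
                w += alpha * (reward - q_hat(w, phi)) * phi
                break
            next_action = epsilon_greedy(w, next_state, n_states, n_actions, eps, rng)
            phi_next = features(next_state, next_action, n_states, n_actions)
            # Non-terminal transition: bootstrap from q(S', A').
            w += alpha * (reward + gamma * q_hat(w, phi_next) - q_hat(w, phi)) * phi
            state, action = next_state, next_action
    return w
```

With one-hot features the update reduces to tabular SARSA, which makes the sketch easy to sanity-check: the action value of the last transition before the terminal state should approach the single-step reward of -1.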
- class semi_gradient_sarsa.SemiGradSARSAConfig(gamma: float = 1.0, alpha: float = 0.1, n_itrs_per_episode: int = 100, policy: Optional[Policy] = None)¶
Configuration class for semi-gradient SARSA algorithm
- class semi_gradient_sarsa.SemiGradSARSA(config: SemiGradSARSAConfig)¶
SemiGradSARSA class. Implements the semi-gradient SARSA algorithm as described
- __init__(config: SemiGradSARSAConfig) None ¶
- _do_train(env: Env, episode_idx: int, **options) EpisodeInfo ¶
Train the algorithm on the episode
- Parameters
env (The environment to train on) –
episode_idx (The index of the training episode) –
options (Any keyword based options passed by the client code) –
- Return type
An instance of EpisodeInfo
- _init() None ¶
Perform any initialization needed before training starts
- Return type
None
- _validate() None ¶
Validate the state of the agent. Called before any training begins to check that the starting state is sane
- Return type
None
- _weights_update(env: Env, state: State, action: Action, reward: float, next_state: State, next_action: Action, t: float = 1.0) None ¶
Update the weights of the underlying Q-estimator for a non-terminal transition, bootstrapping from the value of (next_state, next_action)
- Parameters
env (The environment instance that the training takes place) –
state (The current state) –
action (The action we took at state) –
reward (The reward observed when taking the given action when at the given state) –
next_state (The observed new state) –
next_action (The action to be executed in next_state) –
- Return type
None
- _weights_update_episode_done(env: Env, state: State, action: Action, reward: float, t: float = 1.0) None ¶
Update the weights of the underlying Q-estimator when the episode has finished, i.e. the observed next state is terminal
- Parameters
state (The current state; it is assumed to be a raw state) –
reward (The reward observed when taking the given action when at the given state) –
action (The action we took at the state) –
- Return type
None
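The two update methods above differ only in their TD target: the non-terminal update (_weights_update) bootstraps from the value of the next state-action pair, while the terminal update (_weights_update_episode_done) uses the reward alone. Assuming a linear estimator, a sketch of the distinction looks like this; the function names and signatures here are illustrative, not this module's actual methods:

```python
import numpy as np

def sarsa_update(w, phi, reward, phi_next, alpha, gamma):
    """Non-terminal step: the target bootstraps from q(S', A') = w . phi_next."""
    target = reward + gamma * (w @ phi_next)
    return w + alpha * (target - w @ phi) * phi

def terminal_update(w, phi, reward, alpha):
    """Terminal step: q(terminal, .) is defined as 0, so the target is the reward."""
    return w + alpha * (reward - w @ phi) * phi
```

Keeping the terminal case separate avoids evaluating the estimator on a terminal state, whose value is zero by definition.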
- actions_after_episode_ends(env: Env, episode_idx: int, **options) None ¶
Any actions to perform after the training episode ends
- Parameters
env (The training environment) –
episode_idx (The training episode index) –
options (Any options passed by the client code) –
- Return type
None
- actions_before_episode_begins(env: Env, episode_idx: int, **options) None ¶
Any actions to perform before the episode begins
- Parameters
env (The instance of the training environment) –
episode_idx (The training episode index) –
options (Any keyword options passed by the client code) –
- Return type
None
- actions_before_training(env: Env, **options) None ¶
Specify any actions necessary before training begins
- Parameters
env (The environment to train on) –
options (Any key-value options passed by the client) –
- Return type
None
- on_episode(env: Env, episode_idx: int, **options) EpisodeInfo ¶
Train the algorithm on the episode
- Parameters
env (The environment to train on) –
episode_idx (The index of the training episode) –
options (Any keyword based options passed by the client code) –
- Return type
An instance of EpisodeInfo
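Taken together, the lifecycle hooks above suggest a driver loop of roughly the following shape. This is a sketch under that assumption; RecordingAgent, train_loop, and the dict standing in for EpisodeInfo are illustrative, not part of this module:

```python
class RecordingAgent:
    """Stub agent that records the order in which the hooks fire."""

    def __init__(self):
        self.calls = []

    def actions_before_training(self, env, **options):
        self.calls.append("before_training")

    def actions_before_episode_begins(self, env, episode_idx, **options):
        self.calls.append(f"before_episode_{episode_idx}")

    def on_episode(self, env, episode_idx, **options):
        self.calls.append(f"on_episode_{episode_idx}")
        return {"episode_idx": episode_idx}  # stand-in for EpisodeInfo

    def actions_after_episode_ends(self, env, episode_idx, **options):
        self.calls.append(f"after_episode_{episode_idx}")

def train_loop(agent, env, n_episodes):
    """Drive the agent through n_episodes, firing the hooks in order."""
    agent.actions_before_training(env)
    infos = []
    for episode_idx in range(n_episodes):
        agent.actions_before_episode_begins(env, episode_idx)
        infos.append(agent.on_episode(env, episode_idx))
        agent.actions_after_episode_ends(env, episode_idx)
    return infos
```

Structuring the trainer around these hooks lets per-episode bookkeeping (e.g. epsilon decay, logging) live in the agent rather than the loop.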
- play(env: Env, stop_criterion: Criterion) None ¶
Play the trained agent on the environment. This should produce a distorted dataset
- Parameters
env (The environment to play on) –
stop_criterion (The criterion used to decide when to stop playing) –
- Return type
None