q_learning

Simple Q-learning algorithm

class q_learning.QLearnConfig(gamma: float = 1.0, alpha: float = 0.1, n_itrs_per_episode: int = 100, policy: Optional[Policy] = None)

Configuration for Q-learning
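To make the documented defaults concrete, here is a minimal stand-in written as a dataclass. It is a sketch only: the name `QLearnConfigSketch` is hypothetical, the `policy` field is typed as `Any` because the `Policy` type is not described here, and the real `QLearnConfig` may be implemented differently.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class QLearnConfigSketch:
    """Hypothetical stand-in mirroring the documented QLearnConfig signature."""

    gamma: float = 1.0             # default taken from the signature above
    alpha: float = 0.1             # default taken from the signature above
    n_itrs_per_episode: int = 100  # default taken from the signature above
    policy: Optional[Any] = None   # assumption: any policy object, or None


# Override only the fields that differ from the defaults.
config = QLearnConfigSketch(gamma=0.99)
```

With the real classes, the equivalent step would presumably be to build a `QLearnConfig` and pass it to `QLearning(algo_config=...)`, as the constructor signature suggests.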

class q_learning.QLearning(algo_config: QLearnConfig)

Q-learning algorithm implementation

__init__(algo_config: QLearnConfig)

Constructor. Creates an instance of the algorithm from the given configuration.

_do_train(env: Env, episode_idx: int, **option) → EpisodeInfo

Train the algorithm on the given episode.

Parameters

Return type

An instance of EpisodeInfo

_update_q_table(state: int, action: int, n_actions: int, reward: float, next_state: Optional[int] = None) → None

Update the tabular state-action value function (the Q-table).

Parameters

Return type

None
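For orientation, the standard tabular Q-learning update can be sketched as below. This is not the library's implementation: the free function `update_q_table`, the dictionary-based table, and passing `gamma` and `alpha` as keyword arguments are all assumptions made for illustration (in the class they would presumably come from `QLearnConfig`). The sketch also assumes that `next_state=None` marks a terminal transition.

```python
from collections import defaultdict


def update_q_table(q_table, state, action, n_actions, reward,
                   next_state=None, gamma=1.0, alpha=0.1):
    """Apply Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    if next_state is None:
        # Assumption: no next state means the episode ended, so no bootstrap.
        best_next = 0.0
    else:
        best_next = max(q_table[(next_state, a)] for a in range(n_actions))
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])


# Hypothetical table: unseen (state, action) pairs start at 0.0.
q_table = defaultdict(float)
update_q_table(q_table, state=0, action=1, n_actions=2, reward=1.0, next_state=1)
print(q_table[(0, 1)])  # 0.1
```

The printed value follows from the update rule with every entry starting at zero: 0.0 + 0.1 * (1.0 + 1.0 * 0.0 - 0.0) = 0.1.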

actions_after_episode_ends(env: Env, episode_idx: int, **options) → None

Execute any actions the algorithm needs after the episode ends

Parameters

Return type

None

actions_before_training(env: Env, **options) → None

Execute any actions the algorithm needs before training begins

Parameters

Return type

None

on_episode(env: Env, episode_idx: int, **options) → EpisodeInfo

Train the algorithm on the given episode.

Parameters

Return type

An instance of EpisodeInfo
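A per-episode training loop for tabular Q-learning typically looks like the sketch below. Everything in it is hypothetical: `ChainEnv` is a toy environment with a simple `reset`/`step` interface (the real `Env` API is not documented here), `epsilon_greedy` stands in for the configured policy, and the returned dictionary is only a placeholder for `EpisodeInfo`, whose fields are not described on this page.

```python
import random
from collections import defaultdict


class ChainEnv:
    """Hypothetical 3-state chain: action 1 moves right, action 0 stays.

    Reaching state 2 ends the episode with reward 1.0.
    """

    n_actions = 2

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        if action == 1:
            self.state += 1
        done = self.state == 2
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def epsilon_greedy(q_table, state, n_actions, eps=0.1):
    """Pick a random action with probability eps, else a greedy one."""
    if random.random() < eps:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q_table[(state, a)])


def run_episode(env, q_table, policy, episode_idx,
                gamma=1.0, alpha=0.1, n_itrs_per_episode=100):
    """Run one training episode and return a small summary dict."""
    state = env.reset()
    total_reward, n_steps = 0.0, 0
    for _ in range(n_itrs_per_episode):
        action = policy(q_table, state, env.n_actions)
        next_state, reward, done = env.step(action)
        # Terminal transitions do not bootstrap from the next state.
        best_next = 0.0 if done else max(
            q_table[(next_state, a)] for a in range(env.n_actions))
        q_table[(state, action)] += alpha * (
            reward + gamma * best_next - q_table[(state, action)])
        total_reward += reward
        n_steps += 1
        state = next_state
        if done:
            break
    return {"episode_idx": episode_idx, "total_reward": total_reward,
            "n_steps": n_steps}


q_table = defaultdict(float)
env = ChainEnv()
for idx in range(50):
    info = run_episode(env, q_table, epsilon_greedy, episode_idx=idx)
```

The loop stops either when the environment reports the episode as finished or after `n_itrs_per_episode` steps, which is how the configuration parameter of the same name is assumed to be used.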

play(env: Env, stop_criterion: Criterion) → None

Play the agent on the environment. This should produce a distorted dataset

Parameters

Return type

None