q_learning

Simple Q-learning algorithm

class q_learning.QLearnConfig(gamma: float = 1.0, alpha: float = 0.1, n_itrs_per_episode: int = 100, policy: Optional[Policy] = None)

Configuration for Q-learning
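As a rough illustration of how the documented defaults fit together, the sketch below mirrors the constructor signature with a stand-in dataclass. It is not the library's own class: `QLearnConfigSketch` is a hypothetical name, the field comments follow the usual Q-learning reading of `gamma` and `alpha` (an assumption, since the fields are not described here), and `policy` is typed as `Any` because the `Policy` interface is not documented in this section.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class QLearnConfigSketch:
    """Hypothetical stand-in mirroring the documented QLearnConfig fields."""
    gamma: float = 1.0               # assumed: discount factor
    alpha: float = 0.1               # assumed: learning rate
    n_itrs_per_episode: int = 100    # assumed: cap on steps per episode
    policy: Optional[Any] = None     # action-selection policy (interface assumed)


# Override only the fields that differ from the documented defaults.
config = QLearnConfigSketch(gamma=0.99, alpha=0.05)
```

Grouping the hyperparameters in one object keeps the algorithm's constructor down to a single argument, which matches the `QLearning(algo_config)` signature.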

class q_learning.QLearning(algo_config: QLearnConfig)

Q-learning algorithm implementation

__init__(algo_config: QLearnConfig)

Constructor. Creates an instance of the algorithm from the given configuration parameters.

Parameters

algo_config (The configuration parameters) –

_do_train(env: Env, episode_idx: int, **options) -> EpisodeInfo

Train the algorithm on the episode

Parameters
  • env (The environment to train on) –

  • episode_idx (The index of the training episode) –

  • options (Any keyword-based options passed by the client code) –

Return type

An instance of EpisodeInfo

_update_q_table(state: int, action: int, n_actions: int, reward: float, next_state: Optional[int] = None) -> None

Update the tabular state-action function

Parameters
  • state (State observed) –

  • action (The action taken) –

  • n_actions (Number of actions in the data set) –

  • reward (The reward observed) –

  • next_state (The next state observed) –

Return type

None
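For orientation, the textbook tabular Q-learning update can be sketched as below. This is a minimal sketch under stated assumptions, not the method's actual body: `update_q_table` is a hypothetical free function, the table is a plain list of lists, and a `next_state` of `None` is assumed to mark a terminal transition with no future value to bootstrap from.

```python
from typing import List, Optional


def update_q_table(q_table: List[List[float]], state: int, action: int,
                   reward: float, next_state: Optional[int] = None,
                   alpha: float = 0.1, gamma: float = 1.0) -> None:
    """Apply one standard Q-learning update in place (illustrative only)."""
    # Assumption: next_state=None means the episode ended on this step,
    # so the bootstrapped future value is zero.
    best_next = 0.0 if next_state is None else max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])


n_states, n_actions = 3, 2
q = [[0.0] * n_actions for _ in range(n_states)]

# Non-terminal transition: bootstrap from the best action in state 1.
update_q_table(q, state=0, action=1, reward=1.0, next_state=1)
# Terminal transition: only the immediate reward contributes.
update_q_table(q, state=1, action=0, reward=2.0, next_state=None)
```

With a zero-initialised table, each entry moves a fraction `alpha` of the way towards its target, so the two updated entries end up at `alpha * reward`.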

actions_after_episode_ends(env: Env, episode_idx: int, **options) -> None

Execute any actions the algorithm needs after the episode ends

Parameters
  • env (The environment in which training occurs) –

  • episode_idx (The episode index) –

  • options (Any options passed by the client code) –

Return type

None

actions_before_training(env: Env, **options) -> None

Execute any actions the algorithm needs before training begins

Parameters
  • env (The environment in which training occurs) –

  • options (Any options passed by the client code) –

Return type

None

on_episode(env: Env, episode_idx: int, **options) -> EpisodeInfo

Train the algorithm on the episode

Parameters
  • env (The environment to train on) –

  • episode_idx (The index of the training episode) –

  • options (Any keyword-based options passed by the client code) –

Return type

An instance of EpisodeInfo
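A single training episode of this kind can be sketched as a bounded loop of act, observe, update. Everything below is hypothetical and only illustrates the general shape: `ChainEnv` is a toy environment with an assumed `reset()`/`step()` interface (the real `Env` interface is not described here), action selection is epsilon-greedy rather than a `Policy` object, and the function returns a plain tuple instead of an `EpisodeInfo`.

```python
import random
from typing import List, Tuple


class ChainEnv:
    """Hypothetical toy environment: walk right along a chain to reach the goal."""
    n_states, n_actions = 4, 2   # action 0 = left, action 1 = right

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        return self.state, (1.0 if done else 0.0), done


def run_episode(env: ChainEnv, q: List[List[float]], rng: random.Random,
                n_itrs: int = 100, alpha: float = 0.1, gamma: float = 1.0,
                eps: float = 0.2) -> Tuple[float, int]:
    """Run one epsilon-greedy episode; return (total reward, steps taken)."""
    state, total = env.reset(), 0.0
    for itr in range(n_itrs):
        if rng.random() < eps:
            action = rng.randrange(env.n_actions)          # explore
        else:
            action = max(range(env.n_actions), key=lambda a: q[state][a])  # exploit
        next_state, reward, done = env.step(action)
        # Terminal steps have no future value to bootstrap from.
        best_next = 0.0 if done else max(q[next_state])
        q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
        state, total = next_state, total + reward
        if done:
            return total, itr + 1
    return total, n_itrs


env = ChainEnv()
q_table = [[0.0] * env.n_actions for _ in range(env.n_states)]
rng = random.Random(0)   # fixed seed so runs are repeatable
results = [run_episode(env, q_table, rng) for _ in range(20)]
```

The cap `n_itrs` plays the role that `n_itrs_per_episode` appears to play in the configuration: it guarantees the episode ends even if the goal is never reached.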

play(env: Env, stop_criterion: Criterion) -> None

Play the agent on the environment. This should produce a distorted dataset

Parameters
  • env (The environment to play on) –

  • stop_criterion (The criterion used to stop playing) –

Return type

None