q_learning
Simple Q-learning algorithm
- class q_learning.QLearnConfig(gamma: float = 1.0, alpha: float = 0.1, n_itrs_per_episode: int = 100, policy: Optional[Policy] = None)
Configuration for Q-learning
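As a rough illustration of how these defaults fit together, the sketch below mirrors the documented signature with a stand-in dataclass. `QLearnConfigSketch` is a hypothetical name, not the library class. Reading `gamma` as the discount factor and `alpha` as the learning rate follows the usual Q-learning notation and is an assumption, as is the `Any` placeholder for the library's `Policy` type.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class QLearnConfigSketch:
    """Hypothetical stand-in mirroring the documented QLearnConfig signature."""

    gamma: float = 1.0             # assumed: discount factor
    alpha: float = 0.1             # assumed: learning rate
    n_itrs_per_episode: int = 100  # assumed: maximum steps per episode
    policy: Optional[Any] = None   # placeholder for the library's Policy type


# Override only the values that differ from the documented defaults.
config = QLearnConfigSketch(gamma=0.99, alpha=0.05)
```

With the real library, the equivalent call would presumably be `QLearnConfig(gamma=0.99, alpha=0.05)`, passed on to `QLearning(algo_config=...)`.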
- class q_learning.QLearning(algo_config: QLearnConfig)
Q-learning algorithm implementation
- __init__(algo_config: QLearnConfig)
Constructor. Creates an instance of the algorithm from the given configuration parameters
- Parameters
algo_config – The configuration parameters
- _do_train(env: Env, episode_idx: int, **option) → EpisodeInfo
Train the algorithm on the episode
- Parameters
env – The environment to train on
episode_idx – The index of the training episode
option – Any keyword-based options passed by the client code
- Return type
An instance of EpisodeInfo
- _update_q_table(state: int, action: int, n_actions: int, reward: float, next_state: Optional[int] = None) → None
Update the tabular state-action value function (the Q-table)
- Parameters
state – The state observed
action – The action taken
n_actions – Number of actions in the data set
reward – The reward observed
next_state – The next state observed
- Return type
None
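The update performed here is presumably the standard tabular Q-learning rule, Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]. A minimal sketch under that assumption follows. The dictionary-based table, the explicit `gamma` and `alpha` arguments, and treating a missing `next_state` as a terminal transition are all assumptions made for illustration, not the library's implementation.

```python
from collections import defaultdict
from typing import DefaultDict, Optional, Tuple

QTable = DefaultDict[Tuple[int, int], float]


def update_q_table(q_table: QTable, state: int, action: int, n_actions: int,
                   reward: float, next_state: Optional[int] = None,
                   gamma: float = 1.0, alpha: float = 0.1) -> None:
    """Apply one tabular Q-learning update in place (illustrative sketch)."""
    if next_state is None:
        # Assumed convention: no next state means a terminal transition.
        best_next = 0.0
    else:
        # Greedy value of the next state over all available actions.
        best_next = max(q_table[(next_state, a)] for a in range(n_actions))

    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])


q: QTable = defaultdict(float)
# Non-terminal transition from state 0 to state 1 with reward 1.0.
update_q_table(q, state=0, action=1, n_actions=2, reward=1.0, next_state=1)
first = q[(0, 1)]
# Terminal transition: next_state is omitted.
update_q_table(q, state=0, action=1, n_actions=2, reward=1.0)
second = q[(0, 1)]
```

Each call moves the stored value a fraction `alpha` of the way toward the temporal-difference target, so repeated updates with the same reward converge toward it.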
- actions_after_episode_ends(env: Env, episode_idx: int, **options) → None
Execute any actions the algorithm needs after the episode ends
- Parameters
env – The environment on which training occurs
episode_idx – The episode index
options – Any options passed by the client code
- Return type
None
- actions_before_training(env: Env, **options) → None
Execute any actions the algorithm needs before training begins
- Parameters
env – The environment on which training occurs
options – Any options passed by the client code
- Return type
None
- on_episode(env: Env, episode_idx: int, **options) → EpisodeInfo
Train the algorithm on the episode
- Parameters
env – The environment to train on
episode_idx – The index of the training episode
options – Any keyword-based options passed by the client code
- Return type
An instance of EpisodeInfo
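To make the per-episode training step more concrete, here is a hedged sketch of what a single Q-learning episode typically looks like. Everything in it is hypothetical: `ChainEnv` is a toy environment with an assumed `reset()`/`step()` interface, the epsilon-greedy selection stands in for whatever `policy` is configured, and the returned dictionary is only a placeholder for `EpisodeInfo`, whose fields are not documented here.

```python
import random
from collections import defaultdict


class ChainEnv:
    """Hypothetical 4-state chain: action 1 moves right, action 0 moves left.
    Reaching the last state yields reward 1.0 and ends the episode."""

    n_states = 4
    n_actions = 2

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        return self.state, (1.0 if done else 0.0), done


def run_episode(env, q_table, gamma=1.0, alpha=0.1,
                n_itrs_per_episode=100, eps=0.1, rng=None):
    """Run one training episode and return a summary dict (EpisodeInfo stand-in)."""
    if rng is None:
        rng = random.Random(0)
    state = env.reset()
    total_reward, n_steps = 0.0, 0
    for _ in range(n_itrs_per_episode):
        # Epsilon-greedy action selection (assumed policy).
        if rng.random() < eps:
            action = rng.randrange(env.n_actions)
        else:
            action = max(range(env.n_actions), key=lambda a: q_table[(state, a)])
        next_state, reward, done = env.step(action)
        # Standard Q-learning target; terminal states contribute no future value.
        best_next = 0.0 if done else max(
            q_table[(next_state, a)] for a in range(env.n_actions))
        q_table[(state, action)] += alpha * (
            reward + gamma * best_next - q_table[(state, action)])
        state = next_state
        total_reward += reward
        n_steps += 1
        if done:
            break
    return {"episode_reward": total_reward, "episode_iterations": n_steps}


q_table = defaultdict(float)
info = run_episode(ChainEnv(), q_table, rng=random.Random(42))
```

The loop is capped at `n_itrs_per_episode` steps, which is how the configuration parameter of the same name is assumed to be used.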
- play(env: Env, stop_criterion: Criterion) → None
Play the agent on the environment. This should produce a distorted dataset
- Parameters
env – The environment to play on
stop_criterion – The criterion used to stop playing
- Return type
None
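A plausible reading of `play` is a greedy rollout of the learned Q-table that continues until the stop criterion is met. The sketch below illustrates that idea only. `TwoStateEnv`, the `should_stop` callable (a stand-in for `Criterion`), and the returned trajectory are hypothetical; the documented method returns `None` and its effect on the environment is not shown here.

```python
class TwoStateEnv:
    """Hypothetical environment: action 1 ends the episode with reward 1.0,
    action 0 leaves the state unchanged with reward 0.0."""

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        if action == 1:
            self.state = 1
            return self.state, 1.0, True
        return self.state, 0.0, False


def play_greedy(env, q_table, n_actions, should_stop):
    """Follow the greedy policy until should_stop(n_steps) is true or the episode ends."""
    state = env.reset()
    trajectory = []
    while not should_stop(len(trajectory)):
        # Pick the action with the highest stored value; unseen pairs default to 0.0.
        action = max(range(n_actions), key=lambda a: q_table.get((state, a), 0.0))
        state, reward, done = env.step(action)
        trajectory.append((action, reward))
        if done:
            break
    return trajectory


learned_q = {(0, 1): 1.0}  # assumed result of earlier training
trajectory = play_greedy(TwoStateEnv(), learned_q, n_actions=2,
                         should_stop=lambda n_steps: n_steps >= 10)
```

No exploration happens during play in this sketch, so the outcome depends entirely on the values already stored in the table.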