a2c

a2c.calculate_discounted_returns(rewards: array, discounts: array, n_workers: int = 1) → array

Calculate the discounted returns from the episode rewards

Parameters
  • rewards – The list of rewards

  • discounts – The array of discount factors

  • n_workers – The number of workers
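The computation can be illustrated with a minimal sketch. It assumes that the return at step t is the sum of the remaining rewards, each weighted by the discount at its offset from t; this is the usual definition, not necessarily what a2c.calculate_discounted_returns does internally. The name discounted_returns_sketch is hypothetical, plain Python lists stand in for arrays, and n_workers is ignored.

```python
from typing import List, Sequence


def discounted_returns_sketch(
    rewards: Sequence[float], discounts: Sequence[float]
) -> List[float]:
    """Hypothetical stand-in: G_t = sum_k discounts[k] * rewards[t + k]."""
    n = len(rewards)
    returns = []
    for t in range(n):
        g = 0.0
        # Weight each remaining reward by the discount at its offset from t.
        for k in range(n - t):
            g += discounts[k] * rewards[t + k]
        returns.append(g)
    return returns


rewards = [1.0, 1.0, 1.0]
discounts = [0.5 ** k for k in range(len(rewards))]  # [1.0, 0.5, 0.25]
returns = discounted_returns_sketch(rewards, discounts)
print(returns)  # [1.75, 1.5, 1.0]
```

The double loop is quadratic in the episode length; it is written this way for readability rather than speed.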

a2c.create_discounts_array(end: int, base: float, start=0, endpoint=False)

Create an array of floating point numbers in [start, end) with the given base

Parameters
  • end

  • base

  • start

  • endpoint
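One plausible reading of this signature is that the array holds successive powers of base, with integer exponents running from start up to end (and including end only when endpoint is True). The sketch below implements that reading only; the actual function may space the exponents differently, and the name discounts_sketch is hypothetical.

```python
from typing import List


def discounts_sketch(
    end: int, base: float, start: int = 0, endpoint: bool = False
) -> List[float]:
    """Hypothetical stand-in: powers of `base` for exponents in [start, end)."""
    # Assumption: `endpoint=True` extends the exponent range to include `end`.
    stop = end + 1 if endpoint else end
    return [base ** exponent for exponent in range(start, stop)]


discounts = discounts_sketch(end=4, base=0.5)
print(discounts)  # [1.0, 0.5, 0.25, 0.125]
```

With base set to a discount factor such as gamma, an array like this could serve as the discounts argument of a discounted-return computation.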

class a2c.A2CConfig(gamma: float = 0.99, tau: float = 0.1, beta: Optional[float] = None, policy_loss_weight: float = 1.0, value_loss_weight: float = 1.0, max_grad_norm: float = 1.0, n_iterations_per_episode: int = 100, n_workers: int = 1, batch_size: int = 0, normalize_advantages: bool = True, device: str = 'cpu', action_sampler: Optional[Callable] = None, a2cnet: Optional[Module] = None, save_model_path: Optional[Path] = None, optimizer_config: Optional[PyTorchOptimizerConfig] = None)

Configuration for the A2C algorithm
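The reference does not say how the loss-related fields are used. In a typical actor-critic setup they would combine as in the sketch below, which assumes that policy_loss_weight and value_loss_weight scale the two loss terms and that beta, when set, is an entropy-bonus coefficient. These are assumptions about the configuration, not documented behaviour, and total_loss_sketch is a hypothetical helper.

```python
from typing import Optional


def total_loss_sketch(
    policy_loss: float,
    value_loss: float,
    entropy: float,
    policy_loss_weight: float = 1.0,
    value_loss_weight: float = 1.0,
    beta: Optional[float] = None,
) -> float:
    """Hypothetical combination of the actor, critic and entropy terms."""
    loss = policy_loss_weight * policy_loss + value_loss_weight * value_loss
    # Assumption: beta=None disables the entropy bonus entirely.
    if beta is not None:
        loss -= beta * entropy
    return loss


loss = total_loss_sketch(
    policy_loss=0.5, value_loss=0.25, entropy=2.0,
    value_loss_weight=0.5, beta=0.125,
)
print(loss)  # 0.375
```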

class a2c._ActResult(logprobs: torch.Tensor, values: torch.Tensor, actions: torch.Tensor, entropies: torch.Tensor)
class a2c.A2C(config: A2CConfig)
__init__(config: A2CConfig)
_do_train(env: Env, episode_idx: int, **options) → EpisodeInfo

Train the algorithm on the episode. This method plays the environment to collect batches.

Parameters
  • env – The environment to train on

  • episode_idx – The index of the training episode

  • options – Any keyword-based options passed by the client code

Return type

An instance of EpisodeInfo

classmethod from_path(config: A2CConfig, path: Path)

Load the A2C model parameters from the given path

Parameters
  • config – The configuration of the algorithm

  • path – The path to load the parameters from

Return type

An instance of A2C class

on_episode(env: Env, episode_idx: int, **options) → EpisodeInfo

Train the algorithm on the episode

Parameters
  • env – The environment to train on

  • episode_idx – The index of the training episode

  • options – Any keyword-based options passed by the client code

Return type

An instance of EpisodeInfo

parameters() → Any

The parameters of the underlying model

Return type

An array with the model parameters