    On/Off Policy Trainer Documentation

    mlagents.trainers.trainer.on_policy_trainer

    OnPolicyTrainer Objects

    class OnPolicyTrainer(RLTrainer)
    

    The OnPolicyTrainer is the base class for on-policy trainers such as the PPOTrainer, an implementation of the PPO algorithm.

    __init__

     | __init__(behavior_name: str, reward_buff_cap: int, trainer_settings: TrainerSettings, training: bool, load: bool, seed: int, artifact_path: str)
    

    Responsible for collecting experiences and training an on-policy model.

    Arguments:

    • behavior_name: The name of the behavior associated with trainer config
    • reward_buff_cap: Max reward history to track in the reward buffer
    • trainer_settings: The parameters for the trainer.
    • training: Whether the trainer is set for training.
    • load: Whether the model should be loaded.
    • seed: The seed the model will be initialized with
    • artifact_path: The directory within which to store artifacts from this trainer.

    add_policy

     | add_policy(parsed_behavior_id: BehaviorIdentifiers, policy: Policy) -> None
    

    Adds policy to trainer.

    Arguments:

    • parsed_behavior_id: Behavior identifiers that the policy should belong to.
    • policy: Policy to associate with name_behavior_id.

    mlagents.trainers.trainer.off_policy_trainer

    OffPolicyTrainer Objects

    class OffPolicyTrainer(RLTrainer)
    

    The OffPolicyTrainer is the base class for off-policy trainers such as the SACTrainer, an implementation of the SAC algorithm with support for discrete actions and recurrent networks.

    __init__

     | __init__(behavior_name: str, reward_buff_cap: int, trainer_settings: TrainerSettings, training: bool, load: bool, seed: int, artifact_path: str)
    

    Responsible for collecting experiences and training an off-policy model.

    Arguments:

    • behavior_name: The name of the behavior associated with trainer config
    • reward_buff_cap: Max reward history to track in the reward buffer
    • trainer_settings: The parameters for the trainer.
    • training: Whether the trainer is set for training.
    • load: Whether the model should be loaded.
    • seed: The seed the model will be initialized with
    • artifact_path: The directory within which to store artifacts from this trainer.

    save_model

     | save_model() -> None
    

    Saves the final training model to memory. Overrides the default behavior to also save the replay buffer.

    save_replay_buffer

     | save_replay_buffer() -> None
    

    Save the training buffer's update buffer to a pickle file.

    load_replay_buffer

     | load_replay_buffer() -> None
    

    Loads the last saved replay buffer from a file.

    add_policy

     | add_policy(parsed_behavior_id: BehaviorIdentifiers, policy: Policy) -> None
    

    Adds policy to trainer.

    mlagents.trainers.trainer.rl_trainer

    RLTrainer Objects

    class RLTrainer(Trainer)
    

    This class is the base class for trainers that use Reward Signals.

    end_episode

     | end_episode() -> None
    

    A signal that the episode has ended. The buffer must be reset. Gets called only when the Academy resets.

    create_optimizer

     | @abc.abstractmethod
     | create_optimizer() -> TorchOptimizer
    

    Creates an Optimizer object

    save_model

     | save_model() -> None
    

    Saves the policy associated with this trainer.

    advance

     | advance() -> None
    

    Steps the trainer, taking in trajectories and updating if ready. Will block and wait briefly if there are no trajectories.

    mlagents.trainers.trainer.trainer

    Trainer Objects

    class Trainer(abc.ABC)
    

    This class is the base class for the trainers in mlagents.trainers.

    __init__

     | __init__(brain_name: str, trainer_settings: TrainerSettings, training: bool, load: bool, artifact_path: str, reward_buff_cap: int = 1)
    

    Responsible for collecting experiences and training a neural network model.

    Arguments:

    • brain_name: The name of the brain (behavior) to be trained.
    • trainer_settings: The parameters for the trainer.
    • training: Whether the trainer is set for training.
    • load: Whether the model should be loaded.
    • artifact_path: The directory within which to store artifacts from this trainer.
    • reward_buff_cap: Max reward history to track in the reward buffer.

    stats_reporter

     | @property
     | stats_reporter()
    

    Returns the stats reporter associated with this Trainer.

    parameters

     | @property
     | parameters() -> TrainerSettings
    

    Returns the trainer parameters of the trainer.

    get_max_steps

     | @property
     | get_max_steps() -> int
    

    Returns the maximum number of steps. Used to determine when the trainer should be stopped.

    Returns:

    The maximum number of steps of the trainer

    get_step

     | @property
     | get_step() -> int
    

    Returns the number of steps the trainer has performed

    Returns:

    the step count of the trainer

    threaded

     | @property
     | threaded() -> bool
    

    Whether or not to run the trainer in a thread. True allows the trainer to update the policy while the environment is taking steps. Set to False to enforce strict on-policy updates (i.e. don't update the policy when taking steps.)

    should_still_train

     | @property
     | should_still_train() -> bool
    

    Returns whether or not the trainer should train. A Trainer could stop training if it wasn't training to begin with, or if max_steps is reached.

    reward_buffer

     | @property
     | reward_buffer() -> Deque[float]
    

    Returns the reward buffer. The reward buffer contains the cumulative rewards of the most recent episodes completed by agents using this trainer.

    Returns:

    the reward buffer.
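
    For example, a rolling mean over this buffer is a quick way to monitor recent performance. In the minimal sketch below, trainer stands in for any already constructed Trainer instance:

     | from statistics import mean
     | 
     | # reward_buffer holds the cumulative reward of each recently completed
     | # episode, so its mean summarizes recent performance.
     | if trainer.reward_buffer:
     |     mean_recent_reward = mean(trainer.reward_buffer)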

    save_model

     | @abc.abstractmethod
     | save_model() -> None
    

    Saves model file(s) for the policy or policies associated with this trainer.

    end_episode

     | @abc.abstractmethod
     | end_episode()
    

    A signal that the episode has ended. The buffer must be reset. Gets called only when the Academy resets.

    create_policy

     | @abc.abstractmethod
     | create_policy(parsed_behavior_id: BehaviorIdentifiers, behavior_spec: BehaviorSpec) -> Policy
    

    Creates a Policy object

    add_policy

     | @abc.abstractmethod
     | add_policy(parsed_behavior_id: BehaviorIdentifiers, policy: Policy) -> None
    

    Adds policy to trainer.

    get_policy

     | get_policy(name_behavior_id: str) -> Policy
    

    Gets policy associated with name_behavior_id

    Arguments:

    • name_behavior_id: Fully qualified behavior name

    Returns:

    Policy associated with name_behavior_id

    advance

     | @abc.abstractmethod
     | advance() -> None
    

    Advances the trainer. Typically, this means grabbing trajectories from all subscribed trajectory queues (self.trajectory_queues), updating a policy using the steps in them, and, if needed, pushing a new policy onto the right policy queues (self.policy_queues).
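
    A minimal sketch of what a concrete advance() implementation typically does, following the description above. The _process_trajectory and _ready_to_update helpers are hypothetical stand-ins for trainer-specific logic, and the AgentManagerQueue.Empty exception and behavior_id property are assumed to come from mlagents.trainers.agent_processor:

     | from mlagents.trainers.agent_processor import AgentManagerQueue
     | from mlagents.trainers.trainer.trainer import Trainer
     | 
     | class SketchTrainer(Trainer):  # assumes the other abstract methods are implemented
     |     def advance(self) -> None:
     |         # Drain every subscribed trajectory queue without blocking.
     |         for traj_queue in self.trajectory_queues:
     |             try:
     |                 while True:
     |                     trajectory = traj_queue.get_nowait()
     |                     self._process_trajectory(trajectory)  # hypothetical helper
     |             except AgentManagerQueue.Empty:
     |                 pass
     |         # If enough new steps have accumulated, update the policy and
     |         # republish it to every registered policy queue.
     |         if self._ready_to_update():  # hypothetical helper
     |             for policy_queue in self.policy_queues:
     |                 policy_queue.put(self.get_policy(policy_queue.behavior_id))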

    publish_policy_queue

     | publish_policy_queue(policy_queue: AgentManagerQueue[Policy]) -> None
    

    Adds a policy queue to the list of queues to publish to when this Trainer makes a policy update

    Arguments:

    • policy_queue: Policy queue to publish to.

    subscribe_trajectory_queue

     | subscribe_trajectory_queue(trajectory_queue: AgentManagerQueue[Trajectory]) -> None
    

    Adds a trajectory queue to the list of queues for the trainer to ingest Trajectories from.

    Arguments:

    • trajectory_queue: Trajectory queue to read from.
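
    The two queue methods above are how an external training loop wires a trainer to its agents. A hedged sketch of that wiring, assuming AgentManagerQueue lives in mlagents.trainers.agent_processor and takes the behavior name as its first argument, and that trainer is an already constructed Trainer:

     | from mlagents.trainers.agent_processor import AgentManagerQueue
     | 
     | behavior_id = "3DBall?team=0"  # hypothetical fully qualified behavior name
     | 
     | policy_queue = AgentManagerQueue(behavior_id)      # trainer -> agents
     | trajectory_queue = AgentManagerQueue(behavior_id)  # agents -> trainer
     | 
     | trainer.publish_policy_queue(policy_queue)            # updated policies are pushed here
     | trainer.subscribe_trajectory_queue(trajectory_queue)  # trajectories are read from here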

    mlagents.trainers.settings

    deep_update_dict

    deep_update_dict(d: Dict, update_d: Mapping) -> None
    

    Similar to dict.update(), but works for nested dicts of dicts as well.
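
    A small illustration of that merge behavior (the key names here are only examples):

     | from mlagents.trainers.settings import deep_update_dict
     | 
     | base = {"hyperparameters": {"learning_rate": 3.0e-4, "batch_size": 1024}}
     | override = {"hyperparameters": {"batch_size": 256}}
     | 
     | deep_update_dict(base, override)  # mutates base in place, returns None
     | 
     | # Unlike dict.update(), the nested dict is merged rather than replaced:
     | # base == {"hyperparameters": {"learning_rate": 3.0e-4, "batch_size": 256}}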

    RewardSignalSettings Objects

    @attr.s(auto_attribs=True)
    class RewardSignalSettings()
    

    structure

     | @staticmethod
     | structure(d: Mapping, t: type) -> Any
    

    Helper method to structure a Dict of RewardSignalSettings classes. Meant to be registered with cattr.register_structure_hook() and called with cattr.structure(). This is needed to handle the special Enum selection of RewardSignalSettings classes.
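
    ml-agents registers this hook internally, so the snippet below only illustrates the call shape; the RewardSignalType enum, the Dict[RewardSignalType, RewardSignalSettings] target type, and the "extrinsic" config keys are assumptions here:

     | from typing import Dict
     | import cattr
     | from mlagents.trainers.settings import RewardSignalSettings, RewardSignalType
     | 
     | # Register the documented hook for the mapping it structures
     | # (reward signal name -> RewardSignalSettings).
     | cattr.register_structure_hook(
     |     Dict[RewardSignalType, RewardSignalSettings], RewardSignalSettings.structure
     | )
     | 
     | raw = {"extrinsic": {"gamma": 0.99, "strength": 1.0}}  # typical YAML fragment
     | settings = cattr.structure(raw, Dict[RewardSignalType, RewardSignalSettings])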

    ParameterRandomizationSettings Objects

    @attr.s(auto_attribs=True)
    class ParameterRandomizationSettings(abc.ABC)
    

    __str__

     | __str__() -> str
    

    Helper method to output sampler stats to console.

    structure

     | @staticmethod
     | structure(d: Union[Mapping, float], t: type) -> "ParameterRandomizationSettings"
    

    Helper method to structure a ParameterRandomizationSettings class. Meant to be registered with cattr.register_structure_hook() and called with cattr.structure(). This is needed to handle the special Enum selection of ParameterRandomizationSettings classes.

    unstructure

     | @staticmethod
     | unstructure(d: "ParameterRandomizationSettings") -> Mapping
    

    Helper method to unstructure a ParameterRandomizationSettings class. Meant to be registered with cattr.register_unstructure_hook() and called with cattr.unstructure().

    apply

     | @abc.abstractmethod
     | apply(key: str, env_channel: EnvironmentParametersChannel) -> None
    

    Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the appropriate sampler type's set method.

    Arguments:

    • key: environment parameter to be sampled
    • env_channel: The EnvironmentParametersChannel to communicate sampler settings to environment
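
    A hedged sketch of constructing one of the sampler settings documented below and applying it to a channel. The import path of EnvironmentParametersChannel, the UniformSettings field names (min_value, max_value), and the "wall_height" key are assumptions here:

     | from mlagents_envs.side_channel.environment_parameters_channel import (
     |     EnvironmentParametersChannel,
     | )
     | from mlagents.trainers.settings import UniformSettings
     | 
     | channel = EnvironmentParametersChannel()  # normally registered with the environment
     | sampler = UniformSettings(min_value=1.0, max_value=5.0)  # field names assumed
     | 
     | # Sends the uniform sampler parameters for the "wall_height" environment
     | # parameter over the side channel, as described above.
     | sampler.apply("wall_height", channel)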

    ConstantSettings Objects

    @attr.s(auto_attribs=True)
    class ConstantSettings(ParameterRandomizationSettings)
    

    __str__

     | __str__() -> str
    

    Helper method to output sampler stats to console.

    apply

     | apply(key: str, env_channel: EnvironmentParametersChannel) -> None
    

    Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the constant sampler type's set method.

    Arguments:

    • key: environment parameter to be sampled
    • env_channel: The EnvironmentParametersChannel to communicate sampler settings to environment

    UniformSettings Objects

    @attr.s(auto_attribs=True)
    class UniformSettings(ParameterRandomizationSettings)
    

    __str__

     | __str__() -> str
    

    Helper method to output sampler stats to console.

    apply

     | apply(key: str, env_channel: EnvironmentParametersChannel) -> None
    

    Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the uniform sampler type's set method.

    Arguments:

    • key: environment parameter to be sampled
    • env_channel: The EnvironmentParametersChannel to communicate sampler settings to environment

    GaussianSettings Objects

    @attr.s(auto_attribs=True)
    class GaussianSettings(ParameterRandomizationSettings)
    

    __str__

     | __str__() -> str
    

    Helper method to output sampler stats to console.

    apply

     | apply(key: str, env_channel: EnvironmentParametersChannel) -> None
    

    Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the gaussian sampler type's set method.

    Arguments:

    • key: environment parameter to be sampled
    • env_channel: The EnvironmentParametersChannel to communicate sampler settings to environment

    MultiRangeUniformSettings Objects

    @attr.s(auto_attribs=True)
    class MultiRangeUniformSettings(ParameterRandomizationSettings)
    

    __str__

     | __str__() -> str
    

    Helper method to output sampler stats to console.

    apply

     | apply(key: str, env_channel: EnvironmentParametersChannel) -> None
    

    Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the multirangeuniform sampler type's set method.

    Arguments:

    • key: environment parameter to be sampled
    • env_channel: The EnvironmentParametersChannel to communicate sampler settings to environment

    CompletionCriteriaSettings Objects

    @attr.s(auto_attribs=True)
    class CompletionCriteriaSettings()
    

    CompletionCriteriaSettings contains the information needed to figure out if the next lesson must start.

    need_increment

     | need_increment(progress: float, reward_buffer: List[float], smoothing: float) -> Tuple[bool, float]
    

    Given measures, this method returns a boolean indicating if the lesson needs to change now, and a float corresponding to the new smoothed value.
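
    A hedged sketch of the call shape only; the CompletionCriteriaSettings field names and the numeric values below are hypothetical:

     | from mlagents.trainers.settings import CompletionCriteriaSettings
     | 
     | criteria = CompletionCriteriaSettings(behavior="3DBall", threshold=0.8)  # field names assumed
     | must_increment, new_smoothed = criteria.need_increment(
     |     progress=0.45,                  # e.g. fraction of max_steps completed
     |     reward_buffer=[1.2, 0.9, 1.5],  # recent cumulative episode rewards
     |     smoothing=0.8,                  # smoothed value from the previous check
     | )
     | if must_increment:
     |     pass  # the curriculum advances to the next Lesson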

    Lesson Objects

    @attr.s(auto_attribs=True)
    class Lesson()
    

    Gathers the data of one lesson for one environment parameter, including its name, the condition that must be fulfilled for the lesson to be completed, and a sampler for the environment parameter. If completion_criteria is None, this is the last lesson in the curriculum.

    EnvironmentParameterSettings Objects

    @attr.s(auto_attribs=True)
    class EnvironmentParameterSettings()
    

    EnvironmentParameterSettings is an ordered list of lessons for one environment parameter.

    structure

     | @staticmethod
     | structure(d: Mapping, t: type) -> Dict[str, "EnvironmentParameterSettings"]
    

    Helper method to structure a Dict of EnvironmentParameterSettings classes. Meant to be registered with cattr.register_structure_hook() and called with cattr.structure().

    TrainerSettings Objects

    @attr.s(auto_attribs=True)
    class TrainerSettings(ExportableSettings)
    

    structure

     | @staticmethod
     | structure(d: Mapping, t: type) -> Any
    

    Helper method to structure a TrainerSettings class. Meant to be registered with cattr.register_structure_hook() and called with cattr.structure().

    CheckpointSettings Objects

    @attr.s(auto_attribs=True)
    class CheckpointSettings()
    

    prioritize_resume_init

     | prioritize_resume_init() -> None
    

    Prioritizes explicit command-line resume/init over conflicting YAML options. If both resume and init are set in the same place, resume is used.

    RunOptions Objects

    @attr.s(auto_attribs=True)
    class RunOptions(ExportableSettings)
    

    from_argparse

     | @staticmethod
     | from_argparse(args: argparse.Namespace) -> "RunOptions"
    

    Takes an argparse.Namespace as specified in parse_command_line, loads input configuration files from file paths, and converts to a RunOptions instance.

    Arguments:

    • args: collection of command-line parameters passed to mlagents-learn

    Returns:

    RunOptions representing the passed in arguments, with trainer config, curriculum and sampler configs loaded from files.
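
    A hedged usage sketch; it assumes the mlagents-learn argument parser is importable as parser from mlagents.trainers.cli_utils, that the config path exists, and that RunOptions exposes behaviors and checkpoint_settings fields:

     | from mlagents.trainers.cli_utils import parser  # assumed location of the CLI parser
     | from mlagents.trainers.settings import RunOptions
     | 
     | # Build the same Namespace that mlagents-learn builds from the command line,
     | # then convert it into a fully loaded RunOptions.
     | args = parser.parse_args(["config/ppo/3DBall.yaml", "--run-id", "my_run"])
     | options = RunOptions.from_argparse(args)
     | print(options.checkpoint_settings.run_id, list(options.behaviors.keys()))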
