    On/Off Policy Trainer Documentation

    mlagents.trainers.trainer.on_policy_trainer

    OnPolicyTrainer Objects

    class OnPolicyTrainer(RLTrainer)
    

    The OnPolicyTrainer is the base class for on-policy trainers such as the PPOTrainer, an implementation of the PPO algorithm.

    __init__

     | __init__(behavior_name: str, reward_buff_cap: int, trainer_settings: TrainerSettings, training: bool, load: bool, seed: int, artifact_path: str)
    

    Responsible for collecting experiences and training an on-policy model.

    Arguments:

    • behavior_name: The name of the behavior associated with trainer config
    • reward_buff_cap: Max reward history to track in the reward buffer
    • trainer_settings: The parameters for the trainer.
    • training: Whether the trainer is set for training.
    • load: Whether the model should be loaded.
    • seed: The seed the model will be initialized with
    • artifact_path: The directory within which to store artifacts from this trainer.

    add_policy

     | add_policy(parsed_behavior_id: BehaviorIdentifiers, policy: Policy) -> None
    

    Adds policy to trainer.

    Arguments:

    • parsed_behavior_id: Behavior identifiers that the policy should belong to.
    • policy: Policy to associate with name_behavior_id.

    mlagents.trainers.trainer.off_policy_trainer

    OffPolicyTrainer Objects

    class OffPolicyTrainer(RLTrainer)
    

    The OffPolicyTrainer is the base class for off-policy trainers such as the SACTrainer, an implementation of the SAC algorithm with support for discrete actions and recurrent networks.

    __init__

     | __init__(behavior_name: str, reward_buff_cap: int, trainer_settings: TrainerSettings, training: bool, load: bool, seed: int, artifact_path: str)
    

    Responsible for collecting experiences and training an off-policy model.

    Arguments:

    • behavior_name: The name of the behavior associated with trainer config
    • reward_buff_cap: Max reward history to track in the reward buffer
    • trainer_settings: The parameters for the trainer.
    • training: Whether the trainer is set for training.
    • load: Whether the model should be loaded.
    • seed: The seed the model will be initialized with
    • artifact_path: The directory within which to store artifacts from this trainer.

    save_model

     | save_model() -> None
    

    Saves the final training model to memory. Overrides the default behavior to also save the replay buffer.

    save_replay_buffer

     | save_replay_buffer() -> None
    

    Save the training buffer's update buffer to a pickle file.

    load_replay_buffer

     | load_replay_buffer() -> None
    

    Loads the last saved replay buffer from a file.

    add_policy

     | add_policy(parsed_behavior_id: BehaviorIdentifiers, policy: Policy) -> None
    

    Adds policy to trainer.

    mlagents.trainers.trainer.rl_trainer

    RLTrainer Objects

    class RLTrainer(Trainer)
    

    This class is the base class for trainers that use Reward Signals.

    end_episode

     | end_episode() -> None
    

    A signal that the episode has ended. The buffer must be reset. Gets called only when the Academy resets.

    create_optimizer

     | @abc.abstractmethod
     | create_optimizer() -> TorchOptimizer
    

    Creates an Optimizer object

    save_model

     | save_model() -> None
    

    Saves the policy associated with this trainer.

    advance

     | advance() -> None
    

    Steps the trainer, taking in trajectories and updating if ready. Will block and wait briefly if there are no trajectories.

    mlagents.trainers.trainer.trainer

    Trainer Objects

    class Trainer(abc.ABC)
    

    This class is the base class for the trainers in mlagents.trainers.

    __init__

     | __init__(brain_name: str, trainer_settings: TrainerSettings, training: bool, load: bool, artifact_path: str, reward_buff_cap: int = 1)
    

    Responsible for collecting experiences and training a neural network model.

    Arguments:

    • brain_name: The name of the brain (behavior) to be trained.
    • trainer_settings: The parameters for the trainer.
    • training: Whether the trainer is set for training.
    • load: Whether the model should be loaded.
    • artifact_path: The directory within which to store artifacts from this trainer.
    • reward_buff_cap: Max reward history to track in the reward buffer.

    stats_reporter

     | @property
     | stats_reporter()
    

    Returns the stats reporter associated with this Trainer.

    parameters

     | @property
     | parameters() -> TrainerSettings
    

    Returns the trainer parameters of the trainer.

    get_max_steps

     | @property
     | get_max_steps() -> int
    

    Returns the maximum number of steps. Used to determine when the trainer should be stopped.

    Returns:

    The maximum number of steps of the trainer

    get_step

     | @property
     | get_step() -> int
    

    Returns the number of steps the trainer has performed

    Returns:

    the step count of the trainer

    threaded

     | @property
     | threaded() -> bool
    

    Whether or not to run the trainer in a thread. True allows the trainer to update the policy while the environment is taking steps. Set to False to enforce strict on-policy updates (i.e. don't update the policy when taking steps.)

    should_still_train

     | @property
     | should_still_train() -> bool
    

    Returns whether or not the trainer should train. A Trainer could stop training if it wasn't training to begin with, or if max_steps is reached.

    reward_buffer

     | @property
     | reward_buffer() -> Deque[float]
    

    Returns the reward buffer. The reward buffer contains the cumulative rewards of the most recent episodes completed by agents using this trainer.

    Returns:

    the reward buffer.
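
    For example, a rolling mean over this buffer is a quick way to monitor recent performance. In the minimal sketch below, trainer stands in for any already constructed Trainer instance:

     | from statistics import mean
     | 
     | # reward_buffer holds the cumulative reward of each recently completed
     | # episode, so its mean summarizes recent performance.
     | if trainer.reward_buffer:
     |     mean_recent_reward = mean(trainer.reward_buffer)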

    save_model

     | @abc.abstractmethod
     | save_model() -> None
    

    Saves model file(s) for the policy or policies associated with this trainer.

    end_episode

     | @abc.abstractmethod
     | end_episode()
    

    A signal that the episode has ended. The buffer must be reset. Gets called only when the Academy resets.

    create_policy

     | @abc.abstractmethod
     | create_policy(parsed_behavior_id: BehaviorIdentifiers, behavior_spec: BehaviorSpec) -> Policy
    

    Creates a Policy object

    add_policy

     | @abc.abstractmethod
     | add_policy(parsed_behavior_id: BehaviorIdentifiers, policy: Policy) -> None
    

    Adds policy to trainer.

    get_policy

     | get_policy(name_behavior_id: str) -> Policy
    

    Gets policy associated with name_behavior_id

    Arguments:

    • name_behavior_id: Fully qualified behavior name

    Returns:

    Policy associated with name_behavior_id

    advance

     | @abc.abstractmethod
     | advance() -> None
    

    Advances the trainer. Typically, this means grabbing trajectories from all subscribed trajectory queues (self.trajectory_queues), updating a policy using the steps in them, and, if needed, pushing a new policy onto the right policy queues (self.policy_queues).
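
    A minimal sketch of what a concrete advance() implementation typically does, following the description above. The _process_trajectory and _ready_to_update helpers are hypothetical stand-ins for trainer-specific logic, and the AgentManagerQueue.Empty exception and behavior_id property are assumed to come from mlagents.trainers.agent_processor:

     | from mlagents.trainers.agent_processor import AgentManagerQueue
     | from mlagents.trainers.trainer.trainer import Trainer
     | 
     | class SketchTrainer(Trainer):  # assumes the other abstract methods are implemented
     |     def advance(self) -> None:
     |         # Drain every subscribed trajectory queue without blocking.
     |         for traj_queue in self.trajectory_queues:
     |             try:
     |                 while True:
     |                     trajectory = traj_queue.get_nowait()
     |                     self._process_trajectory(trajectory)  # hypothetical helper
     |             except AgentManagerQueue.Empty:
     |                 pass
     |         # If enough new steps have accumulated, update the policy and
     |         # republish it to every registered policy queue.
     |         if self._ready_to_update():  # hypothetical helper
     |             for policy_queue in self.policy_queues:
     |                 policy_queue.put(self.get_policy(policy_queue.behavior_id))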

    publish_policy_queue

     | publish_policy_queue(policy_queue: AgentManagerQueue[Policy]) -> None
    

    Adds a policy queue to the list of queues to publish to when this Trainer makes a policy update

    Arguments:

    • policy_queue: Policy queue to publish to.

    subscribe_trajectory_queue

     | subscribe_trajectory_queue(trajectory_queue: AgentManagerQueue[Trajectory]) -> None
    

    Adds a trajectory queue to the list of queues for the trainer to ingest Trajectories from.

    Arguments:

    • trajectory_queue: Trajectory queue to read from.
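
    The two queue methods above are how an external training loop wires a trainer to its agents. A hedged sketch of that wiring, assuming AgentManagerQueue lives in mlagents.trainers.agent_processor and takes the behavior name as its first argument, and that trainer is an already constructed Trainer:

     | from mlagents.trainers.agent_processor import AgentManagerQueue
     | 
     | behavior_id = "3DBall?team=0"  # hypothetical fully qualified behavior name
     | 
     | policy_queue = AgentManagerQueue(behavior_id)      # trainer -> agents
     | trajectory_queue = AgentManagerQueue(behavior_id)  # agents -> trainer
     | 
     | trainer.publish_policy_queue(policy_queue)            # updated policies are pushed here
     | trainer.subscribe_trajectory_queue(trajectory_queue)  # trajectories are read from here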

    mlagents.trainers.settings

    deep_update_dict

    deep_update_dict(d: Dict, update_d: Mapping) -> None
    

    Similar to dict.update(), but works for nested dicts of dicts as well.
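
    A small illustration of that merge behavior (the key names here are only examples):

     | from mlagents.trainers.settings import deep_update_dict
     | 
     | base = {"hyperparameters": {"learning_rate": 3.0e-4, "batch_size": 1024}}
     | override = {"hyperparameters": {"batch_size": 256}}
     | 
     | deep_update_dict(base, override)  # mutates base in place, returns None
     | 
     | # Unlike dict.update(), the nested dict is merged rather than replaced:
     | # base == {"hyperparameters": {"learning_rate": 3.0e-4, "batch_size": 256}}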

    RewardSignalSettings Objects

    @attr.s(auto_attribs=True)
    class RewardSignalSettings()
    

    structure

     | @staticmethod
     | structure(d: Mapping, t: type) -> Any
    

    Helper method to structure a Dict of RewardSignalSettings classes. Meant to be registered with cattr.register_structure_hook() and called with cattr.structure(). This is needed to handle the special Enum selection of RewardSignalSettings classes.
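
    ml-agents registers this hook internally, so the snippet below only illustrates the call shape; the RewardSignalType enum, the Dict[RewardSignalType, RewardSignalSettings] target type, and the "extrinsic" config keys are assumptions here:

     | from typing import Dict
     | import cattr
     | from mlagents.trainers.settings import RewardSignalSettings, RewardSignalType
     | 
     | # Register the documented hook for the mapping it structures
     | # (reward signal name -> RewardSignalSettings).
     | cattr.register_structure_hook(
     |     Dict[RewardSignalType, RewardSignalSettings], RewardSignalSettings.structure
     | )
     | 
     | raw = {"extrinsic": {"gamma": 0.99, "strength": 1.0}}  # typical YAML fragment
     | settings = cattr.structure(raw, Dict[RewardSignalType, RewardSignalSettings])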

    ParameterRandomizationSettings Objects

    @attr.s(auto_attribs=True)
    class ParameterRandomizationSettings(abc.ABC)
    

    __str__

     | __str__() -> str
    

    Helper method to output sampler stats to console.

    structure

     | @staticmethod
     | structure(d: Union[Mapping, float], t: type) -> "ParameterRandomizationSettings"
    

    Helper method to structure a ParameterRandomizationSettings class. Meant to be registered with cattr.register_structure_hook() and called with cattr.structure(). This is needed to handle the special Enum selection of ParameterRandomizationSettings classes.

    unstructure

     | @staticmethod
     | unstructure(d: "ParameterRandomizationSettings") -> Mapping
    

    Helper method to unstructure a ParameterRandomizationSettings class. Meant to be registered with cattr.register_unstructure_hook() and called with cattr.unstructure().

    apply

     | @abc.abstractmethod
     | apply(key: str, env_channel: EnvironmentParametersChannel) -> None
    

    Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the appropriate sampler type's set method.

    Arguments:

    • key: environment parameter to be sampled
    • env_channel: The EnvironmentParametersChannel to communicate sampler settings to environment
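
    A hedged sketch of constructing one of the sampler settings documented below and applying it to a channel. The import path of EnvironmentParametersChannel, the UniformSettings field names (min_value, max_value), and the "wall_height" key are assumptions here:

     | from mlagents_envs.side_channel.environment_parameters_channel import (
     |     EnvironmentParametersChannel,
     | )
     | from mlagents.trainers.settings import UniformSettings
     | 
     | channel = EnvironmentParametersChannel()  # normally registered with the environment
     | sampler = UniformSettings(min_value=1.0, max_value=5.0)  # field names assumed
     | 
     | # Sends the uniform sampler parameters for the "wall_height" environment
     | # parameter over the side channel, as described above.
     | sampler.apply("wall_height", channel)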

    ConstantSettings Objects

    @attr.s(auto_attribs=True)
    class ConstantSettings(ParameterRandomizationSettings)
    

    __str__

     | __str__() -> str
    

    Helper method to output sampler stats to console.

    apply

     | apply(key: str, env_channel: EnvironmentParametersChannel) -> None
    

    Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the constant sampler type's set method.

    Arguments:

    • key: environment parameter to be sampled
    • env_channel: The EnvironmentParametersChannel to communicate sampler settings to environment

    UniformSettings Objects

    @attr.s(auto_attribs=True)
    class UniformSettings(ParameterRandomizationSettings)
    

    __str__

     | __str__() -> str
    

    Helper method to output sampler stats to console.

    apply

     | apply(key: str, env_channel: EnvironmentParametersChannel) -> None
    

    Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the uniform sampler type's set method.

    Arguments:

    • key: environment parameter to be sampled
    • env_channel: The EnvironmentParametersChannel to communicate sampler settings to environment

    GaussianSettings Objects

    @attr.s(auto_attribs=True)
    class GaussianSettings(ParameterRandomizationSettings)
    

    __str__

     | __str__() -> str
    

    Helper method to output sampler stats to console.

    apply

     | apply(key: str, env_channel: EnvironmentParametersChannel) -> None
    

    Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the gaussian sampler type's set method.

    Arguments:

    • key: environment parameter to be sampled
    • env_channel: The EnvironmentParametersChannel to communicate sampler settings to environment

    MultiRangeUniformSettings Objects

    @attr.s(auto_attribs=True)
    class MultiRangeUniformSettings(ParameterRandomizationSettings)
    

    __str__

     | __str__() -> str
    

    Helper method to output sampler stats to console.

    apply

     | apply(key: str, env_channel: EnvironmentParametersChannel) -> None
    

    Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the multirangeuniform sampler type's set method.

    Arguments:

    • key: environment parameter to be sampled
    • env_channel: The EnvironmentParametersChannel to communicate sampler settings to environment

    CompletionCriteriaSettings Objects

    @attr.s(auto_attribs=True)
    class CompletionCriteriaSettings()
    

    CompletionCriteriaSettings contains the information needed to figure out if the next lesson must start.

    need_increment

     | need_increment(progress: float, reward_buffer: List[float], smoothing: float) -> Tuple[bool, float]
    

    Given measures, this method returns a boolean indicating if the lesson needs to change now, and a float corresponding to the new smoothed value.
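
    A hedged sketch of the call shape only; the CompletionCriteriaSettings field names and the numeric values below are hypothetical:

     | from mlagents.trainers.settings import CompletionCriteriaSettings
     | 
     | criteria = CompletionCriteriaSettings(behavior="3DBall", threshold=0.8)  # field names assumed
     | must_increment, new_smoothed = criteria.need_increment(
     |     progress=0.45,                  # e.g. fraction of max_steps completed
     |     reward_buffer=[1.2, 0.9, 1.5],  # recent cumulative episode rewards
     |     smoothing=0.8,                  # smoothed value from the previous check
     | )
     | if must_increment:
     |     pass  # the curriculum advances to the next Lesson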

    Lesson Objects

    @attr.s(auto_attribs=True)
    class Lesson()
    

    Gathers the data of one lesson for one environment parameter, including its name, the condition that must be fulfilled for the lesson to be completed, and a sampler for the environment parameter. If completion_criteria is None, this is the last lesson in the curriculum.

    EnvironmentParameterSettings Objects

    @attr.s(auto_attribs=True)
    class EnvironmentParameterSettings()
    

    EnvironmentParameterSettings is an ordered list of lessons for one environment parameter.

    structure

     | @staticmethod
     | structure(d: Mapping, t: type) -> Dict[str, "EnvironmentParameterSettings"]
    

    Helper method to structure a Dict of EnvironmentParameterSettings classes. Meant to be registered with cattr.register_structure_hook() and called with cattr.structure().

    TrainerSettings Objects

    @attr.s(auto_attribs=True)
    class TrainerSettings(ExportableSettings)
    

    structure

     | @staticmethod
     | structure(d: Mapping, t: type) -> Any
    

    Helper method to structure a TrainerSettings class. Meant to be registered with cattr.register_structure_hook() and called with cattr.structure().

    CheckpointSettings Objects

    @attr.s(auto_attribs=True)
    class CheckpointSettings()
    

    prioritize_resume_init

     | prioritize_resume_init() -> None
    

    Prioritizes explicit command-line resume/init over conflicting YAML options. If both resume and init are set in the same place, resume is used.

    RunOptions Objects

    @attr.s(auto_attribs=True)
    class RunOptions(ExportableSettings)
    

    from_argparse

     | @staticmethod
     | from_argparse(args: argparse.Namespace) -> "RunOptions"
    

    Takes an argparse.Namespace as specified in parse_command_line, loads input configuration files from file paths, and converts to a RunOptions instance.

    Arguments:

    • args: collection of command-line parameters passed to mlagents-learn

    Returns:

    RunOptions representing the passed in arguments, with trainer config, curriculum and sampler configs loaded from files.
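
    A hedged usage sketch; it assumes the mlagents-learn argument parser is importable as parser from mlagents.trainers.cli_utils, that the config path exists, and that RunOptions exposes behaviors and checkpoint_settings fields:

     | from mlagents.trainers.cli_utils import parser  # assumed location of the CLI parser
     | from mlagents.trainers.settings import RunOptions
     | 
     | # Build the same Namespace that mlagents-learn builds from the command line,
     | # then convert it into a fully loaded RunOptions.
     | args = parser.parse_args(["config/ppo/3DBall.yaml", "--run-id", "my_run"])
     | options = RunOptions.from_argparse(args)
     | print(options.checkpoint_settings.run_id, list(options.behaviors.keys()))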
