Class Agent
An agent is an actor that can observe its environment, decide on the best course of action using those observations, and execute those actions within the environment.
Namespace: Unity.MLAgents
Syntax
public class Agent : MonoBehaviour, ISerializationCallbackReceiver, IActionReceiver, IHeuristicProvider
Remarks
Use the Agent class as the subclass for implementing your own agents. Add your Agent implementation to a GameObject in the Unity scene that serves as the agent's environment.
Agents in an environment operate in steps. At each step, an agent collects observations, passes them to its decision-making policy, and receives an action vector in response.
Agents make observations using ISensor implementations. The ML-Agents API provides implementations for visual observations (CameraSensor), raycast observations (RayPerceptionSensor), and arbitrary data observations (VectorSensor). You can add the CameraSensorComponent and RayPerceptionSensorComponent2D or RayPerceptionSensorComponent3D components to an agent's GameObject to use those sensor types. You can implement the CollectObservations(VectorSensor) function in your Agent subclass to use a vector observation. The Agent class calls this function before it uses the observation vector to make a decision. (If you only use visual or raycast observations, you do not need to implement CollectObservations(VectorSensor).)
Assign a decision making policy to an agent using a BehaviorParameters component attached to the agent's GameObject. The BehaviorType setting determines how decisions are made:
- Default: decisions are made by the external process, when connected. Otherwise, decisions are made using inference. If no inference model is specified in the BehaviorParameters component, then heuristic decision making is used.
- InferenceOnly: decisions are always made using the trained model specified in the BehaviorParameters component.
- HeuristicOnly: when a decision is needed, the agent's Heuristic(ActionBuffers) function is called. Your implementation is responsible for providing the appropriate action.
To trigger an agent decision automatically, you can attach a DecisionRequester component to the Agent game object. You can also call the agent's RequestDecision() function manually. You only need to call RequestDecision() when the agent is in a position to act upon the decision. In many cases, this will be every FixedUpdate callback, but could be less frequent. For example, an agent that hops around its environment can only take an action when it touches the ground, so several frames might elapse between one decision and the need for the next.
Use the OnActionReceived(ActionBuffers) function to implement the actions your agent can take, such as moving to reach a goal or interacting with its environment.
When you call EndEpisode() on an agent or the agent reaches its MaxStep count, its current episode ends. You can reset the agent -- or remove it from the environment -- by implementing the OnEpisodeBegin() function. An agent also becomes done when the Academy resets the environment, which only happens when the Academy receives a reset signal from an external process via the Unity.MLAgents.Academy.Communicator.
The Agent class extends the Unity MonoBehaviour class. You can implement the standard MonoBehaviour functions as needed for your agent. Since an agent's observations and actions typically take place during the FixedUpdate phase, you should only use the MonoBehaviour.Update function for cosmetic purposes. If you override the MonoBehaviour methods, OnEnable() or OnDisable(), always call the base Agent class implementations.
You can implement the Heuristic(ActionBuffers) function to specify agent actions using your own heuristic algorithm. Implementing a heuristic function can be useful for debugging. For example, you can use keyboard input to select agent actions in order to manually control an agent's behavior.
Note that you can change the inference model assigned to an agent at any step by calling SetModel(String, NNModel, InferenceDevice).
See Agents and Reinforcement Learning in Unity in the Unity ML-Agents Toolkit manual for more information on creating and training agents.
For sample implementations of agent behavior, see the examples available in the Unity ML-Agents Toolkit on GitHub.
Fields
MaxStep
The maximum number of steps the agent takes before being done.
Declaration
public int MaxStep
Field Value
Type | Description |
---|---|
Int32 | The maximum number of steps an agent takes before it resets, or 0 for unlimited steps. |
Remarks
The max step value determines the maximum length of an agent's episodes. Set to a positive integer to limit the episode length to that many steps. Set to 0 for unlimited episode length.
When an episode ends and a new one begins, the Agent object's OnEpisodeBegin() function is called. You can implement OnEpisodeBegin() to reset the agent or remove it from the environment. An agent's episode can also end if you call its EndEpisode() method or an external process resets the environment through the Academy.
Consider limiting the number of steps in an episode to avoid wasting time during training. If you set the max step value to a reasonable estimate of the time it should take to complete a task, then agents that haven’t succeeded in that time frame will reset and start a new training episode rather than continue to fail.
Examples
To use a step limit when training while allowing agents to run without resetting outside of training, you can set the max step to 0 in Initialize() if the Academy is not connected to an external process.
using Unity.MLAgents;

public class MyAgent : Agent
{
    public override void Initialize()
    {
        if (!Academy.Instance.IsCommunicatorOn)
        {
            this.MaxStep = 0;
        }
    }
}
Note: in general, you should limit the differences between the code you execute during training and the code you run during inference.
Properties
CompletedEpisodes
Returns the number of episodes that the Agent has completed (either EndEpisode() was called, or MaxStep was reached).
Declaration
public int CompletedEpisodes { get; }
Property Value
Type | Description |
---|---|
Int32 | Current episode count. |
StepCount
Returns the current step counter (within the current episode).
Declaration
public int StepCount { get; }
Property Value
Type | Description |
---|---|
Int32 | Current step count. |
Methods
AddReward(Single)
Increments the step and episode rewards by the provided value.
Declaration
public void AddReward(float increment)
Parameters
Type | Name | Description |
---|---|---|
Single | increment | Incremental reward value. |
Remarks
Use a positive reward to reinforce desired behavior. You can use a negative reward to penalize mistakes. Use SetReward(Single) to set the reward assigned to the current step with a specific value rather than increasing or decreasing it.
Typically, you assign rewards in the Agent subclass's OnActionReceived(ActionBuffers) implementation after carrying out the received action and evaluating its success.
Rewards are used during reinforcement learning; they are ignored during inference.
See Agents - Rewards for general advice on implementing rewards and Reward Signals for information about mixing reward signals from curiosity and Generative Adversarial Imitation Learning (GAIL) with rewards supplied through this method.
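Examples
The following sketch is not part of the original reference; it illustrates one way to combine AddReward(Single), SetReward(Single), and EndEpisode() inside OnActionReceived(ActionBuffers). The MoveAgent helper and the target field are hypothetical members of this example agent.
public override void OnActionReceived(ActionBuffers actions)
{
    MoveAgent(actions);   // hypothetical helper that applies the received action

    // Small per-step penalty to encourage the agent to finish quickly.
    AddReward(-0.001f);

    // Reaching the hypothetical target earns a positive reward and ends the episode.
    if (Vector3.Distance(transform.position, target.position) < 1.0f)
    {
        AddReward(1.0f);
        EndEpisode();
    }
}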
CollectObservations(VectorSensor)
Implement CollectObservations() to collect the vector observations of the agent for the step. The agent observation describes the current environment from the perspective of the agent.
Declaration
public virtual void CollectObservations(VectorSensor sensor)
Parameters
Type | Name | Description |
---|---|---|
VectorSensor | sensor | The vector observations for the agent. |
Remarks
An agent's observation is any environment information that helps the agent achieve its goal. For example, for a fighting agent, its observation could include distances to friends or enemies, or the current level of ammunition at its disposal.
You can use a combination of vector, visual, and raycast observations for an agent. If you only use visual or raycast observations, you do not need to implement a CollectObservations() function.
Add vector observations to the sensor parameter passed to this method by calling the VectorSensor helper methods:
- AddObservation(Int32)
- AddObservation(Single)
- AddObservation(Vector3)
- AddObservation(Vector2)
- AddObservation(Quaternion)
- AddObservation(Boolean)
- AddObservation(IList<Single>)
- AddOneHotObservation(Int32, Int32)
You can use any combination of these helper functions to build the agent's vector of observations. You must build the vector in the same order each time CollectObservations() is called and the length of the vector must always be the same. In addition, the length of the observation must match the VectorObservationSize attribute of the linked Brain, which is set in the Editor on the Behavior Parameters component attached to the agent's GameObject.
For more information about observations, see Observations and Sensors.
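Examples
As a sketch (not from the original reference), the following CollectObservations(VectorSensor) implementation observes the agent's position, a cached Rigidbody velocity, the relative position of a target, and a one-hot encoded zone index. The rigidBody, target, and currentZoneIndex members are assumptions of this example.
public override void CollectObservations(VectorSensor sensor)
{
    // Agent state: local position (3 values) and velocity (3 values).
    sensor.AddObservation(transform.localPosition);
    sensor.AddObservation(rigidBody.velocity);

    // Relative position of a hypothetical target (3 values).
    sensor.AddObservation(target.localPosition - transform.localPosition);

    // One-hot encoding of a discrete state with 4 possible values (4 values).
    sensor.AddOneHotObservation(currentZoneIndex, 4);
}
This example writes 13 values in total, so the VectorObservationSize attribute in the Behavior Parameters component would need to be set to 13.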
EndEpisode()
Sets the done flag to true and resets the agent.
Declaration
public void EndEpisode()
Remarks
This should be used when the episode can no longer continue, such as when the Agent reaches the goal or fails at the task.
EpisodeInterrupted()
Indicate that the episode is over but not due to the "fault" of the Agent. This has the same end result as calling EndEpisode(), but has a slightly different effect on training.
Declaration
public void EpisodeInterrupted()
Remarks
This should be used when the episode could continue, but has gone on for a sufficient number of steps.
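Examples
A minimal sketch (not part of the original reference) contrasting EndEpisode() with EpisodeInterrupted(); the fellOffPlatform, episodeTimer, and timeLimit members are hypothetical.
// Called from OnActionReceived() or FixedUpdate() in this hypothetical agent.
if (fellOffPlatform)
{
    // The agent failed the task: penalize it and end the episode.
    SetReward(-1.0f);
    EndEpisode();
}
else if (episodeTimer > timeLimit)
{
    // The episode ran out of time through no fault of the agent.
    EpisodeInterrupted();
}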
GetCumulativeReward()
Retrieves the episode reward for the Agent.
Declaration
public float GetCumulativeReward()
Returns
Type | Description |
---|---|
Single | The episode reward. |
GetObservations()
Returns a read-only view of the observations that were generated in CollectObservations(VectorSensor). This is mainly useful inside of a Heuristic(ActionBuffers) method to avoid recomputing the observations.
Declaration
public ReadOnlyCollection<float> GetObservations()
Returns
Type | Description |
---|---|
ReadOnlyCollection<Single> | A read-only view of the observations list. |
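Examples
A short sketch (not from the original reference) showing GetObservations() used inside Heuristic(ActionBuffers) to avoid recomputing the observation vector; the index and steering rule are arbitrary.
public override void Heuristic(in ActionBuffers actionsOut)
{
    // Reuse the observations already collected for this step.
    var observations = GetObservations();

    // Steer left or right based on the first observation value (arbitrary rule).
    var continuousActionsOut = actionsOut.ContinuousActions;
    continuousActionsOut[0] = observations[0] > 0f ? -1f : 1f;
}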
GetStoredActionBuffers()
Gets the most recent ActionBuffers for this agent.
Declaration
public ActionBuffers GetStoredActionBuffers()
Returns
Type | Description |
---|---|
ActionBuffers | The most recent ActionBuffers for this agent. |
Heuristic(ActionBuffers)
Implement Heuristic(ActionBuffers) to choose an action for this agent using a custom heuristic.
Declaration
public virtual void Heuristic(in ActionBuffers actionsOut)
Parameters
Type | Name | Description |
---|---|---|
ActionBuffers | actionsOut | The ActionBuffers which contain the continuous and discrete action buffers to write to. |
Implements
IHeuristicProvider.Heuristic(ActionBuffers)
Remarks
Implement this function to provide custom decision making logic or to support manual control of an agent using keyboard, mouse, game controller input, or a script.
Your heuristic implementation can use any decision making logic you specify. Assign decision values to the ContinuousActions and DiscreteActions buffers passed to your function in the actionsOut parameter. The same buffers are reused between steps, so it is up to you to initialize the values on each call, for example by calling actionsOut.Clear() before writing new values.
Add values to the buffers at the same indexes as they are used in your OnActionReceived(ActionBuffers) function, which receives this data and implements the corresponding agent behavior. See Actions for more information about agent actions.
Note: Do not create a new action array in the Heuristic() method, as this will prevent your values from being written to the original action buffers.
An agent calls this Heuristic() function to make a decision when you set its behavior type to HeuristicOnly. The agent also calls this function if you set its behavior type to Default when the Academy is not connected to an external training process and you do not assign a trained model to the agent.
To perform imitation learning, implement manual control of the agent in the Heuristic() function so that you can record the demonstrations required for the imitation learning algorithms. (Attach a Demonstration Recorder component to the agent's GameObject to record the demonstration session to a file.)
Even when you don’t plan to use heuristic decisions for an agent or imitation learning, implementing a simple heuristic function can aid in debugging agent actions and interactions with its environment.
Examples
The following example illustrates a Heuristic() function that provides WASD-style keyboard control for an agent that can move in two dimensions as well as jump. See Input Manager for more information about the built-in Unity input functions. You can also use the Input System package, which provides a more flexible and configurable input system.
public override void Heuristic(in ActionBuffers actionsOut)
{
    var continuousActionsOut = actionsOut.ContinuousActions;
    continuousActionsOut[0] = Input.GetAxis("Horizontal");
    continuousActionsOut[1] = Input.GetKey(KeyCode.Space) ? 1.0f : 0.0f;
    continuousActionsOut[2] = Input.GetAxis("Vertical");
}
Initialize()
Implement Initialize() to perform one-time initialization or set up of the Agent instance.
Declaration
public virtual void Initialize()
Remarks
Initialize() is called once when the agent is first enabled. If, for example, the Agent object needs references to other GameObjects in the scene, you can collect and store those references here.
Note that OnEpisodeBegin() is called at the start of each of the agent's "episodes". You can use that function for items that need to be reset for each episode.
LazyInitialize()
Initializes the agent. Can be safely called multiple times.
Declaration
public void LazyInitialize()
Remarks
This function calls your Initialize() implementation, if one exists.
OnActionReceived(ActionBuffers)
Implement OnActionReceived() to specify agent behavior at every step, based on the provided action.
Declaration
public virtual void OnActionReceived(ActionBuffers actions)
Parameters
Type | Name | Description |
---|---|---|
ActionBuffers | actions | Struct containing the buffers of actions to be executed at this step. |
Implements
IActionReceiver.OnActionReceived(ActionBuffers)
Remarks
An action is passed to this function in the form of an ActionBuffers. Your implementation must use the array to direct the agent's behavior for the current step.
You decide how many elements you need in the ActionBuffers to control your agent and what each element means. For example, if you want to apply a force to move an agent around the environment, you can arbitrarily pick three values in ActionBuffers.ContinuousActions array to use as the force components. During training, the agent's policy learns to set those particular elements of the array to maximize the training rewards the agent receives. (Of course, if you implement a Heuristic(ActionBuffers) function, it must use the same elements of the action array for the same purpose since there is no learning involved.)
An Agent can use continuous and/or discrete actions. Configure this along with the size of the action array, in the BrainParameters of the agent's associated BehaviorParameters component.
When an agent uses continuous actions, the values in the ActionBuffers.ContinuousActions array are floating point numbers. You should clamp the values to the range [-1, 1] to increase numerical stability during training.
When an agent uses discrete actions, the values in the ActionBuffers.DiscreteActions array are integers that each represent a specific, discrete action. For example, you could define a set of discrete actions such as:
0 = Do nothing
1 = Move one space left
2 = Move one space right
3 = Move one space up
4 = Move one space down
When making a decision, the agent picks one of the five actions and puts the corresponding integer value in the ActionBuffers.DiscreteActions array. For example, if the agent decided to move left, the ActionBuffers.DiscreteActions parameter would be an array with a single element with the value 1.
You can define multiple sets, or branches, of discrete actions to allow an agent to perform simultaneous, independent actions. For example, you could use one branch for movement and another branch for throwing a ball left, right, up, or down, to allow the agent to do both in the same step.
The ActionBuffers.DiscreteActions array of an agent with discrete actions contains one element for each branch. The value of each element is the integer representing the chosen action for that branch. The agent always chooses one action for each branch.
When you use the discrete actions, you can prevent the training process or the neural network model from choosing specific actions in a step by implementing the WriteDiscreteActionMask(IDiscreteActionMask) method. For example, if your agent is next to a wall, you could mask out any actions that would result in the agent trying to move into the wall.
For more information about implementing agent actions see Agents - Actions.
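Examples
The following sketch is not part of the original reference; it interprets one continuous branch as planar movement forces and one discrete branch as a jump toggle. The rigidBody, moveForce, jumpForce, and isGrounded members are hypothetical fields of this example agent.
public override void OnActionReceived(ActionBuffers actions)
{
    // Continuous actions 0 and 1: force components, clamped to [-1, 1].
    var move = new Vector3(
        Mathf.Clamp(actions.ContinuousActions[0], -1f, 1f),
        0f,
        Mathf.Clamp(actions.ContinuousActions[1], -1f, 1f));
    rigidBody.AddForce(move * moveForce);

    // Discrete branch 0: 0 = do nothing, 1 = jump (only when grounded).
    if (actions.DiscreteActions[0] == 1 && isGrounded)
    {
        rigidBody.AddForce(Vector3.up * jumpForce, ForceMode.Impulse);
    }
}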
OnAfterDeserialize()
Called by Unity immediately after deserializing this object.
Declaration
public void OnAfterDeserialize()
Remarks
The Agent class uses OnAfterDeserialize() for internal housekeeping. Call the base class implementation if you need your own custom deserialization logic.
See OnAfterDeserialize for more information.
Examples
public new void OnAfterDeserialize()
{
    base.OnAfterDeserialize();
    // additional deserialization logic...
}
OnBeforeSerialize()
Called by Unity immediately before serializing this object.
Declaration
public void OnBeforeSerialize()
Remarks
The Agent class uses OnBeforeSerialize() for internal housekeeping. Call the base class implementation if you need your own custom serialization logic.
See OnBeforeSerialize for more information.
Examples
public new void OnBeforeSerialize()
{
    base.OnBeforeSerialize();
    // additional serialization logic...
}
OnDisable()
Called when the attached GameObject becomes disabled and inactive.
Declaration
protected virtual void OnDisable()
Remarks
Always call the base Agent class version of this function if you implement OnDisable() in your own Agent subclasses.
Examples
protected override void OnDisable()
{
    base.OnDisable();
    // additional OnDisable logic...
}
OnEnable()
Called when the attached GameObject becomes enabled and active.
Declaration
protected virtual void OnEnable()
Remarks
This function initializes the Agent instance, if it hasn't been initialized yet.
Always call the base Agent class version of this function if you implement OnEnable() in your own Agent subclasses.
Examples
protected override void OnEnable()
{
    base.OnEnable();
    // additional OnEnable logic...
}
OnEpisodeBegin()
Implement OnEpisodeBegin() to set up an Agent instance at the beginning of an episode.
Declaration
public virtual void OnEpisodeBegin()
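Examples
A minimal sketch (not part of the original reference) that resets the agent and a target at the start of each episode; the rigidBody and target fields are assumptions of this example.
public override void OnEpisodeBegin()
{
    // Reset the agent's motion and position.
    rigidBody.velocity = Vector3.zero;
    rigidBody.angularVelocity = Vector3.zero;
    transform.localPosition = new Vector3(0f, 0.5f, 0f);

    // Move the hypothetical target to a new random location.
    target.localPosition = new Vector3(Random.Range(-4f, 4f), 0.5f, Random.Range(-4f, 4f));
}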
RequestAction()
Requests an action for this agent.
Declaration
public void RequestAction()
Remarks
Call RequestAction() to repeat the previous action returned by the agent's most recent decision. A new decision is not requested. When you call this function, the Agent instance invokes OnActionReceived(ActionBuffers) with the existing action vector.
You can use RequestAction() in situations where an agent must take an action every update, but doesn't need to make a decision as often. For example, an agent that moves through its environment might need to apply an action to keep moving, but only needs to make a decision to change course or speed occasionally.
You can add a DecisionRequester component to the agent's GameObject to drive the agent's decision making and action frequency. When you use this component, do not call RequestAction() separately.
Note that RequestDecision() calls RequestAction(); you do not need to call both functions at the same time.
RequestDecision()
Requests a new decision for this agent.
Declaration
public void RequestDecision()
Remarks
Call RequestDecision() whenever an agent needs a decision. You often want to request a decision every environment step. However, if an agent cannot use the decision every step, then you can request a decision less frequently.
You can add a DecisionRequester component to the agent's GameObject to drive the agent's decision making. When you use this component, do not call RequestDecision() separately.
Note that this function calls RequestAction(); you do not need to call both functions at the same time.
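Examples
As a sketch (not from the original reference), a hopping agent could request a new decision only while it can act on one and otherwise repeat the previous action; the isGrounded check is a hypothetical member of this example agent.
void FixedUpdate()
{
    if (isGrounded)
    {
        // The agent can act now, so ask the policy for a new decision.
        RequestDecision();
    }
    else
    {
        // Keep applying the action from the most recent decision while airborne.
        RequestAction();
    }
}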
ScaleAction(Single, Single, Single)
Scales continuous action from [-1, 1] to arbitrary range.
Declaration
protected static float ScaleAction(float rawAction, float min, float max)
Parameters
Type | Name | Description |
---|---|---|
Single | rawAction | The input action value. |
Single | min | The minimum output value. |
Single | max | The maximum output value. |
Returns
Type | Description |
---|---|
Single | The rawAction value scaled linearly from [-1, 1] to the range [min, max]. |
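Examples
A one-line sketch (not part of the original reference) mapping a raw continuous action onto a hypothetical torque range; per the description above, the value is scaled linearly from [-1, 1] to [min, max].
// Inside OnActionReceived: convert actions.ContinuousActions[0] from [-1, 1]
// into a torque between 0 and 150 (hypothetical units).
var torque = ScaleAction(actions.ContinuousActions[0], 0f, 150f);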
SetModel(String, NNModel, InferenceDevice)
Updates the Model assigned to this Agent instance.
Declaration
public void SetModel(string behaviorName, NNModel model, InferenceDevice inferenceDevice = default(InferenceDevice))
Parameters
Type | Name | Description |
---|---|---|
String | behaviorName | The identifier of the behavior. This will categorize the agent when training. |
NNModel | model | The model to use for inference. |
InferenceDevice | inferenceDevice | Define the device on which the model will be run. |
Remarks
If the agent already has an assigned model, that model is replaced with the provided one. However, if you call this function with arguments that are identical to the current parameters of the agent, then no changes are made.
Note: the behaviorName parameter is ignored when not training. The model and inferenceDevice parameters are ignored when not using inference.
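Examples
A minimal sketch (not from the original reference) that swaps the agent's inference model at runtime; the pushBlockModel asset and the "PushBlock" behavior name are hypothetical.
public NNModel pushBlockModel;   // hypothetical model asset assigned in the Inspector

void UsePushBlockModel()
{
    // Switch the agent to the hypothetical PushBlock model for CPU inference.
    SetModel("PushBlock", pushBlockModel, InferenceDevice.CPU);
}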
SetReward(Single)
Overrides the current step reward of the agent and updates the episode reward accordingly.
Declaration
public void SetReward(float reward)
Parameters
Type | Name | Description |
---|---|---|
Single | reward | The new value of the reward. |
Remarks
This function replaces any rewards given to the agent during the current step. Use AddReward(Single) to incrementally change the reward rather than overriding it.
Typically, you assign rewards in the Agent subclass's OnActionReceived(ActionBuffers) implementation after carrying out the received action and evaluating its success.
Rewards are used during reinforcement learning; they are ignored during inference.
See Agents - Rewards for general advice on implementing rewards and Reward Signals for information about mixing reward signals from curiosity and Generative Adversarial Imitation Learning (GAIL) with rewards supplied through this method.
WriteDiscreteActionMask(IDiscreteActionMask)
Implement WriteDiscreteActionMask() to collect the masks for discrete actions. When using discrete actions, the agent will not perform the masked action.
Declaration
public virtual void WriteDiscreteActionMask(IDiscreteActionMask actionMask)
Parameters
Type | Name | Description |
---|---|---|
IDiscreteActionMask | actionMask | The action mask for the agent. |
Implements
IActionReceiver.WriteDiscreteActionMask(IDiscreteActionMask)
Remarks
When using Discrete Control, you can prevent the Agent from using a certain action by masking it with SetActionEnabled(Int32, Int32, Boolean).
See Agents - Actions for more information on masking actions.
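Examples
A short sketch (not part of the original reference) that masks out a single discrete action. Branch 0 and action index 1 follow the "move one space left" example from the OnActionReceived(ActionBuffers) remarks above, and the wallToTheLeft check is hypothetical.
public override void WriteDiscreteActionMask(IDiscreteActionMask actionMask)
{
    // Prevent the policy from choosing "move one space left" (branch 0, action 1)
    // while a hypothetical wall is directly to the agent's left.
    if (wallToTheLeft)
    {
        actionMask.SetActionEnabled(0, 1, false);
    }
}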