Class Agent
An agent is an actor that can observe its environment, decide on the best course of action using those observations, and execute those actions within the environment.
Namespace: Unity.MLAgents
Syntax
public class Agent : MonoBehaviour, ISerializationCallbackReceiver, IActionReceiver, IHeuristicProvider
Remarks
Use the Agent class as the subclass for implementing your own agents. Add your Agent implementation to a GameObject in the Unity scene that serves as the agent's environment.
Agents in an environment operate in steps. At each step, an agent collects observations, passes them to its decision-making policy, and receives an action vector in response.
Agents make observations using ISensor implementations. The ML-Agents API provides implementations for visual observations (CameraSensor), raycast observations (RayPerceptionSensor), and arbitrary data observations (VectorSensor). You can add the CameraSensorComponent and RayPerceptionSensorComponent2D or RayPerceptionSensorComponent3D components to an agent's GameObject to use those sensor types. You can implement the CollectObservations(VectorSensor) function in your Agent subclass to use a vector observation. The Agent class calls this function before it uses the observation vector to make a decision. (If you only use visual or raycast observations, you do not need to implement CollectObservations(VectorSensor).)
Assign a decision making policy to an agent using a BehaviorParameters component attached to the agent's GameObject. The BehaviorType setting determines how decisions are made:
- Default: decisions are made by the external process, when connected. Otherwise, decisions are made using inference. If no inference model is specified in the BehaviorParameters component, then heuristic decision making is used.
- InferenceOnly: decisions are always made using the trained model specified in the BehaviorParameters component.
- HeuristicOnly: when a decision is needed, the agent's Heuristic(ActionBuffers) function is called. Your implementation is responsible for providing the appropriate action.
To trigger an agent decision automatically, you can attach a DecisionRequester component to the Agent game object. You can also call the agent's RequestDecision() function manually. You only need to call RequestDecision() when the agent is in a position to act upon the decision. In many cases, this will be every FixedUpdate callback, but could be less frequent. For example, an agent that hops around its environment can only take an action when it touches the ground, so several frames might elapse between one decision and the need for the next.
Use the OnActionReceived(ActionBuffers) function to implement the actions your agent can take, such as moving to reach a goal or interacting with its environment.
When you call EndEpisode() on an agent or the agent reaches its MaxStep count, its current episode ends. You can reset the agent -- or remove it from the environment -- by implementing the OnEpisodeBegin() function. An agent also becomes done when the Academy resets the environment, which only happens when the Academy receives a reset signal from an external process via the Unity.MLAgents.Academy.Communicator.
The Agent class extends the Unity MonoBehaviour class. You can implement the standard MonoBehaviour functions as needed for your agent. Since an agent's observations and actions typically take place during the FixedUpdate phase, you should only use the MonoBehaviour.Update function for cosmetic purposes. If you override the MonoBehaviour methods, OnEnable() or OnDisable(), always call the base Agent class implementations.
You can implement the Heuristic(ActionBuffers) function to specify agent actions using your own heuristic algorithm. Implementing a heuristic function can be useful for debugging. For example, you can use keyboard input to select agent actions in order to manually control an agent's behavior.
Note that you can change the inference model assigned to an agent at any step by calling SetModel(String, NNModel, InferenceDevice).
See Agents and Reinforcement Learning in Unity in the Unity ML-Agents Toolkit manual for more information on creating and training agents.
For sample implementations of agent behavior, see the examples available in the Unity ML-Agents Toolkit on GitHub.
Fields
MaxStep
The maximum number of steps the agent takes before being done.
Declaration
public int MaxStep
Field Value
Type | Description |
---|---|
Int32 | The maximum number of steps an agent takes before it resets, or 0 for unlimited steps. |
Remarks
The max step value determines the maximum length of an agent's episodes. Set to a positive integer to limit the episode length to that many steps. Set to 0 for unlimited episode length.
When an episode ends and a new one begins, the Agent object's OnEpisodeBegin() function is called. You can implement OnEpisodeBegin() to reset the agent or remove it from the environment. An agent's episode can also end if you call its EndEpisode() method or an external process resets the environment through the Academy.
Consider limiting the number of steps in an episode to avoid wasting time during training. If you set the max step value to a reasonable estimate of the time it should take to complete a task, then agents that haven’t succeeded in that time frame will reset and start a new training episode rather than continue to fail.
Examples
To use a step limit when training while allowing agents to run without resetting outside of training, you can set the max step to 0 in Initialize() if the Academy is not connected to an external process.
using Unity.MLAgents;

public class MyAgent : Agent
{
    public override void Initialize()
    {
        if (!Academy.Instance.IsCommunicatorOn)
        {
            this.MaxStep = 0;
        }
    }
}
Note: in general, you should limit the differences between the code you execute during training and the code you run during inference.
Properties
CompletedEpisodes
Returns the number of episodes that the Agent has completed (either EndEpisode() was called, or MaxStep was reached).
Declaration
public int CompletedEpisodes { get; }
Property Value
Type | Description |
---|---|
Int32 | Current episode count. |
StepCount
Returns the current step counter (within the current episode).
Declaration
public int StepCount { get; }
Property Value
Type | Description |
---|---|
Int32 | Current step count. |
Methods
AddReward(Single)
Increments the step and episode rewards by the provided value.
Declaration
public void AddReward(float increment)
Parameters
Type | Name | Description |
---|---|---|
Single | increment | Incremental reward value. |
Remarks
Use a positive reward to reinforce desired behavior. You can use a negative reward to penalize mistakes. Use SetReward(Single) to set the reward assigned to the current step with a specific value rather than increasing or decreasing it.
Typically, you assign rewards in the Agent subclass's OnActionReceived(ActionBuffers) implementation after carrying out the received action and evaluating its success.
Rewards are used during reinforcement learning; they are ignored during inference.
See Agents - Rewards for general advice on implementing rewards and Reward Signals for information about mixing reward signals from curiosity and Generative Adversarial Imitation Learning (GAIL) with rewards supplied through this method.
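Examples
The following sketch is not part of the original reference; it illustrates one way to combine AddReward(Single), SetReward(Single), and EndEpisode() inside OnActionReceived(ActionBuffers). The MoveAgent helper and the target field are hypothetical members of this example agent.
public override void OnActionReceived(ActionBuffers actions)
{
    MoveAgent(actions);   // hypothetical helper that applies the received action

    // Small per-step penalty to encourage the agent to finish quickly.
    AddReward(-0.001f);

    // Reaching the hypothetical target earns a positive reward and ends the episode.
    if (Vector3.Distance(transform.position, target.position) < 1.0f)
    {
        AddReward(1.0f);
        EndEpisode();
    }
}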
CollectObservations(VectorSensor)
Implement CollectObservations() to collect the vector observations of the agent for the step. The agent observation describes the current environment from the perspective of the agent.
Declaration
public virtual void CollectObservations(VectorSensor sensor)
Parameters
Type | Name | Description |
---|---|---|
VectorSensor | sensor | The vector observations for the agent. |
Remarks
An agent's observation is any environment information that helps the agent achieve its goal. For example, for a fighting agent, its observation could include distances to friends or enemies, or the current level of ammunition at its disposal.
You can use a combination of vector, visual, and raycast observations for an agent. If you only use visual or raycast observations, you do not need to implement a CollectObservations() function.
Add vector observations to the sensor parameter passed to this method by calling the VectorSensor helper methods:
- AddObservation(Int32)
- AddObservation(Single)
- AddObservation(Vector3)
- AddObservation(Vector2)
- AddObservation(Quaternion)
- AddObservation(Boolean)
- AddObservation(IList<Single>)
- AddOneHotObservation(Int32, Int32)
You can use any combination of these helper functions to build the agent's vector of observations. You must build the vector in the same order each time CollectObservations() is called and the length of the vector must always be the same. In addition, the length of the observation must match the VectorObservationSize attribute of the linked Brain, which is set in the Editor on the Behavior Parameters component attached to the agent's GameObject.
For more information about observations, see Observations and Sensors.
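Examples
As a sketch (not from the original reference), the following CollectObservations(VectorSensor) implementation observes the agent's position, a cached Rigidbody velocity, the relative position of a target, and a one-hot encoded zone index. The rigidBody, target, and currentZoneIndex members are assumptions of this example.
public override void CollectObservations(VectorSensor sensor)
{
    // Agent state: local position (3 values) and velocity (3 values).
    sensor.AddObservation(transform.localPosition);
    sensor.AddObservation(rigidBody.velocity);

    // Relative position of a hypothetical target (3 values).
    sensor.AddObservation(target.localPosition - transform.localPosition);

    // One-hot encoding of a discrete state with 4 possible values (4 values).
    sensor.AddOneHotObservation(currentZoneIndex, 4);
}
This example writes 13 values in total, so the VectorObservationSize attribute in the Behavior Parameters component would need to be set to 13.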
EndEpisode()
Sets the done flag to true and resets the agent.
Declaration
public void EndEpisode()
Remarks
This should be used when the episode can no longer continue, such as when the Agent reaches the goal or fails at the task.
EpisodeInterrupted()
Indicate that the episode is over but not due to the "fault" of the Agent. This has the same end result as calling EndEpisode(), but has a slightly different effect on training.
Declaration
public void EpisodeInterrupted()
Remarks
This should be used when the episode could continue, but has gone on for a sufficient number of steps.
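Examples
A minimal sketch (not part of the original reference) contrasting EndEpisode() with EpisodeInterrupted(); the fellOffPlatform, episodeTimer, and timeLimit members are hypothetical.
// Called from OnActionReceived() or FixedUpdate() in this hypothetical agent.
if (fellOffPlatform)
{
    // The agent failed the task: penalize it and end the episode.
    SetReward(-1.0f);
    EndEpisode();
}
else if (episodeTimer > timeLimit)
{
    // The episode ran out of time through no fault of the agent.
    EpisodeInterrupted();
}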
GetCumulativeReward()
Retrieves the episode reward for the Agent.
Declaration
public float GetCumulativeReward()
Returns
Type | Description |
---|---|
Single | The episode reward. |
GetObservations()
Returns a read-only view of the observations that were generated in CollectObservations(VectorSensor). This is mainly useful inside of a Heuristic(ActionBuffers) method to avoid recomputing the observations.
Declaration
public ReadOnlyCollection<float> GetObservations()
Returns
Type | Description |
---|---|
ReadOnlyCollection<Single> | A read-only view of the observations list. |
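Examples
A short sketch (not from the original reference) showing GetObservations() used inside Heuristic(ActionBuffers) to avoid recomputing the observation vector; the index and steering rule are arbitrary.
public override void Heuristic(in ActionBuffers actionsOut)
{
    // Reuse the observations already collected for this step.
    var observations = GetObservations();

    // Steer left or right based on the first observation value (arbitrary rule).
    var continuousActionsOut = actionsOut.ContinuousActions;
    continuousActionsOut[0] = observations[0] > 0f ? -1f : 1f;
}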
GetStoredActionBuffers()
Gets the most recent ActionBuffers for this agent.
Declaration
public ActionBuffers GetStoredActionBuffers()
Returns
Type | Description |
---|---|
ActionBuffers | The most recent ActionBuffers for this agent. |
Heuristic(ActionBuffers)
Implement Heuristic(ActionBuffers) to choose an action for this agent using a custom heuristic.
Declaration
public virtual void Heuristic(in ActionBuffers actionsOut)
Parameters
Type | Name | Description |
---|---|---|
ActionBuffers | actionsOut | The ActionBuffers which contain the continuous and discrete action buffers to write to. |
Implements
IHeuristicProvider.Heuristic(ActionBuffers)
Remarks
Implement this function to provide custom decision making logic or to support manual control of an agent using keyboard, mouse, game controller input, or a script.
Your heuristic implementation can use any decision making logic you specify. Assign decision values to the ContinuousActions and DiscreteActions buffers passed to your function in the actionsOut parameter. The same buffers are reused between steps, so it is up to you to initialize the values on each call, for example by calling actionsOut.Clear() before writing new values.
Add values to the buffers at the same indexes as they are used in your OnActionReceived(ActionBuffers) function, which receives this data and implements the corresponding agent behavior. See Actions for more information about agent actions.
Note: Do not create a new action array in the Heuristic() method, as this will prevent your values from being written to the original action buffers.
An agent calls this Heuristic() function to make a decision when you set its behavior type to HeuristicOnly. The agent also calls this function if you set its behavior type to Default when the Academy is not connected to an external training process and you do not assign a trained model to the agent.
To perform imitation learning, implement manual control of the agent in the Heuristic() function so that you can record the demonstrations required for the imitation learning algorithms. (Attach a Demonstration Recorder component to the agent's GameObject to record the demonstration session to a file.)
Even when you don’t plan to use heuristic decisions for an agent or imitation learning, implementing a simple heuristic function can aid in debugging agent actions and interactions with its environment.
Examples
The following example illustrates a Heuristic() function that provides WASD-style keyboard control for an agent that can move in two dimensions as well as jump. See Input Manager for more information about the built-in Unity input functions. You can also use the Input System package, which provides a more flexible and configurable input system.
public override void Heuristic(in ActionBuffers actionsOut)
{
    var continuousActionsOut = actionsOut.ContinuousActions;
    continuousActionsOut[0] = Input.GetAxis("Horizontal");
    continuousActionsOut[1] = Input.GetKey(KeyCode.Space) ? 1.0f : 0.0f;
    continuousActionsOut[2] = Input.GetAxis("Vertical");
}
Initialize()
Implement Initialize() to perform one-time initialization or set up of the Agent instance.
Declaration
public virtual void Initialize()
Remarks
Initialize() is called once when the agent is first enabled. If, for example, the Agent object needs references to other GameObjects in the scene, you can collect and store those references here.
Note that OnEpisodeBegin() is called at the start of each of the agent's "episodes". You can use that function for items that need to be reset for each episode.
LazyInitialize()
Initializes the agent. Can be safely called multiple times.
Declaration
public void LazyInitialize()
Remarks
This function calls your Initialize() implementation, if one exists.
OnActionReceived(ActionBuffers)
Implement OnActionReceived() to specify agent behavior at every step, based on the provided action.
Declaration
public virtual void OnActionReceived(ActionBuffers actions)
Parameters
Type | Name | Description |
---|---|---|
ActionBuffers | actions | Struct containing the buffers of actions to be executed at this step. |
Implements
IActionReceiver.OnActionReceived(ActionBuffers)
Remarks
An action is passed to this function in the form of an ActionBuffers. Your implementation must use the array to direct the agent's behavior for the current step.
You decide how many elements you need in the ActionBuffers to control your agent and what each element means. For example, if you want to apply a force to move an agent around the environment, you can arbitrarily pick three values in ActionBuffers.ContinuousActions array to use as the force components. During training, the agent's policy learns to set those particular elements of the array to maximize the training rewards the agent receives. (Of course, if you implement a Heuristic(ActionBuffers) function, it must use the same elements of the action array for the same purpose since there is no learning involved.)
An Agent can use continuous and/or discrete actions. Configure this along with the size of the action array, in the BrainParameters of the agent's associated BehaviorParameters component.
When an agent uses continuous actions, the values in the ActionBuffers.ContinuousActions array are floating point numbers. You should clamp the values to the range [-1, 1] to increase numerical stability during training.
When an agent uses discrete actions, the values in the ActionBuffers.DiscreteActions array are integers that each represent a specific, discrete action. For example, you could define a set of discrete actions such as:
0 = Do nothing
1 = Move one space left
2 = Move one space right
3 = Move one space up
4 = Move one space down
When making a decision, the agent picks one of the five actions and puts the corresponding integer value in the ActionBuffers.DiscreteActions array. For example, if the agent decided to move left, the ActionBuffers.DiscreteActions parameter would be an array with a single element with the value 1.
You can define multiple sets, or branches, of discrete actions to allow an agent to perform simultaneous, independent actions. For example, you could use one branch for movement and another branch for throwing a ball left, right, up, or down, to allow the agent to do both in the same step.
The ActionBuffers.DiscreteActions array of an agent with discrete actions contains one element for each branch. The value of each element is the integer representing the chosen action for that branch. The agent always chooses one action for each branch.
When you use the discrete actions, you can prevent the training process or the neural network model from choosing specific actions in a step by implementing the WriteDiscreteActionMask(IDiscreteActionMask) method. For example, if your agent is next to a wall, you could mask out any actions that would result in the agent trying to move into the wall.
For more information about implementing agent actions see Agents - Actions.
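Examples
The following sketch is not part of the original reference; it interprets one continuous branch as planar movement forces and one discrete branch as a jump toggle. The rigidBody, moveForce, jumpForce, and isGrounded members are hypothetical fields of this example agent.
public override void OnActionReceived(ActionBuffers actions)
{
    // Continuous actions 0 and 1: force components, clamped to [-1, 1].
    var move = new Vector3(
        Mathf.Clamp(actions.ContinuousActions[0], -1f, 1f),
        0f,
        Mathf.Clamp(actions.ContinuousActions[1], -1f, 1f));
    rigidBody.AddForce(move * moveForce);

    // Discrete branch 0: 0 = do nothing, 1 = jump (only when grounded).
    if (actions.DiscreteActions[0] == 1 && isGrounded)
    {
        rigidBody.AddForce(Vector3.up * jumpForce, ForceMode.Impulse);
    }
}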
OnAfterDeserialize()
Called by Unity immediately after deserializing this object.
Declaration
public void OnAfterDeserialize()
Remarks
The Agent class uses OnAfterDeserialize() for internal housekeeping. Call the base class implementation if you need your own custom deserialization logic.
See OnAfterDeserialize for more information.
Examples
public new void OnAfterDeserialize()
{
    base.OnAfterDeserialize();
    // additional deserialization logic...
}
OnBeforeSerialize()
Called by Unity immediately before serializing this object.
Declaration
public void OnBeforeSerialize()
Remarks
The Agent class uses OnBeforeSerialize() for internal housekeeping. Call the base class implementation if you need your own custom serialization logic.
See OnBeforeSerialize for more information.
Examples
public new void OnBeforeSerialize()
{
    base.OnBeforeSerialize();
    // additional serialization logic...
}
OnDisable()
Called when the attached GameObject becomes disabled and inactive.
Declaration
protected virtual void OnDisable()
Remarks
Always call the base Agent class version of this function if you implement OnDisable() in your own Agent subclasses.
Examples
protected override void OnDisable()
{
    base.OnDisable();
    // additional OnDisable logic...
}
OnEnable()
Called when the attached GameObject becomes enabled and active.
Declaration
protected virtual void OnEnable()
Remarks
This function initializes the Agent instance, if it hasn't been initialized yet.
Always call the base Agent class version of this function if you implement OnEnable() in your own Agent subclasses.
Examples
protected override void OnEnable()
{
    base.OnEnable();
    // additional OnEnable logic...
}
OnEpisodeBegin()
Implement OnEpisodeBegin() to set up an Agent instance at the beginning of an episode.
Declaration
public virtual void OnEpisodeBegin()
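Examples
A minimal sketch (not part of the original reference) that resets the agent and a target at the start of each episode; the rigidBody and target fields are assumptions of this example.
public override void OnEpisodeBegin()
{
    // Reset the agent's motion and position.
    rigidBody.velocity = Vector3.zero;
    rigidBody.angularVelocity = Vector3.zero;
    transform.localPosition = new Vector3(0f, 0.5f, 0f);

    // Move the hypothetical target to a new random location.
    target.localPosition = new Vector3(Random.Range(-4f, 4f), 0.5f, Random.Range(-4f, 4f));
}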
RequestAction()
Requests an action for this agent.
Declaration
public void RequestAction()
Remarks
Call RequestAction() to repeat the previous action returned by the agent's most recent decision. A new decision is not requested. When you call this function, the Agent instance invokes OnActionReceived(ActionBuffers) with the existing action vector.
You can use RequestAction() in situations where an agent must take an action every update, but doesn't need to make a decision as often. For example, an agent that moves through its environment might need to apply an action to keep moving, but only needs to make a decision to change course or speed occasionally.
You can add a DecisionRequester component to the agent's GameObject to drive the agent's decision making and action frequency. When you use this component, do not call RequestAction() separately.
Note that RequestDecision() calls RequestAction(); you do not need to call both functions at the same time.
RequestDecision()
Requests a new decision for this agent.
Declaration
public void RequestDecision()
Remarks
Call RequestDecision() whenever an agent needs a decision. You often want to request a decision every environment step. However, if an agent cannot use the decision every step, then you can request a decision less frequently.
You can add a DecisionRequester component to the agent's GameObject to drive the agent's decision making. When you use this component, do not call RequestDecision() separately.
Note that this function calls RequestAction(); you do not need to call both functions at the same time.
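Examples
As a sketch (not from the original reference), a hopping agent could request a new decision only while it can act on one and otherwise repeat the previous action; the isGrounded check is a hypothetical member of this example agent.
void FixedUpdate()
{
    if (isGrounded)
    {
        // The agent can act now, so ask the policy for a new decision.
        RequestDecision();
    }
    else
    {
        // Keep applying the action from the most recent decision while airborne.
        RequestAction();
    }
}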
ScaleAction(Single, Single, Single)
Scales continuous action from [-1, 1] to arbitrary range.
Declaration
protected static float ScaleAction(float rawAction, float min, float max)
Parameters
Type | Name | Description |
---|---|---|
Single | rawAction | The input action value. |
Single | min | The minimum output value. |
Single | max | The maximum output value. |
Returns
Type | Description |
---|---|
Single | The rawAction value scaled linearly from [-1, 1] to the range [min, max]. |
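Examples
A one-line sketch (not part of the original reference) mapping a raw continuous action onto a hypothetical torque range; per the description above, the value is scaled linearly from [-1, 1] to [min, max].
// Inside OnActionReceived: convert actions.ContinuousActions[0] from [-1, 1]
// into a torque between 0 and 150 (hypothetical units).
var torque = ScaleAction(actions.ContinuousActions[0], 0f, 150f);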
SetModel(String, NNModel, InferenceDevice)
Updates the Model assigned to this Agent instance.
Declaration
public void SetModel(string behaviorName, NNModel model, InferenceDevice inferenceDevice = default(InferenceDevice))
Parameters
Type | Name | Description |
---|---|---|
String | behaviorName | The identifier of the behavior. This will categorize the agent when training. |
NNModel | model | The model to use for inference. |
InferenceDevice | inferenceDevice | Define the device on which the model will be run. |
Remarks
If the agent already has an assigned model, that model is replaced with the provided one. However, if you call this function with arguments that are identical to the current parameters of the agent, then no changes are made.
Note: the behaviorName parameter is ignored when not training. The model and inferenceDevice parameters are ignored when not using inference.
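Examples
A minimal sketch (not from the original reference) that swaps the agent's inference model at runtime; the pushBlockModel asset and the "PushBlock" behavior name are hypothetical.
public NNModel pushBlockModel;   // hypothetical model asset assigned in the Inspector

void UsePushBlockModel()
{
    // Switch the agent to the hypothetical PushBlock model for CPU inference.
    SetModel("PushBlock", pushBlockModel, InferenceDevice.CPU);
}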
SetReward(Single)
Overrides the current step reward of the agent and updates the episode reward accordingly.
Declaration
public void SetReward(float reward)
Parameters
Type | Name | Description |
---|---|---|
Single | reward | The new value of the reward. |
Remarks
This function replaces any rewards given to the agent during the current step. Use AddReward(Single) to incrementally change the reward rather than overriding it.
Typically, you assign rewards in the Agent subclass's OnActionReceived(ActionBuffers) implementation after carrying out the received action and evaluating its success.
Rewards are used during reinforcement learning; they are ignored during inference.
See Agents - Rewards for general advice on implementing rewards and Reward Signals for information about mixing reward signals from curiosity and Generative Adversarial Imitation Learning (GAIL) with rewards supplied through this method.
WriteDiscreteActionMask(IDiscreteActionMask)
Implement WriteDiscreteActionMask() to collect the masks for discrete actions. When using discrete actions, the agent will not perform the masked action.
Declaration
public virtual void WriteDiscreteActionMask(IDiscreteActionMask actionMask)
Parameters
Type | Name | Description |
---|---|---|
IDiscreteActionMask | actionMask | The action mask for the agent. |
Implements
IActionReceiver.WriteDiscreteActionMask(IDiscreteActionMask)
Remarks
When using Discrete Control, you can prevent the Agent from using a certain action by masking it with SetActionEnabled(Int32, Int32, Boolean).
See Agents - Actions for more information on masking actions.
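Examples
A short sketch (not part of the original reference) that masks out a single discrete action. Branch 0 and action index 1 follow the "move one space left" example from the OnActionReceived(ActionBuffers) remarks above, and the wallToTheLeft check is hypothetical.
public override void WriteDiscreteActionMask(IDiscreteActionMask actionMask)
{
    // Prevent the policy from choosing "move one space left" (branch 0, action 1)
    // while a hypothetical wall is directly to the agent's left.
    if (wallToTheLeft)
    {
        actionMask.SetActionEnabled(0, 1, false);
    }
}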