Class Agent
An agent is an actor that can observe its environment, decide on the best course of action using those observations, and execute those actions within the environment.
Namespace: Unity.MLAgents
Assembly: Unity.ML-Agents.dll
Syntax
[HelpURL("https://github.com/Unity-Technologies/ml-agents/blob/release_22_docs/docs/Learning-Environment-Design-Agents.md")]
[Serializable]
[RequireComponent(typeof(BehaviorParameters))]
[DefaultExecutionOrder(-50)]
public class Agent : MonoBehaviour, ISerializationCallbackReceiver, IActionReceiver, IHeuristicProvider
Remarks
Use the Agent class as the subclass for implementing your own agents. Add your Agent implementation to a GameObject in the Unity scene that serves as the agent's environment.
Agents in an environment operate in steps. At each step, an agent collects observations, passes them to its decision-making policy, and receives an action vector in response.
Agents make observations using ISensor implementations. The ML-Agents API provides implementations for visual observations (CameraSensor), raycast observations (RayPerceptionSensor), and arbitrary data observations (VectorSensor). You can implement the CollectObservations(VectorSensor) function in your Agent subclass to provide vector observations.
Assign a decision-making policy to an agent using a BehaviorParameters component attached to the agent's GameObject. The BehaviorType setting determines how decisions are made:
Default: decisions are made by the external process, when connected. Otherwise, decisions are made using inference. If no inference model is specified in the BehaviorParameters component, then heuristic decision making is used.
InferenceOnly: decisions are always made using the trained model specified in the BehaviorParameters component.
HeuristicOnly: when a decision is needed, the agent's Heuristic(in ActionBuffers) function is called. Your implementation is responsible for providing the appropriate action.
To trigger an agent decision automatically, you can attach a DecisionRequester component to the Agent's GameObject. You can also write your own logic to determine when an agent should make a decision and call the RequestDecision() function manually.
Use the OnActionReceived(ActionBuffers) function to implement the actions your agent can take, such as moving to reach a goal or interacting with its environment.
When you call EndEpisode() on an agent, or the agent reaches its MaxStep count, its current episode ends. You can reset the agent, or remove it from the environment, by implementing the OnEpisodeBegin() function.
The Agent class extends the Unity MonoBehaviour class. You can implement the standard MonoBehaviour functions as needed for your agent. Since an agent's observations and actions typically take place during the FixedUpdate phase, you should only use the MonoBehaviour.Update function for cosmetic purposes. If you override the MonoBehaviour methods, OnEnable() or OnDisable(), always call the base Agent class implementations.
You can implement the Heuristic(in ActionBuffers) function to specify agent actions using your own heuristic algorithm. Implementing a heuristic function can be useful for debugging; for example, you can use keyboard input to manually control the agent's behavior.
Note that you can change the inference model assigned to an agent at any step by calling SetModel(string, ModelAsset, InferenceDevice).
See Agents and Reinforcement Learning in Unity in the Unity ML-Agents Toolkit manual for more information on creating and training agents.
For sample implementations of agent behavior, see the examples available in the Unity ML-Agents Toolkit on Github.
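Examples
The following sketch is not part of the original reference; the class name, the target field, and the reward values are illustrative. It shows the typical shape of an Agent subclass that overrides the main virtual methods documented below.
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class MyMoveToTargetAgent : Agent
{
    // Illustrative target the agent should reach.
    public Transform target;

    public override void OnEpisodeBegin()
    {
        // Reset agent state at the start of each episode.
        transform.localPosition = Vector3.zero;
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // 6 floats: the agent's position and the target's position.
        sensor.AddObservation(transform.localPosition);
        sensor.AddObservation(target.localPosition);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Interpret two continuous actions as movement on the XZ plane.
        var move = new Vector3(actions.ContinuousActions[0], 0f, actions.ContinuousActions[1]);
        transform.localPosition += move * Time.fixedDeltaTime;

        // Reward reaching the target and end the episode on success.
        if (Vector3.Distance(transform.localPosition, target.localPosition) < 0.5f)
        {
            AddReward(1f);
            EndEpisode();
        }
    }
}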
Fields
MaxStep
The maximum number of steps the agent takes before being done.
Declaration
[FormerlySerializedAs("maxStep")]
[HideInInspector]
public int MaxStep
Field Value
Type | Description |
---|---|
int | The maximum steps for an agent to take before it resets; or 0 for unlimited steps. |
Remarks
The max step value determines the maximum length of an agent's episodes. Set to a positive integer to limit the episode length to that many steps. Set to 0 for unlimited episode length.
When an episode ends and a new one begins, the Agent object's OnEpisodeBegin() function is called. You can implement OnEpisodeBegin() to reset the agent or remove it from the environment.
Consider limiting the number of steps in an episode to avoid wasting time during training. If you set the max step value to a reasonable estimate of the time it should take to complete a task, then agents that haven’t succeeded in that time frame will reset and start a new training episode rather than continue to fail.
Note: in general, you should limit the differences between the code you execute during training and the code you run during inference.
Examples
To use a step limit when training while allowing agents to run without resetting outside of training, you can set the max step to 0 in Initialize() if the Academy is not connected to an external process.
using Unity.MLAgents;

public class MyAgent : Agent
{
    public override void Initialize()
    {
        if (!Academy.Instance.IsCommunicatorOn)
        {
            this.MaxStep = 0;
        }
    }
}
Properties
CompletedEpisodes
Returns the number of episodes that the Agent has completed (either EndEpisode() was called, or the maximum number of steps was reached).
Declaration
public int CompletedEpisodes { get; }
Property Value
Type | Description |
---|---|
int | The number of completed episodes. |
StepCount
Returns the current step counter (within the current episode).
Declaration
public int StepCount { get; }
Property Value
Type | Description |
---|---|
int | The current step count. |
Methods
AddReward(float)
Increments the step and episode rewards by the provided value.
Declaration
public void AddReward(float increment)
Parameters
Type | Name | Description |
---|---|---|
float | increment | Incremental reward value. |
Remarks
Use a positive reward to reinforce desired behavior. You can use a
negative reward to penalize mistakes. Use SetReward(float) to set the reward assigned to the current step to a specific value rather than incrementing it.
Typically, you assign rewards in the Agent subclass's OnActionReceived(ActionBuffers) implementation after carrying out the received action and evaluating its success.
Rewards are used during reinforcement learning; they are ignored during inference.
See [Agents - Rewards] for general advice on implementing rewards and [Reward Signals]
for information about mixing reward signals from curiosity and Generative Adversarial
Imitation Learning (GAIL) with rewards supplied through this method.
[Agents - Rewards]: https://github.com/Unity-Technologies/ml-agents/blob/release_22_docs/docs/Learning-Environment-Design-Agents.md#rewards
[Reward Signals]: https://github.com/Unity-Technologies/ml-agents/blob/release_22_docs/docs/ML-Agents-Overview.md#a-quick-note-on-reward-signals
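Examples
A minimal sketch (the reward values, the MoveAgent helper, and the ReachedGoal check are illustrative and not part of this API) showing the typical pattern of adding rewards from OnActionReceived(ActionBuffers) after evaluating the outcome of the action:
public override void OnActionReceived(ActionBuffers actions)
{
    MoveAgent(actions);          // illustrative helper that applies the action

    // Small per-step penalty encourages the agent to finish the task quickly.
    AddReward(-0.001f);

    if (ReachedGoal())           // illustrative success check
    {
        AddReward(1.0f);
        EndEpisode();
    }
}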
Awake()
Called when the Agent is being loaded (before OnEnable()).
Declaration
protected virtual void Awake()
Remarks
This function registers the RpcCommunicator delegate if no delegate has been registered with CommunicatorFactory.
Always call the base Agent class version of this function if you implement Awake() in your own Agent subclasses.
Examples
protected override void Awake()
{
    base.Awake();
    // additional Awake logic...
}
CollectObservations(VectorSensor)
Implement CollectObservations()
to collect the vector observations of
the agent for the step. The agent observation describes the current
environment from the perspective of the agent.
Declaration
public virtual void CollectObservations(VectorSensor sensor)
Parameters
Type | Name | Description |
---|---|---|
VectorSensor | sensor | The vector observations for the agent. |
Remarks
An agent's observation is any environment information that helps the agent achieve its goal. For example, for a fighting agent, its observation could include distances to friends or enemies, or the current level of ammunition at its disposal.
You can use a combination of vector, visual, and raycast observations for an
agent. If you only use visual or raycast observations, you do not need to
implement a CollectObservations()
function.
Add vector observations to the sensor parameter passed to this method by calling the VectorSensor.AddObservation() helper methods. You can use any combination of these helper functions to build the agent's vector of observations. You must build the vector in the same order each time CollectObservations() is called, and the length of the vector must always be the same. In addition, the length of the observation must match the vector observation size configured in the agent's BehaviorParameters component.
For more information about observations, see Observations and Sensors.
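Examples
A minimal sketch, assuming illustrative fields m_Rigidbody, m_MaxSpeed, and m_HasItem that are not part of this API. The five values added here would require a vector observation size of 5 in the agent's BehaviorParameters:
public override void CollectObservations(VectorSensor sensor)
{
    // 3 floats: the agent's own position.
    sensor.AddObservation(transform.localPosition);
    // 1 float: normalized speed.
    sensor.AddObservation(m_Rigidbody.velocity.magnitude / m_MaxSpeed);
    // 1 float: a bool observation is encoded as 0 or 1.
    sensor.AddObservation(m_HasItem);
}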
EndEpisode()
Sets the done flag to true and resets the agent.
Declaration
public void EndEpisode()
Remarks
This should be used when the episode can no longer continue, such as when the Agent reaches the goal or fails at the task.
EpisodeInterrupted()
Indicate that the episode is over but not due to the "fault" of the Agent.
This has the same end result as calling EndEpisode(), but has a slightly different effect on training.
Declaration
public void EpisodeInterrupted()
Remarks
This should be used when the episode could continue, but has gone on for a sufficient number of steps.
GetCumulativeReward()
Retrieves the episode reward for the Agent.
Declaration
public float GetCumulativeReward()
Returns
Type | Description |
---|---|
float | The episode reward. |
GetObservations()
Returns a read-only view of the observations that were generated in CollectObservations(VectorSensor).
Declaration
public ReadOnlyCollection<float> GetObservations()
Returns
Type | Description |
---|---|
ReadOnlyCollection<float> | A read-only view of the observations list. |
GetStackedObservations()
Returns a read-only view of the stacked observations that were generated in CollectObservations(VectorSensor).
Declaration
public ReadOnlyCollection<float> GetStackedObservations()
Returns
Type | Description |
---|---|
ReadOnlyCollection<float> | A read-only view of the stacked observations list. |
GetStoredActionBuffers()
Gets the most recent ActionBuffer for this agent.
Declaration
public ActionBuffers GetStoredActionBuffers()
Returns
Type | Description |
---|---|
ActionBuffers | The most recent ActionBuffer for this agent. |
Heuristic(in ActionBuffers)
Implement Heuristic(in ActionBuffers) to choose an action for this agent using a custom heuristic.
Declaration
public virtual void Heuristic(in ActionBuffers actionsOut)
Parameters
Type | Name | Description |
---|---|---|
ActionBuffers | actionsOut | The ActionBuffers which contain the continuous and discrete action buffers to write to. |
Remarks
Implement this function to provide custom decision making logic or to support manual control of an agent using keyboard, mouse, game controller input, or a script.
Your heuristic implementation can use any decision making logic you specify. Assign decision values to the ContinuousActions and DiscreteActions arrays in the actionsOut parameter passed to this method. The same arrays are reused between steps, so it is up to you to initialize the values on each call, for example with Array.Clear(actionsOut.ContinuousActions.Array, 0, actionsOut.ContinuousActions.Array.Length);.
Add values to the arrays at the same indexes as they are used in your OnActionReceived(ActionBuffers) method, which receives this array and implements the corresponding agent behavior.
Note: do not create a new float array of actions in the Heuristic() method, as this will prevent writing floats to the original action array.
An agent calls this Heuristic() function to make a decision when you set its behavior type to HeuristicOnly. The agent also calls Heuristic() when its behavior type is set to Default, the Academy is not connected to an external process, and no trained model is assigned to the agent.
To perform imitation learning, implement manual control of the agent in the Heuristic()
function so that you can record the demonstrations required for the imitation learning
algorithms. (Attach a Demonstration Recorder component to the agent's GameObject to
record the demonstration session to a file.)
Even when you don’t plan to use heuristic decisions for an agent or imitation learning, implementing a simple heuristic function can aid in debugging agent actions and interactions with its environment.
Examples
The following example illustrates a `Heuristic()` function that provides WASD-style keyboard control for an agent that can move in two dimensions as well as jump. See [Input Manager] for more information about the built-in Unity input functions. You can also use the [Input System package], which provides a more flexible and configurable input system.
[Input Manager]: https://docs.unity3d.com/Manual/class-InputManager.html
[Input System package]: https://docs.unity3d.com/Packages/com.unity.inputsystem@1.0/manual/index.html
public override void Heuristic(in ActionBuffers actionsOut)
{
    var continuousActionsOut = actionsOut.ContinuousActions;
    continuousActionsOut[0] = Input.GetAxis("Horizontal");
    continuousActionsOut[1] = Input.GetKey(KeyCode.Space) ? 1.0f : 0.0f;
    continuousActionsOut[2] = Input.GetAxis("Vertical");
}
Initialize()
Implement Initialize()
to perform one-time initialization or set up of the
Agent instance.
Declaration
public virtual void Initialize()
Remarks
Initialize()
is called once when the agent is first enabled. If, for example,
the Agent object needs references to other [GameObjects] in the scene, you
can collect and store those references here.
Note that OnEpisodeBegin() is called at the start of each of the agent's episodes. You can use that function for items that need to be reset for each episode.
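Examples
A minimal sketch (the cached fields are illustrative) of one-time setup that persists across episodes:
public override void Initialize()
{
    // Cache references once; unlike OnEpisodeBegin(), this runs only when the agent is first enabled.
    m_Rigidbody = GetComponent<Rigidbody>();     // illustrative field
    m_StartPosition = transform.localPosition;   // illustrative field
}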
LazyInitialize()
Initializes the agent. Can be safely called multiple times.
Declaration
public void LazyInitialize()
Remarks
This function calls your Initialize() implementation, if one exists.
OnActionReceived(ActionBuffers)
Implement OnActionReceived()
to specify agent behavior at every step, based
on the provided action.
Declaration
public virtual void OnActionReceived(ActionBuffers actions)
Parameters
Type | Name | Description |
---|---|---|
ActionBuffers | actions | Struct containing the buffers of actions to be executed at this step. |
Remarks
An action is passed to this function in the form of an ActionBuffers struct. Your implementation must use the buffers to direct the agent's behavior for the current step.
You decide how many elements you need in the ActionBuffers to control your
agent and what each element means. For example, if you want to apply a
force to move an agent around the environment, you can arbitrarily pick
three values in ActionBuffers.ContinuousActions array to use as the force components.
During training, the agent's policy learns to set those particular elements of
the array to maximize the training rewards the agent receives. (Of course,
if you implement a Heuristic(in ActionBuffers) function, it must use the same elements of the action array.)
An Agent can use continuous and/or discrete actions. Configure this, along with the size of the action array, in the BrainParameters of the agent's associated BehaviorParameters component.
When an agent uses continuous actions, the values in the ActionBuffers.ContinuousActions array are floating point numbers. You should clamp the values to the range, -1..1, to increase numerical stability during training.
When an agent uses discrete actions, the values in the ActionBuffers.DiscreteActions array are integers that each represent a specific, discrete action. For example, you could define a set of discrete actions such as:
0 = Do nothing
1 = Move one space left
2 = Move one space right
3 = Move one space up
4 = Move one space down
When making a decision, the agent picks one of the five actions and puts the corresponding integer value in the ActionBuffers.DiscreteActions array. For example, if the agent decided to move left, the ActionBuffers.DiscreteActions parameter would be an array with a single element with the value 1.
You can define multiple sets, or branches, of discrete actions to allow an agent to perform simultaneous, independent actions. For example, you could use one branch for movement and another branch for throwing a ball left, right, up, or down, to allow the agent to do both in the same step.
The ActionBuffers.DiscreteActions array of an agent with discrete actions contains one element for each branch. The value of each element is the integer representing the chosen action for that branch. The agent always chooses one action for each branch.
When you use discrete actions, you can prevent the training process or the neural network model from choosing specific actions in a step by implementing the WriteDiscreteActionMask(IDiscreteActionMask) method.
For more information about implementing agent actions see Agents - Actions.
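Examples
A minimal sketch interpreting a single discrete branch with the five actions listed above (the m_MoveStep field and the reward value are illustrative):
public override void OnActionReceived(ActionBuffers actions)
{
    // Branch 0 holds one of the five discrete actions described above.
    var direction = Vector3.zero;
    switch (actions.DiscreteActions[0])
    {
        case 1: direction = Vector3.left; break;
        case 2: direction = Vector3.right; break;
        case 3: direction = Vector3.forward; break;
        case 4: direction = Vector3.back; break;
        // case 0: do nothing
    }
    transform.localPosition += direction * m_MoveStep;   // illustrative step size

    // Small per-step penalty so the agent learns to reach its goal quickly.
    AddReward(-0.01f);
}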
OnAfterDeserialize()
Called by Unity immediately after deserializing this object.
Declaration
public void OnAfterDeserialize()
Remarks
The Agent class uses OnAfterDeserialize() for internal housekeeping. Call the base class implementation if you need your own custom deserialization logic.
See OnAfterDeserialize for more information.
Examples
public new void OnAfterDeserialize()
{
    base.OnAfterDeserialize();
    // additional deserialization logic...
}
OnBeforeSerialize()
Called by Unity immediately before serializing this object.
Declaration
public void OnBeforeSerialize()
Remarks
The Agent class uses OnBeforeSerialize() for internal housekeeping. Call the base class implementation if you need your own custom serialization logic.
See OnBeforeSerialize for more information.
Examples
public new void OnBeforeSerialize()
{
    base.OnBeforeSerialize();
    // additional serialization logic...
}
OnDisable()
Called when the attached [GameObject] becomes disabled and inactive.
[GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
Declaration
protected virtual void OnDisable()
Remarks
Always call the base Agent class version of this function if you implement OnDisable()
in your own Agent subclasses.
Examples
protected override void OnDisable()
{
    base.OnDisable();
    // additional OnDisable logic...
}
OnEnable()
Called when the attached [GameObject] becomes enabled and active.
[GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
Declaration
protected virtual void OnEnable()
Remarks
This function initializes the Agent instance, if it hasn't been initialized yet.
Always call the base Agent class version of this function if you implement OnEnable()
in your own Agent subclasses.
Examples
protected override void OnEnable()
{
    base.OnEnable();
    // additional OnEnable logic...
}
OnEpisodeBegin()
Implement OnEpisodeBegin()
to set up an Agent instance at the beginning
of an episode.
Declaration
public virtual void OnEpisodeBegin()
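Examples
A minimal sketch (the spawn range and the m_Rigidbody field are illustrative) that resets the agent's state for a new episode:
public override void OnEpisodeBegin()
{
    // Clear motion carried over from the previous episode and respawn at a random position.
    m_Rigidbody.velocity = Vector3.zero;     // illustrative field
    transform.localPosition = new Vector3(Random.Range(-4f, 4f), 0.5f, Random.Range(-4f, 4f));
}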
RequestAction()
Requests an action for this agent.
Declaration
public void RequestAction()
Remarks
Call RequestAction()
to repeat the previous action returned by the agent's
most recent decision. A new decision is not requested. When you call this function,
the Agent instance invokes OnActionReceived(ActionBuffers) with the existing action.
You can use RequestAction()
in situations where an agent must take an action
every update, but doesn't need to make a decision as often. For example, an
agent that moves through its environment might need to apply an action to keep
moving, but only needs to make a decision to change course or speed occasionally.
You can add a DecisionRequester component to the agent's GameObject to drive the agent's decision making. When you use this component, do not call RequestAction() separately.
Note that RequestDecision() calls RequestAction(); you do not need to call both functions at the same time.
RequestDecision()
Requests a new decision for this agent.
Declaration
public void RequestDecision()
Remarks
Call RequestDecision()
whenever an agent needs a decision. You often
want to request a decision every environment step. However, if an agent
cannot use the decision every step, then you can request a decision less
frequently.
You can add a DecisionRequester component to the agent's GameObject to drive the agent's decision making. When you use this component, do not call RequestDecision() separately.
Note that this function calls RequestAction(); you do not need to call both functions at the same time.
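Examples
If you do not attach a DecisionRequester component, you can drive the decision cadence yourself. A minimal sketch (the decisionInterval field and the use of FixedUpdate are illustrative, not a prescribed pattern):
using Unity.MLAgents;
using UnityEngine;

public class ManualDecisionAgent : Agent
{
    public int decisionInterval = 5;   // illustrative: decide every 5 physics steps
    int m_StepsSinceDecision;

    void FixedUpdate()
    {
        if (m_StepsSinceDecision >= decisionInterval)
        {
            m_StepsSinceDecision = 0;
            RequestDecision();     // also triggers an action this step
        }
        else
        {
            m_StepsSinceDecision++;
            RequestAction();       // repeat the previous action without a new decision
        }
    }
}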
ScaleAction(float, float, float)
Scales continuous action from [-1, 1] to arbitrary range.
Declaration
protected static float ScaleAction(float rawAction, float min, float max)
Parameters
Type | Name | Description |
---|---|---|
float | rawAction | The input action value. |
float | min | The minimum output value. |
float | max | The maximum output value. |
Returns
Type | Description |
---|---|
float | The rawAction value scaled from [-1, 1] to the provided [min, max] range. |
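Examples
For illustration only, assuming the conventional linear rescale (the exact arithmetic is an assumption, not quoted from the implementation): scaled = min + (rawAction + 1) * 0.5 * (max - min), so -1 maps to min, 0 to the midpoint, and 1 to max.
// Map the first continuous action from [-1, 1] to a thrust value in [0, 10].
float thrust = ScaleAction(actions.ContinuousActions[0], 0f, 10f);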
SetModel(string, ModelAsset, InferenceDevice)
Updates the Model assigned to this Agent instance.
Declaration
public void SetModel(string behaviorName, ModelAsset model, InferenceDevice inferenceDevice = InferenceDevice.Default)
Parameters
Type | Name | Description |
---|---|---|
string | behaviorName | The identifier of the behavior. This will categorize the agent when training. |
ModelAsset | model | The model to use for inference. |
InferenceDevice | inferenceDevice | Define the device on which the model will be run. |
Remarks
If the agent already has an assigned model, that model is replaced with the provided one. However, if you call this function with arguments that are identical to the current parameters of the agent, then no changes are made.
Note: the behaviorName parameter is ignored when not training. The model and inferenceDevice parameters are ignored when not using inference.
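Examples
A minimal sketch of swapping models at runtime from inside an Agent subclass (the field names and the behavior name "MyBehavior" are illustrative):
public ModelAsset easyModel;    // illustrative serialized model assets
public ModelAsset hardModel;

public void UseHardModel(bool hard)
{
    // Passing arguments identical to the current ones is a no-op, so this is safe to call repeatedly.
    SetModel("MyBehavior", hard ? hardModel : easyModel, InferenceDevice.Default);
}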
SetReward(float)
Overrides the current step reward of the agent and updates the episode reward accordingly.
Declaration
public void SetReward(float reward)
Parameters
Type | Name | Description |
---|---|---|
float | reward | The new value of the reward. |
Remarks
This function replaces any rewards given to the agent during the current step.
Use AddReward(float) to incrementally change the reward rather than overriding it.
Typically, you assign rewards in the Agent subclass's OnActionReceived(ActionBuffers) implementation after carrying out the received action and evaluating its success.
Rewards are used during reinforcement learning; they are ignored during inference.
See Agents - Rewards for general advice on implementing rewards and Reward Signals for information about mixing reward signals from curiosity and Generative Adversarial Imitation Learning (GAIL) with rewards supplied through this method.
WriteDiscreteActionMask(IDiscreteActionMask)
Implement WriteDiscreteActionMask() to collect the masks for discrete actions. When using discrete actions, the agent will not perform the masked action.
Declaration
public virtual void WriteDiscreteActionMask(IDiscreteActionMask actionMask)
Parameters
Type | Name | Description |
---|---|---|
IDiscreteActionMask | actionMask | The action mask for the agent. |
Remarks
When using Discrete Control, you can prevent the Agent from using a certain action by masking it with SetActionEnabled(int, int, bool).
See Agents - Actions for more information on masking actions.
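Examples
A minimal sketch (the m_MinX boundary field is illustrative) that masks the "move one space left" action from the five-action example above when the agent is already at the left edge:
public override void WriteDiscreteActionMask(IDiscreteActionMask actionMask)
{
    // Branch 0, action index 1 corresponds to "move one space left" in the example above.
    if (transform.localPosition.x <= m_MinX)    // illustrative boundary check
    {
        actionMask.SetActionEnabled(0, 1, false);
    }
}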