Version: 2019.2

CPU Usage Profiler module

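The CPU Usage Profiler module’s chart displays where time is spent in your application. It contains an overview of all the important areas where your application spends time, such as rendering, scripts, and animation. This section of the documentation covers the chart categories, the module details pane and its Timeline, Hierarchy, and Raw Hierarchy views, common samples, performance warnings, Allocation Callstacks, and Editor-only samples.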

Chart categories

The CPU Usage Profiler module’s chart tracks the time spent on the application’s main thread. The timings are divided into nine categories. You can change the order of the categories in the chart by dragging and dropping them in the chart’s legend. You can also click a category’s colored label in the legend to toggle its display.

Category Description
Rendering How much time your application spends on rendering graphics.
Scripts How much time your application spends on running scripts.
Physics How much time your application spends on the physics engine.
Animation How much time your application spends on animating SkinnedMeshRenderers, GameObjects and other components in your application. This also includes the time spent on some calculations for systems the Animation and Animator components utilize.
GarbageCollector How much time your application spends on running the Garbage Collector.
VSync How much time is spent in a frame waiting for the targetFrameRate or the next VBlank to sync with. This is according to the QualitySettings.vSyncCount value, or the target framerate, or the VSync setting that is the default or enforced maximum of the platform your application is running on. For more information about VSync, see the section in this documentation on Rendering and VSync samples.
Global Illumination How much time is spent on lighting in your application.
UI How much time is spent on displaying your application’s UI.
Others How much time is spent in code that does not fall in any of the other categories, for instance the entire EditorLoop, or the Profiling overhead when profiling Playmode in the Editor.

Module details pane

When you select the CPU Usage module, the module details pane displays a breakdown of where time was spent in the selected frame. The timing data is either displayed as a timeline or a hierarchical table, which you can change by clicking on the top left dropdown in the module details pane. The three views available are:

View Function
Timeline Displays a breakdown of the timings for a particular frame alongside a time axis of the frame’s length. This is the only view mode that you can use to see timings on threads other than the main thread, and correlate timings across threads, for instance Job System worker threads spinning up after a system on the main thread schedules them.
Hierarchy Groups the timing data by its hierarchical structure. This option displays the elements that your application called in a descending list format, ordered by the time spent (default), the amount of scripting memory allocated (GC.Alloc) or the number of calls. To change the column that orders the table, click on the table column’s header.
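Raw Hierarchy Displays the timing data in a hierarchical structure similar to the call stacks where the timings happened. In this mode, Unity lists each call stack separately instead of merging them as it does in the Hierarchy view.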

Timeline view

CPU Usage Profiler module with the Timeline view

The Timeline view is the default view for the CPU Usage Profiler module. It contains an overview of where time is spent in your application and how the timings relate to each other. The Timeline view displays profiling data from all threads in their own subsections and along the same time axis. This is unlike the Hierarchy views, which only show profiling data from the main thread.

You can use the Timeline view to see how activities on the different threads correlate to each other in their parallel execution. You can see how much or how little you are utilizing the different threads, such as the Job System’s worker threads, how work on the threads is queued up, and whether any thread is idling (“Idle” sample) or waiting for another thread or a Job to finish (“Wait for x” sample).

In the screenshot above, there are light blue animation samples in the worker threads of the Job System while the main thread also processes animation data. The rendering work is split between the Main Thread and the Render Thread. The Render Thread does not align with the Main Thread: during the first ~0.4 ms of this particular frame, the Render Thread was still rendering the previous frame. Similarly, the Render Thread’s work for this frame takes up the first ~0.1 ms of the next frame. Bars that belong to other frames are greyed out, and the vertical lines on the time ruler at the top of the module details pane mark the beginning and the end of the frame on the main thread.

When you profile GPU usage, the toolbar above the time ruler shows how much of the frame time was spent on the CPU and how much on the GPU. In this example, the game is GPU-bound and spends the largest share of its CPU time on rendering, so this application needs its graphics performance optimized.

Navigating and selecting items

To zoom in on areas of the time axis, use the scroll wheel on your mouse, or press and hold the Alt key while you drag with the right mouse button pressed down. You can also use the ends of the horizontal scrollbar to zoom in. Press the A key on your keyboard to reset the zoom so that the entire frame time is visible.

Whenever you see a white arrow on the bottom of a thread, you can click it to unfold the thread to show all lines or click again to show only the top ones. You can also drag the line that separates the threads to readjust how many lines you can see. Double-clicking the line sets the height of the thread’s section to the maximum depth of the call stack. To pan the view, press the middle mouse button or hold the Alt key (Command key on macOS) and press the left mouse button.

To collapse and expand groups of threads, click the foldout arrow next to a thread’s name on the far left of the view.

To see an item’s contribution to the CPU chart, click the item in the lower pane to select it. The Profiler highlights its contribution and dims the rest of the chart. To deselect the item, click elsewhere in the view. Press the F key to focus on the sample you selected, or to show the default zoom level if you’ve selected nothing.

CPU Usage module in Timeline view, with an item selected

In the above example, the tooltip on the selected item provides further details such as the number of instances and the total time of this sample across all threads. GC.Alloc samples show up colored in red-magenta and show you the size of the Allocation.

To show Managed Callstacks in the tooltip, enable the Managed Callstacks option from the Allocation Callstacks dropdown in the Profiler window’s toolbar. You need to enable the Managed Callstacks setting before profiling a frame to show it for that frame. This option only works when you are profiling in the Editor. For more information, see the section on Allocation Callstacks.

You can also manually measure any arbitrary time span in the Timeline view by clicking and dragging horizontally anywhere to display an overlay across a section of the timeline. You can see the time encompassed by that overlay in the time ruler at the top. Press the F key while the overlay is displayed to frame the view horizontally along the selected time section. Click anywhere to remove the overlay.

Hierarchy and Raw Hierarchy view

When you switch to the Hierarchy or Raw Hierarchy view, your selection carries over, as long as the sample is on the main thread. If you cannot immediately find your selection, press the F key to focus it.

CPU Usage Profiler module with the Hierarchy view

The Hierarchy view lists all samples that have been profiled and groups them together by their shared call stack and the hierarchy of ProfilerMarkers. The Raw Hierarchy view does not group samples together, which makes it ideal for looking into samples on a granular level. Both display the following detailed information for each step in the hierarchy next to each row:

Property Function
Total The total amount of time spent on a particular function as a percentage.
Self The total amount of time spent on a particular function as a percentage, excluding the time Unity spends calling sub-functions.

For example, in the screenshot, 41.7% of time is spent in the Camera.Render function. This is because it calls a lot of drawing and culling functions. However, when you exclude the functions it calls, only 3.5% of time is spent on the Camera.Render function itself.
Calls The number of times this function was called in this frame. In the Raw Hierarchy view, the value in this column is always 1, because the Profiler does not merge the sample hierarchy in that view.
GC Alloc How much scripting heap memory Unity has allocated in the current frame. The scripting heap memory is managed by the garbage collector.

Whenever GC.Collect() is called or there is a scripting heap allocation that does not fit within the heap’s current size, the garbage collector is triggered. It marks all allocations that have no more references to them and collects them. This process shows up as GC.Collect samples in the Profiler.

Unity runs the garbage collector more frequently as you allocate more on the heap. As the managed heap grows, it takes Unity longer to mark and collect the memory. As such, you should keep the GC Alloc value at zero while your application runs, both to prevent the garbage collector from affecting your framerate and to keep the overall heap size small.

For more details about the managed heap see the Understanding Automatic Memory Management documentation.
Time ms The total amount of time spent on a particular function in milliseconds. This value only contains the time spent on the main thread, so it can be misleading if your application uses the Job System or multithreaded rendering, where work also runs on other threads.
Self ms The total amount of time spent on a particular function in milliseconds, excluding the time Unity spends calling sub-functions.
Warning Indicated by a warning icon, this displays how many times a warning has been triggered during the current frame. For more information see the Performance warnings section of this documentation.
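
As an illustration of keeping the GC Alloc column at zero, the following sketch reuses a preallocated buffer instead of allocating a new array every frame. The component name, the radius field, and the buffer size of 32 are hypothetical choices for this example and do not come from this documentation.

```csharp
using UnityEngine;

// Hypothetical example component; all names here are illustrative only.
public class NearbyColliderCounter : MonoBehaviour
{
    public float radius = 5f;

    // Allocated once and reused, so Update() does not allocate a results array.
    // The size of 32 is an arbitrary assumption; hits beyond it are not reported.
    readonly Collider[] m_Hits = new Collider[32];

    public int Count { get; private set; }

    void Update()
    {
        // Physics.OverlapSphere returns a new array on every call, which
        // appears as a GC.Alloc sample under this Update() in the Hierarchy view.
        // The NonAlloc variant writes into the existing buffer and returns
        // the number of colliders it stored.
        Count = Physics.OverlapSphereNonAlloc(transform.position, radius, m_Hits);
    }
}
```

With the Hierarchy view sorted by the GC Alloc column, a script written this way should no longer be listed for this query, although other code in the same frame can still allocate.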

You can also get more information about where your application calls and uses the profiled functions by selecting either Show Related Objects or Show Calls view from the Details dropdown at the top right hand corner of the module details pane.

Show Related Objects panel

The Show Related Objects view displays a list of UnityEngine.Objects that are associated with the Profiler sample. Samples get this association through the Begin() overload that takes a UnityEngine.Object. Some samples Unity reports have these associations built in, such as Camera.Render samples that are linked to the Camera object that does the rendering. These objects are reported via their instance ID and resolved to a name in the Profiler window.

When you click on one of these objects, Unity tries to find the object via the scene hierarchy and ping it. Because the association uses the instance ID, pinging only works when you are profiling your application in the Editor, and for as long as the object still exists.

For GC.Alloc samples, this view displays a list of “N/A” items, one for each allocation that occurred at this hierarchy level, with the size of the allocation listed in the GC.Alloc column. If you profile your application in the Editor with the Allocation Callstacks setting enabled, when you select a GC.Alloc sample in this view, the call stack for the allocated scripting object you selected is displayed, even if you did not enable the Deep Profiling setting. For more information, see the Allocation Callstacks section of this documentation.
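
The following is a minimal sketch of how a script might associate its own sample with an object so that the object appears in the Show Related Objects view. It assumes that the Begin() overload referred to above is ProfilerMarker.Begin(UnityEngine.Object) from the Unity.Profiling namespace. The component name and the marker name are hypothetical.

```csharp
using Unity.Profiling;
using UnityEngine;

// Hypothetical example component; the marker name is an arbitrary label.
public class EnemySpawner : MonoBehaviour
{
    // Create the marker once and reuse it for every sample.
    static readonly ProfilerMarker s_SpawnMarker =
        new ProfilerMarker("EnemySpawner.UpdateSpawning");

    void Update()
    {
        // Passing 'this' links the sample to this component, so the Profiler
        // can resolve it by instance ID and list it as a related object.
        s_SpawnMarker.Begin(this);

        UpdateSpawning();

        // Every Begin() needs a matching End() on the same thread.
        s_SpawnMarker.End();
    }

    void UpdateSpawning()
    {
        // Placeholder for the work being measured.
    }
}
```

When several instances of the component run in the same frame, the Hierarchy view merges their samples into one row, and the related objects list is then a way to tell the instances apart.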

Show Calls panel

The Show Calls panel displays where the selected sample is called from, as well as which other functions it calls.

Additionally, under the gear icon at the top of the module details pane, you can enable or disable the Collapse Editor Only Samples setting. This collapses all samples in the Player Loop that only happen because of Editor-only safety checks. When the samples are collapsed, their GC.Alloc value does not contribute to the GC.Alloc value of their enclosing sample. This setting is enabled by default. For more information, see the Editor only samples section of this documentation.

Common samples

As well as samples that your scripting code generates, Unity provides a large number of samples that give you some insight into what is taking up time in your application. The following tables explain what some of the more common samples do.

Main thread base samples

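Main thread base samples provide a clear separation between the time spent on your application and the time spent on Editor and Profiler activities. Recorders can also use these samples to get the timings of a frame on the main thread.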

Sample Function
PlayerLoop The root of any samples that originate from your application’s main loop. When you enable the Profile Editor setting while the Player is running in the Editor in active Play mode, this sample nests under the EditorLoop.
EditorLoop The root of any samples that originate from the Editor’s main loop. This is only present while you profile a Player in the Editor. When you disable the Profile Editor setting, this sample shows how much time of the frame was spent rendering and running the Editor that contains the Player.
Profiler.CollectEditorStats The root of any samples that relate to collecting statistics for the different active Profiler modules. Any samples under the sub-sample Profiler.CollectGlobalStats incur overhead on the Player. All other sub-samples only affect the Editor. To turn specific modules off, close their charts or call Profiler.SetAreaEnabled().
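
As a rough sketch of the two scripting hooks mentioned in this section, the script below reads the PlayerLoop sample through a Recorder and turns one Profiler module off with Profiler.SetAreaEnabled(). The component name, the 60-frame logging interval, and the choice of the Physics2D module are assumptions made for this example only.

```csharp
using UnityEngine;
using UnityEngine.Profiling;

// Hypothetical example component; names and intervals are illustrative.
public class PlayerLoopTimer : MonoBehaviour
{
    Recorder m_PlayerLoop;

    void OnEnable()
    {
        // Look up the Recorder for the "PlayerLoop" base sample by name.
        m_PlayerLoop = Recorder.Get("PlayerLoop");
        m_PlayerLoop.enabled = true;

        // Stop collecting statistics for a module this project does not use.
        // Physics2D is only an example; pick the areas that apply to you.
        Profiler.SetAreaEnabled(ProfilerArea.Physics2D, false);
    }

    void Update()
    {
        // isValid is false if the sampler is not available in this build.
        // Log only occasionally, because logging itself costs time and memory.
        if (m_PlayerLoop.isValid && Time.frameCount % 60 == 0)
        {
            // elapsedNanoseconds reports the previous frame's accumulated time.
            float ms = m_PlayerLoop.elapsedNanoseconds * 1e-6f;
            Debug.Log("PlayerLoop: " + ms.ToString("F2") + " ms");
        }
    }
}
```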

Script update samples

Unless you use the Job System, most of your scripting code nests under the following samples:

Sample Function
Update.ScriptRunBehaviourUpdate This sample includes calls to MonoBehaviour.Update and processing of coroutines.
BehaviourUpdate This sample processes all Update() methods.
CoroutinesDelayedCalls Contains samples of coroutines after their first yield.
PreLateUpdate.ScriptRunBehaviourLateUpdate This sample processes all LateUpdate() methods.
FixedBehaviourUpdate This sample processes all FixedUpdate() methods.
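
To make the table above concrete, the following sketch shows a script with one method of each kind, annotated with the sample that its work is expected to nest under according to the table. The component and method names are hypothetical, and the method bodies are intentionally empty placeholders.

```csharp
using System.Collections;
using UnityEngine;

// Hypothetical example component used only to map methods to samples.
public class UpdateSampleExample : MonoBehaviour
{
    void Start()
    {
        StartCoroutine(Tick());
    }

    IEnumerator Tick()
    {
        // Code before the first yield runs inside the StartCoroutine call.
        yield return null;

        while (true)
        {
            // Code that runs after the first yield is expected to appear
            // under CoroutinesDelayedCalls.
            yield return null;
        }
    }

    // Expected under Update.ScriptRunBehaviourUpdate > BehaviourUpdate.
    void Update() { }

    // Expected under PreLateUpdate.ScriptRunBehaviourLateUpdate.
    void LateUpdate() { }

    // Expected under FixedBehaviourUpdate.
    void FixedUpdate() { }
}
```

Attaching this script to a GameObject and searching for UpdateSampleExample in the Hierarchy view is one way to check where each method ends up in your version of Unity.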

Rendering and VSync samples

These samples show where the CPU spends time processing data for the GPU or where it might be waiting for the GPU to finish. They can give you an idea of whether your application is CPU-bound or GPU-bound, which is useful when the GPU Profiler is not available or adds too much overhead, because the toolbar then does not show the CPU and GPU frame times.

Sample Function
WaitForTargetFPS The time your application spends waiting for the targeted FPS that Application.targetFrameRate specifies.

If this sample is a sub-sample of Gfx.WaitForPresent, it represents the amount of time your application spends waiting for the VSync configured in QualitySettings.vSyncCount.

Note: The Editor doesn’t VSync on the GPU and instead uses WaitForTargetFPS to simulate the delay for VSync. Some platforms, in particular Android and iOS, enforce VSync or have a default frame rate cap of 30 or 60.
Gfx.ProcessCommands Contains all processing of the rendering commands on the render thread. Some of that time might be spent waiting for VSync or new commands from the main thread, which you can see from its child sample Gfx.WaitForPresent.
Gfx.WaitForCommands Indicates that the render thread is ready for new commands, and might indicate a bottleneck on the main thread.
Gfx.PresentFrame Indicates the time your application spends waiting for the GPU to render and present the frame, which might include waiting for VSync.

A WaitForTargetFPS sample on the main thread shows how much of that time is spent waiting for VSync.
Gfx.WaitForPresent Indicates that the main thread is ready to start rendering the next frame, but the render thread has not finished waiting on the GPU to present the frame. This might indicate that your application is GPU-bound. To see what the render thread is simultaneously spending time on, check the Timeline view.

If the render thread spends time in Camera.Render, your application is CPU-bound and might be spending too much time sending draw calls or textures to the GPU.

If the render thread spends time in Gfx.PresentFrame, your game is GPU-bound or it might be waiting for VSync on the GPU. A WaitForTargetFPS sub-sample of Gfx.WaitForPresent indicates the portion of the Present phase that your application spends waiting for VSync.
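
The two settings that the WaitForTargetFPS row refers to can be set from script. The following is a minimal sketch, assuming you want to rule out VSync as the cause of waiting time while you profile; the component name and the value of 60 are arbitrary choices for this example. As noted above, some platforms enforce VSync or a frame rate cap regardless of these values.

```csharp
using UnityEngine;

// Hypothetical example component; the values are illustrative only.
public class FrameRateSettings : MonoBehaviour
{
    void Awake()
    {
        // 0 means the application does not wait for a VBlank.
        // A value of 1 or more syncs to every Nth VBlank instead.
        QualitySettings.vSyncCount = 0;

        // With VSync off, the main thread waits in WaitForTargetFPS
        // to hold this frame rate. -1 restores the platform default.
        Application.targetFrameRate = 60;
    }
}
```

If the waiting samples disappear after you turn both limits off and the frame time drops, the application was waiting for the frame rate limit rather than for the GPU to finish its work.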

Physics samples

The following table outlines some of the high-level physics Profiler samples. FixedUpdate calls all of these samples.

Sample Function
Physics.Simulate Updates the state of the current physics by instructing the physics engine (PhysX) to run its simulation.
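Physics.Processing Processes all non-cloth physics jobs. Expand this sample to show low-level details of the work done internally in the physics engine.
Physics.ProcessingCloth Processes all cloth physics jobs. Expand this sample to show low-level details of the work done internally in the physics engine.
Physics.FetchResults Collects the results of the physics simulation from the physics engine.
Physics.UpdateBodies Updates the positions and rotations of all physics bodies. This sample also contains the messages that communicate these updates.
Physics.ProcessReports Runs once the physics FixedUpdate ends. Processes the various stages of responding to the results of the simulation. This sample updates contacts, joint breaks, and triggers, and sends their messages. There are four distinct sub-stages: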
Physics.TriggerEnterExits Processes OnTriggerEnter and OnTriggerExit events.
Physics.TriggerStays Processes OnTriggerStay events.
Physics.Contacts Processes OnCollisionEnter, OnCollisionExit, and OnCollisionStay events.
Physics.JointBreaks Processes updates and messages related to broken joints.
Physics.UpdateCloth Contains updates relating to cloth and their skinned meshes.
Physics.Interpolation Manages the interpolation of positions and rotations for all physics objects.
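
As a sketch of how script callbacks relate to the Physics.ProcessReports sub-stages listed above, the component below implements one message of each kind. The component name is hypothetical and the method bodies are empty placeholders. The mapping of OnJointBreak to Physics.JointBreaks is an assumption based on the sample's description, not something this page states explicitly.

```csharp
using UnityEngine;

// Hypothetical example component used only to map callbacks to samples.
public class PhysicsMessageExample : MonoBehaviour
{
    // Processed in Physics.TriggerEnterExits.
    void OnTriggerEnter(Collider other) { }
    void OnTriggerExit(Collider other) { }

    // Processed in Physics.TriggerStays.
    void OnTriggerStay(Collider other) { }

    // Processed in Physics.Contacts.
    void OnCollisionEnter(Collision collision) { }
    void OnCollisionStay(Collision collision) { }
    void OnCollisionExit(Collision collision) { }

    // Assumed to be processed in Physics.JointBreaks.
    void OnJointBreak(float breakForce) { }
}
```

Because these messages are processed on the main thread, expensive work in any of these callbacks adds to the time reported for the corresponding sub-stage.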

Performance warnings

The CPU Profiler can detect some common performance issues and warn you about them. These appear in the Warning column of the Hierarchy view in the module details pane.

Profiler warning indicating that a static collider has been moved

Specific issues that the Profiler can detect include:

  • Rigidbody.SetKinematic: Recreates a non-convex MeshCollider for a Rigidbody
  • Animation.DestroyAnimationClip: Triggers RebuildInternalState
  • Animation.AddClip: Triggers RebuildInternalState
  • Animation.RemoveClip: Triggers RebuildInternalState
  • Animation.Clone: Triggers RebuildInternalState
  • Animation.Deactivate: Triggers RebuildInternalState

Allocation Callstacks

If you profile your application in the Editor, you can see the full call stack for a GC.Alloc sample. To do this, enable the Managed Allocations setting in the Allocation Callstacks dropdown of the toolbar of the Profiler window. In the frames you profile after you turn this option on, the GC.Alloc samples contain their callstacks.

Every scripting heap allocation shows up as a GC.Alloc sample in both the Hierarchy view and Timeline view. In Timeline view, it is colored bright magenta. To see a call stack, select the CPU Profiler Module and select a GC.Alloc sample in Timeline view. The call stack appears in the selection highlight.

Alternatively you can see the call stack in Hierarchy or Raw Hierarchy view. Set the Details view to Show Related Objects. Because GC.Alloc samples have no name, they show up as N/A in this panel. When you select an N/A object, the call stack is displayed in the bottom half of the Details view.

For more information about managed allocations, see the Understanding Automatic Memory Management documentation.

CPU Usage module in Timeline view with a GC.Alloc call stack
Call stack in Hierarchy view

Editor only samples

Some samples are only present when you are profiling in the Editor. This includes safety checks like GetComponentNullErrorWrapper, which helps to identify a null component usage; CheckConsistency, which validates object setup; CheckAllowDestructionRecursive, which is a destruction check; and Prefab-related activities. None of these samples are present in the Player.

Hierarchy view with EditorOnly sample collapsed

By default, Editor-only samples are collapsed in the Hierarchy view, and are named EditorOnly [SampleName]. While they might cause GC.Alloc, they do not contribute to the GC.Alloc value of their enclosing sample if they are collapsed.

To change the default behavior, click the gear icon in the top right of the module details pane and disable the Collapse EditorOnly Samples option. When you do this, you can expand the samples, and their GC.Alloc values contribute to the GC.Alloc value of the enclosing sample.

Hierarchy view with EditorOnly sample expanded

This option does not affect the Timeline view. These samples can usually be ignored and are a prompt to profile Player builds on target devices to find actual issues.
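
As an example of where an Editor-only sample such as GetComponentNullErrorWrapper can come from, the sketch below looks up a component that might be missing. It assumes that Component.TryGetComponent is available in your Unity version; the component name and the fallback behavior are hypothetical.

```csharp
using UnityEngine;

// Hypothetical example component; names are illustrative only.
public class OptionalRigidbodyUser : MonoBehaviour
{
    Rigidbody m_Body;

    void Awake()
    {
        // In the Editor, GetComponent<Rigidbody>() on an object without a
        // Rigidbody goes through the null component safety check and can
        // show up as an EditorOnly sample with a GC.Alloc.
        // TryGetComponent reports a missing component through its return
        // value instead.
        if (!TryGetComponent(out m_Body))
        {
            // No Rigidbody attached, so there is nothing for this
            // script to drive.
            enabled = false;
        }
    }
}
```

Either way, the Editor-only cost does not exist in a Player build, so measurements of this kind of lookup are best confirmed on the target device.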
