If your custom shaders take a long time to execute, use a GPU profiling tool to check and improve their performance.
Always profile your application on the target platform, so the results represent the actual performance, not the performance on your development machine. For a list of profiling tools for each platform, refer to Profiling tools reference.
Check the GPU profiling tool for the following metrics, which can indicate performance issues in custom shaders:
Depending on the platform, you might not have access to all these performance metrics.
To reduce the number of texture reads, use the following approaches:
To reduce the number of texture writes, use the following approaches:
To reduce the amount the GPU reads from and writes to buffers, use the following approaches:
To limit how many operations the ALU executes, use the following approaches:
Move calculations from the fragment shader to the vertex shader, so the ALU calculates once per vertex instead of once per fragment. If you need per-fragment values, let the GPU interpolate the per-vertex values.
Do a calculation once per frame in a C# script, then pass the value to the shader.
Avoid slow operations such as sqrt, log, and exp.
Avoid slow trigonometric operations such as sin and cos. Use dot and cross instead where you can.
Use half instead of float. For more information, refer to Use 16-bit precision in shaders.
Note: half can cause rendering artifacts if you use it for world-space positions or other variables that need to be high precision.
Use a lookup texture (LUT) instead of complex calculations, for example to calculate post-processing effects such as color grading.
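Several of the approaches above can be combined in one small shader. The following is a minimal sketch in plain HLSL, not Unity's own implementation: the constant buffer, its members (`_ObjectToClip`, `_LightDirOS`, `_PulseAmount`), and the struct names are hypothetical stand-ins for values that Unity's include files and your C# script would normally provide.

```hlsl
// Hypothetical per-frame constants. In a Unity shader these would usually
// come from the engine's include files or from a C# script.
cbuffer PerFrame
{
    float4x4 _ObjectToClip; // assumed: object-space to clip-space matrix
    float3   _LightDirOS;   // assumed: normalized light direction in object space
    float    _PulseAmount;  // assumed: value a C# script computes once per frame
};

struct Attributes
{
    float3 positionOS : POSITION;
    float3 normalOS   : NORMAL;
};

struct Varyings
{
    float4 positionCS : SV_POSITION; // positions stay in float for precision
    half   diffuse    : TEXCOORD0;   // low-precision value, safe to store as half
};

Varyings Vert(Attributes input)
{
    Varyings output;
    output.positionCS = mul(_ObjectToClip, float4(input.positionOS, 1.0));

    // dot of two unit vectors is the cosine of the angle between them,
    // so no trigonometric function is needed. This runs once per vertex.
    float nDotL = dot(normalize(input.normalOS), _LightDirOS);
    output.diffuse = (half)saturate(nDotL);
    return output;
}

half4 Frag(Varyings input) : SV_Target
{
    // The GPU has already interpolated the per-vertex value, so the
    // fragment shader only does a single multiply.
    half shade = input.diffuse * (half)_PulseAmount;
    return half4(shade, shade, shade, 1.0h);
}
```

Per-vertex lighting is coarser than per-fragment lighting on low-polygon meshes, so check the visual result before you keep a change like this.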
The GPU uses depth testing to discard hidden fragments before it executes the fragment shader, so it has fewer fragments to draw. This is called early depth testing.
To find out whether your shader prevents the GPU from discarding fragments early, look for the following profiler metrics:
To increase the number of fragments the GPU discards early, use the following approaches:
Avoid discard and clip operations in your fragment shader.
If you need discard and clip, for example because you test the alpha channel for alpha clipping, use a conditional or a material property to disable them for GameObjects that don't need them.
You can also create a depth prepass that writes the depth of objects before URP's opaque and transparent passes. For more information, refer to Write a depth-only pass in a shader.
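One way to keep alpha clipping out of shaders that do not need it is to wrap the clip call in a keyword check, so Unity can compile a variant with no clip instruction at all. The sketch below assumes a keyword named `_ALPHATEST_ON` and properties named `_BaseMap` and `_Cutoff`. These names are illustrative, and your shader might use different ones.

```hlsl
// In a Unity shader, a directive such as
//   #pragma shader_feature_local _ALPHATEST_ON
// would declare the keyword. The keyword name here is an assumption.

Texture2D    _BaseMap;         // assumed base color texture
SamplerState sampler_BaseMap;

cbuffer PerMaterial
{
    float _Cutoff;             // assumed alpha threshold
};

struct Varyings
{
    float4 positionCS : SV_POSITION;
    float2 uv         : TEXCOORD0;
};

half4 Frag(Varyings input) : SV_Target
{
    half4 color = (half4)_BaseMap.Sample(sampler_BaseMap, input.uv);

#if defined(_ALPHATEST_ON)
    // Compiled only into the variant used by materials that enable the
    // keyword. clip discards the fragment when its argument is negative.
    clip(color.a - _Cutoff);
#endif

    return color;
}
```

Materials that leave the keyword disabled use a variant that contains no clip call.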
You can use dynamic branching if your shaders run on a fast GPU, and don’t have asymmetric code branches where one branch is longer or more complex than the other. The GPU allocates registers based on the most complex branch, even if that branch never executes. The GPU might also execute both branches at once, which means the shorter branch must wait for the longer branch to finish.
For more information, refer to How Unity compiles branching shaders and Introduction to shader variants.
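As an illustration of a branch with two sides of similar cost, the following sketch uses the HLSL `[branch]` attribute on a condition that is the same for every fragment in a draw call. The `_UseWarmTint` property and the `ApplyTint` function are hypothetical names, and whether a dynamic branch is faster than the alternatives depends on your GPU, so profile before you rely on it.

```hlsl
cbuffer PerMaterial
{
    float _UseWarmTint; // hypothetical 0/1 toggle set from a material
};

half3 ApplyTint(half3 color)
{
    half3 result;

    // [branch] requests a real dynamic branch. [flatten] would instead
    // request that both sides are evaluated and one result is selected.
    [branch]
    if (_UseWarmTint > 0.5)
    {
        result = color * half3(1.0, 0.9, 0.8);
    }
    else
    {
        // Same cost as the other side: one multiply, no extra registers.
        result = color * half3(0.8, 0.9, 1.0);
    }

    return result;
}
```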