Optimizing Graphics Performance
Desktop
Good performance is critical to the success of many games. Below are some simple guidelines for maximizing the speed of your game's graphical rendering.
Optimizing Meshes
You only pay a rendering cost for objects that have a Mesh Renderer attached and are within the view frustum. There is no rendering cost from empty GameObjects in the scene or from objects that are out of the view of any camera.
Modern graphics cards handle large numbers of polygons well, but there is a significant overhead for each batch (i.e., mesh) that you submit to the graphics card. As a result, a 100-triangle object is likely to be about as expensive to render as a 1500-triangle object. The "sweet spot" for optimal rendering performance is somewhere around 1500-4000 triangles per mesh.
Usually, the best way to improve rendering performance is to combine objects together so that each mesh has around 1500 or more triangles and uses only one Material for the entire mesh. It is important to understand that combining two objects which don't share a material does not give you any performance increase at all. The most common reason for having multiple materials is that two meshes don't share the same textures, so to optimize rendering performance, you should ensure that any objects you combine share the same textures.
However, when using many pixel lights in the Forward rendering path, there are situations where combining objects may not make sense, as explained below.
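The material-sharing rule for combining meshes can be sketched as a simple grouping step. This is only an illustration in plain Python, not Unity code: the object names, materials and the `plan_batches` helper are hypothetical, and the model assumes that every group of objects sharing a material can be merged into one mesh.

```python
from collections import defaultdict

def plan_batches(objects):
    """Group (name, material, triangles) tuples by material.

    Returns {material: total_triangles}. Under the simplifying assumption
    above, each entry stands for one combined mesh (one batch).
    """
    groups = defaultdict(int)
    for _name, material, triangles in objects:
        groups[material] += triangles
    return dict(groups)

# Hypothetical scene: three objects share "wood", one uses "metal".
scene = [
    ("crate_a", "wood", 100),
    ("crate_b", "wood", 100),
    ("barrel", "wood", 1400),
    ("lamp", "metal", 300),
]
batches = plan_batches(scene)
```

In this made-up scene, four objects collapse into two batches. The "wood" group lands in the 1500-4000 triangle range, while the "metal" lamp gains nothing from combining because no other object shares its material.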
Pixel Lights in the Forward Rendering Path
Note: this applies only to the Forward rendering path.
If you use pixel lighting then each mesh has to be rendered as many times as there are pixel lights illuminating it. If you combine two meshes that are very far apart, it will increase the effective size of the combined object. All pixel lights that illuminate any part of this combined object will be taken into account during rendering, so the number of rendering passes that need to be made could be increased. Generally, the number of passes that must be made to render the combined object is the sum of the number of passes for each of the separate objects, and so nothing is gained by combining. For this reason, you should not combine meshes that are far enough apart to be affected by different sets of pixel lights.
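The pass-count reasoning above can be checked with a small worked example. The sketch below uses a deliberately simplified model (one pass per pixel light, ignoring any base pass or vertex lights), and the light names are hypothetical.

```python
def passes(pixel_lights):
    # Simplified model: one rendering pass per pixel light hitting the mesh.
    return len(pixel_lights)

# Two meshes far apart, lit by different sets of pixel lights.
far_a = {"torch_1", "torch_2"}
far_b = {"lamp_1"}
far_separate = passes(far_a) + passes(far_b)
far_combined = passes(far_a | far_b)  # combined mesh sees the union of lights

# Two meshes close together, lit by the same pixel lights.
near_a = {"torch_1", "torch_2"}
near_b = {"torch_1", "torch_2"}
near_separate = passes(near_a) + passes(near_b)
near_combined = passes(near_a | near_b)
```

With disjoint light sets the combined mesh needs as many passes as the two separate meshes together, and each pass now draws the larger combined geometry. When both meshes are lit by the same lights, combining halves the pass count in this model.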
During rendering, Unity finds all lights surrounding a mesh and calculates which of those lights affect it most. The Quality Settings are used to modify how many of the lights end up as pixel lights and how many as vertex lights. Each light calculates its importance based on how far away it is from the mesh and how intense its illumination is. Furthermore, some lights are more important than others purely from the game context. For this reason, every light has a Render Mode setting which can be set to Important or Not Important; lights marked as Not Important will typically have a lower rendering overhead.
As an example, consider a driving game where the player's car is driving in the dark with headlights switched on. The headlights are likely to be the most visually significant light sources in the game, so their Render Mode would probably be set to Important. On the other hand, there may be other lights in the game that are less important (other cars' rear lights, say) and which don't improve the visual effect much by being pixel lights. The Render Mode for such lights can safely be set to Not Important so as to avoid wasting rendering capacity in places where it will give little benefit.
Per-Layer Cull Distances
In some games, it may be appropriate to cull small objects more aggressively than large ones in order to reduce the number of draw calls. For example, small rocks and debris could be made invisible at long distances while large buildings would still be visible. To accomplish this culling, you can put small objects into a separate layer and set up per-layer cull distances using the Camera.layerCullDistances script function.
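The idea behind per-layer cull distances can be modelled outside Unity as follows. This is a plain-Python sketch of the concept, not the Unity API: it assumes one distance per layer (32 layers), with 0 meaning "use the camera's far clip distance", and the layer index 10 for small props is hypothetical. Check the Camera.layerCullDistances reference for the exact behavior.

```python
NUM_LAYERS = 32  # assumption: one cull distance per layer

def is_culled(layer, distance, layer_cull_distances, far_clip):
    """Return True if an object on `layer` at `distance` should be skipped.

    Assumed semantics: a value of 0 falls back to the far clip distance.
    """
    limit = layer_cull_distances[layer] or far_clip
    return distance > limit

SMALL_PROPS_LAYER = 10  # hypothetical layer holding rocks and debris
distances = [0.0] * NUM_LAYERS
distances[SMALL_PROPS_LAYER] = 50.0

rock_hidden = is_culled(SMALL_PROPS_LAYER, 80.0, distances, far_clip=1000.0)
building_hidden = is_culled(0, 80.0, distances, far_clip=1000.0)
```

Here a rock 80 units away is culled because its layer is limited to 50 units, while a building on the default layer at the same distance is still drawn.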
Shadows
If you are deploying for Desktop platforms then you should be careful when using shadows because they can add a lot of rendering overhead to your game if not used correctly. For further details, see the Shadows page.
Note: Shadows are not currently supported on iOS or Android devices.
iOS
A useful background to iOS optimization can be found on the iOS hardware page.
Alpha-Testing
Unlike desktop machines, iOS devices incur a high performance overhead for alpha-testing (or use of the discard and clip operations in pixel shaders). You should replace alpha-test shaders with alpha-blend if at all possible. Where alpha-testing cannot be avoided, you should keep the overall number of visible alpha-tested pixels to a minimum.
Vertex Performance
Generally you should aim to have no more than 40,000 vertices visible per frame when targeting iPhone 3GS or newer devices. You should keep the vertex count below 10,000 for older devices equipped with the MBX GPU, such as the original iPhone, iPhone 3G and iPod Touch 1st and 2nd Generation.
Lighting Performance
Per-pixel dynamic lighting will add significant rendering overhead to every affected pixel and can lead to objects being rendered in multiple passes. Avoid having more than one Pixel Light illuminating any single object and use directional lights as far as possible. Note that a Pixel Light is one which has its Render Mode option set to Important.
Per-vertex dynamic lighting can add significant cost to vertex transformations. Try to avoid situations where multiple lights illuminate any given object. For static objects, baked lighting is much more efficient.
Optimize Model Geometry
When optimizing the geometry of a model, there are two basic rules:
- Don't use any more triangles than necessary
- Try to keep the number of UV mapping seams and hard edges (ie, doubled-up vertices) as low as possible
Note that the actual number of vertices that graphics hardware has to process is usually not the same as the number reported by a 3D application. Modeling applications usually display the geometric vertex count, ie, the number of distinct corner points that make up a model.
For a graphics card, however, some geometric vertices will need to be split into two or more logical vertices for rendering purposes. A vertex must be split if it has multiple normals, UV coordinates or vertex colors. Consequently, the vertex count in Unity is usually higher than the count given by the 3D application.
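The effect of vertex splitting can be illustrated with a hard-edged cube. The sketch below is plain Python with hypothetical helper names. It treats a render vertex as a unique (position, normal) pair, which is a simplification since UVs and colors also cause splits.

```python
from itertools import product

def cube_corners():
    """Return one (position, normal) pair per face corner of a unit cube."""
    corners = []
    for axis in range(3):
        for sign in (-1, 1):
            normal = tuple(sign if i == axis else 0 for i in range(3))
            for a, b in product((-1, 1), repeat=2):
                pos = [a, b]
                pos.insert(axis, sign)  # place the fixed coordinate on its axis
                corners.append((tuple(pos), normal))
    return corners

corners = cube_corners()
geometric_vertices = len({pos for pos, _normal in corners})
render_vertices = len(set(corners))
```

A modeling application would report 8 vertices for this cube, but because every corner carries three different face normals, the hardware has to process 24.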
Texture Compression
Using iOS's native PVRTC compression formats will decrease the size of your textures (resulting in faster load times and a smaller memory footprint) and can also dramatically increase rendering performance. Compressed textures use only a fraction of the memory bandwidth needed for uncompressed 32-bit RGBA textures. A comparison of uncompressed vs compressed texture performance can be found in the iOS Hardware Guide.
Some images are prone to visual artifacts in the alpha channels of PVRTC-compressed textures. In such cases, you might need to tweak the PVRTC compression parameters directly in your imaging software. You can do that by installing the PVR export plugin or using PVRTexTool from Imagination Tech, the creators of the PVRTC format. The resulting compressed image file with a .pvr extension will be imported by the Unity editor directly and the specified compression parameters will be preserved.
If PVRTC-compressed textures do not give good enough visual quality or you need especially crisp imaging (as you might for GUI textures, say) then you should consider using 16-bit textures instead of 32-bit RGBA textures. By doing so, you will reduce the memory bandwidth by half.
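To put rough numbers on these savings, the following sketch computes the storage needed for a single texture level in several formats. It assumes the usual 4 and 2 bits-per-pixel modes of PowerVR texture compression and ignores mipmaps and file headers. The format labels are just dictionary keys chosen for this example.

```python
# Bits per pixel for each format (labels are hypothetical names).
BITS_PER_PIXEL = {
    "RGBA_32BIT": 32,
    "RGB_16BIT": 16,
    "PVRTC_4BPP": 4,
    "PVRTC_2BPP": 2,
}

def texture_bytes(width, height, fmt):
    """Size in bytes of one mip level, ignoring headers and padding."""
    return width * height * BITS_PER_PIXEL[fmt] // 8

sizes = {fmt: texture_bytes(1024, 1024, fmt) for fmt in BITS_PER_PIXEL}
```

For a 1024x1024 texture this gives 4 MiB at 32 bits, 2 MiB at 16 bits, 512 KiB at 4 bpp and 256 KiB at 2 bpp, which is why the 16-bit option halves bandwidth and the compressed formats reduce it much further.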
Tips for writing high-performance shaders
The GPUs on iOS devices have fully supported pixel and vertex shaders since the iPhone 3GS. However, the performance is nowhere near what you would get from a desktop machine, so you should not expect desktop shaders to port to iOS unchanged. Typically, shaders will need to be hand optimized to reduce calculations and texture reads in order to get good performance.
Complex mathematical operations
Transcendental mathematical functions (such as pow, exp, log, cos, sin, tan, etc) will tax the GPU greatly, so a good rule of thumb is to have no more than one such operation per fragment. Consider using lookup textures as an alternative where applicable.
It is not advisable to attempt to write your own normalize, dot, inversesqrt operations, however. If you use the built-in ones then the driver will generate much better code for you.
Bear in mind also that the discard operation will make your fragments slower.
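The lookup-texture idea can be sketched on the CPU as a precomputed table. This is only an illustration in Python: in a real shader the table would be stored in a small texture and sampled, and the table size and exponent used here are arbitrary assumptions.

```python
SIZE = 256        # assumed table resolution
EXPONENT = 8.0    # assumed exponent, e.g. for a specular-style falloff

# Precompute pow(x, EXPONENT) for x in [0, 1] once, up front.
table = [(i / (SIZE - 1)) ** EXPONENT for i in range(SIZE)]

def lookup_pow(x):
    """Approximate x ** EXPONENT with a nearest-entry table read."""
    x = min(max(x, 0.0), 1.0)          # clamp to the table's domain
    return table[round(x * (SIZE - 1))]

approx = lookup_pow(0.5)
exact = 0.5 ** EXPONENT
```

The per-fragment cost becomes a single fetch instead of a transcendental operation, at the price of a small approximation error and one extra texture read.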
Floating point operations
You should always specify the precision of floating point variables when writing custom shaders. It is critical to pick the smallest possible floating point format in order to get the best performance.
If the shader is written in GLSL ES then the floating point precision is specified as follows:
- highp - full 32-bit floating point format, suitable for vertex transformations but has the slowest performance.
- mediump - reduced 16-bit floating point format, suitable for texture UV coordinates and roughly twice as fast as highp
- lowp - 10-bit fixed point format, suitable for colors, lighting calculation and other high-performance operations and roughly four times faster than highp
If the shader is written in CG or it is a surface shader then precision is specified as follows:
- float - analogous to highp in GLSL ES, slowest
- half - analogous to mediump in GLSL ES, roughly twice as fast as float
- fixed - analogous to lowp in GLSL ES, roughly four times faster than float
For further details about shader performance, please read the Shader Performance page.
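To get a feel for what the reduced precisions can represent, the sketch below emulates them in Python. It is an approximation under stated assumptions: the 16-bit format is modelled as an IEEE half float, and the low-precision fixed point format is modelled as a range of [-2, 2) in steps of 1/256. Actual hardware behavior may differ.

```python
import struct

def as_half(x):
    """Round-trip a value through a 16-bit half float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def as_lowp(x):
    """Assumed fixed point model: clamp to [-2, 2) and snap to 1/256 steps."""
    step = 1.0 / 256.0
    clamped = min(max(x, -2.0), 2.0 - step)
    return round(clamped / step) * step

color = as_lowp(0.5)        # representable exactly
overflow = as_lowp(10.0)    # clamped to just under 2
uv_error = abs(as_half(0.1) - 0.1)
```

Colors in the 0 to 1 range survive the fixed point model well, while large values are clamped and 16-bit floats introduce a small rounding error. This is why the lower precisions suit colors and UV coordinates but not vertex positions.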
Hardware documentation
Take the time to study Apple's documentation on hardware and best practices for writing shaders. Note, however, that we would suggest being more aggressive with floating point precision hints.
Bake Lighting into Lightmaps
Bake your scene's static lighting into textures using Unity's built-in Lightmapper. The process of generating a lightmapped environment takes only a little longer than just placing a light in the scene in Unity, but:
- It will run a lot faster (2-3 times faster than using, for example, 2 pixel lights)
- It will look a lot better, since you can bake global illumination and the lightmapper can smooth the results
Share Materials
If a number of objects being rendered by the same camera use the same material, then Unity iOS will be able to employ a large variety of internal optimizations such as:
- Avoiding setting various render states in OpenGL ES
- Avoiding calculation of different parameters required to set up vertex and pixel processing
- Batching small moving objects to reduce draw calls
- Batching both big and small objects that have the "static" property enabled to reduce draw calls
All these optimizations will save you precious CPU cycles. Therefore, putting in the extra work to combine textures into a single atlas and making a number of objects use the same material will always pay off. Do it!
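Combining textures into an atlas means each object's UV coordinates must be remapped into its own region of the shared texture. A minimal sketch of that remapping, assuming a square grid of equally sized tiles with no padding (the `atlas_uv` helper and the tile layout are hypothetical):

```python
def atlas_uv(u, v, tile_x, tile_y, tiles_per_side):
    """Map a UV in [0, 1] on the original texture into the atlas tile."""
    scale = 1.0 / tiles_per_side
    return ((tile_x + u) * scale, (tile_y + v) * scale)

# Center of the texture stored in tile (1, 0) of a 2x2 atlas.
remapped = atlas_uv(0.5, 0.5, tile_x=1, tile_y=0, tiles_per_side=2)
```

Once every object samples from its own tile, all of them can share one material. In practice, atlases usually also need some padding between tiles to avoid bleeding when filtering or mipmapping.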
Simple Checklist to make Your Game Faster
- Keep vertex count below:
  - 40K per frame when targeting iPhone 3GS and newer devices (with SGX GPU)
  - 10K per frame when targeting older devices (with MBX GPU)
- If you're using built-in shaders, pick ones from the Mobile category. Keep in mind that Mobile/VertexLit is currently the fastest shader.
- Keep the number of different materials per scene low - share as many materials between different objects as possible.
- Set the Static property on non-moving objects to allow internal optimizations.
- Use PVRTC formats for textures when possible, otherwise choose 16-bit textures over 32-bit.
- Use combiners or pixel shaders to mix several textures per fragment instead of a multi-pass approach.
- If writing custom shaders, always use the smallest possible floating point format:
  - fixed / lowp -- perfect for color, lighting information and normals,
  - half / mediump -- for texture UV coordinates,
  - float / highp -- avoid in pixel shaders, fine to use in vertex shaders for vertex position calculations.
- Minimize use of complex mathematical operations such as pow, sin, cos, etc. in pixel shaders.
- Do not use Pixel Lights when it is not necessary -- choose to have only a single (preferably directional) pixel light affecting your geometry.
- Do not use dynamic lights when it is not necessary -- choose to bake lighting instead.
- Use fewer textures per fragment.
- Avoid alpha-testing; choose alpha-blending instead.
- Do not use fog when it is not necessary.
- Learn the benefits of Occlusion culling and use it to reduce the amount of visible geometry and the number of draw calls in complex static scenes with lots of occlusion. Plan your levels to benefit from Occlusion culling.
- Use skyboxes to "fake" distant geometry.
Android
Lighting Performance
Per-pixel dynamic lighting will add significant cost to every affected pixel and can lead to objects being rendered in multiple passes. Avoid having more than one Pixel Light affecting any single object, and prefer it to be a directional light. Note that a Pixel Light is a light which has its Render Mode setting set to Important.
Per-vertex dynamic lighting can add significant cost to vertex transformations. Avoid having multiple lights affect a single object. Bake lighting for static objects.
Optimize Model Geometry
When optimizing the geometry of a model, there are two basic rules:
- Don't use more faces than you have to
- Keep the number of UV mapping seams and hard edges as low as possible
Note that the actual number of vertices that graphics hardware has to process is usually not the same as what is displayed in a 3D application. Modeling applications usually display the geometric vertex count, i.e., the number of points that make up a model.
For a graphics card, however, some vertices have to be split into separate ones. If a vertex has multiple normals (it's on a "hard edge"), or has multiple UV coordinates, or has multiple vertex colors, it has to be split. So the vertex count you see in Unity is almost always different from the one displayed in the 3D application.
Texture Compression
All Android devices with support for OpenGL ES 2.0 also support the ETC1 compression format, so you are encouraged to use ETC1 as the preferred texture format whenever possible. Using compressed textures not only decreases the size of your textures (resulting in faster load times and a smaller memory footprint) but can also increase your rendering performance dramatically. Compressed textures require only a fraction of the memory bandwidth needed for uncompressed 32-bit RGBA textures.
If targeting a specific graphics architecture, such as the Nvidia Tegra or Qualcomm Snapdragon, it may be worth considering using the proprietary compression formats available on those architectures. The Android Market also allows filtering based on supported texture compression format, meaning that a distribution archive (.apk) with, for example, DXT-compressed textures can be prevented from being downloaded to a device which doesn't support them.
Enable Mip Maps
As a rule of thumb, always have Generate Mip Maps enabled. In the same way that texture compression can help limit the amount of texture data transferred when the GPU is rendering, a mip mapped texture will enable the GPU to use a lower-resolution texture for smaller triangles. The only exception to this rule is when a texel (texture pixel) is known to map 1:1 to the rendered screen pixel, as with UI elements or in a pure 2D game.
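The cost and shape of a mip chain can be worked out with a few lines of arithmetic. The sketch below is plain Python with a hypothetical `mip_chain` helper. It assumes each level halves both dimensions (rounding down, with a minimum of 1) until a 1x1 level is reached.

```python
def mip_chain(width, height):
    """Return the list of (width, height) for every mip level."""
    levels = [(width, height)]
    while width > 1 or height > 1:
        width, height = max(width // 2, 1), max(height // 2, 1)
        levels.append((width, height))
    return levels

levels = mip_chain(256, 256)
base_texels = 256 * 256
total_texels = sum(w * h for w, h in levels)
overhead = total_texels / base_texels
```

For a 256x256 texture this yields 9 levels and roughly one third more storage than the base level alone. That is a modest memory cost in exchange for letting the GPU read from much smaller levels when a surface covers only a few pixels.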
Tips for writing well performing shaders
Although all Android OpenGL ES 2.0 GPUs fully support pixel and vertex shaders, do not expect to take a desktop shader with complex per-pixel functionality and run it on an Android device at 30 frames per second. Most often, shaders will have to be hand optimized, with calculations and texture reads kept to a minimum, in order to achieve good frame rates.
Complex arithmetic operations
Operations such as pow, exp, log, cos, sin, tan, etc. tax the GPU heavily. A good rule of thumb is to have no more than one such operation per fragment. Consider that lookup textures can sometimes be a better alternative.
Do NOT try to roll your own normalize, dot, inversesqrt operations, however. Always use the built-in ones -- this way the driver will generate much better code for you.
Keep in mind that the discard operation will make your fragments slower.
Floating point operations
Always specify the precision of floating point variables when writing custom shaders. It is crucial to pick the smallest possible format in order to achieve the best performance.
If the shader is written in GLSL ES, then precision is specified as follows:
- highp - full 32-bit floating point format, well suited to vertex transformations, slowest
- mediump - reduced 16-bit floating point format, well suited to texture UV coordinates, roughly twice as fast as highp
- lowp - 10-bit fixed point format, well suited to colors, lighting calculation and other high-performance operations, roughly four times faster than highp
If the shader is written in CG or it is a surface shader, then precision is specified as follows:
- float - analogous to highp in GLSL ES, slowest
- half - analogous to mediump in GLSL ES, roughly twice as fast as float
- fixed - analogous to lowp in GLSL ES, roughly four times faster than float
For more details about general shader performance, please read the Shader Performance page. Quoted performance figures are based on the PowerVR graphics architecture, available in devices such as the Samsung Nexus S. Other hardware architectures may experience less (or more) benefit from using reduced register precision.
Bake Lighting into Lightmaps
Bake your scene's static lighting into textures using Unity's built-in Lightmapper. The process of generating a lightmapped environment takes only a little longer than just placing a light in the scene in Unity, but:
- It will run a lot faster (2-3 times faster than using, for example, 2 pixel lights)
- It will look a lot better, since you can bake global illumination and the lightmapper can smooth the results
Share Materials
If a number of objects being rendered by the same camera use the same material, then Unity Android will be able to employ a large variety of internal optimizations such as:
- Avoiding setting various render states in OpenGL ES
- Avoiding calculation of different parameters required to set up vertex and pixel processing
- Batching small moving objects to reduce draw calls
- Batching both big and small objects that have the "static" property enabled to reduce draw calls
All these optimizations will save you precious CPU cycles. Therefore, putting in the extra work to combine textures into a single atlas and making a number of objects use the same material will always pay off. Do it!
Simple Checklist to make Your Game Faster
- If you're using built-in shaders, pick ones from the Mobile category. Keep in mind that Mobile/VertexLit is currently the fastest shader.
- Keep the number of different materials per scene low - share as many materials between different objects as possible.
- Set the Static property on non-moving objects to allow internal optimizations.
- Use the ETC1 format for textures when possible, otherwise choose 16-bit textures over 32-bit for uncompressed texture data.
- Use mipmaps.
- Use combiners or pixel shaders to mix several textures per fragment instead of a multi-pass approach.
- If writing custom shaders, always use the smallest possible floating point format:
  - fixed / lowp -- perfect for color, lighting information and normals,
  - half / mediump -- for texture UV coordinates,
  - float / highp -- avoid in pixel shaders, fine to use in vertex shaders for vertex position calculations.
- Minimize use of complex mathematical operations such as pow, sin, cos, etc. in pixel shaders.
- Do not use Pixel Lights when it is not necessary -- choose to have only a single (preferably directional) pixel light affecting your geometry.
- Do not use dynamic lights when it is not necessary -- choose to bake lighting instead.
- Use fewer textures per fragment.
- Avoid alpha-testing; choose alpha-blending instead.
- Do not use fog when it is not necessary.
- Learn the benefits of Occlusion culling and use it to reduce the amount of visible geometry and the number of draw calls in complex static scenes with lots of occlusion. Plan your levels to benefit from Occlusion culling.
- Use skyboxes to "fake" distant geometry.
See Also
- iPhone Optimizing Graphics Performance (for when the graphics architecture is known to be Imagination Tech's PowerVR.)