Optimize your application
Once you've captured performance data, focus on the following areas to improve performance:
- Monitor GPU activity: Check GpuCycles and overall utilization to confirm GPU-bound performance.
- Identify primary queue: Compare GpuFragmentQueueUtilization vs GpuNonFragmentQueueUtilization to identify the dominant workload.
- Analyze content efficiency: Review primitive culling rates and early-Z efficiency to identify content optimization opportunities.
- Examine functional units: Check arithmetic, texture, and load/store utilization to identify shader core bottlenecks.
- Validate memory performance: Monitor bandwidth usage and stall rates to identify memory limitations. To further monitor memory usage, use the Memory Profiler.
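The queue comparison in the list above can be sketched as a small helper. This is only an illustration: the function name, the sample values, and the 10-point margin used to call the two queues "balanced" are assumptions, not part of any profiler API. The inputs are the GpuFragmentQueueUtilization and GpuNonFragmentQueueUtilization readings, expressed as percentages.

```python
def dominant_queue(fragment_util, non_fragment_util, margin=10.0):
    """Classify which GPU queue dominates the workload.

    fragment_util and non_fragment_util are utilization percentages (0-100).
    margin is a hypothetical threshold: if the two readings differ by less
    than this many percentage points, the workload is treated as balanced.
    """
    if abs(fragment_util - non_fragment_util) < margin:
        return "balanced"
    if fragment_util > non_fragment_util:
        return "fragment"
    return "non-fragment"


# Hypothetical counter readings, in percent.
print(dominant_queue(92.0, 35.0))
print(dominant_queue(40.0, 88.0))
print(dominant_queue(60.0, 55.0))
```

A "fragment" result suggests looking at fragment shading cost and overdraw first, while "non-fragment" points toward vertex and compute work.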
Mobile-specific considerations
Mobile devices have specific concerns that you might want to address as part of your analysis:
- Power Efficiency: Memory bandwidth has a significant power cost (roughly 80 to 100 mW per GB/s).
- Thermal Throttling: Sustained high GPU utilization might trigger frequency reduction.
- Cycle Budgets: Mobile GPUs have strict cycle budgets; optimize for efficiency over peak performance.
- Battery Life: Balance visual quality with energy consumption for longer battery life.
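To get a feel for the power cost of memory bandwidth, the 80 to 100 mW per GB/s figure above can be turned into a rough estimate. The sketch below is a back-of-the-envelope calculation only: the function name and the 5 GB/s sample bandwidth are assumptions chosen for illustration, and the actual cost depends on the device.

```python
# Approximate power cost of memory bandwidth, from the range quoted above.
MW_PER_GBPS_LOW = 80.0
MW_PER_GBPS_HIGH = 100.0


def bandwidth_power_mw(bandwidth_gb_per_s):
    """Return a (low, high) estimate in milliwatts for a given bandwidth."""
    low = bandwidth_gb_per_s * MW_PER_GBPS_LOW
    high = bandwidth_gb_per_s * MW_PER_GBPS_HIGH
    return low, high


# Hypothetical sustained bandwidth of 5 GB/s.
low, high = bandwidth_power_mw(5.0)
print(f"Estimated memory power: {low:.0f} to {high:.0f} mW")
```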
Cycle budget planning
For mobile Mali GPUs, Arm recommends setting cycle budgets based on the target resolution and frame rate:
- Maximum cycle budget: (GPU Frequency × Core Count) / (Resolution × Target FPS)
- Real-world budget: 0.85 × Maximum Budget, assuming 85% utilization efficiency
For example, a 3-core 500 MHz Mali GPU targeting 1080p60 has the following budget:
- Maximum budget: (500 MHz × 3 cores) / (1920 × 1080 × 60 fps) = ~12 cycles per pixel
- Real-world budget: 0.85 × 12 = ~10 cycles per pixel
This budget must cover all processing costs including vertex shading, fragment shading, and fixed-function operations.
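The budget formulas above can be expressed as a short calculation. The following is a minimal sketch rather than an official tool: the function and parameter names are hypothetical, and the 0.85 efficiency default is the 85% utilization assumption stated above.

```python
def cycle_budget_per_pixel(freq_hz, core_count, width, height, target_fps,
                           efficiency=0.85):
    """Return (maximum, real_world) GPU cycles available per pixel.

    maximum    = (frequency x core count) / (pixels per frame x target FPS)
    real_world = efficiency x maximum
    """
    pixels_per_second = width * height * target_fps
    maximum = (freq_hz * core_count) / pixels_per_second
    return maximum, efficiency * maximum


# The worked example above: 3 cores at 500 MHz targeting 1080p at 60 FPS.
maximum, real_world = cycle_budget_per_pixel(500e6, 3, 1920, 1080, 60)
print(f"Maximum budget: {maximum:.1f} cycles per pixel")        # 12.1
print(f"Real-world budget: {real_world:.1f} cycles per pixel")  # 10.2
```

Changing the resolution or frame rate arguments shows how quickly the budget shrinks: doubling the target frame rate halves the cycles available for each pixel.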