    Loop vectorization

    Burst uses loop vectorization to improve the performance of your code. Instead of processing a single value per loop iteration, the generated code processes multiple values at the same time. For example:

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static unsafe void Bar([NoAlias] int* a, [NoAlias] int* b, int count)
    {
        for (var i = 0; i < count; i++)
        {
            a[i] += b[i];
        }
    }
    
    public static unsafe void Foo(int count)
    {
        var a = stackalloc int[count];
        var b = stackalloc int[count];
    
        Bar(a, b, count);
    }
    

    Burst converts the scalar loop in Bar into a vectorized loop: instead of processing a single value per iteration, the generated code processes multiple values at the same time, which makes the loop faster.

    This is the x64 assembly that Burst generates, targeting AVX2, for the loop in Bar above:

    .LBB1_4:
        vmovdqu    ymm0, ymmword ptr [rdx + 4*rax]
        vmovdqu    ymm1, ymmword ptr [rdx + 4*rax + 32]
        vmovdqu    ymm2, ymmword ptr [rdx + 4*rax + 64]
        vmovdqu    ymm3, ymmword ptr [rdx + 4*rax + 96]
        vpaddd     ymm0, ymm0, ymmword ptr [rcx + 4*rax]
        vpaddd     ymm1, ymm1, ymmword ptr [rcx + 4*rax + 32]
        vpaddd     ymm2, ymm2, ymmword ptr [rcx + 4*rax + 64]
        vpaddd     ymm3, ymm3, ymmword ptr [rcx + 4*rax + 96]
        vmovdqu    ymmword ptr [rcx + 4*rax], ymm0
        vmovdqu    ymmword ptr [rcx + 4*rax + 32], ymm1
        vmovdqu    ymmword ptr [rcx + 4*rax + 64], ymm2
        vmovdqu    ymmword ptr [rcx + 4*rax + 96], ymm3
        add        rax, 32
        cmp        r8, rax
        jne        .LBB1_4
    

    Burst has unrolled and vectorized the loop into four vpaddd instructions, which calculate eight integer additions each, for a total of 32 integer additions per loop iteration.
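
    To make the arithmetic concrete: a 256-bit ymm register holds 256 / 32 = 8 values of type int, and the loop body contains four vpaddd instructions, so each iteration covers 4 × 8 = 32 elements, which matches the add rax, 32 increment. The following is a conceptual sketch in plain C# of that loop shape, not code Burst emits or code you need to write. The class and method names are hypothetical, and the scalar remainder loop is an assumption, because the assembly listing above shows only the main loop body.

```csharp
using System;

// Hypothetical illustration only: Burst performs this transformation for you.
public static class VectorizedShape
{
    // Scalar reference: one addition per iteration, as written in Bar.
    public static void AddScalar(int[] a, int[] b, int count)
    {
        for (var i = 0; i < count; i++)
        {
            a[i] += b[i];
        }
    }

    // Conceptual shape of the unrolled, vectorized loop.
    public static void AddBlocked(int[] a, int[] b, int count)
    {
        const int Lanes = 8;             // 256-bit register / 32-bit int
        const int Unroll = 4;            // four vpaddd instructions per iteration
        const int Step = Lanes * Unroll; // 32 elements, as in "add rax, 32"

        var i = 0;
        for (; i + Step <= count; i += Step)
        {
            for (var u = 0; u < Unroll; u++)
            {
                // Each pass over the lanes stands in for a single vpaddd.
                for (var lane = 0; lane < Lanes; lane++)
                {
                    var k = i + u * Lanes + lane;
                    a[k] += b[k];
                }
            }
        }

        // Assumed scalar remainder for counts that are not a multiple of 32.
        for (; i < count; i++)
        {
            a[i] += b[i];
        }
    }
}
```

    Both methods produce the same result for any count. The difference is that in the vectorized version each group of eight additions is a single instruction.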

    Loop vectorization intrinsics

    Burst includes experimental intrinsics to express loop vectorization assumptions: Loop.ExpectVectorized and Loop.ExpectNotVectorized. Burst validates these assumptions at compile time. This is useful because a small code change can silently break auto-vectorization. For example, if you introduce a branch to the code:

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static unsafe void Bar([NoAlias] int* a, [NoAlias] int* b, int count)
    {
        for (var i = 0; i < count; i++)
        {
            if (a[i] > b[i])
            {
                break;
            }
    
            a[i] += b[i];
        }
    }
    

    This changes the assembly to the following:

    .LBB1_3:
        mov        r9d, dword ptr [rcx + 4*r10]
        mov        eax, dword ptr [rdx + 4*r10]
        cmp        r9d, eax
        jg         .LBB1_4
        add        eax, r9d
        mov        dword ptr [rcx + 4*r10], eax
        inc        r10
        cmp        r8, r10
        jne        .LBB1_3
    

    This isn't ideal because the loop is scalar and performs only one integer addition per loop iteration. This kind of regression can be difficult to spot in your code, so use the experimental intrinsics Loop.ExpectVectorized and Loop.ExpectNotVectorized to state your expectations and have Burst check them at compile time.

    Because the intrinsics are experimental, you must define the UNITY_BURST_EXPERIMENTAL_LOOP_INTRINSICS preprocessor symbol to enable them.

    The following example shows the original Bar example with the Loop.ExpectVectorized intrinsic:

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static unsafe void Bar([NoAlias] int* a, [NoAlias] int* b, int count)
    {
        for (var i = 0; i < count; i++)
        {
            Unity.Burst.CompilerServices.Loop.ExpectVectorized();
    
            a[i] += b[i];
        }
    }
    

    Burst then validates at compile time whether the loop is vectorized. If the loop isn't vectorized, Burst emits a compiler error. The following example produces an error:

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static unsafe void Bar([NoAlias] int* a, [NoAlias] int* b, int count)
    {
        for (var i = 0; i < count; i++)
        {
            Unity.Burst.CompilerServices.Loop.ExpectVectorized();
    
            if (a[i] > b[i])
            {
                break;
            }
    
            a[i] += b[i];
        }
    }
    

    Burst emits the following error at compile-time:

    LoopIntrinsics.cs(6,9): Burst error BC1321: The loop is not vectorized where it was expected that it is vectorized.
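
    The complementary intrinsic, Loop.ExpectNotVectorized, is named above but not demonstrated. As a minimal sketch, assuming the early-exit loop is meant to stay scalar, it could be annotated as follows so that Burst reports an error if the loop is ever vectorized. The containing class name is hypothetical, and the code assumes that UNITY_BURST_EXPERIMENTAL_LOOP_INTRINSICS is defined and that Bar is called from Burst-compiled code.

```csharp
// Assumes UNITY_BURST_EXPERIMENTAL_LOOP_INTRINSICS is defined for the project.
using System.Runtime.CompilerServices;
using Unity.Burst;

public static unsafe class LoopExpectations // hypothetical container class
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void Bar([NoAlias] int* a, [NoAlias] int* b, int count)
    {
        for (var i = 0; i < count; i++)
        {
            // States the expectation that this loop stays scalar.
            Unity.Burst.CompilerServices.Loop.ExpectNotVectorized();

            // The data-dependent early exit is what made the loop scalar
            // in the assembly listing shown earlier.
            if (a[i] > b[i])
            {
                break;
            }

            a[i] += b[i];
        }
    }
}
```

    As in the earlier examples, the intrinsic sits directly in the loop body rather than inside the if statement.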

    Important

    These intrinsics don't work inside if statements. Burst doesn't detect this misuse, so you won't see a compile-time error if you place an intrinsic inside an if statement.

    Additional resources

    • Burst intrinsics
    Copyright © 2025 Unity Technologies — Trademarks and terms of use