Class X86.Sse
SSE intrinsics
Namespace: Unity.Burst.Intrinsics
Assembly: Unity.Burst.dll
Syntax
public static class X86.Sse
Properties
Name | Description |
---|---|
IsSseSupported | Evaluates to true at compile time if SSE intrinsics are supported. |
Methods
Name | Description |
---|---|
SHUFFLE(int, int, int, int) | Return a shuffle immediate suitable for use with shuffle_ps and similar instructions. |
TRANSPOSE4_PS(ref v128, ref v128, ref v128, ref v128) | Transposes a 4x4 matrix of single precision floating point values (_MM_TRANSPOSE4_PS). |
add_ps(v128, v128) | Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". |
add_ss(v128, v128) | Add the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". |
and_ps(v128, v128) | Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". |
andnot_ps(v128, v128) | Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst". |
cmpeq_ps(v128, v128) | Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for equality, and store the results in "dst". |
cmpeq_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for equality, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". |
cmpge_ps(v128, v128) | Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for greater-than-or-equal, and store the results in "dst". |
cmpge_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for greater-than-or-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". |
cmpgt_ps(v128, v128) | Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for greater-than, and store the results in "dst". |
cmpgt_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for greater-than, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". |
cmple_ps(v128, v128) | Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than-or-equal, and store the results in "dst". |
cmple_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for less-than-or-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". |
cmplt_ps(v128, v128) | Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than, and store the results in "dst". |
cmplt_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for less-than, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". |
cmpneq_ps(v128, v128) | Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-equal, and store the results in "dst". |
cmpneq_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". |
cmpnge_ps(v128, v128) | Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than-or-equal, and store the results in "dst". |
cmpnge_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than-or-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". |
cmpngt_ps(v128, v128) | Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than, and store the results in "dst". |
cmpngt_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". |
cmpnle_ps(v128, v128) | Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, and store the results in "dst". |
cmpnle_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". |
cmpnlt_ps(v128, v128) | Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than, and store the results in "dst". |
cmpnlt_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". |
cmpord_ps(v128, v128) | Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if neither is NaN, and store the results in "dst". |
cmpord_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" to see if neither is NaN, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". |
cmpunord_ps(v128, v128) | Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if either is NaN, and store the results in "dst". |
cmpunord_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" to see if either is NaN, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". |
comieq_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for equality, and return the boolean result (0 or 1). |
comige_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1). |
comigt_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for greater-than, and return the boolean result (0 or 1). |
comile_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1). |
comilt_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for less-than, and return the boolean result (0 or 1). |
comineq_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for not-equal, and return the boolean result (0 or 1). |
cvt_ss2si(v128) | Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst". Rounds to the nearest integer, with midpoint values rounded to even. |
cvtsi32_ss(v128, int) | Convert the 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". |
cvtsi64_ss(v128, long) | Convert the 64-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". |
cvtss_f32(v128) | Copy the lower single-precision (32-bit) floating-point element of "a" to "dst". |
cvtss_si32(v128) | Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst". |
cvtss_si64(v128) | Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst". Rounds to the nearest integer, with midpoint values rounded to even. |
cvtt_ss2si(v128) | Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst". |
cvttss_si32(v128) | Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst". |
cvttss_si64(v128) | Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst". |
div_ps(v128, v128) | Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst". |
div_ss(v128, v128) | Divide the lower single-precision (32-bit) floating-point element in "a" by the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". |
load_ps(void*) | Load 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from memory into "dst". |
loadu_ps(void*) | Load 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary. |
loadu_si16(void*) | Load an unaligned 16-bit integer from memory into the first element of "dst". |
loadu_si64(void*) | Load an unaligned 64-bit integer from memory into the first element of "dst". |
max_ps(v128, v128) | Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". |
max_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". |
min_ps(v128, v128) | Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". |
min_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". |
move_ss(v128, v128) | Move the lower single-precision (32-bit) floating-point element from "b" to the lower element of "dst", and copy the upper 3 elements from "a" to the upper elements of "dst". |
movehl_ps(v128, v128) | Move the upper 2 single-precision (32-bit) floating-point elements from "b" to the lower 2 elements of "dst", and copy the upper 2 elements from "a" to the upper 2 elements of "dst". |
movelh_ps(v128, v128) | Move the lower 2 single-precision (32-bit) floating-point elements from "b" to the upper 2 elements of "dst", and copy the lower 2 elements from "a" to the lower 2 elements of "dst". |
movemask_ps(v128) | Set each bit of mask "dst" based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in "a". |
mul_ps(v128, v128) | Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". |
mul_ss(v128, v128) | Multiply the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". |
or_ps(v128, v128) | Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". |
rcp_ps(v128) | Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12. |
rcp_ss(v128) | Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 1.5*2^-12. |
rsqrt_ps(v128) | Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12. |
rsqrt_ss(v128) | Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 1.5*2^-12. |
set1_ps(float) | Broadcast single-precision (32-bit) floating-point value "a" to all elements of "dst". |
set_ps(float, float, float, float) | Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values. |
set_ps1(float) | Broadcast single-precision (32-bit) floating-point value "a" to all elements of "dst". |
set_ss(float) | Copy single-precision (32-bit) floating-point element "a" to the lower element of "dst", and zero the upper 3 elements. |
setr_ps(float, float, float, float) | Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values in reverse order. |
setzero_ps() | Return vector of type v128 with all elements set to zero. |
shuffle_ps(v128, v128, int) | Shuffle single-precision (32-bit) floating-point elements from "a" and "b" using the control in "imm8", and store the results in "dst". The lower 2 elements of "dst" are selected from "a" and the upper 2 from "b". |
sqrt_ps(v128) | Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". |
sqrt_ss(v128) | Compute the square root of the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". |
store_ps(void*, v128) | Store 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a" into memory. |
storeu_ps(void*, v128) | Store 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. |
storeu_si16(void*, v128) | Store the 16-bit integer from the first element of "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. |
storeu_si64(void*, v128) | Store the 64-bit integer from the first element of "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. |
stream_ps(void*, v128) | Store 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a" into memory using a non-temporal memory hint. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception will be generated. |
sub_ps(v128, v128) | Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". |
sub_ss(v128, v128) | Subtract the lower single-precision (32-bit) floating-point element in "b" from the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". |
ucomieq_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for equality, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. |
ucomige_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. |
ucomigt_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for greater-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. |
ucomile_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. |
ucomilt_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for less-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. |
ucomineq_ss(v128, v128) | Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for not-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. |
unpackhi_ps(v128, v128) | Unpack and interleave single-precision (32-bit) floating-point elements from the high half of "a" and "b", and store the results in "dst". |
unpacklo_ps(v128, v128) | Unpack and interleave single-precision (32-bit) floating-point elements from the low half of "a" and "b", and store the results in "dst". |
xor_ps(v128, v128) | Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". |
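
The lane-level behaviour described in the table can be hard to picture from prose alone, so here is a minimal reference model in plain Python. It is not the Burst API and does not use SIMD: a vector is modelled as a tuple of four Python floats, with element 0 as the lowest lane. The function names mirror the table entries for readability only. The parameter names (`z`, `y`, `x`, `w`) and the helper `f32` are hypothetical, and the model assumes that `SHUFFLE` packs its arguments the same way as Intel's `_MM_SHUFFLE` macro (first argument in the highest two bits).

```python
import struct


def f32(x):
    """Round a Python float to single precision (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def SHUFFLE(z, y, x, w):
    """Pack four 2-bit lane selectors into one 8-bit immediate.

    Assumption: same layout as Intel's _MM_SHUFFLE, so `w` selects the
    lowest result lane and `z` the highest.
    """
    return ((z & 3) << 6) | ((y & 3) << 4) | ((x & 3) << 2) | (w & 3)


def shuffle_ps(a, b, imm8):
    """Lower two result lanes come from `a`, upper two from `b`."""
    return (
        a[imm8 & 3],
        a[(imm8 >> 2) & 3],
        b[(imm8 >> 4) & 3],
        b[(imm8 >> 6) & 3],
    )


def add_ss(a, b):
    """Scalar (_ss) pattern: operate on lane 0, copy lanes 1..3 from `a`."""
    return (f32(a[0] + b[0]), a[1], a[2], a[3])


def movemask_ps(a):
    """Collect the sign bit of each lane into bits 0..3 of an integer."""
    mask = 0
    for lane, value in enumerate(a):
        bits = struct.unpack("<I", struct.pack("<f", value))[0]
        mask |= (bits >> 31) << lane
    return mask


a = (1.0, 2.0, 3.0, 4.0)
b = (5.0, 6.0, 7.0, 8.0)

identity = shuffle_ps(a, b, SHUFFLE(3, 2, 1, 0))   # (1.0, 2.0, 7.0, 8.0)
reverse = shuffle_ps(a, a, SHUFFLE(0, 1, 2, 3))    # (4.0, 3.0, 2.0, 1.0)
broadcast = shuffle_ps(a, a, SHUFFLE(0, 0, 0, 0))  # (1.0, 1.0, 1.0, 1.0)
scalar_sum = add_ss((1.5, 2.0, 3.0, 4.0), (0.25, 9.0, 9.0, 9.0))
signs = movemask_ps((-1.0, 2.0, -0.0, 4.0))        # 0b0101
```

Passing the same vector as both operands of `shuffle_ps` gives an arbitrary permutation of a single vector, which is how the `reverse` and `broadcast` cases above work. Note that `movemask_ps` reads the sign bit rather than comparing against zero, so `-0.0` sets its bit.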