Class X86.Avx
AVX intrinsics
Namespace: Unity.Burst.Intrinsics
Assembly: Unity.Burst.dll
Syntax
public static class X86.Avx
Properties
Name | Description |
---|---|
IsAvxSupported | Evaluates to true at compile time if AVX intrinsics are supported. |
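A minimal sketch of how IsAvxSupported might guard an AVX code path, assuming the method is called from Burst-compiled code. The class name AvxAddExample and the helper Add are hypothetical and chosen only for illustration; the intrinsics used (mm256_loadu_ps, mm256_add_ps, mm256_storeu_ps) are the ones listed under Methods on this page.

```csharp
using Unity.Burst.Intrinsics;
using static Unity.Burst.Intrinsics.X86.Avx;

public static unsafe class AvxAddExample
{
    // Hypothetical helper: dst[i] = a[i] + b[i] for i in [0, count).
    public static void Add(float* dst, float* a, float* b, int count)
    {
        int i = 0;

        if (IsAvxSupported)
        {
            // Process 8 floats (256 bits) per iteration using unaligned loads and stores.
            for (; i + 8 <= count; i += 8)
            {
                v256 va = mm256_loadu_ps(a + i);
                v256 vb = mm256_loadu_ps(b + i);
                mm256_storeu_ps(dst + i, mm256_add_ps(va, vb));
            }
        }

        // Scalar path: handles the remaining elements, or everything when AVX is unavailable.
        for (; i < count; i++)
        {
            dst[i] = a[i] + b[i];
        }
    }
}
```

Because the property evaluates at compile time, the branch that does not apply to the target is expected to be removed rather than tested at run time.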
Methods
Name | Description |
---|---|
broadcast_ss(void*) | Broadcast a single-precision (32-bit) floating-point element from memory to all elements of dst. |
cmp_pd(v128, v128, int) | Compare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in dst. |
cmp_ps(v128, v128, int) | Compare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in dst. |
cmp_sd(v128, v128, int) | Compare the lower double-precision (64-bit) floating-point element in a and b based on the comparison operand specified by imm8, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. |
cmp_ss(v128, v128, int) | Compare the lower single-precision (32-bit) floating-point element in a and b based on the comparison operand specified by imm8, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst. |
maskload_pd(void*, v128) | Load packed double-precision (64-bit) floating-point elements from memory into dst using mask (elements are zeroed out when the high bit of the corresponding element is not set). |
maskload_ps(void*, v128) | Load packed single-precision (32-bit) floating-point elements from memory into dst using mask (elements are zeroed out when the high bit of the corresponding element is not set). |
maskstore_pd(void*, v128, v128) | Store packed double-precision (64-bit) floating-point elements from a into memory using mask. |
maskstore_ps(void*, v128, v128) | Store packed single-precision (32-bit) floating-point elements from a into memory using mask. |
mm256_add_pd(v256, v256) | Add packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst. |
mm256_add_ps(v256, v256) | Add packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst. |
mm256_addsub_pd(v256, v256) | Alternately subtract and add packed double-precision (64-bit) floating-point elements in b from/to packed elements in a (even-indexed elements are subtracted, odd-indexed elements are added), and store the results in dst. |
mm256_addsub_ps(v256, v256) | Alternately subtract and add packed single-precision (32-bit) floating-point elements in b from/to packed elements in a (even-indexed elements are subtracted, odd-indexed elements are added), and store the results in dst. |
mm256_and_pd(v256, v256) | Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst. |
mm256_and_ps(v256, v256) | Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst. |
mm256_andnot_pd(v256, v256) | Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in a and then AND with b, and store the results in dst. |
mm256_andnot_ps(v256, v256) | Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in a and then AND with b, and store the results in dst. |
mm256_blend_pd(v256, v256, int) | Blend packed double-precision (64-bit) floating-point elements from a and b using control mask imm8, and store the results in dst. |
mm256_blend_ps(v256, v256, int) | Blend packed single-precision (32-bit) floating-point elements from a and b using control mask imm8, and store the results in dst. |
mm256_blendv_pd(v256, v256, v256) | Blend packed double-precision (64-bit) floating-point elements from a and b using mask, and store the results in dst. |
mm256_blendv_ps(v256, v256, v256) | Blend packed single-precision (32-bit) floating-point elements from a and b using mask, and store the results in dst. |
mm256_broadcast_pd(void*) | Broadcast 128 bits from memory (composed of 2 packed double-precision (64-bit) floating-point elements) to all elements of dst. |
mm256_broadcast_ps(void*) | Broadcast 128 bits from memory (composed of 4 packed single-precision (32-bit) floating-point elements) to all elements of dst. |
mm256_broadcast_sd(void*) | Broadcast a double-precision (64-bit) floating-point element from memory to all elements of dst. |
mm256_broadcast_ss(void*) | Broadcast a single-precision (32-bit) floating-point element from memory to all elements of dst. |
mm256_castpd128_pd256(v128) | For compatibility with C++ code only. This is a no-op in Burst. |
mm256_castpd256_pd128(v256) | For compatibility with C++ code only. This is a no-op in Burst. |
mm256_castpd_ps(v256) | For compatibility with C++ code only. This is a no-op in Burst. |
mm256_castpd_si256(v256) | For compatibility with C++ code only. This is a no-op in Burst. |
mm256_castps128_ps256(v128) | For compatibility with C++ code only. This is a no-op in Burst. |
mm256_castps256_ps128(v256) | For compatibility with C++ code only. This is a no-op in Burst. |
mm256_castps_pd(v256) | For compatibility with C++ code only. This is a no-op in Burst. |
mm256_castps_si256(v256) | For compatibility with C++ code only. This is a no-op in Burst. |
mm256_castsi128_si256(v128) | For compatibility with C++ code only. This is a no-op in Burst. |
mm256_castsi256_pd(v256) | For compatibility with C++ code only. This is a no-op in Burst. |
mm256_castsi256_ps(v256) | For compatibility with C++ code only. This is a no-op in Burst. |
mm256_castsi256_si128(v256) | For compatibility with C++ code only. This is a no-op in Burst. |
mm256_ceil_pd(v256) | Round the packed double-precision (64-bit) floating-point elements in a up to an integer value, and store the results as packed double-precision floating-point elements in dst. |
mm256_ceil_ps(v256) | Round the packed single-precision (32-bit) floating-point elements in a up to an integer value, and store the results as packed single-precision floating-point elements in dst. |
mm256_cmp_pd(v256, v256, int) | Compare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in dst. |
mm256_cmp_ps(v256, v256, int) | Compare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in dst. |
mm256_cvtepi32_pd(v128) | Convert packed 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst. |
mm256_cvtepi32_ps(v256) | Convert packed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst. |
mm256_cvtpd_epi32(v256) | Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst. |
mm256_cvtpd_ps(v256) | Convert packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst. |
mm256_cvtps_epi32(v256) | Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst. |
mm256_cvtps_pd(v128) | Convert packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst. |
mm256_cvtss_f32(v256) | Copy the lower single-precision (32-bit) floating-point element of a to dst. |
mm256_cvttpd_epi32(v256) | Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst. |
mm256_cvttps_epi32(v256) | Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst. |
mm256_div_pd(v256, v256) | Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst. |
mm256_div_ps(v256, v256) | Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst. |
mm256_dp_ps(v256, v256, int) | Conditionally multiply the packed single-precision (32-bit) floating-point elements in a and b using the high 4 bits in imm8, sum the four products within each 128-bit lane, and conditionally store the sum in dst using the low 4 bits of imm8. |
mm256_extract_epi32(v256, int) | Extract a 32-bit integer from a, selected with index (which must be a constant), and store the result in dst. |
mm256_extract_epi64(v256, int) | Extract a 64-bit integer from a, selected with index (which must be a constant), and store the result in dst. |
mm256_extractf128_pd(v256, int) | Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm8, and store the result in dst. |
mm256_extractf128_ps(v256, int) | Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm8, and store the result in dst. |
mm256_extractf128_si256(v256, int) | Extract 128 bits (composed of integer data) from a, selected with imm8, and store the result in dst. |
mm256_floor_pd(v256) | Round the packed double-precision (64-bit) floating-point elements in a down to an integer value, and store the results as packed double-precision floating-point elements in dst. |
mm256_floor_ps(v256) | Round the packed single-precision (32-bit) floating-point elements in a down to an integer value, and store the results as packed single-precision floating-point elements in dst. |
mm256_hadd_pd(v256, v256) | Horizontally add adjacent pairs of double-precision (64-bit) floating-point elements in a and b, and pack the results in dst. |
mm256_hadd_ps(v256, v256) | Horizontally add adjacent pairs of single-precision (32-bit) floating-point elements in a and b, and pack the results in dst. |
mm256_hsub_pd(v256, v256) | Horizontally subtract adjacent pairs of double-precision (64-bit) floating-point elements in a and b, and pack the results in dst. |
mm256_hsub_ps(v256, v256) | Horizontally subtract adjacent pairs of single-precision (32-bit) floating-point elements in a and b, and pack the results in dst. |
mm256_insert_epi16(v256, int, int) | Copy a to dst, and insert the 16-bit integer i into dst at the location specified by index (which must be a constant). |
mm256_insert_epi32(v256, int, int) | Copy a to dst, and insert the 32-bit integer i into dst at the location specified by index (which must be a constant). |
mm256_insert_epi64(v256, long, int) | Copy a to dst, and insert the 64-bit integer i into dst at the location specified by index (which must be a constant). |
mm256_insert_epi8(v256, int, int) | Copy a to dst, and insert the 8-bit integer i into dst at the location specified by index (which must be a constant). |
mm256_insertf128_pd(v256, v128, int) | Copy a to dst, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into dst at the location specified by imm8. |
mm256_insertf128_ps(v256, v128, int) | Copy a to dst, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into dst at the location specified by imm8. |
mm256_insertf128_si256(v256, v128, int) | Copy a to dst, then insert 128 bits of integer data from b into dst at the location specified by imm8. |
mm256_lddqu_si256(void*) | Load 256-bits of integer data from unaligned memory into dst. This intrinsic may perform better than mm256_loadu_si256 when the data crosses a cache line boundary. |
mm256_load_pd(void*) | Load 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into dst. |
mm256_load_ps(void*) | Load 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into dst. |
mm256_load_si256(void*) | Load 256-bits (composed of 8 packed 32-bit integer elements) from memory into dst. |
mm256_loadu2_m128(void*, void*) | Load two 128-bit values (composed of 4 packed single-precision (32-bit) floating-point elements) from memory, and combine them into a 256-bit value in dst. hiaddr and loaddr do not need to be aligned on any particular boundary. |
mm256_loadu2_m128d(void*, void*) | Load two 128-bit values (composed of 2 packed double-precision (64-bit) floating-point elements) from memory, and combine them into a 256-bit value in dst. hiaddr and loaddr do not need to be aligned on any particular boundary. |
mm256_loadu2_m128i(void*, void*) | Load two 128-bit values (composed of integer data) from memory, and combine them into a 256-bit value in dst. hiaddr and loaddr do not need to be aligned on any particular boundary. |
mm256_loadu_pd(void*) | Load 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into dst. |
mm256_loadu_ps(void*) | Load 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into dst. |
mm256_loadu_si256(void*) | Load 256-bits (composed of 8 packed 32-bit integer elements) from memory into dst. |
mm256_maskload_pd(void*, v256) | Load packed double-precision (64-bit) floating-point elements from memory into dst using mask (elements are zeroed out when the high bit of the corresponding element is not set). |
mm256_maskload_ps(void*, v256) | Load packed single-precision (32-bit) floating-point elements from memory into dst using mask (elements are zeroed out when the high bit of the corresponding element is not set). |
mm256_maskstore_pd(void*, v256, v256) | Store packed double-precision (64-bit) floating-point elements from a into memory using mask. |
mm256_maskstore_ps(void*, v256, v256) | Store packed single-precision (32-bit) floating-point elements from a into memory using mask. |
mm256_max_pd(v256, v256) | Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in dst. |
mm256_max_ps(v256, v256) | Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in dst. |
mm256_min_pd(v256, v256) | Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in dst. |
mm256_min_ps(v256, v256) | Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in dst. |
mm256_movedup_pd(v256) | Duplicate even-indexed double-precision (64-bit) floating-point elements from a, and store the results in dst. |
mm256_movehdup_ps(v256) | Duplicate odd-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst. |
mm256_moveldup_ps(v256) | Duplicate even-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst. |
mm256_movemask_pd(v256) | Set each bit of mask dst based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in a. |
mm256_movemask_ps(v256) | Set each bit of mask dst based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in a. |
mm256_mul_pd(v256, v256) | Multiply packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst. |
mm256_mul_ps(v256, v256) | Multiply packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst. |
mm256_or_pd(v256, v256) | Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst. |
mm256_or_ps(v256, v256) | Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst. |
mm256_permute2f128_pd(v256, v256, int) | Shuffle 128-bits (composed of 2 packed double-precision (64-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst. |
mm256_permute2f128_ps(v256, v256, int) | Shuffle 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst. |
mm256_permute2f128_si256(v256, v256, int) | Shuffle 128-bits (composed of integer data) selected by imm8 from a and b, and store the results in dst. |
mm256_permute_pd(v256, int) | Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst. |
mm256_permute_ps(v256, int) | Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst. |
mm256_permutevar_pd(v256, v256) | Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst. |
mm256_permutevar_ps(v256, v256) | Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst. |
mm256_rcp_ps(v256) | Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12. |
mm256_round_pd(v256, int) | Round the packed double-precision (64-bit) floating-point elements in a using the rounding parameter, and store the results as packed double-precision floating-point elements in dst. |
mm256_round_ps(v256, int) | Round the packed single-precision (32-bit) floating-point elements in a using the rounding parameter, and store the results as packed single-precision floating-point elements in dst. |
mm256_rsqrt_ps(v256) | Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12. |
mm256_set1_epi16(short) | Broadcast 16-bit integer a to all elements of dst. This intrinsic may generate the vpbroadcastw instruction. |
mm256_set1_epi32(int) | Broadcast 32-bit integer a to all elements of dst. This intrinsic may generate the vpbroadcastd instruction. |
mm256_set1_epi64x(long) | Broadcast 64-bit integer a to all elements of dst. This intrinsic may generate the vpbroadcastq instruction. |
mm256_set1_epi8(byte) | Broadcast 8-bit integer a to all elements of dst. This intrinsic may generate the vpbroadcastb instruction. |
mm256_set1_pd(double) | Broadcast double-precision (64-bit) floating-point value a to all elements of dst. |
mm256_set1_ps(float) | Broadcast single-precision (32-bit) floating-point value a to all elements of dst. |
mm256_set_epi16(short, short, short, short, short, short, short, short, short, short, short, short, short, short, short, short) | Set packed short elements in dst with the supplied values. |
mm256_set_epi32(int, int, int, int, int, int, int, int) | Set packed int elements in dst with the supplied values. |
mm256_set_epi64x(long, long, long, long) | Set packed 64-bit integers in dst with the supplied values. |
mm256_set_epi8(byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte) | Set packed byte elements in dst with the supplied values. |
mm256_set_m128(v128, v128) | Set packed v256 vector with the supplied values. |
mm256_set_m128d(v128, v128) | Set packed v256 vector with the supplied values. |
mm256_set_m128i(v128, v128) | Set packed v256 vector with the supplied values. |
mm256_set_pd(double, double, double, double) | Set packed double-precision (64-bit) floating-point elements in dst with the supplied values. |
mm256_set_ps(float, float, float, float, float, float, float, float) | Set packed single-precision (32-bit) floating-point elements in dst with the supplied values. |
mm256_setr_epi16(short, short, short, short, short, short, short, short, short, short, short, short, short, short, short, short) | Set packed short elements in dst with the supplied values in reverse order. |
mm256_setr_epi32(int, int, int, int, int, int, int, int) | Set packed int elements in dst with the supplied values in reverse order. |
mm256_setr_epi64x(long, long, long, long) | Set packed 64-bit integers in dst with the supplied values in reverse order. |
mm256_setr_epi8(byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte) | Set packed byte elements in dst with the supplied values in reverse order. |
mm256_setr_m128(v128, v128) | Set packed v256 vector with the supplied values in reverse order. |
mm256_setr_m128d(v128, v128) | Set packed v256 vector with the supplied values in reverse order. |
mm256_setr_m128i(v128, v128) | Set packed v256 vector with the supplied values in reverse order. |
mm256_setr_pd(double, double, double, double) | Set packed double-precision (64-bit) floating-point elements in dst with the supplied values in reverse order. |
mm256_setr_ps(float, float, float, float, float, float, float, float) | Set packed single-precision (32-bit) floating-point elements in dst with the supplied values in reverse order. |
mm256_setzero_pd() | Return a v256 vector with all elements set to zero. |
mm256_setzero_ps() | Return a v256 vector with all elements set to zero. |
mm256_setzero_si256() | Return a v256 vector with all elements set to zero. |
mm256_shuffle_pd(v256, v256, int) | Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in imm8, and store the results in dst. |
mm256_shuffle_ps(v256, v256, int) | Shuffle single-precision (32-bit) floating-point elements from a and b within 128-bit lanes using the control in imm8, and store the results in dst. |
mm256_sqrt_pd(v256) | Compute the square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst. |
mm256_sqrt_ps(v256) | Compute the square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. |
mm256_store_pd(void*, v256) | Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory. |
mm256_store_ps(void*, v256) | Store 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a into memory. |
mm256_store_si256(void*, v256) | Store 256-bits (composed of 8 packed 32-bit integer elements) from a into memory. |
mm256_storeu2_m128(void*, void*, v256) | Store the high and low 128-bit halves (each composed of 4 packed single-precision (32-bit) floating-point elements) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary. |
mm256_storeu2_m128d(void*, void*, v256) | Store the high and low 128-bit halves (each composed of 2 packed double-precision (64-bit) floating-point elements) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary. |
mm256_storeu2_m128i(void*, void*, v256) | Store the high and low 128-bit halves (each composed of integer data) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary. |
mm256_storeu_pd(void*, v256) | Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory. |
mm256_storeu_ps(void*, v256) | Store 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a into memory. |
mm256_storeu_si256(void*, v256) | Store 256-bits (composed of 8 packed 32-bit integer elements) from a into memory. |
mm256_stream_pd(void*, v256) | Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory using a non-temporal memory hint. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated. |
mm256_stream_ps(void*, v256) | Store 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a into memory using a non-temporal memory hint. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated. |
mm256_stream_si256(void*, v256) | Store 256-bits of integer data from a into memory using a non-temporal memory hint. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated. |
mm256_sub_pd(v256, v256) | Subtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and store the results in dst. |
mm256_sub_ps(v256, v256) | Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and store the results in dst. |
mm256_testc_pd(v256, v256) | Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return the CF value. |
mm256_testc_ps(v256, v256) | Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return the CF value. |
mm256_testc_si256(v256, v256) | Compute the bitwise AND of 256 bits (representing integer data) in a and b, and set ZF to 1 if the result is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, and set CF to 1 if the result is zero, otherwise set CF to 0. Return the CF value. |
mm256_testnzc_pd(v256, v256) | Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0. |
mm256_testnzc_ps(v256, v256) | Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0. |
mm256_testnzc_si256(v256, v256) | Compute the bitwise AND of 256 bits (representing integer data) in a and b, and set ZF to 1 if the result is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, and set CF to 1 if the result is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0. |
mm256_testz_pd(v256, v256) | Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return the ZF value. |
mm256_testz_ps(v256, v256) | Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return the ZF value. |
mm256_testz_si256(v256, v256) | Compute the bitwise AND of 256 bits (representing integer data) in a and b, and set ZF to 1 if the result is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, and set CF to 1 if the result is zero, otherwise set CF to 0. Return the ZF value. |
mm256_undefined_pd() | Return a 256-bit vector with undefined contents. |
mm256_undefined_ps() | Return a 256-bit vector with undefined contents. |
mm256_undefined_si256() | Return a 256-bit vector with undefined contents. |
mm256_unpackhi_pd(v256, v256) | Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst. |
mm256_unpackhi_ps(v256, v256) | Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst. |
mm256_unpacklo_pd(v256, v256) | Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst. |
mm256_unpacklo_ps(v256, v256) | Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst. |
mm256_xor_pd(v256, v256) | Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst. |
mm256_xor_ps(v256, v256) | Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst. |
mm256_zeroall() | Zero the contents of all YMM registers. |
mm256_zeroupper() | Zero the upper 128 bits of all YMM registers; the lower 128-bits of the registers are unmodified. |
mm256_zextpd128_pd256(v128) | Casts vector of type v128 to type v256; the upper 128 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. |
mm256_zextps128_ps256(v128) | Casts vector of type v128 to type v256; the upper 128 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. |
mm256_zextsi128_si256(v128) | Casts vector of type v128 to type v256; the upper 128 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. |
permute_pd(v128, int) | Shuffle double-precision (64-bit) floating-point elements in a using the control in imm8, and store the results in dst. |
permute_ps(v128, int) | Shuffle single-precision (32-bit) floating-point elements in a using the control in imm8, and store the results in dst. |
permutevar_pd(v128, v128) | Shuffle double-precision (64-bit) floating-point elements in a using the control in b, and store the results in dst. |
permutevar_ps(v128, v128) | Shuffle single-precision (32-bit) floating-point elements in a using the control in b, and store the results in dst. |
testc_pd(v128, v128) | Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return the CF value. |
testc_ps(v128, v128) | Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return the CF value. |
testnzc_pd(v128, v128) | Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0. |
testnzc_ps(v128, v128) | Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0. |
testz_pd(v128, v128) | Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return the ZF value. |
testz_ps(v128, v128) | Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return the ZF value. |
undefined_pd() | Return a 128-bit vector with undefined contents. |
undefined_ps() | Return a 128-bit vector with undefined contents. |
undefined_si128() | Return a 128-bit vector with undefined contents. |
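The testz, testc and testnzc descriptions above are dense, so here is a scalar reference model of the sign-bit variants (the _ps forms), written in plain C# without any Burst types. It is only an illustrative sketch of the ZF/CF rules as stated in the table, not Burst's implementation; the class TestFlagsModel and its method names are hypothetical, and each array entry stands for the raw 32-bit pattern of one lane.

```csharp
using System;

public static class TestFlagsModel
{
    // Sign bit of a 32-bit lane.
    const uint SignBit = 0x80000000u;

    // Models testz_ps: returns ZF, which is 1 when no lane of (a AND b) has its sign bit set.
    public static int TestZ(uint[] a, uint[] b)
    {
        for (int i = 0; i < a.Length; i++)
        {
            if (((a[i] & b[i]) & SignBit) != 0) return 0;
        }
        return 1;
    }

    // Models testc_ps: returns CF, which is 1 when no lane of ((NOT a) AND b) has its sign bit set.
    public static int TestC(uint[] a, uint[] b)
    {
        for (int i = 0; i < a.Length; i++)
        {
            if (((~a[i] & b[i]) & SignBit) != 0) return 0;
        }
        return 1;
    }

    // Models testnzc_ps: returns 1 only when both ZF and CF are zero.
    public static int TestNzc(uint[] a, uint[] b)
    {
        return (TestZ(a, b) == 0 && TestC(a, b) == 0) ? 1 : 0;
    }
}
```

Read this way, testz reports that a and b share no set sign bits, testc reports that every sign bit set in b is also set in a, and testnzc reports that neither condition holds. The _pd forms follow the same rules with 64-bit lanes, and the _si256 forms apply them to every bit rather than only the sign bits.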