Name	Description
broadcast_ss(void*)	Broadcast a single-precision (32-bit) floating-point element from memory to all elements of dst.
cmp_pd(v128, v128, int)	Compare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in dst.
cmp_ps(v128, v128, int)	Compare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in dst.
cmp_sd(v128, v128, int)	Compare the lower double-precision (64-bit) floating-point element in a and b based on the comparison operand specified by imm8, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
cmp_ss(v128, v128, int)	Compare the lower single-precision (32-bit) floating-point element in a and b based on the comparison operand specified by imm8, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
maskload_pd(void*, v128)	Load packed double-precision (64-bit) floating-point elements from memory into dst using mask (elements are zeroed out when the high bit of the corresponding mask element is not set).
maskload_ps(void*, v128)	Load packed single-precision (32-bit) floating-point elements from memory into dst using mask (elements are zeroed out when the high bit of the corresponding mask element is not set).
maskstore_pd(void*, v128, v128)	Store packed double-precision (64-bit) floating-point elements from a into memory using mask.
maskstore_ps(void*, v128, v128)	Store packed single-precision (32-bit) floating-point elements from a into memory using mask.
mm256_add_pd(v256, v256)	Add packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst.
mm256_add_ps(v256, v256)	Add packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst.
mm256_addsub_pd(v256, v256)	Alternately add and subtract packed double-precision (64-bit) floating-point elements in a to/from packed elements in b, and store the results in dst.
mm256_addsub_ps(v256, v256)	Alternately add and subtract packed single-precision (32-bit) floating-point elements in a to/from packed elements in b, and store the results in dst.
mm256_and_pd(v256, v256)	Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst.
mm256_and_ps(v256, v256)	Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst.
mm256_andnot_pd(v256, v256)	Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in a and then AND with b, and store the results in dst.
mm256_andnot_ps(v256, v256)	Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in a and then AND with b, and store the results in dst.
mm256_blend_pd(v256, v256, int)	Blend packed double-precision (64-bit) floating-point elements from a and b using control mask imm8, and store the results in dst.
mm256_blend_ps(v256, v256, int)	Blend packed single-precision (32-bit) floating-point elements from a and b using control mask imm8, and store the results in dst.
mm256_blendv_pd(v256, v256, v256)	Blend packed double-precision (64-bit) floating-point elements from a and b using mask, and store the results in dst.
mm256_blendv_ps(v256, v256, v256)	Blend packed single-precision (32-bit) floating-point elements from a and b using mask, and store the results in dst.
mm256_broadcast_pd(void*)	Broadcast 128 bits from memory (composed of 2 packed double-precision (64-bit) floating-point elements) to all elements of dst.
mm256_broadcast_ps(void*)	Broadcast 128 bits from memory (composed of 4 packed single-precision (32-bit) floating-point elements) to all elements of dst.
mm256_broadcast_sd(void*)	Broadcast a double-precision (64-bit) floating-point element from memory to all elements of dst.
mm256_broadcast_ss(void*)	Broadcast a single-precision (32-bit) floating-point element from memory to all elements of dst.
mm256_castpd128_pd256(v128)	For compatibility with C++ code only. This is a no-op in Burst.
mm256_castpd256_pd128(v256)	For compatibility with C++ code only. This is a no-op in Burst.
mm256_castpd_ps(v256)	For compatibility with C++ code only. This is a no-op in Burst.
mm256_castpd_si256(v256)	For compatibility with C++ code only. This is a no-op in Burst.
mm256_castps128_ps256(v128)	For compatibility with C++ code only. This is a no-op in Burst.
mm256_castps256_ps128(v256)	For compatibility with C++ code only. This is a no-op in Burst.
mm256_castps_pd(v256)	For compatibility with C++ code only. This is a no-op in Burst.
mm256_castps_si256(v256)	For compatibility with C++ code only. This is a no-op in Burst.
mm256_castsi128_si256(v128)	For compatibility with C++ code only. This is a no-op in Burst.
mm256_castsi256_pd(v256)	For compatibility with C++ code only. This is a no-op in Burst.
mm256_castsi256_ps(v256)	For compatibility with C++ code only. This is a no-op in Burst.
mm256_castsi256_si128(v256)	For compatibility with C++ code only. This is a no-op in Burst.
mm256_ceil_pd(v256)	Round the packed double-precision (64-bit) floating-point elements in a up to an integer value, and store the results as packed double-precision floating-point elements in dst.
mm256_ceil_ps(v256)	Round the packed single-precision (32-bit) floating-point elements in a up to an integer value, and store the results as packed single-precision floating-point elements in dst.
mm256_cmp_pd(v256, v256, int)	Compare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in dst.
mm256_cmp_ps(v256, v256, int)	Compare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in dst.
mm256_cvtepi32_pd(v128)	Convert packed 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
mm256_cvtepi32_ps(v256)	Convert packed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
mm256_cvtpd_epi32(v256)	Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
mm256_cvtpd_ps(v256)	Convert packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
mm256_cvtps_epi32(v256)	Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
mm256_cvtps_pd(v128)	Convert packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
mm256_cvtss_f32(v256)	Copy the lower single-precision (32-bit) floating-point element of a to dst.
mm256_cvttpd_epi32(v256)	Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
mm256_cvttps_epi32(v256)	Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
mm256_div_pd(v256, v256)	Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst.
mm256_div_ps(v256, v256)	Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst.
mm256_dp_ps(v256, v256, int)	Conditionally multiply the packed single-precision (32-bit) floating-point elements in a and b using the high 4 bits in imm8, sum the four products, and conditionally store the sum in dst using the low 4 bits of imm8.
mm256_extract_epi32(v256, int)	Extract a 32-bit integer from a, selected with index (which must be a constant), and store the result in dst.
mm256_extract_epi64(v256, int)	Extract a 64-bit integer from a, selected with index (which must be a constant), and store the result in dst.
mm256_extractf128_pd(v256, int)	Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm8, and store the result in dst.
mm256_extractf128_ps(v256, int)	Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm8, and store the result in dst.
mm256_extractf128_si256(v256, int)	Extract 128 bits (composed of integer data) from a, selected with imm8, and store the result in dst.
mm256_floor_pd(v256)	Round the packed double-precision (64-bit) floating-point elements in a down to an integer value, and store the results as packed double-precision floating-point elements in dst.
mm256_floor_ps(v256)	Round the packed single-precision (32-bit) floating-point elements in a down to an integer value, and store the results as packed single-precision floating-point elements in dst.
mm256_hadd_pd(v256, v256)	Horizontally add adjacent pairs of double-precision (64-bit) floating-point elements in a and b, and pack the results in dst.
mm256_hadd_ps(v256, v256)	Horizontally add adjacent pairs of single-precision (32-bit) floating-point elements in a and b, and pack the results in dst.
mm256_hsub_pd(v256, v256)	Horizontally subtract adjacent pairs of double-precision (64-bit) floating-point elements in a and b, and pack the results in dst.
mm256_hsub_ps(v256, v256)	Horizontally subtract adjacent pairs of single-precision (32-bit) floating-point elements in a and b, and pack the results in dst.
mm256_insert_epi16(v256, int, int)	Copy a to dst, and insert the 16-bit integer i into dst at the location specified by index (which must be a constant).
mm256_insert_epi32(v256, int, int)	Copy a to dst, and insert the 32-bit integer i into dst at the location specified by index (which must be a constant).
mm256_insert_epi64(v256, long, int)	Copy a to dst, and insert the 64-bit integer i into dst at the location specified by index (which must be a constant).
mm256_insert_epi8(v256, int, int)	Copy a to dst, and insert the 8-bit integer i into dst at the location specified by index (which must be a constant).
mm256_insertf128_pd(v256, v128, int)	Copy a to dst, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into dst at the location specified by imm8.
mm256_insertf128_ps(v256, v128, int)	Copy a to dst, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into dst at the location specified by imm8.
mm256_insertf128_si256(v256, v128, int)	Copy a to dst, then insert 128 bits of integer data from b into dst at the location specified by imm8.
mm256_lddqu_si256(void*)	Load 256-bits of integer data from unaligned memory into dst. This intrinsic may perform better than mm256_loadu_si256 when the data crosses a cache line boundary.
mm256_load_pd(void*)	Load 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into dst.
mm256_load_ps(void*)	Load 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into dst.
mm256_load_si256(void*)	Load 256-bits (composed of 8 packed 32-bit integer elements) from memory into dst.
mm256_loadu2_m128(void*, void*)	Load two 128-bit values (composed of 4 packed single-precision (32-bit) floating-point elements) from memory, and combine them into a 256-bit value in dst. hiaddr and loaddr do not need to be aligned on any particular boundary.
mm256_loadu2_m128d(void*, void*)	Load two 128-bit values (composed of 2 packed double-precision (64-bit) floating-point elements) from memory, and combine them into a 256-bit value in dst. hiaddr and loaddr do not need to be aligned on any particular boundary.
mm256_loadu2_m128i(void*, void*)	Load two 128-bit values (composed of integer data) from memory, and combine them into a 256-bit value in dst. hiaddr and loaddr do not need to be aligned on any particular boundary.
mm256_loadu_pd(void*)	Load 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into dst.
mm256_loadu_ps(void*)	Load 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into dst.
mm256_loadu_si256(void*)	Load 256-bits (composed of 8 packed 32-bit integer elements) from memory into dst.
mm256_maskload_pd(void*, v256)	Load packed double-precision (64-bit) floating-point elements from memory into dst using mask (elements are zeroed out when the high bit of the corresponding mask element is not set).
mm256_maskload_ps(void*, v256)	Load packed single-precision (32-bit) floating-point elements from memory into dst using mask (elements are zeroed out when the high bit of the corresponding mask element is not set).
mm256_maskstore_pd(void*, v256, v256)	Store packed double-precision (64-bit) floating-point elements from a into memory using mask.
mm256_maskstore_ps(void*, v256, v256)	Store packed single-precision (32-bit) floating-point elements from a into memory using mask.
mm256_max_pd(v256, v256)	Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in dst.
mm256_max_ps(v256, v256)	Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in dst.
mm256_min_pd(v256, v256)	Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in dst.
mm256_min_ps(v256, v256)	Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in dst.
mm256_movedup_pd(v256)	Duplicate even-indexed double-precision (64-bit) floating-point elements from a, and store the results in dst.
mm256_movehdup_ps(v256)	Duplicate odd-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst.
mm256_moveldup_ps(v256)	Duplicate even-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst.
mm256_movemask_pd(v256)	Set each bit of mask dst based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in a.
mm256_movemask_ps(v256)	Set each bit of mask dst based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in a.
mm256_mul_pd(v256, v256)	Multiply packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst.
mm256_mul_ps(v256, v256)	Multiply packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst.
mm256_or_pd(v256, v256)	Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst.
mm256_or_ps(v256, v256)	Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst.
mm256_permute2f128_pd(v256, v256, int)	Shuffle 128-bits (composed of 2 packed double-precision (64-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst.
mm256_permute2f128_ps(v256, v256, int)	Shuffle 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst.
mm256_permute2f128_si256(v256, v256, int)	Shuffle 128-bits (composed of integer data) selected by imm8 from a and b, and store the results in dst.
mm256_permute_pd(v256, int)	Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst.
mm256_permute_ps(v256, int)	Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst.
mm256_permutevar_pd(v256, v256)	Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst.
mm256_permutevar_ps(v256, v256)	Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst.
mm256_rcp_ps(v256)	Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
mm256_round_pd(v256, int)	Round the packed double-precision (64-bit) floating-point elements in a using the rounding parameter, and store the results as packed double-precision floating-point elements in dst.
mm256_round_ps(v256, int)	Round the packed single-precision (32-bit) floating-point elements in a using the rounding parameter, and store the results as packed single-precision floating-point elements in dst.
mm256_rsqrt_ps(v256)	Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
mm256_set1_epi16(short)	Broadcast 16-bit integer a to all elements of dst. This intrinsic may generate the vpbroadcastw instruction.
mm256_set1_epi32(int)	Broadcast 32-bit integer a to all elements of dst. This intrinsic may generate the vpbroadcastd instruction.
mm256_set1_epi64x(long)	Broadcast 64-bit integer a to all elements of dst. This intrinsic may generate the vpbroadcastq instruction.
mm256_set1_epi8(byte)	Broadcast 8-bit integer a to all elements of dst. This intrinsic may generate the vpbroadcastb instruction.
mm256_set1_pd(double)	Broadcast double-precision (64-bit) floating-point value a to all elements of dst.
mm256_set1_ps(float)	Broadcast single-precision (32-bit) floating-point value a to all elements of dst.
mm256_set_epi16(short, short, short, short, short, short, short, short, short, short, short, short, short, short, short, short)	Set packed short elements in dst with the supplied values.
mm256_set_epi32(int, int, int, int, int, int, int, int)	Set packed int elements in dst with the supplied values.
mm256_set_epi64x(long, long, long, long)	Set packed 64-bit integers in dst with the supplied values.
mm256_set_epi8(byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte)	Set packed byte elements in dst with the supplied values.
mm256_set_m128(v128, v128)	Set packed v256 vector with the supplied values.
mm256_set_m128d(v128, v128)	Set packed v256 vector with the supplied values.
mm256_set_m128i(v128, v128)	Set packed v256 vector with the supplied values.
mm256_set_pd(double, double, double, double)	Set packed double-precision (64-bit) floating-point elements in dst with the supplied values.
mm256_set_ps(float, float, float, float, float, float, float, float)	Set packed single-precision (32-bit) floating-point elements in dst with the supplied values.
mm256_setr_epi16(short, short, short, short, short, short, short, short, short, short, short, short, short, short, short, short)	Set packed short elements in dst with the supplied values in reverse order.
mm256_setr_epi32(int, int, int, int, int, int, int, int)	Set packed int elements in dst with the supplied values in reverse order.
mm256_setr_epi64x(long, long, long, long)	Set packed 64-bit integers in dst with the supplied values in reverse order.
mm256_setr_epi8(byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte)	Set packed byte elements in dst with the supplied values in reverse order.
mm256_setr_m128(v128, v128)	Set packed v256 vector with the supplied values in reverse order.
mm256_setr_m128d(v128, v128)	Set packed v256 vector with the supplied values in reverse order.
mm256_setr_m128i(v128, v128)	Set packed v256 vector with the supplied values in reverse order.
mm256_setr_pd(double, double, double, double)	Set packed double-precision (64-bit) floating-point elements in dst with the supplied values in reverse order.
mm256_setr_ps(float, float, float, float, float, float, float, float)	Set packed single-precision (32-bit) floating-point elements in dst with the supplied values in reverse order.
mm256_setzero_pd()	Return a 256-bit vector with all elements set to zero.
mm256_setzero_ps()	Return a 256-bit vector with all elements set to zero.
mm256_setzero_si256()	Return a 256-bit vector with all elements set to zero.
mm256_shuffle_pd(v256, v256, int)	Shuffle double-precision (64-bit) floating-point elements from a and b within 128-bit lanes using the control in imm8, and store the results in dst.
mm256_shuffle_ps(v256, v256, int)	Shuffle single-precision (32-bit) floating-point elements from a and b within 128-bit lanes using the control in imm8, and store the results in dst.
mm256_sqrt_pd(v256)	Compute the square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst.
mm256_sqrt_ps(v256)	Compute the square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst.
mm256_store_pd(void*, v256)	Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory.
mm256_store_ps(void*, v256)	Store 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a into memory.
mm256_store_si256(void*, v256)	Store 256-bits (composed of 8 packed 32-bit integer elements) from a into memory.
mm256_storeu2_m128(void*, void*, v256)	Store the high and low 128-bit halves (each composed of 4 packed single-precision (32-bit) floating-point elements) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary.
mm256_storeu2_m128d(void*, void*, v256)	Store the high and low 128-bit halves (each composed of 2 packed double-precision (64-bit) floating-point elements) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary.
mm256_storeu2_m128i(void*, void*, v256)	Store the high and low 128-bit halves (each composed of integer data) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary.
mm256_storeu_pd(void*, v256)	Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory.
mm256_storeu_ps(void*, v256)	Store 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a into memory.
mm256_storeu_si256(void*, v256)	Store 256-bits (composed of 8 packed 32-bit integer elements) from a into memory.
mm256_stream_pd(void*, v256)	Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory using a non-temporal memory hint. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
mm256_stream_ps(void*, v256)	Store 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a into memory using a non-temporal memory hint. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
mm256_stream_si256(void*, v256)	Store 256-bits of integer data from a into memory using a non-temporal memory hint. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
mm256_sub_pd(v256, v256)	Subtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and store the results in dst.
mm256_sub_ps(v256, v256)	Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and store the results in dst.
mm256_testc_pd(v256, v256)	Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return the CF value.
mm256_testc_ps(v256, v256)	Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return the CF value.
mm256_testc_si256(v256, v256)	Compute the bitwise AND of 256 bits (representing integer data) in a and b, and set ZF to 1 if the result is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, and set CF to 1 if the result is zero, otherwise set CF to 0. Return the CF value.
mm256_testnzc_pd(v256, v256)	Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0.
mm256_testnzc_ps(v256, v256)	Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0.
mm256_testnzc_si256(v256, v256)	Compute the bitwise AND of 256 bits (representing integer data) in a and b, and set ZF to 1 if the result is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, and set CF to 1 if the result is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0.
mm256_testz_pd(v256, v256)	Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return the ZF value.
mm256_testz_ps(v256, v256)	Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return the ZF value.
mm256_testz_si256(v256, v256)	Compute the bitwise AND of 256 bits (representing integer data) in a and b, and set ZF to 1 if the result is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, and set CF to 1 if the result is zero, otherwise set CF to 0. Return the ZF value.
mm256_undefined_pd()	Return a 256-bit vector with undefined contents.
mm256_undefined_ps()	Return a 256-bit vector with undefined contents.
mm256_undefined_si256()	Return a 256-bit vector with undefined contents.
mm256_unpackhi_pd(v256, v256)	Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst.
mm256_unpackhi_ps(v256, v256)	Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst.
mm256_unpacklo_pd(v256, v256)	Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst.
mm256_unpacklo_ps(v256, v256)	Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst.
mm256_xor_pd(v256, v256)	Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst.
mm256_xor_ps(v256, v256)	Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst.
mm256_zeroall()	Zero the contents of all YMM registers.
mm256_zeroupper()	Zero the upper 128 bits of all YMM registers; the lower 128-bits of the registers are unmodified.
mm256_zextpd128_pd256(v128)	Casts vector of type v128 to type v256; the upper 128 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
mm256_zextps128_ps256(v128)	Casts vector of type v128 to type v256; the upper 128 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
mm256_zextsi128_si256(v128)	Casts vector of type v128 to type v256; the upper 128 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
permute_pd(v128, int)	Shuffle double-precision (64-bit) floating-point elements in a using the control in imm8, and store the results in dst.
permute_ps(v128, int)	Shuffle single-precision (32-bit) floating-point elements in a using the control in imm8, and store the results in dst.
permutevar_pd(v128, v128)	Shuffle double-precision (64-bit) floating-point elements in a using the control in b, and store the results in dst.
permutevar_ps(v128, v128)	Shuffle single-precision (32-bit) floating-point elements in a using the control in b, and store the results in dst.
testc_pd(v128, v128)	Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return the CF value.
testc_ps(v128, v128)	Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return the CF value.
testnzc_pd(v128, v128)	Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0.
testnzc_ps(v128, v128)	Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0.
testz_pd(v128, v128)	Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return the ZF value.
testz_ps(v128, v128)	Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return the ZF value.
undefined_pd()	Return a 128-bit vector with undefined contents.
undefined_ps()	Return a 128-bit vector with undefined contents.
undefined_si128()	Return a 128-bit vector with undefined contents.