| Method |
Description |
| broadcast_ss |
Broadcast a single-precision (32-bit) floating-point element from memory to all elements of dst.
|
| cmp_pd |
Compare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in dst.
|
| cmp_ps |
Compare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in dst.
|
| cmp_sd |
Compare the lower double-precision (64-bit) floating-point
element in a and b based on the comparison operand specified by
imm8, store the result in the lower element of dst, and copy
the upper element from a to the upper element of dst.
|
| cmp_ss |
Compare the lower single-precision (32-bit) floating-point
element in a and b based on the comparison operand specified by
imm8, store the result in the lower element of dst, and copy
the upper 3 packed elements from a to the upper elements of
dst.
|
| maskload_pd |
Load packed double-precision (64-bit) floating-point elements
from memory into dst using mask (elements are zeroed out when
the high bit of the corresponding element is not set).
|
| maskload_ps |
Load packed single-precision (32-bit) floating-point elements
from memory into dst using mask (elements are zeroed out when
the high bit of the corresponding element is not set).
|
| maskstore_pd |
Store packed double-precision (64-bit) floating-point elements from a into memory using mask.
|
| maskstore_ps |
Store packed single-precision (32-bit) floating-point elements from a into memory using mask.
|
| mm256_add_pd |
Add packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst.
|
| mm256_add_ps |
Add packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst.
|
| mm256_addsub_pd |
Alternately add and subtract packed double-precision (64-bit) floating-point elements in a to/from packed elements in b, and store the results in dst.
|
| mm256_addsub_ps |
Alternately add and subtract packed single-precision (32-bit) floating-point elements in a to/from packed elements in b, and store the results in dst.
|
| mm256_and_pd |
Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst.
|
| mm256_and_ps |
Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst.
|
| mm256_andnot_pd |
Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in a and then AND with b, and store the results in dst.
|
| mm256_andnot_ps |
Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in a and then AND with b, and store the results in dst.
|
| mm256_blend_pd |
Blend packed double-precision (64-bit) floating-point elements from a and b using control mask imm8, and store the results in dst.
|
| mm256_blend_ps |
Blend packed single-precision (32-bit) floating-point elements from a and b using control mask imm8, and store the results in dst.
|
| mm256_blendv_pd |
Blend packed double-precision (64-bit) floating-point elements from a and b using mask, and store the results in dst.
|
| mm256_blendv_ps |
Blend packed single-precision (32-bit) floating-point elements from a and b using mask, and store the results in dst.
|
| mm256_broadcast_pd |
Broadcast 128 bits from memory (composed of 2 packed double-precision (64-bit) floating-point elements) to all elements of dst.
|
| mm256_broadcast_ps |
Broadcast 128 bits from memory (composed of 4 packed single-precision (32-bit) floating-point elements) to all elements of dst.
|
| mm256_broadcast_sd |
Broadcast a double-precision (64-bit) floating-point element from memory to all elements of dst.
|
| mm256_broadcast_ss |
Broadcast a single-precision (32-bit) floating-point element from memory to all elements of dst.
|
| mm256_castpd_ps | For compatibility with C++ code only. This is a no-op in Burst. |
| mm256_castpd_si256 | For compatibility with C++ code only. This is a no-op in Burst. |
| mm256_castpd128_pd256 | For compatibility with C++ code only. This is a no-op in Burst. |
| mm256_castpd256_pd128 | For compatibility with C++ code only. This is a no-op in Burst. |
| mm256_castps_pd | For compatibility with C++ code only. This is a no-op in Burst. |
| mm256_castps_si256 | For compatibility with C++ code only. This is a no-op in Burst. |
| mm256_castps128_ps256 | For compatibility with C++ code only. This is a no-op in Burst. |
| mm256_castps256_ps128 | For compatibility with C++ code only. This is a no-op in Burst. |
| mm256_castsi128_si256 | For compatibility with C++ code only. This is a no-op in Burst. |
| mm256_castsi256_pd | For compatibility with C++ code only. This is a no-op in Burst. |
| mm256_castsi256_ps | For compatibility with C++ code only. This is a no-op in Burst. |
| mm256_castsi256_si128 | For compatibility with C++ code only. This is a no-op in Burst. |
| mm256_ceil_pd |
Round the packed double-precision (64-bit) floating-point
elements in a up to an integer value, and store the results as
packed double-precision floating-point elements in dst.
|
| mm256_ceil_ps |
Round the packed single-precision (32-bit) floating-point
elements in a up to an integer value, and store the results as
packed single-precision floating-point elements in dst.
|
| mm256_cmp_pd |
Compare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in dst.
|
| mm256_cmp_ps |
Compare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in dst.
|
| mm256_cvtepi32_pd |
Convert packed 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
|
| mm256_cvtepi32_ps |
Convert packed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
|
| mm256_cvtpd_epi32 |
Convert packed double-precision (64-bit) floating-point elements
in a to packed 32-bit integers, and store the results in dst.
|
| mm256_cvtpd_ps |
Convert packed double-precision (64-bit) floating-point
elements in a to packed single-precision (32-bit)
floating-point elements, and store the results in dst.
|
| mm256_cvtps_epi32 |
Convert packed single-precision (32-bit) floating-point
elements in a to packed 32-bit integers, and store the results
in dst.
|
| mm256_cvtps_pd |
Convert packed single-precision (32-bit) floating-point
elements in a to packed double-precision (64-bit)
floating-point elements, and store the results in dst.
|
| mm256_cvtss_f32 |
Copy the lower single-precision (32-bit) floating-point element of a to dst.
|
| mm256_cvttpd_epi32 |
Convert packed double-precision (64-bit) floating-point
elements in a to packed 32-bit integers with truncation, and
store the results in dst.
|
| mm256_cvttps_epi32 |
Convert packed single-precision (32-bit) floating-point
elements in a to packed 32-bit integers with truncation, and
store the results in dst.
|
| mm256_div_pd |
Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst.
|
| mm256_div_ps |
Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst.
|
| mm256_dp_ps |
Conditionally multiply the packed single-precision (32-bit)
floating-point elements in a and b using the high 4 bits in
imm8, sum the four products, and conditionally store the sum in
dst using the low 4 bits of imm8.
|
| mm256_extract_epi32 |
Extract a 32-bit integer from a, selected with index (which must be a constant), and store the result in dst.
|
| mm256_extract_epi64 |
Extract a 64-bit integer from a, selected with index (which must be a constant), and store the result in dst.
|
| mm256_extractf128_pd |
Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm8, and store the result in dst.
|
| mm256_extractf128_ps |
Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm8, and store the result in dst.
|
| mm256_extractf128_si256 |
Extract 128 bits (composed of integer data) from a, selected with imm8, and store the result in dst.
|
| mm256_floor_pd |
Round the packed double-precision (64-bit) floating-point
elements in a down to an integer value, and store the results
as packed double-precision floating-point elements in dst.
|
| mm256_floor_ps |
Round the packed single-precision (32-bit) floating-point
elements in a down to an integer value, and store the results
as packed single-precision floating-point elements in dst.
|
| mm256_hadd_pd |
Horizontally add adjacent pairs of double-precision (64-bit) floating-point elements in a and b, and pack the results in dst.
|
| mm256_hadd_ps |
Horizontally add adjacent pairs of single-precision (32-bit) floating-point elements in a and b, and pack the results in dst.
|
| mm256_hsub_pd |
Horizontally subtract adjacent pairs of double-precision (64-bit) floating-point elements in a and b, and pack the results in dst.
|
| mm256_hsub_ps |
Horizontally subtract adjacent pairs of single-precision (32-bit) floating-point elements in a and b, and pack the results in dst.
|
| mm256_insert_epi16 |
Copy a to dst, and insert the 16-bit integer i into dst at the location specified by index (which must be a constant).
|
| mm256_insert_epi32 |
Copy a to dst, and insert the 32-bit integer i into dst at the location specified by index (which must be a constant).
|
| mm256_insert_epi64 |
Copy a to dst, and insert the 64-bit integer i into dst at the location specified by index (which must be a constant).
|
| mm256_insert_epi8 |
Copy a to dst, and insert the 8-bit integer i into dst at the location specified by index (which must be a constant).
|
| mm256_insertf128_pd |
Copy a to dst, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into dst at the location specified by imm8.
|
| mm256_insertf128_ps |
Copy a to dst, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into dst at the location specified by imm8.
|
| mm256_insertf128_si256 |
Copy a to dst, then insert 128 bits of integer data from b into dst at the location specified by imm8.
|
| mm256_lddqu_si256 |
Load 256-bits of integer data from unaligned memory into dst.
This intrinsic may perform better than mm256_loadu_si256 when
the data crosses a cache line boundary.
|
| mm256_load_pd |
Load 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into dst.
|
| mm256_load_ps |
Load 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into dst.
|
| mm256_load_si256 |
Load 256-bits (composed of 8 packed 32-bit integer elements) from memory into dst.
|
| mm256_loadu_pd |
Load 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into dst.
|
| mm256_loadu_ps |
Load 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into dst.
|
| mm256_loadu_si256 |
Load 256-bits (composed of 8 packed 32-bit integer elements) from memory into dst.
|
| mm256_loadu2_m128 |
Load two 128-bit values (composed of 4 packed single-precision
(32-bit) floating-point elements) from memory, and combine them
into a 256-bit value in dst. hiaddr and loaddr do not need to
be aligned on any particular boundary.
|
| mm256_loadu2_m128d |
Load two 128-bit values (composed of 2 packed double-precision
(64-bit) floating-point elements) from memory, and combine them
into a 256-bit value in dst. hiaddr and loaddr do not need to
be aligned on any particular boundary.
|
| mm256_loadu2_m128i |
Load two 128-bit values (composed of integer data) from memory,
and combine them into a 256-bit value in dst. hiaddr and loaddr
do not need to be aligned on any particular boundary.
|
| mm256_maskload_pd |
Load packed double-precision (64-bit) floating-point elements
from memory into dst using mask (elements are zeroed out when
the high bit of the corresponding element is not set).
|
| mm256_maskload_ps |
Load packed single-precision (32-bit) floating-point elements
from memory into dst using mask (elements are zeroed out when
the high bit of the corresponding element is not set).
|
| mm256_maskstore_pd |
Store packed double-precision (64-bit) floating-point elements from a into memory using mask.
|
| mm256_maskstore_ps |
Store packed single-precision (32-bit) floating-point elements from a into memory using mask.
|
| mm256_max_pd |
Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in dst.
|
| mm256_max_ps |
Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in dst.
|
| mm256_min_pd |
Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in dst.
|
| mm256_min_ps |
Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in dst.
|
| mm256_movedup_pd |
Duplicate even-indexed double-precision (64-bit) floating-point elements from a, and store the results in dst.
|
| mm256_movehdup_ps |
Duplicate odd-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst.
|
| mm256_moveldup_ps |
Duplicate even-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst.
|
| mm256_movemask_pd |
Set each bit of mask dst based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in a.
|
| mm256_movemask_ps |
Set each bit of mask dst based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in a.
|
| mm256_mul_pd |
Multiply packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst.
|
| mm256_mul_ps |
Multiply packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst.
|
| mm256_or_pd |
Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst.
|
| mm256_or_ps |
Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst.
|
| mm256_permute_pd |
Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst.
|
| mm256_permute_ps |
Shuffle single-precision (32-bit) floating-point elements in a
within 128-bit lanes using the control in imm8, and store the
results in dst.
|
| mm256_permute2f128_pd |
Shuffle 128-bits (composed of 2 packed double-precision (64-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst.
|
| mm256_permute2f128_ps |
Shuffle 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst.
|
| mm256_permute2f128_si256 |
Shuffle 128-bits (composed of integer data) selected by imm8 from a and b, and store the results in dst.
|
| mm256_permutevar_pd |
Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst.
|
| mm256_permutevar_ps |
Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst.
|
| mm256_rcp_ps |
Compute the approximate reciprocal of packed single-precision
(32-bit) floating-point elements in a, and store the results in
dst. The maximum relative error for this approximation is less
than 1.5*2^-12.
|
| mm256_round_pd |
Round the packed double-precision (64-bit) floating-point
elements in a using the rounding parameter, and store the
results as packed double-precision floating-point elements in
dst.
|
| mm256_round_ps |
Round the packed single-precision (32-bit) floating-point
elements in a using the rounding parameter, and store the
results as packed single-precision floating-point elements in
dst.
|
| mm256_rsqrt_ps |
Compute the approximate reciprocal square root of packed
single-precision (32-bit) floating-point elements in a, and
store the results in dst. The maximum relative error for this
approximation is less than 1.5*2^-12.
|
| mm256_set_epi16 |
Set packed short elements in dst with the supplied values.
|
| mm256_set_epi32 |
Set packed int elements in dst with the supplied values.
|
| mm256_set_epi64x |
Set packed 64-bit integers in dst with the supplied values.
|
| mm256_set_epi8 |
Set packed byte elements in dst with the supplied values.
|
| mm256_set_m128 |
Set packed v256 vector dst with the supplied values.
|
| mm256_set_m128d |
Set packed v256 vector with the supplied values.
|
| mm256_set_m128i |
Set packed v256 vector with the supplied values.
|
| mm256_set_pd |
Set packed double-precision (64-bit) floating-point elements in dst with the supplied values.
|
| mm256_set_ps |
Set packed single-precision (32-bit) floating-point elements in dst with the supplied values.
|
| mm256_set1_epi16 |
Broadcast 16-bit integer a to all elements of dst. This intrinsic may generate the vpbroadcastw instruction.
|
| mm256_set1_epi32 |
Broadcast 32-bit integer a to all elements of dst. This intrinsic may generate the vpbroadcastd instruction.
|
| mm256_set1_epi64x |
Broadcast 64-bit integer a to all elements of dst. This intrinsic may generate the vpbroadcastq instruction.
|
| mm256_set1_epi8 |
Broadcast 8-bit integer a to all elements of dst. This intrinsic may generate the vpbroadcastb instruction.
|
| mm256_set1_pd |
Broadcast double-precision (64-bit) floating-point value a to all elements of dst.
|
| mm256_set1_ps |
Broadcast single-precision (32-bit) floating-point value a to all elements of dst.
|
| mm256_setr_epi16 |
Set packed short elements in dst with the supplied values in reverse order.
|
| mm256_setr_epi32 |
Set packed int elements in dst with the supplied values in reverse order.
|
| mm256_setr_epi64x |
Set packed 64-bit integers in dst with the supplied values in reverse order.
|
| mm256_setr_epi8 |
Set packed byte elements in dst with the supplied values in reverse order.
|
| mm256_setr_m128 |
Set packed v256 vector with the supplied values in reverse order.
|
| mm256_setr_m128d |
Set packed v256 vector with the supplied values in reverse order.
|
| mm256_setr_m128i |
Set packed v256 vector with the supplied values in reverse order.
|
| mm256_setr_pd |
Set packed double-precision (64-bit) floating-point elements in dst with the supplied values in reverse order.
|
| mm256_setr_ps |
Set packed single-precision (32-bit) floating-point elements in dst with the supplied values in reverse order.
|
| mm256_setzero_pd |
Return a vector with all elements set to zero.
|
| mm256_setzero_ps |
Return a vector with all elements set to zero.
|
| mm256_setzero_si256 |
Return a vector with all elements set to zero.
|
| mm256_shuffle_pd |
Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in imm8, and store the results in dst.
|
| mm256_shuffle_ps |
Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst.
|
| mm256_sqrt_pd |
Compute the square root of packed double-precision (64-bit)
floating-point elements in a, and store the results in dst.
|
| mm256_sqrt_ps |
Compute the square root of packed single-precision (32-bit)
floating-point elements in a, and store the results in dst.
|
| mm256_store_pd |
Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory.
|
| mm256_store_ps |
Store 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a into memory.
|
| mm256_store_si256 |
Store 256-bits (composed of 8 packed 32-bit integer elements) from a into memory.
|
| mm256_storeu_pd |
Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory.
|
| mm256_storeu_ps |
Store 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a into memory.
|
| mm256_storeu_si256 |
Store 256-bits (composed of 8 packed 32-bit integer elements) from a into memory.
|
| mm256_storeu2_m128 |
Store the high and low 128-bit halves (each composed of 4
packed single-precision (32-bit) floating-point elements) from
a into two different 128-bit memory locations. hiaddr and
loaddr do not need to be aligned on any particular boundary.
|
| mm256_storeu2_m128d |
Store the high and low 128-bit halves (each composed of 2
packed double-precision (64-bit) floating-point elements) from
a into two different 128-bit memory locations. hiaddr and
loaddr do not need to be aligned on any particular boundary.
|
| mm256_storeu2_m128i |
Store the high and low 128-bit halves (each composed of integer
data) from a into two different 128-bit memory locations. hiaddr
and loaddr do not need to be aligned on any particular boundary.
|
| mm256_stream_pd |
Store 256-bits (composed of 4 packed double-precision (64-bit)
floating-point elements) from a into memory using a
non-temporal memory hint. mem_addr must be aligned on a 32-byte
boundary or a general-protection exception may be generated.
|
| mm256_stream_ps |
Store 256-bits (composed of 8 packed single-precision (32-bit)
floating-point elements) from a into memory using a
non-temporal memory hint. mem_addr must be aligned on a 32-byte
boundary or a general-protection exception may be generated.
|
| mm256_stream_si256 |
Store 256-bits of integer data from a into memory using a
non-temporal memory hint. mem_addr must be aligned on a 32-byte
boundary or a general-protection exception may be generated.
|
| mm256_sub_pd |
Subtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and store the results in dst.
|
| mm256_sub_ps |
Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and store the results in dst.
|
| mm256_testc_pd |
Compute the bitwise AND of 256 bits (representing
double-precision (64-bit) floating-point elements) in a and b,
producing an intermediate 256-bit value, and set ZF to 1 if the
sign bit of each 64-bit element in the intermediate value is
zero, otherwise set ZF to 0. Compute the bitwise NOT of a and
then AND with b, producing an intermediate value, and set CF to
1 if the sign bit of each 64-bit element in the intermediate
value is zero, otherwise set CF to 0. Return the CF value.
|
| mm256_testc_ps |
Compute the bitwise AND of 256 bits (representing
single-precision (32-bit) floating-point elements) in a and b,
producing an intermediate 256-bit value, and set ZF to 1 if the
sign bit of each 32-bit element in the intermediate value is
zero, otherwise set ZF to 0. Compute the bitwise NOT of a and
then AND with b, producing an intermediate value, and set CF to
1 if the sign bit of each 32-bit element in the intermediate
value is zero, otherwise set CF to 0. Return the CF value.
|
| mm256_testc_si256 |
Compute the bitwise AND of 256 bits (representing integer data)
in a and b, and set ZF to 1 if the result is zero, otherwise
set ZF to 0. Compute the bitwise NOT of a and then AND with b,
and set CF to 1 if the result is zero, otherwise set CF to 0.
Return the CF value.
|
| mm256_testnzc_pd |
Compute the bitwise AND of 256 bits (representing
double-precision (64-bit) floating-point elements) in a and b,
producing an intermediate 256-bit value, and set ZF to 1 if the
sign bit of each 64-bit element in the intermediate value is
zero, otherwise set ZF to 0. Compute the bitwise NOT of a and
then AND with b, producing an intermediate value, and set CF to
1 if the sign bit of each 64-bit element in the intermediate
value is zero, otherwise set CF to 0. Return 1 if both the ZF
and CF values are zero, otherwise return 0.
|
| mm256_testnzc_ps |
Compute the bitwise AND of 256 bits (representing
single-precision (32-bit) floating-point elements) in a and b,
producing an intermediate 256-bit value, and set ZF to 1 if the
sign bit of each 32-bit element in the intermediate value is
zero, otherwise set ZF to 0. Compute the bitwise NOT of a and
then AND with b, producing an intermediate value, and set CF to
1 if the sign bit of each 32-bit element in the intermediate
value is zero, otherwise set CF to 0. Return 1 if both the ZF
and CF values are zero, otherwise return 0.
|
| mm256_testnzc_si256 |
Compute the bitwise AND of 256 bits (representing integer data)
in a and b, and set ZF to 1 if the result is zero, otherwise
set ZF to 0. Compute the bitwise NOT of a and then AND with b,
and set CF to 1 if the result is zero, otherwise set CF to 0.
Return 1 if both the ZF and CF values are zero, otherwise
return 0.
|
| mm256_testz_pd |
Compute the bitwise AND of 256 bits (representing
double-precision (64-bit) floating-point elements) in a and b,
producing an intermediate 256-bit value, and set ZF to 1 if the
sign bit of each 64-bit element in the intermediate value is
zero, otherwise set ZF to 0. Compute the bitwise NOT of a and
then AND with b, producing an intermediate value, and set CF to
1 if the sign bit of each 64-bit element in the intermediate
value is zero, otherwise set CF to 0. Return the ZF value.
|
| mm256_testz_ps |
Compute the bitwise AND of 256 bits (representing
single-precision (32-bit) floating-point elements) in a and b,
producing an intermediate 256-bit value, and set ZF to 1 if the
sign bit of each 32-bit element in the intermediate value is
zero, otherwise set ZF to 0. Compute the bitwise NOT of a and
then AND with b, producing an intermediate value, and set CF to
1 if the sign bit of each 32-bit element in the intermediate
value is zero, otherwise set CF to 0. Return the ZF value.
|
| mm256_testz_si256 |
Compute the bitwise AND of 256 bits (representing integer data)
in a and b, and set ZF to 1 if the result is zero, otherwise
set ZF to 0. Compute the bitwise NOT of a and then AND with b,
and set CF to 1 if the result is zero, otherwise set CF to 0.
Return the ZF value.
|
| mm256_undefined_pd | Return a 256-bit vector with undefined contents. |
| mm256_undefined_ps | Return a 256-bit vector with undefined contents. |
| mm256_undefined_si256 | Return a 256-bit vector with undefined contents. |
| mm256_unpackhi_pd |
Unpack and interleave double-precision (64-bit) floating-point
elements from the high half of each 128-bit lane in a and b,
and store the results in dst.
|
| mm256_unpackhi_ps |
Unpack and interleave single-precision (32-bit) floating-point
elements from the high half of each 128-bit lane in a and b,
and store the results in dst.
|
| mm256_unpacklo_pd |
Unpack and interleave double-precision (64-bit) floating-point
elements from the low half of each 128-bit lane in a and b, and
store the results in dst.
|
| mm256_unpacklo_ps |
Unpack and interleave single-precision (32-bit) floating-point
elements from the low half of each 128-bit lane in a and b, and
store the results in dst.
|
| mm256_xor_pd |
Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst.
|
| mm256_xor_ps |
Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst.
|
| mm256_zeroall |
Zero the contents of all YMM registers.
|
| mm256_zeroupper |
Zero the upper 128 bits of all YMM registers; the lower 128 bits of the registers are unmodified.
|
| mm256_zextpd128_pd256 |
Casts vector of type v128 to type v256; the upper 128 bits of the result
are zeroed. This intrinsic is only used for compilation and does not
generate any instructions, thus it has zero latency.
|
| mm256_zextps128_ps256 |
Casts vector of type v128 to type v256; the upper 128 bits of the result
are zeroed. This intrinsic is only used for compilation and does not
generate any instructions, thus it has zero latency.
|
| mm256_zextsi128_si256 |
Casts vector of type v128 to type v256; the upper 128 bits of the result
are zeroed. This intrinsic is only used for compilation and does not
generate any instructions, thus it has zero latency.
|
| permute_pd |
Shuffle double-precision (64-bit) floating-point elements in a using the control in imm8, and store the results in dst.
|
| permute_ps |
Shuffle single-precision (32-bit) floating-point elements in a using the control in imm8, and store the results in dst.
|
| permutevar_pd |
Shuffle double-precision (64-bit) floating-point elements in a using the control in b, and store the results in dst.
|
| permutevar_ps |
Shuffle single-precision (32-bit) floating-point elements in a using the control in b, and store the results in dst.
|
| testc_pd |
Compute the bitwise AND of 128 bits (representing
double-precision (64-bit) floating-point elements) in a and b,
producing an intermediate 128-bit value, and set ZF to 1 if the
sign bit of each 64-bit element in the intermediate value is
zero, otherwise set ZF to 0. Compute the bitwise NOT of a and
then AND with b, producing an intermediate value, and set CF to
1 if the sign bit of each 64-bit element in the intermediate
value is zero, otherwise set CF to 0. Return the CF value.
|
| testc_ps |
Compute the bitwise AND of 128 bits (representing
single-precision (32-bit) floating-point elements) in a and b,
producing an intermediate 128-bit value, and set ZF to 1 if the
sign bit of each 32-bit element in the intermediate value is
zero, otherwise set ZF to 0. Compute the bitwise NOT of a and
then AND with b, producing an intermediate value, and set CF to
1 if the sign bit of each 32-bit element in the intermediate
value is zero, otherwise set CF to 0. Return the CF value.
|
| testnzc_pd |
Compute the bitwise AND of 128 bits (representing
double-precision (64-bit) floating-point elements) in a and b,
producing an intermediate 128-bit value, and set ZF to 1 if the
sign bit of each 64-bit element in the intermediate value is
zero, otherwise set ZF to 0. Compute the bitwise NOT of a and
then AND with b, producing an intermediate value, and set CF to
1 if the sign bit of each 64-bit element in the intermediate
value is zero, otherwise set CF to 0. Return 1 if both the ZF
and CF values are zero, otherwise return 0.
|
| testnzc_ps |
Compute the bitwise AND of 128 bits (representing
single-precision (32-bit) floating-point elements) in a and b,
producing an intermediate 128-bit value, and set ZF to 1 if the
sign bit of each 32-bit element in the intermediate value is
zero, otherwise set ZF to 0. Compute the bitwise NOT of a and
then AND with b, producing an intermediate value, and set CF to
1 if the sign bit of each 32-bit element in the intermediate
value is zero, otherwise set CF to 0. Return 1 if both the ZF
and CF values are zero, otherwise return 0.
|
| testz_pd |
Compute the bitwise AND of 128 bits (representing
double-precision (64-bit) floating-point elements) in a and b,
producing an intermediate 128-bit value, and set ZF to 1 if the
sign bit of each 64-bit element in the intermediate value is
zero, otherwise set ZF to 0. Compute the bitwise NOT of a and
then AND with b, producing an intermediate value, and set CF to
1 if the sign bit of each 64-bit element in the intermediate
value is zero, otherwise set CF to 0. Return the ZF value.
|
| testz_ps |
Compute the bitwise AND of 128 bits (representing
single-precision (32-bit) floating-point elements) in a and b,
producing an intermediate 128-bit value, and set ZF to 1 if the
sign bit of each 32-bit element in the intermediate value is
zero, otherwise set ZF to 0. Compute the bitwise NOT of a and
then AND with b, producing an intermediate value, and set CF to
1 if the sign bit of each 32-bit element in the intermediate
value is zero, otherwise set CF to 0. Return the ZF value.
|
| undefined_pd | Return a 128-bit vector with undefined contents. |
| undefined_ps | Return a 128-bit vector with undefined contents. |
| undefined_si128 | Return a 128-bit vector with undefined contents. |
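
The ZF/CF wording shared by the `testz`, `testc`, and `testnzc` rows can be hard to follow in prose, so here is a minimal scalar sketch of the single-precision 256-bit variants in plain C. It is not Burst code and does not call any intrinsic. The type `vec256_model` and every `model_*` function are hypothetical names introduced only for this illustration, and the model assumes the semantics are exactly as described in the table: only the sign bit (bit 31) of each 32-bit lane is inspected.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model, not part of the Burst API:
   a 256-bit vector viewed as eight 32-bit lanes. */
typedef struct { uint32_t lane[8]; } vec256_model;

/* Returns 1 when no 32-bit lane of x has its sign bit (bit 31) set. */
static int no_sign_bits(const uint32_t x[8]) {
    uint32_t acc = 0;
    for (int i = 0; i < 8; ++i) acc |= x[i];
    return (acc >> 31) == 0;
}

/* ZF is derived from (a AND b). */
static int model_zf(vec256_model a, vec256_model b) {
    uint32_t t[8];
    for (int i = 0; i < 8; ++i) t[i] = a.lane[i] & b.lane[i];
    return no_sign_bits(t);
}

/* CF is derived from ((NOT a) AND b). */
static int model_cf(vec256_model a, vec256_model b) {
    uint32_t t[8];
    for (int i = 0; i < 8; ++i) t[i] = ~a.lane[i] & b.lane[i];
    return no_sign_bits(t);
}

/* The three rows differ only in which flag combination they return. */
static int model_testz_ps(vec256_model a, vec256_model b)   { return model_zf(a, b); }
static int model_testc_ps(vec256_model a, vec256_model b)   { return model_cf(a, b); }
static int model_testnzc_ps(vec256_model a, vec256_model b) {
    return !model_zf(a, b) && !model_cf(a, b);
}

/* Small self-check; returns 1 if every expectation holds. */
int model_self_check(void) {
    /* Case 1: both (a AND b) and ((NOT a) AND b) keep a sign bit: ZF = CF = 0. */
    vec256_model a1 = {{0x80000000u}};
    vec256_model b1 = {{0x80000000u, 0x80000000u}};
    assert(model_testz_ps(a1, b1) == 0);
    assert(model_testc_ps(a1, b1) == 0);
    assert(model_testnzc_ps(a1, b1) == 1);

    /* Case 2: a == b, so ((NOT a) AND b) is all zero: CF = 1, ZF stays 0. */
    assert(model_testz_ps(b1, b1) == 0);
    assert(model_testc_ps(b1, b1) == 1);
    assert(model_testnzc_ps(b1, b1) == 0);

    /* Case 3: (a AND b) is non-zero but has no sign bit set, so ZF = 1. */
    vec256_model a3 = {{0x7fffffffu}};
    assert(model_testz_ps(a3, a3) == 1);
    assert(model_testc_ps(a3, a3) == 1);
    return 1;
}
```

Case 3 is the one that separates the `_ps`/`_pd` rows from the `_si256` rows: under the table's description, the integer variants test whether the whole result is zero, so the same inputs would give ZF = 0 there.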