Version: Unity 6.6 Alpha (6000.6)

Avx2

class in Unity.Burst.Intrinsics

Description

AVX2 intrinsics

Static Properties

Property Description
IsAvx2Supported Evaluates to true at compile time if AVX2 intrinsics are supported.

Static Methods

Method Description
blend_epi32 Blend packed 32-bit integers from a and b using control mask imm8, and store the results in dst.
broadcastb_epi8 Broadcast the low packed 8-bit integer from a to all elements of dst.
broadcastd_epi32 Broadcast the low packed 32-bit integer from a to all elements of dst.
broadcastq_epi64 Broadcast the low packed 64-bit integer from a to all elements of dst.
broadcastsd_pd Broadcast the low double-precision (64-bit) floating-point element from a to all elements of dst.
broadcastss_ps Broadcast the low single-precision (32-bit) floating-point element from a to all elements of dst.
broadcastw_epi16 Broadcast the low packed 16-bit integer from a to all elements of dst.
i32gather_epi32 Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
i32gather_epi64 Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
i32gather_pd Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
i32gather_ps Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
i64gather_epi32 Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
i64gather_epi64 Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
i64gather_pd Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
i64gather_ps Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
mask_i32gather_epi32 Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mask_i32gather_epi64 Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mask_i32gather_pd Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mask_i32gather_ps Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mask_i64gather_epi32 Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mask_i64gather_epi64 Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mask_i64gather_pd Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mask_i64gather_ps Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
maskload_epi32 Load packed 32-bit integers from memory into dst using mask (elements are zeroed out when the highest bit is not set in the corresponding element).
maskload_epi64 Load packed 64-bit integers from memory into dst using mask (elements are zeroed out when the highest bit is not set in the corresponding element).
maskstore_epi32 Store packed 32-bit integers from a into memory using mask (elements are not stored when the highest bit is not set in the corresponding element).
maskstore_epi64 Store packed 64-bit integers from a into memory using mask (elements are not stored when the highest bit is not set in the corresponding element).
mm256_abs_epi16 Compute the absolute value of packed 16-bit integers in a, and store the unsigned results in dst.
mm256_abs_epi32 Compute the absolute value of packed 32-bit integers in a, and store the unsigned results in dst.
mm256_abs_epi8 Compute the absolute value of packed 8-bit integers in a, and store the unsigned results in dst.
mm256_add_epi16 Add packed 16-bit integers in a and b, and store the results in dst.
mm256_add_epi32 Add packed 32-bit integers in a and b, and store the results in dst.
mm256_add_epi64 Add packed 64-bit integers in a and b, and store the results in dst.
mm256_add_epi8 Add packed 8-bit integers in a and b, and store the results in dst.
mm256_adds_epi16 Add packed 16-bit integers in a and b using saturation, and store the results in dst.
mm256_adds_epi8 Add packed 8-bit integers in a and b using saturation, and store the results in dst.
mm256_adds_epu16 Add packed unsigned 16-bit integers in a and b using saturation, and store the results in dst.
mm256_adds_epu8 Add packed unsigned 8-bit integers in a and b using saturation, and store the results in dst.
mm256_alignr_epi8 Within each 128-bit lane, concatenate the 16-byte blocks from a (high half) and b (low half) into a 32-byte temporary result, shift the result right by imm8 bytes, and store the low 16 bytes in the corresponding lane of dst.
mm256_and_si256 Compute the bitwise AND of 256 bits (representing integer data) in a and b, and store the result in dst.
mm256_andnot_si256 Compute the bitwise NOT of 256 bits (representing integer data) in a and then AND with b, and store the result in dst.
mm256_avg_epu16 Average packed unsigned 16-bit integers in a and b, and store the results in dst.
mm256_avg_epu8 Average packed unsigned 8-bit integers in a and b, and store the results in dst.
mm256_blend_epi16 Blend packed 16-bit integers from a and b within 128-bit lanes using control mask imm8, and store the results in dst.
mm256_blend_epi32 Blend packed 32-bit integers from a and b using control mask imm8, and store the results in dst.
mm256_blendv_epi8 Blend packed 8-bit integers from a and b using mask, and store the results in dst.
mm256_broadcastb_epi8 Broadcast the low packed 8-bit integer from a to all elements of dst.
mm256_broadcastd_epi32 Broadcast the low packed 32-bit integer from a to all elements of dst.
mm256_broadcastq_epi64 Broadcast the low packed 64-bit integer from a to all elements of dst.
mm256_broadcastsd_pd Broadcast the low double-precision (64-bit) floating-point element from a to all elements of dst.
mm256_broadcastsi128_si256 Broadcast 128 bits of integer data from a to all 128-bit lanes in dst.
mm256_broadcastss_ps Broadcast the low single-precision (32-bit) floating-point element from a to all elements of dst.
mm256_broadcastw_epi16 Broadcast the low packed 16-bit integer from a to all elements of dst.
mm256_bslli_epi128 Shift 128-bit lanes in a left by imm8 bytes while shifting in zeros, and store the results in dst.
mm256_bsrli_epi128 Shift 128-bit lanes in a right by imm8 bytes while shifting in zeros, and store the results in dst.
mm256_cmpeq_epi16 Compare packed 16-bit integers in a and b for equality, and store the results in dst.
mm256_cmpeq_epi32 Compare packed 32-bit integers in a and b for equality, and store the results in dst.
mm256_cmpeq_epi64 Compare packed 64-bit integers in a and b for equality, and store the results in dst.
mm256_cmpeq_epi8 Compare packed 8-bit integers in a and b for equality, and store the results in dst.
mm256_cmpgt_epi16 Compare packed 16-bit integers in a and b for greater-than, and store the results in dst.
mm256_cmpgt_epi32 Compare packed 32-bit integers in a and b for greater-than, and store the results in dst.
mm256_cmpgt_epi64 Compare packed 64-bit integers in a and b for greater-than, and store the results in dst.
mm256_cmpgt_epi8 Compare packed 8-bit integers in a and b for greater-than, and store the results in dst.
mm256_cvtepi16_epi32 Sign extend packed 16-bit integers in a to packed 32-bit integers, and store the results in dst.
mm256_cvtepi16_epi64 Sign extend packed 16-bit integers in a to packed 64-bit integers, and store the results in dst.
mm256_cvtepi32_epi64 Sign extend packed 32-bit integers in a to packed 64-bit integers, and store the results in dst.
mm256_cvtepi8_epi16 Sign extend packed 8-bit integers in a to packed 16-bit integers, and store the results in dst.
mm256_cvtepi8_epi32 Sign extend packed 8-bit integers in a to packed 32-bit integers, and store the results in dst.
mm256_cvtepi8_epi64 Sign extend packed 8-bit integers in a to packed 64-bit integers, and store the results in dst.
mm256_cvtepu16_epi32 Zero extend packed unsigned 16-bit integers in a to packed 32-bit integers, and store the results in dst.
mm256_cvtepu16_epi64 Zero extend packed unsigned 16-bit integers in a to packed 64-bit integers, and store the results in dst.
mm256_cvtepu32_epi64 Zero extend packed unsigned 32-bit integers in a to packed 64-bit integers, and store the results in dst.
mm256_cvtepu8_epi16 Zero extend packed unsigned 8-bit integers in a to packed 16-bit integers, and store the results in dst.
mm256_cvtepu8_epi32 Zero extend packed unsigned 8-bit integers in a to packed 32-bit integers, and store the results in dst.
mm256_cvtepu8_epi64 Zero extend packed unsigned 8-bit integers in a to packed 64-bit integers, and store the results in dst.
mm256_cvtsd_f64 Copy the lower double-precision (64-bit) floating-point element of a to dst.
mm256_cvtsi256_si32 Copy the lower 32-bit integer in a to dst.
mm256_cvtsi256_si64 Copy the lower 64-bit integer in a to dst.
mm256_extract_epi16 Extract a 16-bit integer from a, selected with index (which must be constant), and store the result in dst.
mm256_extract_epi8 Extract an 8-bit integer from a, selected with index (which must be constant), and store the result in dst.
mm256_extracti128_si256 Extract 128 bits (composed of integer data) from a, selected with imm8, and store the result in dst.
mm256_hadd_epi16 Horizontally add adjacent pairs of 16-bit integers in a and b, and pack the signed 16-bit results in dst.
mm256_hadd_epi32 Horizontally add adjacent pairs of 32-bit integers in a and b, and pack the signed 32-bit results in dst.
mm256_hadds_epi16 Horizontally add adjacent pairs of 16-bit integers in a and b using saturation, and pack the signed 16-bit results in dst.
mm256_hsub_epi16 Horizontally subtract adjacent pairs of 16-bit integers in a and b, and pack the signed 16-bit results in dst.
mm256_hsub_epi32 Horizontally subtract adjacent pairs of 32-bit integers in a and b, and pack the signed 32-bit results in dst.
mm256_hsubs_epi16 Horizontally subtract adjacent pairs of 16-bit integers in a and b using saturation, and pack the signed 16-bit results in dst.
mm256_i32gather_epi32 Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
mm256_i32gather_epi64 Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
mm256_i32gather_pd Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
mm256_i32gather_ps Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
mm256_i64gather_epi32 Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
mm256_i64gather_epi64 Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
mm256_i64gather_pd Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
mm256_i64gather_ps Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
mm256_inserti128_si256 Copy a to dst, then insert 128 bits (composed of integer data) from b into dst at the location specified by imm8.
mm256_madd_epi16 Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in dst.
mm256_maddubs_epi16 Vertically multiply each unsigned 8-bit integer from a with the corresponding signed 8-bit integer from b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in dst.
mm256_mask_i32gather_epi32 Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mm256_mask_i32gather_epi64 Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mm256_mask_i32gather_pd Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mm256_mask_i32gather_ps Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mm256_mask_i64gather_epi32 Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mm256_mask_i64gather_epi64 Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mm256_mask_i64gather_pd Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mm256_mask_i64gather_ps Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mm256_maskload_epi32 Load packed 32-bit integers from memory into dst using mask (elements are zeroed out when the highest bit is not set in the corresponding element).
mm256_maskload_epi64 Load packed 64-bit integers from memory into dst using mask (elements are zeroed out when the highest bit is not set in the corresponding element).
mm256_maskstore_epi32 Store packed 32-bit integers from a into memory using mask (elements are not stored when the highest bit is not set in the corresponding element).
mm256_maskstore_epi64 Store packed 64-bit integers from a into memory using mask (elements are not stored when the highest bit is not set in the corresponding element).
mm256_max_epi16 Compare packed 16-bit integers in a and b, and store packed maximum values in dst.
mm256_max_epi32 Compare packed 32-bit integers in a and b, and store packed maximum values in dst.
mm256_max_epi8 Compare packed 8-bit integers in a and b, and store packed maximum values in dst.
mm256_max_epu16 Compare packed unsigned 16-bit integers in a and b, and store packed maximum values in dst.
mm256_max_epu32 Compare packed unsigned 32-bit integers in a and b, and store packed maximum values in dst.
mm256_max_epu8 Compare packed unsigned 8-bit integers in a and b, and store packed maximum values in dst.
mm256_min_epi16 Compare packed 16-bit integers in a and b, and store packed minimum values in dst.
mm256_min_epi32 Compare packed 32-bit integers in a and b, and store packed minimum values in dst.
mm256_min_epi8 Compare packed 8-bit integers in a and b, and store packed minimum values in dst.
mm256_min_epu16 Compare packed unsigned 16-bit integers in a and b, and store packed minimum values in dst.
mm256_min_epu32 Compare packed unsigned 32-bit integers in a and b, and store packed minimum values in dst.
mm256_min_epu8 Compare packed unsigned 8-bit integers in a and b, and store packed minimum values in dst.
mm256_movemask_epi8 Create mask from the most significant bit of each 8-bit element in a, and store the result in dst.
mm256_mpsadbw_epu8 Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in dst. Eight SADs are performed for each 128-bit lane using one quadruplet from b and eight quadruplets from a. One quadruplet is selected from b starting at the offset specified in imm8. Eight quadruplets are formed from sequential 8-bit integers selected from a starting at the offset specified in imm8.
mm256_mul_epi32 Multiply the low 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in dst.
mm256_mul_epu32 Multiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and store the unsigned 64-bit results in dst.
mm256_mulhi_epi16 Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst.
mm256_mulhi_epu16 Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst.
mm256_mulhrs_epi16 Multiply packed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to dst.
mm256_mullo_epi16 Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in dst.
mm256_mullo_epi32 Multiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in dst.
mm256_or_si256 Compute the bitwise OR of 256 bits (representing integer data) in a and b, and store the result in dst.
mm256_packs_epi16 Convert packed 16-bit integers from a and b to packed 8-bit integers using signed saturation, and store the results in dst.
mm256_packs_epi32 Convert packed 32-bit integers from a and b to packed 16-bit integers using signed saturation, and store the results in dst.
mm256_packus_epi16 Convert packed 16-bit integers from a and b to packed 8-bit integers using unsigned saturation, and store the results in dst.
mm256_packus_epi32 Convert packed 32-bit integers from a and b to packed 16-bit integers using unsigned saturation, and store the results in dst.
mm256_permute2x128_si256 Shuffle 128 bits (composed of integer data) selected by imm8 from a and b, and store the results in dst.
mm256_permute4x64_epi64 Shuffle 64-bit integers in a across lanes using the control in imm8, and store the results in dst.
mm256_permute4x64_pd Shuffle double-precision (64-bit) floating-point elements in a across lanes using the control in imm8, and store the results in dst.
mm256_permutevar8x32_epi32 Shuffle 32-bit integers in a across lanes using the corresponding index in idx, and store the results in dst.
mm256_permutevar8x32_ps Shuffle single-precision (32-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst.
mm256_sad_epu8 Compute the absolute differences of packed unsigned 8-bit integers in a and b, then horizontally sum each consecutive 8 differences to produce four unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in dst.
mm256_shuffle_epi32 Shuffle 32-bit integers in a within 128-bit lanes using the control in imm8, and store the results in dst.
mm256_shuffle_epi8 Shuffle 8-bit integers in a within 128-bit lanes according to shuffle control mask in the corresponding 8-bit element of b, and store the results in dst.
mm256_shufflehi_epi16 Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the high 64 bits of 128-bit lanes of dst, with the low 64 bits of 128-bit lanes being copied from a to dst.
mm256_shufflelo_epi16 Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the low 64 bits of 128-bit lanes of dst, with the high 64 bits of 128-bit lanes being copied from a to dst.
mm256_sign_epi16 Negate packed 16-bit integers in a when the corresponding signed 16-bit integer in b is negative, and store the results in dst. Elements in dst are zeroed out when the corresponding element in b is zero.
mm256_sign_epi32 Negate packed 32-bit integers in a when the corresponding signed 32-bit integer in b is negative, and store the results in dst. Elements in dst are zeroed out when the corresponding element in b is zero.
mm256_sign_epi8 Negate packed 8-bit integers in a when the corresponding signed 8-bit integer in b is negative, and store the results in dst. Elements in dst are zeroed out when the corresponding element in b is zero.
mm256_sll_epi16 Shift packed 16-bit integers in a left by count while shifting in zeros, and store the results in dst.
mm256_sll_epi32 Shift packed 32-bit integers in a left by count while shifting in zeros, and store the results in dst.
mm256_sll_epi64 Shift packed 64-bit integers in a left by count while shifting in zeros, and store the results in dst.
mm256_slli_epi16 Shift packed 16-bit integers in a left by imm8 while shifting in zeros, and store the results in dst.
mm256_slli_epi32 Shift packed 32-bit integers in a left by imm8 while shifting in zeros, and store the results in dst.
mm256_slli_epi64 Shift packed 64-bit integers in a left by imm8 while shifting in zeros, and store the results in dst.
mm256_slli_si256 Shift 128-bit lanes in a left by imm8 bytes while shifting in zeros, and store the results in dst.
mm256_sllv_epi32 Shift packed 32-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
mm256_sllv_epi64 Shift packed 64-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
mm256_sra_epi16 Shift packed 16-bit integers in a right by count while shifting in sign bits, and store the results in dst.
mm256_sra_epi32 Shift packed 32-bit integers in a right by count while shifting in sign bits, and store the results in dst.
mm256_srai_epi16 Shift packed 16-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst.
mm256_srai_epi32 Shift packed 32-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst.
mm256_srav_epi32 Shift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst.
mm256_srl_epi16 Shift packed 16-bit integers in a right by count while shifting in zeros, and store the results in dst.
mm256_srl_epi32 Shift packed 32-bit integers in a right by count while shifting in zeros, and store the results in dst.
mm256_srl_epi64 Shift packed 64-bit integers in a right by count while shifting in zeros, and store the results in dst.
mm256_srli_epi16 Shift packed 16-bit integers in a right by imm8 while shifting in zeros, and store the results in dst.
mm256_srli_epi32 Shift packed 32-bit integers in a right by imm8 while shifting in zeros, and store the results in dst.
mm256_srli_epi64 Shift packed 64-bit integers in a right by imm8 while shifting in zeros, and store the results in dst.
mm256_srli_si256 Shift 128-bit lanes in a right by imm8 bytes while shifting in zeros, and store the results in dst.
mm256_srlv_epi32 Shift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
mm256_srlv_epi64 Shift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
mm256_stream_load_si256 Load 256 bits of integer data from memory into dst using a non-temporal memory hint. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
mm256_sub_epi16 Subtract packed 16-bit integers in b from packed 16-bit integers in a, and store the results in dst.
mm256_sub_epi32 Subtract packed 32-bit integers in b from packed 32-bit integers in a, and store the results in dst.
mm256_sub_epi64 Subtract packed 64-bit integers in b from packed 64-bit integers in a, and store the results in dst.
mm256_sub_epi8 Subtract packed 8-bit integers in b from packed 8-bit integers in a, and store the results in dst.
mm256_subs_epi16 Subtract packed 16-bit integers in b from packed 16-bit integers in a using saturation, and store the results in dst.
mm256_subs_epi8 Subtract packed 8-bit integers in b from packed 8-bit integers in a using saturation, and store the results in dst.
mm256_subs_epu16 Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and store the results in dst.
mm256_subs_epu8 Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and store the results in dst.
mm256_unpackhi_epi16 Unpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst.
mm256_unpackhi_epi32 Unpack and interleave 32-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst.
mm256_unpackhi_epi64 Unpack and interleave 64-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst.
mm256_unpackhi_epi8 Unpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst.
mm256_unpacklo_epi16 Unpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst.
mm256_unpacklo_epi32 Unpack and interleave 32-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst.
mm256_unpacklo_epi64 Unpack and interleave 64-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst.
mm256_unpacklo_epi8 Unpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst.
mm256_xor_si256 Compute the bitwise XOR of 256 bits (representing integer data) in a and b, and store the result in dst.
sllv_epi32 Shift packed 32-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
sllv_epi64 Shift packed 64-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
srav_epi32 Shift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst.
srlv_epi32 Shift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
srlv_epi64 Shift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
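
Every gather entry above (i32gather_*, i64gather_*, and their mm256_ and mask_ forms) follows the same addressing rule. Element j is read from base_addr plus vindex[j] multiplied by scale, measured in bytes. The masked forms read only the elements whose mask element has its highest bit set. The sketch below models that rule in plain C# for the case of 32-bit indices and 32-bit elements. It uses a little-endian byte array as a stand-in for memory. GatherModel and its methods are hypothetical names used for illustration only. They are not part of Unity.Burst.Intrinsics, and they say nothing about how Burst compiles the real intrinsics.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical scalar model of the AVX2 gather addressing rule.
// Not part of Unity.Burst.Intrinsics; it only illustrates the documented semantics.
public static class GatherModel
{
    // Packs 32-bit values into little-endian bytes, matching x86 memory layout.
    public static byte[] ToLittleEndianBytes(int[] values)
    {
        var bytes = new byte[values.Length * 4];
        for (int i = 0; i < values.Length; i++)
            BinaryPrimitives.WriteInt32LittleEndian(bytes.AsSpan(i * 4, 4), values[i]);
        return bytes;
    }

    static void CheckScale(int scale)
    {
        if (scale != 1 && scale != 2 && scale != 4 && scale != 8)
            throw new ArgumentOutOfRangeException(nameof(scale), "scale should be 1, 2, 4 or 8");
    }

    // Reads one 32-bit element at baseOffset + index * scale bytes.
    static int Load(byte[] memory, int baseOffset, int index, int scale)
    {
        long byteOffset = baseOffset + (long)index * scale; // indices are signed
        return BinaryPrimitives.ReadInt32LittleEndian(memory.AsSpan(checked((int)byteOffset), 4));
    }

    // Models mm256_i32gather_epi32: every element is loaded.
    public static int[] I32GatherEpi32(byte[] memory, int baseOffset, int[] vindex, int scale)
    {
        CheckScale(scale);
        var dst = new int[vindex.Length];
        for (int j = 0; j < vindex.Length; j++)
            dst[j] = Load(memory, baseOffset, vindex[j], scale);
        return dst;
    }

    // Models mm256_mask_i32gather_epi32: an element is loaded only when the highest bit
    // of its mask element is set (mask[j] < 0); otherwise it is copied from src.
    public static int[] MaskI32GatherEpi32(int[] src, byte[] memory, int baseOffset,
                                           int[] vindex, int[] mask, int scale)
    {
        CheckScale(scale);
        var dst = (int[])src.Clone();
        for (int j = 0; j < vindex.Length; j++)
            if (mask[j] < 0)
                dst[j] = Load(memory, baseOffset, vindex[j], scale);
        return dst;
    }
}
```

With scale set to 4, each vindex element acts like an array index into a table of 32-bit integers. With scale set to 1, the same loads need byte offsets (0, 4, 8, ...) instead.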
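
Some per-element rules in the table are easy to misread:

- the rounding step of mm256_mulhrs_epi16
- the zeroing and negation in mm256_sign_epi16
- saturation in the adds/subs family
- the difference between the sign-extending mm256_cvtepi8_* conversions and the zero-extending mm256_cvtepu8_* conversions

The following sketch models one element of each in plain C#. The 256-bit intrinsics apply the same rule independently to every element. ElementModel is a hypothetical helper written for this illustration, not a Burst API.

```csharp
using System;

// Hypothetical per-element models of a few AVX2 integer rules.
// Each vector intrinsic applies the same rule independently to every element.
public static class ElementModel
{
    // mm256_mulhrs_epi16: 32-bit product, keep the 18 most significant bits (>> 14),
    // round by adding 1, then return bits [16:1].
    public static short MulHrs(short a, short b)
    {
        int tmp = ((a * b) >> 14) + 1;
        return unchecked((short)(tmp >> 1));
    }

    // mm256_sign_epi16: negate a when b < 0, zero when b == 0, otherwise keep a.
    // Negating short.MinValue wraps back to short.MinValue.
    public static short Sign(short a, short b)
    {
        if (b < 0) return unchecked((short)(-a));
        if (b == 0) return 0;
        return a;
    }

    // mm256_adds_epi16: signed saturating addition.
    public static short AddsEpi16(short a, short b) =>
        (short)Math.Clamp(a + b, short.MinValue, short.MaxValue);

    // mm256_adds_epu8 / mm256_subs_epu8: unsigned saturating add and subtract (a - b).
    public static byte AddsEpu8(byte a, byte b) => (byte)Math.Min(a + b, byte.MaxValue);
    public static byte SubsEpu8(byte a, byte b) => (byte)Math.Max(a - b, 0);

    // mm256_cvtepi8_epi32 sign-extends the raw byte; mm256_cvtepu8_epi32 zero-extends it.
    public static int CvtEpi8ToInt32(byte raw) => unchecked((sbyte)raw);
    public static int CvtEpu8ToInt32(byte raw) => raw;
}
```

If the mulhrs operands are read as Q15 fixed-point values, MulHrs(16384, 16384) returns 8192, that is, 0.5 times 0.5 gives 0.25.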
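
Many byte-level entries treat a 256-bit vector as two independent 16-byte lanes, so no byte moves between the low and high halves. These include mm256_alignr_epi8, mm256_bslli_epi128, mm256_shuffle_epi8, and the other entries described as working "within 128-bit lanes". The sketch below models three of them on 32-element byte arrays, with index 0 as the least significant byte. LaneByteModel is a hypothetical helper written for this illustration. It is not part of Unity.Burst.Intrinsics.

```csharp
using System;

// Hypothetical byte-level models of three AVX2 operations that work independently
// on each 128-bit (16-byte) lane of a 256-bit vector. Index 0 is the lowest byte.
public static class LaneByteModel
{
    const int LaneBytes = 16;

    // mm256_alignr_epi8: per lane, form the 32-byte value with a in the high half and
    // b in the low half, shift it right by imm8 bytes, and keep the low 16 bytes.
    public static byte[] AlignrEpi8(byte[] a, byte[] b, int imm8)
    {
        var dst = new byte[32];
        for (int lane = 0; lane < 2; lane++)
        {
            int baseIdx = lane * LaneBytes;
            for (int i = 0; i < LaneBytes; i++)
            {
                int k = i + imm8; // byte position in the 32-byte temporary
                dst[baseIdx + i] = k < LaneBytes ? b[baseIdx + k]
                                 : k < 2 * LaneBytes ? a[baseIdx + k - LaneBytes]
                                 : (byte)0;
            }
        }
        return dst;
    }

    // mm256_bslli_epi128: per lane, shift left by imm8 bytes while shifting in zeros.
    // Bytes never cross from the low lane into the high lane.
    public static byte[] BslliEpi128(byte[] a, int imm8)
    {
        var dst = new byte[32];
        for (int lane = 0; lane < 2; lane++)
            for (int i = 0; i < LaneBytes; i++)
            {
                int src = i - imm8;
                dst[lane * LaneBytes + i] = src >= 0 ? a[lane * LaneBytes + src] : (byte)0;
            }
        return dst;
    }

    // mm256_shuffle_epi8: each control byte in b selects a byte of a from the same lane
    // (low 4 bits), or produces zero when its highest bit is set.
    public static byte[] ShuffleEpi8(byte[] a, byte[] b)
    {
        var dst = new byte[32];
        for (int j = 0; j < 32; j++)
        {
            int laneBase = j & ~(LaneBytes - 1);
            dst[j] = (b[j] & 0x80) != 0 ? (byte)0 : a[laneBase + (b[j] & 0x0F)];
        }
        return dst;
    }
}
```

In this model, BslliEpi128 with imm8 = 1 discards byte 15 of the input instead of carrying it into byte 16.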
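
The imm8 and index controls in the table also differ in scope:

- mm256_blend_epi32 uses one imm8 bit per 32-bit element.
- mm256_blend_epi16 reuses its eight imm8 bits for each 128-bit lane.
- mm256_shuffle_epi32 selects only within a lane.
- mm256_permutevar8x32_epi32 can move elements across lanes.

The sketch below models these selection rules, plus mm256_movemask_epi8, on plain C# arrays. ControlModel is a hypothetical helper for illustration only, not a Burst API.

```csharp
using System;

// Hypothetical element-level models of how AVX2 immediate and vector controls select data.
public static class ControlModel
{
    // mm256_blend_epi32: bit j of imm8 picks b (1) or a (0) for 32-bit element j.
    public static int[] BlendEpi32(int[] a, int[] b, int imm8)
    {
        var dst = new int[8];
        for (int j = 0; j < 8; j++)
            dst[j] = ((imm8 >> j) & 1) != 0 ? b[j] : a[j];
        return dst;
    }

    // mm256_blend_epi16: the same eight imm8 bits are reused for each 128-bit lane,
    // so bit (j % 8) controls 16-bit element j.
    public static short[] BlendEpi16(short[] a, short[] b, int imm8)
    {
        var dst = new short[16];
        for (int j = 0; j < 16; j++)
            dst[j] = ((imm8 >> (j % 8)) & 1) != 0 ? b[j] : a[j];
        return dst;
    }

    // mm256_shuffle_epi32: imm8 holds four 2-bit selectors, applied within each lane.
    public static int[] ShuffleEpi32(int[] a, int imm8)
    {
        var dst = new int[8];
        for (int j = 0; j < 8; j++)
        {
            int laneBase = j & ~3;
            dst[j] = a[laneBase + ((imm8 >> (2 * (j & 3))) & 3)];
        }
        return dst;
    }

    // mm256_permutevar8x32_epi32: the low 3 bits of idx[j] select any of the eight
    // elements of a, so data can cross the 128-bit lane boundary.
    public static int[] PermutevarEpi32(int[] a, int[] idx)
    {
        var dst = new int[8];
        for (int j = 0; j < 8; j++)
            dst[j] = a[idx[j] & 7];
        return dst;
    }

    // mm256_movemask_epi8: bit j of the result is the most significant bit of byte j.
    public static int MovemaskEpi8(byte[] a)
    {
        int mask = 0;
        for (int j = 0; j < 32; j++)
            if ((a[j] & 0x80) != 0)
                mask |= 1 << j;
        return mask;
    }
}
```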