Class X86.Avx2

AVX2 intrinsics

Inheritance

X86.Avx2

Namespace: Unity.Burst.Intrinsics

Syntax

public static class Avx2 : object

Properties

Name	Description
IsAvx2Supported	Evaluates to true at compile time if AVX2 intrinsics are supported.
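
Because IsAvx2Supported evaluates at compile time, it can guard an AVX2 code path next to a scalar fallback, and the branch that does not apply is not expected to survive in the compiled code. The following is a minimal sketch under stated assumptions, not a definitive implementation: the class Avx2AddExample and the method Add are hypothetical names, and the unaligned load and store helpers are assumed to come from X86.Avx.

```csharp
using Unity.Burst;
using Unity.Burst.Intrinsics;
using static Unity.Burst.Intrinsics.X86.Avx;
using static Unity.Burst.Intrinsics.X86.Avx2;

[BurstCompile]
public static unsafe class Avx2AddExample
{
    // Hypothetical helper: result[i] = a[i] + b[i] for count 32-bit integers.
    [BurstCompile]
    public static void Add(int* a, int* b, int* result, int count)
    {
        int i = 0;

        // Compile-time check: the AVX2 loop is only used when the target supports it.
        if (IsAvx2Supported)
        {
            // Process eight 32-bit integers (256 bits) per iteration.
            for (; i + 8 <= count; i += 8)
            {
                v256 va = mm256_loadu_si256(a + i);
                v256 vb = mm256_loadu_si256(b + i);
                mm256_storeu_si256(result + i, mm256_add_epi32(va, vb));
            }
        }

        // Scalar path: handles the remaining elements, or everything without AVX2.
        for (; i < count; i++)
        {
            result[i] = a[i] + b[i];
        }
    }
}
```

The scalar loop doubles as the tail handler, so the method behaves the same whether or not the AVX2 branch is taken.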

Methods

Name	Description
blend_epi32(v128, v128, Int32)	Blend packed 32-bit integers from a and b using control mask imm8, and store the results in dst.
broadcastb_epi8(v128)	Broadcast the low packed 8-bit integer from a to all elements of dst.
broadcastd_epi32(v128)	Broadcast the low packed 32-bit integer from a to all elements of dst.
broadcastq_epi64(v128)	Broadcast the low packed 64-bit integer from a to all elements of dst.
broadcastsd_pd(v128)	Broadcast the low double-precision (64-bit) floating-point element from a to all elements of dst.
broadcastss_ps(v128)	Broadcast the low single-precision (32-bit) floating-point element from a to all elements of dst.
broadcastw_epi16(v128)	Broadcast the low packed 16-bit integer from a to all elements of dst.
i32gather_epi32(Void*, v128, Int32)	Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
i32gather_epi64(Void*, v128, Int32)	Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
i32gather_pd(Void*, v128, Int32)	Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
i32gather_ps(Void*, v128, Int32)	Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
i64gather_epi32(Void*, v128, Int32)	Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
i64gather_epi64(Void*, v128, Int32)	Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
i64gather_pd(Void*, v128, Int32)	Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
i64gather_ps(Void*, v128, Int32)	Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
mask_i32gather_epi32(v128, Void*, v128, v128, Int32)	Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mask_i32gather_epi64(v128, Void*, v128, v128, Int32)	Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mask_i32gather_pd(v128, Void*, v128, v128, Int32)	Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mask_i32gather_ps(v128, Void*, v128, v128, Int32)	Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mask_i64gather_epi32(v128, Void*, v128, v128, Int32)	Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mask_i64gather_epi64(v128, Void*, v128, v128, Int32)	Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mask_i64gather_pd(v128, Void*, v128, v128, Int32)	Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mask_i64gather_ps(v128, Void*, v128, v128, Int32)	Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
maskload_epi32(Void*, v128)	Load packed 32-bit integers from memory into dst using mask (elements are zeroed out when the highest bit is not set in the corresponding element).
maskload_epi64(Void*, v128)	Load packed 64-bit integers from memory into dst using mask (elements are zeroed out when the highest bit is not set in the corresponding element).
maskstore_epi32(Void*, v128, v128)	Store packed 32-bit integers from a into memory using mask (elements are not stored when the highest bit is not set in the corresponding element).
maskstore_epi64(Void*, v128, v128)	Store packed 64-bit integers from a into memory using mask (elements are not stored when the highest bit is not set in the corresponding element).
mm256_abs_epi16(v256)	Compute the absolute value of packed 16-bit integers in a, and store the unsigned results in dst.
mm256_abs_epi32(v256)	Compute the absolute value of packed 32-bit integers in a, and store the unsigned results in dst.
mm256_abs_epi8(v256)	Compute the absolute value of packed 8-bit integers in a, and store the unsigned results in dst.
mm256_add_epi16(v256, v256)	Add packed 16-bit integers in a and b, and store the results in dst.
mm256_add_epi32(v256, v256)	Add packed 32-bit integers in a and b, and store the results in dst.
mm256_add_epi64(v256, v256)	Add packed 64-bit integers in a and b, and store the results in dst.
mm256_add_epi8(v256, v256)	Add packed 8-bit integers in a and b, and store the results in dst.
mm256_adds_epi16(v256, v256)	Add packed 16-bit integers in a and b using saturation, and store the results in dst.
mm256_adds_epi8(v256, v256)	Add packed 8-bit integers in a and b using saturation, and store the results in dst.
mm256_adds_epu16(v256, v256)	Add packed unsigned 16-bit integers in a and b using saturation, and store the results in dst.
mm256_adds_epu8(v256, v256)	Add packed unsigned 8-bit integers in a and b using saturation, and store the results in dst.
mm256_alignr_epi8(v256, v256, Int32)	Concatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by imm8 bytes, and store the low 16 bytes in dst.
mm256_and_si256(v256, v256)	Compute the bitwise AND of 256 bits (representing integer data) in a and b, and store the result in dst.
mm256_andnot_si256(v256, v256)	Compute the bitwise NOT of 256 bits (representing integer data) in a and then AND with b, and store the result in dst.
mm256_avg_epu16(v256, v256)	Average packed unsigned 16-bit integers in a and b, and store the results in dst.
mm256_avg_epu8(v256, v256)	Average packed unsigned 8-bit integers in a and b, and store the results in dst.
mm256_blend_epi16(v256, v256, Int32)	Blend packed 16-bit integers from a and b within 128-bit lanes using control mask imm8, and store the results in dst.
mm256_blend_epi32(v256, v256, Int32)	Blend packed 32-bit integers from a and b using control mask imm8, and store the results in dst.
mm256_blendv_epi8(v256, v256, v256)	Blend packed 8-bit integers from a and b using mask, and store the results in dst.
mm256_broadcastb_epi8(v128)	Broadcast the low packed 8-bit integer from a to all elements of dst.
mm256_broadcastd_epi32(v128)	Broadcast the low packed 32-bit integer from a to all elements of dst.
mm256_broadcastq_epi64(v128)	Broadcast the low packed 64-bit integer from a to all elements of dst.
mm256_broadcastsd_pd(v128)	Broadcast the low double-precision (64-bit) floating-point element from a to all elements of dst.
mm256_broadcastsi128_si256(v128)	Broadcast 128 bits of integer data from a to all 128-bit lanes in dst.
mm256_broadcastss_ps(v128)	Broadcast the low single-precision (32-bit) floating-point element from a to all elements of dst.
mm256_broadcastw_epi16(v128)	Broadcast the low packed 16-bit integer from a to all elements of dst.
mm256_bslli_epi128(v256, Int32)	Shift 128-bit lanes in a left by imm8 bytes while shifting in zeros, and store the results in dst.
mm256_bsrli_epi128(v256, Int32)	Shift 128-bit lanes in a right by imm8 bytes while shifting in zeros, and store the results in dst.
mm256_cmpeq_epi16(v256, v256)	Compare packed 16-bit integers in a and b for equality, and store the results in dst.
mm256_cmpeq_epi32(v256, v256)	Compare packed 32-bit integers in a and b for equality, and store the results in dst.
mm256_cmpeq_epi64(v256, v256)	Compare packed 64-bit integers in a and b for equality, and store the results in dst.
mm256_cmpeq_epi8(v256, v256)	Compare packed 8-bit integers in a and b for equality, and store the results in dst.
mm256_cmpgt_epi16(v256, v256)	Compare packed 16-bit integers in a and b for greater-than, and store the results in dst.
mm256_cmpgt_epi32(v256, v256)	Compare packed 32-bit integers in a and b for greater-than, and store the results in dst.
mm256_cmpgt_epi64(v256, v256)	Compare packed 64-bit integers in a and b for greater-than, and store the results in dst.
mm256_cmpgt_epi8(v256, v256)	Compare packed 8-bit integers in a and b for greater-than, and store the results in dst.
mm256_cvtepi16_epi32(v128)	Sign extend packed 16-bit integers in a to packed 32-bit integers, and store the results in dst.
mm256_cvtepi16_epi64(v128)	Sign extend packed 16-bit integers in a to packed 64-bit integers, and store the results in dst.
mm256_cvtepi32_epi64(v128)	Sign extend packed 32-bit integers in a to packed 64-bit integers, and store the results in dst.
mm256_cvtepi8_epi16(v128)	Sign extend packed 8-bit integers in a to packed 16-bit integers, and store the results in dst.
mm256_cvtepi8_epi32(v128)	Sign extend packed 8-bit integers in a to packed 32-bit integers, and store the results in dst.
mm256_cvtepi8_epi64(v128)	Sign extend packed 8-bit integers in a to packed 64-bit integers, and store the results in dst.
mm256_cvtepu16_epi32(v128)	Zero extend packed unsigned 16-bit integers in a to packed 32-bit integers, and store the results in dst.
mm256_cvtepu16_epi64(v128)	Zero extend packed unsigned 16-bit integers in a to packed 64-bit integers, and store the results in dst.
mm256_cvtepu32_epi64(v128)	Zero extend packed unsigned 32-bit integers in a to packed 64-bit integers, and store the results in dst.
mm256_cvtepu8_epi16(v128)	Zero extend packed unsigned 8-bit integers in a to packed 16-bit integers, and store the results in dst.
mm256_cvtepu8_epi32(v128)	Zero extend packed unsigned 8-bit integers in a to packed 32-bit integers, and store the results in dst.
mm256_cvtepu8_epi64(v128)	Zero extend packed unsigned 8-bit integers in a to packed 64-bit integers, and store the results in dst.
mm256_cvtsd_f64(v256)	Copy the lower double-precision (64-bit) floating-point element of a to dst.
mm256_cvtsi256_si32(v256)	Copy the lower 32-bit integer in a to dst.
mm256_cvtsi256_si64(v256)	Copy the lower 64-bit integer in a to dst.
mm256_extract_epi16(v256, Int32)	Extract a 16-bit integer from a, selected with index (which must be constant), and store the result in dst.
mm256_extract_epi8(v256, Int32)	Extract an 8-bit integer from a, selected with index (which must be constant), and store the result in dst.
mm256_extracti128_si256(v256, Int32)	Extract 128 bits (composed of integer data) from a, selected with imm8, and store the result in dst.
mm256_hadd_epi16(v256, v256)	Horizontally add adjacent pairs of 16-bit integers in a and b, and pack the signed 16-bit results in dst.
mm256_hadd_epi32(v256, v256)	Horizontally add adjacent pairs of 32-bit integers in a and b, and pack the signed 32-bit results in dst.
mm256_hadds_epi16(v256, v256)	Horizontally add adjacent pairs of 16-bit integers in a and b using saturation, and pack the signed 16-bit results in dst.
mm256_hsub_epi16(v256, v256)	Horizontally subtract adjacent pairs of 16-bit integers in a and b, and pack the signed 16-bit results in dst.
mm256_hsub_epi32(v256, v256)	Horizontally subtract adjacent pairs of 32-bit integers in a and b, and pack the signed 32-bit results in dst.
mm256_hsubs_epi16(v256, v256)	Horizontally subtract adjacent pairs of 16-bit integers in a and b using saturation, and pack the signed 16-bit results in dst.
mm256_i32gather_epi32(Void*, v256, Int32)	Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
mm256_i32gather_epi64(Void*, v128, Int32)	Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
mm256_i32gather_pd(Void*, v128, Int32)	Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
mm256_i32gather_ps(Void*, v256, Int32)	Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
mm256_i64gather_epi32(Void*, v256, Int32)	Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
mm256_i64gather_epi64(Void*, v256, Int32)	Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
mm256_i64gather_pd(Void*, v256, Int32)	Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
mm256_i64gather_ps(Void*, v256, Int32)	Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
mm256_inserti128_si256(v256, v128, Int32)	Copy a to dst, then insert 128 bits (composed of integer data) from b into dst at the location specified by imm8.
mm256_madd_epi16(v256, v256)	Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in dst.
mm256_maddubs_epi16(v256, v256)	Vertically multiply each unsigned 8-bit integer from a with the corresponding signed 8-bit integer from b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in dst.
mm256_mask_i32gather_epi32(v256, Void*, v256, v256, Int32)	Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mm256_mask_i32gather_epi64(v256, Void*, v128, v256, Int32)	Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mm256_mask_i32gather_pd(v256, Void*, v128, v256, Int32)	Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mm256_mask_i32gather_ps(v256, Void*, v256, v256, Int32)	Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mm256_mask_i64gather_epi32(v128, Void*, v256, v128, Int32)	Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mm256_mask_i64gather_epi64(v256, Void*, v256, v256, Int32)	Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mm256_mask_i64gather_pd(v256, Void*, v256, v256, Int32)	Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mm256_mask_i64gather_ps(v128, Void*, v256, v128, Int32)	Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
mm256_maskload_epi32(Void*, v256)	Load packed 32-bit integers from memory into dst using mask (elements are zeroed out when the highest bit is not set in the corresponding element).
mm256_maskload_epi64(Void*, v256)	Load packed 64-bit integers from memory into dst using mask (elements are zeroed out when the highest bit is not set in the corresponding element).
mm256_maskstore_epi32(Void*, v256, v256)	Store packed 32-bit integers from a into memory using mask (elements are not stored when the highest bit is not set in the corresponding element).
mm256_maskstore_epi64(Void*, v256, v256)	Store packed 64-bit integers from a into memory using mask (elements are not stored when the highest bit is not set in the corresponding element).
mm256_max_epi16(v256, v256)	Compare packed 16-bit integers in a and b, and store packed maximum values in dst.
mm256_max_epi32(v256, v256)	Compare packed 32-bit integers in a and b, and store packed maximum values in dst.
mm256_max_epi8(v256, v256)	Compare packed 8-bit integers in a and b, and store packed maximum values in dst.
mm256_max_epu16(v256, v256)	Compare packed unsigned 16-bit integers in a and b, and store packed maximum values in dst.
mm256_max_epu32(v256, v256)	Compare packed unsigned 32-bit integers in a and b, and store packed maximum values in dst.
mm256_max_epu8(v256, v256)	Compare packed unsigned 8-bit integers in a and b, and store packed maximum values in dst.
mm256_min_epi16(v256, v256)	Compare packed 16-bit integers in a and b, and store packed minimum values in dst.
mm256_min_epi32(v256, v256)	Compare packed 32-bit integers in a and b, and store packed minimum values in dst.
mm256_min_epi8(v256, v256)	Compare packed 8-bit integers in a and b, and store packed minimum values in dst.
mm256_min_epu16(v256, v256)	Compare packed unsigned 16-bit integers in a and b, and store packed minimum values in dst.
mm256_min_epu32(v256, v256)	Compare packed unsigned 32-bit integers in a and b, and store packed minimum values in dst.
mm256_min_epu8(v256, v256)	Compare packed unsigned 8-bit integers in a and b, and store packed minimum values in dst.
mm256_movemask_epi8(v256)	Create mask from the most significant bit of each 8-bit element in a, and store the result in dst.
mm256_mpsadbw_epu8(v256, v256, Int32)	Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in dst. Eight SADs are performed for each 128-bit lane using one quadruplet from b and eight quadruplets from a. One quadruplet is selected from b starting at the offset specified in imm8. Eight quadruplets are formed from sequential 8-bit integers selected from a starting at the offset specified in imm8.
mm256_mul_epi32(v256, v256)	Multiply the low 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in dst.
mm256_mul_epu32(v256, v256)	Multiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and store the unsigned 64-bit results in dst.
mm256_mulhi_epi16(v256, v256)	Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst.
mm256_mulhi_epu16(v256, v256)	Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst.
mm256_mulhrs_epi16(v256, v256)	Multiply packed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to dst.
mm256_mullo_epi16(v256, v256)	Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in dst.
mm256_mullo_epi32(v256, v256)	Multiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in dst.
mm256_or_si256(v256, v256)	Compute the bitwise OR of 256 bits (representing integer data) in a and b, and store the result in dst.
mm256_packs_epi16(v256, v256)	Convert packed 16-bit integers from a and b to packed 8-bit integers using signed saturation, and store the results in dst.
mm256_packs_epi32(v256, v256)	Convert packed 32-bit integers from a and b to packed 16-bit integers using signed saturation, and store the results in dst.
mm256_packus_epi16(v256, v256)	Convert packed 16-bit integers from a and b to packed 8-bit integers using unsigned saturation, and store the results in dst.
mm256_packus_epi32(v256, v256)	Convert packed 32-bit integers from a and b to packed 16-bit integers using unsigned saturation, and store the results in dst.
mm256_permute2x128_si256(v256, v256, Int32)	Shuffle 128 bits (composed of integer data) selected by imm8 from a and b, and store the results in dst.
mm256_permute4x64_epi64(v256, Int32)	Shuffle 64-bit integers in a across lanes using the control in imm8, and store the results in dst.
mm256_permute4x64_pd(v256, Int32)	Shuffle double-precision (64-bit) floating-point elements in a across lanes using the control in imm8, and store the results in dst.
mm256_permutevar8x32_epi32(v256, v256)	Shuffle 32-bit integers in a across lanes using the corresponding index in idx, and store the results in dst.
mm256_permutevar8x32_ps(v256, v256)	Shuffle single-precision (32-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst.
mm256_sad_epu8(v256, v256)	Compute the absolute differences of packed unsigned 8-bit integers in a and b, then horizontally sum each consecutive 8 differences to produce four unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in dst.
mm256_shuffle_epi32(v256, Int32)	Shuffle 32-bit integers in a within 128-bit lanes using the control in imm8, and store the results in dst.
mm256_shuffle_epi8(v256, v256)	Shuffle 8-bit integers in a within 128-bit lanes according to shuffle control mask in the corresponding 8-bit element of b, and store the results in dst.
mm256_shufflehi_epi16(v256, Int32)	Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the high 64 bits of 128-bit lanes of dst, with the low 64 bits of 128-bit lanes being copied from a to dst.
mm256_shufflelo_epi16(v256, Int32)	Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the low 64 bits of 128-bit lanes of dst, with the high 64 bits of 128-bit lanes being copied from a to dst.
mm256_sign_epi16(v256, v256)	Negate packed 16-bit integers in a when the corresponding signed 16-bit integer in b is negative, and store the results in dst. Elements in dst are zeroed out when the corresponding element in b is zero.
mm256_sign_epi32(v256, v256)	Negate packed 32-bit integers in a when the corresponding signed 32-bit integer in b is negative, and store the results in dst. Elements in dst are zeroed out when the corresponding element in b is zero.
mm256_sign_epi8(v256, v256)	Negate packed 8-bit integers in a when the corresponding signed 8-bit integer in b is negative, and store the results in dst. Elements in dst are zeroed out when the corresponding element in b is zero.
mm256_sll_epi16(v256, v128)	Shift packed 16-bit integers in a left by count while shifting in zeros, and store the results in dst.
mm256_sll_epi32(v256, v128)	Shift packed 32-bit integers in a left by count while shifting in zeros, and store the results in dst.
mm256_sll_epi64(v256, v128)	Shift packed 64-bit integers in a left by count while shifting in zeros, and store the results in dst.
mm256_slli_epi16(v256, Int32)	Shift packed 16-bit integers in a left by imm8 while shifting in zeros, and store the results in dst.
mm256_slli_epi32(v256, Int32)	Shift packed 32-bit integers in a left by imm8 while shifting in zeros, and store the results in dst.
mm256_slli_epi64(v256, Int32)	Shift packed 64-bit integers in a left by imm8 while shifting in zeros, and store the results in dst.
mm256_slli_si256(v256, Int32)	Shift 128-bit lanes in a left by imm8 bytes while shifting in zeros, and store the results in dst.
mm256_sllv_epi32(v256, v256)	Shift packed 32-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
mm256_sllv_epi64(v256, v256)	Shift packed 64-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
mm256_sra_epi16(v256, v128)	Shift packed 16-bit integers in a right by count while shifting in sign bits, and store the results in dst.
mm256_sra_epi32(v256, v128)	Shift packed 32-bit integers in a right by count while shifting in sign bits, and store the results in dst.
mm256_srai_epi16(v256, Int32)	Shift packed 16-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst.
mm256_srai_epi32(v256, Int32)	Shift packed 32-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst.
mm256_srav_epi32(v256, v256)	Shift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst.
mm256_srl_epi16(v256, v128)	Shift packed 16-bit integers in a right by count while shifting in zeros, and store the results in dst.
mm256_srl_epi32(v256, v128)	Shift packed 32-bit integers in a right by count while shifting in zeros, and store the results in dst.
mm256_srl_epi64(v256, v128)	Shift packed 64-bit integers in a right by count while shifting in zeros, and store the results in dst.
mm256_srli_epi16(v256, Int32)	Shift packed 16-bit integers in a right by imm8 while shifting in zeros, and store the results in dst.
mm256_srli_epi32(v256, Int32)	Shift packed 32-bit integers in a right by imm8 while shifting in zeros, and store the results in dst.
mm256_srli_epi64(v256, Int32)	Shift packed 64-bit integers in a right by imm8 while shifting in zeros, and store the results in dst.
mm256_srli_si256(v256, Int32)	Shift 128-bit lanes in a right by imm8 bytes while shifting in zeros, and store the results in dst.
mm256_srlv_epi32(v256, v256)	Shift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
mm256_srlv_epi64(v256, v256)	Shift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
mm256_stream_load_si256(Void*)	Load 256 bits of integer data from memory into dst using a non-temporal memory hint. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
mm256_sub_epi16(v256, v256)	Subtract packed 16-bit integers in b from packed 16-bit integers in a, and store the results in dst.
mm256_sub_epi32(v256, v256)	Subtract packed 32-bit integers in b from packed 32-bit integers in a, and store the results in dst.
mm256_sub_epi64(v256, v256)	Subtract packed 64-bit integers in b from packed 64-bit integers in a, and store the results in dst.
mm256_sub_epi8(v256, v256)	Subtract packed 8-bit integers in b from packed 8-bit integers in a, and store the results in dst.
mm256_subs_epi16(v256, v256)	Subtract packed signed 16-bit integers in b from packed signed 16-bit integers in a using saturation, and store the results in dst.
mm256_subs_epi8(v256, v256)	Subtract packed signed 8-bit integers in b from packed signed 8-bit integers in a using saturation, and store the results in dst.
mm256_subs_epu16(v256, v256)	Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and store the results in dst.
mm256_subs_epu8(v256, v256)	Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and store the results in dst.
mm256_unpackhi_epi16(v256, v256)	Unpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst.
mm256_unpackhi_epi32(v256, v256)	Unpack and interleave 32-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst.
mm256_unpackhi_epi64(v256, v256)	Unpack and interleave 64-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst.
mm256_unpackhi_epi8(v256, v256)	Unpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst.
mm256_unpacklo_epi16(v256, v256)	Unpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst.
mm256_unpacklo_epi32(v256, v256)	Unpack and interleave 32-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst.
mm256_unpacklo_epi64(v256, v256)	Unpack and interleave 64-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst.
mm256_unpacklo_epi8(v256, v256)	Unpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst.
mm256_xor_si256(v256, v256)	Compute the bitwise XOR of 256 bits (representing integer data) in a and b, and store the result in dst.
sllv_epi32(v128, v128)	Shift packed 32-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
sllv_epi64(v128, v128)	Shift packed 64-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
srav_epi32(v128, v128)	Shift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst.
srlv_epi32(v128, v128)	Shift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
srlv_epi64(v128, v128)	Shift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
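
The per-element shifts and the lane-wise unpack operations listed above can be easier to follow with a scalar model. The sketch below is not Burst code and does not call any Unity API: it is a plain Python illustration in which each vector is assumed to be a list of unsigned 32-bit values with element 0 as the lowest element. The function names only mirror the table entries for readability. It assumes the behaviour documented for the underlying AVX2 instructions: a variable shift count above 31 produces 0 for logical shifts and a sign-filled element for arithmetic shifts, and unpacklo interleaves within each 128-bit lane rather than across the whole 256-bit vector.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits


def sllv_epi32(a, count):
    """Model: shift each element left by its own count; counts above 31 give 0."""
    return [(x << c) & MASK32 if c < 32 else 0 for x, c in zip(a, count)]


def srlv_epi32(a, count):
    """Model: logical right shift per element; counts above 31 give 0."""
    return [(x & MASK32) >> c if c < 32 else 0 for x, c in zip(a, count)]


def srav_epi32(a, count):
    """Model: arithmetic right shift per element; counts above 31 fill with the sign bit."""
    out = []
    for x, c in zip(a, count):
        # Reinterpret the unsigned 32-bit value as a signed integer.
        signed = x - (1 << 32) if x & 0x80000000 else x
        out.append((signed >> min(c, 31)) & MASK32)
    return out


def mm256_unpacklo_epi32(a, b):
    """Model: interleave the low two 32-bit elements of each 128-bit lane of a and b."""
    out = []
    for lane in (0, 4):  # first element index of each 128-bit lane
        out += [a[lane], b[lane], a[lane + 1], b[lane + 1]]
    return out


print(sllv_epi32([1, 1, 1, 1], [0, 4, 31, 32]))
print(mm256_unpacklo_epi32(list(range(8)), list(range(10, 18))))
```

In the last call the result takes elements 0, 1 and 4, 5 from each input, never 2, 3 or 6, 7, which is the lane-wise behaviour that distinguishes the mm256 unpack operations from a full-width interleave.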