_mm256_dbsad_epu8
Classification
AVX-512, Miscellaneous, CPUID Test: AVX512BW
Header File
immintrin.h
Instruction
VDBPSADBW ymm, ymm, ymm, imm8
Synopsis
 _mm256_dbsad_epu8(__m256i a, __m256i b, int imm8);
Description
Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst". Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the uppper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected from within 128-bit lanes according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
Operation
FOR i := 0 to 1
	tmp.m128[i].dword[0] := b.m128[i].dword[ imm8[1:0] ]
	tmp.m128[i].dword[1] := b.m128[i].dword[ imm8[3:2] ]
	tmp.m128[i].dword[2] := b.m128[i].dword[ imm8[5:4] ]
	tmp.m128[i].dword[3] := b.m128[i].dword[ imm8[7:6] ]
ENDFOR
FOR j := 0 to 3
	i := j*64
	dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\
	               ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24])
	
	dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\
	                  ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32])
	
	dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\
	                  ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40])
	
	dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\
	                  ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48])
ENDFOR
dst[MAX:256] := 0