_mm256_dpbssd_epi32
Classification
AVX_ALL, Arithmetic, CPUID Test: AVX_VNNI_INT8
Header File
immintrin.h
Instruction
VPDPBSSD ymm, ymm, ymm
Synopsis
__m256i _mm256_dpbssd_epi32(__m256i __W, __m256i __A, __m256i __B);
Description
Multiply groups of 4 adjacent pairs of signed 8-bit integers in "__A" with corresponding signed 8-bit integers in "__B", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "__W", and store the packed 32-bit results in "dst".
Operation
FOR j := 0 to 7
	tmp1.word := SignExtend16(__A.byte[4*j]) * SignExtend16(__B.byte[4*j])
	tmp2.word := SignExtend16(__A.byte[4*j+1]) * SignExtend16(__B.byte[4*j+1])
	tmp3.word := SignExtend16(__A.byte[4*j+2]) * SignExtend16(__B.byte[4*j+2])
	tmp4.word := SignExtend16(__A.byte[4*j+3]) * SignExtend16(__B.byte[4*j+3])
	dst.dword[j] := __W.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
ENDFOR
dst[MAX:256] := 0
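The operation above can be sketched as a scalar reference model in C, which may help when checking results on machines without AVX_VNNI_INT8 support. The helper names `dpbssd_lane` and `dpbssd_epi32_ref` are hypothetical and not part of any header. The sketch assumes the accumulation wraps modulo 2^32 (no saturation), as the pseudocode's plain addition suggests.

```c
#include <stdint.h>

/* Hypothetical helper: scalar model of one 32-bit lane.
 * Multiplies 4 signed byte pairs and adds the products to w.
 * The sum is accumulated in unsigned arithmetic so that overflow
 * wraps (assumed non-saturating) without signed-overflow UB. */
static int32_t dpbssd_lane(int32_t w, const int8_t a[4], const int8_t b[4])
{
    uint32_t acc = (uint32_t)w;
    for (int k = 0; k < 4; ++k) {
        /* Sign-extend both bytes before multiplying; each product
         * lies in [-16256, 16384], so it fits in 16 bits. */
        int32_t prod = (int32_t)a[k] * (int32_t)b[k];
        acc += (uint32_t)prod;
    }
    /* Conversion back to int32_t is implementation-defined for values
     * above INT32_MAX; common compilers wrap to two's complement. */
    return (int32_t)acc;
}

/* Hypothetical helper: model of the full 256-bit operation,
 * with vectors passed as plain arrays (8 dwords, 32 bytes). */
static void dpbssd_epi32_ref(int32_t dst[8], const int32_t w[8],
                             const int8_t a[32], const int8_t b[32])
{
    for (int j = 0; j < 8; ++j)
        dst[j] = dpbssd_lane(w[j], &a[4 * j], &b[4 * j]);
}
```

For example, with bytes {1, -2, 3, -4} in "__A", {5, 6, -7, 8} in "__B", and 100 in the matching dword of "__W", the lane result is 100 + 5 - 12 - 21 - 32 = 40.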