_mm512_4dpwssd_epi32
Classification
AVX-512, Arithmetic, CPUID Test: AVX512_4VNNIW
Header File
#include <immintrin.h>
Instruction
VP4DPWSSD zmm, zmm, m128
Synopsis
__m512i _mm512_4dpwssd_epi32(__m512i src, __m512i a0, __m512i a1, __m512i a2, __m512i a3, __m128i * b);
Description
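Compute 4 sequential dot-product steps, one per source block "a0" through "a3". In step m, each pair of adjacent signed 16-bit elements of block m is multiplied by the pair of signed 16-bit elements held in the m-th 32-bit element of the 128-bit memory operand at "b", and the two 32-bit products are added to the corresponding 32-bit accumulator, which is initialized from "src". Store the results in "dst".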
Operation
dst[511:0] := src[511:0]
FOR i := 0 to 15
    FOR m := 0 to 3
        lim_base := m*32
        t.dword := b[lim_base+31:lim_base]
        p1.dword := SignExtend32(a{m}.word[2*i+0]) * SignExtend32(Cast_Int16(t.word[0]))
        p2.dword := SignExtend32(a{m}.word[2*i+1]) * SignExtend32(Cast_Int16(t.word[1]))
        dst.dword[i] := dst.dword[i] + p1.dword + p2.dword
    ENDFOR
ENDFOR
dst[MAX:512] := 0
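Example
The following is a minimal scalar sketch of the Operation pseudocode in portable C, intended for checking results on machines without AVX512_4VNNIW. The function name ref_4dpwssd_epi32 and its array-based parameters are hypothetical and are not part of any Intel header. The sketch assumes the accumulation wraps modulo 2^32 (no saturation) and that "b" is viewed as eight signed 16-bit words.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference model; not an Intel API.
 * src, dst : 16 signed 32-bit accumulator lanes
 * a        : the four source blocks a0..a3, each 32 signed 16-bit words
 * b        : the 128-bit memory operand viewed as 8 signed 16-bit words
 */
static void ref_4dpwssd_epi32(int32_t dst[16], const int32_t src[16],
                              const int16_t a[4][32], const int16_t b[8])
{
    uint32_t acc[16];

    /* dst starts as a copy of src; unsigned lanes give defined wraparound. */
    for (int i = 0; i < 16; i++)
        acc[i] = (uint32_t)src[i];

    for (int m = 0; m < 4; m++) {
        /* The m-th 32-bit element of b holds two 16-bit multipliers. */
        int32_t t0 = b[2 * m + 0];
        int32_t t1 = b[2 * m + 1];

        for (int i = 0; i < 16; i++) {
            /* A 16-bit by 16-bit signed product always fits in 32 bits. */
            int32_t p1 = (int32_t)a[m][2 * i + 0] * t0;
            int32_t p2 = (int32_t)a[m][2 * i + 1] * t1;
            acc[i] += (uint32_t)p1 + (uint32_t)p2;
        }
    }

    /* Copy the bit patterns back into the signed output lanes. */
    memcpy(dst, acc, sizeof acc);
}
```

For instance, with every lane of "src" set to 100, every 16-bit word of a0 through a3 set to 1, and b holding the words 1 through 8, each lane of the result is 100 + (1+2+...+8) = 136.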