_mm512_maskz_dpbf16_ps
Classification
AVX-512, Arithmetic, CPUID Test: AVX512_BF16
Header File
#include <immintrin.h>
Instruction
VDPBF16PS zmm {z}, zmm, zmm
Synopsis
__m512 _mm512_maskz_dpbf16_ps(__mmask16 k, __m512 src, __m512bh a, __m512bh b);
Description
Compute the dot product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulate the intermediate single-precision (32-bit) floating-point results with the elements in "src", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
Operation
DEFINE make_fp32(x[15:0]) {
	y.fp32 := 0.0
	y[31:16] := x[15:0]
	RETURN y
}
dst := src
FOR j := 0 to 15
	IF k[j]
		dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1])
		dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0])
	ELSE
		dst.dword[j] := 0
	FI
ENDFOR
dst[MAX:512] := 0
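
The pseudocode above can be mirrored by a scalar reference routine in portable C, which may help when checking results on machines without AVX512_BF16. This is a minimal sketch, not Intel's implementation: the names bf16_to_fp32 and maskz_dpbf16_ps_ref are hypothetical, the vectors are modeled as plain arrays (16 floats, 32 BF16 bit patterns), and the rounding and denormal handling of the real instruction may differ from ordinary C float arithmetic.

```c
#include <stdint.h>
#include <string.h>

/* Widen a BF16 bit pattern to FP32 by placing it in the upper 16 bits
 * of a 32-bit word, as make_fp32 does in the pseudocode. */
static float bf16_to_fp32(uint16_t x)
{
    uint32_t bits = (uint32_t)x << 16;
    float y;
    memcpy(&y, &bits, sizeof y);   /* reinterpret bits without aliasing issues */
    return y;
}

/* Hypothetical scalar model of the zero-masked BF16 dot-product accumulate.
 * dst and src hold 16 floats; a and b hold 32 BF16 values as raw uint16_t. */
static void maskz_dpbf16_ps_ref(float dst[16], uint16_t k,
                                const float src[16],
                                const uint16_t a[32],
                                const uint16_t b[32])
{
    for (int j = 0; j < 16; j++) {
        if ((k >> j) & 1u) {
            /* Mask bit set: accumulate both products of the pair onto src[j]. */
            float acc = src[j];
            acc += bf16_to_fp32(a[2 * j + 1]) * bf16_to_fp32(b[2 * j + 1]);
            acc += bf16_to_fp32(a[2 * j + 0]) * bf16_to_fp32(b[2 * j + 0]);
            dst[j] = acc;
        } else {
            /* Mask bit clear: zeromask writes 0 to the element. */
            dst[j] = 0.0f;
        }
    }
}
```

For example, with every element of "a" set to 2.0 (BF16 0x4000), every element of "b" set to 3.0 (BF16 0x4040), every element of "src" set to 1.0, and k = 0x0005, elements 0 and 2 of the result are 1 + 6 + 6 = 13 and all other elements are 0.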