_mm256_dpbf16_ps
Classification
AVX-512, Arithmetic, CPUID Test: AVX512_BF16, AVX512VL
Header File
immintrin.h
Instruction
VDPBF16PS ymm, ymm, ymm
Synopsis
__m256 _mm256_dpbf16_ps(__m256 src, __m256bh a, __m256bh b);
Description
Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst".
Operation
DEFINE make_fp32(x[15:0]) {
    y.fp32 := 0.0
    y[31:16] := x[15:0]
    RETURN y
}
dst := src
FOR j := 0 to 7
    dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1])
    dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0])
ENDFOR
dst[MAX:256] := 0
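
Example
The pseudocode above can be mirrored in portable scalar C, which may help when checking results on a machine without AVX512_BF16. This is a minimal sketch, not the hardware implementation: the names bf16_to_fp32 and dpbf16_ps_ref are hypothetical helpers introduced here, BF16 values are assumed to be passed as raw 16-bit patterns, and the rounding and denormal handling of the actual instruction may differ from plain float arithmetic.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper mirroring make_fp32: the 16 BF16 bits become the
   upper half of an IEEE-754 single, and the lower 16 bits are zero. */
static float bf16_to_fp32(uint16_t x) {
    uint32_t bits = (uint32_t)x << 16;
    float y;
    memcpy(&y, &bits, sizeof y);   /* bit-cast without aliasing issues */
    return y;
}

/* Hypothetical scalar reference for the 8 lanes of the 256-bit form.
   a and b each hold 16 raw BF16 bit patterns; lane j consumes elements
   2*j+1 and 2*j+0, in the same order as the pseudocode. */
static void dpbf16_ps_ref(float dst[8], const float src[8],
                          const uint16_t a[16], const uint16_t b[16]) {
    for (int j = 0; j < 8; j++) {
        float acc = src[j];
        acc += bf16_to_fp32(a[2 * j + 1]) * bf16_to_fp32(b[2 * j + 1]);
        acc += bf16_to_fp32(a[2 * j + 0]) * bf16_to_fp32(b[2 * j + 0]);
        dst[j] = acc;
    }
}
```

For instance, with every element of a set to 0x3F80 (1.0 in BF16) and every element of b set to 0x4000 (2.0 in BF16), each lane of dst equals the corresponding lane of src plus 4.0.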