_mm256_dpbf16_ps
Classification
AVX-512, Arithmetic, CPUID Test: AVX512_BF16 + AVX512VL
Header File
immintrin.h
Instruction
VDPBF16PS ymm, ymm, ymm
Synopsis
__m256 _mm256_dpbf16_ps(__m256 src, __m256bh a, __m256bh b);
Description
Compute the dot product of each pair of adjacent BF16 (16-bit) floating-point elements in "a" and "b", accumulate the intermediate single-precision (32-bit) floating-point results with the corresponding elements in "src", and store the results in "dst".
Operation
DEFINE make_fp32(x[15:0]) {
	y.fp32  := 0.0
	y[31:16] := x[15:0]
	RETURN y
}
dst := src
FOR j := 0 to 7
	dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1])
	dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0])
ENDFOR
dst[MAX:256] := 0
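Example
The pseudocode above can be sketched as a portable scalar reference in C, which may be useful for checking results on machines without AVX512_BF16. The names bf16_to_fp32 and dpbf16_ps_ref are hypothetical helpers chosen for this illustration and are not part of immintrin.h. BF16 values are passed as raw 16-bit patterns. This sketch uses ordinary C float arithmetic, so its rounding and denormal handling may differ from the hardware instruction in edge cases.

```c
#include <stdint.h>
#include <string.h>

/* Widen one BF16 value (raw 16 bits) to FP32, mirroring make_fp32:
   the BF16 bits become the upper half of the FP32 bit pattern and
   the lower 16 bits are zero. */
static float bf16_to_fp32(uint16_t x)
{
    uint32_t bits = (uint32_t)x << 16;
    float y;
    memcpy(&y, &bits, sizeof y);   /* bit-exact reinterpretation */
    return y;
}

/* Hypothetical scalar reference for the 256-bit form: 8 FP32 lanes,
   each accumulating the products of two adjacent BF16 pairs.
   Assumption: plain float multiply/add stands in for the hardware's
   internal arithmetic. */
static void dpbf16_ps_ref(float dst[8], const float src[8],
                          const uint16_t a[16], const uint16_t b[16])
{
    for (int j = 0; j < 8; j++) {
        float acc = src[j];
        acc += bf16_to_fp32(a[2 * j + 1]) * bf16_to_fp32(b[2 * j + 1]);
        acc += bf16_to_fp32(a[2 * j + 0]) * bf16_to_fp32(b[2 * j + 0]);
        dst[j] = acc;
    }
}
```

For instance, with every element of "a" set to 0x4000 (BF16 2.0), every element of "b" set to 0x4040 (BF16 3.0), and every element of "src" set to 1.0, each lane of "dst" is 1.0 + 6.0 + 6.0 = 13.0.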