_mm_mask3_fcmadd_round_sch
Classification
AVX-512, Arithmetic, CPUID Test: AVX512_FP16
Header File
#include <immintrin.h>
Instruction
VFCMADDCSH xmm {k}, xmm, xmm {er}
Synopsis
__m128h _mm_mask3_fcmadd_round_sch(__m128h a, __m128h b, __m128h c, __mmask8 k, const int rounding);
Description
Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", add the lower complex number in "c" to the product, and store the result in the lower elements of "dst" using writemask "k" (the lower elements are copied from "c" when mask bit 0 is not set). Copy the upper 6 packed elements from "c" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, so that "complex = vec.fp16[0] + i * vec.fp16[1]" and its complex conjugate is "conjugate = vec.fp16[0] - i * vec.fp16[1]".
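The arithmetic in the description is ordinary complex multiply-accumulate with the second factor conjugated: a * conj(b) + c. The sketch below expresses that with C99 `<complex.h>`, using `float complex` as a stand-in for the pair of fp16 elements (an assumption: it ignores fp16 precision and the `rounding` argument). The helper name `fcmadd_ref` is hypothetical.

```c
#include <complex.h>

/* Hypothetical reference model of the lower-lane math: a * conj(b) + c.
   float complex stands in for {vec.fp16[0], vec.fp16[1]}; fp16 rounding
   and the explicit rounding mode are NOT modeled. */
static float complex fcmadd_ref(float complex a, float complex b, float complex c)
{
    return a * conjf(b) + c;
}
```

For example, with a = 1 + 2i, b = 3 + 4i and c = 0.5 + 0.5i, the product (1 + 2i)(3 - 4i) is 11 + 2i, so the result is 11.5 + 2.5i, matching the two lines of the Operation pseudocode.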
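Rounding is done according to the rounding[3:0] parameter, which can be one of:
(_MM_FROUND_TO_NEAREST_INT |_MM_FROUND_NO_EXC) // round to nearest, and suppress exceptions
(_MM_FROUND_TO_NEG_INF |_MM_FROUND_NO_EXC) // round down, and suppress exceptions
(_MM_FROUND_TO_POS_INF |_MM_FROUND_NO_EXC) // round up, and suppress exceptions
(_MM_FROUND_TO_ZERO |_MM_FROUND_NO_EXC) // truncate, and suppress exceptions
_MM_FROUND_CUR_DIRECTION // use MXCSR.RC; see _MM_SET_ROUNDING_MODE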
Operation
IF k[0]
	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1]) + c.fp16[0]
	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1]) + c.fp16[1]
ELSE
	dst.fp16[0] := c.fp16[0]
	dst.fp16[1] := c.fp16[1]
FI
dst[127:32] := c[127:32]
dst[MAX:128] := 0
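The Operation pseudocode can be emulated lane by lane to see how the writemask and the upper-element copy interact. This is a sketch under stated assumptions: the type `vec128h_model` and the function `mask3_fcmadd_sch_model` are hypothetical, each 128-bit vector is modeled as 8 `float` lanes standing in for fp16, `uint8_t` stands in for `__mmask8`, and neither fp16 rounding nor the `rounding` argument nor `dst[MAX:128]` is modeled.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit vector as 8 half-precision lanes;
   float is used in place of fp16, so results are not fp16-rounded. */
typedef struct { float fp16[8]; } vec128h_model;

/* Emulates the pseudocode for the 128-bit part of dst only. */
static vec128h_model mask3_fcmadd_sch_model(vec128h_model a, vec128h_model b,
                                            vec128h_model c, uint8_t k)
{
    /* Start from c: this implements dst[127:32] := c[127:32] and also
       gives the ELSE branch (lower pair copied from c) for free. */
    vec128h_model dst = c;
    if (k & 1) {
        /* Real part: a0*b0 + a1*b1 + c0 (conjugate flips the sign of b1). */
        dst.fp16[0] = a.fp16[0] * b.fp16[0] + a.fp16[1] * b.fp16[1] + c.fp16[0];
        /* Imaginary part: a1*b0 - a0*b1 + c1. */
        dst.fp16[1] = a.fp16[1] * b.fp16[0] - a.fp16[0] * b.fp16[1] + c.fp16[1];
    }
    return dst;
}
```

Only mask bit 0 matters: with k = 0xFE the lower pair still comes from "c", because the single-element ("sch") form ignores the other mask bits.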