_mm_dp_ps
Classification
SSE_ALL, Arithmetic, CPUID Test: SSE4.1
Header File
smmintrin.h
Instruction
DPPS xmm, xmm, imm8
Synopsis
__m128 _mm_dp_ps(__m128 a, __m128 b, const int imm8);
Description
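Conditionally multiply the packed single-precision (32-bit) floating-point elements in "a" and "b" using the high 4 bits of "imm8" (bit 4+j enables the product of element j; a disabled product contributes 0), sum the four products, and conditionally store the sum in each element of "dst" using the low 4 bits of "imm8" (bit j enables element j; a disabled element is set to 0).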
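As an illustration of how the two halves of "imm8" combine, the sketch below computes a three-component dot product. The helper name dot3 and the DOT3_TARGET macro are hypothetical and not part of the intrinsic's interface. The target attribute is a GCC/Clang mechanism assumed here so the function can use SSE4.1 without a global -msse4.1 flag; other compilers may need different settings.

```c
#include <assert.h>
#include <smmintrin.h>  /* SSE4.1: declares _mm_dp_ps */

/* Assumption: GCC/Clang per-function target attribute; empty elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define DOT3_TARGET __attribute__((target("sse4.1")))
#else
#define DOT3_TARGET
#endif

/* Hypothetical helper: dot product of elements 0..2 of a and b.
 * Both pointers must reference at least four readable floats,
 * because a full 128-bit vector is loaded from each. */
DOT3_TARGET
float dot3(const float *a, const float *b)
{
    __m128 va, vb, r;

    assert(a && b);
    va = _mm_loadu_ps(a);  /* unaligned load of a[0..3] */
    vb = _mm_loadu_ps(b);  /* unaligned load of b[0..3] */

    /* imm8 = 0x71:
     *   high nibble 0111 -> multiply elements 0, 1, 2 (element 3 contributes 0)
     *   low nibble  0001 -> write the sum to element 0 only */
    r = _mm_dp_ps(va, vb, 0x71);

    return _mm_cvtss_f32(r);  /* extract element 0 */
}
```

With a = {1, 2, 3, 100} and b = {4, 5, 6, 100}, the fourth elements are masked out of the product and dot3 returns 32. Note that "imm8" must be a compile-time constant, so the mask cannot be passed in as a runtime argument.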
Operation
DEFINE DP(a[127:0], b[127:0], imm8[7:0]) {
	FOR j := 0 to 3
		i := j*32
		IF imm8[(4+j)%8]
			temp[i+31:i] := a[i+31:i] * b[i+31:i]
		ELSE
			temp[i+31:i] := 0
		FI
	ENDFOR
	
	sum[31:0] := (temp[127:96] + temp[95:64]) + (temp[63:32] + temp[31:0])
	
	FOR j := 0 to 3
		i := j*32
		IF imm8[j%8]
			tmpdst[i+31:i] := sum[31:0]
		ELSE
			tmpdst[i+31:i] := 0
		FI
	ENDFOR
	RETURN tmpdst[127:0]
}
dst[127:0] := DP(a[127:0], b[127:0], imm8[7:0])
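
The operation above can be mirrored by a scalar C model, which may help when checking mask values without SSE4.1 hardware. This is a minimal sketch: the name dp_ps_ref and its array-based signature are hypothetical, and the model follows only the pseudocode as written (including its summation order). It makes no claim about hardware behavior beyond that.

```c
#include <assert.h>

/* Hypothetical scalar reference for the DP pseudocode.
 * a, b, dst each hold four floats; index j corresponds to bits [32*j+31 : 32*j]. */
void dp_ps_ref(const float a[4], const float b[4], int imm8, float dst[4])
{
    float temp[4];
    float sum;
    int j;

    assert(imm8 >= 0 && imm8 <= 0xFF);  /* imm8 is an 8-bit immediate */

    /* High nibble: bit 4+j selects whether product j takes part in the sum. */
    for (j = 0; j < 4; j++)
        temp[j] = ((imm8 >> (4 + j)) & 1) ? a[j] * b[j] : 0.0f;

    /* Same pairing as the pseudocode: (t3 + t2) + (t1 + t0). */
    sum = (temp[3] + temp[2]) + (temp[1] + temp[0]);

    /* Low nibble: bit j selects whether element j receives the sum or 0. */
    for (j = 0; j < 4; j++)
        dst[j] = ((imm8 >> j) & 1) ? sum : 0.0f;
}
```

For a = {1, 2, 3, 4} and b = {5, 6, 7, 8}, imm8 = 0xFF yields 70 in every element, while imm8 = 0x71 yields 38 in element 0 and 0 in the rest.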