__tile_dpbsud
Classification
AMX, Application-Targeted, CPUID Test: AMX-INT8
Header File
immintrin.h
Instruction
TDPBSUD tmm, tmm, tmm
Synopsis
void __tile_dpbsud(__tile1024i* dst, __tile1024i src0, __tile1024i src1);
Description
Compute the dot product of bytes in tiles with a source/destination accumulator. Multiply each group of 4 adjacent signed 8-bit integers in "src0" by the corresponding 4 unsigned 8-bit integers in "src1", producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in "dst", and store the 32-bit result back to tile "dst". The shape of each tile is specified in its __tile1024i struct, and the tile register is allocated by the compiler.
Operation
DEFINE DPBD(c, x, y) {
	tmp1 := SignExtend32(x.byte[0]) * ZeroExtend32(y.byte[0])
	tmp2 := SignExtend32(x.byte[1]) * ZeroExtend32(y.byte[1])
	tmp3 := SignExtend32(x.byte[2]) * ZeroExtend32(y.byte[2])
	tmp4 := SignExtend32(x.byte[3]) * ZeroExtend32(y.byte[3])
	RETURN c + tmp1 + tmp2 + tmp3 + tmp4
}
FOR m := 0 TO dst.rows - 1
	tmp := dst.row[m]
	FOR k := 0 TO (src0.colsb / 4) - 1
		FOR n := 0 TO (dst.colsb / 4) - 1
			tmp.dword[n] := DPBD(tmp.dword[n], src0.row[m].dword[k], src1.row[k].dword[n])
		ENDFOR
	ENDFOR
	write_row_and_zero(dst, m, tmp, dst.colsb)
ENDFOR
zero_upper_rows(dst, dst.rows)
zero_tileconfig_start()