Commit 423192e

EgorBo authored and vargaz committed
[netcore][llvm] Implement Sse1-4.2 subsets used by corlib (mono#18103)
* Implement SSE41 subset used by corlib
* Implement Sse2.MoveMask
* only works with llvm
* Implement Sse3 and Ssse3 subsets used by corlib
* Implement a few SSE1 methods
* Address feedback, also, implement Sse.Add/Subtract
* Fix build
* Implement Sse.Multiply, Sse.Store
* Implement Sse.CompareNotEquals
* Implement Sse.MoveScalar
* Finish SSE1 corlib subset
* Implement Sse.LoadVector128, Sse.Shuffle
* Sse.Shuffle cleanup
* Implement Sse2 APIs
* More of SSE2: LoadAlignedVector128, Compare*
* Implement Sse2.Unpack* and Sse2.StoreScalar
* Implement Sse2.PackUnsignedSaturate
* Implement Sse2.ShiftRightLogical
* Implement Sse2.Shuffle
* Implement Vector128<T>.Zero
* Fix CreateScalarUnsafe
* Implement Vector128.As*, Fix Vector128.CreateScalarUnsafe
* Fix failures
* Fix failures
* Fix Sse.MoveMask
* remove redundant null checks
* Fix AOT failures
* fix compilation warning
* rename create_vector_mask_*
* Fix failures found via tests
* Index in Sse41.Insert has to be a constant
* add local tests for mono
* Update tests (cleanup)
* Code cleanup
* test
* fix typo
* Clean up
* Clean up
* Implement And, AndNot, Or, Xor, Divide for Sse1
* Cleanup
* Fix build
* limit emit_vector128 with LLVM
* enable IsSupported for corlib
* Fix build on wasm (if-defs issue)
* Don't intrinsify Vector256
* Address feedback
1 parent e92fe19 commit 423192e

9 files changed, +2326 -43 lines changed

mono/mini/mini-llvm.c
+340 -19
Large diffs are not rendered by default.

mono/mini/mini-ops.h
+33 -1
@@ -1004,11 +1004,41 @@ MINI_OP(OP_CVTTPS2DQ, "cvttps2dq", XREG, XREG, NONE)
 /* multiply all 4 single precision float elements, add them together, and store the result to the lowest element */
 MINI_OP(OP_DPPS, "dpps", XREG, XREG, XREG)

-/* sse 4.1 */
+/* sse 1 */
+/* inst_c1 is target type */
+MINI_OP(OP_SSE_LOADU, "sse_loadu", XREG, XREG, NONE)
+MINI_OP(OP_SSE_MOVMSK, "sse_movmsk", IREG, XREG, NONE)
+MINI_OP(OP_SSE_STORE, "sse_store", NONE, XREG, XREG)
+MINI_OP(OP_SSE_STORES, "sse_stores", NONE, XREG, XREG)
+MINI_OP(OP_SSE_MOVS, "sse_movs", XREG, XREG, NONE)
+MINI_OP(OP_SSE_MOVS2, "sse_movs2", XREG, XREG, XREG)
+MINI_OP(OP_SSE_MOVEHL, "sse_movehl", XREG, XREG, XREG)
+MINI_OP(OP_SSE_MOVELH, "sse_movelh", XREG, XREG, XREG)
+MINI_OP(OP_SSE_UNPACKLO, "sse_unpacklo", XREG, XREG, XREG)
+MINI_OP(OP_SSE_UNPACKHI, "sse_unpackhi", XREG, XREG, XREG)
+MINI_OP(OP_SSE_SHUFFLE, "sse_shuffle", XREG, XREG, XREG)
+MINI_OP(OP_SSE_AND, "sse_and", XREG, XREG, XREG)
+MINI_OP(OP_SSE_OR, "sse_or", XREG, XREG, XREG)
+MINI_OP(OP_SSE_XOR, "sse_xor", XREG, XREG, XREG)
+MINI_OP(OP_SSE_ANDN, "sse_andn", XREG, XREG, XREG)
+
+/* sse 2 */
+MINI_OP(OP_SSE2_PACKUS, "sse2_packus", XREG, XREG, XREG)
+MINI_OP(OP_SSE2_SRLI, "sse2_srli", XREG, XREG, XREG)
+MINI_OP(OP_SSE2_SHUFFLE, "sse2_shuffle", XREG, XREG, XREG)
+
+/* sse 3 */
+MINI_OP(OP_SSE3_MOVDDUP, "sse3_movddup", XREG, XREG, NONE)
+
+/* ssse 3 */
+MINI_OP(OP_SSSE3_SHUFFLE, "ssse3_shuffle", XREG, XREG, XREG)

+/* sse 4.1 */
 /* inst_c0 is the rounding mode: 0 = round, 1 = floor, 2 = ceiling */
 MINI_OP(OP_SSE41_ROUNDPD, "roundpd", XREG, XREG, NONE)
 MINI_OP(OP_SSE41_ROUNDSS, "roundss", XREG, XREG, NONE)
+MINI_OP3(OP_SSE41_INSERT, "sse41_insert", XREG, XREG, XREG, IREG)
+MINI_OP(OP_SSE41_PTESTZ, "sse41_ptestz", IREG, XREG, XREG)

 /* Intel BMI1 */
 /* Count trailing zeroes, return 32/64 if the input is 0 */
@@ -1031,6 +1061,8 @@ MINI_OP3(OP_MULX_HL64, "mulxhl64", LREG, LREG, LREG, LREG)

 #endif

+MINI_OP(OP_CREATE_SCALAR_UNSAFE, "create_scalar_unsafe", XREG, XREG, NONE)
+
 MINI_OP(OP_XMOVE, "xmove", XREG, XREG, NONE)
 MINI_OP(OP_XZERO, "xzero", XREG, NONE, NONE)
 MINI_OP(OP_XONES, "xones", XREG, NONE, NONE)
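
The MINI_OP(...) entries in this header read like an X-macro table: each line carries an opcode name, a printable name, and the destination and source register classes, and the including file decides what MINI_OP expands to. The sketch below illustrates that pattern on four of the new entries. It is not Mono's actual code: the DEMO_* macros, the DemoRegClass and DemoOpcode types, and the demo_* helper functions are hypothetical names made up for this example, and the real header is consumed differently (it is a list of bare MINI_OP lines, not a single list macro).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical register classes, standing in for the NONE/IREG/XREG columns. */
typedef enum { NONE, IREG, XREG } DemoRegClass;

/* Hypothetical opcode list modelled on four entries from the diff.
   Columns: opcode, printable name, dest, src1, src2. */
#define DEMO_OPS(X) \
    X(OP_SSE_LOADU,    "sse_loadu",    XREG, XREG, NONE) \
    X(OP_SSE_MOVMSK,   "sse_movmsk",   IREG, XREG, NONE) \
    X(OP_SSE_STORE,    "sse_store",    NONE, XREG, XREG) \
    X(OP_SSE41_PTESTZ, "sse41_ptestz", IREG, XREG, XREG)

/* Expansion 1: the opcode enum. */
#define DEMO_AS_ENUM(op, name, dest, src1, src2) op,
typedef enum { DEMO_OPS(DEMO_AS_ENUM) OP_DEMO_LAST } DemoOpcode;

/* Expansion 2: a table of printable names, indexed by opcode. */
#define DEMO_AS_NAME(op, name, dest, src1, src2) name,
static const char *const demo_op_names[] = { DEMO_OPS(DEMO_AS_NAME) };

/* Expansion 3: a table of register classes, indexed by opcode. */
#define DEMO_AS_REGS(op, name, dest, src1, src2) { dest, src1, src2 },
static const DemoRegClass demo_op_regs[][3] = { DEMO_OPS(DEMO_AS_REGS) };

/* Printable name of an opcode. */
static const char *demo_op_name(DemoOpcode op)
{
    assert((int) op >= 0 && (int) op < (int) OP_DEMO_LAST);
    return demo_op_names[op];
}

/* Destination register class of an opcode. */
static DemoRegClass demo_op_dest(DemoOpcode op)
{
    assert((int) op >= 0 && (int) op < (int) OP_DEMO_LAST);
    return demo_op_regs[op][0];
}

/* Reverse lookup by name; returns OP_DEMO_LAST when the name is unknown. */
static DemoOpcode demo_op_from_name(const char *name)
{
    for (int i = 0; i < (int) OP_DEMO_LAST; i++) {
        if (strcmp(demo_op_names[i], name) == 0)
            return (DemoOpcode) i;
    }
    return OP_DEMO_LAST;
}
```

With this layout, adding one line to the list keeps the enum, the name table, and the register-class table in sync, which is presumably why a single header edit is enough to declare each new opcode in the commit. For example, demo_op_dest(OP_SSE_MOVMSK) yields IREG, matching the diff's declaration that sse_movmsk produces an integer register from a vector source.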
