Fast Path StringCoding.countPositives and hasNegatives for Power #21597

luke-li-2003 · 2025-04-08T20:56:43Z

Fast-path the StringCoding methods countPositives and hasNegatives on Power. Since their logic is similar, both can be implemented by a single intrinsic.

luke-li-2003 · 2025-04-08T20:57:47Z

Still a draft PR since I need to figure out a good way to deal with shorter arrays.

luke-li-2003 · 2025-04-08T20:58:18Z

Related to https://github.ibm.com/runtimes/openj9-jit-xopt/issues/624

luke-li-2003 · 2025-04-08T21:00:31Z

For reference, this is what we are dealing with:

On jdk21+:

    public static boolean hasNegatives(byte[] ba, int off, int len) {
        return countPositives(ba, off, len) != len;
    }

    @IntrinsicCandidate
    public static int countPositives(byte[] ba, int off, int len) {
        int limit = off + len;
        for (int i = off; i < limit; i++) {
            if (ba[i] < 0) {
                return i - off;
            }
        }
        return len;
    }

Before jdk21:

    public static boolean hasNegatives(byte[] ba, int off, int len) {
        for (int i = off; i < off + len; i++) {
            if (ba[i] < 0) {
                return true;
            }
        }
        return false;
    }
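To make the "single intrinsic" point concrete, here is a minimal sketch in plain Java showing that the pre-jdk21 hasNegatives loop gives the same answer as `countPositives(...) != len`. The class name `NegativeScanSketch` and the method name `hasNegativesLegacy` are made up for illustration; this is not the Power intrinsic itself.

```java
final class NegativeScanSketch {
    // Reference loop, same shape as the jdk21+ countPositives listing.
    static int countPositives(byte[] ba, int off, int len) {
        int limit = off + len;
        for (int i = off; i < limit; i++) {
            if (ba[i] < 0) {
                return i - off;
            }
        }
        return len;
    }

    // Pre-jdk21 style loop, kept only to compare against.
    static boolean hasNegativesLegacy(byte[] ba, int off, int len) {
        for (int i = off; i < off + len; i++) {
            if (ba[i] < 0) {
                return true;
            }
        }
        return false;
    }

    // Same answer derived from the count: the scan stopped early
    // exactly when a negative byte exists in the range.
    static boolean hasNegatives(byte[] ba, int off, int len) {
        return countPositives(ba, off, len) != len;
    }

    public static void main(String[] args) {
        byte[] mixed = {72, 105, (byte) 0xE9, 33};
        for (int off = 0; off <= mixed.length; off++) {
            int len = mixed.length - off;
            boolean viaCount = hasNegatives(mixed, off, len);
            boolean legacy = hasNegativesLegacy(mixed, off, len);
            System.out.println("off=" + off + " count=" + countPositives(mixed, off, len)
                    + " agree=" + (viaCount == legacy));
        }
    }
}
```

So one scan routine that returns the index of the first negative byte is enough to serve both methods on every JDK level.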

luke-li-2003 · 2025-04-08T21:03:50Z

Performance: as noted earlier, the fast path does not do well on short arrays (byte[16] and below):

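| build | byte[1] | byte[4] | byte[8] | byte[16] | byte[30] | byte[50] | byte[100] |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| nightly | 1100 | 325 | 217 | 107 | 34 | 19 | 14 |
| fast path | 478 | 175 | 97 | 40 | 41 | 144 | 77 |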

luke-li-2003 · 2025-04-16T21:00:50Z

After some experimentation, I ended up with two completely different implementations:

The first is a modified version of stringIndexOf; it uses aligned loads and masking so that a serial loop is not necessary.

The second is the one on the PR branch; it uses unaligned vector loads, with the residue handled by a serial loop.
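For readers who want to see the difference between the two strategies, here is a rough scalar model in Java. It is only a sketch: it uses 8-byte words instead of Power vector registers, the class and method names are hypothetical, and `loadAlignedWord` pretends that bytes past the end of the array read as zero, standing in for the fact that an aligned hardware load can safely touch bytes outside the requested range. None of this is the code on the PR branch.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

final class CountPositivesStrategies {
    // Sign bit of every byte in a word; a byte is negative iff its sign bit is set.
    private static final long SIGN_BITS = 0x8080808080808080L;

    // Reads 8 bytes of a byte[] as one little-endian long at any index.
    private static final VarHandle LONG_LE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    // Model of "unaligned loads, residue goes to a serial loop".
    static int countPositivesUnaligned(byte[] ba, int off, int len) {
        int i = off;
        int limit = off + len;
        for (; i + 8 <= limit; i += 8) {
            long negatives = (long) LONG_LE.get(ba, i) & SIGN_BITS;
            if (negatives != 0) {
                // Little-endian: the lowest set bit belongs to the first negative byte.
                return i - off + (Long.numberOfTrailingZeros(negatives) >>> 3);
            }
        }
        for (; i < limit; i++) { // residue: fewer than 8 bytes left
            if (ba[i] < 0) return i - off;
        }
        return len;
    }

    // Model of "aligned loads plus masking, no serial loop".
    static int countPositivesAligned(byte[] ba, int off, int len) {
        int limit = off + len;
        for (int base = off & ~7; base < limit; base += 8) {
            long mask = -1L;
            if (base < off) mask &= -1L << ((off - base) * 8);               // ignore bytes before off
            if (base + 8 > limit) mask &= -1L >>> ((base + 8 - limit) * 8);  // ignore bytes from limit on
            long negatives = loadAlignedWord(ba, base) & SIGN_BITS & mask;
            if (negatives != 0) {
                return base - off + (Long.numberOfTrailingZeros(negatives) >>> 3);
            }
        }
        return len;
    }

    // Hypothetical stand-in for an aligned load: bytes past the array end read as zero.
    private static long loadAlignedWord(byte[] ba, int base) {
        if (base + 8 <= ba.length) return (long) LONG_LE.get(ba, base);
        long word = 0;
        for (int k = 0; base + k < ba.length; k++) {
            word |= (ba[base + k] & 0xFFL) << (k * 8);
        }
        return word;
    }

    public static void main(String[] args) {
        byte[] data = new byte[40];
        data[29] = (byte) -1;
        System.out.println(countPositivesUnaligned(data, 3, 30));
        System.out.println(countPositivesAligned(data, 3, 30));
    }
}
```

In this model the aligned variant always pays for mask computation on the first and last word, while the unaligned variant pays for a byte-by-byte tail, which is one plausible reading of why neither wins at every length.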

The performance tradeoffs are outlined below:

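| build | byte[1] | byte[2] | byte[3] | byte[4] | byte[15] | byte[17] | byte[128] |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| default | 306 | 195 | 150 | 176 | 78 | 83 | 20 |
| aligned | 307 | 165 | 160 | 160 | 147 | 145 | 86 |
| unaligned | 304 | 250 | 190 | 155 | 93 | 80 | 58 |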

There seem to be some inescapable performance tradeoffs here.

luke-li-2003 · 2025-04-17T22:04:59Z

The default build was so fast in some of the runs because the offset value was fixed in those runs.

I now have two benchmarks: one randomises the starting offset, the other does not. I am not sure which one represents the more realistic scenario:

Randomised offset:

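| build | byte[1] | byte[2] | byte[3] | byte[15] | byte[30] |
| --- | ---: | ---: | ---: | ---: | ---: |
| default | 300 | 291 | 191 | 100 | 64 |
| fast-pathed | 571 | 362 | 302 | 125 | 124 |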

Offset fixed to 0:

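| build | byte[1] | byte[2] | byte[3] | byte[15] | byte[30] |
| --- | ---: | ---: | ---: | ---: | ---: |
| default | 1000 | 418 | 360 | 89 | 60 |
| fast-pathed | 575 | 369 | 290 | 127 | 125 |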
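For anyone wanting to reproduce the two modes, the difference could be set up roughly as follows. This is a hand-rolled sketch, not the benchmark used for the numbers above: `OffsetBenchSketch`, `makeOffsets`, and `run` are hypothetical names, and the timing loop is far too naive for real measurements (a proper harness such as JMH would be needed for that).

```java
import java.util.Random;

final class OffsetBenchSketch {
    // Reference loop standing in for the method under test.
    static int countPositives(byte[] ba, int off, int len) {
        int limit = off + len;
        for (int i = off; i < limit; i++) {
            if (ba[i] < 0) {
                return i - off;
            }
        }
        return len;
    }

    // One offset per call: all zero (fixed mode) or uniform in [0, maxOffset] (randomised mode).
    static int[] makeOffsets(int calls, int maxOffset, boolean randomise, long seed) {
        int[] offsets = new int[calls]; // zero-filled by default
        if (randomise) {
            Random rnd = new Random(seed);
            for (int i = 0; i < calls; i++) {
                offsets[i] = rnd.nextInt(maxOffset + 1);
            }
        }
        return offsets;
    }

    // Sums the results so the calls cannot be optimised away entirely.
    static long run(byte[] ba, int[] offsets, int len) {
        long sum = 0;
        for (int off : offsets) {
            sum += countPositives(ba, off, len);
        }
        return sum;
    }

    public static void main(String[] args) {
        int len = 15, maxOffset = 64, calls = 1_000_000;
        byte[] ba = new byte[maxOffset + len]; // all zero, so every scan runs to len
        for (boolean randomise : new boolean[] {false, true}) {
            int[] offsets = makeOffsets(calls, maxOffset, randomise, 42L);
            run(ba, offsets, len); // crude warm-up pass
            long start = System.nanoTime();
            long sum = run(ba, offsets, len);
            long elapsed = System.nanoTime() - start;
            System.out.println((randomise ? "random" : "fixed") + " offset: "
                    + elapsed + " ns, checksum " + sum);
        }
    }
}
```

With a fixed offset every call sees the same alignment and the same branch pattern, which may flatter whichever build specialises best for that one case; the randomised mode spreads calls across alignments.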

luke-li-2003 · 2025-04-25T19:58:51Z

A day of testing only showed what didn't work:

Making first-byte-mismatch the fall-through branch made only a negligible performance difference while making the code more convoluted.

Using the counter register for the unrolled loop resulted in a 50% reduction in throughput while saving only 1 register.

luke-li-2003 · 2025-05-01T22:34:43Z

New data with the updated code and randomised offset:

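| build | byte[1] | byte[2] | byte[3] | byte[15] | byte[30] | byte[33] |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| default | 429 | 279 | 229 | 97 | 63 | 64 |
| fast-path | 617 | 393 | 320 | 129 | 128 | 110 |
| P9+ | 615 | 383 | 313 | 136 | 141 | 328 |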

I don't really understand why the P9+ version is slightly slower on arrays of 3 bytes or fewer, given that the new instructions should not affect them at all.

luke-li-2003 · 2025-05-05T18:34:21Z

I made a broken version of the intrinsic that simply does nothing, and it could reach a throughput of 800M, compared to the jitted code's 1000M...

Fast Path StringCoding.countPostives and StringCoding.hasNegative for…

35cdf26

… Power Fast path the StringCoding methods countPositives and hasNegative on Power, since their logics are similar, they can be implemented by a single instrinsic. Signed-off-by: Luke Li <luke.li@ibm.com>

luke-li-2003 force-pushed the CountPostivesP branch from f1a5eaf to 35cdf26 on April 8, 2025 20:58

luke-li-2003 added 2 commits April 11, 2025 10:28

unrolling impl

2d85413

loop unrolling improvement

d391ad6

luke-li-2003 added 4 commits April 22, 2025 12:19

bug fixes

120bc31

cleanup

3fb1c02

first label

3b9bdc4

roll back first label

39583ed

luke-li-2003 force-pushed the CountPostivesP branch from 8a88de9 to 39583ed on April 25, 2025 19:20

edge caes

165cfdd

luke-li-2003 force-pushed the CountPostivesP branch from 6eaebec to 165cfdd on April 28, 2025 19:37

luke-li-2003 added 4 commits April 28, 2025 15:56

no LE byte swaps

0ccd48d

P9 adaption

a2d929b

use count trailing zeros

7e275c3

lbz adaption

bc97a8f

luke-li-2003 force-pushed the CountPostivesP branch 2 times, most recently from 7b96532 to bc97a8f on April 29, 2025 19:44

clean whitespace

656f158

use andi_r

685f86c

luke-li-2003 force-pushed the CountPostivesP branch from 8a58536 to aaea552 on May 5, 2025 16:05

clean up

ca84576

luke-li-2003 force-pushed the CountPostivesP branch from aaea552 to ca84576 on May 5, 2025 16:06

register cleanup

b055248
