Deduplicate encoded stack map data inside .cwasm sections #10431

Open
fitzgen opened this issue Mar 20, 2025 · 0 comments
Labels
wasm-proposal:gc Issues with the implementation of the gc wasm proposal wasmtime:code-size Issues related to reducing the code size of Wasmtime binaries wasmtime Issues about wasmtime that don't fall into another label

Comments

@fitzgen
Member

fitzgen commented Mar 20, 2025

Something to consider for the future: if we frequently have multiple sequential entries for different PCs that all have the same stack slots, e.g.

...
0x1dc: offset of [8, 12]
0x124: offset of [8, 12, 24] copy 1
0x142: offset of [8, 12, 24] copy 2
0x15a: offset of [8, 12, 24] copy 3
0x166: offset of [8]
...

then it may make sense for each entry in the index to store non-overlapping PC ranges, rather than exact PCs, and we could effectively dedupe the index entries and the stack map data. That is, the previous example would become

...
0x1dc..0x1dd: offset of [8, 12]
0x124..0x15b: offset of [8, 12, 24] (only copy)
0x166..0x167: offset of [8]
...
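
As a rough illustration of the coalescing described above, here is a minimal sketch. It assumes the index entries arrive sorted by PC and models each stack map as a plain `Vec<u32>` of slot offsets; the `coalesce` function and its types are hypothetical and are not Wasmtime's actual builder API.

```rust
use std::ops::Range;

/// Coalesce runs of adjacent entries that share the same stack slots into
/// half-open PC ranges. `entries` is assumed to be sorted by PC.
/// (Hypothetical helper for illustration only.)
fn coalesce(entries: &[(u32, Vec<u32>)]) -> Vec<(Range<u32>, Vec<u32>)> {
    let mut out: Vec<(Range<u32>, Vec<u32>)> = Vec::new();
    for (pc, slots) in entries {
        if let Some((range, prev)) = out.last_mut() {
            if *prev == *slots {
                // Same slots as the previous run: extend it to cover this PC.
                range.end = *pc + 1;
                continue;
            }
        }
        // Different slots (or first entry): start a new single-PC range.
        out.push((*pc..*pc + 1, slots.clone()));
    }
    out
}
```

Fed the three `[8, 12, 24]` entries followed by the `[8]` entry from the example, this produces the two ranges `0x124..0x15b` and `0x166..0x167`. Note that the merged range also covers every PC between the recorded safepoints, which is where the hazards come from.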

The downsides are that

  1. We would need to change Cranelift to actually emit empty stack maps for safepoints without any live GC refs. Otherwise, if we have (pc=0x1234, [8]); (pc=0x1238, []); (pc=0x123b, [8]) and this builder never sees that middle entry, then we risk using [8] as our stack map at PC 0x1238, which at best extends a dead GC ref's lifetime and at worst gives the collector uninitialized data.
  2. Relatedly, we lose our ability to catch bugs where the return address PC we are tracing isn't an exact match for a stack map entry.

These are actually pretty scary, so maybe we don't want to do this, even if it would let us make these binary search indices much smaller.
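
To make the two hazards concrete, here is a small sketch comparing an exact-PC lookup with a range lookup. The function names and the `Vec<u32>` representation of a stack map are assumptions made for illustration; this is not how Wasmtime encodes or queries its index.

```rust
use std::ops::Range;

/// Exact-PC index: a PC that is not a recorded safepoint yields `None`,
/// so a bogus return address is detected rather than silently accepted.
fn lookup_exact<'a>(index: &'a [(u32, Vec<u32>)], pc: u32) -> Option<&'a [u32]> {
    index
        .binary_search_by_key(&pc, |(p, _)| *p)
        .ok()
        .map(|i| index[i].1.as_slice())
}

/// Range index: any PC that falls inside a range resolves to that range's
/// stack map, whether or not it was ever a real safepoint.
fn lookup_range<'a>(index: &'a [(Range<u32>, Vec<u32>)], pc: u32) -> Option<&'a [u32]> {
    // Index of the first range whose (exclusive) end lies past `pc`.
    let i = index.partition_point(|(r, _)| r.end <= pc);
    index
        .get(i)
        .filter(|(r, _)| r.contains(&pc))
        .map(|(_, slots)| slots.as_slice())
}
```

If the empty map at 0x1238 is never emitted, the two `[8]` entries coalesce into a single range `0x1234..0x123c`, and `lookup_range` returns `[8]` for PC 0x1238. The exact index returns `None` for the same PC, which is the bug-catching behavior that ranges give up.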


All that said, we can actually already dedupe the stack map data if we want to, and have multiple index entries point to the same stack map data (even if they aren't contiguous!), using the encoding scheme already in use in this PR. We just need to hash-cons the stack map data in this builder, caching a map from stack map data to its encoded offset. This doesn't have any of the downsides from above. Seems like it would be a pure win.
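
A minimal sketch of the hash-consing idea, under assumptions: the `StackMapBuilder` type, its fields, and the length-prefixed `u32` encoding below are all hypothetical stand-ins and do not reflect the actual section layout or builder in the PR.

```rust
use std::collections::HashMap;

/// Hypothetical builder; names and encoding are illustrative only.
#[derive(Default)]
struct StackMapBuilder {
    /// Index entries: (exact pc, offset of the stack map data in `data`).
    index: Vec<(u32, u32)>,
    /// Encoded stack map data: a length word followed by the slot offsets.
    data: Vec<u32>,
    /// Cache from stack map contents to the offset where they were encoded.
    cache: HashMap<Vec<u32>, u32>,
}

impl StackMapBuilder {
    fn push(&mut self, pc: u32, slots: &[u32]) {
        let offset = match self.cache.get(slots).copied() {
            // Already encoded: reuse the existing offset.
            Some(offset) => offset,
            // First occurrence: encode the data and remember where it went.
            None => {
                let offset = self.data.len() as u32;
                self.data.push(slots.len() as u32);
                self.data.extend_from_slice(slots);
                self.cache.insert(slots.to_vec(), offset);
                offset
            }
        };
        // Every PC still gets its own exact index entry.
        self.index.push((pc, offset));
    }
}
```

Because the index keeps one entry per exact PC, lookups behave as before; only the data section shrinks, and identical stack maps are shared even when their index entries are not adjacent.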

Originally posted by @fitzgen in #10404 (comment)

@alexcrichton alexcrichton added wasmtime Issues about wasmtime that don't fall into another label wasm-proposal:gc Issues with the implementation of the gc wasm proposal wasmtime:code-size Issues related to reducing the code size of Wasmtime binaries labels Mar 20, 2025