Skip to content

Improve performance for half-edge construction from <:Connectivity vector #1183

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 20 commits into
base: master
Choose a base branch
from

Conversation

halleysfifthinc
Copy link
Contributor

@halleysfifthinc halleysfifthinc commented Apr 7, 2025

Using a 72 tri, 36 quad torus, I measure the following speed and memory improvements with BenchmarkTools:

Benchmarking setup

Using the two meshes from this gist.

quads = GeoIO.load("quads.obj").geometry;
mixed = GeoIO.load("quads_tris.obj").geometry;
quads_connec = collect(faces(topology(quads), 2))
mixed_connec = collect(faces(topology(mixed), 2))

adjsort master

julia> @benchmark adjsort(faces) setup=(faces=shuffle(mixed_connec)) seconds=15
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  753.325 μs …  16.606 ms  ┊ GC (min … max): 0.00% … 94.35%
 Time  (median):     825.192 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   872.599 μs ± 491.641 μs  ┊ GC (mean ± σ):  4.49% ±  7.20%

     ▁▁▂▂▃▄▅▆▇███▇▆▆▅▄▃▂▁                  ▁                    ▃
  ▂▆███████████████████████▇▇▇▇▇▇▆▇▇▇▆▇▇▆█████▇▅▆▃▄▂▄▆▅▆▆▇▆▆▅▃▅ █
  753 μs        Histogram: log(frequency) by time       1.07 ms <

 Memory estimate: 449.75 KiB, allocs estimate: 8623.

adjsortperm PR

julia> @benchmark adjsortperm(faces) setup=(faces=shuffle(mixed_connec)) seconds=15 evals=1
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  224.498 μs …  15.208 ms  ┊ GC (min … max): 0.00% … 97.47%
 Time  (median):     281.176 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   290.772 μs ± 233.485 μs  ┊ GC (mean ± σ):  2.61% ±  3.39%

                    ▁▁▃▄▆▆▇▇██▇█▇▇▆▆▅▄▃▄▃▁▁                      
  ▂▁▂▂▂▂▂▂▂▃▃▄▄▄▅▇▇█████████████████████████▇█▆▆▅▅▆▄▄▄▄▄▃▃▃▃▃▂▂ ▅
  224 μs           Histogram: frequency by time          340 μs <

 Memory estimate: 73.43 KiB, allocs estimate: 1878.

Judgement:

BenchmarkTools.TrialJudgement: 
  time:   -66.68% => improvement (5.00% tolerance)
  memory: -83.67% => improvement (1.00% tolerance)

Using a 72 quad torus, I see

adjsort master

julia> @benchmark adjsort(faces) setup=(faces=shuffle(quads_connec)) seconds=15
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  111.442 μs …  27.753 ms  ┊ GC (min … max): 0.00% … 98.85%
 Time  (median):     122.528 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   144.562 μs ± 395.443 μs  ┊ GC (mean ± σ):  8.27% ±  3.75%

    ▂█▇▅▄▂▁                                                      
  ▁▃███████▆▄▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▂▃▃▃▂▂▂▂▁▁▁▂▂▃▃▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  111 μs           Histogram: frequency by time          210 μs <

 Memory estimate: 147.16 KiB, allocs estimate: 2691.

adjsortperm PR

julia> @benchmark adjsortperm(faces) setup=(faces=shuffle(quads_connec)) seconds=15 evals=1
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):   9.778 μs … 28.545 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     12.294 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   12.392 μs ±  1.099 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

              ▁▃▅▅▇█▇▇▇█▅▆▆▅▄▃▁                                
  ▁▁▁▂▂▂▃▄▄▆▇██████████████████▇▆▅▄▄▃▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▄
  9.78 μs         Histogram: frequency by time          17 μs <

 Memory estimate: 6.76 KiB, allocs estimate: 24.

BenchmarkTools.TrialJudgement: 
  time:   -91.43% => improvement (5.00% tolerance)
  memory: -95.41% => improvement (1.00% tolerance)

Copy link

codecov bot commented Apr 7, 2025

Codecov Report

Attention: Patch coverage is 98.21429% with 1 line in your changes missing coverage. Please review.

Project coverage is 88.32%. Comparing base (669680a) to head (2b1d8a2).
Report is 5 commits behind head on master.

Files with missing lines Patch % Lines
src/topologies/halfedge.jl 98.21% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1183      +/-   ##
==========================================
+ Coverage   88.13%   88.32%   +0.18%     
==========================================
  Files         196      196              
  Lines        6084     6097      +13     
==========================================
+ Hits         5362     5385      +23     
+ Misses        722      712      -10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@juliohm juliohm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @halleysfifthinc ! Thank you for trying to improve the performance of this constructor. That is super welcome.

Could you please comment on the need to enforce type annotations? Do they really contribute to the performance gains you are seeing? If we could avoid type annotations as much as possible, that would be ideal.

I left some review comments in a first round of review. Happy to review the other parts of the code later.

@halleysfifthinc
Copy link
Contributor Author

halleysfifthinc commented Apr 8, 2025

The type-annotations were added based on investigation with Cthulhu.jl. The type-annotations benefit the later code by allowing the compiler to work with more specific types1. Justification for each annotation given in the review comments.

In Blender, I created a test mesh (torus, 72 quads), and the benchmarks vs no type annotations is:

All quads, no type-annots benchmark

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min  max):  626.260 μs   32.464 ms  ┊ GC (min  max): 0.00%  97.20%
 Time  (median):     674.720 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   762.418 μs ± 731.223 μs  ┊ GC (mean ± σ):  7.88% ±  8.53%

   ▃▅▇██▇▆▅▅▄▄▃▂▂▂▁▁▁      ▁▃▄▃▂▁▁▁                             ▂
  ███████████████████████▇████████████▇▆▄▆▅▅▅▆▅▆▄▆▄▅▃▅▃▅▅▅▃▆▅▆▃ █
  626 μs        Histogram: log(frequency) by time       1.11 ms <

 Memory estimate: 573.52 KiB, allocs estimate: 14665.

BenchmarkTools.TrialJudgement: 
  time:   +323.22% => regression (5.00% tolerance)
  memory: +271.06% => regression (1.00% tolerance)

I then split half the quads into tris (36 quads, 72 tris) and got:

Quads and tris; no type-annots benchmark

BenchmarkTools.Trial: 7528 samples with 1 evaluation per sample.
 Range (min  max):  3.641 ms   35.721 ms  ┊ GC (min  max): 0.00%  87.22%
 Time  (median):     3.837 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.971 ms ± 970.236 μs  ┊ GC (mean ± σ):  2.81% ±  7.91%

  ▆█▆▂  ▂▁                                                    ▁
  █████▇██▆▅▆▄▅▅▅▄▄▁▄▃▄▅▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆▆▇ █
  3.64 ms      Histogram: log(frequency) by time      10.7 ms <

 Memory estimate: 1.32 MiB, allocs estimate: 39578.

BenchmarkTools.TrialJudgement: 
  time:   +348.65% => regression (5.00% tolerance)
  memory: +216.15% => regression (1.00% tolerance)

Footnotes

  1. It seems that the type annotations alone (without the other changes) might actually make things slower, but when combined, enhances the benefit of the other changes.

@juliohm
Copy link
Member

juliohm commented Apr 8, 2025

That is very nice. Could you please share a MWE so that I can reproduce benchmark results and test the PR locally? :)

@halleysfifthinc
Copy link
Contributor Author

Quads and mixed faces meshes here.

using Meshes, GeoIO, BenchmarkTools

quads = GeoIO.load("quads.obj").geometry
quads_tris = GeoIO.load("quads_tris.obj").geometry

@benchmark topoconvert(HalfEdgeTopology, quads)
@benchmark topoconvert(HalfEdgeTopology, quads_tris)

adjsort is responsible for ~85% of the runtime and ~50% of allocations (count), but the HalfEdgeTopology(::AbstractVector{Tuple{HalfEdge,HalfEdge}}) constructor contributes ~80% of total allocated amount (from the Dicts). So although 47fa3f4 only removes a few individual allocations, they are larger allocations and save a decent amount of memory.

@halleysfifthinc halleysfifthinc force-pushed the halfedge-perf branch 2 times, most recently from 1d9cf83 to 71a2312 Compare April 14, 2025 23:40
@halleysfifthinc
Copy link
Contributor Author

I reimplemented adjsort as adjsortperm, which now returns a sort permutation (faster, and removes the need for the indexin call, which was a non-negligible runtime contributor). Before/after benchmarks have been updated in the top comment (its quite a bit faster).

Along the way, I found a weird mesh/topo that is manifold, however, the old adjsort led to one face being missing from the final converted topology. Despite my best efforts, I could not determine what/where/why the problem was. This gist contains two meshes which are identical save for the triangle orders and orientations (same vertices). I am happy to include the bad mesh as a test if you wish.

@halleysfifthinc
Copy link
Contributor Author

Failed CI doesn't seem related.

@juliohm
Copy link
Member

juliohm commented Apr 15, 2025

@halleysfifthinc thank you for refactoring the code.

I refactored it a bit more to make sure the new algorithm is easy to read. Do we still need to reintroduce type annotations to improve performance? Appreciate it if you could take another look. If type annotations are necessary, we could start with annotating the return type of adjsortperm, by typing einds::Vector{Vector{Int}} at the last line of the function.

If we could encapsulate these type annotations in the adjsortperm function, that would decrease our chances to break type instability in the future.

@juliohm
Copy link
Member

juliohm commented Apr 15, 2025

Along the way, I found a weird mesh/topo that is manifold, however, the old adjsort led to one face being missing from the final converted topology.

Do you mean that the old implementation had a bug? Is it present in the new implementation in this PR?

@halleysfifthinc halleysfifthinc marked this pull request as draft April 17, 2025 02:05
@halleysfifthinc
Copy link
Contributor Author

Just pushed yet another refactor, this version is much faster still (and as an unintentional bonus, now actually properly separates connected components, which as previously discussed, is something I need).

I converted to a draft, as I am still unsatisfied with some behaviors, but can't continue spending time on this for the moment.

I will update the benchmarks when I get a chance.

@juliohm
Copy link
Member

juliohm commented Apr 17, 2025

That is really amazing @halleysfifthinc. The new connected components function is quite useful in other contexts. We should probably convert it to a utility function after this PR is merged. Looking forward to it!

@halleysfifthinc
Copy link
Contributor Author

Benchmarks updated (now specifically for adjsort/adjsortperm)!

Re: connected_components utility, I agree. I'd like to see it exported, or public at least, so I can use it and avoid the Graphs.jl dependency.

Do you mean that the old implementation had a bug? Is it present in the new implementation in this PR?

Yes, the old adjsort was leading to one face not being added/recognized in the final HalfEdgeTopology. That specific problem is no longer present with the new adjsortperm implementation.

However, the old adjsort was only part of the problem, I believe the main HalfEdgeTopology(::Vector{<:Connectivity} function is incorrect/unreliable for certain mixes of face orientations. I have a mesh1 which has inconsistencies after being converted to a HalfEdgeTopology (next or prev edges not matching the current elem). I have been able to determine that the bug is related to inconsistent orientations among the faces, but I am struggling to find any logical errors in the existing orientation handling code here and above

if !CCW[e]
# reinsert pairs in CCW orientation
for i in 1:n
half4pair[(v[i + 1], v[i])] = HalfEdge(v[i + 1], eleminds[e])
end
end

Footnotes

  1. I can't share the mesh, and I don't understand the cause of the bug well enough to simplify the mesh like I did for the meshes attached to my earlier comment)

@juliohm
Copy link
Member

juliohm commented Apr 21, 2025

Benchmarks updated (now specifically for adjsort/adjsortperm)!

🚀

Please let us know when the PR is ready for review @halleysfifthinc 🙂

@halleysfifthinc
Copy link
Contributor Author

tl;dr: The face ordering with the new connected_components/adjsortperm isn't adjacently sorted under the strictest definition (subsequent faces share at least one already existing edge). Obviously, it is your call whether this is acceptable, but the updated HalfEdgeTopology(::Vector{<:Connectivity}) function handles the current (and previous) behavior. (Although nearly all of the performance gains are from the refactored adjacency sort.)

I determined that the bug I was hunting is because adjacency is now functionally defined as any face that has >2 previously observed vertices. This allows a face (triangle 4 in the photo) to be added, even if all of its edges are new/previously unobserved. The orientation of this triangle cannot be checked against existing faces, and we can end up with irreparable inconsistencies when the "hole" (triangle 5 in the photo) is filled (if the orientation of triangle 4 is ultimately inconsistent with the correct orientation of triangle 5).

That bug is not present in the original code on master.

@juliohm
Copy link
Member

juliohm commented Apr 29, 2025

Thank you for the update @halleysfifthinc. Before I spend time reviewing the minor details in the diff, could you please confirm the nature of the changes in this PR?

Is it correct to say that there are

  1. Changes in the HalfEdgeTopology constructor with vector of Connectivity to improve allocations and type stability
  2. Changes in the adjsortperm to improve runtime in terms of a new connected_components function

and nothing else?

What about the behavior change you mentioned? Where is it coming from? You mean that the new adjsortperm implementation is not equivalent to the current implementation in master? What is the new definition of adjacency if not >2 vertices in common?

I wonder if we could split this PR into smaller PRs that are easier to review and test. The HalfEdgeTopology is used in various topological relations and algorithms, and we should do our best to avoid introducing new bugs.

@halleysfifthinc
Copy link
Contributor Author

Is it correct to say that there are

Yes that is a correct summary. I'm happy to split this up for easier review if you would like.

You mean that the new adjsortperm implementation is not equivalent to the current implementation in master?

Yes, the new adjsortperm does not produce identical sorts to the current implementation in master. This is partially by design (the current adjsort is not stable due to the reverse iteration order, which means that sorting a sorted Vector{<:Connectivity} still changes the face order).

What is the new definition of adjacency if not >2 vertices in common?

The old definition was theoretically defined by >2 previously observed vertices, but due to the way adjsort was implemented, the actual functioning definition was >1 previously observed edge (defined by a pair of previously observed vertices). The distinction is subtle (and therefore took me a while to realize), but you can have a face that shares 2 previously observed vertices without those vertices having been previously observed as an edge/pair (e.g. triangle 4 in the above photo diagram).

I won't have the time to investigate the new test failures for a few days.

@juliohm
Copy link
Member

juliohm commented Apr 29, 2025

I'm happy to split this up for easier review if you would like.

That would be super appreciated. Smaller PRs with all tests passing can be quickly merged and shared with all users of the package via patch releases. If you can identify subsets of changes that just improve performance, without behavior change, we can quickly review and approve.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants