Skip to content

perf: replace recursive closure in Text() with iterative subtree walk#556

Closed
jvoisin wants to merge 1 commit into
PuerkitoBio:masterfrom
jvoisin:iter-text
Closed

perf: replace recursive closure in Text() with iterative subtree walk#556
jvoisin wants to merge 1 commit into
PuerkitoBio:masterfrom
jvoisin:iter-text

Conversation

@jvoisin
Copy link
Copy Markdown
Contributor

@jvoisin jvoisin commented May 26, 2026

Text() built a self-referencing closure var f func(*html.Node); f = ... to recurse into descendants and accumulate text. The closure capture of its own variable forces the compiler to box f, and each recursive call pays a Go stack-frame setup.

This commit replaces it with an iterative depth-first walk using FirstChild/NextSibling/Parent pointers, anchored at each root in s.Nodes so the walk never escapes the original subtree.

Allocations are unchanged (the remaining 6/15/19 allocs come from strings.Builder grow-doubling), but ns/op drops by roughly 30-40% across shallow and deep subtrees:

name old time/op new time/op delta
Text-8 1.86µs 1.11µs -40.4% (h2, n=14)
TextDeep-8 126µs 86µs -31.6% (whole body)
TextManyShallow-8 40.9µs 29.1µs -28.9% (all anchors)

All p=0.000 over 10 runs on arm64. B/op and allocs/op are bit-for-bit identical in all three benchmarks.

Text() built a self-referencing closure `var f func(*html.Node); f = ...`
to recurse into descendants and accumulate text. The closure capture of
its own variable forces the compiler to box `f`, and each recursive call
pays a Go stack-frame setup.

This commit replaces it with an iterative depth-first walk using
FirstChild/NextSibling/Parent pointers, anchored at each root in s.Nodes so the
walk never escapes the original subtree.

Allocations are unchanged (the remaining 6/15/19 allocs come from
strings.Builder grow-doubling), but ns/op drops by roughly 30-40%
across shallow and deep subtrees:

name                old time/op  new time/op  delta
Text-8              1.86µs       1.11µs       -40.4%   (h2, n=14)
TextDeep-8          126µs        86µs         -31.6%   (whole body)
TextManyShallow-8   40.9µs       29.1µs       -28.9%   (all anchors)

All p=0.000 over 10 runs on arm64. B/op and allocs/op are bit-for-bit
identical in all three benchmarks.
@mna
Copy link
Copy Markdown
Member

mna commented May 27, 2026

I'm not sold on that one, the resulting implementation is harder to read and has subtle conditions. I think just getting rid of the closure could be enough, and keep the recursive approach.

@jvoisin
Copy link
Copy Markdown
Contributor Author

jvoisin commented May 27, 2026

I guess it's a matter of taste, I prefer the explicit iterative version with comments.

But alright, I'll try a version using an outlined function instead of a closure, benchmark it, and will send a pull-request if it's improving things.

@jvoisin jvoisin closed this May 27, 2026
@jvoisin
Copy link
Copy Markdown
Contributor Author

jvoisin commented May 27, 2026

See #561

@jvoisin jvoisin deleted the iter-text branch May 27, 2026 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants