tier-4 (CRDs / projection): deep-copy PolicyByRuleId.AllowedProcesses by entlein · Pull Request #51 · k8sstormcenter/node-agent

entlein · 2026-05-20T16:37:48Z

Tier 4 of the v0.3.26 release chain

CRDs / projection consumer layer. Branches off node-agent#50 (tier-3 network). Pairs with storage main / tier-2 (the CollapseConfig CRD has been on storage main since PR #31; no storage tier-4 branch needed).

Addresses the CodeRabbit PR #43 Major finding on pkg/objectcache/containerprofilecache/projection_apply.go:44, deferred from the NA #46 batch-1 PR body:

projection_apply.go (PolicyByRuleId site) — Shallow-copy hazard for nested maps.

Problem

maps.Copy(pcp.PolicyByRuleId, cp.Spec.PolicyByRuleId)

maps.Copy value-copies the RulePolicy struct into each projected map entry, but RulePolicy.AllowedProcesses is a []string — the slice header is copied (each map entry has its own header), but the underlying backing array is shared with the source profile. A downstream append on the projected slice (or any in-place mutation) silently extends the source slice's backing array, breaking the projection's pure-transform contract.

Same shape of hazard the ExecCall.Args clone already addresses in extractExecsByPath.

Fix

Iterate cp.Spec.PolicyByRuleId and deep-copy each entry — fresh []string for AllowedProcesses, copy elements, then build the projected RulePolicy from the new slice plus the scalar AllowedContainer. Source profile is now isolated from any downstream mutation on the projected map's slice values.

Pinning test

TestApply_PolicyByRuleId_AllowedProcessesDeepCopied constructs a profile with {"R1006": {AllowedProcesses: ["runc"]}}, projects it, appends "evilshim" to the projected slice, and asserts the source's slice is unchanged. Fails on the pre-fix code (shared backing array extends both).

Why no benchmarks

Projection runs once per profile load / refresh (cold path), not per event. The deep-copy adds an O(N) allocation per projected entry where N is len(AllowedProcesses) — typically 1-3 entries with 1-5 processes each. No meaningful change in hot-path cost.

Component test evidence

Run 26174376833 (NA Component Tests on release/v0.3.26/tier-4-crds with STORAGE_TAG=tier2 + STORAGE_REF=1159de1f):

25/27 component matrix legs green — same shape as tier-1 / tier-2 baseline
Test_16_ApNotStuckOnRestart ✅ — confirms tier-3's one-off failure was an environmental flake (no NA code change between tier-3 and tier-4 that could have fixed Test_16; the deep-copy fix isolates a projection contract that doesn't touch Test_16's path)
Pre-existing flakes: Test_09_FalsePositiveTest + Test_10_MalwareDetectionTest (same as every tier and main baseline)
1 workflow-plumbing trigger-integration-tests failure — same auto-trigger branch-name issue addressed in storage#36

Test plan

go build ./... clean
go test ./pkg/objectcache/containerprofilecache/... -count=1 green
TestApply_PolicyByRuleId_AllowedProcessesDeepCopied pin passes; fails on pre-fix code (verified by reverting projection_apply.go under stash + re-running)
Component tests 25/27 green on paired tier-2 image (run 26174376833)
Test_16 confirmed flake — green on this tier's CT

…true) Addresses Matthias's upstream PR kubescape#811 blocking review (2026-05-19T10:35Z): Blocking: projectField() still keeps dynamic open paths only in cp.Opens.Patterns when cp.Opens.All == true. Removing the Patterns scan here makes ap.was_path_opened_with_suffix() / _with_prefix() return false for dynamic entries in pass-through mode, so rules that omit profileDataRequired.opens regress. If the old strings.HasPrefix/HasSuffix behavior on pattern text is too permissive, we still need a narrower fallback here instead of ignoring Patterns entirely. Context ------- Pre-PR-kubescape#811, CodeRabbit (PR #43 review on open.go:79) flagged that running strings.HasSuffix("/var/log/pods/*/volumes/...", "foo.log") returns false even though the pattern's concrete realisations might end in "foo.log" — a false negative. We addressed it by SKIPPING the Patterns scan in the All=true branch. Matthias's review reverses that: skipping is too aggressive and silently regresses rules that omit profileDataRequired.opens (which rely on the All=true pass-through behavior). Fix --- Narrower fallback — when Opens.All == true and the Values scan didn't match, scan Patterns with split semantics: patternConcreteSuffix(p) → literal text after the LAST wildcard segment in p; "" if p ends in a wildcard segment. patternConcretePrefix(p) → literal text before the FIRST wildcard segment in p; "" if p starts with a wildcard segment. Suffix/prefix matcher then: for each pattern: tail/head := patternConcrete{Suffix,Prefix}(pattern) if tail/head == "": return true // wildcard tail/head → permissive (concrete // realisations could match ANY suffix/prefix) if strings.HasSuffix/HasPrefix(tail/head, query): return true Boundary slashes are KEPT in the returned concrete suffix/prefix so queries with leading/trailing slashes match correctly (every concrete realisation has the boundary slash too). Tests ----- Updated previous "PatternsNotScanned" tests (which pinned the pre-kubescape#811 CR fix) to reflect Matthias's new contract: TestWasPathOpenedWithSuffix_PatternsScannedWithConcreteTail pattern "/var/log/⋯/foo.log", query ".log" → true (tail "/foo.log" has suffix ".log") pattern "/var/log/⋯/foo.log", query ".txt" → false TestWasPathOpenedWithSuffix_PatternWildcardTail_Permissive pattern "/var/log/pods/*", any suffix → true TestWasPathOpenedWithPrefix_PatternsScannedWithConcreteHead pattern "/var/⋯/log/foo", query "/var/" → true (head "/var/" has prefix "/var/") pattern "/var/⋯/log/foo", query "/etc/" → false TestWasPathOpenedWithPrefix_PatternWildcardHead_Permissive pattern "*/run", any prefix → true TestPatternConcreteSuffix_AndPrefix pins the splitter helpers against 7 representative shapes (no_wildcards, trailing_star, leading_star, mid_ellipsis, both_mid, lone_star, lone_ellipsis). Benchmarks (new file: open_bench_test.go) ----------------------------------------- Helper itself — zero allocation across all shapes: BenchmarkPatternConcreteSuffix//var/log/⋯/foo.log 11.5 ns/op 0 B/op 0 allocs/op BenchmarkPatternConcreteSuffix//var/log/pods/* 10.0 ns/op 0 B/op 0 allocs/op BenchmarkPatternConcreteSuffix//var/log/foo.log 9.4 ns/op 0 B/op 0 allocs/op BenchmarkPatternConcreteSuffix/* 1.6 ns/op 0 B/op 0 allocs/op BenchmarkPatternConcreteSuffix//var/⋯/log/⋯/foo.log 14.8 ns/op 0 B/op 0 allocs/op Full matcher (50 Values + 10 Patterns) — the Patterns scan adds ~20 ns on top of the Values-only path with no extra allocations (the steady 56 B/op / 3 allocs/op is pre-existing CEL value-wrapping cost, unrelated to this change): BenchmarkWasPathOpenedWithSuffix_AllMode/values_only 372 ns/op 56 B/op 3 allocs/op BenchmarkWasPathOpenedWithSuffix_AllMode/patterns_concrete 391 ns/op 56 B/op 3 allocs/op BenchmarkWasPathOpenedWithSuffix_AllMode/patterns_wildcard 393 ns/op 56 B/op 3 allocs/op BenchmarkWasPathOpenedWithPrefix_AllMode/values_only 351 ns/op 56 B/op 3 allocs/op BenchmarkWasPathOpenedWithPrefix_AllMode/patterns_concrete 395 ns/op 56 B/op 3 allocs/op BenchmarkWasPathOpenedWithPrefix_AllMode/patterns_wildcard 428 ns/op 56 B/op 3 allocs/op

Addresses the CodeRabbit PR #43 Major finding on `pkg/rulemanager/cel/libraries/applicationprofile/exec.go:144`, deferred from the NA #46 batch-1 PR body. Tracked there as: exec.go:144 — Exact path match should return false when args don't match, not fall through to pattern matching. Problem ------- Pre-fix `wasExecutedWithArgs` flow on exact-path hit: if pathStr in cp.Execs.Values: if pathStr in cp.ExecsByPath: if CompareExecArgs(...) → return true // else: FALL THROUGH to pattern matching else: return true (back-compat no-constraint) for pattern in cp.Execs.Patterns: ... The fall-through on `CompareExecArgs == false` would let a LOOSER pattern (e.g. `/usr/bin/*` with profileArgs `[*]`) shadow the exact-path entry's strict args constraint. A rule-author who lists `{Path: /usr/bin/curl, Args: [-X, GET]}` expects that ONLY GET requests pass — but if their profile also contains a permissive `/usr/bin/⋯` pattern, the fall-through silently allows DELETE/POST/… via the pattern. Fix --- When the exact path IS in Values AND ExecsByPath has an entry, the constraint result IS the answer for this path. Return it directly — no fall-through: if pathStr in cp.Execs.Values: if pathStr in cp.ExecsByPath: return CompareExecArgs(profileArgs, runtimeArgs) // else: back-compat no-constraint return true // path NOT in Values — try Patterns for pattern in cp.Execs.Patterns: ... Pattern-loop fall-through stays as-is — different patterns CAN encode different `(path, args)` shapes and a non-matching one shouldn't kill the loop. The asymmetry is intentional: exact-path-with-constraint is producer-authored authoritative, pattern-with-constraint is one candidate among many. Bench (`-count=3 -benchtime=200ms`) ----------------------------------- exact_path_args_match ~890 ns/op 2136 B/op 20 allocs/op exact_path_args_mismatch ~860 ns/op 2136 B/op 20 allocs/op pattern_path_args_match ~450 ns/op 360 B/op 13 allocs/op The exact-path mismatch case is no longer paying the pattern-loop cost it used to (and used to silently return true via the pattern shadow). The remaining alloc/op floor is pre-existing CEL value wrapping (matcher unchanged).

…AddressPortProtocol No code change to the matchers — main already has the correct matchIPField + matchDNSField shape (ProjectedField.All is explicitly NOT consulted as a short-circuit; see line 24-29 of network.go, which addresses CodeRabbit PR #43 R-NET-7). This PR captures the per-call cost of the network matchers as a documented baseline against future refactors. Numbers below are the median of 3 runs, count=3, benchtime=200ms, arm64. matchIPField (Values-hit / Patterns-CIDR / `*` sentinel / miss): values_only_hit 6.1 ns/op 0 B/op 0 allocs/op patterns_cidr_hit 263 ns/op 248 B/op 12 allocs/op any_sentinel 78 ns/op 80 B/op 2 allocs/op miss 47 ns/op 8 B/op 1 allocs/op matchDNSField (Values-exact / canon / Patterns-wildcard / miss): values_exact_hit 6.5 ns/op 0 B/op 0 allocs/op values_trailing_dot_canon 6.5 ns/op 0 B/op 0 allocs/op patterns_leading_wildcard_hit 234 ns/op 184 B/op 4 allocs/op miss 21 ns/op 0 B/op 0 allocs/op wasAddressPortProtocolInEgress (full CEL surface): hit_via_values 100 ns/op 120 B/op 8 allocs/op hit_via_cidr 269 ns/op 280 B/op 15 allocs/op miss 236 ns/op 256 B/op 13 allocs/op Known gap (NOT addressed in tier-3, deferred to projection-v2): wasAddressPortProtocolIn* currently ignores port + protocol. The matcher validates their shape (port range, protocol string type) but the actual match only consults the address via matchIPField. See the existing comment at network.go:164-166. Fixing this needs an AddressPortsByAddr projection slice on ProjectedField; that's a coordinated projection-side change and will land separately. CodeRabbit PR #43 noted this as a Major; the production impact is bounded because rules that genuinely care about (address, port, protocol) triples don't currently exist in the default rule set. The bench numbers show the CEL value-wrapping overhead (~50-100 ns, ~100 B, ~8 allocs per call) dominates the matcher cost — that's pre-existing and shared across all CEL libraries.

Addresses the CodeRabbit PR #43 Major finding on `pkg/objectcache/containerprofilecache/projection_apply.go:44`, deferred from the NA #46 batch-1 PR body: projection_apply.go (PolicyByRuleId site) — Shallow-copy hazard for nested maps. Problem ------- `maps.Copy(pcp.PolicyByRuleId, cp.Spec.PolicyByRuleId)` value-copies the `RulePolicy` struct into each projected map entry, but `RulePolicy.AllowedProcesses` is a `[]string` — the slice header is copied (each map entry has its own header), but the underlying backing array is shared with the source profile. A downstream `append` on the projected slice (or any in-place mutation) silently extends the source slice's backing array, breaking the projection's pure-transform contract. Same shape of hazard the ExecCall.Args clone already addresses in `extractExecsByPath`. Fix --- Iterate `cp.Spec.PolicyByRuleId` and deep-copy each entry: allocate a fresh `[]string` for `AllowedProcesses`, copy the elements, then build the projected RulePolicy from the new slice + the scalar AllowedContainer. Source profile is now isolated from any downstream mutation on the projected map's slice values. `maps` package import dropped since we no longer call `maps.Copy`. Pinning test ------------ `TestApply_PolicyByRuleId_AllowedProcessesDeepCopied` constructs a profile with `{"R1006": {AllowedProcesses: ["runc"]}}`, projects it, appends `"evilshim"` to the projected slice, and asserts the source's slice is unchanged. Fails on the pre-fix code (shared backing array extends both).

coderabbitai · 2026-05-20T16:37:57Z

Warning

Rate limit exceeded

@entlein has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 23 minutes and 24 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: d9adcb32-a87f-4f6a-b233-d68521fc5834

📥 Commits

Reviewing files that changed from the base of the PR and between 58247d7 and 813c2df.

📒 Files selected for processing (8)

pkg/objectcache/containerprofilecache/projection_apply.go
pkg/objectcache/containerprofilecache/projection_apply_test.go
pkg/rulemanager/cel/libraries/applicationprofile/exec.go
pkg/rulemanager/cel/libraries/applicationprofile/exec_bench_test.go
pkg/rulemanager/cel/libraries/applicationprofile/open.go
pkg/rulemanager/cel/libraries/applicationprofile/open_bench_test.go
pkg/rulemanager/cel/libraries/applicationprofile/open_test.go
pkg/rulemanager/cel/libraries/networkneighborhood/network_bench_test.go

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch release/v0.3.26/tier-4-crds

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-05-20T17:17:13Z

Performance Benchmark Results

Node-Agent Resource Usage

Metric	BEFORE	AFTER	Delta
Avg CPU (cores)	0.171	0.179	+4.6%
Peak CPU (cores)	0.180	0.192	+6.9%
Avg Memory (MiB)	300.333	269.078	-10.4%
Peak Memory (MiB)	301.426	275.871	-8.5%

Dedup Effectiveness (AFTER only)

Event Type	Passed	Deduped	Ratio
capabilities	0	0	N/A
hardlink	6000	0	0.0%
http	1711	119449	98.6%
network	904	77944	98.9%
open	35392	619925	94.6%
symlink	6000	0	0.0%
syscall	982	1892	65.8%

Event Counters

Metric	BEFORE	AFTER
capability_counter	10	7
dns_counter	1438	1433
exec_counter	7195	7194
network_counter	94626	94570
open_counter	787396	786367
syscall_counter	3495	3504

Entlein added 4 commits May 19, 2026 21:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tier-4 (CRDs / projection): deep-copy PolicyByRuleId.AllowedProcesses#51

tier-4 (CRDs / projection): deep-copy PolicyByRuleId.AllowedProcesses#51
entlein wants to merge 4 commits into
mainfrom
release/v0.3.26/tier-4-crds

entlein commented May 20, 2026

Uh oh!

coderabbitai Bot commented May 20, 2026

Rate limit exceeded

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

entlein commented May 20, 2026

Tier 4 of the v0.3.26 release chain

Problem

Fix

Pinning test

Why no benchmarks

Component test evidence

Test plan

Uh oh!

coderabbitai Bot commented May 20, 2026

Rate limit exceeded

Uh oh!

github-actions Bot commented May 20, 2026

Performance Benchmark Results

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant