Skip to content

tier-4 (CRDs / projection): deep-copy PolicyByRuleId.AllowedProcesses#51

Open
entlein wants to merge 4 commits into
mainfrom
release/v0.3.26/tier-4-crds
Open

tier-4 (CRDs / projection): deep-copy PolicyByRuleId.AllowedProcesses#51
entlein wants to merge 4 commits into
mainfrom
release/v0.3.26/tier-4-crds

Conversation

@entlein
Copy link
Copy Markdown

@entlein entlein commented May 20, 2026

Tier 4 of the v0.3.26 release chain

CRDs / projection consumer layer. Branches off node-agent#50 (tier-3 network). Pairs with storage main / tier-2 (the CollapseConfig CRD has been on storage main since PR #31; no storage tier-4 branch needed).

Addresses the CodeRabbit PR #43 Major finding on pkg/objectcache/containerprofilecache/projection_apply.go:44, deferred from the NA #46 batch-1 PR body:

projection_apply.go (PolicyByRuleId site) — Shallow-copy hazard for nested maps.

Problem

maps.Copy(pcp.PolicyByRuleId, cp.Spec.PolicyByRuleId)

maps.Copy value-copies the RulePolicy struct into each projected map entry, but RulePolicy.AllowedProcesses is a []string — the slice header is copied (each map entry has its own header), but the underlying backing array is shared with the source profile. A downstream append on the projected slice (or any in-place mutation) silently extends the source slice's backing array, breaking the projection's pure-transform contract.

Same shape of hazard the ExecCall.Args clone already addresses in extractExecsByPath.

Fix

Iterate cp.Spec.PolicyByRuleId and deep-copy each entry — fresh []string for AllowedProcesses, copy elements, then build the projected RulePolicy from the new slice plus the scalar AllowedContainer. Source profile is now isolated from any downstream mutation on the projected map's slice values.

Pinning test

TestApply_PolicyByRuleId_AllowedProcessesDeepCopied constructs a profile with {"R1006": {AllowedProcesses: ["runc"]}}, projects it, appends "evilshim" to the projected slice, and asserts the source's slice is unchanged. Fails on the pre-fix code (shared backing array extends both).

Why no benchmarks

Projection runs once per profile load / refresh (cold path), not per event. The deep-copy adds an O(N) allocation per projected entry where N is len(AllowedProcesses) — typically 1-3 entries with 1-5 processes each. No meaningful change in hot-path cost.

Component test evidence

Run 26174376833 (NA Component Tests on release/v0.3.26/tier-4-crds with STORAGE_TAG=tier2 + STORAGE_REF=1159de1f):

  • 25/27 component matrix legs green — same shape as tier-1 / tier-2 baseline
  • Test_16_ApNotStuckOnRestart ✅ — confirms tier-3's one-off failure was an environmental flake (no NA code change between tier-3 and tier-4 that could have fixed Test_16; the deep-copy fix isolates a projection contract that doesn't touch Test_16's path)
  • Pre-existing flakes: Test_09_FalsePositiveTest + Test_10_MalwareDetectionTest (same as every tier and main baseline)
  • 1 workflow-plumbing trigger-integration-tests failure — same auto-trigger branch-name issue addressed in storage#36

Test plan

  • go build ./... clean
  • go test ./pkg/objectcache/containerprofilecache/... -count=1 green
  • TestApply_PolicyByRuleId_AllowedProcessesDeepCopied pin passes; fails on pre-fix code (verified by reverting projection_apply.go under stash + re-running)
  • Component tests 25/27 green on paired tier-2 image (run 26174376833)
  • Test_16 confirmed flake — green on this tier's CT

Entlein added 4 commits May 19, 2026 21:31
…true)

Addresses Matthias's upstream PR kubescape#811 blocking review (2026-05-19T10:35Z):

  Blocking: projectField() still keeps dynamic open paths only in
  cp.Opens.Patterns when cp.Opens.All == true. Removing the Patterns
  scan here makes ap.was_path_opened_with_suffix() / _with_prefix()
  return false for dynamic entries in pass-through mode, so rules
  that omit profileDataRequired.opens regress. If the old
  strings.HasPrefix/HasSuffix behavior on pattern text is too
  permissive, we still need a narrower fallback here instead of
  ignoring Patterns entirely.

Context
-------

Pre-PR-kubescape#811, CodeRabbit (PR #43 review on open.go:79) flagged that
running strings.HasSuffix("/var/log/pods/*/volumes/...", "foo.log")
returns false even though the pattern's concrete realisations might
end in "foo.log" — a false negative. We addressed it by SKIPPING
the Patterns scan in the All=true branch. Matthias's review reverses
that: skipping is too aggressive and silently regresses rules that
omit profileDataRequired.opens (which rely on the All=true pass-through
behavior).

Fix
---

Narrower fallback — when Opens.All == true and the Values scan didn't
match, scan Patterns with split semantics:

  patternConcreteSuffix(p)  → literal text after the LAST wildcard
                              segment in p; "" if p ends in a wildcard
                              segment.
  patternConcretePrefix(p)  → literal text before the FIRST wildcard
                              segment in p; "" if p starts with a
                              wildcard segment.

Suffix/prefix matcher then:
  for each pattern:
    tail/head := patternConcrete{Suffix,Prefix}(pattern)
    if tail/head == "":
        return true   // wildcard tail/head → permissive (concrete
                      // realisations could match ANY suffix/prefix)
    if strings.HasSuffix/HasPrefix(tail/head, query):
        return true

Boundary slashes are KEPT in the returned concrete suffix/prefix so
queries with leading/trailing slashes match correctly (every concrete
realisation has the boundary slash too).

Tests
-----

Updated previous "PatternsNotScanned" tests (which pinned the
pre-kubescape#811 CR fix) to reflect Matthias's new contract:

  TestWasPathOpenedWithSuffix_PatternsScannedWithConcreteTail
    pattern "/var/log/⋯/foo.log", query ".log"   → true (tail "/foo.log" has suffix ".log")
    pattern "/var/log/⋯/foo.log", query ".txt"   → false

  TestWasPathOpenedWithSuffix_PatternWildcardTail_Permissive
    pattern "/var/log/pods/*", any suffix         → true

  TestWasPathOpenedWithPrefix_PatternsScannedWithConcreteHead
    pattern "/var/⋯/log/foo", query "/var/"      → true (head "/var/" has prefix "/var/")
    pattern "/var/⋯/log/foo", query "/etc/"      → false

  TestWasPathOpenedWithPrefix_PatternWildcardHead_Permissive
    pattern "*/run", any prefix                   → true

  TestPatternConcreteSuffix_AndPrefix
    pins the splitter helpers against 7 representative shapes
    (no_wildcards, trailing_star, leading_star, mid_ellipsis,
    both_mid, lone_star, lone_ellipsis).

Benchmarks (new file: open_bench_test.go)
-----------------------------------------

Helper itself — zero allocation across all shapes:

  BenchmarkPatternConcreteSuffix//var/log/⋯/foo.log   11.5 ns/op  0 B/op  0 allocs/op
  BenchmarkPatternConcreteSuffix//var/log/pods/*      10.0 ns/op  0 B/op  0 allocs/op
  BenchmarkPatternConcreteSuffix//var/log/foo.log      9.4 ns/op  0 B/op  0 allocs/op
  BenchmarkPatternConcreteSuffix/*                     1.6 ns/op  0 B/op  0 allocs/op
  BenchmarkPatternConcreteSuffix//var/⋯/log/⋯/foo.log 14.8 ns/op  0 B/op  0 allocs/op

Full matcher (50 Values + 10 Patterns) — the Patterns scan adds
~20 ns on top of the Values-only path with no extra allocations
(the steady 56 B/op / 3 allocs/op is pre-existing CEL value-wrapping
cost, unrelated to this change):

  BenchmarkWasPathOpenedWithSuffix_AllMode/values_only         372 ns/op  56 B/op  3 allocs/op
  BenchmarkWasPathOpenedWithSuffix_AllMode/patterns_concrete   391 ns/op  56 B/op  3 allocs/op
  BenchmarkWasPathOpenedWithSuffix_AllMode/patterns_wildcard   393 ns/op  56 B/op  3 allocs/op
  BenchmarkWasPathOpenedWithPrefix_AllMode/values_only         351 ns/op  56 B/op  3 allocs/op
  BenchmarkWasPathOpenedWithPrefix_AllMode/patterns_concrete   395 ns/op  56 B/op  3 allocs/op
  BenchmarkWasPathOpenedWithPrefix_AllMode/patterns_wildcard   428 ns/op  56 B/op  3 allocs/op
Addresses the CodeRabbit PR #43 Major finding on
`pkg/rulemanager/cel/libraries/applicationprofile/exec.go:144`,
deferred from the NA #46 batch-1 PR body. Tracked there as:

  exec.go:144 — Exact path match should return false when args don't
  match, not fall through to pattern matching.

Problem
-------

Pre-fix `wasExecutedWithArgs` flow on exact-path hit:

  if pathStr in cp.Execs.Values:
      if pathStr in cp.ExecsByPath:
          if CompareExecArgs(...) → return true
          // else: FALL THROUGH to pattern matching
      else:
          return true (back-compat no-constraint)
  for pattern in cp.Execs.Patterns:
      ...

The fall-through on `CompareExecArgs == false` would let a LOOSER
pattern (e.g. `/usr/bin/*` with profileArgs `[*]`) shadow the
exact-path entry's strict args constraint. A rule-author who lists
`{Path: /usr/bin/curl, Args: [-X, GET]}` expects that ONLY GET
requests pass — but if their profile also contains a permissive
`/usr/bin/⋯` pattern, the fall-through silently allows DELETE/POST/…
via the pattern.

Fix
---

When the exact path IS in Values AND ExecsByPath has an entry, the
constraint result IS the answer for this path. Return it directly —
no fall-through:

  if pathStr in cp.Execs.Values:
      if pathStr in cp.ExecsByPath:
          return CompareExecArgs(profileArgs, runtimeArgs)
      // else: back-compat no-constraint
      return true
  // path NOT in Values — try Patterns
  for pattern in cp.Execs.Patterns: ...

Pattern-loop fall-through stays as-is — different patterns CAN encode
different `(path, args)` shapes and a non-matching one shouldn't kill
the loop. The asymmetry is intentional: exact-path-with-constraint is
producer-authored authoritative, pattern-with-constraint is one
candidate among many.

Bench (`-count=3 -benchtime=200ms`)
-----------------------------------

  exact_path_args_match           ~890 ns/op  2136 B/op  20 allocs/op
  exact_path_args_mismatch        ~860 ns/op  2136 B/op  20 allocs/op
  pattern_path_args_match         ~450 ns/op   360 B/op  13 allocs/op

The exact-path mismatch case is no longer paying the pattern-loop
cost it used to (and used to silently return true via the pattern
shadow). The remaining alloc/op floor is pre-existing CEL value
wrapping (matcher unchanged).
…AddressPortProtocol

No code change to the matchers — main already has the correct
matchIPField + matchDNSField shape (ProjectedField.All is explicitly
NOT consulted as a short-circuit; see line 24-29 of network.go, which
addresses CodeRabbit PR #43 R-NET-7).

This PR captures the per-call cost of the network matchers as a
documented baseline against future refactors. Numbers below are the
median of 3 runs, count=3, benchtime=200ms, arm64.

matchIPField (Values-hit / Patterns-CIDR / `*` sentinel / miss):
  values_only_hit       6.1 ns/op    0 B/op    0 allocs/op
  patterns_cidr_hit     263 ns/op   248 B/op   12 allocs/op
  any_sentinel          78  ns/op    80 B/op    2 allocs/op
  miss                  47  ns/op     8 B/op    1 allocs/op

matchDNSField (Values-exact / canon / Patterns-wildcard / miss):
  values_exact_hit              6.5 ns/op   0 B/op   0 allocs/op
  values_trailing_dot_canon     6.5 ns/op   0 B/op   0 allocs/op
  patterns_leading_wildcard_hit 234 ns/op  184 B/op  4 allocs/op
  miss                          21  ns/op    0 B/op   0 allocs/op

wasAddressPortProtocolInEgress (full CEL surface):
  hit_via_values   100 ns/op  120 B/op   8 allocs/op
  hit_via_cidr     269 ns/op  280 B/op  15 allocs/op
  miss             236 ns/op  256 B/op  13 allocs/op

Known gap (NOT addressed in tier-3, deferred to projection-v2):

  wasAddressPortProtocolIn* currently ignores port + protocol. The
  matcher validates their shape (port range, protocol string type)
  but the actual match only consults the address via matchIPField.
  See the existing comment at network.go:164-166. Fixing this needs
  an AddressPortsByAddr projection slice on ProjectedField; that's
  a coordinated projection-side change and will land separately.

  CodeRabbit PR #43 noted this as a Major; the production impact is
  bounded because rules that genuinely care about (address, port,
  protocol) triples don't currently exist in the default rule set.

The bench numbers show the CEL value-wrapping overhead (~50-100 ns,
~100 B, ~8 allocs per call) dominates the matcher cost — that's
pre-existing and shared across all CEL libraries.
Addresses the CodeRabbit PR #43 Major finding on
`pkg/objectcache/containerprofilecache/projection_apply.go:44`,
deferred from the NA #46 batch-1 PR body:

  projection_apply.go (PolicyByRuleId site) — Shallow-copy hazard
  for nested maps.

Problem
-------

`maps.Copy(pcp.PolicyByRuleId, cp.Spec.PolicyByRuleId)` value-copies
the `RulePolicy` struct into each projected map entry, but
`RulePolicy.AllowedProcesses` is a `[]string` — the slice header is
copied (each map entry has its own header), but the underlying
backing array is shared with the source profile. A downstream
`append` on the projected slice (or any in-place mutation) silently
extends the source slice's backing array, breaking the projection's
pure-transform contract. Same shape of hazard the ExecCall.Args
clone already addresses in `extractExecsByPath`.

Fix
---

Iterate `cp.Spec.PolicyByRuleId` and deep-copy each entry: allocate
a fresh `[]string` for `AllowedProcesses`, copy the elements, then
build the projected RulePolicy from the new slice + the scalar
AllowedContainer. Source profile is now isolated from any downstream
mutation on the projected map's slice values.

`maps` package import dropped since we no longer call `maps.Copy`.

Pinning test
------------

`TestApply_PolicyByRuleId_AllowedProcessesDeepCopied` constructs a
profile with `{"R1006": {AllowedProcesses: ["runc"]}}`, projects it,
appends `"evilshim"` to the projected slice, and asserts the source's
slice is unchanged. Fails on the pre-fix code (shared backing array
extends both).
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 20, 2026

Warning

Rate limit exceeded

@entlein has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 23 minutes and 24 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: d9adcb32-a87f-4f6a-b233-d68521fc5834

📥 Commits

Reviewing files that changed from the base of the PR and between 58247d7 and 813c2df.

📒 Files selected for processing (8)
  • pkg/objectcache/containerprofilecache/projection_apply.go
  • pkg/objectcache/containerprofilecache/projection_apply_test.go
  • pkg/rulemanager/cel/libraries/applicationprofile/exec.go
  • pkg/rulemanager/cel/libraries/applicationprofile/exec_bench_test.go
  • pkg/rulemanager/cel/libraries/applicationprofile/open.go
  • pkg/rulemanager/cel/libraries/applicationprofile/open_bench_test.go
  • pkg/rulemanager/cel/libraries/applicationprofile/open_test.go
  • pkg/rulemanager/cel/libraries/networkneighborhood/network_bench_test.go
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch release/v0.3.26/tier-4-crds

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

Performance Benchmark Results

Node-Agent Resource Usage
Metric BEFORE AFTER Delta
Avg CPU (cores) 0.171 0.179 +4.6%
Peak CPU (cores) 0.180 0.192 +6.9%
Avg Memory (MiB) 300.333 269.078 -10.4%
Peak Memory (MiB) 301.426 275.871 -8.5%
Dedup Effectiveness (AFTER only)
Event Type Passed Deduped Ratio
capabilities 0 0 N/A
hardlink 6000 0 0.0%
http 1711 119449 98.6%
network 904 77944 98.9%
open 35392 619925 94.6%
symlink 6000 0 0.0%
syscall 982 1892 65.8%
Event Counters
Metric BEFORE AFTER
capability_counter 10 7
dns_counter 1438 1433
exec_counter 7195 7194
network_counter 94626 94570
open_counter 787396 786367
syscall_counter 3495 3504

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant