Skip to content

merge: upstream/main + recover wildcards/signing on projection-v1#43

Merged
entlein merged 52 commits into
mainfrom
merge/upstream-profile-rearch
May 16, 2026
Merged

merge: upstream/main + recover wildcards/signing on projection-v1#43
entlein merged 52 commits into
mainfrom
merge/upstream-profile-rearch

Conversation

@entlein
Copy link
Copy Markdown

@entlein entlein commented May 13, 2026

Summary

Brings the node-agent fork's `main` up to upstream parity AND recovers the fork's wildcard + tamper-detection behaviour on top of upstream's projection-v1 reshape.

Supersedes PR #41 (wildcards) and PR #42 (matcher cache). The matcher cache is intentionally NOT in this branch — upstream's projection-v1 (kubescape#788/kubescape#799) already pre-compiles + caches per-profile, so a per-call matcher cache would duplicate effort.

Upstream commits merged (8)

SHA Subject Notes
kubescape#803 improve logging for missing profileDataRequired clean
kubescape#800 exec path symmetric resolver clean
kubescape#797 learning-period label on TS CPs clean
kubescape#798 syft v1.32.0-ks.2 perf clean
kubescape#791 client CA from kubelet config clean
kubescape#772 services from API clean
kubescape#799 Rule-Aware Profile Projection major reshape (53 files, +3131/-440)
kubescape#788 Replace AP+NN cache with CP major reshape (51 files, +4643/-1997)

Wildcard + signing recovery (fork-only, layered ON TOP of projection)

Network wildcards (spec §5.7, §5.8)

  • `projectField` accepts a per-surface `isDynamic` classifier (was `isPathSurface` bool). Path surfaces use `containsDynamicSegment` (⋯); network surfaces use `isNetworkIPWildcard` / `isNetworkDNSWildcard` so CIDRs, `*` sentinel, and DNS wildcards route to `Patterns` rather than `Values`.
  • `extractEgressAddresses` / `extractIngressAddresses` now pull `n.IPAddresses[]` (storage v0.0.2 list form) alongside the deprecated singular `n.IPAddress`.
  • CEL helpers (`nn.was_address_in_`, `nn.is_domain_in_`) consult `Values` + `All` + `Patterns` via `networkmatch.MatchIP` / `MatchDNS`.
  • `matchIPField` canonicalises observed IPs (`net.ParseIP`) so IPv6 expanded forms and `::ffff:` IPv4-mapped addresses still hit the Values fast path.
  • `matchDNSField` normalises trailing dots on observed names against Values.

Exec-args matching (restored on projection)

  • `ProjectedContainerProfile.ExecsByPath map[string][]string` carries the per-Path Args slice. Upstream's projection-v1 explicitly dropped argv matching as "future work"; this re-adds it cleanly.
  • `wasExecutedWithArgs` runs `dynamicpathdetector.CompareExecArgs` against the matched profile entry. Back-compat: a path with no `ExecsByPath` entry matches with no argv constraint.

Tamper detection (signing)

  • `ContainerProfileCacheImpl.tamperAlertExporter` + `tamperEmitted` fields re-added on the new upstream struct. The R1016 "Signed profile tampered" wiring keeps working.

Mock parity

  • `RuleObjectCacheMock.GetProjectedContainerProfile` mirrors the production projection: routes wildcards to Patterns, populates `ExecsByPath`.
  • `ensureProjectedAllInit` fixed (no longer mis-sets `All=true`, which is the match-any sentinel, not a comprehensiveness hint).

Known port/protocol gap (out-of-scope here, in the spec)

`was_address_port_protocol_in_*` still degrades to address-only matching on the projection. Restoring port/protocol requires an `AddressPortsByAddr` projection field that upstream noted as future work. Spec §4.7 describes per-neighbor port lists; the unit tests document the degradation (`wildcard_test.go:340-380`). No production rule in `tests/chart/templates/node-agent/default-rules.yaml` uses this helper, so component tests stay green.

Test plan

  • `go build ./...` clean
  • 46/48 packages green (`pkg/containerwatcher/v2/tracers` and `pkg/validator` are pre-existing eBPF kernel-permission failures, reproduce on main)
  • Wildcard tests: leading-* RFC 4592, mid-⋯, trailing-*, CIDR membership, IPv6 canonicalisation, IPv4-mapped, trailing-dot normalisation
  • Exec-args restored: `TestExecWithArgsInProfile` + `TestExecWithArgsWildcardInProfile` both pass
  • Fixture-walk: 20 user-facing YAML examples parse strict against v1beta1 + behave per the headers
  • Tamper detection: `pkg/objectcache/containerprofilecache` package tests green

Storage dep

`go.mod` pinned to `k8sstormcenter/storage@b23d85f0` = `merge/upstream-profile-rearch` on the storage fork (companion PR #31 there).

matthyx and others added 23 commits April 27, 2026 12:35
* feat: foundation for ContainerProfileCache unification (steps 1, 2, 5-early)

Additive-only scaffolding for the upcoming migration from the two
workload-keyed caches (applicationprofilecache + networkneighborhoodcache)
to a single container-keyed ContainerProfileCache. No consumers are
rewired yet; all new code is unused.

- Storage client: GetContainerProfile(namespace, name) on ProfileClient
  interface + *Storage impl + mock.
- ContainerProfileCache interface + stub impl (methods return zero values;
  filled in by step 3/4).
- Prometheus metrics: nodeagent_user_profile_legacy_loads_total{kind,completeness}
  deprecation counter + reconciler SLO metrics (entries gauge, hit/miss
  counter, tick duration histogram, eviction counter) registered up front
  so later steps emit cleanly.

Plan artifacts in .omc/plans/; approved by ralplan Planner/Architect/Critic
consensus (v2, iteration 2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: ContainerProfileCacheImpl + projection + shared-pointer fast-path (steps 3, 3.5, 4)

- CachedContainerProfile entry with Shared/RV/UserAP/UserNNRV fields
- Option A+ fast-path: shared storage pointer when no user overlay
- projection.go ports mergeContainers/mergeNetworkNeighbors from legacy caches
- partial-profile detection with dedup'd WARN log + completeness metric label
- Event-path delete with WithLock+ReleaseLock (Critic #2 lock-gap fix)
- Unit tests T4 (projection) + T6 (callstack parity) + fast-path identity

Step 5 (reconciler) and legacy deletion land in follow-ups.

Plan: .omc/plans/containerprofile-cache-unification-consensus.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: ContainerProfileCache reconciler with evict + refresh (step 5)

- tickLoop drives evict + refresh on one goroutine, refresh gated by atomic
- reconcileOnce evicts entries whose pod is gone or container stopped
- refreshAllEntries snapshots IDs then refreshes outside Range to avoid a
  SafeMap RLock/WLock deadlock (rebuildEntry calls Set)
- isContainerRunning(pod, entry, id): containerID primary, (Name, PodUID)
  fallback for pre-running init containers with empty ContainerID
- ctx.Err() honored inside Range callbacks for graceful shutdown
- T8 end-to-end test: user-AP mutation -> cached projection reflects change

Plan: .omc/plans/containerprofile-cache-unification-consensus.md
Consensus deltas applied: #1 (isContainerRunning signature), #3 (ctx.Err),
#4 (extend fast-skip to overlay RVs), #5 (T8 test), #7 (RPC-cost comment).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: profilehelper CP->legacy-shape shims + ContainerProfileCache aggregator wiring (step 6a)

Adds the ContainerProfileCache reader to the ObjectCache aggregator interface
so profilehelper can read CP and synthesize the legacy ApplicationProfileContainer /
NetworkNeighborhoodContainer shapes for callers that haven't migrated yet.

- pkg/objectcache/objectcache_interface.go: add ContainerProfileCache() to
  aggregator interface + mock (both AP/NN stay for 6a-6c transit)
- pkg/objectcache/v1/objectcache.go: add cp field, 5-arg NewObjectCache,
  ContainerProfileCache() accessor
- pkg/objectcache/v1/mock.go: RuleObjectCacheMock implements CP surface +
  Get/SetContainerProfile test helpers, Start stub
- pkg/rulemanager/profilehelper/profilehelper.go:
  - GetContainerProfile(objectCache, id) returns (*CP, syncChecksum, error)
    — the forward API
  - GetContainerApplicationProfile + GetContainerNetworkNeighborhood rewritten
    as ~30-LOC CP->legacy-shape shims (consensus delta #2). Marked deprecated;
    step 6c deletes them after CEL callers migrate.
- cmd/main.go: construct ContainerProfileCache alongside APC+NNC, pass to
  NewObjectCache; mock-path uses ContainerProfileCacheMock
- test call sites updated for 5-arg NewObjectCache

Plan: .omc/plans/containerprofile-cache-unification-consensus.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: migrate 20 CEL call sites to GetContainerProfile (step 6b)

- applicationprofile/{capability,exec,http,open,syscall}.go: read fields
  directly off cp.Spec instead of the per-container AP shape
- networkneighborhood/network.go: read Ingress/Egress/LabelSelector off
  cp.Spec directly
- pkg/objectcache/v1/mock.go: extend RuleObjectCacheMock so
  SetApplicationProfile / SetNetworkNeighborhood also project into the
  unified ContainerProfile, and GetContainerProfile honours the shared
  container-ID registry (preserves "invalid container ID -> no profile"
  semantics for existing tests)
- profilehelper CP->legacy shims remain in place; step 6c removes them

Plan: .omc/plans/containerprofile-cache-unification-consensus.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: delete profilehelper shims + migrate rule_manager + creator (step 6c)

- pkg/rulemanager/profilehelper/profilehelper.go: delete
  GetContainerApplicationProfile, GetContainerNetworkNeighborhood,
  GetApplicationProfile, GetNetworkNeighborhood, GetContainerFromApplicationProfile,
  GetContainerFromNetworkNeighborhood — CP-direct API is the only surface now
- pkg/rulemanager/rule_manager.go:
  - :202, :399 call profilehelper.GetContainerProfile instead of the shim
  - HasFinalApplicationProfile reads cp via ContainerProfileCache().GetContainerProfile;
    method name preserved (external API on RuleManagerInterface per plan v2 §2.4)
- pkg/rulemanager/rulepolicy.go: Validate takes *v1beta1.ContainerProfile
  and reads cp.Spec.PolicyByRuleId
- pkg/rulemanager/ruleadapters/creator.go: both AP + NN branches use
  ContainerProfileCache().GetContainerProfileState (unified state source)

Plan: .omc/plans/containerprofile-cache-unification-consensus.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: ObjectCache aggregator CP-only + collapse 2 callbacks to 1 (step 6d)

- pkg/objectcache/objectcache_interface.go: drop ApplicationProfileCache()
  and NetworkNeighborhoodCache() methods — the aggregator is now
  {K8s, ContainerProfile, Dns}
- pkg/objectcache/v1/objectcache.go: 3-arg NewObjectCache(k, cp, dc)
- pkg/containerwatcher/v2/container_watcher_collection.go:63-64: two
  ContainerCallback subscriptions (APC + NNC) collapse to one (CPC)
- cmd/main.go: both branches (runtime-detection + mock) construct only
  ContainerProfileCache + Dns; legacy APC/NNC wiring removed with startup
  log: "ContainerProfileCache active; legacy AP/NN caches removed"
- test call sites updated for 3-arg NewObjectCache

Legacy packages still physically present (imports retained where still
referenced, e.g. callstackcache); step 8 deletes them entirely.

Plan: .omc/plans/containerprofile-cache-unification-consensus.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: delete legacy AP/NN cache packages + move callstackcache (step 8)

- git rm -r pkg/objectcache/applicationprofilecache/ (766 LOC)
- git rm -r pkg/objectcache/networkneighborhoodcache/ (758 LOC)
- git rm pkg/objectcache/applicationprofilecache_interface.go
- git rm pkg/objectcache/networkneighborhoodcache_interface.go
- mv pkg/objectcache/applicationprofilecache/callstackcache/
    -> pkg/objectcache/callstackcache/ (domain-agnostic, shared)
- Update 4 importers: containerprofilecache_interface.go, v1/mock.go,
  containerprofilecache.go, reconciler.go
- RuleObjectCacheMock drops ApplicationProfileCache()/NetworkNeighborhoodCache()
  accessor methods; SetApplicationProfile/SetNetworkNeighborhood remain as
  test helpers that project into the unified CP
- projection.go comments kept as historical source pointers — git history
  preserves the originals

Plan: .omc/plans/containerprofile-cache-unification-consensus.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: add T2 init-eviction, T5 packages-deleted, T7 lock-stress (step 9 partial)

- tests/containerprofilecache/packages_deleted_test.go: go/packages
  dep-graph assertion that legacy AP/NN paths are absent
- tests/containerprofilecache/lock_stress_test.go: 100-goroutine
  interleaved seed/read for same container IDs, 5s budget, race-safe
- tests/containerprofilecache/init_eviction_test.go: T2a (event-path
  evict) + T2b (reconciler-path evict for missed RemoveContainer)
- tests/containerprofilecache/helpers_test.go: shared test builders
- pkg/objectcache/containerprofilecache: export ReconcileOnce and
  SeedEntryForTest as out-of-package test hooks
- Makefile: check-legacy-packages target

T1 (golden-alert parity) and T3 (memory benchmark) are release-checklist
items per plan v2 §2.7 — the pre-migration baselines those tests require
can no longer be captured from this branch.

Plan: .omc/plans/containerprofile-cache-unification-consensus.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Phase 4 review P1 findings

1. Drop ReleaseLock on delete paths (containerprofilecache.go deleteContainer,
   reconciler.go reconcileOnce). Security review flagged a race where the
   deleted mutex could be orphaned while a concurrent GetLock creates a new
   one, breaking mutual exclusion for the same container ID. Trade-off:
   bounded memory growth of stale lock entries, proportional to container
   churn — acceptable for a node-agent lifetime.

2. Extract emitOverlayMetrics helper (metrics.go) to de-duplicate the
   ~20-line overlay metric/deprecation-warn block between buildEntry
   (addContainer path) and rebuildEntry (refresh path). Keeps the two in
   lockstep — code review flagged silent drift risk.

Not addressed in this commit (plan-accepted tradeoffs, follow-up work):
- Shared-pointer read-only invariant is convention-enforced, not type-
  enforced (plan v2 §2.3 step 7, ADR consequences). Retaining as-is;
  downstream consumers must not mutate.
- Storage RPC context propagation (requires storage.ProfileClient interface
  change, out of scope for this migration).

Plan: .omc/plans/containerprofile-cache-unification-consensus.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: retry pending ContainerProfile GETs when CP appears after container-start

Component tests on PR kubescape#788 regressed with "All alerts: []" and 54+
"container X not found in container-profile cache" log entries. Root cause:
addContainer did a one-shot GetContainerProfile at EventTypeAddContainer
time and bailed on 404. But the CP is created asynchronously by
containerprofilemanager ~60s AFTER container-start, so the one-shot GET
almost always missed; the cache entry was never created; rule evaluation
short-circuited as "no profile".

Legacy caches hid this via a periodic ListProfiles scan that picked up
late-arriving profiles on the next tick. The point-lookup model dropped
that scan. This commit adds an equivalent: a pending-container retry path
in the reconciler.

Changes:
- CachedContainerProfile unchanged; new pendingContainer struct captures
  (container, sharedData, cpName) needed to retry the initial GET.
- ContainerProfileCacheImpl.pending SafeMap records containerIDs waiting
  for their CP to land in storage.
- addContainer extracts the populate/GET into tryPopulateEntry. On miss
  (err or nil CP) it records a pending entry; the per-container goroutine
  exits. No more waiting 10 min inside addContainerWithTimeout.
- reconciler.retryPendingEntries iterates pending under per-container
  locks, re-issues the GET, and promotes via tryPopulateEntry on success.
- reconcileOnce gains a pending GC pass: containers whose pod is gone or
  whose status is not Running get dropped from pending so we don't retry
  forever on terminated containers.
- deleteContainer also clears from pending on EventTypeRemoveContainer.
- metrics: cache_entries gauge gains a "pending" kind; reconciler
  eviction counter gets a "pending_pod_stopped" reason.

Tests:
- TestRetryPendingEntries_CPCreatedAfterAdd: 404 on add -> pending; CP
  arrives in storage -> one tick promotes; exactly 2 GetCP calls.
- TestRetryPendingEntries_PodGoneIsGCed: pending entry dropped when its
  pod is no longer present in k8s cache.

Full findings and resume doc at
  .omc/plans/containerprofile-cache-component-test-findings.md

Follow-up plan updated at
  .omc/plans/containerprofile-cache-followups.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: cache correctness — right CP slug, partial-on-restart, overlay refs, resurrection guard

PR kubescape#788 component tests continued failing after the pending-retry fix.
Deep investigation uncovered a fundamental slug misuse and three reviewer-
reported correctness gaps. All fixed here.

### Primary bug: wrong slug function

plan v2 §2.3 asserted that GetOneTimeSlug(false) was deterministic. It is
NOT — implementation at k8s-interface v0.0.206:
  func (id *InstanceID) GetOneTimeSlug(noContainer bool) (string, error) {
      u := uuid.New()
      hexSuffix := hex.EncodeToString(u[:])
      ...
  }
So containerprofilemanager.saveContainerProfile writes a *time-series* CP
per tick with a fresh UUID suffix, and the storage-side
ContainerProfileProcessor.consolidateKeyTimeSeries writes the consolidated
profile at the STABLE slug (GetSlug(false), no UUID).

The cache was querying for CPs at GetOneTimeSlug(false), so every GET 404'd
forever — even with the pending-retry in place. 13 component tests failed
with "All alerts: []" and 38+ "container X not found in container-profile
cache" log entries.

Switched addContainer to GetSlug(false). The refresh path inherits the
corrected name via entry.CPName.

### Reviewer #1: resurrection during refresh

refreshAllEntries snapshots entries without a lock. Between snapshot and
per-entry lock acquisition, deleteContainer or reconcile-evict may have
removed the entry. Previously, rebuildEntry's c.entries.Set(id, newEntry)
would resurrect the dead container.

Added a load-under-lock guard at the top of refreshOneEntry.

### Reviewer #2: overlay handling regressions (two parts)

(a) tryPopulateEntry returned "pending" on base-CP 404 BEFORE trying
user-AP/NN. Containers with only a user-defined profile (no base CP yet)
got no entry. Restructured: fetch base CP and user-AP/NN independently;
populate if ANY source is available; synthesize an empty base CP when only
the overlay exists so projection has something to merge into.

(b) UserAPRef / UserNNRef were only recorded on successful fetch. A
transient 404 on add would permanently drop the overlay intent — the
refresh path had nothing to re-fetch. Now, when the label is set, the
refs are always recorded, using the label's name and the container's
namespace. Refresh retries the fetch each tick.

### Reviewer #3: partial profiles reused across container restart

tryPopulateEntry blindly used whatever CP existed at the stable slug,
including Partial completions from the previous container incarnation.
Legacy caches explicitly deleted Partial profiles on non-PreRunning
restart so rule evaluation fell through to "no profile" until Full
arrived.

Now: if CP.completion == Partial && !sharedData.PreRunningContainer, we
treat the CP as absent → stay pending → retry each tick. When the CP
becomes Full (or the container stops), the pending state resolves.

The inverse is preserved: PreRunningContainer (agent-restart scenario)
accepts the Partial CP as-is so Test_19's "alert on partial profile"
semantics still work.

### Tests

Five new unit tests, all race-clean:
- TestPartialCP_NonPreRunning_StaysPending
- TestPartialCP_PreRunning_Accepted
- TestOverlayLabel_TransientFetchFailure_RefsRetained
- TestRefreshDoesNotResurrectDeletedEntry
- TestUserDefinedProfileOnly_NoBaseCP

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: read workload-level AP/NN as primary data source

The storage server's consolidated ContainerProfile is not exposed via the
public k8s API — ContainerProfiles().Get(stableName) returns 404 even after
consolidation runs. Only time-series CPs (named <stable>-<UUID>) and the
server-aggregated ApplicationProfile / NetworkNeighborhood CRs at the
workload-name are queryable.

The component tests' WaitForApplicationProfileCompletion waits for the
workload-level AP/NN completion — that's what actually exists. The legacy
caches read these directly; we do the same now while the server-side
consolidated-CP plumbing is completed.

Changes:
- addContainer computes both cpName (per-container, forward-compat) and
  workloadName (per-workload, where AP/NN live) via GetSlug(false) and
  GetSlug(true) respectively.
- tryPopulateEntry fetches consolidated CP (kept for forward-compat),
  workload AP, and workload NN. Treats the workload AP/NN as the primary
  data source when the consolidated CP isn't available.
- projection pre-merges workloadAP + workloadNN onto the base (synthesized
  when CP is 404), then buildEntry applies user-overlay AP/NN on top.
- Partial-on-restart gate extended to cover workload AP/NN too — non
  PreRunning containers ignore partial workload profiles until they
  become Full, mirroring legacy deletion-on-restart semantics.
- pendingContainer gains workloadName so retries re-fetch the right CRs.
- fakeProfileClient gains overlayOnly field so tests can scope AP/NN
  returns to the overlay name; existing TestOverlayPath_DeepCopies updated
  accordingly.

Forward-compat: once storage publishes a queryable consolidated CP at
cpName, its fetch becomes primary and the workload AP/NN path becomes a
fallback. No API changes are required to make that transition — just drop
the workload-level fetches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* debug: add tick-loop start log + change-detection log in reconciler

* fix: remove overly-aggressive pending GC that dropped entries before retry

CI run 24781030436 (commit ce32919) proved the reconciler IS ticking with
retryPendingEntries running, but the pending-GC pass in reconcileOnce was
dropping every pending entry on the first tick (pending_before=4 →
pending_after=0 at the FIRST tick, before retryPendingEntries could run).

Root cause: the GC pass asked k8sObjectCache.GetPod(ns, pod) and also
checked isContainerRunning. On a busy node, the k8s pod cache and
ContainerStatuses lag the containerwatcher Add event by tens of seconds.
So "pod not found" or "container not yet Running" routinely returned true
for a container that had just been registered, causing GC to remove the
pending entry immediately. Retries then ran against an empty pending map
→ no promotions → alerts fired without profile → test failure.

Change: remove the pending GC entirely. Cleanup for terminated containers
flows through deleteContainer (EventTypeRemoveContainer) which clears
both entries and pending under the per-container lock. Memory growth is
bounded by the node's container churn (containers that never got a
profile during their lifetime).

Test updated: TestRetryPendingEntries_PodGoneIsGCed replaced by
TestPendingEntriesAreNotGCedBeforeRetry which asserts the new semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: merge user-managed AP/NN and refresh workload-level sources

Two component-test regressions in PR kubescape#788:

Fix A (Test_12 / Test_13): the cache now reads the user-managed
ApplicationProfile and NetworkNeighborhood published at
"ug-<workloadName>" and projects them onto the base profile as a
dedicated ladder pass. Legacy caches did this via the
`kubescape.io/managed-by: User` annotation in handleUserManagedProfile;
we read them directly by their well-known name.

Fix B (Test_17 / Test_19): the reconciler refresh path re-fetches the
workload-level AP/NN (and user-managed / label-referenced overlays) on
every tick, not just the consolidated CP. This propagates the Status=
"ready" -> "completed" transition into the cached ProfileState, which
flips fail_on_profile from false to true at rule-eval time.

CachedContainerProfile gained WorkloadName plus WorkloadAPRV /
WorkloadNNRV / UserManagedAPRV / UserManagedNNRV fields so the refresh
can fast-skip when every source's RV matches what's cached.
refreshOneEntry's rebuild now runs the same projection ladder as
tryPopulateEntry: base CP (or synthesized) → workload AP+NN →
user-managed (ug-) AP+NN → label-referenced user AP+NN.

Also:
- Tick-loop log only fires when entries OR pending count actually moved
  (previously fired whenever pending_before > 0, producing per-tick
  noise while a stuck-pending entry waited for profile data).
- fakeProfileClient in tests returns userManagedAP/userManagedNN when
  the requested name starts with "ug-".
- New tests: TestWorkloadAPMerged_AndRefreshUpdatesStatus (Fix B
  happy-path) and TestUserManagedProfileMerged (Fix A happy-path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: reconcileOnce no longer evicts on pod-cache lag, only on Terminated

CI run 24783250693 (commit 32a76c0) showed reconcileOnce evicting live
entries every tick: "entries_before:10, entries_after:0" within 5 seconds
of the agent starting. Same class of bug as the pending-GC fix (c45803f):
the k8s pod cache lags ContainerCallback Add events by tens of seconds,
and evicting on "GetPod returns nil OR !isContainerRunning" churned every
entry before any rules could evaluate.

Change reconcileOnce eviction gate:
- If pod is missing from k8s cache: DO NOT evict. Cache lag is transient;
  deleteContainer handles real-world cleanup via EventTypeRemoveContainer.
- If pod present and container has clearly Terminated: evict (preserves
  init-container eviction for Test_02 and T2 acceptance).
- If pod present and container in Waiting state: retain (new container
  creation, init-container pre-run both legitimately pass through Waiting).

New helper isContainerTerminated mirrors isContainerRunning but gates on
State.Terminated only; "not found in any status" treated as terminated.

Tests:
- TestReconcilerEvictsWhenPodMissing → TestReconcilerKeepsEntryWhenPodMissing
- New TestReconcilerEvictsTerminatedContainer
- New TestReconcilerKeepsWaitingContainer

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: drop workload-level AP/NN fetch; CP-direct reading is authoritative

The workload-level AP/NN fetch added in d27be01 was a workaround for the
eviction/GC bugs (fixed in c45803f and d9ae0ac), not an architectural
need. The consolidated ContainerProfile IS queryable at the GetSlug(false)
name once storage aggregation runs; the cache simply needs to wait on the
pending-retry path.

This reverts the workload-AP/NN read while keeping:
- consolidated CP as the single base-profile source
- user-managed AP/NN at "ug-<workloadName>" (merged on top) — still needed
  because user-managed profiles are authored independently and are not
  consolidated into the CP server-side
- user-defined overlay via pod UserDefinedProfileMetadataKey label
- eviction fix (d9ae0ac), GC fix (c45803f), resurrection guard

Removed:
- workload-AP/NN fetch in tryPopulateEntry and refreshOneEntry
- WorkloadAPRV / WorkloadNNRV fields on CachedContainerProfile and the
  corresponding rebuildEntryFromSources ladder pass
- Partial-on-restart gate for workload AP/NN (only applies to CP now)
- Synth-CP annotation fallback chain (simplified to Completed/Full)

Tests:
- TestWorkloadAPMerged_AndRefreshUpdatesStatus → TestRefreshUpdatesCPStatus
  (CP now the source; RV transition propagates Status)
- TestUserManagedProfileMerged rewired to use a real base CP + ug- overlay
  instead of workloadAP + ug- overlay

This matches the migration plan's original intent: CP-direct, AP/NN only
as user overlays.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: synthetic entry CPName override, PodUID backfill, phase-labeled reconciler histogram

Three review findings from the post-green audit.

### 1 (High) — synthetic entry stored the wrong CPName

When tryPopulateEntry synthesized a CP (consolidated CP still 404), the
synthetic name was workloadName or overlayName, and buildEntry persisted
entry.CPName = cp.Name (i.e. the synthetic name). refreshOneEntry then
queried the synthetic name instead of the real GetSlug(false) name; with
the stored RV also empty, the fast-skip's "absent matches empty" branch
kept the synthetic entry forever and the real consolidated CP could never
replace it.

Fix: after buildEntry, override entry.CPName = cpName (the real
GetSlug(false) result passed into tryPopulateEntry).

### 2 (Medium) — PodUID never backfilled

buildEntry only sets PodUID when the pod is already in k8sObjectCache at
add time. On busy nodes the pod cache lags, so addContainer often runs
before the pod lands and PodUID stays "". isContainerTerminated's
empty-ContainerID fallback matches against (ContainerName, PodUID);
when PodUID == "" and the status also has empty UID, the loop falls
through and returns true (treat as terminated) — evicting a still-live
init container. rebuildEntryFromSources copied prev.PodUID unchanged, so
the error never healed.

Fix: in rebuildEntryFromSources, if prev.PodUID is empty AND the pod is
now in the k8s cache, use the fresh UID.

### 3 (Low) — reconciler duration histogram mixed two phases

tickLoop (evict) and refreshAllEntries (refresh) both emitted
ReportContainerProfileReconcilerDuration into the same plain Histogram,
so nodeagent_containerprofile_reconciler_duration_seconds was a blend of
two very different workloads. Plan v2 §2.9 had specified a HistogramVec
with a "phase" label from the start.

Fix: MetricsManager.ReportContainerProfileReconcilerDuration(phase, d).
Prometheus implementation becomes a HistogramVec with phase label.
tickLoop emits phase="evict", refreshAllEntries emits phase="refresh".
MetricsMock/MetricsNoop signatures updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address all CodeRabbit review comments on PR kubescape#788

- ContainerProfileCacheMock.GetContainerProfileState returns synthetic
  error state instead of nil, matching the real impl's contract
- Remove IgnoreContainer check on EventTypeRemoveContainer to prevent
  stale entries when pod labels change after Add
- Deep-copy userAP/userNN in mergeApplicationProfile and
  mergeNetworkNeighborhood to eliminate aliasing of nested slices
  (Execs[i].Args, Opens[i].Flags, MatchExpressions[i].Values, etc.)
  into the cached ContainerProfile
- Fix Shared=true bug: buildEntry now takes userManagedApplied bool;
  fast-path only sets Shared=true when no overlay was applied at all,
  matching rebuildEntryFromSources logic in reconciler.go
- isContainerTerminated returns false when all status slices are empty
  (kubelet lag guard for brand-new pods)
- Fix misplaced doc comment above GetContainerProfile in storage layer
- Remove unused (*stubStorage).setCP test helper
- Lock stress test evict path now uses ContainerCallback(Remove) to
  exercise deleteContainer and per-container locking
- RuleObjectCacheMock stores per-container profiles in cpByContainerName;
  GetContainerProfile resolves via InstanceID.GetContainerName();
  GetContainerProfileState returns synthetic error state

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: thread context.Context through ProfileClient and add per-call RPC budget

All five ProfileClient methods now accept ctx as their first argument so
callers can enforce cancellation and deadline propagation. Each storage RPC
in the reconciler is wrapped via refreshRPC(ctx, ...) which applies a
configurable per-call timeout (config.StorageRPCBudget, default 5 s) on top
of the parent context, preventing a slow API server from stalling an entire
reconciler burst. Tests cover the fast-skip, rebuild, and context-cancellation
mid-RPC paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: shared-pointer race-fuzz test + WarmContainerLocksForTest helper

Add TestSharedPointerReadersDoNotCorruptCache: 50 concurrent readers
traverse the returned *ContainerProfile slices while a writer goroutine
alternately calls RefreshAllEntriesForTest + SeedEntryForTest to keep
entry rebuilds active. Runs for 500ms under -race, proving the shared-
pointer fast-path never produces a concurrent read/write pair.

Also add TestSharedPointerFastPathPreservesPointerIdentity: after a
refresh against a storage object with a newer RV, the new entry's
Profile pointer IS the storage object (Shared=true, no DeepCopy), which
keeps the T3 memory budget intact.

Fix the pre-existing goradd/maps SafeMap initialisation race in
TestLockStressAddEvictInterleaved by pre-warming containerLocks via the
new WarmContainerLocksForTest helper (the previous pre-warm via
SeedEntryForTest only covered the entries SafeMap, not containerLocks).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: document SetApplicationProfile / SetNetworkNeighborhood field partition in mock

Add a block comment above RuleObjectCacheMock spelling out the non-overlapping
cp.Spec field partition between the two setters and the first-container-wins
rule for r.cp. Without this, future callers risk aliasing NN fields into an
AP-only profile or vice-versa.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor: T8 integration mirror, mock setter contract doc, SeedEntryWithOverlayForTest

Add SeedEntryWithOverlayForTest helper so out-of-package integration tests can
set UserAPRef / UserNNRef (which use the internal namespacedName type) without
requiring the type to be exported.

Mirror TestT8_EndToEndRefreshUpdatesProjection at tests/containerprofilecache/
using only the public + test-helper API: seeds an entry with a stale UserAPRV,
mutates storage to apV2 (RV=51), asserts RefreshAllEntriesForTest rebuilds the
projection with the new execs and drops the stale ones.

Add top-of-file block comment to RuleObjectCacheMock documenting the non-
overlapping AP-fields / NN-fields partition between SetApplicationProfile and
SetNetworkNeighborhood and the first-container-wins rule for r.cp.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address Phase 4 code-review findings

- reconciler.go: simplify dead-code cpErr/rpcErr guard (refreshRPC returns
  exactly cpErr; the rpcErr != nil && cpErr == nil branch could never fire)
- reconciler_test.go: make blockingProfileClient.blocked a buffered chan(1)
  with a blocking send so the signal is never silently dropped; bump
  rpcBudget to 100ms and timeout to 2s to reduce flakiness on loaded CI
- containerprofilecache.go: extract defaultStorageRPCBudget const alongside
  defaultReconcileInterval for discoverability
- shared_pointer_race_test.go: fix gofmt const-block alignment

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: preserve cached entry when overlay AP/NN fetch fails transiently

Before this fix, a refreshRPC timeout on any overlay GET (user-managed
ug-<workload> AP/NN or user-defined label-referenced AP/NN) left the
overlay variable nil with the error silently discarded. The RV comparison
then saw rvOf(nil)="" != cached RV (e.g. "50"), treated it as a removal,
and rebuilt the entry without the overlay — temporarily stripping
user-managed/user-defined profile data from the cache and altering
alerting until the next successful tick.

Fix: capture each overlay's fetch error and, when it is non-nil and the
entry already has a non-empty cached RV for that overlay, return early
and keep the existing entry unchanged. Legitimate deletions (nil with
err==nil) still propagate correctly. Mirrors the existing CP error-
preservation logic at refreshOneEntry:272-288.

Add TestRefreshPreservesEntryOnTransientOverlayError covering all four
overlay fetch paths (user-managed AP, user-managed NN, user-defined AP,
user-defined NN) via a new overlayErrorClient stub.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review issues on PR kubescape#788

- Rename 5 CP cache metrics from nodeagent_* to node_agent_* to match
  the existing metric namespace convention used across node-agent.
- Route all 5 storage GETs in tryPopulateEntry through refreshRPC so
  they respect the per-call SLO (default 5s); prevents a hung GET from
  blocking the entire reconciler tick loop when called from
  retryPendingEntries.
- Add WarmPendingForTest helper to pre-initialise the pending SafeMap
  before concurrent test phases, preventing the goradd/maps
  nil-check-before-lock initialisation race.
- Pre-warm pending SafeMap in TestLockStressAddEvictInterleaved and
  poll for async deleteContainer goroutines to drain before asserting
  goroutine count.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: distinct RNG seed per stress-test worker

Pass worker index into each goroutine closure and mix it into the
rand.NewSource seed (time.Now().UnixNano() + int64(worker)), so that
100 concurrently-launched goroutines don't all receive the same
nanosecond timestamp and end up with identical add/evict sequences.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor: move test helpers out of production source into testing.go

The six *ForTest / ReconcileOnce helpers were previously mixed into
containerprofilecache.go alongside production logic. Move them to a
dedicated testing.go file in the same package.

export_test.go is the idiomatic alternative but is compiled only when
running tests in the same directory; test packages in other directories
(tests/containerprofilecache/) import the non-test version of the
package and never see _test.go contents. A plain testing.go is the
correct pattern here — it signals "test support" by name and groups all
scaffolding in one place, while remaining importable by any test binary.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor: move integration tests into package dir; use export_test.go

export_test.go (package containerprofilecache) is only compiled during
`go test` so test helpers never enter the production binary. This only
works when callers are in the same directory; the prior layout put tests
in tests/containerprofilecache/ (a separate package), forcing helpers
into a plain testing.go that shipped in the binary.

Moving the six test files into pkg/objectcache/containerprofilecache/
as package containerprofilecache_test fixes this correctly:
- export_test.go replaces testing.go (test-binary-only)
- package declaration: containerprofilecache_integration → containerprofilecache_test
- packages_deleted_test.go Dir path: ../.. → ../../.. (module root)
- tests/containerprofilecache/ directory removed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: nil out overlay pointers when k8s client returns zero-value on 404

The Kubernetes generated client (gentype.Client.Get) pre-allocates a
zero-value struct before the HTTP call and returns it as the result even
on error (e.g. 404 not-found). In refreshOneEntry, the four overlay
fetch paths (userManagedAP, userManagedNN, userAP, userNN) guarded only
the "transient error with cached RV → keep old entry" branch; the
"first-time 404, no cached RV" branch fell through with a non-nil
empty-ObjectMeta struct still in the pointer, which reached
rebuildEntryFromSources → emitOverlayMetrics and logged spurious
"user-authored legacy profile merged" warnings with empty
namespace/name/resourceVersion fields.

Add an explicit nil-out after each non-returning error branch, mirroring
the pattern already used in tryPopulateEntry.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
…vice file handling (kubescape#791)

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
…kubescape#798)

* perf: disable file-digest/metadata/executable catalogers

These three catalogers iterate every file in the scan tree and dominate
transient allocation, but their outputs are not consumed by the OOM-relevant
SBOM path. Disabling them saves ~200 MB peak RSS on gitlab-ee (main) and
stacks with upstream selective-indexing + binary-prefilter improvements to
~1.12 GB total (vs 1.62 GB baseline, fits 1.5 GB cgroup).
Signed-off-by: Ben <ben@armosec.io>

* deps: switch to kubescape/syft v1.32.0-ks.2 for memory reduction

Routes anchore/syft imports to the kubescape fork via replace directive.
The fork carries selective indexing + binary-cataloger pre-filtering on
top of v1.32.0; combined with the file-cataloger disable in the parent
commit, this reduces gitlab-ee scan peak RSS from 1,621 MB to 1,123 MB.

Refs: NAUT-1283
Signed-off-by: Ben <ben@armosec.io>

* fix: check dep.Replace for actual fork version; add cataloger removals to sidecar

- packageVersion() now returns dep.Replace.Version when present so the fork
  tag (v1.32.0-ks.2) propagates to runtime metadata and version-gating logic
- pkg/sbomscanner/v1/server.go: add the same WithCatalogerSelection/WithRemovals
  as sbom_manager.go so both SBOM paths drop file-digest/metadata/executable
  catalogers and stay in consistent memory behaviour
Signed-off-by: Ben <ben@armosec.io>

* fix: keep syft tool version at required version

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Signed-off-by: Ben <ben@armosec.io>
Co-authored-by: Matthias Bertschy <matthias.bertschy@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The application-profile recorder in ReportFileExec derives the path it
stores into the AP from `args[0]`, while the rule-side resolver
(`parse.get_exec_path` in pkg/rulemanager/cel/libraries/parse/parse.go)
falls back to `comm` when `args[0]` is empty. The asymmetry causes
"Unexpected process launched" (R0001) and other ap.was_executed-based
rules to fire on processes that are present in the application profile.

Trigger: fexecve / execveat with AT_EMPTY_PATH. modern libpam (>= 1.5)
invokes its helpers (unix_chkpwd, unix_update, ...) via fexecve to avoid
TOCTOU on the helper path. The kernel implements fexecve as
execveat(fd, "", argv, envp, AT_EMPTY_PATH) — pathname is empty by
design.

Inspektor Gadget's trace_exec puts the syscall pathname into args[0]
and reads argv from index 1 (gadgets/trace_exec/program.bpf.c:146-153).
For fexecve/execveat empty-pathname, this produces args = ["", argv[1]]
in the agent's exec event. The recorder then sets path = args[0] = ""
and the AP entry is unreachable to ap.was_executed("unix_chkpwd")
(which the rule-side resolver computes via the empty-args[0] -> comm
fallback).

Fix: derive the recorder's path the same way the rule-side does — prefer
exepath (the kernel-authoritative exe_file path, immune to argv[0]
spoofing too), then argv[0] when non-empty, then comm.

Concrete impact in production: 408 of 1976 Bonial I013 incidents on
production scoring-api APs are exactly this case — cron user-context
setup invokes pam_unix -> unix_chkpwd via fexecve, AP records path: ""
with args ["", "root"], rule looks up "unix_chkpwd" via comm fallback,
no match.

The new resolveExecPath helper is also more defensive against argv[0]
spoofing in general — exepath comes from task->mm->exe_file in the BPF
side and cannot be controlled by user code.

Verified locally on a kind cluster with kubescape v0.3.94: a pod that
loops execve (control) and execveat-AT_EMPTY_PATH (bug) reproduces the
production-shape AP entry on the unfixed code path.

Signed-off-by: Ben <ben@armosec.io>
* implement Rule-Aware Profile Projection

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* fix: address CodeRabbit review comments (batch 2)

- profiledata.go: reset receiver in UnmarshalJSON/YAML for ProfileDataRequired
  and FieldRequirement; add PatternObject unknown-field rejection
- function_cache.go: include SyncChecksum in cache key to invalidate on
  profile content changes (not only spec changes); iterate all extraKeyFn callbacks
- rule_manager.go: gate strict-validation rejection behind StrictValidation flag;
  coalesce specNotify bursts before recompile
- exec.go: document wasExecutedWithArgs v1 limitation for rule authors

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* docs: document wasExecutedWithArgs v1 path-only matching limitation

Add a CEL Helper Limitations table to the Detection Rules section noting
that wasExecutedWithArgs currently performs path-only matching (equivalent
to wasExecuted) and does not validate the argument list in v1.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: pass-through all profile data when no rules declare profileDataRequired

When InUse=false (no rule declared a requirement for a field), projectField
was returning an empty ProjectedField{}, causing CEL helpers to see no profile
data and fire false-positive alerts for every exec/open/capability/etc.

Fix: treat InUse=false as All=true (pass-through), so existing rules that omit
profileDataRequired continue working with the full raw profile.

Update TestApply_NilSpec, TestApply_DynamicNotRetainedWhenNotInUse (renamed),
and TestSpecChange_TriggersReprojection to reflect the new pass-through semantics.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: update stale comments and strengthen reprojection test

- projection_apply.go: update Apply doc-comment and dynamic-patterns
  comment to reflect pass-through semantics (InUse=false retains all data)
- reconciler_test.go: add SpecHash assertions to TestSpecChange to prove
  reprojection actually occurred rather than testing pass-through twice

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: improve error logging for user-managed resource fetch failures

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* feat: add profileDataRequired field for rule-aware projection requirements

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* feat: enhance profileDataRequired field to allow additional properties for rule-aware projection

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* feat: update profileDataRequired field to preserve unknown fields for rule-aware projection

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* feat: run malicious job from /app to use the rule watched path

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* feat: change working directory for malicious job to /var/log

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* feat: increase timeout for helm upgrade and kubectl wait in component tests; update malicious job to include command and args

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* feat: update malicious job working directory to /tmp and modify command for service account token access

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* feat: update malicious job to read environment variables from /proc/self/environ

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* feat: create marker file in /var/lib/r0002-test for malicious job

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* feat: enable file access anomalies detection (R0002)

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

---------

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* get services from API, removing sidecar requirement

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>

* fix: add timeout and file-based fallback to LoadServiceURLs

- Bound HTTP service discovery to 10 s so a slow/unreachable API
  cannot stall node-agent startup; failure is handled gracefully by
  the existing nil-check at the call site.
- Restore SERVICES env var / /etc/config/services.json fallback
  (using ServiceDiscoveryFileV3) so sidecar deployments retain
  scan-failure reporting without requiring migration to API_URL.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
… surface

Living documentation for the feat/network-wildcards work. Each fixture
is a complete, kubectl-applicable NetworkNeighborhood document
exercising ONE edge case in the v0.0.2 wildcard surface. Test_34
(forthcoming) consumes them directly; users learning the syntax can
copy-paste them as authoritative examples.

Coverage:
  01 — IPv4 literal in ipAddresses[]
  02 — IPv6 literal (canonicalisation)
  03 — IPv4 CIDR
  04 — IPv6 CIDR
  05 — '*' sentinel for ANY IP (with discouragement annotation)
  06 — 0.0.0.0/0 + ::/0 (RFC-aligned alternative to '*')
  07 — mixed list (literal + CIDR + sentinel)
  08 — backward-compat singular ipAddress
  09 — DNS literal
  10 — DNS leading '*' (RFC 4592)
  11 — DNS mid '⋯' (DynamicIdentifier)
  12 — DNS trailing '*' (one or more, never zero)
  13 — trailing-dot normalisation
  14 — '**' recursive — admission MUST reject
  15 — egress + ingress on same container, direction isolation
  16 — egress: [] NONE (declared zero-egress)
  17 — realistic Stripe API + cluster DNS
  18 — Kubernetes service-FQDN via mid '⋯' (the user's case)
  19 — port + protocol + CIDR composed
  20 — multi-container pod, different rules per container

README.md indexes all fixtures and lists the wildcard token vocabulary.

Each fixture's header comment lists the edge case, expected outcomes,
match path, spec reference, and operational guidance. Ready to be
consumed by node-agent's Test_34_NetworkWildcardSurface (forthcoming)
and by storage's networkmatch unit tests via testdata-style references.
Replaces byte-equality with the v0.0.2 wildcard-aware matchers from
storage's pkg/registry/file/networkmatch — applied symmetrically to
all six nn.* CEL functions (egress + ingress mirror images):

  nn.was_address_in_egress / _in_ingress
  nn.is_domain_in_egress   / _in_ingress
  nn.was_address_port_protocol_in_egress / _in_ingress

Each function now walks BOTH the deprecated singular field
(IPAddress / DNS, byte-equality, back-compat) AND the new plural
field (IPAddresses / DNSNames, wildcard-aware) on each NetworkNeighbor
entry. A profile that uses only the deprecated form behaves exactly
as before; a profile that uses the new form gains CIDR + wildcard
matching with no rule-side changes required.

Two helpers (neighborMatchesIP / neighborMatchesDNS) factor the
two-list walk so the six call sites stay readable. Compiled-form
caching of the matcher across calls is deferred to a follow-up — the
existing cel functionCache still memoises (containerID, observed)
tuples, so the per-call MatchIP/MatchDNS overhead only fires on
cache misses.

Tests cover:
  - CIDR membership across egress/ingress
  - '*' sentinel for any IP
  - leading-* DNS wildcard (RFC 4592, exactly one label)
  - mid-⋯ DynamicLabel (the kubernetes service-FQDN case)
  - trailing-dot resilience
  - direction isolation (egress and ingress lists are walked
    independently — same address allowed on one direction
    must NOT match the other)
  - back-compat: deprecated singular IPAddress/DNS still works
  - mixed: profile with one entry using singular, another using plural
  - composed match: CIDR + port + protocol on the granular variant

go.mod: temporary local-path replace for kubescape/storage so the
node-agent picks up the in-flight feat/network-wildcards work; user
flips back to fork ref before pushing.
TestFixturesParse: every YAML under tests/resources/network-wildcards/
parses against the v1beta1 NetworkNeighborhood schema. The fixtures
double as authoritative user-facing syntax documentation, so a fixture
that fails to parse is a documentation bug.

TestFixturesMatchExpectedBehaviour: representative observed→match
triples for each major edge case (literal IP, CIDR, '*' sentinel,
deprecated singular IPAddress, leading-* DNS RFC 4592, mid-⋯
DynamicLabel, direction isolation between egress and ingress) are
exercised through the actual nn.* CEL functions. If a fixture's
header comment says '10.1.2.3 → match' and the matcher disagrees,
ONE of them is wrong; this test pins both.

True end-to-end Test_34_NetworkWildcardSurface (kubectl-applies the
fixtures against a live cluster) belongs in the iximiuz lab; that
job is left for the lab pass once the storage + node-agent images
ship via the fork CI.
Local replace points at ../storage so the fork ref isn't fetched.
User reverts both go.mod and go.sum before pushing the branch.
Updates the storage replace to a pseudo-version on the fork that includes
the v0.0.2 wildcard surface (pkg/registry/file/networkmatch/, IPAddresses
schema field, REST validation). Build and tests stay green against the
pinned ref.

The .claude/ entry on .gitignore prevents the agent state directory from
being tracked accidentally.
Five findings, all legit, all fixed:

- Port range guard (Major): wasAddressPortProtocolInEgress/Ingress now
  reject portInt outside [0, 65535] BEFORE narrowing to int32. Without
  this, a CEL value like 4294967739 wraps to 443 and would falsely
  match a port-443 entry. New TestWasAddressPortProtocolInEgress_
  PortWrapRejected pins the contract.

- neighborMatchesDNS now routes the deprecated singular DNS field
  through MatchDNS (single-element slice) instead of raw string
  equality, so back-compat behaviour gets the same trailing-dot
  stripping + lowercasing as the new DNSNames[]. New
  TestIsDomainInEgress_DeprecatedDNS_TrailingDotParity pins this.

- Direction-isolation fixture test now exercises BOTH
  wasAddressInEgress and wasAddressInIngress for each observation,
  via a new ipBothCheck struct. The prior version only checked egress,
  so a regression that broke ingress matching would have slipped through.

- TestFixturesParse uses yaml.UnmarshalStrict so a typo in any user-
  facing fixture (the YAML files double as documentation) fails the
  test instead of silently parsing.

- README clarifies that fixture 14 is intentionally rejected at
  admission and shouldn't be kubectl-applied — points readers at the
  index entry so they don't try to use it as a template.

Also bumps the storage replace to e1263bf6, which carries storage's CR
fixes (deprecated IPAddress validation, ValidateUpdate now also runs
network-profile validation, field-path assertions in admission tests).
Pulls in storage's CR round-2 fixes: deterministic admission error
ordering across container groups, and field-path assertions on the
ValidateUpdate test.
Pulls in storage's deprecated-DNS validation parity fix.
Two findings, both nitpick-level, both applied:

- Remove the unused 'maps', 'objectcache', 'objectcachev1' imports
  from fixtures_test.go along with the blank-identifier _ = ... lines
  at the bottom that existed only to silence the unused-import error.
  buildLibWithContainer is defined in wildcard_test.go (same package),
  so fixtures_test.go has no real need for those imports.

- Route the deprecated singular IPAddress through networkmatch.MatchIP
  for symmetry with the deprecated singular DNS (which round 1 already
  routed through MatchDNS). Both deprecated fields now get the same
  canonicalisation (IPv6 expanded forms, IPv4-mapped IPv6) as the new
  list fields. New TestWasAddressInEgress_DeprecatedIPAddress_
  IPv6Canonicalisation pins this.
…nd 3)

CR caught that the round-2 routing of deprecated IPAddress through
MatchIP had a documentation gap: existing tests only proved literal
+ canonical (IPv6) matching, never the wildcard/CIDR semantics that
MatchIP now also enables on the deprecated field.

Adds TestWasAddressInEgress_DeprecatedIPAddress_AcceptsWildcardAndCIDR
which pins the contract: deprecated singular field accepts the SAME
wildcard token vocabulary as the new list form — '*' sentinel,
CIDRs, 0.0.0.0/0 and ::/0 alternatives. Comment on neighborMatchesIP
documents this is intentional unification, not accidental.
…scape#803)

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
Profile-checksum-invalidated cache of compiled networkmatch.IPMatcher /
DNSMatcher per (containerID, neighborIndex). The previous code path
re-compiled every NetworkNeighbor's entries on each CEL function-cache
miss; this PR builds each matcher at most once per profile-checksum
lifetime and reuses it across subsequent misses.

Design:

  matcherCache (sync.Map) inside nnLibrary, zero-value safe so existing
  test fixtures that construct nnLibrary{} directly continue to work
  without changes.

  Per-container entry tagged with the profile's SyncChecksumMetadataKey
  annotation. On lookup: if checksum matches, reuse; else allocate a
  fresh containerMatchers and store with LoadOrStore (concurrent-safe).

  Per-neighbor matchers are nil-init and lazily compiled on first use,
  so a profile with 10 egress entries that only ever fires through 2 of
  them pays compile cost for only those 2.

Benchmarks (arm64, -benchtime=1s):

  IP, realistic profile (5 neighbors x 3 entries, observation misses all):
    Cold (per-call recompile): 1733 ns/op   1920 B/op   76 allocs/op
    Hot  (cached matchers)   :  177 ns/op     32 B/op    2 allocs/op
                             ~ -90% time, -98% bytes, -97% allocs

  DNS, realistic profile:
    Cold: 1219 ns/op   1800 B/op   41 allocs/op
    Hot :  318 ns/op    272 B/op    7 allocs/op
                             ~ -74% time, -85% bytes, -83% allocs

  Churning profile (checksum flips every iteration — pathological):
    1527 ns/op   1936 B/op   77 allocs/op
    Matches cold path: cache overhead itself is negligible; the savings
    come strictly from amortising compile across stable-checksum windows.

In production this stacks on top of the existing CEL functionCache
(which already absorbs same-(containerID,observed) cache hits). The
matcher cache catches what slips through: unique-observation cache
misses within a profile-checksum lifetime.

Touched:
  - matcher_cache.go             new file: cache impl
  - matcher_cache_bench_test.go  new file: comparison bench
  - network.go                   use cached matchers in all 6 CEL fns
  - nn.go                        matcherCache field on nnLibrary
…s replace (CR #42)

Two findings from CodeRabbit round 1, both fixed:

1. Stale-entry shape race in getOrBuild (Major)

   Old code used LoadOrStore on the staleness path and only replaced
   on checksum mismatch — but a shape mismatch (neighbor count change)
   could leak the stale entry to a caller whose profile has a different
   shape, which then index-panics in ipMatcher/dnsMatcher.

   Fix: when staleness is detected (by checksum OR shape), always
   Store unconditionally. Worst-case contention: several goroutines
   build shape-correct fresh entries and one Store wins; all callers
   still see a shape-correct entry. Orphans get GC'd.

2. Unsynchronised lazy-init of per-neighbor matchers (Critical)

   neighborMatchers.ip / .dns were *Matcher with a non-atomic 'if nil
   then build then assign' pattern — a real data race.

   Fix: switched to atomic.Pointer[networkmatch.IPMatcher] (and DNS).
   First-build callers may race on Compile but only one pointer wins
   via CompareAndSwap; everyone returns the winning matcher. Pure
   functions (no shared state) so duplicate Compile work is wasteful
   but not incorrect.

New tests in matcher_cache_test.go pin the contract:
  - TestMatcherCache_ConcurrentFirstBuild: 64 goroutines racing on
    the same slot, run under -race, asserts matchers are populated
    exactly once
  - TestMatcherCache_StaleEntryReplaced: shape-mismatch path returns
    a fresh containerMatchers, not the stale one
  - TestMatcherCache_ChecksumPreservedAcrossCalls: same checksum hits
    cache (no rebuild)

Benchmarks re-run after atomic.Pointer switch — negligible impact
(177 → 186 ns/op, still 8x faster than cold path). All headline
savings preserved.
…und 2)

Without the barrier, goroutine launch jitter staggers first-call
arrivals, hiding any unsynchronised-write data race during the
first-build window. With the barrier, all 64 goroutines hit the
contended path simultaneously when close(start) fires — much tighter
race-detector coverage of the atomic.Pointer.CompareAndSwap path.
Upstream brings:
- kubescape#788 Replace AP+NN cache with ContainerProfileCacheImpl (51 files)
- kubescape#799 Rule-Aware Profile Projection (53 files, projection_apply/compile/trie)
- kubescape#800 fix(exec path symmetric resolver)
- kubescape#797 learning-period label
- kubescape#798 syft v1.32.0-ks.2 perf
- kubescape#791 client CA from kubelet config
- kubescape#803 logging improvements
- kubescape#772 services from API

Conflicts resolved by taking upstream's version for the reshape-owned
packages (containerprofilecache, metricsmanager, profilehelper,
applicationprofile/*, rule_manager, rulepolicy, mock.go) and dropping
the matcher_cache.go layer entirely — upstream's ProjectedContainerProfile
already pre-compiles + caches per-profile, so a per-call matcher cache
on top would duplicate effort.

Fork-only tamper detection wiring restored on the new
ContainerProfileCacheImpl struct (tamperAlertExporter + tamperEmitted
fields). cmd/main.go updated to the new NewRulesWatcher signature.
Storage pinned to k8sstormcenter/storage@b23d85f0 (merge/upstream-profile-rearch
of the storage fork, which carries our networkmatch + IPAddresses
schema + upstream's clean-standalone-pods PR).

Wildcard behavior recovery (re-routing nn.* CEL functions through
networkmatch on the projected Patterns surface) lands as the next
commit; this one is the mechanical merge landing point.
…ction

Upstream's projection-v1 (PR kubescape#799) explicitly dropped two pieces of
behaviour that the fork's earlier wildcard work relied on:

  1. Network-surface Patterns (CIDRs, '*' sentinels, DNS leading-/mid-/
     trailing-wildcards) were never populated because projectField only
     routed entries to Patterns on path surfaces.
  2. wasExecutedWithArgs() degraded to path-only matching — 'args list
     is validated but not matched against', with a comment that
     ExecArgsByPath is 'future work'.

This commit re-introduces both, layered cleanly on top of the
projection rather than working around it:

Network wildcards (spec §5.7, §5.8)
- projectField: third parameter is now an isDynamic classifier rather
  than the isPathSurface bool. Path surfaces pass containsDynamicSegment;
  network surfaces pass isNetworkIPWildcard or isNetworkDNSWildcard
  (each fork-defined here).
- extractEgressAddresses / extractIngressAddresses now also pull the
  v0.0.2 IPAddresses[] list-form alongside the deprecated singular
  IPAddress (storage's networkmatch wildcards land in Patterns).
- CEL helpers (nn.was_address_in_*, nn.is_domain_in_*) now consult
  Values, All, and Patterns via networkmatch.MatchIP / MatchDNS.
- matchIPField canonicalises observed IPs (net.ParseIP) so IPv6 expanded
  forms and ::ffff: IPv4-mapped addresses hit the Values fast path.
- matchDNSField normalises trailing dots on observed and tries both
  forms against Values.

Exec-args matching restored
- ProjectedContainerProfile gains ExecsByPath map[string][]string —
  the per-Path Args slice from cp.Spec.Execs.
- extractExecsByPath populates it in projection_apply.
- wasExecutedWithArgs runs dynamicpathdetector.CompareExecArgs against
  the matched profile entry. Back-compat: a path with no ExecsByPath
  entry matches with no argv constraint (preserves old wasExecuted-
  equivalent behaviour for partial profiles).

Mock parity
- RuleObjectCacheMock.GetProjectedContainerProfile now routes the same
  classifications (network wildcards → Patterns, path dynamics →
  Patterns, ExecsByPath populated). Tests no longer need a real cache.
- ensureProjectedAllInit no longer mis-sets All=true (that's the
  match-any sentinel, not a comprehensiveness hint).

Tamper detection survives
- tamperAlertExporter + tamperEmitted fields re-added to the new
  ContainerProfileCacheImpl so the R1016 wiring keeps working.
- exporters import added.

Storage pin: k8sstormcenter/storage @ b23d85f0 (merge/upstream-
profile-rearch, which carries networkmatch + IPAddresses schema +
upstream's clean-standalone-pods).

Known port/protocol regression (degrade-noted in tests):
- was_address_port_protocol_in_egress / _in_ingress still degrade to
  address-only — port/protocol granularity needs an AddressPortsByAddr
  projection field which upstream noted as future work. Updated unit
  tests document the degradation; the only production rules that would
  exercise this didn't use the helper anyway.

Full suite: 46/48 packages green. 2 failing (containerwatcher/v2/tracers,
validator) are pre-existing eBPF kernel-privilege issues that reproduce
on main without root.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 13, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough
📝 Walkthrough
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch merge/upstream-profile-rearch

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/component-tests.yaml:
- Line 114: Update the GitHub Actions step that references the setup action:
replace the usage of actions/setup-go@v4 with actions/setup-go@v5 to adopt the
recommended version; locate the workflow step that currently uses the symbol
"actions/setup-go@v4" and change it to "actions/setup-go@v5", ensuring no other
changes to the step configuration are required and the Go version inputs remain
intact.

In `@cmd/main.go`:
- Around line 398-401: When calling config.LoadServiceURLs in main (the block
that sets failureReporter via sbommanagerv1.NewHTTPSbomFailureReporter), ensure
failures are logged: if svcErr != nil log a warning/error with svcErr.Error()
and contextual message that service URL load failed, and if svcErr == nil but
services.GetReportReceiverHttpUrl() == "" log a warning that SBOM failure
reporting is disabled due to empty report receiver URL; keep the existing
successful-case log for enabled reporting. This ensures both LoadServiceURLs
errors and empty URL cases are explicitly logged with context.

In `@pkg/objectcache/containerprofilecache/projection_apply.go`:
- Around line 217-228: extractExecsByPath currently aliases
cp.Spec.Execs[*].Args into the returned map which permits external mutation of
the original ContainerProfile; to fix, ensure you clone each ExecCall.Args slice
before assigning into the map inside extractExecsByPath (detect nil vs empty
slice properly), i.e. for each e in cp.Spec.Execs copy the contents of e.Args
into a new slice and store that under m[e.Path] so mutations of the projected
map do not affect cp.Spec.Execs.

In `@pkg/rulebindingmanager/cache/cache.go`:
- Around line 200-202: The loop that does a blocking send to each notifier
(iterating over notifiers and doing *n <-
rulebindingmanager.RuleBindingNotify{}) can hang RefreshRuleBindingsRules() if a
receiver is slow; change the fan-out to perform non-blocking sends or bounded
enqueue with drop/coalesce. Replace the direct blocking send in the loop with a
select that attempts to send on *n and falls back to either a default branch
(drop the event) or a timed case (e.g., use a short timeout) so slow/unread
channels do not block; alternatively ensure notifier channels are buffered and
coalesce by only sending when the channel is empty to avoid queuing duplicates.
Target the notifier send in the loop over notifiers and update the send strategy
accordingly.

In `@pkg/rulemanager/cel/libraries/applicationprofile/open_test.go`:
- Around line 203-208: The test in open_test.go weakens the
ap.was_path_opened_with_flags contract by expecting true when flags ("O_WRONLY")
don't match the recorded event; update the test case (the one with name "Path
matches but flags don't match" / containerID "test-container-id" / path
"/etc/passwd") to set expectedResult to false so the
ap.was_path_opened_with_flags helper continues to require flag equality, or
alternatively rename/deprecate the helper to a path-only name if you
intentionally want path-only semantics; ensure the change is made in the test
case for ap.was_path_opened_with_flags so behavior remains consistent.

In `@pkg/rulemanager/cel/libraries/cache/function_cache.go`:
- Around line 87-91: The code currently assumes values[0] is non-nil before
calling values[0].Value(), which can panic; update the guard in the function
that uses the values slice (the branch that checks "if len(values) == 0 || oc ==
nil") to also verify values[0] != nil (and ideally values[0].Value() != nil)
before calling Value(); if nil, return "" same as when the slice is empty, then
proceed with the type assertion to containerIDStr only after this nil-check to
avoid panics during rule evaluation.

In `@pkg/rulemanager/cel/libraries/networkneighborhood/network.go`:
- Around line 43-45: The code currently treats ProjectedField.All as an
unconditional match by returning true for any field that has field.All set;
remove that unconditional return and instead treat ProjectedField.All as a
projection/retention mode (i.e., do not implicitly match unknown IPs/DNS) —
change the branch in the function containing "if field.All { return true }" so
it either skips special-casing (continue evaluation) or only matches when the
profile explicitly contains the value, and implement proper wildcard handling
via an explicit wildcard flag (e.g., ProjectedField.Wildcard) if wildcard
semantics are required; apply the same change to the other occurrence around the
second block (lines 66-68).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 79cb8537-b2c8-4c97-9adf-2009c1d5b532

📥 Commits

Reviewing files that changed from the base of the PR and between a3fdc33 and d714d21.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (79)
  • .github/workflows/component-tests.yaml
  • .gitignore
  • README.md
  • cmd/main.go
  • go.mod
  • pkg/config/config.go
  • pkg/containerprofilemanager/v1/event_reporting.go
  • pkg/containerprofilemanager/v1/event_reporting_test.go
  • pkg/metricsmanager/metrics_manager_interface.go
  • pkg/metricsmanager/metrics_manager_mock.go
  • pkg/metricsmanager/metrics_manager_noop.go
  • pkg/metricsmanager/prometheus/prometheus.go
  • pkg/objectcache/containerprofilecache/containerprofilecache.go
  • pkg/objectcache/containerprofilecache/containerprofilecache_test.go
  • pkg/objectcache/containerprofilecache/init_eviction_test.go
  • pkg/objectcache/containerprofilecache/lock_stress_test.go
  • pkg/objectcache/containerprofilecache/projection_apply.go
  • pkg/objectcache/containerprofilecache/projection_apply_test.go
  • pkg/objectcache/containerprofilecache/projection_compile.go
  • pkg/objectcache/containerprofilecache/projection_compile_test.go
  • pkg/objectcache/containerprofilecache/projection_trie.go
  • pkg/objectcache/containerprofilecache/projection_trie_test.go
  • pkg/objectcache/containerprofilecache/reconciler.go
  • pkg/objectcache/containerprofilecache/reconciler_test.go
  • pkg/objectcache/containerprofilecache/shared_pointer_race_test.go
  • pkg/objectcache/containerprofilecache/t8_overlay_refresh_test.go
  • pkg/objectcache/containerprofilecache_interface.go
  • pkg/objectcache/projection_types.go
  • pkg/objectcache/v1/mock.go
  • pkg/rulebindingmanager/cache/cache.go
  • pkg/rulemanager/cel/cel.go
  • pkg/rulemanager/cel/libraries/applicationprofile/ap.go
  • pkg/rulemanager/cel/libraries/applicationprofile/capability.go
  • pkg/rulemanager/cel/libraries/applicationprofile/exec.go
  • pkg/rulemanager/cel/libraries/applicationprofile/exec_test.go
  • pkg/rulemanager/cel/libraries/applicationprofile/http.go
  • pkg/rulemanager/cel/libraries/applicationprofile/open.go
  • pkg/rulemanager/cel/libraries/applicationprofile/open_test.go
  • pkg/rulemanager/cel/libraries/applicationprofile/syscall.go
  • pkg/rulemanager/cel/libraries/cache/function_cache.go
  • pkg/rulemanager/cel/libraries/networkneighborhood/fixtures_test.go
  • pkg/rulemanager/cel/libraries/networkneighborhood/integration_test.go
  • pkg/rulemanager/cel/libraries/networkneighborhood/network.go
  • pkg/rulemanager/cel/libraries/networkneighborhood/network_test.go
  • pkg/rulemanager/cel/libraries/networkneighborhood/nn.go
  • pkg/rulemanager/cel/libraries/networkneighborhood/wildcard_test.go
  • pkg/rulemanager/profilehelper/profilehelper.go
  • pkg/rulemanager/rule_manager.go
  • pkg/rulemanager/rulepolicy.go
  • pkg/rulemanager/types/v1/profiledata.go
  • pkg/rulemanager/types/v1/profiledata_test.go
  • pkg/rulemanager/types/v1/types.go
  • tests/chart/crds/rules.crd.yaml
  • tests/chart/templates/node-agent/configmap.yaml
  • tests/chart/templates/node-agent/default-rules.yaml
  • tests/chart/values.yaml
  • tests/component_test.go
  • tests/resources/malicious-job.yaml
  • tests/resources/network-wildcards/01-literal-ipv4.yaml
  • tests/resources/network-wildcards/02-literal-ipv6.yaml
  • tests/resources/network-wildcards/03-cidr-ipv4.yaml
  • tests/resources/network-wildcards/04-cidr-ipv6.yaml
  • tests/resources/network-wildcards/05-any-ip-sentinel.yaml
  • tests/resources/network-wildcards/06-any-as-cidr.yaml
  • tests/resources/network-wildcards/07-mixed-ip-list.yaml
  • tests/resources/network-wildcards/08-deprecated-ipaddress.yaml
  • tests/resources/network-wildcards/09-dns-literal.yaml
  • tests/resources/network-wildcards/10-dns-leading-wildcard.yaml
  • tests/resources/network-wildcards/11-dns-mid-ellipsis.yaml
  • tests/resources/network-wildcards/12-dns-trailing-star.yaml
  • tests/resources/network-wildcards/13-dns-trailing-dot-normalisation.yaml
  • tests/resources/network-wildcards/14-recursive-star-rejected.yaml
  • tests/resources/network-wildcards/15-egress-and-ingress.yaml
  • tests/resources/network-wildcards/16-egress-none.yaml
  • tests/resources/network-wildcards/17-realistic-stripe-api.yaml
  • tests/resources/network-wildcards/18-cluster-dns-via-mid-ellipsis.yaml
  • tests/resources/network-wildcards/19-port-protocol-with-cidr.yaml
  • tests/resources/network-wildcards/20-multi-container-mixed-wildcards.yaml
  • tests/resources/network-wildcards/README.md

Comment thread .github/workflows/component-tests.yaml Outdated
Comment thread cmd/main.go Outdated
Comment thread pkg/objectcache/containerprofilecache/projection_apply.go
Comment thread pkg/rulebindingmanager/cache/cache.go
Comment thread pkg/rulemanager/cel/libraries/applicationprofile/open_test.go Outdated
Comment thread pkg/rulemanager/cel/libraries/cache/function_cache.go Outdated
Comment thread pkg/rulemanager/cel/libraries/networkneighborhood/network.go Outdated
Two findings on code introduced by the wildcards recovery commit; the
other 5 findings touch upstream code I didn't modify in this rebase
and are out of scope.

CRITICAL — drop field.All short-circuit in matchIPField/matchDNSField:

  ProjectedField.All is the producer-side flag set by projectField when
  no rule declared profileDataRequired for the surface (pass-through
  retention mode). In that mode projectField already populates Values
  with every raw entry, so the Values lookup catches the match.
  Treating All=true as a 'match any input' sentinel in the consumer
  would let an unknown IP/DNS match even when absent from the profile
  — a false-positive admission bug.

  Removed both All short-circuits. Values + Patterns lookups cover the
  semantic correctly: pass-through projects everything into Values;
  rule-declared mode filters Values to the declared subset and routes
  wildcards to Patterns. Either way, an unknown observation falls
  through to false.

MAJOR — clone Args slice in extractExecsByPath:

  Apply() is contract-bound to be a pure transform of the source
  profile. extractExecsByPath was aliasing cp.Spec.Execs[i].Args into
  the projected map, so a consumer mutating the projected slice could
  silently corrupt the underlying CRD pointer. Cloned via copy() so
  Apply stays observably pure.

Skipped (upstream code not touched in this rebase):
  - actions/setup-go@v4 in component-tests.yaml
  - Silent SBOM URL error in cmd/main.go
  - Blocking channel send in rulebindingmanager/cache.go
  - Pre-existing was_path_opened_with_flags behaviour in open_test.go
  - Nil CEL arg guard in function_cache.go
@github-actions
Copy link
Copy Markdown

Performance Benchmark Results

Node-Agent Resource Usage
Metric BEFORE AFTER Delta
Avg CPU (cores) 0.000 0.000 N/A
Peak CPU (cores) 0.000 0.000 N/A
Avg Memory (MiB) 0.000 0.000 N/A
Peak Memory (MiB) 0.000 0.000 N/A
Dedup Effectiveness

No data available.

@entlein
Copy link
Copy Markdown
Author

entlein commented May 13, 2026

@coderabbitai resolve

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 13, 2026

✅ Actions performed

Comments resolved and changes approved.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 15, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/rulemanager/cel/libraries/applicationprofile/ap.go (1)

51-304: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Factor the overload scaffolding into one helper.

The metric gate, profile-hash cache scoping, and ConvertProfileNotAvailableErrToBool path are now duplicated in every binding. Centralizing that flow will make it much harder for future helpers to miss one of the three behaviors.

♻️ Sketch of the extraction
+func (l *apLibrary) bindCachedBoolHelper(
+	overloadName string,
+	metricName string,
+	argTypes []*cel.Type,
+	fn func(...ref.Val) ref.Val,
+) cel.FunctionOpt {
+	expectedArgs := len(argTypes)
+
+	return cel.Overload(
+		overloadName, argTypes, cel.BoolType,
+		cel.FunctionBinding(func(values ...ref.Val) ref.Val {
+			if len(values) != expectedArgs {
+				return types.NewErr("expected %d arguments, got %d", expectedArgs, len(values))
+			}
+			if l.detailedMetrics && l.metrics != nil {
+				l.metrics.IncHelperCall(metricName)
+			}
+			cachedFunc := l.functionCache.WithCache(fn, metricName, cache.HashForContainerProfile(l.objectCache))
+			return cache.ConvertProfileNotAvailableErrToBool(cachedFunc(values...), false)
+		}),
+	)
+}
+
 func (l *apLibrary) Declarations() map[string][]cel.FunctionOpt {
 	return map[string][]cel.FunctionOpt{
 		"ap.was_executed": {
-			cel.Overload(
-				"ap_was_executed", []*cel.Type{cel.StringType, cel.StringType}, cel.BoolType,
-				cel.FunctionBinding(func(values ...ref.Val) ref.Val {
-					...
-				}),
-			),
+			l.bindCachedBoolHelper(
+				"ap_was_executed",
+				"ap.was_executed",
+				[]*cel.Type{cel.StringType, cel.StringType},
+				func(args ...ref.Val) ref.Val { return l.wasExecuted(args[0], args[1]) },
+			),
 		},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/rulemanager/cel/libraries/applicationprofile/ap.go` around lines 51 -
304, The Declarations() overload bindings duplicate the same scaffold (metrics
gate l.detailedMetrics & l.metrics.IncHelperCall, wrapping the domain call,
using l.functionCache.WithCache(...,
cache.HashForContainerProfile(l.objectCache)), and post-processing with
cache.ConvertProfileNotAvailableErrToBool), so extract a helper (e.g.,
buildBoolOverload or makeAPBoolBinding) that accepts the overload name, expected
arity/types, the domain call wrapper (like l.wasExecuted, l.wasExecutedWithArgs,
l.wasPathOpened, l.wasPathOpenedWithSuffix, l.wasPathOpenedWithPrefix,
l.wasSyscallUsed, l.wasCapabilityUsed, l.wasEndpointAccessed,
l.wasEndpointAccessedWithMethod, l.wasEndpointAccessedWithMethods,
l.wasEndpointAccessedWithPrefix, l.wasEndpointAccessedWithSuffix,
l.wasHostAccessed), and the metric key; have the helper perform the arg-count
check, metrics increment, create wrapperFunc calling the provided domain
function, call l.functionCache.WithCache(wrapperFunc, metricKey,
cache.HashForContainerProfile(l.objectCache)), invoke the cached func with
values..., and return cache.ConvertProfileNotAvailableErrToBool(result, false);
then replace each inline cel.FunctionBinding in Declarations() with a call to
that helper to remove duplication.
♻️ Duplicate comments (1)
tests/component_test.go (1)

3537-3605: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast on ExecIntoPod errors in Test_32 subtests.

These subtests still log err but continue. If exec fails, both R0001 and R0040 may be absent, producing false passes for “must stay silent” cases.

✅ Minimal fix
 t.Run("sh_dash_c_matches_wildcard_trailing", func(t *testing.T) {
   wl := setup(t)
   stdout, stderr, err := wl.ExecIntoPod([]string{"sh", "-c", "echo hi"}, "curl")
   t.Logf("sh -c 'echo hi' → err=%v stdout=%q stderr=%q", err, stdout, stderr)
+  require.NoError(t, err, "exec must succeed before asserting R0001/R0040 behavior")
@@
 t.Run("sh_dash_x_mismatches_R0040", func(t *testing.T) {
@@
   t.Logf("sh -x 'echo hi' → err=%v stdout=%q stderr=%q", err, stdout, stderr)
+  require.NoError(t, err, "exec must succeed before asserting R0001/R0040 behavior")
@@
 t.Run("echo_hello_matches_wildcard_trailing", func(t *testing.T) {
@@
   t.Logf("echo hello world from test → err=%v stdout=%q stderr=%q", err, stdout, stderr)
+  require.NoError(t, err, "exec must succeed before asserting R0001/R0040 behavior")
@@
 t.Run("echo_goodbye_mismatches_R0040", func(t *testing.T) {
@@
   t.Logf("echo goodbye world → err=%v stdout=%q stderr=%q", err, stdout, stderr)
+  require.NoError(t, err, "exec must succeed before asserting R0001/R0040 behavior")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/component_test.go` around lines 3537 - 3605, The subtests call
wl.ExecIntoPod(...) and log err but continue, which can hide execution failures;
update each Test_32 subtest (e.g., the blocks calling ExecIntoPod in
"sh_dash_c_matches_wildcard_trailing", "sh_dash_x_mismatches_R0040",
"echo_hello_matches_wildcard_trailing", "echo_goodbye_mismatches_R0040") to fail
fast when ExecIntoPod returns an error by asserting success immediately (use
require.NoError(t, err) or t.Fatalf with a clear message) right after the
ExecIntoPod call before calling waitAlerts/assertR0001Silent/countByRule so
tests don't produce false passes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/rulebindingmanager/cache/cache.go`:
- Around line 128-176: Handlers AddHandler, ModifyHandler, and DeleteHandler are
calling addRuleBinding, modifiedRuleBinding, and deleteRuleBinding (and their
downstream helpers) without passing the watcher ctx, allowing long-running k8s
work to continue even after context cancellation while c.mutex is held; fix by
threading the supplied ctx into those calls (change signatures of
addRuleBinding, modifiedRuleBinding, deleteRuleBinding and any internal helpers
that currently use context.Background() to accept a context) and replace
internal uses of context.Background() with the propagated ctx so cancellation
unblocks work and avoids holding c.mutex during canceled operations.

---

Outside diff comments:
In `@pkg/rulemanager/cel/libraries/applicationprofile/ap.go`:
- Around line 51-304: The Declarations() overload bindings duplicate the same
scaffold (metrics gate l.detailedMetrics & l.metrics.IncHelperCall, wrapping the
domain call, using l.functionCache.WithCache(...,
cache.HashForContainerProfile(l.objectCache)), and post-processing with
cache.ConvertProfileNotAvailableErrToBool), so extract a helper (e.g.,
buildBoolOverload or makeAPBoolBinding) that accepts the overload name, expected
arity/types, the domain call wrapper (like l.wasExecuted, l.wasExecutedWithArgs,
l.wasPathOpened, l.wasPathOpenedWithSuffix, l.wasPathOpenedWithPrefix,
l.wasSyscallUsed, l.wasCapabilityUsed, l.wasEndpointAccessed,
l.wasEndpointAccessedWithMethod, l.wasEndpointAccessedWithMethods,
l.wasEndpointAccessedWithPrefix, l.wasEndpointAccessedWithSuffix,
l.wasHostAccessed), and the metric key; have the helper perform the arg-count
check, metrics increment, create wrapperFunc calling the provided domain
function, call l.functionCache.WithCache(wrapperFunc, metricKey,
cache.HashForContainerProfile(l.objectCache)), invoke the cached func with
values..., and return cache.ConvertProfileNotAvailableErrToBool(result, false);
then replace each inline cel.FunctionBinding in Declarations() with a call to
that helper to remove duplication.

---

Duplicate comments:
In `@tests/component_test.go`:
- Around line 3537-3605: The subtests call wl.ExecIntoPod(...) and log err but
continue, which can hide execution failures; update each Test_32 subtest (e.g.,
the blocks calling ExecIntoPod in "sh_dash_c_matches_wildcard_trailing",
"sh_dash_x_mismatches_R0040", "echo_hello_matches_wildcard_trailing",
"echo_goodbye_mismatches_R0040") to fail fast when ExecIntoPod returns an error
by asserting success immediately (use require.NoError(t, err) or t.Fatalf with a
clear message) right after the ExecIntoPod call before calling
waitAlerts/assertR0001Silent/countByRule so tests don't produce false passes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 1730d651-bd2c-4ce6-8f21-6415cbe798c4

📥 Commits

Reviewing files that changed from the base of the PR and between 656deb5 and f3e3d20.

📒 Files selected for processing (9)
  • pkg/objectcache/containerprofilecache/projection.go
  • pkg/objectcache/containerprofilecache/test32_projection_test.go
  • pkg/rulebindingmanager/cache/cache.go
  • pkg/rulebindingmanager/cache/cache_test.go
  • pkg/rulemanager/cel/libraries/applicationprofile/ap.go
  • pkg/rulemanager/cel/libraries/applicationprofile/integration_test.go
  • pkg/rulemanager/cel/libraries/applicationprofile/open.go
  • pkg/rulemanager/cel/libraries/applicationprofile/open_test.go
  • tests/component_test.go

Comment thread pkg/rulebindingmanager/cache/cache.go
@github-actions
Copy link
Copy Markdown

Performance Benchmark Results

Node-Agent Resource Usage
Metric BEFORE AFTER Delta
Avg CPU (cores) 0.226 0.222 -1.8%
Peak CPU (cores) 0.238 0.233 -2.1%
Avg Memory (MiB) 336.288 269.195 -20.0%
Peak Memory (MiB) 338.117 270.387 -20.0%
Dedup Effectiveness (AFTER only)
Event Type Passed Deduped Ratio
capabilities 0 0 N/A
hardlink 6000 0 0.0%
http 1762 119391 98.5%
network 903 77994 98.9%
open 29849 621788 95.4%
symlink 6000 0 0.0%
syscall 972 1878 65.9%
Event Counters
Metric BEFORE AFTER
capability_counter 9 9
dns_counter 1401 1437
exec_counter 7266 7188
network_counter 95068 94536
open_counter 789901 781827
syscall_counter 3520 3586

addRuleBinding's K8s List calls used context.Background(), which
leaked goroutines past the watcher's context cancellation. Thread
ctx from AddHandler/ModifyHandler/DeleteHandler through the helpers
into the namespaces.List + pods.List calls.

deleteRuleBinding takes ctx for signature parity and to future-proof
against the helper growing K8s API calls. modifiedRuleBinding
propagates ctx into its delete+add calls.

Adds TestRBCacheHelpers_CtxFirstArg as a compile-time + runtime
regression guard on the ctx-first-arg contract.
@entlein
Copy link
Copy Markdown
Author

entlein commented May 15, 2026

@coderabbitai review

Addresses the major finding on cache.go:176: ctx is now propagated through addRuleBinding / deleteRuleBinding / modifiedRuleBinding so the watcher's cancellation signal reaches the K8s List calls. New TestRBCacheHelpers_CtxFirstArg pins the contract.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 15, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the ctx propagation in cache.go
rg -n "ctx" pkg/rulebindingmanager/cache/cache.go -A2 -B2 | head -80

echo "---"
# Check the new test
rg -n "TestRBCacheHelpers_CtxFirstArg" pkg/rulebindingmanager/cache/ -A 30

Length of output: 7321


@entlein: The ctx propagation fix looks solid. Here's a quick verification summary:

pkg/rulebindingmanager/cache/cache.go

  • addRuleBinding(ctx context.Context, ...), modifiedRuleBinding(ctx context.Context, ...), and deleteRuleBinding(ctx context.Context, ...) all now carry the watcher's ctx as their first argument.
  • The two K8s API calls that previously would have used context.Background() now use the propagated ctx:
    • Line 270: Namespaces().List(ctx, ...)
    • Line 285: Pods(...).List(ctx, ...)
  • AddHandler, ModifyHandler, and DeleteHandler all correctly thread their ctx down to the respective helpers.

pkg/rulebindingmanager/cache/cache_test.goTestRBCacheHelpers_CtxFirstArg

  • Typed function variable assignments (var addFn func(context.Context, ...) ...) provide a compile-time guard — signature drift causes a build failure immediately.
  • The reflect check adds a runtime assertion with a diagnostic that explicitly references the original finding (PR #43 / cache.go:176).

This is a robust approach to preventing regression. Triggering a full review of the PR now.

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

♻️ Duplicate comments (1)
tests/component_test.go (1)

3546-3548: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast if ExecIntoPod fails before asserting R0001/R0040 outcomes.

Line 3546, Line 3565, Line 3583, and Line 3602 assert alert behavior without first requiring success of the preceding exec call. In the “must stay silent” cases, a failed exec can incorrectly pass the subtest (no alerts observed). Add require.NoError(t, err) immediately after each ExecIntoPod(...) in these subtests.

Suggested minimal fix
@@
 		stdout, stderr, err := wl.ExecIntoPod([]string{"sh", "-c", "echo hi"}, "curl")
 		t.Logf("sh -c 'echo hi' → err=%v stdout=%q stderr=%q", err, stdout, stderr)
+		require.NoError(t, err)
@@
 		stdout, stderr, err := wl.ExecIntoPod([]string{"sh", "-x", "echo hi"}, "curl")
 		t.Logf("sh -x 'echo hi' → err=%v stdout=%q stderr=%q", err, stdout, stderr)
+		require.NoError(t, err)
@@
 		stdout, stderr, err := wl.ExecIntoPod([]string{"echo", "hello", "world", "from", "test"}, "curl")
 		t.Logf("echo hello world from test → err=%v stdout=%q stderr=%q", err, stdout, stderr)
+		require.NoError(t, err)
@@
 		stdout, stderr, err := wl.ExecIntoPod([]string{"echo", "goodbye", "world"}, "curl")
 		t.Logf("echo goodbye world → err=%v stdout=%q stderr=%q", err, stdout, stderr)
+		require.NoError(t, err)

Also applies to: 3565-3567, 3583-3585, 3602-3604

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/component_test.go` around lines 3546 - 3548, Tests are asserting alert
outcomes without verifying ExecIntoPod succeeded; update each affected subtest
to fail-fast by adding require.NoError(t, err) immediately after the
ExecIntoPod(...) call (before calling assertR0001Silent, assert.Equal,
countByRule, etc.), so references like ExecIntoPod, assertR0001Silent,
countByRule and assert.Equal only run when err is nil; apply this change to the
subtests that currently call ExecIntoPod and then assert R0001/R0040 outcomes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/objectcache/containerprofilecache/containerprofilecache.go`:
- Around line 617-636: Update the comment in
ContainerProfileCacheImpl.SetProjectionSpec to describe the nudge channel's
coalescing semantics: explain that c.nudge is a buffered channel of capacity 1
and the non-blocking send intentionally drops extra signals because the
reconciler will always observe the latest spec (c.currentSpec) on its next
reconciliation tick, making multiple nudges idempotent/safe; mention that
specGeneration and optional metrics (metricsManager.IncProjectionSpecHashChange)
still record each change even if the channel send is coalesced so state and
observability remain correct.

In `@pkg/rulemanager/cel/libraries/applicationprofile/exec.go`:
- Around line 93-118: Exact-path entries in cp.Execs.Values must short-circuit
decision: if pathStr is present and cp.ExecsByPath[pathStr] exists but
dynamicpathdetector.CompareExecArgs(profileArgs, runtimeArgs) returns false,
return types.Bool(false) immediately instead of falling through to
cp.Execs.Patterns; update the block that checks cp.Execs.Values to return false
on non-matching args (while preserving the existing behavior that a missing
cp.ExecsByPath entry returns true), so that exact-match logic in the function
(referencing cp.Execs.Values, cp.ExecsByPath,
dynamicpathdetector.CompareExecArgs, cp.Execs.Patterns and
dynamicpathdetector.CompareDynamic) always takes precedence over pattern
matching.

In `@pkg/rulemanager/cel/libraries/applicationprofile/http.go`:
- Around line 74-84: The current path-only loop in the projection (iterating
cp.Endpoints.Values and cp.Endpoints.Patterns and returning types.Bool(true) on
dynamicpathdetector.CompareDynamic) makes
ap.was_endpoint_accessed_with_method(s) ignore HTTP verbs; change the logic to
perform verb-aware matching: after a path match, consult the endpoint->methods
mapping (e.g., the method map associated with cp.Endpoints or whatever structure
stores methods for each path) to verify the requested method is present before
returning true; if the method map is not yet implemented or the method is not
found, return a conservative non-match (types.Bool(false>) or the established
error path rather than returning true. Apply the same change to the other
similar block that mirrors this logic (the other ap.was_endpoint_* projection
code).

In `@pkg/rulemanager/cel/libraries/cache/function_cache.go`:
- Around line 115-117: The loop over extraKeyFn can panic if a nil callback is
present; update the loop that constructs key (the for _, fn := range extraKeyFn
block) to skip nil entries before invoking them (e.g., if fn == nil { continue
}) so you only call fn(values) when fn is non-nil, preserving the existing key
concatenation logic.

In `@pkg/rulemanager/cel/libraries/parse/parse.go`:
- Around line 69-77: The code currently returns an absolute argv[0] when
argv0[0]=='/' before checking exepathStr; change the precedence so that
exepathStr is treated as authoritative when non-empty: in the function handling
argv0/exepath (the code referencing argv0 and exepathStr and returning
types.String), first check if exepathStr != "" and return
types.String(exepathStr), and only if exepathStr is empty fall back to
validating argv0 (e.g., ensure len(argv0)>0 and argv0[0]=='/') before returning
types.String(argv0); update any related comments to reflect that event.exepath
is authoritative.

In `@pkg/rulemanager/profilehelper/profilehelper.go`:
- Around line 12-16: GetProjectedContainerProfile currently dereferences
objectCache by calling objectCache.ContainerProfileCache() without checking for
nil; add a nil-check for objectCache at the start of
GetProjectedContainerProfile and return an appropriate error (e.g., "nil object
cache") if it's nil before calling objectCache.ContainerProfileCache(), keeping
the existing error behavior for a nil ContainerProfileCache.

In `@README.md`:
- Around line 347-352: The README entry for "wasExecutedWithArgs(containerID,
path, args)" is stale: update the v1 Behaviour cell to state that it matches
both path and arguments (including wildcard-aware matching) instead of saying
args are ignored, and remove or replace the incorrect note claiming per-argument
matching (ExecArgsByPath) will be added in a future version; ensure the table
text explicitly references wasExecutedWithArgs and ExecArgsByPath so readers
know current argument-matching behavior is supported in v1.

In `@tests/resources/network-wildcards/19-port-protocol-with-cidr.yaml`:
- Around line 3-18: Update the fixture comment to reflect the current known
limitation that the matcher is address-only for the
was_address_port_protocol_in_* family (e.g.,
nn.was_address_port_protocol_in_egress) instead of enforcing full
(IP,port,protocol) tuples: change the expectations and explanatory text to
explicitly state “address-only matching — ports/protocols not currently
enforced” (or add a “Known limitation” note) so the documented behavior no
longer asserts port/protocol enforcement that the implementation does not
perform.

In `@tests/resources/network-wildcards/README.md`:
- Line 74: Replace the incorrect acronym "SBoBs" with "SBOMs" in the migration
note that reads "Producers writing v0.0.2-conformant SBoBs SHOULD use
`ipAddresses` (plural)." — update the string "SBoBs" to "SBOMs" so the text
correctly reads "Producers writing v0.0.2-conformant SBOMs SHOULD use
`ipAddresses` (plural)."

---

Duplicate comments:
In `@tests/component_test.go`:
- Around line 3546-3548: Tests are asserting alert outcomes without verifying
ExecIntoPod succeeded; update each affected subtest to fail-fast by adding
require.NoError(t, err) immediately after the ExecIntoPod(...) call (before
calling assertR0001Silent, assert.Equal, countByRule, etc.), so references like
ExecIntoPod, assertR0001Silent, countByRule and assert.Equal only run when err
is nil; apply this change to the subtests that currently call ExecIntoPod and
then assert R0001/R0040 outcomes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 8b2e3cd7-b6cb-415f-8545-50fc0d5278e3

📥 Commits

Reviewing files that changed from the base of the PR and between a3fdc33 and 8b34f59.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (86)
  • .github/workflows/component-tests.yaml
  • .gitignore
  • README.md
  • cmd/main.go
  • go.mod
  • pkg/config/config.go
  • pkg/containerprofilemanager/v1/event_reporting.go
  • pkg/containerprofilemanager/v1/event_reporting_test.go
  • pkg/exporters/alert_manager.go
  • pkg/metricsmanager/metrics_manager_interface.go
  • pkg/metricsmanager/metrics_manager_mock.go
  • pkg/metricsmanager/metrics_manager_noop.go
  • pkg/metricsmanager/prometheus/prometheus.go
  • pkg/objectcache/containerprofilecache/containerprofilecache.go
  • pkg/objectcache/containerprofilecache/containerprofilecache_test.go
  • pkg/objectcache/containerprofilecache/init_eviction_test.go
  • pkg/objectcache/containerprofilecache/lock_stress_test.go
  • pkg/objectcache/containerprofilecache/projection.go
  • pkg/objectcache/containerprofilecache/projection_apply.go
  • pkg/objectcache/containerprofilecache/projection_apply_test.go
  • pkg/objectcache/containerprofilecache/projection_compile.go
  • pkg/objectcache/containerprofilecache/projection_compile_test.go
  • pkg/objectcache/containerprofilecache/projection_trie.go
  • pkg/objectcache/containerprofilecache/projection_trie_test.go
  • pkg/objectcache/containerprofilecache/reconciler.go
  • pkg/objectcache/containerprofilecache/reconciler_test.go
  • pkg/objectcache/containerprofilecache/shared_pointer_race_test.go
  • pkg/objectcache/containerprofilecache/t8_overlay_refresh_test.go
  • pkg/objectcache/containerprofilecache/tamper_alert_test.go
  • pkg/objectcache/containerprofilecache/test32_projection_test.go
  • pkg/objectcache/containerprofilecache_interface.go
  • pkg/objectcache/projection_types.go
  • pkg/objectcache/v1/mock.go
  • pkg/rulebindingmanager/cache/cache.go
  • pkg/rulebindingmanager/cache/cache_test.go
  • pkg/rulemanager/cel/cel.go
  • pkg/rulemanager/cel/libraries/applicationprofile/ap.go
  • pkg/rulemanager/cel/libraries/applicationprofile/capability.go
  • pkg/rulemanager/cel/libraries/applicationprofile/exec.go
  • pkg/rulemanager/cel/libraries/applicationprofile/exec_test.go
  • pkg/rulemanager/cel/libraries/applicationprofile/http.go
  • pkg/rulemanager/cel/libraries/applicationprofile/integration_test.go
  • pkg/rulemanager/cel/libraries/applicationprofile/open.go
  • pkg/rulemanager/cel/libraries/applicationprofile/open_test.go
  • pkg/rulemanager/cel/libraries/applicationprofile/syscall.go
  • pkg/rulemanager/cel/libraries/cache/function_cache.go
  • pkg/rulemanager/cel/libraries/networkneighborhood/fixtures_test.go
  • pkg/rulemanager/cel/libraries/networkneighborhood/integration_test.go
  • pkg/rulemanager/cel/libraries/networkneighborhood/network.go
  • pkg/rulemanager/cel/libraries/networkneighborhood/network_test.go
  • pkg/rulemanager/cel/libraries/networkneighborhood/nn.go
  • pkg/rulemanager/cel/libraries/networkneighborhood/wildcard_test.go
  • pkg/rulemanager/cel/libraries/parse/parse.go
  • pkg/rulemanager/cel/libraries/parse/parselib.go
  • pkg/rulemanager/cel/libraries/parse/parsing_test.go
  • pkg/rulemanager/profilehelper/profilehelper.go
  • pkg/rulemanager/rule_manager.go
  • pkg/rulemanager/rulepolicy.go
  • pkg/rulemanager/types/v1/profiledata.go
  • pkg/rulemanager/types/v1/profiledata_test.go
  • pkg/rulemanager/types/v1/types.go
  • tests/chart/templates/node-agent/default-rule-binding.yaml
  • tests/chart/templates/node-agent/default-rules.yaml
  • tests/component_test.go
  • tests/resources/malicious-job.yaml
  • tests/resources/network-wildcards/01-literal-ipv4.yaml
  • tests/resources/network-wildcards/02-literal-ipv6.yaml
  • tests/resources/network-wildcards/03-cidr-ipv4.yaml
  • tests/resources/network-wildcards/04-cidr-ipv6.yaml
  • tests/resources/network-wildcards/05-any-ip-sentinel.yaml
  • tests/resources/network-wildcards/06-any-as-cidr.yaml
  • tests/resources/network-wildcards/07-mixed-ip-list.yaml
  • tests/resources/network-wildcards/08-deprecated-ipaddress.yaml
  • tests/resources/network-wildcards/09-dns-literal.yaml
  • tests/resources/network-wildcards/10-dns-leading-wildcard.yaml
  • tests/resources/network-wildcards/11-dns-mid-ellipsis.yaml
  • tests/resources/network-wildcards/12-dns-trailing-star.yaml
  • tests/resources/network-wildcards/13-dns-trailing-dot-normalisation.yaml
  • tests/resources/network-wildcards/14-recursive-star-rejected.yaml
  • tests/resources/network-wildcards/15-egress-and-ingress.yaml
  • tests/resources/network-wildcards/16-egress-none.yaml
  • tests/resources/network-wildcards/17-realistic-stripe-api.yaml
  • tests/resources/network-wildcards/18-cluster-dns-via-mid-ellipsis.yaml
  • tests/resources/network-wildcards/19-port-protocol-with-cidr.yaml
  • tests/resources/network-wildcards/20-multi-container-mixed-wildcards.yaml
  • tests/resources/network-wildcards/README.md

Comment thread pkg/objectcache/containerprofilecache/containerprofilecache.go
Comment on lines +93 to +118
// Exact path match: walk the profile's Args for that path via
// CompareExecArgs (handles ⋯ single-arg and * zero-or-more tokens).
if _, ok := cp.Execs.Values[pathStr]; ok {
if profileArgs, ok := cp.ExecsByPath[pathStr]; ok {
if dynamicpathdetector.CompareExecArgs(profileArgs, runtimeArgs) {
return types.Bool(true)
}
} else {
// No ExecsByPath entry for this path — back-compat: treat as
// "no argv constraint", match.
return types.Bool(true)
}
}
// Pattern path match: dynamic-segment paths in cp.Execs.Patterns.
// Args matching mirrors the exact-path case.
for _, execPath := range cp.Execs.Patterns {
if dynamicpathdetector.CompareDynamic(execPath, pathStr) {
if profileArgs, ok := cp.ExecsByPath[execPath]; ok {
if dynamicpathdetector.CompareExecArgs(profileArgs, runtimeArgs) {
return types.Bool(true)
}
} else {
return types.Bool(true)
}
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Exact path match should return false when args don't match, not fall through to pattern matching.

If pathStr exists in cp.Execs.Values and has a corresponding ExecsByPath entry with non-matching args, the function falls through to check cp.Execs.Patterns. This allows an exact path with strict args to be bypassed by a less-specific dynamic pattern, which seems incorrect—exact matches should take precedence.

Proposed fix
 	// Exact path match: walk the profile's Args for that path via
 	// CompareExecArgs (handles ⋯ single-arg and * zero-or-more tokens).
 	if _, ok := cp.Execs.Values[pathStr]; ok {
 		if profileArgs, ok := cp.ExecsByPath[pathStr]; ok {
 			if dynamicpathdetector.CompareExecArgs(profileArgs, runtimeArgs) {
 				return types.Bool(true)
 			}
+			// Exact path matched but args did not — do not fall through to
+			// pattern matching; return false after checking pod spec.
+			if l.isExecInPodSpec(containerID, path).Value().(bool) {
+				return types.Bool(true)
+			}
+			return types.Bool(false)
 		} else {
 			// No ExecsByPath entry for this path — back-compat: treat as
 			// "no argv constraint", match.
 			return types.Bool(true)
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/rulemanager/cel/libraries/applicationprofile/exec.go` around lines 93 -
118, Exact-path entries in cp.Execs.Values must short-circuit decision: if
pathStr is present and cp.ExecsByPath[pathStr] exists but
dynamicpathdetector.CompareExecArgs(profileArgs, runtimeArgs) returns false,
return types.Bool(false) immediately instead of falling through to
cp.Execs.Patterns; update the block that checks cp.Execs.Values to return false
on non-matching args (while preserving the existing behavior that a missing
cp.ExecsByPath entry returns true), so that exact-match logic in the function
(referencing cp.Execs.Values, cp.ExecsByPath,
dynamicpathdetector.CompareExecArgs, cp.Execs.Patterns and
dynamicpathdetector.CompareDynamic) always takes precedence over pattern
matching.

Comment on lines +74 to 84
// EndpointMethodsByPath is out of scope for v1 — check path membership only.
for ep := range cp.Endpoints.Values {
if dynamicpathdetector.CompareDynamic(ep, endpointStr) {
return types.Bool(true)
}
}
for _, ep := range cp.Endpoints.Patterns {
if dynamicpathdetector.CompareDynamic(ep, endpointStr) {
return types.Bool(true)
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Method-aware helpers currently ignore HTTP method semantics.

ap.was_endpoint_accessed_with_method and ap.was_endpoint_accessed_with_methods now return true on path match alone. This can hide method violations (e.g., POST queried, only GET observed), which weakens rule enforcement.

Please restore verb-aware matching in projection (e.g., method map per path). Until that exists, return a conservative non-match/error path instead of path-only success.

Also applies to: 113-123

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/rulemanager/cel/libraries/applicationprofile/http.go` around lines 74 -
84, The current path-only loop in the projection (iterating cp.Endpoints.Values
and cp.Endpoints.Patterns and returning types.Bool(true) on
dynamicpathdetector.CompareDynamic) makes
ap.was_endpoint_accessed_with_method(s) ignore HTTP verbs; change the logic to
perform verb-aware matching: after a path match, consult the endpoint->methods
mapping (e.g., the method map associated with cp.Endpoints or whatever structure
stores methods for each path) to verify the requested method is present before
returning true; if the method map is not yet implemented or the method is not
found, return a conservative non-match (types.Bool(false>) or the established
error path rather than returning true. Apply the same change to the other
similar block that mirrors this logic (the other ap.was_endpoint_* projection
code).

Comment on lines +115 to +117
for _, fn := range extraKeyFn {
key += "|" + fn(values)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Skip nil extraKeyFn callbacks before invocation.

A nil callback in extraKeyFn will panic when called.

Suggested patch
-		for _, fn := range extraKeyFn {
-			key += "|" + fn(values)
+		for _, keyFn := range extraKeyFn {
+			if keyFn == nil {
+				continue
+			}
+			key += "|" + keyFn(values)
 		}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for _, fn := range extraKeyFn {
key += "|" + fn(values)
}
for _, keyFn := range extraKeyFn {
if keyFn == nil {
continue
}
key += "|" + keyFn(values)
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/rulemanager/cel/libraries/cache/function_cache.go` around lines 115 -
117, The loop over extraKeyFn can panic if a nil callback is present; update the
loop that constructs key (the for _, fn := range extraKeyFn block) to skip nil
entries before invoking them (e.g., if fn == nil { continue }) so you only call
fn(values) when fn is non-nil, preserving the existing key concatenation logic.

Comment on lines +69 to +77
// Tier 1: absolute argv[0] wins. Symlink-faithful.
if len(argv0) > 0 && argv0[0] == '/' {
return types.String(argv0)
}
}

// Tier 2: kernel-authoritative exepath when argv[0] is bare/empty.
if exepathStr != "" {
return types.String(exepathStr)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Do not trust absolute argv[0] over exepath for identity.

Line 69–77 introduces a spoofing path: argv[0] is process-controlled, so an attacker can present an allowlisted absolute path while executing a different binary. Use event.exepath as authoritative when available.

Safer precedence sketch
-    // Tier 1: absolute argv[0] wins. Symlink-faithful.
-    if len(argv0) > 0 && argv0[0] == '/' {
-        return types.String(argv0)
-    }
+    // Prefer kernel-resolved executable path when present.
+    if exepathStr != "" {
+        return types.String(exepathStr)
+    }
+    // Fall back to argv[0] only when exepath is unavailable.
+    if len(argv0) > 0 && argv0[0] == '/' {
+        return types.String(argv0)
+    }
@@
-    // Tier 2: kernel-authoritative exepath when argv[0] is bare/empty.
-    if exepathStr != "" {
-        return types.String(exepathStr)
-    }
+    // exepath handled above.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/rulemanager/cel/libraries/parse/parse.go` around lines 69 - 77, The code
currently returns an absolute argv[0] when argv0[0]=='/' before checking
exepathStr; change the precedence so that exepathStr is treated as authoritative
when non-empty: in the function handling argv0/exepath (the code referencing
argv0 and exepathStr and returning types.String), first check if exepathStr !=
"" and return types.String(exepathStr), and only if exepathStr is empty fall
back to validating argv0 (e.g., ensure len(argv0)>0 and argv0[0]=='/') before
returning types.String(argv0); update any related comments to reflect that
event.exepath is authoritative.

Comment on lines +12 to 16
func GetProjectedContainerProfile(objectCache objectcache.ObjectCache, containerID string) (*objectcache.ProjectedContainerProfile, string, error) {
cpc := objectCache.ContainerProfileCache()
if cpc == nil {
return nil, "", errors.New("no container profile cache available")
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard nil object cache before dereference.

This helper calls objectCache.ContainerProfileCache() without checking whether objectCache is nil, which can panic instead of returning an error.

Suggested patch
 func GetProjectedContainerProfile(objectCache objectcache.ObjectCache, containerID string) (*objectcache.ProjectedContainerProfile, string, error) {
+	if objectCache == nil {
+		return nil, "", errors.New("object cache is nil")
+	}
 	cpc := objectCache.ContainerProfileCache()
 	if cpc == nil {
 		return nil, "", errors.New("no container profile cache available")
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func GetProjectedContainerProfile(objectCache objectcache.ObjectCache, containerID string) (*objectcache.ProjectedContainerProfile, string, error) {
cpc := objectCache.ContainerProfileCache()
if cpc == nil {
return nil, "", errors.New("no container profile cache available")
}
func GetProjectedContainerProfile(objectCache objectcache.ObjectCache, containerID string) (*objectcache.ProjectedContainerProfile, string, error) {
if objectCache == nil {
return nil, "", errors.New("object cache is nil")
}
cpc := objectCache.ContainerProfileCache()
if cpc == nil {
return nil, "", errors.New("no container profile cache available")
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/rulemanager/profilehelper/profilehelper.go` around lines 12 - 16,
GetProjectedContainerProfile currently dereferences objectCache by calling
objectCache.ContainerProfileCache() without checking for nil; add a nil-check
for objectCache at the start of GetProjectedContainerProfile and return an
appropriate error (e.g., "nil object cache") if it's nil before calling
objectCache.ContainerProfileCache(), keeping the existing error behavior for a
nil ContainerProfileCache.

Comment thread README.md
Comment on lines +347 to +352
### CEL Helper Limitations (v1)

| Helper | v1 Behaviour | Note |
|--------|-------------|------|
| `wasExecutedWithArgs(containerID, path, args)` | Equivalent to `wasExecuted(containerID, path)` — the `args` list is validated for type correctness but is **not** matched against the recorded argument list. Any execution of the given path returns `true` regardless of its arguments. | Full per-argument matching (`ExecArgsByPath`) will be added in a future version. |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix stale wasExecutedWithArgs behavior docs.

This table entry contradicts current behavior: argument matching is active (including wildcard-aware matching), not path-only. Keeping this text will mislead rule authors and can cause incorrect detections.

Suggested doc fix
-| `wasExecutedWithArgs(containerID, path, args)` | Equivalent to `wasExecuted(containerID, path)` — the `args` list is validated for type correctness but is **not** matched against the recorded argument list. Any execution of the given path returns `true` regardless of its arguments. | Full per-argument matching (`ExecArgsByPath`) will be added in a future version. |
+| `wasExecutedWithArgs(containerID, path, args)` | Matches `path` **and** execution arguments against the projected profile. Argument matching supports anchored literal args and wildcard tokens used by the matcher. | Prefer this helper when argv constraints matter; use `wasExecuted` for path-only checks. |
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
### CEL Helper Limitations (v1)
| Helper | v1 Behaviour | Note |
|--------|-------------|------|
| `wasExecutedWithArgs(containerID, path, args)` | Equivalent to `wasExecuted(containerID, path)` — the `args` list is validated for type correctness but is **not** matched against the recorded argument list. Any execution of the given path returns `true` regardless of its arguments. | Full per-argument matching (`ExecArgsByPath`) will be added in a future version. |
### CEL Helper Limitations (v1)
| Helper | v1 Behaviour | Note |
|--------|-------------|------|
| `wasExecutedWithArgs(containerID, path, args)` | Matches `path` **and** execution arguments against the projected profile. Argument matching supports anchored literal args and wildcard tokens used by the matcher. | Prefer this helper when argv constraints matter; use `wasExecuted` for path-only checks. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` around lines 347 - 352, The README entry for
"wasExecutedWithArgs(containerID, path, args)" is stale: update the v1 Behaviour
cell to state that it matches both path and arguments (including wildcard-aware
matching) instead of saying args are ignored, and remove or replace the
incorrect note claiming per-argument matching (ExecArgsByPath) will be added in
a future version; ensure the table text explicitly references
wasExecutedWithArgs and ExecArgsByPath so readers know current argument-matching
behavior is supported in v1.

Comment thread tests/resources/network-wildcards/19-port-protocol-with-cidr.yaml
Comment thread tests/resources/network-wildcards/README.md
@github-actions
Copy link
Copy Markdown

Performance Benchmark Results

Node-Agent Resource Usage
Metric BEFORE AFTER Delta
Avg CPU (cores) 0.215 0.199 -7.1%
Peak CPU (cores) 0.226 0.212 -6.4%
Avg Memory (MiB) 315.837 276.587 -12.4%
Peak Memory (MiB) 317.051 278.277 -12.2%
Dedup Effectiveness (AFTER only)
Event Type Passed Deduped Ratio
capabilities 2 0 0.0%
hardlink 6000 0 0.0%
http 1764 119396 98.5%
network 907 77993 98.8%
open 31492 620260 95.2%
symlink 6000 0 0.0%
syscall 989 1887 65.6%
Event Counters
Metric BEFORE AFTER
capability_counter 9 9
dns_counter 1450 1426
exec_counter 7252 7132
network_counter 95379 93807
open_counter 789184 777139
syscall_counter 3666 3476

Entlein added 5 commits May 15, 2026 19:19
The CRD's profileDataRequired schema was dropped during the
upstream-rebase merge. Without this field permitted, K8s rejects any
rule manifest that declares profileDataRequired, which silently
disables the rule-aware projection feature: every rule's per-field
requirement is lost and the runtime falls back to pass-through.

Restores the field exactly as defined upstream so subsequent rule-YAML
restorations are accepted by the API server.
ConfigMap was missing the profileProjection JSON object that wires
the runtime knobs (detailedMetricsEnabled, strictValidation) through
to pkg/config/config.go. values.yaml was missing the corresponding
defaults under nodeAgent.config.profileProjection.

Restores both so the projection feature receives its intended
configuration at startup rather than zero-value fallbacks.
Each rule's per-field requirement declares which slice of the
application profile / network neighborhood must be projected into the
runtime cache for that rule. With no declarations, the CompileSpec
loop in pkg/objectcache/containerprofilecache/projection_compile.go
unions to an empty RuleProjectionSpec; the pass-through fallback in
projection_apply.go covers most cases but breaks the moment any
rule declares a requirement.

Restores declarations for: R0001 execs, R0002 opens (sensitive
prefixes + dockerenv/environ exacts), R0003 syscalls all, R0004
capabilities all, R0005 egressDomains all, R0006 opens suffix /token,
R0007 egressAddresses+execs, R0008 opens suffix /environ, R0009
syscalls exact bpf, R0010 opens prefix /etc/shadow, R0011
egressAddresses all, R1001 execs all, R1003 egressAddresses all,
R1004 execs all, R1006 syscalls exact unshare, R1009 egressAddresses
all, R1010 opens prefix /etc/shadow + /etc/sudoers, R1011 opens
exact /etc/ld.so.preload, R1012 opens prefix /etc/shadow +
/etc/sudoers, R1030 syscalls all.
state.includePrefixes and state.ports declare event-shape filters
that constrain WHICH events the rule even sees, independent of the
profile-projection slices. Without them the rules evaluate against
many more events than intended:

  R0006 (SA token access): includePrefixes /run/secrets +
        /var/run/secrets — restricts to the K8s SA mount paths
  R0008 (env vars from procfs): includePrefixes /proc — restricts to
        proc-fs paths
  R1009 (crypto port comms): ports 3333, 45700 — restricts to the
        well-known crypto-miner port pair

Restores the filters at the same position upstream uses (between
ruleExpression and profileDependency).
…(Optional)

The merge against upstream/main flipped R1006 (Process tries to
escape container) from profileDependency: 1 to 2. profileDependency
2 means the rule fires independent of profile state; 1 means it
fires when profile is partial OR complete. The rule semantically
needs the profile (was_syscall_used queries it), so upstream's 1
is correct — 2 produces noise during pod startup before the AP
has loaded.

Reverts to match upstream R1006 declaration.
@entlein
Copy link
Copy Markdown
Author

entlein commented May 15, 2026

@coderabbitai review

Adds 5 chart-audit fixes that restore upstream's profile-projection feature surface (lost during the merge):

  • 1b8f4b8 restore profileDataRequired field in rules CRD schema
  • 6371181 restore profileProjection runtime config block (configmap + values.yaml)
  • 8a63e3c restore profileDataRequired on all 20 rules (matches upstream byte-for-byte)
  • 007d760 restore state-block filters on R0006, R0008, R1009
  • 2e2077b revert R1006 profileDependency from 2 (NotRequired) to 1 (Optional)

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 15, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect the 5 chart-audit fix commits
git log --oneline -10

echo "=== CRD schema change ==="
git show 1b8f4b84 --stat

echo "=== profileProjection config block ==="
git show 63711813 --stat

echo "=== profileDataRequired on rules ==="
git show 8a63e3c1 --stat

echo "=== state-block filters ==="
git show 007d760d --stat

echo "=== R1006 profileDependency revert ==="
git show 2e2077b9 --stat

Length of output: 5728


🏁 Script executed:

#!/bin/bash
# Check current state of key files affected by the chart-audit fixes

echo "=== Rules CRD schema (profileDataRequired field) ==="
fd -e yaml -e json "runtimerulealertbinding\|rulesets\|rules" tests/chart/crds/ --exec grep -l "profileDataRequired" {} 2>/dev/null || echo "not found, checking all crds..."
fd -e yaml -e json . tests/chart/crds/ 2>/dev/null | head -20

echo "=== default-rules.yaml profileDataRequired presence ==="
grep -c "profileDataRequired" tests/chart/templates/node-agent/default-rules.yaml 2>/dev/null || echo "not found"

echo "=== configmap profileProjection ==="
fd -e yaml . tests/chart/templates/ --exec grep -l "profileProjection" {} 2>/dev/null

echo "=== values.yaml profileProjection ==="
grep -A5 "profileProjection" tests/chart/values.yaml 2>/dev/null || echo "not found in values.yaml"

echo "=== R1006 profileDependency value ==="
grep -A3 "R1006\|profileDependency" tests/chart/templates/node-agent/default-rules.yaml 2>/dev/null | head -30

Length of output: 1072


🏁 Script executed:

#!/bin/bash
# Verify R1006 profileDependency specifically
echo "=== R1006 block in default-rules.yaml ==="
grep -n "R1006" tests/chart/templates/node-agent/default-rules.yaml -A 15 | head -30

echo ""
echo "=== configmap.yaml profileProjection wiring ==="
grep -n "profileProjection\|detailedMetrics\|strictValidation" tests/chart/templates/node-agent/configmap.yaml

echo ""
echo "=== CRD profileDataRequired schema snippet ==="
grep -n "profileDataRequired" tests/chart/crds/rules.crd.yaml -A 5

echo ""
echo "=== state-block filters on R0006/R0008/R1009 ==="
for rule in R0006 R0008 R1009; do
  echo "--- $rule ---"
  grep -n "$rule" tests/chart/templates/node-agent/default-rules.yaml -A 20 | grep -E "includePrefixes|ports|state|$rule" | head -10
done

Length of output: 1601


@entlein: All 5 chart-audit fixes verified and look correct. Here's a quick summary:

Commit File(s) Status
1b8f4b8 CRD schema rules.crd.yaml line 78 — profileDataRequired: type: object, x-kubernetes-preserve-unknown-fields: true
6371181 configmap + values configmap.yaml line 40 wires {{- .Values.nodeAgent.config.profileProjection | toJson }}; values.yaml has detailedMetricsEnabled: true, strictValidation: false
8a63e3c 20 rules declarations default-rules.yaml — exactly 20 profileDataRequired entries, one per rule
007d760 state-block filters R0006 state.includePrefixes (line 214), R0008 state.includePrefixes (line 272), R1009 state.ports (line 585)
2e2077b R1006 revert profileDependency: 1 (line 515) — correctly set to Optional, not NotRequired

One thing worth noting: the CRD uses x-kubernetes-preserve-unknown-fields: true for profileDataRequired rather than a typed sub-schema. This is pragmatic (avoids schema churn as the field evolves), but means the API server won't validate the inner structure at admission time — a follow-up to add a typed sub-schema may be worth tracking.

Triggering a full review now.

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
pkg/rulemanager/cel/libraries/networkneighborhood/integration_test.go (1)

213-250: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Add at least one negative was_address_port_protocol_in_* case for absent address.

After the v1 degradation updates, this helper no longer has a false assertion path in this suite. Please keep one case where the address is not in profile to prevent an accidental “always true” regression.

Suggested test addition
+		{
+			name:           "Check absent egress address with port/protocol",
+			expression:     `nn.was_address_port_protocol_in_egress(containerID, "192.168.1.200", 80, "TCP")`,
+			expectedResult: false,
+		},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/rulemanager/cel/libraries/networkneighborhood/integration_test.go` around
lines 213 - 250, Add at least one negative test case that exercises
nn.was_address_port_protocol_in_egress or
nn.was_address_port_protocol_in_ingress with an address that is not present in
the profile and assert expectedResult: false; specifically, add a test entry
(alongside the existing table of test cases) that uses containerID and a clearly
absent IP (e.g., "203.0.113.5") with some port/protocol and sets expectedResult
to false so the suite contains a failing-path for the post-v1 degradation
behavior of
was_address_port_protocol_in_egress/was_address_port_protocol_in_ingress.
tests/chart/templates/node-agent/default-rules.yaml (1)

52-75: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

R0040 needs profileDataRequired.execs.

This rule calls both ap.was_executed(...) and ap.was_executed_with_args(...), but unlike the other exec-based rules it never declares the execs surface. Under rule-aware projection that leaves the helper without the data it needs, so the new rule can silently stop matching.

Suggested fix
       - name: "Unexpected process arguments"
         enabled: true
         id: "R0040"
         description: "Process path is allowed by profile but argument vector does not match any profile entry's arg pattern (literal or wildcard ⋯/*)"
@@
         profileDependency: 0
+        profileDataRequired:
+          execs: all
         severity: 1
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/chart/templates/node-agent/default-rules.yaml` around lines 52 - 75,
The new rule R0040 uses ap.was_executed(...) and ap.was_executed_with_args(...)
but does not declare the exec surface, so add profileDataRequired.execs to this
rule's metadata; update the rule block for "R0040" to include
profileDataRequired: { execs: true } (or the equivalent schema used elsewhere)
so the ap helper receives exec data when rule-aware projection runs and the
expressions ap.was_executed and ap.was_executed_with_args can operate correctly.
pkg/rulemanager/cel/libraries/applicationprofile/ap.go (1)

51-305: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep ap.was_path_opened_with_flags as a compatibility overload.

Declarations() no longer exports this helper. Any existing custom CEL rule that still calls ap.was_path_opened_with_flags will now fail compilation after upgrade, which is a hard break in the public rule surface. Unless every consumer was migrated in lockstep, keep a shim overload here instead of removing it outright.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/rulemanager/cel/libraries/applicationprofile/ap.go` around lines 51 -
305, Declarations() removed the ap.was_path_opened_with_flags overload which
breaks compatibility; re-add a shim overload named
"ap.was_path_opened_with_flags" in the returned map following the same pattern
as the other helpers: register a cel.Overload (mirror argument types used
previously), call l.metrics.IncHelperCall when l.detailedMetrics && l.metrics !=
nil, wrap the real implementation via wrapperFunc that calls
l.wasPathOpenedWithFlags (the compatibility shim) and pass it through
l.functionCache.WithCache with the same
cache.HashForContainerProfile(l.objectCache) keying, then return
cache.ConvertProfileNotAvailableErrToBool(result, false) so the "profile not
available" error is converted to false after caching.
♻️ Duplicate comments (6)
README.md (1)

347-352: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix stale wasExecutedWithArgs behavior documentation.

Line 351 contradicts current behavior: wasExecutedWithArgs now matches both path and args (via projected exec args), not path-only.

Suggested doc fix
-| `wasExecutedWithArgs(containerID, path, args)` | Equivalent to `wasExecuted(containerID, path)` — the `args` list is validated for type correctness but is **not** matched against the recorded argument list. Any execution of the given path returns `true` regardless of its arguments. | Full per-argument matching (`ExecArgsByPath`) will be added in a future version. |
+| `wasExecutedWithArgs(containerID, path, args)` | Matches `path` and execution arguments against the projected profile (`ExecsByPath`). | Use this when argv constraints matter; use `wasExecuted` for path-only checks. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` around lines 347 - 352, Update the CEL helper table entry for
wasExecutedWithArgs to reflect current behavior: change the v1 Behaviour to
state that wasExecutedWithArgs(containerID, path, args) matches both path and
args (it validates and compares projected exec args against the recorded
argument list) instead of implying it only matches path; update the Note to
remove the “will be added” remark and reference ExecArgsByPath only if still
relevant (or delete the future-work sentence), and ensure the table text
mentions that per-argument matching is implemented via projected exec args
comparison.
tests/resources/network-wildcards/README.md (1)

74-74: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix acronym typo: SBoBsSBOMs.

Line 74 has a typo that breaks terminology consistency.

Suggested fix
-Producers writing v0.0.2-conformant SBoBs SHOULD use `ipAddresses` (plural).
+Producers writing v0.0.2-conformant SBOMs SHOULD use `ipAddresses` (plural).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/resources/network-wildcards/README.md` at line 74, The README contains
a typographical error: the acronym "SBoBs" should be "SBOMs"; update the text
fragment "Producers writing v0.0.2-conformant SBoBs SHOULD use `ipAddresses`
(plural)." to read "Producers writing v0.0.2-conformant SBOMs SHOULD use
`ipAddresses` (plural)." so the terminology is consistent (locate the exact
string "SBoBs" in the README and replace it with "SBOMs").
pkg/rulemanager/profilehelper/profilehelper.go (1)

12-16: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard objectCache against nil before dereference.

Line 13 dereferences objectCache unconditionally; nil input can panic instead of returning an error.

Suggested fix
 func GetProjectedContainerProfile(objectCache objectcache.ObjectCache, containerID string) (*objectcache.ProjectedContainerProfile, string, error) {
+	if objectCache == nil {
+		return nil, "", errors.New("object cache is nil")
+	}
 	cpc := objectCache.ContainerProfileCache()
 	if cpc == nil {
 		return nil, "", errors.New("no container profile cache available")
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/rulemanager/profilehelper/profilehelper.go` around lines 12 - 16,
GetProjectedContainerProfile currently dereferences objectCache without checking
for nil; add an initial nil guard that returns (nil, "", error) when objectCache
is nil before calling objectCache.ContainerProfileCache(), so the function
safely handles a nil input and avoids a panic. Identify the function
GetProjectedContainerProfile and insert the check at its start, returning an
appropriate error message (e.g., "nil objectCache") if objectCache == nil, then
proceed to call ContainerProfileCache() as before.
pkg/rulemanager/cel/libraries/cache/function_cache.go (1)

115-117: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard against nil callbacks in extraKeyFn before invocation.

A nil function in the extraKeyFn slice will panic when called at line 116. This was flagged in a previous review but remains unaddressed.

🛡️ Proposed fix
 		for _, fn := range extraKeyFn {
+			if fn == nil {
+				continue
+			}
 			key += "|" + fn(values)
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/rulemanager/cel/libraries/cache/function_cache.go` around lines 115 -
117, The loop that builds the cache key calls functions from the extraKeyFn
slice without nil-checks, which can panic; update the iteration over extraKeyFn
(the loop using "for _, fn := range extraKeyFn" that appends to key using
fn(values)) to guard against nil callbacks by checking "if fn == nil" and
skipping (or handling) nil entries before invoking fn(values), leaving key and
values behavior unchanged otherwise.
pkg/rulemanager/cel/libraries/parse/parse.go (1)

60-82: ⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

argv[0] spoofing risk remains: attacker-controlled absolute argv[0] is trusted over kernel-resolved exepath.

The comment at lines 49-52 claims "the kernel only exposes an absolute argv[0] when execve was called with that exact path" — this is incorrect. The kernel does not validate argv[0]; it's entirely process-controlled and can be set to any string by the caller of execve().

An attacker can execute /usr/bin/malware with argv[0]="/bin/sh" (an absolute path), and this code will return "/bin/sh", causing the rule to match against the wrong profile entry.

The symlink-faithful rationale for busybox is valid, but it conflicts with security. Consider:

  1. Trusting exepath first (kernel-authoritative) and accepting the busybox mismatch as a false positive, or
  2. Only trusting argv[0] when it matches exepath's basename or is a known symlink resolution.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/rulemanager/cel/libraries/parse/parse.go` around lines 60 - 82, The
current getExecPathWithExePath trusts an attacker-controlled absolute argv[0]
over the kernel-provided exepath (exepath param), which allows spoofing; change
the logic in getExecPathWithExePath so that when exepath (exepathStr) is
non-empty it is returned first (kernel-authoritative), and only accept argv[0]
as the exec path if either exepathStr is empty or argv[0] is explicitly
validated to match exepath (e.g., compare path.Base(argv0) ==
path.Base(exepathStr) or resolve symlinks/real paths before trusting argv0);
update callers (getExecPath and getExecPathWithExePath) to follow this
ordering/validation and ensure error handling still uses
types.MaybeNoSuchOverloadErr(exepath) when exepath is not a string.
pkg/rulemanager/cel/libraries/applicationprofile/http.go (1)

74-84: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Method-aware endpoint helpers still ignore HTTP methods.

Both helpers validate the method/methods argument and then return true on path match alone. That widens ap.was_endpoint_accessed_with_method(s) into a plain path check, so a profile that only observed GET /foo will also allow POST /foo. Either project methods per path or fail closed when method data is unavailable.

Also applies to: 113-123

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@go.mod`:
- Around line 70-72: The k8s.io/* modules are version-skewed: update the go.mod
entry for k8s.io/kubectl to v0.35.3 so it matches k8s.io/api,
k8s.io/apimachinery, and k8s.io/client-go (ensure the module string
"k8s.io/kubectl" is changed to version v0.35.3 and run `go mod tidy` / `go mod
download` to validate and regenerate the go.sum).

In `@pkg/objectcache/containerprofilecache/projection_apply.go`:
- Around line 40-44: The current shallow copy of cp.Spec.PolicyByRuleId into
pcp.PolicyByRuleId using maps.Copy allows RulePolicy.AllowedProcesses (a slice)
to remain shared and violates Apply's purity; instead, allocate
pcp.PolicyByRuleId and for each entry in cp.Spec.PolicyByRuleId copy the
RulePolicy value into a new variable, deep-copy its AllowedProcesses slice (make
a new slice and copy elements or use append to a nil slice) and store that new
RulePolicy in pcp.PolicyByRuleId so no internal slice references point back to
cp.Spec.PolicyByRuleId.

In `@pkg/objectcache/containerprofilecache/projection_trie.go`:
- Around line 5-10: The top-level comment for type trie incorrectly states it is
a byte-level trie; update the comment to state it is rune-keyed (using children
map[rune]*trie), explain it operates on Unicode runes rather than raw bytes, and
adjust complexity wording to reflect O(m) where m is the number of runes in the
query; keep the note that it’s used by FieldSpec for prefix and suffix
(reversed-insertion) matching and that terminal marks pattern ends.

In `@pkg/objectcache/v1/mock.go`:
- Around line 187-193: The mock currently forces pcp.Endpoints.All = true
whenever spec.Endpoints.InUse or !specInstalled, which misrepresents selective
projections; change the logic in the block that references specInstalled,
spec.Endpoints.InUse, cp.Spec.Endpoints so that you do NOT set pcp.Endpoints.All
= true unconditionally—instead set pcp.Endpoints.All = false and populate
pcp.Endpoints.Values from cp.Spec.Endpoints (pcp.Endpoints.Values[e.Endpoint] =
struct{}{}) to mirror selective exact/prefix/suffix behavior, and only set All =
true in the mock when the source projection explicitly indicates an "all"
projection (leave All untouched otherwise).
- Around line 167-176: The mock currently adds an explicit empty slice for
unconstrained execs, breaking the "absent means no argv constraint" contract;
update the loop that builds pcp.ExecsByPath so it only inserts an entry when
e.Args is non-nil (i.e., remove the branch that sets pcp.ExecsByPath[e.Path] =
[]string{} and instead skip adding any map key when e.Args == nil), keeping the
existing allocation of pcp.ExecsByPath and otherwise assigning
pcp.ExecsByPath[e.Path] = e.Args for constrained execs.

In `@pkg/rulemanager/cel/libraries/applicationprofile/integration_test.go`:
- Around line 88-90: The test removed coverage for flag-aware checking; restore
it by adding a new integration test case alongside the existing path-only case
that calls ap.was_path_opened_with_flags(containerID, "/etc/passwd", O_RDONLY /*
or appropriate flag constant */) and asserts expectedResult true (or false as
appropriate), leaving the existing ap.was_path_opened(containerID,
"/etc/passwd") test intact; locate the test table in integration_test.go and add
an entry using the unique symbol ap.was_path_opened_with_flags to ensure both
behaviors are covered.

In `@pkg/rulemanager/cel/libraries/networkneighborhood/network.go`:
- Around line 164-181: The helper currently ignores port/protocol projection and
always falls back to matchIPField, widening matches; change
was_address_port_protocol_in_* to detect whether AddressPortsByAddr is projected
and return a conservative "profile not available" result when it isn't.
Specifically, after calling profilehelper.GetProjectedContainerProfile (cp,
proj, err := ...), if err != nil return cache.NewProfileNotAvailableErr as
before, but if the projection/structure indicating AddressPortsByAddr is missing
(e.g., proj indicates projection missing or
cp.EgressAddresses.AddressPortsByAddr is nil/not populated) then return
cache.NewProfileNotAvailableErr(...) (or another conservative miss used
elsewhere) instead of calling matchIPField; only call matchIPField when the
address-only projection is valid and AddressPortsByAddr is intentionally absent.
Ensure you reference the symbols GetProjectedContainerProfile,
cp.EgressAddresses, AddressPortsByAddr, was_address_port_protocol_in_* and
matchIPField when implementing the check.

In `@tests/resources/network-wildcards/README.md`:
- Around line 14-16: Update the README text so the rejected example matches the
actual fixture content: replace the inline snippet `dnsNames: ["**"]` with the
fixture's value `dnsNames: ["**.example.com."]` (or otherwise mention
`**.example.com.`) so the README and the fixture are consistent; verify the
sentence that follows still reads correctly with the updated token example.

---

Outside diff comments:
In `@pkg/rulemanager/cel/libraries/applicationprofile/ap.go`:
- Around line 51-305: Declarations() removed the ap.was_path_opened_with_flags
overload which breaks compatibility; re-add a shim overload named
"ap.was_path_opened_with_flags" in the returned map following the same pattern
as the other helpers: register a cel.Overload (mirror argument types used
previously), call l.metrics.IncHelperCall when l.detailedMetrics && l.metrics !=
nil, wrap the real implementation via wrapperFunc that calls
l.wasPathOpenedWithFlags (the compatibility shim) and pass it through
l.functionCache.WithCache with the same
cache.HashForContainerProfile(l.objectCache) keying, then return
cache.ConvertProfileNotAvailableErrToBool(result, false) so the "profile not
available" error is converted to false after caching.

In `@pkg/rulemanager/cel/libraries/networkneighborhood/integration_test.go`:
- Around line 213-250: Add at least one negative test case that exercises
nn.was_address_port_protocol_in_egress or
nn.was_address_port_protocol_in_ingress with an address that is not present in
the profile and assert expectedResult: false; specifically, add a test entry
(alongside the existing table of test cases) that uses containerID and a clearly
absent IP (e.g., "203.0.113.5") with some port/protocol and sets expectedResult
to false so the suite contains a failing-path for the post-v1 degradation
behavior of
was_address_port_protocol_in_egress/was_address_port_protocol_in_ingress.

In `@tests/chart/templates/node-agent/default-rules.yaml`:
- Around line 52-75: The new rule R0040 uses ap.was_executed(...) and
ap.was_executed_with_args(...) but does not declare the exec surface, so add
profileDataRequired.execs to this rule's metadata; update the rule block for
"R0040" to include profileDataRequired: { execs: true } (or the equivalent
schema used elsewhere) so the ap helper receives exec data when rule-aware
projection runs and the expressions ap.was_executed and
ap.was_executed_with_args can operate correctly.

---

Duplicate comments:
In `@pkg/rulemanager/cel/libraries/cache/function_cache.go`:
- Around line 115-117: The loop that builds the cache key calls functions from
the extraKeyFn slice without nil-checks, which can panic; update the iteration
over extraKeyFn (the loop using "for _, fn := range extraKeyFn" that appends to
key using fn(values)) to guard against nil callbacks by checking "if fn == nil"
and skipping (or handling) nil entries before invoking fn(values), leaving key
and values behavior unchanged otherwise.

In `@pkg/rulemanager/cel/libraries/parse/parse.go`:
- Around line 60-82: The current getExecPathWithExePath trusts an
attacker-controlled absolute argv[0] over the kernel-provided exepath (exepath
param), which allows spoofing; change the logic in getExecPathWithExePath so
that when exepath (exepathStr) is non-empty it is returned first
(kernel-authoritative), and only accept argv[0] as the exec path if either
exepathStr is empty or argv[0] is explicitly validated to match exepath (e.g.,
compare path.Base(argv0) == path.Base(exepathStr) or resolve symlinks/real paths
before trusting argv0); update callers (getExecPath and getExecPathWithExePath)
to follow this ordering/validation and ensure error handling still uses
types.MaybeNoSuchOverloadErr(exepath) when exepath is not a string.

In `@pkg/rulemanager/profilehelper/profilehelper.go`:
- Around line 12-16: GetProjectedContainerProfile currently dereferences
objectCache without checking for nil; add an initial nil guard that returns
(nil, "", error) when objectCache is nil before calling
objectCache.ContainerProfileCache(), so the function safely handles a nil input
and avoids a panic. Identify the function GetProjectedContainerProfile and
insert the check at its start, returning an appropriate error message (e.g.,
"nil objectCache") if objectCache == nil, then proceed to call
ContainerProfileCache() as before.

In `@README.md`:
- Around line 347-352: Update the CEL helper table entry for wasExecutedWithArgs
to reflect current behavior: change the v1 Behaviour to state that
wasExecutedWithArgs(containerID, path, args) matches both path and args (it
validates and compares projected exec args against the recorded argument list)
instead of implying it only matches path; update the Note to remove the “will be
added” remark and reference ExecArgsByPath only if still relevant (or delete the
future-work sentence), and ensure the table text mentions that per-argument
matching is implemented via projected exec args comparison.

In `@tests/resources/network-wildcards/README.md`:
- Line 74: The README contains a typographical error: the acronym "SBoBs" should
be "SBOMs"; update the text fragment "Producers writing v0.0.2-conformant SBoBs
SHOULD use `ipAddresses` (plural)." to read "Producers writing v0.0.2-conformant
SBOMs SHOULD use `ipAddresses` (plural)." so the terminology is consistent
(locate the exact string "SBoBs" in the README and replace it with "SBOMs").
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 9c8850bd-d10b-45f8-bde1-807ee83675a3

📥 Commits

Reviewing files that changed from the base of the PR and between a3fdc33 and 2e2077b.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (89)
  • .github/workflows/component-tests.yaml
  • .gitignore
  • README.md
  • cmd/main.go
  • go.mod
  • pkg/config/config.go
  • pkg/containerprofilemanager/v1/event_reporting.go
  • pkg/containerprofilemanager/v1/event_reporting_test.go
  • pkg/exporters/alert_manager.go
  • pkg/metricsmanager/metrics_manager_interface.go
  • pkg/metricsmanager/metrics_manager_mock.go
  • pkg/metricsmanager/metrics_manager_noop.go
  • pkg/metricsmanager/prometheus/prometheus.go
  • pkg/objectcache/containerprofilecache/containerprofilecache.go
  • pkg/objectcache/containerprofilecache/containerprofilecache_test.go
  • pkg/objectcache/containerprofilecache/init_eviction_test.go
  • pkg/objectcache/containerprofilecache/lock_stress_test.go
  • pkg/objectcache/containerprofilecache/projection.go
  • pkg/objectcache/containerprofilecache/projection_apply.go
  • pkg/objectcache/containerprofilecache/projection_apply_test.go
  • pkg/objectcache/containerprofilecache/projection_compile.go
  • pkg/objectcache/containerprofilecache/projection_compile_test.go
  • pkg/objectcache/containerprofilecache/projection_trie.go
  • pkg/objectcache/containerprofilecache/projection_trie_test.go
  • pkg/objectcache/containerprofilecache/reconciler.go
  • pkg/objectcache/containerprofilecache/reconciler_test.go
  • pkg/objectcache/containerprofilecache/shared_pointer_race_test.go
  • pkg/objectcache/containerprofilecache/t8_overlay_refresh_test.go
  • pkg/objectcache/containerprofilecache/tamper_alert_test.go
  • pkg/objectcache/containerprofilecache/test32_projection_test.go
  • pkg/objectcache/containerprofilecache_interface.go
  • pkg/objectcache/projection_types.go
  • pkg/objectcache/v1/mock.go
  • pkg/rulebindingmanager/cache/cache.go
  • pkg/rulebindingmanager/cache/cache_test.go
  • pkg/rulemanager/cel/cel.go
  • pkg/rulemanager/cel/libraries/applicationprofile/ap.go
  • pkg/rulemanager/cel/libraries/applicationprofile/capability.go
  • pkg/rulemanager/cel/libraries/applicationprofile/exec.go
  • pkg/rulemanager/cel/libraries/applicationprofile/exec_test.go
  • pkg/rulemanager/cel/libraries/applicationprofile/http.go
  • pkg/rulemanager/cel/libraries/applicationprofile/integration_test.go
  • pkg/rulemanager/cel/libraries/applicationprofile/open.go
  • pkg/rulemanager/cel/libraries/applicationprofile/open_test.go
  • pkg/rulemanager/cel/libraries/applicationprofile/syscall.go
  • pkg/rulemanager/cel/libraries/cache/function_cache.go
  • pkg/rulemanager/cel/libraries/networkneighborhood/fixtures_test.go
  • pkg/rulemanager/cel/libraries/networkneighborhood/integration_test.go
  • pkg/rulemanager/cel/libraries/networkneighborhood/network.go
  • pkg/rulemanager/cel/libraries/networkneighborhood/network_test.go
  • pkg/rulemanager/cel/libraries/networkneighborhood/nn.go
  • pkg/rulemanager/cel/libraries/networkneighborhood/wildcard_test.go
  • pkg/rulemanager/cel/libraries/parse/parse.go
  • pkg/rulemanager/cel/libraries/parse/parselib.go
  • pkg/rulemanager/cel/libraries/parse/parsing_test.go
  • pkg/rulemanager/profilehelper/profilehelper.go
  • pkg/rulemanager/rule_manager.go
  • pkg/rulemanager/rulepolicy.go
  • pkg/rulemanager/types/v1/profiledata.go
  • pkg/rulemanager/types/v1/profiledata_test.go
  • pkg/rulemanager/types/v1/types.go
  • tests/chart/crds/rules.crd.yaml
  • tests/chart/templates/node-agent/configmap.yaml
  • tests/chart/templates/node-agent/default-rule-binding.yaml
  • tests/chart/templates/node-agent/default-rules.yaml
  • tests/chart/values.yaml
  • tests/component_test.go
  • tests/resources/malicious-job.yaml
  • tests/resources/network-wildcards/01-literal-ipv4.yaml
  • tests/resources/network-wildcards/02-literal-ipv6.yaml
  • tests/resources/network-wildcards/03-cidr-ipv4.yaml
  • tests/resources/network-wildcards/04-cidr-ipv6.yaml
  • tests/resources/network-wildcards/05-any-ip-sentinel.yaml
  • tests/resources/network-wildcards/06-any-as-cidr.yaml
  • tests/resources/network-wildcards/07-mixed-ip-list.yaml
  • tests/resources/network-wildcards/08-deprecated-ipaddress.yaml
  • tests/resources/network-wildcards/09-dns-literal.yaml
  • tests/resources/network-wildcards/10-dns-leading-wildcard.yaml
  • tests/resources/network-wildcards/11-dns-mid-ellipsis.yaml
  • tests/resources/network-wildcards/12-dns-trailing-star.yaml
  • tests/resources/network-wildcards/13-dns-trailing-dot-normalisation.yaml
  • tests/resources/network-wildcards/14-recursive-star-rejected.yaml
  • tests/resources/network-wildcards/15-egress-and-ingress.yaml
  • tests/resources/network-wildcards/16-egress-none.yaml
  • tests/resources/network-wildcards/17-realistic-stripe-api.yaml
  • tests/resources/network-wildcards/18-cluster-dns-via-mid-ellipsis.yaml
  • tests/resources/network-wildcards/19-port-protocol-with-cidr.yaml
  • tests/resources/network-wildcards/20-multi-container-mixed-wildcards.yaml
  • tests/resources/network-wildcards/README.md

Comment thread go.mod
Comment on lines +70 to +72
k8s.io/api v0.35.3
k8s.io/apimachinery v0.35.3
k8s.io/client-go v0.35.3
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

For Go Kubernetes modules, is it supported to mix k8s.io/api/apimachinery/client-go v0.35.x with k8s.io/kubectl v0.34.1? What is the official compatibility guidance?

💡 Result:

Mixing k8s.io/client-go v0.35.x with k8s.io/kubectl v0.34.1 in a single Go module is not recommended and can lead to dependency resolution conflicts [1][2]. Official Kubernetes guidance requires that all k8s.io/* modules (such as client-go, apimachinery, api, and kubectl) within a single project be kept at the same minor version [1][3]. Kubernetes libraries are designed to be used as a cohesive set, and mixing versions often results in build errors, runtime panics, or "module provides package" errors due to the way these packages share internal dependencies and type definitions [4][2][3]. Key points regarding compatibility: 1. Version Consistency: You should align all k8s.io dependencies to the same minor version (e.g., all at v0.34.x or all at v0.35.x) to ensure they are compatible with each other [3]. 2. Version Skew Policy: While the Kubernetes Version Skew Policy allows for a difference of one minor version between a compiled kubectl binary and a running kube-apiserver [5][6], this policy applies to the operational deployment of components, not to the Go module dependency management of a single application [1]. 3. Dependency Management: Because k8s.io/client-go and other components are often mirrored from the main Kubernetes repository, they share common underlying code [7][8][9]. Using mismatched versions in your go.mod file forces the Go module resolver to attempt to reconcile conflicting requirements, which is frequently unsupported and prone to failure [1][2]. To resolve this, update your go.mod file so that all k8s.io/ dependencies match the same minor version [3]. If you are targeting a specific Kubernetes cluster version, it is best practice to use the corresponding v0.x.y version for all Kubernetes modules [1][7].

Citations:


Align all k8s.io/ modules to the same minor version.*

The modules k8s.io/api, k8s.io/apimachinery, and k8s.io/client-go are at v0.35.3, but k8s.io/kubectl remains at v0.34.1. This version skew is not supported by Kubernetes and will cause build errors, runtime panics, or module resolution failures. Update k8s.io/kubectl to v0.35.3 to match the other k8s.io dependencies.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go.mod` around lines 70 - 72, The k8s.io/* modules are version-skewed: update
the go.mod entry for k8s.io/kubectl to v0.35.3 so it matches k8s.io/api,
k8s.io/apimachinery, and k8s.io/client-go (ensure the module string
"k8s.io/kubectl" is changed to version v0.35.3 and run `go mod tidy` / `go mod
download` to validate and regenerate the go.sum).

Comment on lines +40 to +44
// Shallow copy PolicyByRuleId — values are value-typed structs.
if len(cp.Spec.PolicyByRuleId) > 0 {
pcp.PolicyByRuleId = make(map[string]v1beta1.RulePolicy, len(cp.Spec.PolicyByRuleId))
maps.Copy(pcp.PolicyByRuleId, cp.Spec.PolicyByRuleId)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Shallow-copy site in Apply:"
rg -n -C3 "maps.Copy\\(pcp.PolicyByRuleId, cp.Spec.PolicyByRuleId\\)" pkg/objectcache/containerprofilecache/projection_apply.go

echo
echo "RulePolicy definition (look for [] or map[ fields):"
rg -n -A80 -B2 "type RulePolicy struct"

echo
echo "DeepCopy methods for RulePolicy (if generated):"
rg -n -A20 -B2 "func \\(in \\*RulePolicy\\) DeepCopyInto|func \\(in \\*RulePolicy\\) DeepCopy\\("

Repository: k8sstormcenter/node-agent

Length of output: 454


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find where v1beta1.RulePolicy is imported from
echo "=== Import statements in projection_apply.go ==="
head -30 pkg/objectcache/containerprofilecache/projection_apply.go | grep -E "import|v1beta1"

echo
echo "=== All files in v1beta1 or similar locations ==="
find . -type f -name "*.go" | grep -i v1beta1 | head -20

echo
echo "=== Search for RulePolicy definition across codebase ==="
rg -n "type RulePolicy struct" --type go

echo
echo "=== Alternative: search for RulePolicy in API definitions ==="
rg -n "RulePolicy" --type go -A 3 | head -100

Repository: k8sstormcenter/node-agent

Length of output: 418


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== All RulePolicy usages in codebase ==="
rg -n "RulePolicy" --type go | head -50

echo
echo "=== Check for vendor directory or go.mod ==="
ls -la go.mod go.sum 2>/dev/null || echo "go.mod/go.sum not found"

echo
echo "=== Look for DeepCopy patterns in the codebase ==="
rg -n "DeepCopy|DeepCopyInto" --type go | grep -i "rulepolicy\|policybyruleid" | head -20

echo
echo "=== Check projection_apply.go for full context around PolicyByRuleId ==="
cat -n pkg/objectcache/containerprofilecache/projection_apply.go | head -60

Repository: k8sstormcenter/node-agent

Length of output: 6305


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== MergePolicies function implementation ==="
cat -n pkg/utils/rulepolicy.go

echo
echo "=== Check projection_apply_test.go for RulePolicy test cases ==="
cat -n pkg/objectcache/containerprofilecache/projection_apply_test.go | grep -A 30 "PolicyByRuleId"

echo
echo "=== Check if Apply function has documented purity contract ==="
cat -n pkg/objectcache/containerprofilecache/projection_apply.go | head -40

Repository: k8sstormcenter/node-agent

Length of output: 4814


Deep-copy PolicyByRuleId values to preserve Apply purity.

The Apply function is documented as a "pure function: no I/O, no mutation of inputs." However, the shallow copy via maps.Copy creates a safety violation. RulePolicy contains a slice field (AllowedProcesses []string), which is a reference type. When maps.Copy copies the struct values, the slices within those structs still point to the same underlying array as the source. Any mutation of the projected RulePolicy.AllowedProcesses slice would mutate the original ContainerProfile, violating the purity contract.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/objectcache/containerprofilecache/projection_apply.go` around lines 40 -
44, The current shallow copy of cp.Spec.PolicyByRuleId into pcp.PolicyByRuleId
using maps.Copy allows RulePolicy.AllowedProcesses (a slice) to remain shared
and violates Apply's purity; instead, allocate pcp.PolicyByRuleId and for each
entry in cp.Spec.PolicyByRuleId copy the RulePolicy value into a new variable,
deep-copy its AllowedProcesses slice (make a new slice and copy elements or use
append to a nil slice) and store that new RulePolicy in pcp.PolicyByRuleId so no
internal slice references point back to cp.Spec.PolicyByRuleId.

Comment on lines +5 to +10
// trie implements a simple byte-level prefix trie for O(n) prefix matching
// where n is the length of the query string. Used by FieldSpec for prefix and
// suffix (reversed-insertion) matching.
type trie struct {
children map[rune]*trie
terminal bool // true if this node marks the end of an inserted pattern
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align trie comment with implementation (rune-keyed, not byte-level).

Line 5 currently documents a byte-level trie, but the implementation is rune-keyed (map[rune]*trie).

Suggested wording update
-// trie implements a simple byte-level prefix trie for O(n) prefix matching
+// trie implements a simple rune-level prefix trie for O(n) prefix matching
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// trie implements a simple byte-level prefix trie for O(n) prefix matching
// where n is the length of the query string. Used by FieldSpec for prefix and
// suffix (reversed-insertion) matching.
type trie struct {
children map[rune]*trie
terminal bool // true if this node marks the end of an inserted pattern
// trie implements a simple rune-level prefix trie for O(n) prefix matching
// where n is the length of the query string. Used by FieldSpec for prefix and
// suffix (reversed-insertion) matching.
type trie struct {
children map[rune]*trie
terminal bool // true if this node marks the end of an inserted pattern
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/objectcache/containerprofilecache/projection_trie.go` around lines 5 -
10, The top-level comment for type trie incorrectly states it is a byte-level
trie; update the comment to state it is rune-keyed (using children
map[rune]*trie), explain it operates on Unicode runes rather than raw bytes, and
adjust complexity wording to reflect O(m) where m is the number of runes in the
query; keep the note that it’s used by FieldSpec for prefix and suffix
(reversed-insertion) matching and that terminal marks pattern ends.

Comment on lines +167 to +176
// ExecsByPath: carry per-path Args so the exec-args wildcard matcher
// (was_executed_with_args / CompareExecArgs) keeps working.
pcp.ExecsByPath = make(map[string][]string, len(cp.Spec.Execs))
for _, e := range cp.Spec.Execs {
if e.Args == nil {
pcp.ExecsByPath[e.Path] = []string{}
continue
}
pcp.ExecsByPath[e.Path] = e.Args
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the “absent means no argv constraint” contract.

When e.Args == nil, this mock writes ExecsByPath[e.Path] = []string{}. The restored matcher contract is the opposite: no map entry means “match any argv”. With the current code, unconstrained execs become “must have zero args” in tests.

Suggested fix
 		pcp.ExecsByPath = make(map[string][]string, len(cp.Spec.Execs))
 		for _, e := range cp.Spec.Execs {
-			if e.Args == nil {
-				pcp.ExecsByPath[e.Path] = []string{}
-				continue
-			}
-			pcp.ExecsByPath[e.Path] = e.Args
+			if e.Args != nil {
+				pcp.ExecsByPath[e.Path] = e.Args
+			}
 		}
+		if len(pcp.ExecsByPath) == 0 {
+			pcp.ExecsByPath = nil
+		}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// ExecsByPath: carry per-path Args so the exec-args wildcard matcher
// (was_executed_with_args / CompareExecArgs) keeps working.
pcp.ExecsByPath = make(map[string][]string, len(cp.Spec.Execs))
for _, e := range cp.Spec.Execs {
if e.Args == nil {
pcp.ExecsByPath[e.Path] = []string{}
continue
}
pcp.ExecsByPath[e.Path] = e.Args
}
// ExecsByPath: carry per-path Args so the exec-args wildcard matcher
// (was_executed_with_args / CompareExecArgs) keeps working.
pcp.ExecsByPath = make(map[string][]string, len(cp.Spec.Execs))
for _, e := range cp.Spec.Execs {
if e.Args != nil {
pcp.ExecsByPath[e.Path] = e.Args
}
}
if len(pcp.ExecsByPath) == 0 {
pcp.ExecsByPath = nil
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/objectcache/v1/mock.go` around lines 167 - 176, The mock currently adds
an explicit empty slice for unconstrained execs, breaking the "absent means no
argv constraint" contract; update the loop that builds pcp.ExecsByPath so it
only inserts an entry when e.Args is non-nil (i.e., remove the branch that sets
pcp.ExecsByPath[e.Path] = []string{} and instead skip adding any map key when
e.Args == nil), keeping the existing allocation of pcp.ExecsByPath and otherwise
assigning pcp.ExecsByPath[e.Path] = e.Args for constrained execs.

Comment on lines +187 to +193
if (!specInstalled || spec.Endpoints.InUse) && len(cp.Spec.Endpoints) > 0 {
pcp.Endpoints.All = true
pcp.Endpoints.Values = make(map[string]struct{}, len(cp.Spec.Endpoints))
for _, e := range cp.Spec.Endpoints {
pcp.Endpoints.Values[e.Endpoint] = struct{}{}
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Mirror selective endpoint projection instead of forcing Endpoints.All=true.

Once spec.Endpoints.InUse is set, the mock currently retains every endpoint and marks the field as All. Production projection only does that for all; selective exact/prefix/suffix rules rely on All=false plus hit maps. With the current mock, tests can bypass undeclared-literal behavior and miss regressions in the HTTP helpers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/objectcache/v1/mock.go` around lines 187 - 193, The mock currently forces
pcp.Endpoints.All = true whenever spec.Endpoints.InUse or !specInstalled, which
misrepresents selective projections; change the logic in the block that
references specInstalled, spec.Endpoints.InUse, cp.Spec.Endpoints so that you do
NOT set pcp.Endpoints.All = true unconditionally—instead set pcp.Endpoints.All =
false and populate pcp.Endpoints.Values from cp.Spec.Endpoints
(pcp.Endpoints.Values[e.Endpoint] = struct{}{}) to mirror selective
exact/prefix/suffix behavior, and only set All = true in the mock when the
source projection explicitly indicates an "all" projection (leave All untouched
otherwise).

Comment on lines 88 to 90
name: "Check file access pattern",
expression: `ap.was_path_opened_with_flags(containerID, "/etc/passwd", ["O_RDONLY"])`,
expression: `ap.was_path_opened(containerID, "/etc/passwd")`,
expectedResult: true,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Restore flag-aware assertion coverage in this “all functions” integration suite.

This swap reduces integration coverage of ap.was_path_opened_with_flags(...). Keep the path-only check if needed, but add a separate flag-aware case so both behaviors remain covered.

Proposed patch
 		{
 			name:           "Check file access pattern",
 			expression:     `ap.was_path_opened(containerID, "/etc/passwd")`,
 			expectedResult: true,
 		},
+		{
+			name:           "Check file access pattern with flags",
+			expression:     `ap.was_path_opened_with_flags(containerID, "/etc/passwd", ["O_RDONLY"])`,
+			expectedResult: true,
+		},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/rulemanager/cel/libraries/applicationprofile/integration_test.go` around
lines 88 - 90, The test removed coverage for flag-aware checking; restore it by
adding a new integration test case alongside the existing path-only case that
calls ap.was_path_opened_with_flags(containerID, "/etc/passwd", O_RDONLY /* or
appropriate flag constant */) and asserts expectedResult true (or false as
appropriate), leaving the existing ap.was_path_opened(containerID,
"/etc/passwd") test intact; locate the test table in integration_test.go and add
an entry using the unique symbol ap.was_path_opened_with_flags to ensure both
behaviors are covered.

Comment on lines +164 to +181
// port/protocol projection (AddressPortsByAddr) is out of scope for the
// projection-v1 layer upstream landed; matchers degrade to address-only.
// Wildcards remain enforced via matchIPField.
portInt, ok := port.Value().(int64)
if !ok {
return types.MaybeNoSuchOverloadErr(port)
}
protocolStr, ok := protocol.Value().(string)
if !ok {
if portInt < 0 || portInt > 65535 {
return types.Bool(false)
}
if _, ok := protocol.Value().(string); !ok {
return types.MaybeNoSuchOverloadErr(protocol)
}

cp, _, err := profilehelper.GetContainerProfile(l.objectCache, containerIDStr)
cp, _, err := profilehelper.GetProjectedContainerProfile(l.objectCache, containerIDStr)
if err != nil {
return cache.NewProfileNotAvailableErr("%v", err)
}

for _, egress := range cp.Spec.Egress {
if egress.IPAddress == addressStr {
for _, portInfo := range egress.Ports {
if portInfo.Protocol == v1beta1.Protocol(protocolStr) && portInfo.Port != nil && *portInfo.Port == int32(portInt) {
return types.Bool(true)
}
}
}
}

return types.Bool(false)
return types.Bool(matchIPField(&cp.EgressAddresses, addressStr))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

was_address_port_protocol_in_* now ignores port and protocol.

After the type/range checks, both helpers only call matchIPField(...). A profile entry for TCP/443 therefore also matches UDP/53 as long as the address/CIDR matches, which weakens any rule that relies on the helper name’s 4-tuple semantics. If AddressPortsByAddr is not projected yet, return a conservative miss instead of silently widening the match.

Also applies to: 200-210

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/rulemanager/cel/libraries/networkneighborhood/network.go` around lines
164 - 181, The helper currently ignores port/protocol projection and always
falls back to matchIPField, widening matches; change
was_address_port_protocol_in_* to detect whether AddressPortsByAddr is projected
and return a conservative "profile not available" result when it isn't.
Specifically, after calling profilehelper.GetProjectedContainerProfile (cp,
proj, err := ...), if err != nil return cache.NewProfileNotAvailableErr as
before, but if the projection/structure indicating AddressPortsByAddr is missing
(e.g., proj indicates projection missing or
cp.EgressAddresses.AddressPortsByAddr is nil/not populated) then return
cache.NewProfileNotAvailableErr(...) (or another conservative miss used
elsewhere) instead of calling matchIPField; only call matchIPField when the
address-only projection is valid and AddressPortsByAddr is intentionally absent.
Ensure you reference the symbols GetProjectedContainerProfile,
cp.EgressAddresses, AddressPortsByAddr, was_address_port_protocol_in_* and
matchIPField when implementing the check.

Comment on lines +14 to +16
**rejected at admission** — it carries `dnsNames: ["**"]` to demonstrate
that the recursive-wildcard token is invalid v0.0.2 syntax. Don't `kubectl
apply` it; the apiserver will return a 400. The runtime matcher also
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Correct fixture reference to match the actual invalid example.

Line 14 says dnsNames: ["**"], but the corresponding fixture uses "**.example.com.". Please align the README text with the actual fixture content.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/resources/network-wildcards/README.md` around lines 14 - 16, Update the
README text so the rejected example matches the actual fixture content: replace
the inline snippet `dnsNames: ["**"]` with the fixture's value `dnsNames:
["**.example.com."]` (or otherwise mention `**.example.com.`) so the README and
the fixture are consistent; verify the sentence that follows still reads
correctly with the updated token example.

@github-actions
Copy link
Copy Markdown

Performance Benchmark Results

Node-Agent Resource Usage
Metric BEFORE AFTER Delta
Avg CPU (cores) 0.214 0.205 -4.1%
Peak CPU (cores) 0.224 0.218 -2.6%
Avg Memory (MiB) 311.976 275.399 -11.7%
Peak Memory (MiB) 315.805 276.648 -12.4%
Dedup Effectiveness (AFTER only)
Event Type Passed Deduped Ratio
capabilities 0 0 N/A
hardlink 6000 0 0.0%
http 1709 119454 98.6%
network 897 68306 98.7%
open 30505 621254 95.3%
symlink 6000 0 0.0%
syscall 995 1897 65.6%
Event Counters
Metric BEFORE AFTER
capability_counter 7 6
dns_counter 45 200
exec_counter 7116 7288
network_counter 80238 83673
open_counter 775019 793769
syscall_counter 3382 3630

Entlein added 2 commits May 15, 2026 22:49
The helper was removed in an earlier cleanup pass on the grounds
that no default rule used it. That removal deleted upstream
functionality that other consumers (custom rule sets, external rule
authors) may rely on. CEL helpers are a stable public surface — they
shouldn't disappear from a fork merge.

Restores the declaration in ap.go (with cost-estimator entry) and
the implementation in open.go exactly as upstream defines it: path
match via dynamicpathdetector.CompareDynamic against Opens.Values
and Opens.Patterns, with the flags argument parsed and validated for
shape but not used for matching in v1 (the OpenFlagsByPath projection
slice remains out of scope).

Matches upstream/main byte-for-byte. No new tests required —
upstream's own unit test coverage is restored alongside the helper.
…ion test

Companion to the previous commit restoring the CEL helper. The
integration test case for "Check file access pattern" had been
narrowed to was_path_opened during the helper removal; restoring it
to the upstream form (was_path_opened_with_flags with a flags list)
keeps the CEL surface covered.
@github-actions
Copy link
Copy Markdown

Performance Benchmark Results

Node-Agent Resource Usage
Metric BEFORE AFTER Delta
Avg CPU (cores) 0.237 0.233 -1.8%
Peak CPU (cores) 0.244 0.241 -1.1%
Avg Memory (MiB) 287.875 264.999 -7.9%
Peak Memory (MiB) 290.461 266.746 -8.2%
Dedup Effectiveness (AFTER only)
Event Type Passed Deduped Ratio
capabilities 0 0 N/A
hardlink 6000 0 0.0%
http 1705 119455 98.6%
network 902 77997 98.9%
open 30758 620996 95.3%
symlink 6000 0 0.0%
syscall 996 1907 65.7%
Event Counters
Metric BEFORE AFTER
capability_counter 9 10
dns_counter 1441 1415
exec_counter 7248 7085
network_counter 95247 93177
open_counter 787457 770189
syscall_counter 3518 3645

@github-actions
Copy link
Copy Markdown

Performance Benchmark Results

Node-Agent Resource Usage
Metric BEFORE AFTER Delta
Avg CPU (cores) 0.175 0.180 +3.0%
Peak CPU (cores) 0.184 0.188 +2.3%
Avg Memory (MiB) 320.124 267.237 -16.5%
Peak Memory (MiB) 321.863 269.051 -16.4%
Dedup Effectiveness (AFTER only)
Event Type Passed Deduped Ratio
capabilities 1 0 0.0%
hardlink 6000 0 0.0%
http 1767 119391 98.5%
network 902 77997 98.9%
open 32597 622736 95.0%
symlink 6000 0 0.0%
syscall 984 1893 65.8%
Event Counters
Metric BEFORE AFTER
capability_counter 9 10
dns_counter 1443 1440
exec_counter 7261 7204
network_counter 95400 94753
open_counter 793392 787970
syscall_counter 3640 3603

…e#807, kubescape#808) (#44)

* fix(containerprofilecache): honor verify result before merging user overlay

CodeRabbit upstream PR kubescape#808 / containerprofilecache.go:414 (Major).
The verifyUserApplicationProfile and verifyUserNetworkNeighborhood
methods already return a boolean reflecting verification outcome —
true when the overlay is unsigned OR when verification succeeded OR
in permissive mode (EnableSignatureVerification=false); false only
in strict mode on actual tamper.

The two call sites in projection-load were discarding that return,
so tampered overlays in strict mode silently merged anyway. The
R1016 alert was emitted but the protection was advisory only.

Now: when verify returns false (strict mode + tamper detected) the
overlay is nilled out before the merge step so the cache never
projects a known-tampered profile. Permissive mode is unchanged —
verify always returns true, the overlay still merges, R1016 still
fires.

New tests:

  - TestVerifyAP_StrictMode_ReturnsFalseOnTamper — sign + tamper an
    ApplicationProfile, construct a cache with
    EnableSignatureVerification=true, and assert
    verifyUserApplicationProfile returns false (caller drops overlay).
  - TestVerifyNN_StrictMode_ReturnsFalseOnTamper — symmetric pin for
    the NetworkNeighborhood path.

The existing legacy-permissive tamper test
(TestVerifyAP_TamperedProfile_PopulatesDedupMap) continues to pass
unchanged — that path still returns true with the R1016 emitted.

* docs(applicationprofile): rule-author contracts on opens-with-suffix/prefix + exec-args

Two doc-only fixes for CodeRabbit cross-PR advisory:

#7 (NA kubescape#807): ap.was_path_opened_with_suffix / _prefix — explicitly
document the false-negative gap when the projection is in pass-through
mode (cp.Opens.All=true). Wildcard Patterns are skipped from the
suffix/prefix scan because their token-bearing text doesn't safely
answer suffix questions. Rule authors who need wildcard-aware coverage
should either declare an Opens-projection slice (so SuffixHits/PrefixHits
become authoritative for the literals they care about) or use
ap.was_path_opened (which runs CompareDynamic over Patterns).

#8 (NA kubescape#807): wasExecutedWithArgs — document the three states of
ExecsByPath:
  1. Path absent from Execs.Values → exec not allowed, fall through.
  2. Path in Values, ABSENT from ExecsByPath → legacy back-compat
     "no argv constraint", match.
  3. Path in Values, PRESENT with empty arg list [] → explicit
     "ran with no args" constraint, NOT a wildcard.
The distinction is load-bearing for profile authors: an entry of
{Path: ..., Args: []} is a constraint, not a free pass.

No behavioural change. Tests pass unchanged.

* deps: pin stereoscope v0.1.9 + runtime-spec v1.2.1 (compat with kubescape/syft fork)

Storage rc1 bumped to syft v1.42.4 (CVE-2026-33481), which transitively
requires stereoscope v0.1.22 + runtime-spec v1.3.0. Those versions use
the new moby/moby/client submodule API, which is incompatible with
inspektor-gadget's moby/moby umbrella requirement on the node-agent
side (ambiguous-import wall — see issue #45).

Node-agent stays on kubescape/syft v1.32.0-ks.2 via the existing
replace, but transitive resolution from storage's go.mod pulls the
newer stereoscope into the build, breaking the build with:

  undefined: client.New
  undefined: client.PingOptions

Adds two replace directives to force the older transitive chain that
matches kubescape/syft v1.32.0-ks.2's expectations:

  github.com/anchore/stereoscope     => v0.1.9-0.20250826202322-...
  github.com/opencontainers/runtime-spec => v1.2.1

This is the minimum set needed for node-agent to build cleanly against
storage rc1 (with syft v1.42.4) while still using kubescape/syft on
its own side. Storage's CVE fix remains in effect at the storage
binary; node-agent's syft surface is unchanged.

Verified locally:
  go build ./...          ok
  go test ./pkg/objectcache/... ./pkg/rulemanager/... -count=1
                          → 30+ packages ok

---------

Co-authored-by: Entlein <eineintlein@gmail.com>
@entlein entlein merged commit 58247d7 into main May 16, 2026
2 checks passed
@github-actions
Copy link
Copy Markdown

Performance Benchmark Results

Node-Agent Resource Usage
Metric BEFORE AFTER Delta
Avg CPU (cores) 0.222 0.216 -2.7%
Peak CPU (cores) 0.238 0.236 -0.8%
Avg Memory (MiB) 268.092 265.967 -0.8%
Peak Memory (MiB) 269.539 268.070 -0.5%
Dedup Effectiveness (AFTER only)
Event Type Passed Deduped Ratio
capabilities 2 0 0.0%
hardlink 6000 0 0.0%
http 1704 119459 98.6%
network 903 78000 98.9%
open 31579 620182 95.2%
symlink 6000 0 0.0%
syscall 988 1884 65.6%
Event Counters
Metric BEFORE AFTER
capability_counter 9 10
dns_counter 1454 1452
exec_counter 7277 7266
network_counter 95726 95558
open_counter 790881 790988
syscall_counter 3653 3608

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants