Skip to content

Fix bugbunny findings#5282

Open
geyslan wants to merge 9 commits into
aquasecurity:mainfrom
geyslan:bugbunny-findings
Open

Fix bugbunny findings#5282
geyslan wants to merge 9 commits into
aquasecurity:mainfrom
geyslan:bugbunny-findings

Conversation

@geyslan
Copy link
Copy Markdown
Member

@geyslan geyslan commented Apr 15, 2026

1. Explain what the PR does

15cb88e refactor(config): group BPF state and clean temp BTF

Consolidate BTF path, object bytes, and uprobe binary path under
BPFConfig. Track the temp directory explicitly via BTFTempDir so
Cleanup only removes directories we created, never user-supplied
BTF paths. Clear object bytes after module creation so memory can
be reclaimed. Remove the BTFHub temp tree when embedded unpack
fails.

--

6e51ba1 fix(build): auto-detect base remote in PR targets

The Makefile hardcoded origin/main as the base ref, breaking
workflows that name the upstream remote differently (e.g.
upstream, private). checkpatch.sh now probes remotes in
preference order (upstream, private, origin, then any other)
and all Make targets forward BASE_REF only when explicitly set.

Remove the unused LOGFROM variable.

--

5d7cf1c fix(detectors): abort load on duplicate list names

Duplicate shared-list names were handled fail-open: the loader
kept the first-seen value and logged a warning. Because
os.ReadDir returns files in lexical order, an attacker who can
write to the detector directory deterministically overrides any
legitimate list by choosing a lexically-earlier filename --
silently tampering detection rules without modifying the
original file.

Treat duplicate list names as fatal (matching the existing
fatal handling for duplicate event names) and escalate loader
errors from WARN to ERROR.

Detected by: bugbunny.ai

--

4e4a165 fix(ebpf): cap ARGS_ARR_T element count

An attacker can exec processes with huge argv/envp arrays,
causing the Go decoder to allocate unbounded slices from the
uncapped elem_num field -- leading to OOM kills under memory
pressure.

Cap elem_num to MAX_ARR_LEN (8192) in the eBPF helper before
it reaches the perf buffer, and add a matching Go-side cap as
defense-in-depth.

Detected by: bugbunny.ai

--

4edb89f fix(security)!: harden file ops against symlinks

Add SafeOpenFile (O_NOFOLLOW) and SafeRemoveAll (Lstat guard)
to fileutil and migrate all remaining file operation call sites
to resist symlink attacks and TOCTOU races:

  • Move default artifacts dir from /tmp/tracee to /var/lib/tracee
  • Replace os.RemoveAll with SafeRemoveAll for artifact cleanup
  • Replace os.Create/os.OpenFile with SafeOpenFile in output,
    traceectl, and BTF unpack paths
  • Tighten MkdirAll permissions from 0755 to 0750
  • Use os.Lstat to reject symlinked unix socket paths
  • Replace os.Chmod with fd-based Fchmod on unix socket listener
  • Add os.Lstat guard before plugin.Open to skip symlinked .so
  • Replace predictable /tmp/tracee-btf with os.MkdirTemp
  • Export DefaultArtifactsDir constant for test/caller use

BREAKING CHANGE: default artifacts directory moved from
/tmp/tracee to /var/lib/tracee; symlinked unix socket paths and
plugin .so files are now rejected; output directory permissions
tightened from 0755 to 0750.

--

3dcd173 fix(fileutil)!: use os.Root for symlink-safe I/O

Replace custom openat/mkdirat/renameat syscall wrappers with Go 1.24
os.Root, which confines all file operations to a directory tree.
Symlinks escaping the root and ".." traversals are rejected by the
kernel. OpenRootDir adds O_NOFOLLOW so a symlinked output directory
itself is also rejected (ELOOP).

Migrate Tracee.OutDir and pcaps.outputDirectory from *os.File to
*os.Root, updating all callers in capture.go, processor_funcs.go,
tracee.go, and the pcaps package.

Detected by: bugbunny.ai

BREAKING CHANGE: fileutil directory-relative functions now take
*os.Root instead of *os.File. OpenExistingDir, MkdirAt, and
ValidateRelativePath are removed. RenameAt takes a single root
instead of two directory handles.

--

67a0d1b fix(grpc)!: bind TCP listener to loopback by default

TCP gRPC always bound to 0.0.0.0 (all interfaces) with no way to
restrict it, no TLS, and no auth interceptors (CWE-306). Any
network-reachable client could invoke privileged RPCs such as
DisableEvent, StreamEvents, and ChangeLogLevel.

Default bare-port TCP addresses to 127.0.0.1, accept host:port
syntax (e.g. tcp:0.0.0.0:4466) for explicit wildcard bind, and
warn when binding to a non-loopback address.

Detected by: bugbunny.ai

BREAKING CHANGE: --server grpc-address=tcp:4466 now binds to
127.0.0.1:4466 instead of 0.0.0.0:4466. Use
tcp:0.0.0.0:4466 to restore the old behavior.

--

69721dd fix(server)!: bind HTTP to loopback by default

The default HTTP address was ":3366" (all interfaces), exposing
metrics, healthz, and pprof endpoints to the network without
authentication. This contradicted the documented default of
"localhost:3366".

Change the default to "127.0.0.1:3366" so the CLI matches the
documented behavior. Update the Helm chart to explicitly bind to
"0.0.0.0:3366" (required for kubelet probes and Prometheus
scraping) and add an opt-in NetworkPolicy (disabled by default)
that operators can enable to restrict HTTP ingress to specific
pods. Log a startup warning when using the loopback default so
operators notice the change immediately.

Detected by: bugbunny.ai

BREAKING CHANGE: The default HTTP server address changed from
":3366" (all interfaces) to "127.0.0.1:3366" (loopback only).
Remote Prometheus scraping or health checks will stop working
unless --server http-address=0.0.0.0:3366 is set explicitly.
Kubernetes Helm deployments are unaffected (chart sets 0.0.0.0
explicitly).

--

6edb6c4 fix(detectors): reject non-regular files in YAML loader

A FIFO or symlink-to-FIFO placed in a detector directory
causes os.ReadFile to block indefinitely, hanging the loader
goroutine and preventing detector ingestion (local DoS).
Guard LoadFromDirectory with entry.Type().IsRegular() so
non-regular files never reach any read path. Add Lstat-based
checks in peekFileType, ParseFile, and loadListFile as
defense-in-depth for direct callers.

Detected by: bugbunny.ai

--

2. Explain how to test it

3. Other comments

@geyslan geyslan requested a review from a team April 15, 2026 14:23
@geyslan geyslan self-assigned this Apr 15, 2026
@geyslan geyslan requested review from a team, Copilot and trvll and removed request for a team April 15, 2026 14:23
@geyslan geyslan requested a review from yanivagman April 15, 2026 14:24
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens Tracee against multiple security issues reported by bugbunny.ai, focusing on safer filesystem operations (symlink/TOCTOU resistance), safer defaults for network listeners, and robustness against resource-exhaustion and detector-tampering vectors.

Changes:

  • Switch artifact/output filesystem operations to traversal-resistant patterns (Go 1.24 os.Root, O_NOFOLLOW, symlink checks) and update call sites accordingly.
  • Harden network exposure by defaulting HTTP and gRPC TCP listeners to loopback and documenting migration paths (including Helm defaults + NetworkPolicy).
  • Add/adjust guards for detector loading (duplicate list names fatal, reject non-regular YAML inputs) and cap argv/envp array decoding to prevent OOM.

Reviewed changes

Copilot reviewed 35 out of 35 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tests/testutils/tracee_integrated.go Updates test artifacts output path to new default artifacts dir constant.
tests/integration/capture_test.go Updates integration test capture directory base to new artifacts dir default.
pkg/signatures/signature/signature.go Skips symlinked Go plugins before plugin.Open to reduce symlink attack surface.
pkg/server/http/server_test.go Trivial formatting change (trailing newline).
pkg/server/grpc/server_test.go Updates expected gRPC TCP address formatting to host:port form.
pkg/server/grpc/server.go Adds loopback warning for TCP binds; uses fd-based chmod for unix sockets; simplifies Address().
pkg/pcaps/pcaps.go Updates pcaps constructor to accept *os.Root output directory.
pkg/pcaps/common.go Migrates global output directory to *os.Root and adjusts related helpers/errors.
pkg/ebpf/tracee.go Migrates Tracee.OutDir to *os.Root and opens via fileutil.OpenRootDir; adapts artifacts map writer.
pkg/ebpf/capture.go Adjusts RenameAt call sites to new single-root signature.
pkg/ebpf/c/common/buffer.h Caps elem_num in ARGS_ARR_T to avoid unbounded allocations downstream.
pkg/detectors/yaml/parser.go Uses Lstat + regular-file enforcement before reading YAML to prevent FIFO/device hangs.
pkg/detectors/yaml/loader_test.go Adds regression test ensuring duplicate shared-list names abort directory load.
pkg/detectors/yaml/loader.go Skips non-regular files, aborts on duplicate list names, adds Lstat guard in type peeker.
pkg/detectors/yaml/list_loader.go Uses Lstat + regular-file enforcement for list files before reading.
pkg/detectors/events.go Escalates YAML load errors from WARN to ERROR.
pkg/cmd/initialize/bpfobject.go Uses MkdirTemp for BTF unpack directory and SafeOpenFile for BTF output creation.
pkg/cmd/flags/server_test.go Updates expectations for loopback-default server bindings (HTTP + gRPC).
pkg/cmd/flags/server.go Defaults HTTP to 127.0.0.1; parses gRPC tcp as host:port (bare port => loopback); guards unix socket symlinks.
pkg/cmd/flags/output.go Tightens output dir perms and uses SafeOpenFile for output file creation.
pkg/cmd/flags/artifacts_test.go Updates expected default artifacts output path to /var/lib/tracee/out.
pkg/cmd/flags/artifacts.go Exports DefaultArtifactsDir as /var/lib/tracee; uses SafeRemoveAll for artifact cleanup.
pkg/bufferdecoder/eventsreader.go Adds Go-side cap for ARGS_ARR_T element count as defense-in-depth.
docs/man/server.1 Documents loopback-default behavior for HTTP/gRPC and migration/security guidance.
docs/man/artifacts.1 Documents new default artifacts directory (/var/lib/tracee).
docs/docs/install/config/monitoring.md Updates monitoring docs for loopback default and adds migration guidance.
docs/docs/flags/server.1.md Updates server flag docs to reflect new TCP binding semantics and security notes.
docs/docs/flags/artifacts.1.md Updates artifacts docs to reflect new default artifacts directory.
deploy/helm/tracee/values.yaml Sets Helm default HTTP bind to 0.0.0.0:3366 and adds NetworkPolicy values.
deploy/helm/tracee/templates/networkpolicy.yaml Adds optional NetworkPolicy restricting ingress to Tracee HTTP port.
deploy/helm/tracee/templates/daemonset.yaml Uses helper to derive HTTP port for containerPort/probes.
deploy/helm/tracee/templates/_helpers.tpl Adds helper to extract HTTP port from configured address.
common/fileutil/fileutil_test.go Updates tests to os.Root APIs and adds symlink/path traversal security tests.
common/fileutil/fileutil.go Replaces openat-based helpers with Go 1.24 os.Root; adds SafeOpenFile and SafeRemoveAll.
cmd/traceectl/pkg/cmd/flags/output.go Tightens output dir perms and uses O_NOFOLLOW when creating output file.
Comments suppressed due to low confidence (1)

pkg/cmd/initialize/bpfobject.go:55

  • BpfObject now unpacks embedded BTF into a new os.MkdirTemp directory, but that temp directory is never cleaned up. Since the daemon may start repeatedly (and tests start Tracee many times), this can leave accumulating /tmp/tracee-btf-* directories on disk. Consider tracking the tempDir and removing it after the eBPF module has loaded the BTF (or copying the BTF into a deterministic cache directory under the artifacts dir).
	if !environment.OSBTFEnabled() && btfFilePath == "" {
		tempDir, mkErr := os.MkdirTemp("", "tracee-btf-*")
		if mkErr != nil {
			return errfmt.Errorf("could not create temp dir for BTF: %v", mkErr)
		}
		unpackBTFFile := filepath.Join(tempDir, "tracee.btf")
		err = unpackBTFHub(unpackBTFFile, osInfo)
		if err == nil {
			logger.Debugw("BTF: btfhub embedded BTF file", "file", unpackBTFFile)
			cfg.BTFObjPath = unpackBTFFile
		} else {
			logger.Debugw("BTF: error unpacking embedded BTFHUB file", "error", err)
		}
	}

Comment thread tests/testutils/tracee_integrated.go
Comment thread tests/integration/capture_test.go
Comment thread pkg/ebpf/tracee.go Outdated
Comment thread pkg/server/grpc/server.go
Comment thread common/fileutil/fileutil.go
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

pkg/detectors/yaml/loader.go:213

  • peekFileType uses os.ReadFile after the new Lstat check, which reads the entire YAML file into memory. This bypasses the MaxYAMLFileSize guard used in ParseFile/loadListFile and can reintroduce memory-exhaustion risk for large files in detector directories. Consider enforcing MaxYAMLFileSize here (or reading only a bounded prefix / using an io.LimitReader + yaml decoder) before unmarshalling.
func peekFileType(path string) (string, error) {
	info, err := os.Lstat(path)
	if err != nil {
		return "", err
	}
	if !info.Mode().IsRegular() {
		return "", fmt.Errorf("not a regular file: %s", path)
	}

	data, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}

Comment thread pkg/signatures/signature/signature.go
Comment thread common/fileutil/fileutil.go Outdated
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 4 comments.

Comment thread pkg/detectors/yaml/loader.go
Comment thread pkg/detectors/yaml/loader.go
Comment thread pkg/config/config.go
Comment thread pkg/cmd/flags/server.go
Comment thread pkg/config/config.go Outdated
Comment thread deploy/helm/tracee/values.yaml
Comment thread docs/docs/flags/server.1.md Outdated
@geyslan
Copy link
Copy Markdown
Member Author

geyslan commented May 27, 2026

@josedonizetti tks for reviewing it, NBD I'll return to that. Cheers.

Copilot AI review requested due to automatic review settings May 29, 2026 21:10
@geyslan geyslan force-pushed the bugbunny-findings branch from f3af0e2 to 0a89929 Compare May 29, 2026 21:10
geyslan added 4 commits May 29, 2026 18:10
A FIFO or symlink-to-FIFO placed in a detector directory
causes os.ReadFile to block indefinitely, hanging the loader
goroutine and preventing detector ingestion (local DoS).
Guard LoadFromDirectory with entry.Type().IsRegular() so
non-regular files never reach any read path. Add Lstat-based
checks in peekFileType, ParseFile, and loadListFile as
defense-in-depth for direct callers.

Detected by: bugbunny.ai
The default HTTP address was ":3366" (all interfaces), exposing
metrics, healthz, and pprof endpoints to the network without
authentication. This contradicted the documented default of
"localhost:3366".

Change the default to "127.0.0.1:3366" so the CLI matches the
documented behavior. Update the Helm chart to explicitly bind to
"0.0.0.0:3366" (required for kubelet probes and Prometheus
scraping) and add an opt-in NetworkPolicy (disabled by default)
that operators can enable to restrict HTTP ingress to specific
pods. Log a startup warning when using the loopback default so
operators notice the change immediately.

Detected by: bugbunny.ai

BREAKING CHANGE: The default HTTP server address changed from
":3366" (all interfaces) to "127.0.0.1:3366" (loopback only).
Remote Prometheus scraping or health checks will stop working
unless --server http-address=0.0.0.0:3366 is set explicitly.
Kubernetes Helm deployments are unaffected (chart sets 0.0.0.0
explicitly).
TCP gRPC always bound to 0.0.0.0 (all interfaces) with no way to
restrict it, no TLS, and no auth interceptors (CWE-306). Any
network-reachable client could invoke privileged RPCs such as
DisableEvent, StreamEvents, and ChangeLogLevel.

Default bare-port TCP addresses to 127.0.0.1, accept host:port
syntax (e.g. tcp:0.0.0.0:4466) for explicit wildcard bind, and
warn when binding to a non-loopback address.

Detected by: bugbunny.ai

BREAKING CHANGE: --server grpc-address=tcp:4466 now binds to
127.0.0.1:4466 instead of 0.0.0.0:4466. Use
tcp:0.0.0.0:4466 to restore the old behavior.
Replace custom openat/mkdirat/renameat syscall wrappers with Go 1.24
os.Root, which confines all file operations to a directory tree.
Symlinks escaping the root and ".." traversals are rejected by the
kernel. OpenRootDir adds O_NOFOLLOW so a symlinked output directory
itself is also rejected (ELOOP).

Migrate Tracee.OutDir and pcaps.outputDirectory from *os.File to
*os.Root, updating all callers in capture.go, processor_funcs.go,
tracee.go, and the pcaps package.

Detected by: bugbunny.ai

BREAKING CHANGE: fileutil directory-relative functions now take
*os.Root instead of *os.File. OpenExistingDir, MkdirAt, and
ValidateRelativePath are removed. RenameAt takes a single root
instead of two directory handles.
geyslan added 5 commits May 29, 2026 18:10
Add SafeOpenFile (O_NOFOLLOW) and SafeRemoveAll (Lstat guard)
to fileutil and migrate all remaining file operation call sites
to resist symlink attacks and TOCTOU races:

- Move default artifacts dir from /tmp/tracee to /var/lib/tracee
- Replace os.RemoveAll with SafeRemoveAll for artifact cleanup
- Replace os.Create/os.OpenFile with SafeOpenFile in output,
  traceectl, and BTF unpack paths
- Tighten MkdirAll permissions from 0755 to 0750
- Use os.Lstat to reject symlinked unix socket paths
- Replace os.Chmod with fd-based Fchmod on unix socket listener
- Add os.Lstat guard before plugin.Open to skip symlinked .so
- Replace predictable /tmp/tracee-btf with os.MkdirTemp
- Export DefaultArtifactsDir constant for test/caller use

BREAKING CHANGE: default artifacts directory moved from
/tmp/tracee to /var/lib/tracee; symlinked unix socket paths and
plugin .so files are now rejected; output directory permissions
tightened from 0755 to 0750.
An attacker can exec processes with huge argv/envp arrays,
causing the Go decoder to allocate unbounded slices from the
uncapped elem_num field -- leading to OOM kills under memory
pressure.

Cap elem_num to MAX_ARR_LEN (8192) in the eBPF helper before
it reaches the perf buffer, and add a matching Go-side cap as
defense-in-depth.

Detected by: bugbunny.ai
Duplicate shared-list names were handled fail-open: the loader
kept the first-seen value and logged a warning. Because
os.ReadDir returns files in lexical order, an attacker who can
write to the detector directory deterministically overrides any
legitimate list by choosing a lexically-earlier filename --
silently tampering detection rules without modifying the
original file.

Treat duplicate list names as fatal (matching the existing
fatal handling for duplicate event names) and escalate loader
errors from WARN to ERROR.

Detected by: bugbunny.ai
The Makefile hardcoded origin/main as the base ref, breaking
workflows that name the upstream remote differently (e.g.
upstream, private). checkpatch.sh now probes remotes in
preference order (upstream, private, origin, then any other)
and all Make targets forward BASE_REF only when explicitly set.

Remove the unused LOGFROM variable.
Consolidate BTF path, object bytes, and uprobe binary path under
BPFConfig. Track the temp directory explicitly via BTFTempDir so
Cleanup only removes directories we created, never user-supplied
BTF paths. Clear object bytes after module creation so memory can
be reclaimed. Remove the BTFHub temp tree when embedded unpack
fails.
@geyslan geyslan force-pushed the bugbunny-findings branch from 0a89929 to 15cb88e Compare May 29, 2026 21:10
@geyslan geyslan requested a review from josedonizetti May 29, 2026 21:11
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants