tests/zedagent: add zedagent integration test suite #1152

Open
eriknordmark wants to merge 8 commits into lf-edge:master from eriknordmark:zedagent-integration-tests

eriknordmark (Contributor) commented May 4, 2026

Summary

Adds an Eden testscript suite under tests/zedagent/ that exercises
the zedagent microservice end-to-end against a live EVE instance:
device info, metrics, app/NI info, attestation FSM, and the
/config/-ingest paths consulted at boot. Ten scenarios in two
manifests.

eden.zedagent.tests.txt (eight tests, run against an onboarded EVE):
device_info_completeness, config_items_and_status,
maintenance_mode, app_metrics_detail,
network_instance_info_metrics, attest_flow (skips without
eve.tpm), bootstrap_config_item_ingest, global_config_file_ingest.

The two /config-ingest scenarios stage a /config/bootstrap-config.pb
or /config/GlobalConfig/global.json after the device is onboarded,
freeze /persist/checkpoint so zedagent cannot recreate lastconfig,
and reboot. bootstrap_config_item_ingest polls
/run/zedagent/ConfigItemValueMap/global.json for its marker (which
survives because adam echoes the same configItems back).
global_config_file_ingest greps the newlog for the
"/config/GlobalConfig contains:" notice instead, because
parseConfigItems on adam's first fetch rebuilds globalConfig from
defaults + adam's-items and wipes any item adam doesn't push — the log
line is the only persistent signal.

eden.zedagent-preonboard.tests.txt (two tests, each owns its full
eden lifecycle — stop, wipe, eden setup with --eve-bootstrap-file
or --eve-config-dir, eden start, no onboard):
bootstrap_config_item_ingest_preonboard,
global_config_file_ingest_preonboard. With adam never learning the
device exists, /config-sourced ConfigItemValueMap stays authoritative
throughout the verification window — no controller-takeover race, no
log-grep fallback needed.

debug.enable.ssh rides inside the staged config in both preonboard
tests and gates the iptables opening for sshd via SSHAuthorizedKeys.
SSH readiness doubles as a canary for the lf-edge/eve#5584 /
lf-edge/eve#5775 cert-chain regression class: a zedagent that rejects
the bootstrap pb (or fails to apply GlobalConfig) leaves
SSHAuthorizedKeys unset, so the test times out. That is a loud,
race-free signal.

zedagent_test.go provides TestInfo, TestMetric, and TestFlowLog
helpers that the testscripts invoke via the test command. The
preonboard scenarios use preonboard-template.json from #1165's
tests/network/testdata/.

Test plan

  • eden test ./tests/zedagent against an onboarded EVE — all eight
    post-onboard tests pass; attest_flow self-skips on non-TPM
    QEMU.
  • eden test ./tests/zedagent -s eden.zedagent-preonboard.tests.txt
    — both preonboard tests pass (~3.5 min total).

🤖 Generated with Claude Code

eriknordmark and others added 8 commits May 19, 2026 23:55
Add two escript scenarios covering the NIM startup file-ingest path that
no Go unit test reaches: cmd/nim/nim.go ingestDevicePortConfig() and
ingestDevicePortConfigFile(), plus the hasPersistLastconfig() short-circuit.

nim_lastconfig_blocks_ingest verifies that with /persist/checkpoint/lastconfig
present NIM emits the explicit suppression log line, leaves
/run/global/DevicePortConfig/ empty, and adds no "override" entry to
DevicePortConfigList.

nim_override_json_ingest verifies that with lastconfig deleted and the
/persist/checkpoint directory chattr +i'd to defeat zedagent's race to
recreate it, NIM picks up an override.json under /config/DevicePortConfig/,
copies it to /run/global/DevicePortConfig/, registers the "override" key
in DevicePortConfigList, and stamps ConfigSource.Origin = OVERRIDE (=3).

Both scripts mount /dev/sda4 (vfat CONFIG partition) at runtime to inject
the override file, since /config is a read-only tmpfs at runtime in the
QEMU/LinuxKit EVE image. They also document a non-obvious eden CLI gotcha:
`eden eve ssh '<multi-line>'` collapses newlines to spaces, so all
multi-command shell snippets must be joined with `;` or `&&` on a single
line.
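
The newline gotcha above can be guarded against on the host side. The following is a minimal sketch, not code from this PR: `joinForSSH` is a hypothetical helper, and the `/run/cfg` mount point in the usage is an illustrative path. It folds several commands into one line with `;` or `&&` and refuses any command that already spans lines.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// joinForSSH (hypothetical helper) folds several shell commands into a
// single line so that a transport which collapses newlines to spaces
// cannot silently merge them. sep must be ";" or "&&".
func joinForSSH(sep string, cmds ...string) (string, error) {
	if sep != ";" && sep != "&&" {
		return "", errors.New("separator must be ; or &&")
	}
	parts := make([]string, 0, len(cmds))
	for _, c := range cmds {
		c = strings.TrimSpace(c)
		if c == "" {
			continue // skip empty entries rather than emitting ";;"
		}
		if strings.ContainsAny(c, "\r\n") {
			return "", fmt.Errorf("command %q spans multiple lines", c)
		}
		parts = append(parts, c)
	}
	if len(parts) == 0 {
		return "", errors.New("no commands given")
	}
	return strings.Join(parts, " "+sep+" "), nil
}

func main() {
	// /run/cfg is an assumed mount point used only for illustration.
	line, err := joinForSSH("&&", "mkdir -p /run/cfg", "mount /dev/sda4 /run/cfg")
	if err != nil {
		panic(err)
	}
	fmt.Println(line)
}
```

Rejecting embedded newlines up front turns the silent collapse into an explicit error at script-authoring time.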

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extend the NIM startup-matrix coverage from 2 to 6 of the 6 rows in the
matrix documented in pkg/pillar/docs/nim-eden-test-plan.md.

nim_usb_json_ingest (R1, sibling of override case) — verifies that
ingestDevicePortConfigFile() derives DPC.Key from the file basename
when the JSON has no explicit Key field, so a usb.json results in
DPCList["usb"] not DPCList["override"].
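
The basename rule can be illustrated in a few lines. This is a sketch of the behaviour the test asserts, not the code in ingestDevicePortConfigFile(); the function name `dpcKey` and its signature are assumptions made for the example.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// dpcKey (hypothetical) returns the explicit key when the JSON supplied
// one, and otherwise falls back to the file basename without extension.
func dpcKey(file, explicitKey string) string {
	if explicitKey != "" {
		return explicitKey
	}
	base := path.Base(file)
	return strings.TrimSuffix(base, path.Ext(base))
}

func main() {
	fmt.Println(dpcKey("/config/DevicePortConfig/usb.json", ""))      // usb
	fmt.Println(dpcKey("/config/DevicePortConfig/override.json", "")) // override
}
```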

nim_bootstrap_supersedes_override (R3) — verifies cmd/nim's bootstrap-
skip branch: when /config/bootstrap-config.pb is present, legacy
*.json files under /config/DevicePortConfig/ are NOT copied into
/run/global/. The test deliberately uses a 1-byte placeholder pb
because cmd/nim only checks file existence, not content.

nim_lastconfig_blocks_bootstrap (R6) — verifies the
expectBootstrapDPCs reset branch at nim.go:214: with lastconfig
present, NIM does not wait for an installer DPC even if
bootstrap-config.pb exists. Asserts steady-state (DPCList has
zedagent entry) and that the pb file is not consumed/deleted.

nim_bootstrap_only (R2) — exercises the full bootstrap-pb decode path
end-to-end: signed pb on /config, lastconfig deleted, NIM ingests via
zedagent's republish. Currently `skip`'d via [!exec:nim-bootstrap-pb-gen]
until that host-side helper is added; the binary's specification is
documented in the file's header.

All four reuse the patterns established by nim_lastconfig_blocks_ingest
and nim_override_json_ingest: /dev/sda4 mount for /config writes,
chattr +i on /persist/checkpoint to defeat zedagent's lastconfig-
recreate race, semicolon-joined ssh commands (multi-line single-quoted
ssh strings collapse newlines to spaces), and DPCList polling (durable)
rather than /run/global/ polling (tmpfs-wiped).
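
The branches these scenarios probe can be condensed into a small decision function. This is one reading of the behaviour described in the commit messages, not a transcription of cmd/nim/nim.go; the type and field names are invented for the sketch, and it does not attempt to enumerate all six matrix rows.

```go
package main

import "fmt"

// startupInputs describes what NIM finds on disk at boot (names assumed).
type startupInputs struct {
	HasLastconfig  bool // /persist/checkpoint/lastconfig exists
	HasBootstrapPB bool // /config/bootstrap-config.pb exists
	LegacyJSONs    int  // number of *.json under /config/DevicePortConfig/
}

// startupPlan is the expected outcome the tests assert on (names assumed).
type startupPlan struct {
	IngestLegacyJSON   bool // copy *.json into /run/global/DevicePortConfig/
	ExpectBootstrapDPC bool // wait for a DPC derived from the bootstrap pb
}

// planStartup mirrors the order of checks implied by the test descriptions:
// lastconfig short-circuits everything, then bootstrap supersedes legacy JSON.
func planStartup(in startupInputs) startupPlan {
	switch {
	case in.HasLastconfig:
		return startupPlan{}
	case in.HasBootstrapPB:
		return startupPlan{ExpectBootstrapDPC: true}
	default:
		return startupPlan{IngestLegacyJSON: in.LegacyJSONs > 0}
	}
}

func main() {
	fmt.Printf("%+v\n", planStartup(startupInputs{HasLastconfig: true, HasBootstrapPB: true}))
	fmt.Printf("%+v\n", planStartup(startupInputs{HasBootstrapPB: true, LegacyJSONs: 1}))
	fmt.Printf("%+v\n", planStartup(startupInputs{LegacyJSONs: 1}))
}
```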

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds tests/network/cmd/nim-bootstrap-pb-gen, a host-side helper that
signs an EdgeDevConfig JSON into a /config/bootstrap-config.pb using
the eden controller signing key. The nim_bootstrap_*.txt testdata
invokes it at runtime to stage controllable bootstrap configs and
verify content-level round-trip behavior — a distinctive
Logicallabel baked into the bootstrap pb reappears in the device's
DevicePortConfigList, confirming end-to-end propagation from
/config to NIM's pubsub state.

The testdata writes to the CONFIG partition via
`eve config mount /run/<path>`, a device-agnostic interface that
exposes the persistent partition read/write regardless of which
block device backs it.

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
nim_dpcl_reapplied_after_reboot — two-reboot test that isolates the
pubsub persistent-publication reload path from any re-ingestion.
First reboot stages and ingests an override.json under a distinctive
Logicallabel "reapply-test" with TimePriority=1990-01-01 (the entry
stays in DPCList without becoming the active DPC, so eth0's
effective Logicallabel is unchanged for subsequent tests). Second
reboot removes the file and clears the chattr +i directory flag so
the override CANNOT be re-ingested. The DPCList entry surviving the
second reboot is positive evidence that NIM's pubDevicePortConfigList
(Persistent:true) is being reloaded on agent startup. Without this
isolation, every other test re-ingests the file it observes, masking
a regression in the persistent reload path.

nim_flowlog_acl_reconcile — toggle test that drives the flowlog
gate end to end: NetworkInstanceConfig with EnableFlowlog=false
produces no CONNMARK marking rules in iptables mangle table; with
EnableFlowlog=true the "SSH and Guacamole mark" multiport rule
(matching tcp dports 22,4822) appears via DpcReconciler's
getIntendedMarkingRules; deleting the NI removes the rules again.
Skips itself on kube EVE because r.HVTypeKube unconditionally
installs the marking rules regardless of EnableFlowlog, which
makes the iptables witness incapable of distinguishing the two
states.

Both tests use the established test idioms (eve config mount,
eve exec pillar jq, single-line ssh, defensive pre-cleanup,
3-consecutive-success ssh stabilization).
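
The "3-consecutive-success" idiom is worth spelling out, since a single successful probe right after a reboot can be a false positive. Below is a minimal sketch under stated assumptions: `waitStable` and `errNotReady` are hypothetical names, and the probe is an arbitrary callback rather than a real ssh invocation.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotReady is a placeholder error for the demo probe.
var errNotReady = errors.New("not ready")

// waitStable (hypothetical) calls probe until it has succeeded `need`
// times in a row, resetting the streak on any failure. It gives up after
// maxAttempts probes in total.
func waitStable(probe func() error, need, maxAttempts int, interval time.Duration) error {
	streak := 0
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err := probe(); err != nil {
			streak = 0 // one failure discards all earlier successes
		} else {
			streak++
			if streak >= need {
				return nil
			}
		}
		if attempt < maxAttempts {
			time.Sleep(interval)
		}
	}
	return fmt.Errorf("no %d consecutive successes within %d attempts", need, maxAttempts)
}

func main() {
	calls := 0
	probe := func() error {
		calls++
		if calls <= 2 {
			return errNotReady // simulate a device that is still booting
		}
		return nil
	}
	err := waitStable(probe, 3, 10, 10*time.Millisecond)
	fmt.Println(err, calls)
}
```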

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
LPS multi-port + wireless mismatch, diag-endpoint GCP propagation)

radio_silence_persistence (eclient) — extends radio_silence.txt by
toggling radio silence ON via the local-manager LPS server, then
rebooting EVE and asserting that
/run/zedagent/ZedAgentStatus/zedagent.json on the device still
carries RadioSilence.Imposed=true after the reboot (the pubsub
topic file is keyed by agent name, so the path is zedagent.json,
not global.json). The assertion reads EVE-side pubsub directly so
it does not depend on the LPS app having reconnected. Exercises
zedagent's lastradioconfig persistence on disk and NIM's
subZedAgentStatus subscription delivering the restored
ZedAgentStatus on a fresh boot through cmd/nim.handleZedAgentStatusImpl.

A reset-radio-state.sh helper wipes /persist/checkpoint/lastradioconfig
and reboots before the test starts, so the precondition
wait-radio-status=false converges even if a prior test or a prior
failed run left lastradioconfig with Imposed=true. The substantive
test claim is the post-reboot Imposed=true read; the test does not
depend on a post-reboot toggle-OFF round-trip, which can hang on
the EVE write path.
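
The post-reboot assertion amounts to reading one boolean out of the published JSON. A minimal sketch follows, assuming the file carries a top-level RadioSilence object with an Imposed field as the description implies; the struct and function names are invented, and the sample document is not real device output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// zedAgentStatus models only the field the test reads; the real
// structure has many more fields, which json.Unmarshal ignores.
type zedAgentStatus struct {
	RadioSilence struct {
		Imposed bool `json:"Imposed"`
	} `json:"RadioSilence"`
}

// radioSilenceImposed (hypothetical) extracts RadioSilence.Imposed from
// the contents of /run/zedagent/ZedAgentStatus/zedagent.json.
func radioSilenceImposed(data []byte) (bool, error) {
	var s zedAgentStatus
	if err := json.Unmarshal(data, &s); err != nil {
		return false, err
	}
	return s.RadioSilence.Imposed, nil
}

func main() {
	// Hand-written sample, not captured from a device.
	sample := []byte(`{"Name":"zedagent","RadioSilence":{"Imposed":true}}`)
	imposed, err := radioSilenceImposed(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(imposed)
}
```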

lps_all_mgmt_ports_overridden (eclient) — applies an LPS local
network config that overrides BOTH eth0 (DNS) and eth1 (MTU) on the
QEMU device model where both adapters are management uplinks. After
both ports flip to configApplied=true via the local-manager API, the
test asserts the behavioural witnesses on the device — resolv.conf
shows the LPS-supplied DNS, ifconfig shows the LPS-supplied MTU.
Exercises dpcmanager/lps.go loadLpsConfig and mergeWithLpsConfig in
the multi-port case — the structural precondition of
areAllMgmtPortUsingLpsConfig() suppression. (The actual fallback
suppression behaviour requires SDN-driven controller-failure
injection and is out of scope.)

lps_wireless_type_mismatch (eclient) — sends an LPS config for eth0
with WirelessDeviceType=WIRELESS_TYPE_WIFI on the QEMU model where
eth0 has WirelessType=None. The pre-merge guard in
mergeWithLpsConfig rejects the LPS port. Test asserts the
distinctive "wireless type mismatch" error string in the
local-manager network-info, configApplied=false on the LPS side,
and configApplied=true on the controller side. The test currently
skips cleanly on the stock QEMU device model: both adapters are
Ethernet, so any LPS DPC with WIRELESS_TYPE_WIFI is rejected upstream
with "missing WiFi configuration" before it reaches the branch under
test. Activating this test requires a custom devmodel.json with a
wireless port.

nim_diag_remote_endpoints (network) — sets
diag.probe.remote.http.endpoint via eden controller update and
asserts the new value reaches
/run/zedagent/ConfigItemValueMap/global.json under
.GlobalSettings["diag.probe.remote.http.endpoint"].StrValue.
Tests the upstream propagation (eden→adam→zedagent) only; the
downstream NIM consumption (in-memory connTester.DiagRemoteEndpoints
and the actual probes on controller failure) needs SDN-driven
failure injection and is documented as out of scope.
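
The lookup path `.GlobalSettings["<key>"].StrValue` translates directly into a typed decode. The sketch below assumes only the shape that path implies; the type and function names are invented, and the endpoint value is a placeholder rather than anything the test sets.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// configItemValue models only the string form of an item.
type configItemValue struct {
	StrValue string `json:"StrValue"`
}

// configItemValueMap models only the GlobalSettings map.
type configItemValueMap struct {
	GlobalSettings map[string]configItemValue `json:"GlobalSettings"`
}

// globalSettingStr (hypothetical) returns the StrValue stored under key
// and whether the key was present at all.
func globalSettingStr(data []byte, key string) (string, bool, error) {
	var m configItemValueMap
	if err := json.Unmarshal(data, &m); err != nil {
		return "", false, err
	}
	v, ok := m.GlobalSettings[key]
	return v.StrValue, ok, nil
}

func main() {
	// Hand-written sample with a placeholder endpoint.
	sample := []byte(`{"GlobalSettings":{"diag.probe.remote.http.endpoint":{"StrValue":"http://probe.example.test"}}}`)
	val, ok, err := globalSettingStr(sample, "diag.probe.remote.http.endpoint")
	if err != nil {
		panic(err)
	}
	fmt.Println(val, ok)
}
```

Returning the presence flag separately lets a poller distinguish "item not yet propagated" from "item propagated with an empty value".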

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds four pre-onboard variants of the NIM /config-ingest tests:

- nim_bootstrap_only_preonboard
- nim_bootstrap_supersedes_override_preonboard
- nim_override_only_preonboard
- nim_globalconfig_only_preonboard

Each test owns its full eden lifecycle (stop, wipe, eden setup with
--eve-bootstrap-file or --eve-config-dir, eden start; no onboard).
With adam never learning the device exists, /config-sourced state
stays authoritative for the entire verification window, closing the
controller-takeover race that makes the onboarded
nim_bootstrap_supersedes_override test a known false negative.

debug.enable.ssh rides inside the staged config in every variant
and gates sshd's iptables open via SSHAuthorizedKeys. SSH readiness
doubles as the bug-class canary for the lf-edge/eve#5584/#5775
cert-chain regression class: when zedagent rejects the bootstrap pb
(or fails to apply GlobalConfig), SSHAuthorizedKeys is never set,
the iptables INPUT rule for tcp/22 stays REJECT, and the test
times out — a race-free signal.

The four tests share preonboard-template.json (a sanitized
EdgeDevConfig template) and are grouped in
eden.network-preonboard.tests.txt for invocation via
`eden test ./tests/network -s eden.network-preonboard.tests.txt`.
tests/network/Makefile's setup target globs *.tests.txt so any
additional manifest in the directory stages alongside the default
eden.network.tests.txt automatically.

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Yetus's detsecrets (detect-secrets v1.2.0) plugin flags the
literal password value as Secret Keyword. The value is
meaningless -- the test rejects the port on wireless-type
mismatch before any credentials are inspected -- but the keyword
detector pattern-matches "password": "<any-string>" and trips.
detect_secrets/filters/heuristic.py is_templated_secret exempts
values shaped like <foo>, so use "<redacted>" as the sentinel.
Test still exercises the same code path with no behavioural
change.
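
For illustration, the angle-bracket exemption can be approximated in a few lines. This is a sketch of the one shape named above, written in Go for consistency with the rest of the suite; it is not a port of detect-secrets, and `looksTemplated` is a hypothetical name.

```go
package main

import "fmt"

// looksTemplated (hypothetical) reports whether a value is wrapped in
// angle brackets, the placeholder shape this commit relies on.
func looksTemplated(value string) bool {
	return len(value) >= 2 && value[0] == '<' && value[len(value)-1] == '>'
}

func main() {
	fmt.Println(looksTemplated("<redacted>")) // true
	fmt.Println(looksTemplated("hunter2"))    // false
}
```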

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds an Eden testscript suite under tests/zedagent/ that exercises the
zedagent microservice end-to-end against a live EVE instance: device
info, metrics, app/NI info, attestation FSM, and the /config/-ingest
paths consulted at boot. Ten scenarios in two manifests.

eden.zedagent.tests.txt (eight tests, run against an onboarded EVE):
device_info_completeness, config_items_and_status, maintenance_mode,
app_metrics_detail, network_instance_info_metrics, attest_flow (skips
without eve.tpm), bootstrap_config_item_ingest,
global_config_file_ingest. The two /config-ingest scenarios stage a
bootstrap-config.pb or GlobalConfig/global.json after the device is
onboarded, freeze /persist/checkpoint so zedagent cannot recreate
lastconfig, and reboot. bootstrap_config_item_ingest polls
/run/zedagent/ConfigItemValueMap/global.json for its marker (which
survives because adam echoes the same configItems back).
global_config_file_ingest greps the newlog for the
"/config/GlobalConfig contains:" notice instead, because
parseConfigItems on adam's first fetch rebuilds globalConfig from
defaults+adam's-items and wipes any item adam doesn't push — the log
line is the only persistent signal.

eden.zedagent-preonboard.tests.txt (two tests, each owns its full eden
lifecycle — stop, wipe, eden setup with --eve-bootstrap-file or
--eve-config-dir, eden start, no onboard):
bootstrap_config_item_ingest_preonboard,
global_config_file_ingest_preonboard. With adam never learning the
device exists, /config-sourced ConfigItemValueMap stays authoritative
throughout the verification window — no controller-takeover race, no
log-grep fallback needed.

debug.enable.ssh rides inside the staged config in both preonboard
tests and gates sshd's iptables open via SSHAuthorizedKeys. SSH
readiness doubles as the bug-class canary for the
lf-edge/eve#5584/#5775 cert-chain regression class: a zedagent that
rejects the bootstrap pb (or fails to apply GlobalConfig) leaves
SSHAuthorizedKeys unset and the test times out — a loud, race-free
signal.

zedagent_test.go provides TestInfo, TestMetric, and TestFlowLog
helpers that the testscripts invoke via the `test` command. The
preonboard scenarios use preonboard-template.json from PR lf-edge#1165's
tests/network/testdata/. The post-onboard suite was validated against
a QEMU-based coverage-instrumented EVE; the six baseline scenarios
plus the two /config-ingest tests achieve substantially higher
cmd/zedagent coverage than the unit tests alone.

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
eriknordmark force-pushed the zedagent-integration-tests branch from f6d75b5 to ac1b2d3 on May 19, 2026 22:08