Description
Summary
The auto_disarm_on_error: true default undermines the fault isolation benefits of the topological supervision architecture.
The Tension
BB's supervision architecture is designed around fault isolation - crashes propagate only within affected subtrees:
> If an elbow servo fails, it shouldn't affect the shoulder. The supervision tree enforces this - crashes propagate only within affected subtrees, unaffected parts of the robot continue operating.
However, auto_disarm_on_error defaults to true, meaning any hardware error reported via BB.Safety.report_error/3 triggers a robot-wide disarm. This negates the isolation benefits:
- A single servo overheating disarms the entire robot
- A transient I2C glitch on one bus stops all motion
- The careful fault isolation in the supervision tree becomes meaningless
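To make the isolation claim concrete, here is a minimal sketch in plain OTP, not BB's actual tree. The module name `IsolationDemo`, the limb and joint names, and the use of `Agent` processes as stand-ins for joint drivers are all assumptions for illustration.

```elixir
defmodule IsolationDemo do
  # Hypothetical layout: one supervisor per limb, one Agent per joint.
  # BB's real tree is generated from its DSL and will differ.
  def start do
    children = [
      limb_spec(:arm, [:shoulder, :elbow]),
      limb_spec(:leg, [:hip])
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end

  # Builds a child spec for a limb supervisor that owns its joints.
  defp limb_spec(limb, joints) do
    joint_specs =
      for joint <- joints do
        Supervisor.child_spec({Agent, fn -> %{joint: joint} end}, id: joint)
      end

    %{
      id: limb,
      type: :supervisor,
      start: {Supervisor, :start_link, [joint_specs, [strategy: :one_for_one]]}
    }
  end
end
```

Killing the `:elbow` process here restarts only that child; `:shoulder` keeps its pid and the `:leg` subtree is untouched. A robot-wide disarm on every reported error removes exactly that locality.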
Current Behaviour
    # In lib/bb/dsl.ex
    auto_disarm_on_error: [
      type: :boolean,
      doc: "Automatically disarm the robot when a hardware error is reported. Defaults to true.",
      default: true
    ]

When any component calls BB.Safety.report_error/3, the entire robot disarms by default.
Proposed Change
Consider changing the default to false, aligning with the supervision philosophy:
- Report errors via PubSub: always publish to [:safety, :error] for monitoring
- Let supervision handle restarts: crashed processes restart within their subtree
- Let applications decide: subscribe to error events and implement appropriate responses
- Opt in to auto-disarm: users who want conservative behaviour can set auto_disarm_on_error: true
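As a rough illustration of the "let applications decide" point, the sketch below shows an application-level process that tolerates a few errors per component before escalating. Everything in it is hypothetical: the `ErrorWatcher` module, the `{:safety_error, path, reason}` message shape, and the `:on_escalate` callback are assumptions, and how BB's PubSub would actually deliver error events to such a process is not shown.

```elixir
defmodule ErrorWatcher do
  # Hypothetical application-level handler; not part of BB.
  use GenServer

  # opts: :threshold (errors per component before escalating, default 3)
  # and :on_escalate (1-arity callback, e.g. one that disarms the robot).
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    state = %{
      counts: %{},
      threshold: Keyword.get(opts, :threshold, 3),
      on_escalate: Keyword.fetch!(opts, :on_escalate)
    }

    {:ok, state}
  end

  # Assumed message shape; the real PubSub payload may differ.
  @impl true
  def handle_info({:safety_error, path, _reason}, state) do
    count = Map.get(state.counts, path, 0) + 1

    # Escalate once, at the moment the threshold is reached.
    if count == state.threshold, do: state.on_escalate.(path)

    {:noreply, %{state | counts: Map.put(state.counts, path, count)}}
  end

  def handle_info(_other, state), do: {:noreply, state}
end
```

With a threshold of 1 and a callback that disarms the robot, this reproduces today's conservative behaviour as an explicit application choice rather than a framework default.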
Alternative: Scoped Error Responses
A more sophisticated approach could allow scoped responses:
- Errors in a limb subtree could disable just that limb
- Critical errors (power, safety controller) could still trigger full disarm
- This would require distinguishing error severity/scope
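One way to picture the scoped approach is as a pure policy function from severity and component path to a response. The severity levels, the path layout (limb first), and the response atoms below are assumptions for illustration; none of them exist in BB today.

```elixir
defmodule ScopedResponse do
  # Hypothetical policy; severity levels and path layout are assumptions.
  @type severity :: :transient | :degraded | :critical
  @type response :: :log_only | {:disable_subtree, [atom()]} | :disarm_robot

  @spec decide(severity, [atom()]) :: response
  # Critical errors (power, safety controller) still stop everything.
  def decide(:critical, _path), do: :disarm_robot
  # Degraded hardware disables only the limb that contains it.
  def decide(:degraded, [limb | _rest]), do: {:disable_subtree, [limb]}
  # Without a known location, fall back to the conservative response.
  def decide(:degraded, []), do: :disarm_robot
  # Transient glitches are reported but do not stop motion.
  def decide(:transient, _path), do: :log_only
end
```

For example, `ScopedResponse.decide(:degraded, [:left_arm, :elbow])` yields `{:disable_subtree, [:left_arm]}`, while any `:critical` error yields `:disarm_robot` regardless of where it originated.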
Impact
This is a breaking change in default behaviour. However, the current default seems to be "safe but defeats the purpose of the architecture" rather than "correct default for production systems".
Users who have explicitly set auto_disarm_on_error: false would be unaffected. Users relying on the current default would need to set auto_disarm_on_error: true explicitly to preserve that behaviour.