auto_disarm_on_error default contradicts supervision architecture philosophy #36

@jimsynz

Summary

The auto_disarm_on_error: true default undermines the fault isolation benefits of the topological supervision architecture.

The Tension

BB's supervision architecture is designed around fault isolation:

If an elbow servo fails, it shouldn't affect the shoulder. The supervision tree enforces this - crashes propagate only within affected subtrees, unaffected parts of the robot continue operating.

However, auto_disarm_on_error defaults to true, meaning any hardware error reported via BB.Safety.report_error/3 triggers a robot-wide disarm. This negates the isolation benefits:

  • A single servo overheating disarms the entire robot
  • A transient I2C glitch on one bus stops all motion
  • The careful fault isolation in the supervision tree becomes meaningless

Current Behaviour

# In lib/bb/dsl.ex
auto_disarm_on_error: [
  type: :boolean,
  doc: "Automatically disarm the robot when a hardware error is reported. Defaults to true.",
  default: true
]

When any component calls BB.Safety.report_error/3, the entire robot disarms by default.

Proposed Change

Consider changing the default to false, aligning with the supervision philosophy:

  1. Report errors via PubSub - always publish to [:safety, :error] for monitoring
  2. Let supervision handle restarts - crashed processes restart within their subtree
  3. Let applications decide - subscribe to error events and implement appropriate responses
  4. Opt in to auto-disarm - users who want conservative behaviour can set auto_disarm_on_error: true
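
To make step 3 concrete, here is a minimal sketch of what an application-level error policy might look like. Everything in it is an assumption for illustration: the module name, the `{:safety_error, component, reason}` message shape, and the threshold policy are hypothetical and are not BB's actual API. The call that would subscribe the process to [:safety, :error] is left out on purpose, because the subscription API is not shown in this issue.

```elixir
defmodule ErrorPolicySketch do
  @moduledoc """
  Hypothetical subscriber that counts errors per component and asks for a
  disarm only once a component has failed repeatedly.
  """
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    # A real application would subscribe to [:safety, :error] here.
    # That call is omitted because BB's PubSub API is not shown in this issue.
    state = %{
      threshold: Keyword.get(opts, :threshold, 3),
      notify: Keyword.fetch!(opts, :notify),
      counts: %{}
    }

    {:ok, state}
  end

  @impl true
  # Assumed message shape; the real error event may look different.
  def handle_info({:safety_error, component, _reason}, state) do
    count = Map.get(state.counts, component, 0) + 1

    # Below the threshold the error is only counted, leaving recovery to the
    # supervision tree. At or above it, the policy asks for a disarm.
    if count >= state.threshold do
      send(state.notify, {:disarm_requested, component})
    end

    {:noreply, %{state | counts: Map.put(state.counts, component, count)}}
  end
end
```

With a policy like this, a single transient I2C glitch is counted and otherwise ignored, while a component that keeps failing still leads to a disarm request. The threshold and the response are the application's decision rather than a framework default.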

Alternative: Scoped Error Responses

A more sophisticated approach could allow scoped responses:

  • Errors in a limb subtree could disable just that limb
  • Critical errors (power, safety controller) could still trigger full disarm
  • This would require distinguishing error severity/scope
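
As a rough illustration of what distinguishing severity and scope could mean, the sketch below maps an error to a response with plain pattern matching. The severity levels, the representation of scope as a path of atoms from the robot root, and the response tuples are all hypothetical; none of them exist in BB today.

```elixir
defmodule ScopedResponseSketch do
  @moduledoc """
  Hypothetical mapping from error severity and location to a response.
  The path is the list of subtree names from the robot root to the failing
  component, e.g. [:left_arm, :elbow].
  """

  @type severity :: :warning | :error | :critical
  @type response :: :log_only | {:disable_subtree, [atom()]} | :disarm_robot

  @spec response(severity(), [atom()]) :: response()
  # Critical errors (power, safety controller) always disarm the whole robot.
  def response(:critical, _path), do: :disarm_robot

  # An ordinary error inside a limb disables only that limb's subtree.
  def response(:error, [limb | _rest]), do: {:disable_subtree, [limb]}

  # An ordinary error with no known location falls back to a full disarm.
  def response(:error, []), do: :disarm_robot

  # Warnings are reported but trigger no action.
  def response(:warning, _path), do: :log_only
end
```

For example, `ScopedResponseSketch.response(:error, [:left_arm, :elbow])` returns `{:disable_subtree, [:left_arm]}`, while any `:critical` error returns `:disarm_robot` regardless of where it occurred.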

Impact

This is a breaking change in default behaviour. However, the current default seems to be "safe but defeats the purpose of the architecture" rather than "correct default for production systems".

Users who have explicitly set auto_disarm_on_error: false would be unaffected. Users relying on the current default would need to set auto_disarm_on_error: true explicitly to preserve that behaviour.
