orchestrator(maintenance): leaving maintenance mode when host was rebooted does not work as expected #875

@michaelalang

Description

Although the orch module method exit_host_maintenance(self, hostname: str, force: bool = False, offline: bool = False) [1] takes the force and offline parameters, the command-line parser does not accept them:

$ ceph orch host maintenance exit --offline --force node.example.com
Invalid command: Unexpected argument '--offline'
orch host maintenance exit <hostname> :  Return a host from maintenance, restarting all Ceph daemons (cephadm only)
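
One possible explanation for the rejected flags, sketched below, is that the CLI wrapper declares fewer parameters than the backend method it calls. This is only an illustration: it assumes the accepted flags are derived from the wrapper's signature, and cli_handler_narrow, cli_handler_wide and accepted_flags are hypothetical names, not the real orchestrator code.

```python
import inspect


def exit_host_maintenance(hostname: str, force: bool = False, offline: bool = False) -> str:
    """Stand-in for the backend method quoted in this report."""
    return f"exit {hostname} force={force} offline={offline}"


def cli_handler_narrow(hostname: str) -> str:
    """Hypothetical CLI wrapper that exposes only the hostname."""
    return exit_host_maintenance(hostname)


def cli_handler_wide(hostname: str, force: bool = False, offline: bool = False) -> str:
    """Hypothetical CLI wrapper that forwards both boolean options."""
    return exit_host_maintenance(hostname, force=force, offline=offline)


def accepted_flags(handler):
    """List the --flags a signature-driven parser would accept for handler."""
    params = inspect.signature(handler).parameters.values()
    return [f"--{p.name}" for p in params if p.annotation is bool]


print(accepted_flags(cli_handler_narrow))  # []
print(accepted_flags(cli_handler_wide))    # ['--force', '--offline']
```

With the narrow wrapper, no flag reaches the backend method even though it supports both, which matches the "Unexpected argument" error shown above.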

This was discovered when putting a host into maintenance, rebooting that host, and then trying to leave maintenance again ended in an uncaught exception:

$ ceph orch host maintenance exit node.example.com 
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1907, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 186, in handle_command
    return dispatch[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 526, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 122, in <lambda>
    wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)  # noqa: E731
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 111, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/module.py", line 804, in _host_maintenance_exit
    raise_if_exception(completion)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 240, in raise_if_exception
    e = pickle.loads(c.serialized_exception)
TypeError: __init__() missing 2 required positional arguments: 'hostname' and 'addr'

Running the same command a second time succeeds:

$ ceph orch host maintenance exit node.example.com
Ceph cluster .... on node.example.com has exited maintenance mode
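
The TypeError raised inside pickle.loads is typical of an exception class whose __init__ requires more arguments than it passes to Exception.__init__. The sketch below reproduces that pattern in isolation. HostError and PicklableHostError are hypothetical classes used for illustration; they are assumed to resemble, but are not, the exception that cephadm serializes here.

```python
import pickle


class HostError(Exception):
    """Hypothetical exception; only `message` ends up in self.args."""

    def __init__(self, message, hostname, addr):
        super().__init__(message)
        self.hostname = hostname
        self.addr = addr


class PicklableHostError(Exception):
    """Same fields, but __reduce__ tells pickle how to rebuild the object."""

    def __init__(self, message, hostname, addr):
        super().__init__(message)
        self.hostname = hostname
        self.addr = addr

    def __reduce__(self):
        # Return the class and the full argument tuple for __init__.
        return (type(self), (self.args[0], self.hostname, self.addr))


def round_trip(exc):
    """Serialize and deserialize an exception, as a completion object might."""
    return pickle.loads(pickle.dumps(exc))


try:
    round_trip(HostError("host offline", "node.example.com", "192.168.192.210"))
    failure = None
except TypeError as e:
    # Unpickling calls HostError("host offline") with one argument only.
    failure = str(e)

print(failure)  # the message names the missing 'hostname' and 'addr' arguments

restored = round_trip(
    PicklableHostError("host offline", "node.example.com", "192.168.192.210")
)
print(restored.hostname, restored.addr)
```

If the real exception follows this pattern, the original error (for example, that the host is still unreachable) is hidden behind the unpickling failure, which would explain why the user sees a traceback instead of a readable message.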

Reproducer

  • set a host into maintenance
    ceph orch host maintenance enter node.example.com

  • verify the host is reported in maintenance

    $ ceph orch host ls --host_status maintenance
    HOST              ADDR             LABELS   STATUS
    node.example.com  192.168.192.210  osd,mgr  Maintenance
    
  • reboot the host
    ssh node.example.com "sudo reboot"

  • verify the host is reported offline

    $ ceph orch host ls  --host_status offline
    HOST              ADDR             LABELS   STATUS
    node.example.com  192.168.192.210  osd,mgr  Offline
    
  • after the host is back online leave maintenance mode
    ceph orch host maintenance exit node.example.com

  • exception returned

    Error EINVAL: Traceback (most recent call last):
      File "/usr/share/ceph/mgr/mgr_module.py", line 1907, in _handle_command
        return self.handle_command(inbuf, cmd)
      File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 186, in handle_command
        return dispatch[cmd['prefix']].call(self, cmd, inbuf)
      File "/usr/share/ceph/mgr/mgr_module.py", line 526, in call
        return self.func(mgr, **kwargs)
      File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 122, in <lambda>
        wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)  # noqa: E731
      File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 111, in wrapper
        return func(*args, **kwargs)
      File "/usr/share/ceph/mgr/orchestrator/module.py", line 812, in _host_rescan
        raise_if_exception(completion)
      File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 240, in raise_if_exception
        e = pickle.loads(c.serialized_exception)
    TypeError: __init__() missing 2 required positional arguments: 'hostname' and 'addr'
    
  • the same exception also occurs with other commands, such as ceph orch host rescan node.example.com
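
Until the underlying bug is fixed, the observed workaround (a second attempt succeeds) can be scripted. The following is a minimal sketch, assuming the ceph CLI is on the PATH; run and exit_maintenance_twice are hypothetical helper names, and the check for "Traceback" in the output is a heuristic based only on the error text shown in this report.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], str]


def run(cmd: Sequence[str]) -> str:
    """Run a command and return stdout plus stderr without raising."""
    proc = subprocess.run(list(cmd), capture_output=True, text=True)
    return proc.stdout + proc.stderr


def exit_maintenance_twice(host: str, runner: Runner = run) -> List[str]:
    """Issue the exit command up to twice; return the output of each attempt."""
    outputs = []
    for _ in range(2):
        out = runner(["ceph", "orch", "host", "maintenance", "exit", host])
        outputs.append(out)
        if "Traceback" not in out:
            # No traceback in the output, so no retry is attempted.
            break
    return outputs
```

Calling exit_maintenance_twice("node.example.com") returns one entry when the first attempt works and two entries when the first attempt hits the traceback described above. The runner parameter exists so the helper can be exercised without a cluster.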

[1] https://github.com/ceph/ceph/blob/9336c9e5e8cca58aadc3271f872e586061d07198/src/pybind/mgr/cephadm/module.py#L2219
