Failed worker notifications
When a worker (`kresd` or `gc`) fails by itself, we should detect it and react somehow. The simplest reaction we can implement is to log an error and kill the manager. It is also the safest option, so this MR attempts to do just that. The idea is as follows:
- Extend the `SubprocessController` interface with a `register_instability_handler` function. The manager would then install a callback into the subprocess controller after its creation.
- The controller would start a watchdog thread or monitor running workers in some other way. When something goes wrong, it would call the given callback.
- In case of instability, the manager will kill everything. In the future, we could change this so that the manager uses the existing API in the controller to get the current state of the system and tries to fix it, so that the last configuration is followed. If something weird happens again, kill everything.
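
The proposal above could be sketched roughly as follows. This is only an illustration, not the actual manager code: `SubprocessController` and `register_instability_handler` come from the description, while `InstabilityHandler`, `DummyController`, `_report_instability`, and `Manager.on_instability` are hypothetical names, and the real interface may well be asynchronous.

```python
import logging
from abc import ABC, abstractmethod
from typing import Callable, Optional

# Assumed callback shape: no arguments, no return value.
InstabilityHandler = Callable[[], None]


class SubprocessController(ABC):
    """Trimmed-down stand-in for the real interface (other methods omitted)."""

    @abstractmethod
    def register_instability_handler(self, handler: InstabilityHandler) -> None:
        """Install a callback invoked when a worker fails on its own."""


class DummyController(SubprocessController):
    """Hypothetical controller used only to illustrate the call flow."""

    def __init__(self) -> None:
        self._handler: Optional[InstabilityHandler] = None

    def register_instability_handler(self, handler: InstabilityHandler) -> None:
        self._handler = handler

    def _report_instability(self) -> None:
        # A real controller would call this from its watchdog.
        if self._handler is not None:
            self._handler()


class Manager:
    """Hypothetical manager: logs the problem and requests a full shutdown."""

    def __init__(self, controller: SubprocessController) -> None:
        self.shutdown_requested = False
        # Install the callback right after the controller is created.
        controller.register_instability_handler(self.on_instability)

    def on_instability(self) -> None:
        logging.error("Worker instability detected, shutting everything down")
        self.shutdown_requested = True
```

Keeping the handler argument-free matches the "kill everything" policy; a later version that tries to repair the system could query the controller for the current state from inside the callback.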
The functionality above must be implemented for both supported service managers. `systemd` supports notifications via DBus, but we must spawn a separate thread for that. `supervisord`, AFAIK, does not support notifications, so we must poll its state (but we should check to make sure there is no better way).
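
For the polling variant, a watchdog thread might look something like the sketch below. It is written under several assumptions: `start_polling_watchdog`, `get_states`, and `on_instability` are hypothetical names, `get_states` is assumed to return a mapping from worker name to a state string (it could be built on top of whatever status query the service manager offers), and treating anything other than `"RUNNING"` as a failure is a simplification that would misfire while workers are still starting.

```python
import threading
from typing import Callable, Dict


def start_polling_watchdog(
    get_states: Callable[[], Dict[str, str]],
    on_instability: Callable[[], None],
    interval: float = 1.0,
) -> threading.Event:
    """Poll worker states in a daemon thread.

    Returns an Event; setting it stops the watchdog.
    """
    stop = threading.Event()

    def loop() -> None:
        # Event.wait() returns False on timeout and True once stop is set,
        # so this sleeps for `interval` between polls and exits promptly.
        while not stop.wait(interval):
            states = get_states()
            if any(state != "RUNNING" for state in states.values()):
                on_instability()
                return  # report once, then leave the decision to the manager

    threading.Thread(target=loop, name="watchdog", daemon=True).start()
    return stop
```

A call such as `stop = start_polling_watchdog(fetch_states, manager_callback)` would start the loop, and `stop.set()` would end it during a regular shutdown. Note that the callback runs in the watchdog thread, so the manager has to hand the event over to its own event loop safely.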