Health checks: listening for changes from service managers
At the moment, we have a watchdog task that periodically checks for problems. It would be better to react to failures directly instead of waiting until the watchdog detects them. Ideally, I would keep both safety measures in place at once.
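As a rough illustration of keeping both measures at once, here is a minimal sketch of a loop that runs the checks either when a failure is reported or when the watchdog interval elapses. It assumes an asyncio-based manager; `health_loop`, `check`, `failure_event` and `rounds` are hypothetical names used only for this example, not existing code.

```python
import asyncio
from typing import Awaitable, Callable


async def health_loop(
    check: Callable[[], Awaitable[None]],
    failure_event: asyncio.Event,
    interval: float,
    rounds: int,
) -> None:
    """Run `check` when a failure is reported or when `interval` elapses.

    `rounds` bounds the loop so the sketch terminates; a real watchdog
    would presumably run until the manager shuts down.
    """
    for _ in range(rounds):
        try:
            # Wake up early if something reported a failure ...
            await asyncio.wait_for(failure_event.wait(), timeout=interval)
        except asyncio.TimeoutError:
            # ... otherwise fall back to the periodic watchdog tick.
            pass
        failure_event.clear()
        await check()
```

Whatever ends up delivering failure notifications would only need to call `failure_event.set()`. The periodic tick still catches anything the notification path misses.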
The problems with reacting to changes immediately are:
- we can get information about a single process failure from systemd with an ExecStopPost hook and then actively check the state. This is prone to race conditions, and we have to figure out how to communicate with the manager from systemd.
- I currently don't know how to get information about systemd internal state changes (specifically in the service unit)
- I don't know how to do it all with supervisord
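
To make the ExecStopPost idea a bit more concrete, here is a minimal sketch of what such a hook could do. systemd exposes SERVICE_RESULT, EXIT_CODE and EXIT_STATUS in the environment of ExecStopPost commands. Everything else is an assumption made for illustration: the socket path, the JSON message format, and the idea that the manager listens on a Unix datagram socket at all. It does not solve the race condition problem; the manager would still have to re-check the actual state after receiving a report.

```python
import json
import socket
from typing import Mapping

# Hypothetical path; the manager would have to bind a datagram socket here.
MANAGER_SOCKET = "/run/example-manager/notify.sock"


def build_report(unit: str, env: Mapping[str, str]) -> bytes:
    """Collect what systemd exposes to ExecStopPost commands."""
    report = {
        "unit": unit,
        # Set by systemd for ExecStopPost; None if the variable is absent.
        "service_result": env.get("SERVICE_RESULT"),
        "exit_code": env.get("EXIT_CODE"),
        "exit_status": env.get("EXIT_STATUS"),
    }
    return json.dumps(report).encode()


def send_report(payload: bytes, path: str = MANAGER_SOCKET) -> None:
    """Send one datagram to the (hypothetical) manager socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, path)
```

A hook script would call `send_report(build_report(sys.argv[1], os.environ))`, with the unit name passed on the command line, for example through the `%n` specifier in the unit file.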