manager: inverted process tree with supervisord
Process tree before this MR:
    manager
       |
  supervisord
    |     |
  kresd   gc
After this MR:
  supervisord ---------
    |     |           |
  kresd   gc       manager
Why?
Because without this change, whenever manager stops, everything else is taken down with it, both in Docker and under systemd. If we invert the process tree, that does not happen.
What does this change bring?
- it's more complicated
- manager can be upgraded without stopping anything else
- ...
Is it really worth it?
I am not sure at the moment. It depends on whether we want to pursue zero-downtime restarts for everything. If it's OK that a failure of manager brings down everything else on most deployments, then this MR is probably pointless. If we want to prevent that, we really need it.
What needs to be solved to make this useful?
- when manager enters FATAL state, supervisord must exit with non-zero exit code
- logging is a mess
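The requirement that supervisord react to manager entering FATAL state could, for instance, be approached with a supervisord event listener subscribed to `PROCESS_STATE_FATAL`. The following is only a minimal sketch of the detection half, not the mechanism this MR actually uses. The names `WATCHED_PROCESS`, `parse_tokens`, `is_manager_fatal`, `listen` and the `on_fatal` callback are hypothetical, and the program name `manager` in the supervisord config is an assumption. What `on_fatal` should do to make supervisord itself exit with a non-zero code is deliberately left open.

```python
from typing import Callable, Dict, TextIO

# Assumption: the manager is registered in supervisord under this program name.
WATCHED_PROCESS = "manager"


def parse_tokens(line: str) -> Dict[str, str]:
    """Parse a 'key:value key:value' line of the event listener protocol."""
    return dict(token.split(":", 1) for token in line.split())


def is_manager_fatal(header: Dict[str, str], payload: Dict[str, str]) -> bool:
    """True when the event says the watched process entered FATAL state."""
    return (
        header.get("eventname") == "PROCESS_STATE_FATAL"
        and payload.get("processname") == WATCHED_PROCESS
    )


def listen(stdin: TextIO, stdout: TextIO, on_fatal: Callable[[], None]) -> None:
    """Run the listener loop until stdin is closed."""
    while True:
        # Tell supervisord we are ready for the next event.
        stdout.write("READY\n")
        stdout.flush()

        header_line = stdin.readline()
        if not header_line:
            return  # supervisord closed our stdin
        header = parse_tokens(header_line)

        # 'len' counts bytes; this sketch assumes an ASCII-only payload.
        payload = parse_tokens(stdin.read(int(header["len"])))

        if is_manager_fatal(header, payload):
            on_fatal()  # hypothetical hook: trigger the shutdown here

        # Acknowledge the event so supervisord does not resend it.
        stdout.write("RESULT 2\nOK")
        stdout.flush()
```

A real listener would be started by supervisord from an `[eventlistener:...]` section and would call `listen(sys.stdin, sys.stdout, on_fatal=...)`. Keeping the shutdown action behind a callback makes the parsing testable without a running supervisord.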
Activity
added manager label
assigned to @vsraier
- Resolved by Aleš Mrázek
Yes, I'd prefer the property that a crash in manager would not take down the whole service (typically). Then for many kinds of bugs, the restarted manager might even recover quite well, I imagine.
Though that slightly clashes with the aims of
when manager enters FATAL state, supervisord must exit with non-zero exit code
i.e. there might be a dilemma in some situations:
- FATAL: failing the whole single systemd service in order to give a stronger signal to the admin that it's unhealthy
- non-FATAL: trying to keep (partially) operational, with just errors in logs, etc.
Edited by Vladimír Čunát
added 1 commit
- c34cff2d - manager: improved logging, no exceptions on shutdown -> now it looks almost as...
added 1 commit
- dd6e251a - manager: improved logging, no exceptions on shutdown -> now it looks almost as...
added 1 commit
- 2f2053fd - manager: supervisord stops with non-zero exit code whenever manager enters FATAL state
added 1 commit
- 722316fa - manager: supervisord stops with non-zero exit code whenever manager enters FATAL state
It should all be working properly now:
- the process tree is inverted
- the logs are worse than before during startup, but passable, not horrible, and we should probably attempt to clean them up in one big, consistent effort regardless
- when manager enters FATAL state, everything is torn down and stopped, manager ends with exit code 1
- startup time is a bit worse yet again and the measurements in logs no longer make much sense
Could you @amrazek please test that it runs properly for you and give me a code review? Thanks.
As far as I know, this should be the last of the large changes related to ditching systemd and improving supervisord. After this, it will be more about polishing and packaging.
requested review from @amrazek
added 5 commits
- 722316fa...ac7de431 - 2 commits from branch manager
- 05820866 - manager: initial support for inverted process tree (manager running under a...
- 57187d16 - manager: improved logging, no exceptions on shutdown -> now it looks almost as...
- 981c11d7 - manager: supervisord stops with non-zero exit code whenever manager enters FATAL state
added 1 commit
- 8f15f087 - manager: tests: datamodel: 'id' removed from tests
added 1 commit
- 4c48719d - manager: tests: datamodel: 'id' removed from tests
mentioned in commit 2ad8218e