manager: systemd: improved handling of subprocess crashes
closes #711 (closed)
Should be merged after !1249 (merged) is finalized, because the manager is unusable without changes made there.
Merge request reports
Activity
added manager label
assigned to @vsraier
added 9 commits
-
aba661cd...e70ca7ac - 8 commits from branch
manager
- e5857cf7 - manager: systemd: better error messages and restart subprocesses always
-
aba661cd...e70ca7ac - 8 commits from branch
@tkrizek @vcunat Following our discussion in #711 (closed), I've replaced
Restart=on-abnormal
withRestart=always
. The reasoning behind it is in the code, but shortly - the manager simply assumes that kresd is running at all times. Setting restartalways
makes sure it stays true. With this patch, manager shouldn't crash unlesskresd
crashes repeatedly.Do you agree with this? If yes, please approve this MR and I will merge it.
From a packaging/systemd perspective, I wouldn't recommend using
Restart=always
for reasons mentioned in #711 (closed).But if you're absolutely sure that's the right thing to do when using kresd under the manager component, feel free to ahead. I don't really know the internals of manager.
Just beware that is there's ever any bug in the generated configuration, the kresd instance will be restarted over and over again (until it hits the limit, I believe). Then the service log will be polluted with multiple crashes of the same kind in rapid succession.
Manager assumes the invariant, that the
kresd
instances never stop unless requested. If there is ever an event when any instance stops, manager unpredictably explodes at unspecified time in the future. I am sadly not aware of any simple and reliable way how to detect unit failures (the best option would probably be settingOnFailure=
, but that has it's own set of problems and race conditions).I've decided for
Restart=always
because the only downside I am aware of is the repeated log. But in practice, clean exits should never happen under manager and if they do, it's technically an error.Just beware that is there's ever any bug in the generated configuration, the kresd instance will be restarted over and over again (until it hits the limit, I believe).
Doesn't
kresd
terminate with a non-zero exit code when there is an error with configuration? I haven't tested it, but I would expect it. And if it's true, then there is no difference betweenon-failure
andalways
in this case. We would have to useon-abnormal
, which as @vcunat discovered is surprising even to knowledgeable people, because it leads to unexpected crashes of manager.I hope this clears it up a bit. I am quite convinced that
Restart=always
is the choice making manager most robust for the price of repeated logs, but I think that's just a minor acceptable issue.Doesn't
kresd
terminate with a non-zero exit code when there is an error with configuration?Yes, it should. Consequently I don't expect that the difference between
on-abnormal
andalways
would be "interesting" to us. Maybe the only case was me callingquit()
by accident.Edited by Vladimír Čunát
mentioned in commit a2c339a5
mentioned in issue #711 (closed)
mentioned in commit 7803cc5b
mentioned in commit 9e185e8a