Skip to content
Snippets Groups Projects

manager: systemd: improved handling of subprocess crashes

Merged Vaclav Sraier requested to merge manager-systemd-improvements into manager
1 unresolved thread

closes #711 (closed)

Should be merged after !1249 (merged) is finalized, because the manager is unusable without changes made there.

Edited by Vaclav Sraier

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • added manager label

  • assigned to @vsraier

  • Vaclav Sraier deleted the manager-tooling-fixes branch. This merge request now targets the manager branch

    deleted the manager-tooling-fixes branch. This merge request now targets the manager branch

  • Vaclav Sraier added 9 commits

    added 9 commits

    • aba661cd...e70ca7ac - 8 commits from branch manager
    • e5857cf7 - manager: systemd: better error messages and restart subprocesses always

    Compare with previous version

  • Vaclav Sraier marked this merge request as ready

    marked this merge request as ready

    • Author Maintainer

      @tkrizek @vcunat Following our discussion in #711 (closed), I've replaced Restart=on-abnormal with Restart=always. The reasoning behind it is in the code, but shortly - the manager simply assumes that kresd is running at all times. Setting restart always makes sure it stays true. With this patch, manager shouldn't crash unless kresd crashes repeatedly.

      Do you agree with this? If yes, please approve this MR and I will merge it.

    • From a packaging/systemd perspective, I wouldn't recommend using Restart=always for reasons mentioned in #711 (closed).

      But if you're absolutely sure that's the right thing to do when using kresd under the manager component, feel free to ahead. I don't really know the internals of manager.

      Just beware that is there's ever any bug in the generated configuration, the kresd instance will be restarted over and over again (until it hits the limit, I believe). Then the service log will be polluted with multiple crashes of the same kind in rapid succession.

    • Author Maintainer

      Manager assumes the invariant, that the kresd instances never stop unless requested. If there is ever an event when any instance stops, manager unpredictably explodes at unspecified time in the future. I am sadly not aware of any simple and reliable way how to detect unit failures (the best option would probably be setting OnFailure=, but that has it's own set of problems and race conditions).

      I've decided for Restart=always because the only downside I am aware of is the repeated log. But in practice, clean exits should never happen under manager and if they do, it's technically an error.

      Just beware that is there's ever any bug in the generated configuration, the kresd instance will be restarted over and over again (until it hits the limit, I believe).

      Doesn't kresd terminate with a non-zero exit code when there is an error with configuration? I haven't tested it, but I would expect it. And if it's true, then there is no difference between on-failure and always in this case. We would have to use on-abnormal, which as @vcunat discovered is surprising even to knowledgeable people, because it leads to unexpected crashes of manager.

      I hope this clears it up a bit. I am quite convinced that Restart=always is the choice making manager most robust for the price of repeated logs, but I think that's just a minor acceptable issue.

    • Doesn't kresd terminate with a non-zero exit code when there is an error with configuration?

      Yes, it should. Consequently I don't expect that the difference between on-abnormal and always would be "interesting" to us. Maybe the only case was me calling quit() by accident.

      Edited by Vladimír Čunát
    • Please register or sign in to reply
  • Vladimír Čunát approved this merge request

    approved this merge request

  • I read it before rebase, and looked OK.

  • merged

  • Vaclav Sraier mentioned in commit a2c339a5

    mentioned in commit a2c339a5

  • mentioned in issue #711 (closed)

  • Vaclav Sraier mentioned in commit 7803cc5b

    mentioned in commit 7803cc5b

  • Vaclav Sraier mentioned in commit 9e185e8a

    mentioned in commit 9e185e8a

Please register or sign in to reply
Loading