manager: recovery from 'policy-loader' failure during reload

Merged Aleš Mrázek requested to merge manager-instability-handling into master
All threads resolved!

The policy-loader process is no longer monitored by the _watchdog method. Instead, the load_policy_rules verifier method checks for a policy-loader failure directly when it attempts to load the rules. The verifier fails if the policy-loader process exits with a non-zero return code. On failure, the entire resolver reverts to its previous working configuration.
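
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the manager's actual code: `ConfigStore`, `run_loader` and the `loader_exit_code` key are hypothetical names, the signature of `load_policy_rules` is an assumption, and a short-lived Python child process stands in for the real policy-loader.

```python
import asyncio
import sys
from typing import Awaitable, Callable, List

Config = dict  # hypothetical stand-in for the real configuration object
Verifier = Callable[[Config, Config], Awaitable[bool]]


async def run_loader(exit_code: int) -> int:
    """Stand-in for 'policy-loader': a child process that exits with exit_code."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", f"import sys; sys.exit({exit_code})"
    )
    return await proc.wait()


async def load_policy_rules(old: Config, new: Config) -> bool:
    """Verifier (assumed signature): accept the new config only if the loader exits with 0."""
    return await run_loader(new.get("loader_exit_code", 0)) == 0


class ConfigStore:
    """Hypothetical store that keeps the last working config and rolls back to it."""

    def __init__(self, initial: Config, verifiers: List[Verifier]) -> None:
        self.current = initial
        self.applied: List[Config] = [initial]  # every config pushed to the resolver
        self._verifiers = verifiers

    async def _apply(self, config: Config) -> None:
        # The real manager would reconfigure its subprocesses here.
        self.applied.append(config)

    async def update(self, new: Config) -> bool:
        old = self.current
        for verify in self._verifiers:
            if not await verify(old, new):
                # A verifier failed: renew with the old config, so anything
                # already changed goes back to the previous working state.
                await self._apply(old)
                return False
        self.current = new
        await self._apply(new)
        return True


async def demo():
    store = ConfigStore({"name": "v1"}, [load_policy_rules])
    ok = await store.update({"name": "v2", "loader_exit_code": 0})
    failed = await store.update({"name": "v3", "loader_exit_code": 1})
    return ok, failed, store.current["name"]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the second update is rejected because the stand-in loader exits with 1, and the store stays on the "v2" configuration.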

Edited by Aleš Mrázek

Merge request reports

Pipeline #127642 waiting for manual action

Pipeline: Knot Resolver

#127643

    Pipeline waiting for manual action for b2682be4 on manager-instability-handling

    Merged by Aleš Mrázek 9 months ago (Jul 11, 2024 10:50am UTC)

    Merge details

    • Changes merged into master with 545fbad2.
    • Deleted the source branch.

    Pipeline #127648 failed

    Pipeline failed for 545fbad2 on master

    Deployed to docs-develop/master 9 months ago

    Activity

  • Aleš Mrázek added 1 commit

    • 57971740 - fixup! manager: 'policy-loader' removed from watched subprocesses

  • Aleš Mrázek marked this merge request as draft from 57971740

  • added 2 commits

    • 5da4d0e9 - manager: 'policy-loader' removed from watched subprocesses
    • b2682be4 - manager: config_store: renew with old config

  • Vladimír Čunát marked this merge request as ready

  • Vladimír Čunát resolved all threads

  • Code diff looks good to me. I also manually tested reloads with various succeed/fail sequences.


    The only thing of note: if we're in a broken state (e.g. the contents of an RPZ got broken, so we couldn't even restart the service), reload attempts still work as expected, allowing a clean recovery (great!), and every five seconds we get an additional log message:

    $TIMESTAMP manager[$PID]: [INFO] knot_resolver_manager.kres_manager: Subprocess 'policy-loader' is skipped by WatchDog because its status is monitored in a different way.

    But something persistently nagging in the logs while in this state actually doesn't seem like a bad thing, and INFO isn't even visible by default.

  • Vladimír Čunát approved this merge request

  • Oto Šťáva approved this merge request

  • Aleš Mrázek mentioned in commit 545fbad2
