manager: recovery from 'policy-loader' failure during reload
The policy-loader process is no longer monitored by the _watchdog method. Whether or not the policy-loader has failed is instead checked directly in the load_policy_rules verifier method when attempting to load the rules. The verifier fails if the policy-loader process exits with a return code other than 0. In the event of a failure, the entire resolver reverts to its previous working configuration.
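Below is a minimal Python sketch of the reload flow described above, under the assumption that the verifier simply runs the policy-loader and inspects its exit code. Apart from `load_policy_rules` and the `policy-loader` process itself, all names (the `Config` class, `apply_config`, and the command line) are hypothetical illustrations, not the actual knot-resolver-manager code.

```python
# Hedged sketch only: run the policy-loader, treat any non-zero exit code as a
# verification failure, and fall back to the previous working configuration.
import subprocess
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical stand-in for the resolver configuration."""
    rules_path: str


def load_policy_rules(config: Config) -> bool:
    """Verifier step: succeed only if the policy-loader exits with code 0."""
    result = subprocess.run(
        ["policy-loader", config.rules_path],  # hypothetical command line
        capture_output=True,
    )
    return result.returncode == 0


def apply_config(new_config: Config, old_config: Config) -> Config:
    """Keep the old configuration whenever the policy-loader fails."""
    if load_policy_rules(new_config):
        return new_config
    # Verification failed: the whole resolver stays on the previous config.
    return old_config
```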
Activity
added 1 commit
- 6b1c20fd - manager: config_store: renew with old config
added 16 commits
- 6b1c20fd...23f1c587 - 13 commits from branch master
- b08ec833 - manager: run policy-loader with old config when instability detected
- f32bf725 - manager: 'policy-loader' removed from watched subprocesses
- fbb602cb - manager: config_store: renew with old config
requested review from @vcunat
- Resolved by Vladimír Čunát
added 1 commit
- 57971740 - fixup! manager: 'policy-loader' removed from watched subprocesses
marked this merge request as draft from 57971740
The code diff looks good to me. I also did manual testing of reloads with various succeed/fail sequences.
The only thing of note is that if we're in a broken state, e.g. an RPZ's contents got broken and thus we wouldn't even be able to restart the service, reload attempts still work as expected, allowing us to recover cleanly (great!), and every five seconds we get an additional log line:
$TIMESTAMP manager[$PID]: [INFO] knot_resolver_manager.kres_manager: Subprocess 'policy-loader' is skipped by WatchDog because its status is monitored in a different way.
But something persistently bugging the logs in this state actually doesn't seem like a bad thing, and INFO isn't even visible by default.
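For illustration, here is a hedged sketch of the watchdog behaviour behind that log line: the periodic pass skips subprocesses whose status is checked elsewhere and only emits the INFO message quoted above. Everything except the 'policy-loader' name, the logger name, and the log text is a hypothetical stand-in, not the real WatchDog implementation.

```python
# Hedged sketch of the watchdog pass described above, not the actual code.
import logging
from typing import Dict

logger = logging.getLogger("knot_resolver_manager.kres_manager")

UNWATCHED = {"policy-loader"}  # status is monitored via load_policy_rules instead


def watchdog_tick(subprocesses: Dict[str, object]) -> None:
    """One periodic (roughly 5-second) watchdog pass over known subprocesses."""
    for name, proc in subprocesses.items():
        if name in UNWATCHED:
            logger.info(
                "Subprocess '%s' is skipped by WatchDog because its status "
                "is monitored in a different way.", name
            )
            continue
        # ... health-check/restart logic for the watched subprocesses ...
```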
mentioned in commit 545fbad2