manager: processes watchdog error during shutdown
The error is caught in processes_watchdog and the watchdog is waiting to be cancelled. If the watchdog is no cancelled, an error is raised.
Error
2025-01-28 09:56:47,532 supervisor[1]: [WARN] received SIGINT indicating exit request
2025-01-28 09:56:47,532 supervisor[1]: [INFO] waiting for cache-gc, kresd0, kresd1, manager to die
2025-01-28 09:56:47,535 manager[8]: [ERROR] knot_resolver.manager.manager: Knot Resolver watchdog failed with an unexpected exception.
2025-01-28 09:56:47,535 manager[8]: Traceback (most recent call last):
2025-01-28 09:56:47,535 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/controller/supervisord/__init__.py", line 164, in _list_running_subprocesses
2025-01-28 09:56:47,535 manager[8]: processes: Any = supervisord.getAllProcessInfo()
2025-01-28 09:56:47,535 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,535 manager[8]: File "/usr/lib/python3.11/xmlrpc/client.py", line 1122, in __call__
2025-01-28 09:56:47,535 manager[8]: return self.__send(self.__name, args)
2025-01-28 09:56:47,535 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,535 manager[8]: File "/usr/lib/python3.11/xmlrpc/client.py", line 1464, in __request
2025-01-28 09:56:47,535 manager[8]: response = self.__transport.request(
2025-01-28 09:56:47,535 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,535 manager[8]: File "/usr/lib/python3/dist-packages/supervisor/xmlrpc.py", line 551, in request
2025-01-28 09:56:47,535 manager[8]: return u.close()
2025-01-28 09:56:47,535 manager[8]: ^^^^^^^^^
2025-01-28 09:56:47,535 manager[8]: File "/usr/lib/python3.11/xmlrpc/client.py", line 668, in close
2025-01-28 09:56:47,535 manager[8]: raise Fault(**self._stack[0])
2025-01-28 09:56:47,535 manager[8]: xmlrpc.client.Fault: <Fault 6: 'SHUTDOWN_STATE'>
2025-01-28 09:56:47,535 manager[8]:
2025-01-28 09:56:47,535 manager[8]: The above exception was the direct cause of the following exception:
2025-01-28 09:56:47,535 manager[8]:
2025-01-28 09:56:47,535 manager[8]: Traceback (most recent call last):
2025-01-28 09:56:47,535 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/manager/manager.py", line 402, in _watchdog
2025-01-28 09:56:47,535 manager[8]: detected_subprocesses = await self._controller.get_subprocess_status()
2025-01-28 09:56:47,535 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,535 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/utils/compat/asyncio.py", line 30, in wrapper
2025-01-28 09:56:47,535 manager[8]: return await to_thread(func, *args, **kwargs)
2025-01-28 09:56:47,535 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,535 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/utils/compat/asyncio.py", line 20, in to_thread
2025-01-28 09:56:47,535 manager[8]: return await asyncio.to_thread(func, *args, **kwargs) # type: ignore[attr-defined]
2025-01-28 09:56:47,535 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,535 manager[8]: File "/usr/lib/python3.11/asyncio/threads.py", line 25, in to_thread
2025-01-28 09:56:47,535 manager[8]: return await loop.run_in_executor(None, func_call)
2025-01-28 09:56:47,535 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,535 manager[8]: File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
2025-01-28 09:56:47,535 manager[8]: result = self.fn(*self.args, **self.kwargs)
2025-01-28 09:56:47,535 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,535 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/controller/supervisord/__init__.py", line 287, in get_subprocess_status
2025-01-28 09:56:47,535 manager[8]: return cast(Dict[KresID, SubprocessStatus], _list_running_subprocesses(self._controller_config))
2025-01-28 09:56:47,535 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,535 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/controller/supervisord/__init__.py", line 166, in _list_running_subprocesses
2025-01-28 09:56:47,535 manager[8]: raise SubprocessControllerError(f"failed to get info from all running processes: {e}") from e
2025-01-28 09:56:47,535 manager[8]: knot_resolver.controller.exceptions.SubprocessControllerError: failed to get info from all running processes: <Fault 6: 'SHUTDOWN_STATE'>
2025-01-28 09:56:47,536 supervisor[1]: [WARN] ignoring ready notification from unregistered PID=8
2025-01-28 09:56:47,536 manager[8]: [WARNING] knot_resolver.manager.manager: Instability detected. Dropping known list of workers and reloading it from the system.
2025-01-28 09:56:47,536 manager[8]: [INFO] knot_resolver.manager.server: Received SIGINT, triggering graceful shutdown
2025-01-28 09:56:47,537 manager[8]: [INFO] knot_resolver.manager.server: Stopping API service...
2025-01-28 09:56:47,537 manager[8]: [INFO] knot_resolver.manager.server: Stopping kresd manager...
2025-01-28 09:56:47,538 manager[8]: [ERROR] knot_resolver.manager.manager: Failed attempting to fix an error. Forcefully shutting down.
2025-01-28 09:56:47,538 manager[8]: Traceback (most recent call last):
2025-01-28 09:56:47,538 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/manager/manager.py", line 385, in _instability_handler
2025-01-28 09:56:47,538 manager[8]: await self._reload_system_state()
2025-01-28 09:56:47,538 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/manager/manager.py", line 271, in _reload_system_state
2025-01-28 09:56:47,538 manager[8]: await self._collect_already_running_workers()
2025-01-28 09:56:47,538 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/manager/manager.py", line 180, in _collect_already_running_workers
2025-01-28 09:56:47,538 manager[8]: for subp in await self._controller.get_all_running_instances():
2025-01-28 09:56:47,538 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,538 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/controller/supervisord/__init__.py", line 256, in get_all_running_instances
2025-01-28 09:56:47,538 manager[8]: if await _is_supervisord_running(self._controller_config):
2025-01-28 09:56:47,538 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,538 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/controller/supervisord/__init__.py", line 113, in _is_supervisord_running
2025-01-28 09:56:47,538 manager[8]: pid = await _get_supervisord_pid(config)
2025-01-28 09:56:47,538 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,538 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/controller/supervisord/__init__.py", line 100, in _get_supervisord_pid
2025-01-28 09:56:47,538 manager[8]: return int(await readfile(supervisord_pid_file(config)))
2025-01-28 09:56:47,538 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,538 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/utils/async_utils.py", line 68, in readfile
2025-01-28 09:56:47,538 manager[8]: return await to_thread(readfile_sync, path)
2025-01-28 09:56:47,538 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,538 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/utils/compat/asyncio.py", line 20, in to_thread
2025-01-28 09:56:47,538 manager[8]: return await asyncio.to_thread(func, *args, **kwargs) # type: ignore[attr-defined]
2025-01-28 09:56:47,538 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,538 manager[8]: File "/usr/lib/python3.11/asyncio/threads.py", line 25, in to_thread
2025-01-28 09:56:47,538 manager[8]: return await loop.run_in_executor(None, func_call)
2025-01-28 09:56:47,538 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,538 manager[8]: asyncio.exceptions.CancelledError
2025-01-28 09:56:47,539 manager[8]: [WARNING] knot_resolver.manager.manager: Collecting all remaining workers...
2025-01-28 09:56:47,541 manager[8]: [ERROR] knot_resolver.manager.manager: Watchdog failed while invoking instability callback
2025-01-28 09:56:47,541 manager[8]: Traceback (most recent call last):
2025-01-28 09:56:47,541 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/manager/manager.py", line 385, in _instability_handler
2025-01-28 09:56:47,541 manager[8]: await self._reload_system_state()
2025-01-28 09:56:47,541 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/manager/manager.py", line 271, in _reload_system_state
2025-01-28 09:56:47,541 manager[8]: await self._collect_already_running_workers()
2025-01-28 09:56:47,541 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/manager/manager.py", line 180, in _collect_already_running_workers
2025-01-28 09:56:47,541 manager[8]: for subp in await self._controller.get_all_running_instances():
2025-01-28 09:56:47,541 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,541 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/controller/supervisord/__init__.py", line 256, in get_all_running_instances
2025-01-28 09:56:47,541 manager[8]: if await _is_supervisord_running(self._controller_config):
2025-01-28 09:56:47,541 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,541 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/controller/supervisord/__init__.py", line 113, in _is_supervisord_running
2025-01-28 09:56:47,541 manager[8]: pid = await _get_supervisord_pid(config)
2025-01-28 09:56:47,541 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,541 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/controller/supervisord/__init__.py", line 100, in _get_supervisord_pid
2025-01-28 09:56:47,542 manager[8]: return int(await readfile(supervisord_pid_file(config)))
2025-01-28 09:56:47,542 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,542 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/utils/async_utils.py", line 68, in readfile
2025-01-28 09:56:47,542 manager[8]: return await to_thread(readfile_sync, path)
2025-01-28 09:56:47,542 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,542 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/utils/compat/asyncio.py", line 20, in to_thread
2025-01-28 09:56:47,542 manager[8]: return await asyncio.to_thread(func, *args, **kwargs) # type: ignore[attr-defined]
2025-01-28 09:56:47,542 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,542 manager[8]: File "/usr/lib/python3.11/asyncio/threads.py", line 25, in to_thread
2025-01-28 09:56:47,542 manager[8]: return await loop.run_in_executor(None, func_call)
2025-01-28 09:56:47,542 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,542 manager[8]: asyncio.exceptions.CancelledError
2025-01-28 09:56:47,542 manager[8]:
2025-01-28 09:56:47,542 manager[8]: During handling of the above exception, another exception occurred:
2025-01-28 09:56:47,542 manager[8]:
2025-01-28 09:56:47,542 manager[8]: Traceback (most recent call last):
2025-01-28 09:56:47,542 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/controller/supervisord/__init__.py", line 164, in _list_running_subprocesses
2025-01-28 09:56:47,542 manager[8]: processes: Any = supervisord.getAllProcessInfo()
2025-01-28 09:56:47,542 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,542 manager[8]: File "/usr/lib/python3.11/xmlrpc/client.py", line 1122, in __call__
2025-01-28 09:56:47,542 manager[8]: return self.__send(self.__name, args)
2025-01-28 09:56:47,542 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,542 manager[8]: File "/usr/lib/python3.11/xmlrpc/client.py", line 1464, in __request
2025-01-28 09:56:47,542 manager[8]: response = self.__transport.request(
2025-01-28 09:56:47,542 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,542 manager[8]: File "/usr/lib/python3/dist-packages/supervisor/xmlrpc.py", line 551, in request
2025-01-28 09:56:47,542 manager[8]: return u.close()
2025-01-28 09:56:47,542 manager[8]: ^^^^^^^^^
2025-01-28 09:56:47,542 manager[8]: File "/usr/lib/python3.11/xmlrpc/client.py", line 668, in close
2025-01-28 09:56:47,542 manager[8]: raise Fault(**self._stack[0])
2025-01-28 09:56:47,542 manager[8]: xmlrpc.client.Fault: <Fault 6: 'SHUTDOWN_STATE'>
2025-01-28 09:56:47,542 manager[8]:
2025-01-28 09:56:47,542 manager[8]: The above exception was the direct cause of the following exception:
2025-01-28 09:56:47,542 manager[8]:
2025-01-28 09:56:47,542 manager[8]: Traceback (most recent call last):
2025-01-28 09:56:47,542 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/manager/manager.py", line 448, in _watchdog
2025-01-28 09:56:47,542 manager[8]: await self._instability_handler()
2025-01-28 09:56:47,542 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/manager/manager.py", line 391, in _instability_handler
2025-01-28 09:56:47,542 manager[8]: await self.forced_shutdown()
2025-01-28 09:56:47,542 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/manager/manager.py", line 369, in forced_shutdown
2025-01-28 09:56:47,542 manager[8]: await self._reload_system_state()
2025-01-28 09:56:47,542 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/manager/manager.py", line 271, in _reload_system_state
2025-01-28 09:56:47,542 manager[8]: await self._collect_already_running_workers()
2025-01-28 09:56:47,542 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/manager/manager.py", line 180, in _collect_already_running_workers
2025-01-28 09:56:47,542 manager[8]: for subp in await self._controller.get_all_running_instances():
2025-01-28 09:56:47,542 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,542 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/controller/supervisord/__init__.py", line 257, in get_all_running_instances
2025-01-28 09:56:47,542 manager[8]: states = _list_running_subprocesses(self._controller_config)
2025-01-28 09:56:47,542 manager[8]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-28 09:56:47,542 manager[8]: File "/usr/lib/python3/dist-packages/knot_resolver/controller/supervisord/__init__.py", line 166, in _list_running_subprocesses
2025-01-28 09:56:47,542 manager[8]: raise SubprocessControllerError(f"failed to get info from all running processes: {e}") from e
2025-01-28 09:56:47,542 manager[8]: knot_resolver.controller.exceptions.SubprocessControllerError: failed to get info from all running processes: <Fault 6: 'SHUTDOWN_STATE'>
2025-01-28 09:56:47,543 manager[8]: [ERROR] knot_resolver.manager.manager: Violently terminating!
2025-01-28 09:56:47,544 manager[8]: [INFO] knot_resolver.manager.server: The manager run for 470 seconds...
2025-01-28 09:56:47,607 supervisor[1]: [WARN] stopped: manager (exit status 1)
2025-01-28 09:56:47,612 supervisor[1]: [INFO] stopped: kresd0 (exit status 0)
2025-01-28 09:56:47,612 supervisor[1]: [INFO] stopped: kresd1 (exit status 0)
2025-01-28 09:56:47,613 supervisor[1]: [INFO] stopped: cache-gc (exit status 0)
A very similar error occurs when the manager is quickly terminated at startup when it has not yet been initialized. Now the error is caught and logged. The program exits with an error(1).
See commits.
Edited by Aleš Mrázek