assertion "session_flags(session)->outgoing && !session_flags(session)->closing" failed
I use kresd on my home router, and last night it stopped processing queries (seemingly without crashing outright). The log was full of thousands of repetitions of:
....
Apr 30 06:39:36 router kresd[367]: [system] assertion "session_flags(session)->outgoing && !session_flags(session)->closing" failed in tcp_task_waiting_connection@../daemon/worker.c:1447
Apr 30 06:39:36 router kresd[367]: [system] assertion "session_flags(session)->outgoing && !session_flags(session)->closing" failed in tcp_task_waiting_connection@../daemon/worker.c:1447
Apr 30 06:39:36 router kresd[367]: [system] assertion "session_flags(session)->outgoing && !session_flags(session)->closing" failed in tcp_task_waiting_connection@../daemon/worker.c:1447
Apr 30 06:39:36 router kresd[367]: [system] assertion "session_flags(session)->outgoing && !session_flags(session)->closing" failed in tcp_task_waiting_connection@../daemon/worker.c:1447
Apr 30 06:39:37 router kresd[367]: [system] assertion "session_flags(session)->outgoing && !session_flags(session)->closing" failed in tcp_task_waiting_connection@../daemon/worker.c:1447
Apr 30 06:39:37 router kresd[367]: [system] assertion "session_flags(session)->outgoing && !session_flags(session)->closing" failed in tcp_task_waiting_connection@../daemon/worker.c:1447
....
When I restarted kresd, it crashed during shutdown:
Apr 30 06:39:40 router systemd-coredump[3711]: [LNK] Process 367 (kresd) of user 972 dumped core.
Module linux-vdso.so.1 with build-id c84a1af85cfb395c374cd5f645723e53f7f8d62b
Module p11-kit-trust.so with build-id 84da804340e6a810123f87b2b4a9c4bd4d0e8cf0
Module stats.so with build-id 460aaa6ef03adef5a85ed1f2bd000be5363ac1a8
Module hints.so with build-id 90c38cd4a6b5f6a17b1bf77dbbc81683631187ed
Module extended_error.so with build-id c9984a96311c272732feff7a5528d957c8d8b4b1
Module refuse_nord.so with build-id e66c72ba75b14470f13110045595ab1ad3fb533b
Module edns_keepalive.so with build-id 3f3662709a965611174c37e672b7bce66ac98658
Module libstdc++.so.6 with build-id 0efbe365b709015ea481a66fb0f5ad650e617599
Module libgpg-error.so.0 with build-id 1e65d609a859c3c4ba69fe248838202cf00c8bbb
Module libbrotlicommon.so.1 with build-id 3dc157d6417d3602b6d774ae07508e4bbfa8920c
Module libffi.so.8 with build-id 5103e7b5b7addb8026a35a62734fefd1c7ef5c64
Module libelf.so.1 with build-id 7047fb71440373a1456396c581692cda24627825
Module libgcrypt.so.20 with build-id b10fee43a15f81876aeadec4e734decfc4214e4e
Module libcap.so.2 with build-id ba39fbcf17238edd9188c42c664778b3da8d8975
Module liblz4.so.1 with build-id 6d85cb32490fa810dbc0b9cbb0043fc52e6ddba0
Module libzstd.so.1 with build-id df4d0e928163f0b5e1c7c5f78ddb055cbe22b639
Module liblzma.so.5 with build-id d34507011f065d2da4c4cc360615b2cd3ce3d4b2
Module libgmp.so.10 with build-id ede351880698ee91c5e8d457bf078a8887ecc97a
Module libhogweed.so.6 with build-id 2b084732112218e0af7d9b77153758b092cfa54f
Module libnettle.so.8 with build-id c376ee33b84aebefdf23b0dee1f22c8e79f1fd0e
Module libtasn1.so.6 with build-id e64114db392bb17238bd5cb22dfd12e308db52b0
Module libunistring.so.2 with build-id 457d1352b4d0b8d2eaad4b0c9ccea31446a11395
Module libidn2.so.0 with build-id be16fc6cb7814edc928c646a2f11ddfcc0ec1822
Module libbrotlidec.so.1 with build-id a634700f82bb52f4fa5e4a9495b39b890a1b26e6
Module libbrotlienc.so.1 with build-id b20212ed7f9630b545fb132a93579aef1967f308
Module libp11-kit.so.0 with build-id 5c3eefdf311483790b33a8f76dc45a87f6769ecf
Module libz.so.1 with build-id 961b20a79348f990621bd0a145f15c51219eef5d
Module libpthread.so.0 with build-id 2d7e5623023dc082483554f4447388c3a48a244b
Module libdl.so.2 with build-id 3d5771318379b07f0a5dda7613f76422aa7f6022
Module libbpf.so.0 with build-id 6313987843e278092e5f9375e0215c552337c896
Module libm.so.6 with build-id be9757a4dc0f0a727982d77fca226e6e852aa3bb
Module liblmdb.so with build-id ac3b357165ae5eb6c17cdc9de3adf3c5b9f5b3e6
Module libc.so.6 with build-id 2858f54ba7c8eae476c62b8631c4feded56e9064
Module libgcc_s.so.1 with build-id 43de5fed20f08220e018b86c70e0e46e00a46de2
Module libnghttp2.so.14 with build-id dd24ff864cabdc1181dd940f264451de6dd04ece
Module libcap-ng.so.0 with build-id d02eff3ece50ff505401a5ff91046d6cbf499dcb
Module libsystemd.so.0 with build-id 63e76b23478874cf91e5d81741285d93ccbf27cb
Module libgnutls.so.30 with build-id 2fc3c60ebe9b399e5ac84e4496bb75f35443f89e
Module libluajit-5.1.so.2 with build-id c7b4394fcbb3e55dd9dde4164c59020aa962ab33
Module libuv.so.1 with build-id 5786228ca54387aeb7ebb38960f8a75305ee5223
Module libdnssec.so.8 with build-id 4fc7ee9ab8130753ba22e7179cb74357678f8651
Module libzscanner.so.4 with build-id 2a53ca5ee610b0674aeabce12f38a2187703a5d1
Module libknot.so.12 with build-id bba838634737e8b916f4f2067f61f8f29a13c49c
Module libkres.so.9 with build-id 563367b5523d95acc1a70849a58e0a16cb923a3e
Module kresd with build-id 4c099ec64de5aeeb7ef45a5024654b7b042756f4
Stack trace of thread 367:
#0 0x0000ffff9185ac38 n/a (libkres.so.9 + 0x1ac38)
#1 0x0000ffff9185ac40 n/a (libkres.so.9 + 0x1ac40)
#2 0x0000ffff9185b1bc map_clear (libkres.so.9 + 0x1b1bc)
#3 0x0000aaaac39ab2dc n/a (kresd + 0x2b2dc)
#4 0x0000aaaac398aa14 n/a (kresd + 0xaa14)
#5 0x0000ffff9108b8fc __libc_start_call_main (libc.so.6 + 0x2b8fc)
#6 0x0000ffff9108b9d4 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2b9d4)
#7 0x0000aaaac398c4f0 _start (kresd + 0xc4f0)
ELF object binary architecture: AARCH64
(I don't have debug symbols, sorry)
# kresd -V status=1
Knot Resolver, version 5.5.0
[2022-03-29T19:29:11+0200] [ALPM] upgraded knot-resolver (5.4.4-1 -> 5.5.0-1)
This seems like an exceedingly rare event (one crash in the month since the update).
My configuration:
net.listen({'127.0.0.1', '192.168.1.1', '10.11.7.1', '10.11.4.1'})
net.listen({'[...redacted public ipv6 address...]'})
net.outgoing_v6('[...redacted public ipv6 address...]')
modules = {
    'hints > iterate', -- Load /etc/hosts and allow custom root hints
    'stats',           -- Track internal statistics
    'predict',         -- Prefetch expiring/frequent records
}
cache.size = cache.fssize() - 10 * MB
log_level('warning')
--log_level('debug')
home_names = policy.todnames({'[... my public tld that I serve via a local secondary NS in case my internet crashes ...].'})
policy.add(policy.suffix(policy.FLAGS({'NO_CACHE', 'NO_EDNS'}), home_names))
policy.add(policy.suffix(policy.STUB('127.0.0.2'), home_names))
policy.add(policy.slice(
    policy.slice_randomize_psl(),
    policy.TLS_FORWARD({
        {'1.1.1.1', hostname='1dot1dot1dot1.cloudflare-dns.com'},
        {'1.0.0.1', hostname='1dot1dot1dot1.cloudflare-dns.com'},
        {'2606:4700:4700::1111', hostname='1dot1dot1dot1.cloudflare-dns.com'},
        {'2606:4700:4700::1001', hostname='1dot1dot1dot1.cloudflare-dns.com'},
    }),
    policy.TLS_FORWARD({
        {'8.8.8.8', hostname='dns.google'},
        {'8.8.4.4', hostname='dns.google'},
        {'2001:4860:4860::8888', hostname='dns.google'},
        {'2001:4860:4860::8844', hostname='dns.google'},
    }),
    policy.TLS_FORWARD({
        {'9.9.9.9', hostname='dns9.quad9.net'},
        {'149.112.112.112', hostname='dns9.quad9.net'},
        {'2620:fe::fe', hostname='dns9.quad9.net'},
        {'2620:fe::9', hostname='dns9.quad9.net'},
    })
))
About a week before the crash I added the TLS_FORWARD policies; before that, kresd did full recursive resolution. So the issue might lie in the forwarding of queries to those upstream resolvers.
Resolution was not failing completely. I only noticed the problem because web pages using HSTS started returning TLS certificate errors: their names were being resolved to the IP address of my home router (which also serves something on the HTTPS port). That was quite weird.