server selection: consider switching to TCP instead of backing off the timeouts to high values
The following discussion from !1030 (merged) should be addressed:
-
@sbalazik started a discussion:

The config.hints test is timing out sometimes on this branch, and so far I have no idea why.

22/36 knot-resolver:postinstall+config+skip_asan / config.hints TIMEOUT 120.05 s
--- command ---
KRESD_NO_LISTEN='1' PATH='/builds/knot/knot-resolver/.local/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' TEST_FILE='/builds/knot/knot-resolver/modules/hints/tests/hints.test.lua' SOURCE_PATH='/builds/knot/knot-resolver/tests/config' /builds/knot/knot-resolver/tests/config/../../scripts/test-config.sh -c /builds/knot/knot-resolver/build_ci/../tests/config/test.cfg -n
--- stdout ---
/builds/knot/knot-resolver/.local/sbin/kresd
processing test file /builds/knot/knot-resolver/modules/hints/tests/hints.test.lua
ok 1 - has IP address for a.root-servers.net.
ok 2 - load root hints from file
ok 3 - can retrieve root hints
ok 4 - real IP address for a.root-servers.net. is replaced
ok 5 - real IP address for a.root-servers.net. is correct
[65536.00][rplan] [qry tree] badname.lan. A (0) <-
[65536.00][rplan] [push] pending 1; badname.lan. A (0) | resolved 0
[65536.03][rplan] [qry tree] . DNSKEY (3) <- badname.lan. A (2) <-
[65536.03][rplan] [push] pending 2; . DNSKEY (3); badname.lan. A (2) | resolved 0
This is because the iter_ns_badip.rpl workaround allows the same query to be pushed to rplan twice in a row, which leads to multiple tries, each with a backed-off timeout, to resolve . DNSKEY or a.root-servers.net AAAA (if DNSSEC is turned off). The old selection implementation switches to TCP after a few tries; there the connection fails and the NS address is flagged as 'bad'.
Switching to TCP instead of backing off into big timeouts might be a good idea, and it might even help with the pathological cases that now appear in respdiff.
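
To make the trade-off concrete, here is a rough sketch comparing the two policies against an address that never answers. It is not Knot Resolver code: the constants, the `AddressState` type and the function names are all hypothetical, and it assumes that a failed TCP attempt costs one initial timeout, which may be pessimistic if the connection is refused quickly.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only; not taken from Knot Resolver.
INITIAL_TIMEOUT_MS = 400
BACKOFF_FACTOR = 2
MAX_UDP_TRIES = 3


@dataclass
class AddressState:
    udp_timeouts: int = 0     # consecutive UDP timeouts seen on this address
    tcp_failed: bool = False  # set once a TCP attempt has failed


def plan_backoff_only(state):
    """Stay on UDP and multiply the timeout after every failure."""
    return ("udp", INITIAL_TIMEOUT_MS * BACKOFF_FACTOR ** state.udp_timeouts)


def plan_with_tcp_fallback(state):
    """After MAX_UDP_TRIES timeouts try TCP once; if that fails, mark the address bad."""
    if state.tcp_failed:
        return ("bad", None)
    if state.udp_timeouts >= MAX_UDP_TRIES:
        return ("tcp", INITIAL_TIMEOUT_MS)
    return ("udp", INITIAL_TIMEOUT_MS * BACKOFF_FACTOR ** state.udp_timeouts)


def simulate(plan, tries):
    """Total time (ms) spent waiting on an address that never responds."""
    state = AddressState()
    total = 0
    for _ in range(tries):
        transport, timeout = plan(state)
        if transport == "bad":
            break  # address is flagged, selection moves on
        total += timeout
        if transport == "udp":
            state.udp_timeouts += 1
        else:
            state.tcp_failed = True
    return total


if __name__ == "__main__":
    print("back-off only:", simulate(plan_backoff_only, 6), "ms")
    print("TCP fallback: ", simulate(plan_with_tcp_fallback, 6), "ms")
```

With these made-up numbers, six tries cost 25200 ms under pure back-off, while the fallback policy gives up on the address after 3200 ms. The real figures depend on the actual timeout values and on how fast the TCP connection fails.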