Knot Resolver behavior when secondary DNS is in trouble
There was an issue with the "secondary" DNS server for the hostname ctn.cvut.cz. I reported it and monitored it for a day. It was fixed a few hours ago.
I collected data for a day (querying the DNS servers once a minute) and I see an interesting result. My local "unbound" installed on a Debian box always returned the "A" record, with no failures, as if it can handle the situation where the secondary DNS is in trouble. The same goes for GoogleDNS and OpenDNS. I had no issue with Quad9 DNS, but I think I was just lucky in this case: when I tried their web tool, I saw that some of their DNS servers failed to resolve ctn.cvut.cz, although that was rare.
My point here is that the ODVR@nic.cz DNS servers were "the worst" in my test: they failed most frequently. I assume those servers run Knot Resolver and that it does not know how to handle the situation when a secondary DNS server is "broken" (it just returns that failure to the client).
Failure statistics, about 1100 samples for each DNS server; a failure was counted when the A record was missing:
1110 ctn.cvut.cz 147.32.1.9 # secondary server nss.cvut.cz, source of the issue
59 ctn.cvut.cz 193.17.47.1 # odvr.nic.cz
47 ctn.cvut.cz 185.43.135.1 # odvr.nic.cz
40 ctn.cvut.cz 192.168.222.1 # my gateway, round robin between Google, CloudFlare, ODVR
9 ctn.cvut.cz 1.1.1.1 # Cloudflare
0 ctn.cvut.cz 8.8.8.8 # Google
0 ctn.cvut.cz 9.9.9.9 # Quad9
0 ctn.cvut.cz 195.46.39.39 # SafeDNS
0 ctn.cvut.cz 208.67.222.222 # OpenDNS
0 ctn.cvut.cz 147.32.1.20 # primary server ns.cvut.cz, was running without issue
0 ctn.cvut.cz 192.168.222.11 # unbound @ Debian server
0 www.cvut.cz 192.168.222.1 # my gateway, round robin between Google, CloudFlare, ODVR
0 www.cvut.cz 1.1.1.1 # Cloudflare
0 www.cvut.cz 9.9.9.9 # Quad9
This is how the failure looked on the secondary DNS:
$ khost ctn.cvut.cz 147.32.1.9
Host ctn.cvut.cz. has no A record
Host ctn.cvut.cz. has no AAAA record
Host ctn.cvut.cz. has no MX record
The primary server was OK:
$ khost ctn.cvut.cz 147.32.1.20
ctn.cvut.cz. has IPv4 address 147.32.235.120
Host ctn.cvut.cz. has no AAAA record
ctn.cvut.cz. mail is handled by 0 mailgw3.cvut.cz.
ctn.cvut.cz. mail is handled by 0 mailgw4.cvut.cz.
This is a successful test on CloudFlareDNS:
$ khost ctn.cvut.cz 1.1.1.1
ctn.cvut.cz. has IPv4 address 147.32.235.120
Host ctn.cvut.cz. has no AAAA record
ctn.cvut.cz. mail is handled by 0 mailgw4.cvut.cz.
ctn.cvut.cz. mail is handled by 0 mailgw3.cvut.cz.
And a failure:
$ khost ctn.cvut.cz 1.1.1.1
Host ctn.cvut.cz. type A error: NXDOMAIN
Host ctn.cvut.cz. type AAAA error: NXDOMAIN
Host ctn.cvut.cz. type MX error: NXDOMAIN
I read the log file and I see some other differences. I used the host utility (from the Ubuntu repository) in my script. Replies from Cloudflare were consistent, in most cases like this:
=== 2023-07-20 16:15:17 CEST ctn.cvut.cz 1.1.1.1
Using domain server:
Name: 1.1.1.1
Address: 1.1.1.1#53
Aliases:
ctn.cvut.cz has address 147.32.235.120
ctn.cvut.cz mail is handled by 0 mailgw4.cvut.cz.
ctn.cvut.cz mail is handled by 0 mailgw3.cvut.cz.
When it failed, it was like this (A record is missing):
=== 2023-07-21 05:12:18 CEST ctn.cvut.cz 1.1.1.1
Using domain server:
Name: 1.1.1.1
Address: 1.1.1.1#53
Aliases:
ctn.cvut.cz mail is handled by 0 mailgw3.cvut.cz.
ctn.cvut.cz mail is handled by 0 mailgw4.cvut.cz.
And I see a case where the MX records are missing:
=== 2023-07-21 08:10:02 CEST ctn.cvut.cz 1.1.1.1
Using domain server:
Name: 1.1.1.1
Address: 1.1.1.1#53
Aliases:
ctn.cvut.cz has address 147.32.235.120
Replies from ODVR are not as consistent; they are a mix of these replies. "Perfect answer":
=== 2023-07-20 14:06:46 CEST ctn.cvut.cz 193.17.47.1
Using domain server:
Name: 193.17.47.1
Address: 193.17.47.1#53
Aliases:
ctn.cvut.cz has address 147.32.235.120
ctn.cvut.cz mail is handled by 0 mailgw3.cvut.cz.
ctn.cvut.cz mail is handled by 0 mailgw4.cvut.cz.
MX records are missing in the reply from 193.17.47.1 (no impact on the failure counters):
=== 2023-07-20 14:07:56 CEST ctn.cvut.cz 193.17.47.1
Using domain server:
Name: 193.17.47.1
Address: 193.17.47.1#53
Aliases:
ctn.cvut.cz has address 147.32.235.120
=== 2023-07-20 14:07:56 CEST ctn.cvut.cz 185.43.135.1
Using domain server:
Name: 185.43.135.1
Address: 185.43.135.1#53
Aliases:
ctn.cvut.cz has address 147.32.235.120
ctn.cvut.cz mail is handled by 0 mailgw4.cvut.cz.
ctn.cvut.cz mail is handled by 0 mailgw3.cvut.cz.
MX records are missing in both answers; reply from 193.17.47.1 was counted as a failure (no A record):
=== 2023-07-21 07:51:47 CEST ctn.cvut.cz 193.17.47.1
Using domain server:
Name: 193.17.47.1
Address: 193.17.47.1#53
Aliases:
=== 2023-07-21 07:51:47 CEST ctn.cvut.cz 185.43.135.1
Using domain server:
Name: 185.43.135.1
Address: 185.43.135.1#53
Aliases:
ctn.cvut.cz has address 147.32.235.120
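The reply variants shown above (A plus MX, A only, MX only, nothing at all) can be told apart mechanically. Below is a minimal Python sketch of how such a log could be parsed and how the "failure = no A record" rule could be applied per hostname and server. It is not the script that produced the numbers in this post. The function names `parse_log` and `count_failures` are made up for illustration, and the only assumption about the log is the `=== date time zone hostname server` header followed by the output of `host`, as in the excerpts.

```python
import re
from collections import Counter

# Header line that precedes every sample in the log excerpts, e.g.
# === 2023-07-21 05:12:18 CEST ctn.cvut.cz 1.1.1.1
HEADER = re.compile(r"^=== (\S+ \S+ \S+) (\S+) (\S+)$")


def parse_log(text):
    """Split the log into samples and note which record types were answered."""
    samples = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            current = {
                "when": match.group(1),
                "name": match.group(2),
                "server": match.group(3),
                "has_a": False,
                "has_mx": False,
            }
            samples.append(current)
        elif current is not None:
            # These phrases are the ones `host` prints in the excerpts.
            if " has address " in line:
                current["has_a"] = True
            elif " mail is handled by " in line:
                current["has_mx"] = True
    return samples


def count_failures(samples):
    """Count samples without an A record, keyed by (hostname, server)."""
    failures = Counter()
    for s in samples:
        # Adding 0 keeps servers that never failed in the result.
        failures[(s["name"], s["server"])] += 0 if s["has_a"] else 1
    return failures
```

Calling `count_failures(parse_log(open("dns.log").read()))` (the file name is hypothetical) would give a counter comparable to the statistics table. A missing MX record is recorded in `has_mx` but, as in the post, does not affect the failure count.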
Only one example of a reply from unbound, because those were consistent, like this one, with an A record and two MX records:
=== 2023-07-21 12:24:18 CEST ctn.cvut.cz 192.168.222.11
Using domain server:
Name: 192.168.222.11
Address: 192.168.222.11#53
Aliases:
ctn.cvut.cz has address 147.32.235.120
ctn.cvut.cz mail is handled by 0 mailgw3.cvut.cz.
ctn.cvut.cz mail is handled by 0 mailgw4.cvut.cz.
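For anyone who wants to repeat this kind of measurement, here is a minimal Python sketch of a polling loop. It is not the script used for the numbers in this post, only an illustration under stated assumptions: it calls `host <name> <server>` once per target and writes the output under a `=== timestamp name server` header like the one in the log excerpts. The names `take_sample` and `poll`, the 30 second timeout and the log path are hypothetical.

```python
import subprocess
import time


def take_sample(name, server, run=subprocess.run):
    """Run `host name server` once and return one log entry as text."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S %Z")
    try:
        result = run(["host", name, server],
                     capture_output=True, text=True, timeout=30)
        output = result.stdout
    except subprocess.TimeoutExpired:
        # A hung query is logged as an empty reply (no A record).
        output = ""
    return f"=== {stamp} {name} {server}\n{output}"


def poll(targets, log_path, interval=60, rounds=1):
    """Append one sample per (name, server) pair to log_path, `rounds` times."""
    for i in range(rounds):
        with open(log_path, "a", encoding="utf-8") as log:
            for name, server in targets:
                log.write(take_sample(name, server))
        if i + 1 < rounds:
            time.sleep(interval)
```

For example, `poll([("ctn.cvut.cz", "193.17.47.1"), ("ctn.cvut.cz", "1.1.1.1")], "dns.log", rounds=1440)` would sample both resolvers roughly once a minute for a day. The `run` parameter exists only so the command can be replaced by a stub when testing without network access.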