Knot Resolver issues
https://gitlab.nic.cz/knot/knot-resolver/-/issues

https://gitlab.nic.cz/knot/knot-resolver/-/issues/769
failure to start the manager (2023-07-04, Vaclav Sraier)

Happens just about once in a while in our CI, nothing regular. Don't know how to reproduce. Rerunning the job always fixes the issue.
```
Oct 10 11:56:00 runner-114-project-147-concurrent-1-799966 env[5260]: 428ms:INFO:knot_resolver_manager.server:Loading initial configuration from /etc/knot-resolver/config.yml
Oct 10 11:56:00 runner-114-project-147-concurrent-1-799966 env[5260]: 437ms:INFO:knot_resolver_manager.server:Validating initial configuration...
Oct 10 11:56:00 runner-114-project-147-concurrent-1-799966 env[5260]: 439ms:WARNING:knot_resolver_manager.log:Changing logging level to 'INFO'
Oct 10 11:56:00 runner-114-project-147-concurrent-1-799966 env[5260]: 440ms:INFO:knot_resolver_manager.kresd_controller:Starting service manager auto-selection...
Oct 10 11:56:00 runner-114-project-147-concurrent-1-799966 env[5260]: 440ms:INFO:knot_resolver_manager.kresd_controller:Available subprocess controllers are ('supervisord',)
Oct 10 11:56:00 runner-114-project-147-concurrent-1-799966 env[5260]: 440ms:INFO:knot_resolver_manager.kresd_controller:Selected controller 'supervisord'
Oct 10 11:56:00 runner-114-project-147-concurrent-1-799966 env[5260]: 441ms:INFO:knot_resolver_manager.kresd_controller.supervisord:Supervisord is already running, we will just update its config...
Oct 10 11:56:05 runner-114-project-147-concurrent-1-799966 systemd[1]: knot-resolver.service: Main process exited, code=exited, status=1/FAILURE
Oct 10 11:56:05 runner-114-project-147-concurrent-1-799966 systemd[1]: knot-resolver.service: Failed with result 'exit-code'.
Oct 10 11:56:05 runner-114-project-147-concurrent-1-799966 systemd[1]: Failed to start Knot Resolver Manager.
```

https://gitlab.nic.cz/knot/knot-resolver/-/issues/747
Expired gpg key in OBS (2022-09-03, Vladimír Čunát)

.deb users of our [upstream repo](https://www.knot-resolver.cz/download/) can't update anymore (Debian, Ubuntu).
Message examples:
```
# apt update
[...]
W: GPG error: http://download.opensuse.org/repositories/home:/CZ-NIC:/knot-resolver-latest/Debian_11 InRelease: The following signatures were invalid: EXPKEYSIG 74062DB36A1F4009 home:CZ-NIC OBS Project <home:CZ-NIC@build.opensuse.org>
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
```
The key:
```
pub rsa2048 2018-02-15 [SC] [expired: 2022-06-21]
45737F9C8BC3F3ED2791818274062DB36A1F4009
uid [ expired] home:CZ-NIC OBS Project <home:CZ-NIC@build.opensuse.org>
```

https://gitlab.nic.cz/knot/knot-resolver/-/issues/746
daemon/http: returning status 400 to handshake with dnscrypt-proxy (2022-06-23, Oto Šťáva)

When [`dnscrypt-proxy`](https://github.com/DNSCrypt/dnscrypt-proxy) attempts a handshake with `kresd`, status code 400 is returned.
On Gitter, user `jlongua` reported getting this log message:
```
Jun 16 13:41:55 draco.plan9-ns2.com dnscrypt-proxy[5775]: [2022-06-16 13:41:55] [ERROR] Webserver returned code 400
```
When I try it locally with a simple Docker image of dnscrypt-proxy, I get this:
```
dnscrypt-proxy-dnsdist-1 | [2022-06-17 06:58:33] [NOTICE] dnscrypt-proxy 2.1.1
dnscrypt-proxy-dnsdist-1 | [2022-06-17 06:58:33] [NOTICE] Network connectivity detected
dnscrypt-proxy-dnsdist-1 | [2022-06-17 06:58:33] [NOTICE] Now listening to 0.0.0.0:53 [UDP]
dnscrypt-proxy-dnsdist-1 | [2022-06-17 06:58:33] [NOTICE] Now listening to 0.0.0.0:53 [TCP]
dnscrypt-proxy-dnsdist-1 | [2022-06-17 06:58:33] [NOTICE] Source [relays] loaded
dnscrypt-proxy-dnsdist-1 | [2022-06-17 06:58:33] [NOTICE] Source [public-resolvers] loaded
dnscrypt-proxy-dnsdist-1 | [2022-06-17 06:58:33] [NOTICE] Firefox workaround initialized
dnscrypt-proxy-dnsdist-1 | [2022-06-17 06:58:33] [ERROR] 400 Bad Request
dnscrypt-proxy-dnsdist-1 | [2022-06-17 06:58:33] [NOTICE] dnscrypt-proxy is waiting for at least one server to be reachable
```

https://gitlab.nic.cz/knot/knot-resolver/-/issues/727
DNS64: PTR synthesis yields SERVFAIL for some cache contents (2022-03-21, Ondřej Caletka)

Summary
-------
When the cache is cold, PTR synthesis by the DNS64 module works well. When the cache gets populated by querying without DNS64 synthesis on, PTR synthesis stops working and SERVFAIL is returned instead.
Steps to reproduce
------------------
```
# cat /etc/knot-resolver/kresd.conf
-- SPDX-License-Identifier: CC0-1.0
-- vim:syntax=lua:set ts=4 sw=4:
-- Refer to manual: https://knot-resolver.readthedocs.org/en/stable/
-- Network interface configuration
net.listen('127.0.0.1', 53, { kind = 'dns' })
net.listen('127.0.0.1', 853, { kind = 'tls' })
--net.listen('127.0.0.1', 443, { kind = 'doh2' })
net.listen('::1', 53, { kind = 'dns', freebind = true })
net.listen('::1', 853, { kind = 'tls', freebind = true })
--net.listen('::1', 443, { kind = 'doh2' })
-- Load useful modules
modules = {
'hints > iterate', -- Allow loading /etc/hosts or custom root hints
'stats', -- Track internal statistics
'predict', -- Prefetch expiring/frequent records
'dns64',
'view',
}
-- Disable DNS64 for IPv4
view:addr('0.0.0.0/0', policy.all(policy.FLAGS('DNS64_DISABLE')))
-- Cache size
cache.size = 100 * MB
```
First query over IPv6 works as expected:
```
# kdig @::1 -x 64:ff9b::101:101 +noall +answer
1.0.1.0.1.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.b.9.f.f.4.6.0.0.ip6.arpa. 60 IN CNAME 1.1.1.1.in-addr.arpa.
1.1.1.1.in-addr.arpa. 1265 IN PTR one.one.one.one.
```
Query over IPv4, where DNS64 is disabled, also works properly with `NXDOMAIN`:
```
# kdig @127.0.0.1 -x 64:ff9b::101:101
;; ->>HEADER<<- opcode: QUERY; status: NXDOMAIN; id: 41713
;; Flags: qr rd ra; QUERY: 1; ANSWER: 0; AUTHORITY: 1; ADDITIONAL: 0
;; QUESTION SECTION:
;; 1.0.1.0.1.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.b.9.f.f.4.6.0.0.ip6.arpa. IN PTR
;; AUTHORITY SECTION:
ip6.arpa. 3600 IN SOA b.ip6-servers.arpa. nstld.iana.org. 2021111921 1800 900 604800 3600
```
After this query, PTR synthesis does not work anymore and yields `SERVFAIL`:
```
# kdig @::1 -x 64:ff9b::101:101
;; ->>HEADER<<- opcode: QUERY; status: SERVFAIL; id: 25807
;; Flags: qr rd ra; QUERY: 1; ANSWER: 0; AUTHORITY: 0; ADDITIONAL: 0
;; QUESTION SECTION:
;; 1.0.1.0.1.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.b.9.f.f.4.6.0.0.ip6.arpa. IN PTR
```
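For reference, the rewriting that DNS64 PTR synthesis performs can be sketched in a few lines (illustrative Python, not the dns64 module's actual code): the nibble-reversed `ip6.arpa` name is checked against the well-known prefix `64:ff9b::/96` and mapped to the `in-addr.arpa` name of the embedded IPv4 address, matching the CNAME in the first query above.

```python
import ipaddress
from typing import Optional

def dns64_ptr_target(ip6_arpa_name: str) -> Optional[str]:
    """If the ip6.arpa name embeds an address under the well-known DNS64
    prefix 64:ff9b::/96, return the in-addr.arpa name of the IPv4 part."""
    labels = ip6_arpa_name.rstrip(".").split(".")
    if len(labels) != 34 or labels[-2:] != ["ip6", "arpa"]:
        return None
    # ip6.arpa stores nibbles least-significant first; undo the reversal.
    nibbles = "".join(reversed(labels[:-2]))
    addr = ipaddress.IPv6Address(int(nibbles, 16))
    if addr not in ipaddress.IPv6Network("64:ff9b::/96"):
        return None
    ipv4 = ipaddress.IPv4Address(addr.packed[-4:])
    return ".".join(reversed(str(ipv4).split("."))) + ".in-addr.arpa."

name = ("1.0.1.0.1.0.1.0.0.0.0.0.0.0.0.0."
        "0.0.0.0.0.0.0.0.b.9.f.f.4.6.0.0.ip6.arpa.")
print(dns64_ptr_target(name))  # -> 1.1.1.1.in-addr.arpa.
```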
Clearing the cache restores correct behavior for a while.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/723
manager: race condition with watchdog while starting workers (2022-03-04, Vaclav Sraier)

How to reproduce:
1. disable worker count limit
2. set worker count to 1000
3. run the manager
4. watch the world burn (like really, these steps will trash your system)
5. manager crashes (on my machine after starting 183 instances of kresd)
I am guessing that the same behavior could be reproduced by hammering the manager with worker-count change requests, but I haven't tested that.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/722
server selection: practical issues with some Microsoft domains (2022-03-14, Vladimír Čunát)

With some Microsoft domains (outlook.com, office.com, office365.com) a small part of the nameservers is non-responsive, but kresd (sometimes) does not gracefully fall back to the other servers.
The same issue can surely happen with someone else's names as well, but this set seems by far the most commonly encountered in practice. It might be related to the NS server names being served by the same partially broken set.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/720
Control sockets on relative paths fail (2022-02-06, Vaclav Sraier)

With this config:
```
local path = '/tmp/control/1'
local ok, err = pcall(net.listen, path, nil, { kind = 'control' })
if not ok then
    log_warn(ffi.C.LOG_GRP_NETWORK, 'bind to '..path..' failed '..err)
end
```
everything works perfectly.
This config though:
```
local path = './control/1'
local ok, err = pcall(net.listen, path, nil, { kind = 'control' })
if not ok then
    log_warn(ffi.C.LOG_GRP_NETWORK, 'bind to '..path..' failed '..err)
end
```
Fails with this error message:
```
Feb 05 23:03:41 dingo kresd[169462]: [net ] bind to './control/1@53' (TCP): Invalid argument
Feb 05 23:03:41 dingo kresd[169462]: [net ] bind to ./control/1 failed error occurred here (config filename:lineno is at the bottom, if config is involved):
Feb 05 23:03:41 dingo kresd[169462]: stack traceback:
Feb 05 23:03:41 dingo kresd[169462]: [C]: at 0x556c94d0eae0
Feb 05 23:03:41 dingo kresd[169462]: [C]: in function 'pcall'
Feb 05 23:03:41 dingo kresd[169462]: kresd_1.conf:144: in main chunk
Feb 05 23:03:41 dingo kresd[169462]: ERROR: net.listen() failed to bind
```
It looks like the `kind` argument is completely ignored and defaults are assumed (UDP + TCP on port 53).
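A plausible (unverified) explanation for the misclassification: if the address parser recognizes socket paths only by a leading `/`, a relative path falls through to the host branch and the `kind` table is lost. A toy sketch of the two checks (hypothetical names, not kresd's code):

```python
def classify_naive(addr: str) -> str:
    # Hypothetical buggy check: only a leading '/' marks a socket path,
    # so './control/1' falls through to the host/port branch.
    return "unix-socket" if addr.startswith("/") else "host"

def classify_fixed(addr: str) -> str:
    # Anything containing a path separator is treated as a filesystem path.
    return "unix-socket" if "/" in addr else "host"

print(classify_naive("./control/1"), classify_fixed("./control/1"))
# -> host unix-socket
```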
EDIT: Tested on `a2c339a57b8a6fb1c6bbaa83ed4bfdbe742a5fd0` (HEAD of the `manager` branch)

https://gitlab.nic.cz/knot/knot-resolver/-/issues/687
serve_stale module doesn't provide stale answers when auths are unresponsive (2022-03-09, Tomas Krizek)

As of version 5.4.2, the `serve_stale` module doesn't work when auth servers are unresponsive (which is the typical case with network issues). The server selection algorithm tries very hard to resolve the request by re-trying different auth servers and increasing their allowed timeouts, until the request ultimately times out and returns SERVFAIL instead of a stale answer.
If the auth servers are reachable but REFUSE to respond, the serve_stale module works as expected (that was our former test case with deckard).
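The desired fix amounts to racing the live resolution against a short stale-answer timer, serving stale data when the timer fires while letting the live query continue (an illustrative asyncio sketch with made-up helper names, not kresd's event loop; RFC 8767 §5 suggests about 1.8 s):

```python
import asyncio

async def resolve_with_stale(live_resolve, stale_lookup, stale_timeout=1.8):
    """Race the live resolution against a stale-answer timer (sketch)."""
    task = asyncio.ensure_future(live_resolve())
    try:
        # shield() keeps the live query running even when we time out.
        return await asyncio.wait_for(asyncio.shield(task), stale_timeout)
    except asyncio.TimeoutError:
        stale = stale_lookup()
        if stale is not None:
            return stale      # serve stale quickly; refresh continues
        return await task     # no stale data: keep waiting as today

async def demo():
    async def slow():         # upstream that answers far too late
        await asyncio.sleep(10)
        return "fresh"
    return await resolve_with_stale(slow, lambda: "stale", stale_timeout=0.05)

print(asyncio.run(demo()))  # -> stale
```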
Some notes about possible resolution:
- to be useful for clients, the stale answer should be provided quickly enough ([RFC 8767 §5](https://datatracker.ietf.org/doc/html/rfc8767#section-5) suggests sending the stale answer after 1.8 s). The timeout used for serve_stale should ideally be configurable.
- the request resolution should keep going even after the stale answer is sent to the client, to refresh data from slower auth servers (possible option: spawn a new duplicate internal request after providing the stale answer?)
- server selection should have a configurable time limit that is respected and allows serve_stale to activate in time
- the server selection time limit shouldn't be used unless the serve_stale module is loaded _and_ there is a possible stale answer in the cache

https://gitlab.nic.cz/knot/knot-resolver/-/issues/684
ANSWER section not empty on SERVFAIL (2021-11-04, Tomas Krizek)

In some cases, the ANSWER section contains (unvalidated) data while the request ends with SERVFAIL.
In my specific conditions, the issue seems reproducible when:
- cache is clear
- IPv6 isn't available, but isn't turned off with net.ipv6
- server selection chooses specific servers (and typically chooses the non-functioning IPv6 ones)
```
$ kdig @::1 -p 5553 +timeout=16 +edns signotincepted.bad-dnssec.wb.sidnlabs.nl
;; ->>HEADER<<- opcode: QUERY; status: SERVFAIL; id: 6998
;; Flags: qr rd ra; QUERY: 1; ANSWER: 1; AUTHORITY: 0; ADDITIONAL: 1
;; EDNS PSEUDOSECTION:
;; Version: 0; flags: ; UDP size: 1232 B; ext-rcode: NOERROR
;; QUESTION SECTION:
;; signotincepted.bad-dnssec.wb.sidnlabs.nl. IN A
;; ANSWER SECTION:
signotincepted.bad-dnssec.wb.sidnlabs.nl. 3600 IN A 94.198.159.39
;; Received 85 B
;; Time 2021-11-04 10:45:32 CET
;; From ::1@5553(UDP) in 10027.7 ms
```
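The expected invariant is simple to state: a SERVFAIL response must carry an empty ANSWER section, so unvalidated records cannot leak to the client. A minimal sketch (hypothetical helper, not kresd code):

```python
def finalize_answer(rcode, answer_rrs):
    """On SERVFAIL, strip the (unvalidated) ANSWER records so they
    cannot reach the client -- the behavior this report expects."""
    if rcode == "SERVFAIL":
        return {"rcode": rcode, "answer": []}
    return {"rcode": rcode, "answer": answer_rrs}

rrs = [("signotincepted.bad-dnssec.wb.sidnlabs.nl.", "A", "94.198.159.39")]
print(finalize_answer("SERVFAIL", rrs))  # -> {'rcode': 'SERVFAIL', 'answer': []}
```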
See attached [log.txt](/uploads/8d1aa54458e26860a5d0f4e36d105cad/log.txt)

https://gitlab.nic.cz/knot/knot-resolver/-/issues/679
DNSSEC failure on insecure subzone (2021-10-23, Tomas Krizek)

Reported on [knot-resolver-users](https://lists.nic.cz/pipermail/knot-resolver-users/2021/000396.html) by Matthew Richardson
Attempting to resolve `213-133-203-34.newtel.in-addr.itconsult.net. PTR` ends up with a DNSSEC failure, even though the record itself is in an insecure subzone.
> The zone cut is between itconsult.net & newtel.in-addr.itconsult.net.
> Also whilst itconsult.net is DNSSEC signed, newtel.in-addr.itconsult.net is
> not. Thus, in-addr.itconsult.net is an empty non-terminal.
>
> If one asks for NS for newtel.in-addr.itconsult.net, thereafter resolution
> of the PTR then succeeds
```
[plan ][00000.00] plan '213-133-203-34.newtel.in-addr.itconsult.net.' type 'PTR' uid [51359.00]
[iterat][51359.00] '213-133-203-34.newtel.in-addr.itconsult.net.' type 'PTR' new uid was assigned .01, parent uid .00
[cache ][51359.01] => skipping exact RR: rank 027 (min. 030), new TTL 43131
[cache ][51359.01] => trying zone: itconsult.net., NSEC3, hash c75d4f37
[cache ][51359.01] => NSEC3 depth 3: hash uabfrhboj2pe1qnmfscd0adr77hqoirb
[cache ][51359.01] => NSEC3 encloser error for 213-133-203-34.newtel.in-addr.itconsult.net.: range search miss (!covers)
[cache ][51359.01] => NSEC3 depth 2: hash 7kdfmdhll7ee02vprj1oivl33lg5r7vu
[cache ][51359.01] => NSEC3 encloser error for newtel.in-addr.itconsult.net.: range search miss (!covers)
[cache ][51359.01] => NSEC3 depth 1: hash 4je672clu0jh2pbkm6mdj2n4ps7e9t2h
[cache ][51359.01] => NSEC3 encloser: only found existence of an ancestor
[cache ][51359.01] => skipping zone: itconsult.net., NSEC, hash 0;new TTL -123456789, ret -2
[zoncut][51359.01] found cut: itconsult.net. (rank 002 return codes: DS 0, DNSKEY 0)
[select][51359.01] => id: '47786' choosing: 'd.itconsult-dns.co.uk.'@'2001:67c:10b8::100#00053' with timeout 400 ms zone cut: 'itconsult.net.'
[resolv][51359.01] => id: '47786' querying: 'd.itconsult-dns.co.uk.'@'2001:67c:10b8::100#00053' zone cut: 'itconsult.net.' qname: 'iN-ADDR.iTConSult.neT.' qtype: 'NS' proto: 'udp'
[select][51359.01] NO6: timeouted, appended, timeouts 5/6
[select][51359.01] => id: '47786' noting selection error: 'd.itconsult-dns.co.uk.'@'2001:67c:10b8::100#00053' zone cut: 'itconsult.net.' error: 1 QUERY_TIMEOUT
[iterat][51359.01] '213-133-203-34.newtel.in-addr.itconsult.net.' type 'PTR' new uid was assigned .02, parent uid .00
[select][51359.02] => id: '56910' choosing: 'd.itconsult-dns.co.uk.'@'176.97.158.100#00053' with timeout 38 ms zone cut: 'itconsult.net.'
[resolv][51359.02] => id: '56910' querying: 'd.itconsult-dns.co.uk.'@'176.97.158.100#00053' zone cut: 'itconsult.net.' qname: 'in-aDdR.itCONsuLt.neT.' qtype: 'NS' proto: 'udp'
[select][51359.02] => id: '56910' updating: 'd.itconsult-dns.co.uk.'@'176.97.158.100#00053' zone cut: 'itconsult.net.' with rtt 18 to srtt: 18 and variance: 4
[iterat][51359.02] <= rcode: NOERROR
[iterat][51359.02] <= retrying with non-minimized name
[iterat][51359.02] '213-133-203-34.newtel.in-addr.itconsult.net.' type 'PTR' new uid was assigned .03, parent uid .00
[select][51359.03] => id: '18773' choosing: 'd.itconsult-dns.co.uk.'@'176.97.158.100#00053' with timeout 38 ms zone cut: 'itconsult.net.'
[resolv][51359.03] => id: '18773' querying: 'd.itconsult-dns.co.uk.'@'176.97.158.100#00053' zone cut: 'itconsult.net.' qname: '213-133-203-34.nEWtEL.IN-AdDr.ITcONsuLt.NEt.' qtype: 'PTR' proto: 'udp'
[select][51359.03] => id: '18773' updating: 'd.itconsult-dns.co.uk.'@'176.97.158.100#00053' zone cut: 'itconsult.net.' with rtt 16 to srtt: 18 and variance: 4
[iterat][51359.03] <= rcode: NOERROR
[valdtr][51359.03] >< cut changed, needs revalidation
[resolv][51359.03] => resuming yielded answer
[valdtr][51359.03] >< no valid RRSIGs found: 213-133-203-34.newtel.in-addr.itconsult.net. PTR (0 matching RRSIGs, 0 expired, 0 not yet valid, 0 invalid signer, 0 invalid label count, 0 invalid key, 0 invalid crypto, 0 invalid NSEC)
[plan ][51359.03] plan 'in-addr.itconsult.net.' type 'DS' uid [51359.04]
[iterat][51359.04] 'in-addr.itconsult.net.' type 'DS' new uid was assigned .05, parent uid .03
[cache ][51359.05] => trying zone: itconsult.net., NSEC3, hash c75d4f37
[cache ][51359.05] => NSEC3 depth 1: hash 4je672clu0jh2pbkm6mdj2n4ps7e9t2h
[cache ][51359.05] => NSEC3 sname: match proved NODATA, new TTL 43131
[iterat][51359.05] <= rcode: NOERROR
[valdtr][51359.05] <= parent: updating DS
[valdtr][51359.05] <= answer valid, OK
[resolv][51359.03] => resuming yielded answer
[valdtr][51359.03] >< no valid RRSIGs found: 213-133-203-34.newtel.in-addr.itconsult.net. PTR (0 matching RRSIGs, 0 expired, 0 not yet valid, 0 invalid signer, 0 invalid label count, 0 invalid key, 0 invalid crypto, 0 invalid NSEC)
[plan ][51359.03] plan 'in-addr.itconsult.net.' type 'DS' uid [51359.06]
[iterat][51359.06] 'in-addr.itconsult.net.' type 'DS' new uid was assigned .07, parent uid .03
[cache ][51359.07] => trying zone: itconsult.net., NSEC3, hash c75d4f37
[cache ][51359.07] => NSEC3 depth 1: hash 4je672clu0jh2pbkm6mdj2n4ps7e9t2h
[cache ][51359.07] => NSEC3 sname: match proved NODATA, new TTL 43131
[iterat][51359.07] <= rcode: NOERROR
[valdtr][51359.07] <= parent: updating DS
[valdtr][51359.07] <= answer valid, OK
[resolv][51359.03] => resuming yielded answer
[valdtr][51359.03] >< no valid RRSIGs found: 213-133-203-34.newtel.in-addr.itconsult.net. PTR (0 matching RRSIGs, 0 expired, 0 not yet valid, 0 invalid signer, 0 invalid label count, 0 invalid key, 0 invalid crypto, 0 invalid NSEC)
[valdtr][51359.03] <= continuous revalidation, fails
[cache ][51359.03] => not overwriting PTR 213-133-203-34.newtel.in-addr.itconsult.net.
[cache ][51359.03] => not overwriting PTR 213-133-203-34.newtel.in-addr.itconsult.net.
[dnssec] validation failure: 213-133-203-34.newtel.in-addr.itconsult.net. PTR
[resolv][51359.00] request failed, answering with empty SERVFAIL
[resolv][51359.03] finished in state: 8, queries: 2, mempool: 32800 B
```

https://gitlab.nic.cz/knot/knot-resolver/-/issues/673
trust_anchors.set_insecure may miss some names (2021-05-21, Vladimír Čunát)

If the same authoritative server IPs serve names both above and below the configured negative trust anchors, the downgrade to insecure may not happen in some cases.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/671
TLS_FORWARD can get stuck on broken addresses (v5.3.0) (2021-03-24, Vladimír Čunát)

With normal TLS-forwarding config, e.g.:
```lua
policy.add(policy.all(policy.TLS_FORWARD({
{ '8.8.8.8', hostname='dns.google' },
{ '8.8.4.4', hostname='dns.google' },
{ '2001:4860:4860::8888', hostname='dns.google' },
{ '2001:4860:4860::8844', hostname='dns.google' },
})))
```
but with part of the addresses disabled, e.g.
```bash
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
```
some queries get stuck in a very long "loop" of attempting connection to the non-working IPs, even though half of them works. Example log snippet: [tls_forward.log](/uploads/a5716360f9a3e6879160ff0766e37add/tls_forward.log)
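The graceful behavior would be to de-prioritize addresses with repeated failures so the working addresses get picked (a toy selection sketch, not kresd's server-selection algorithm):

```python
import random

def pick_address(addresses, failures, max_failures=3):
    """Prefer addresses that have not failed repeatedly; only fall back
    to a penalized address when nothing healthy is left."""
    healthy = [a for a in addresses if failures.get(a, 0) < max_failures]
    return random.choice(healthy or addresses)

# With IPv6 disabled, the v6 addresses keep timing out:
fails = {"2001:4860:4860::8888": 5, "2001:4860:4860::8844": 5}
addrs = ["8.8.8.8", "8.8.4.4", "2001:4860:4860::8888", "2001:4860:4860::8844"]
print(pick_address(addrs, fails))  # one of the two working IPv4 addresses
```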
_!1143 doesn't trigger here; it wasn't meant for forwarding and individual addresses might be broken for other reasons anyway._ (milestone 5.3.1)

https://gitlab.nic.cz/knot/knot-resolver/-/issues/657
policy: actions don't populate OPT when they should (2021-11-23, Vladimír Čunát)

[RFC 6891](https://tools.ietf.org/html/rfc6891#section-6.1.1):
> If an OPT record is present in a received request, compliant responders MUST include an OPT record in their respective responses.
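The rule is easy to state in code; a minimal sketch of the invariant (hypothetical dict-based message representation, not kresd's structures):

```python
def build_response(request, answer):
    """Echo EDNS presence: if the request carried an OPT record, the
    response must carry one too (RFC 6891 section 6.1.1)."""
    response = {"rcode": "NOERROR", "answer": answer, "additional": []}
    if request.get("opt") is not None:
        response["additional"].append({"type": "OPT", "udp_payload": 1232})
    return response

resp = build_response({"qname": "example.com.", "opt": {"version": 0}}, [])
print(resp["additional"])  # -> [{'type': 'OPT', 'udp_payload': 1232}]
```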
Original report: https://forum.turris.cz/t/kresd-response-missing-opt-pseudo-rr/14437
It causes practical issues with systemd-resolved (see the report).

https://gitlab.nic.cz/knot/knot-resolver/-/issues/654
insufficient caching of some uncommon wildcards (2020-12-11, Vladimír Čunát)

In an NSEC3-signed zone, if a wildcard is nested deeper than directly under the apex, positive expansions from it may not be cached properly (but they succeed). Testing example: `foo.t.cunat.cz AAAA`.
The issue is that aggressive cache thinks it needs to additionally provide an NSEC3 record matching the closest (provable) encloser, but that's not true in this case (because the wildcard record proves encloser's existence). This NSEC3 record must exist but resolver probably hasn't obtained it, so synthesis from cache (usually) fails.
Fortunately, typical wildcard usage I see is directly under the apex (`*.example.com`). We may also be "saved" by queries for non-existing types on the same name (e.g. AAAA), as those need this NSEC3 record and thus the only downside would be its "unneeded" addition into the corresponding positive wildcard expansions.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/645
FORMERR does not trigger EDNS fallback (2021-10-11, Petr Špaček)

Version: 5.2.0
Domain `spam.molax.co.kr.` qtype `A` does not work with EDNS. Auth servers correctly return FORMERR, but kresd 5.2.0 does not fall back to non-EDNS and SERVFAILs the request from the client.
[spam.molax.co.kr.A.log](/uploads/edde70e988fcf6ab810e693802c8896d/spam.molax.co.kr.A.log)
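The missing fallback can be sketched as: on FORMERR to an EDNS query, retry the same server without EDNS (illustrative helper names, not kresd's iterator code):

```python
def query_with_edns_fallback(send, qname):
    """Try with EDNS first; on FORMERR retry the same server without EDNS.

    `send(qname, edns)` -> rcode string; stands in for the wire exchange.
    """
    rcode = send(qname, edns=True)
    if rcode == "FORMERR":
        # Old/broken auths reject the OPT record; retry plain DNS
        # before giving up with SERVFAIL.
        rcode = send(qname, edns=False)
    return rcode

# Toy auth that FORMERRs any EDNS query, like the one in this report:
replies = lambda qname, edns: "FORMERR" if edns else "NOERROR"
print(query_with_edns_fallback(replies, "spam.molax.co.kr."))  # -> NOERROR
```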
We need to:
- fix kresd
- investigate why the test https://gitlab.nic.cz/knot/deckard/-/blob/master/sets/resolver/iter_formerr.rpl did not detect this, and fix it!

https://gitlab.nic.cz/knot/knot-resolver/-/issues/632
control protocol redesign (2020-10-27, Petr Špaček)

Version affected: 5.2.0
The current control protocol has several deficiencies:
- Input commands are read as text; individual commands are delimited with the `\n` byte. This prevents the user from sending multi-line commands or parameters, because the embedded `\n` breaks the implicit command boundaries.
- Output is always a string from `table_print()`. Consequently:
  - the control protocol cannot represent e.g. Lua errors; these lead to empty output.
  - sending structured data to another instance is a pain, as it has to be serialized into a string before it is handed to `table_print()`, and this serialized string is then (again) decorated by `table_print()` with string delimiters (`'`)
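One conventional way to fix both the framing and the structured-output problems is length-prefixed JSON frames, which survive embedded newlines and can carry errors as data. A sketch (not an adopted design, just an illustration of the idea):

```python
import json
import struct

def encode_frame(obj) -> bytes:
    """Length-prefixed JSON: 4-byte big-endian size, then the payload."""
    payload = json.dumps(obj).encode()
    return struct.pack(">I", len(payload)) + payload

def decode_frame(buf: bytes):
    (size,) = struct.unpack(">I", buf[:4])
    return json.loads(buf[4 : 4 + size])

# Embedded newlines no longer break command boundaries:
cmd = {"eval": "print('a')\nprint('b')"}
assert decode_frame(encode_frame(cmd)) == cmd
# Errors can be represented structurally instead of as bare strings:
err = {"ok": False, "error": "runtime error: ..."}
assert decode_frame(encode_frame(err))["ok"] is False
```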
I don't know what's the best approach to address this, but I think it is worth exploring existing solutions (protobuf? something else?) before inventing our own serialization format and control protocol.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/626
Can't validate `k.root-servers.net A` with minimization off and cold cache (2020-12-20, Štěpán Balážik)

Reproducer:
```lua
option('NO_MINIMIZE', true)
-- maybe wait a bit for priming to end
cache.clear()
verbose(true)
-- dig +dnssec @resolver k.root-servers.net A
```
```
[00000.00][plan] plan 'k.root-servers.net.' type 'A' uid [35628.00]
[35628.00][iter] 'k.root-servers.net.' type 'A' new uid was assigned .01, parent uid .00
[35628.01][resl] => using root hints
[35628.01][iter] 'k.root-servers.net.' type 'A' new uid was assigned .02, parent uid .00
[35628.02][resl] >< TA: '.'
[35628.02][plan] plan '.' type 'DNSKEY' uid [35628.03]
[35628.03][iter] '.' type 'DNSKEY' new uid was assigned .04, parent uid .02
[35628.04][resl] => id: '54250' querying: '2001:500:a8::e#00053' score: 10 zone cut: '.' qname: '.' qtype: 'DNSKEY' proto: 'udp'
[35628.04][resl] => id: '54250' querying: '192.203.230.10#00053' score: 10 zone cut: '.' qname: '.' qtype: 'DNSKEY' proto: 'udp'
[35628.04][iter] <= rcode: NOERROR
[35628.04][vldr] <= parent: updating DNSKEY
[35628.04][vldr] <= answer valid, OK
[35628.04][cach] => stashed . DNSKEY, rank 060, 824 B total, incl. 1 RRSIGs
[ta_signal_query] signalling query trigered: _ta-4f66.
[35628.04][resl] <= server: '2001:500:a8::e' rtt: >= 229 ms
[35628.04][resl] <= server: '192.203.230.10' rtt: 29 ms
[35628.02][iter] 'k.root-servers.net.' type 'A' new uid was assigned .05, parent uid .00
[35628.05][resl] => id: '03562' querying: '192.203.230.10#00053' score: 29 zone cut: '.' qname: 'K.roOt-seRVers.NEt.' qtype: 'A' proto: 'udp'
[00000.00][plan] plan '_ta-4f66.' type 'NULL' uid [65566.00]
[65566.00][iter] '_ta-4f66.' type 'NULL' new uid was assigned .01, parent uid .00
[65566.01][resl] => using root hints
[65566.01][iter] '_ta-4f66.' type 'NULL' new uid was assigned .02, parent uid .00
[65566.02][resl] >< TA: '.'
[65566.02][plan] plan '.' type 'DNSKEY' uid [65566.03]
[65566.03][iter] '.' type 'DNSKEY' new uid was assigned .04, parent uid .02
[65566.04][cach] => satisfied by exact RRset: rank 060, new TTL 172800
[65566.04][iter] <= rcode: NOERROR
[65566.04][vldr] <= parent: updating DNSKEY
[65566.04][vldr] <= answer valid, OK
[65566.02][iter] '_ta-4f66.' type 'NULL' new uid was assigned .05, parent uid .00
[65566.05][resl] => id: '37696' querying: '2001:500:2f::f#00053' score: 10 zone cut: '.' qname: '_ta-4F66.' qtype: 'NULL' proto: 'udp'
[35628.05][iter] <= rcode: NOERROR
[35628.05][vldr] >< cut changed, needs revalidation
[35628.05][resl] <= server: '192.203.230.10' rtt: 21 ms
[35628.05][resl] => resuming yielded answer
[35628.05][vldr] >< no valid RRSIGs found: k.root-servers.net. A (0 matching RRSIGs, 0 expired, 0 not yet valid, 0 invalid signer, 0 invalid label count, 0 invalid key, 0 invalid crypto, 0 invalid NSEC)
[35628.05][plan] plan 'net.' type 'DS' uid [35628.06]
[35628.06][iter] 'net.' type 'DS' new uid was assigned .07, parent uid .05
[35628.07][resl] => id: '15869' querying: '2001:500:1::53#00053' score: 10 zone cut: '.' qname: 'NEt.' qtype: 'DS' proto: 'udp'
[65566.05][resl] => id: '37696' querying: '192.5.5.241#00053' score: 10 zone cut: '.' qname: '_ta-4F66.' qtype: 'NULL' proto: 'udp'
[35628.07][resl] => id: '15869' querying: '198.97.190.53#00053' score: 10 zone cut: '.' qname: 'NEt.' qtype: 'DS' proto: 'udp'
[65566.05][iter] <= rcode: NXDOMAIN
[65566.05][vldr] <= answer valid, OK
[65566.05][cach] => stashed . NSEC, rank 060, 308 B total, incl. 1 RRSIGs
[65566.05][cach] => stashed . SOA, rank 060, 358 B total, incl. 1 RRSIGs
[65566.05][cach] => nsec_p stashed for . (new, hash: 0)
[65566.05][resl] <= server: '2001:500:2f::f' rtt: >= 225 ms
[65566.05][resl] <= server: '192.5.5.241' rtt: 25 ms
[65566.05][resl] AD: request classified as SECURE
[65566.05][resl] finished: 4, queries: 2, mempool: 98352 B
[35628.07][iter] <= rcode: NOERROR
[35628.07][vldr] <= DS: OK
[35628.07][vldr] <= parent: updating DS
[35628.07][vldr] <= answer valid, OK
[35628.07][cach] => stashed net. DS, rank 060, 330 B total, incl. 1 RRSIGs
[35628.07][resl] <= server: '2001:500:1::53' rtt: >= 250 ms
[35628.07][resl] <= server: '198.97.190.53' rtt: 50 ms
[35628.05][resl] >< TA: '.'
[35628.05][resl] => resuming yielded answer
[35628.05][vldr] >< no valid RRSIGs found: k.root-servers.net. A (0 matching RRSIGs, 0 expired, 0 not yet valid, 0 invalid signer, 0 invalid label count, 0 invalid key, 0 invalid crypto, 0 invalid NSEC)
[35628.05][plan] plan 'net.' type 'DS' uid [35628.08]
[35628.08][iter] 'net.' type 'DS' new uid was assigned .09, parent uid .05
[35628.09][cach] => satisfied by exact RRset: rank 060, new TTL 86400
[35628.09][iter] <= rcode: NOERROR
[35628.09][vldr] <= DS: OK
[35628.09][vldr] <= parent: updating DS
[35628.09][vldr] <= answer valid, OK
[35628.05][resl] >< TA: '.'
[35628.05][resl] => resuming yielded answer
[35628.05][vldr] >< no valid RRSIGs found: k.root-servers.net. A (0 matching RRSIGs, 0 expired, 0 not yet valid, 0 invalid signer, 0 invalid label count, 0 invalid key, 0 invalid crypto, 0 invalid NSEC)
[35628.05][vldr] <= continuous revalidation, fails
[35628.05][cach] => stashed k.root-servers.net. A, rank 027, 20 B total, incl. 0 RRSIGs
[35628.05][cach] => not overwriting A k.root-servers.net.
[35628.00][resl] request failed, answering with empty SERVFAIL
[35628.05][resl] finished: 8, queries: 3, mempool: 49200 B
```
And we get an empty SERVFAIL as an answer. :(

---
https://gitlab.nic.cz/knot/knot-resolver/-/issues/622
map() command mangles return values (2020-11-24, Petr Špaček)

Affected version: 5.1.3 and most likely all older versions as well
Problem
=======
In short, the current `map()` command is broken for calls whose return-value arrays have length != 1, and also for certain data types.
Example with three return values. This is wildly inconsistent even between consecutive executions on the same `kresd -f4` instances:
```
> map('1, 2, 3')
[1] => 1
[2] => 3
[3] => 1, 2, 3
[4] => 3
> map('1, 2, 3')
[1] => 1
[2] => 3
[3] => 1, 2, 3
[4] => 3
> map('1, 2, 3')
[1] => 1
[2] => 3
[3] => 1, 2, 3
[4] => 3
> map('1, 2, 3')
[1] => 1
[2] => 3
[3] => 1, 2, 3
[4] => 3
> map('1, 2, 3')
[1] => 1
[2] => 3
[3] => 1, 2, 3
[4] => 3
> map('1, 2, 3')
[1] => 1
[2] => 3
[3] => 3
[4] => 3
> map('1, 2, 3')
[1] => 1
[2] => 3
[3] => 3
[4] => 3
> map('1, 2, 3')
[1] => 1
[2] => 3
[3] => 3
[4] => 3
> map('1, 2, 3')
[1] => 1
[2] => 3
[3] => 3
[4] => 3
```
Example with nil - it gets turned into empty tables _but not for all four instances_:
```
> map('nil')
[1] => {
}
[2] => {
}
[3] => {
}
```
Example with errors - errors are incorrectly turned into strings so the map() caller cannot distinguish them from real string return values:
```
> map('error("test")')
[1] => test
[2] => test
[3] => test
[4] => test
```
In short it is utterly broken.
Proposal
========
`map()` is mostly broken and also undocumented, so we can change it more or less freely. Here is how I would do it:
- Follower instances will use `pcall` to determine whether the single call inside map() succeeded and send the results back to the leader
- map() running on the leader instance will receive the full `pcall` outputs from the followers and check that all calls succeeded. If not, `map()` will throw an error.
- The leader will check that the number of return values from each follower is exactly one; if not, map() on the leader will throw an error.
- If no error occurred, map() will return two tables:
  - a first table with the results from each instance in the format `{return value from first instance, return value from second instance, return value from third instance, ...}` - the order of return values is not defined and cannot be relied upon
  - a second table with the corresponding names of control sockets (? maybe we can skip this and add it later?)
- Usage if the map() caller wants to receive multiple return values: `map('{expression to be evaluated}')` will produce a table of tables `{{return values from first instance}, {return values from second instance}, ...}` and a table `{name of first instance, name of second instance, ...}`
- Usage if the map() caller also wants to receive errors:
  - The caller can either wrap the intended expression in pcall: `map('{pcall(expression to be evaluated)}')`
  - or we can define a table format which will be used for errors thrown from map()
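The proposed leader/follower flow can be sketched in Lua 5.1 / LuaJIT (the runtime kresd embeds). This is only an illustration: `follower_eval` and `leader_collect` are invented names, not actual kresd APIs, and the real transport over control sockets is omitted.

```lua
-- Hypothetical sketch of the proposed map() semantics; follower_eval and
-- leader_collect are invented names, not part of the actual kresd API.

-- Capture the true number of varargs along with the values themselves.
local function pack(...)
	return select('#', ...), {...}
end

-- Follower side: evaluate the expression under pcall and report the
-- success flag plus the exact count of return values.
local function follower_eval(expr)
	local chunk = assert(loadstring('return ' .. expr))
	local total, res = pack(pcall(chunk))
	return { ok = res[1], n = total - 1, values = { unpack(res, 2, total) } }
end

-- Leader side: throw unless every follower succeeded and returned exactly
-- one value, as the proposal above requires.
local function leader_collect(replies)
	local out = {}
	for i, r in ipairs(replies) do
		if not r.ok then
			error(string.format('map() call failed on instance %d: %s',
				i, tostring(r.values[1])))
		end
		if r.n ~= 1 then
			error(string.format('instance %d returned %d values, expected exactly 1',
				i, r.n))
		end
		out[i] = r.values[1]
	end
	return out
end
```

With this shape, both the "1, 2, 3" and the error("test") examples above would fail loudly on the leader instead of silently mangling the results.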
This approach also makes it possible to handle a variable number of return values, including `nil` values, in a safe manner. Let's define two helper functions:
```
-- Pack varargs into a table, recording the true argument count in `n`
-- so that nil arguments are not lost.
function tab_pack(...)
	local tab = {...}
	tab.n = select('#', ...)
	return tab
end

-- Unpack a table created by tab_pack, restoring any nil values.
function tab_unpack(tab)
	return unpack(tab, 1, tab.n)
end
```
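As a quick sanity check (restating the helpers so the snippet runs on its own under Lua 5.1 / LuaJIT), the explicit `n` field is what preserves nil values that the `#` length operator cannot be trusted to count:

```lua
-- Restated from above so this snippet is self-contained.
function tab_pack(...)
	local tab = {...}
	tab.n = select('#', ...)
	return tab
end
function tab_unpack(tab)
	return unpack(tab, 1, tab.n)
end

local t = tab_pack(nil, 2, nil)
-- `#t` is unreliable for tables with nil holes, but t.n reliably says
-- that 3 values were packed:
assert(t.n == 3)
assert(t[1] == nil and t[2] == 2 and t[3] == nil)
-- tab_unpack restores all three values, nils included:
local a, b, c = tab_unpack(t)
assert(a == nil and b == 2 and c == nil)
```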
Now map() caller can do: `map('tab_pack(nil,2,nil)')` and receive results like:
```
> map('tab_pack(nil,2,nil)')
[1] => {
[2] => 2
[n] => 3
}
[2] => {
[2] => 2
[n] => 3
}
```
The auto-generated table field `n` tells the caller that the original result contained 3 values, so the caller can iterate over [1], [2], [3] and safely find that the first and third return values were nil.

(Milestone: 5.2.0)

---
https://gitlab.nic.cz/knot/knot-resolver/-/issues/614
failure: forwarding + signatures + CNAME to sibling (2020-10-23, Vladimír Čunát <vladimir.cunat@nic.cz>)

Maybe there are some additional conditions needed to trigger the problem.
Example log:
<details><pre>
[44705.16][resl] => id: '14071' querying: '193.17.47.1#00053' score: 21 zone cut: 'dns-oarc.net.' qname: 'Rate.dnS-OaRc.NET.' qtype: 'A' proto: 'udp'
[44705.16][iter] <= answer received:
;; ->>HEADER<<- opcode: QUERY; status: NOERROR; id: 14071
;; Flags: qr rd ra cd QUERY: 1; ANSWER: 4; AUTHORITY: 0; ADDITIONAL: 1
;; EDNS PSEUDOSECTION:
;; Version: 0; flags: do; UDP size: 4096 B; ext-rcode: Unused
;; QUESTION SECTION
rate.dns-oarc.net. A
;; ANSWER SECTION
rate.dns-oarc.net. 120 CNAME dev.dns-oarc.net.
rate.dns-oarc.net. 120 RRSIG CNAME 8 3 120 20201028101102 20200928091102 12093 dns-oarc.net. Rttx3oznjSpRgoKZPNi1vYY3fP8KDq+y82p9cs+upwXHdpx/sB8pS4LvHlME53fJ5wqf8vmcAY07kJU4z4PKFam6Qj0a7De2zX3+JEFhZhUGV4UGxgbgX1lZTfS/bRPBmO/vxwCproIiIgWxgZLOvJb4kyCtscD8QWWxn2Tijvk=
dev.dns-oarc.net. 300 A 77.72.225.245
dev.dns-oarc.net. 300 RRSIG A 8 3 300 20201026021701 20200928021701 25608 dev.dns-oarc.net. faVj+wxFlFAjg2PQm9Dj4zjA2Ad57cXKv62YFVZ1x0zAUaBXnN+95YEfHwuBanu9P7REBKyaL47NmRo9BPOIzwxTbD610lUEWjx9OkMzmZwJOr5EddraB523q2BLToqJX344NBPNywtMUuYKPQQlBKzFN+Av/gstXGUCfyccU2E=
;; ADDITIONAL SECTION
[44705.16][iter] <= rcode: NOERROR
[44705.16][vldr] >< bogus signatures: dev.dns-oarc.net. A (3 matching RRSIGs, 0 expired, 0 not yet valid, 3 invalid signer, 0 invalid label count, 0 invalid key, 0 invalid crypto, 0 invalid NSEC)
[44705.16][vldr] >< cut changed (new signer), needs revalidation
[44705.16][resl] <= server: '185.43.135.1' rtt: 135 ms
[44705.16][plan] plan 'rate.dns-oarc.net.' type 'DS' uid [44705.17]
[44705.17][iter] 'rate.dns-oarc.net.' type 'DS' new uid was assigned .18, parent uid .16
[44705.18][cach] => skipping exact packet: rank 025 (min. 030), new TTL -501
[44705.18][cach] => trying zone: ., NSEC, hash 0
[44705.18][cach] => NSEC sname: range search miss (!covers)
[44705.18][cach] => skipping zone: ., NSEC, hash 0;new TTL -123456789, ret -2
[ ][nsre] score 21 for 185.43.135.1#00053; cached RTT: 120
[ ][nsre] score 21 for 193.17.47.1#00053; cached RTT: 11
[44705.18][resl] => id: '45855' querying: '185.43.135.1#00053' score: 21 zone cut: 'dev.dns-oarc.net.' qname: 'rAtE.dNs-oArC.nET.' qtype: 'DS' proto: 'udp'
[44705.18][resl] => id: '45855' querying: '193.17.47.1#00053' score: 21 zone cut: 'dev.dns-oarc.net.' qname: 'rAtE.dNs-oArC.nET.' qtype: 'DS' proto: 'udp'
[44705.18][iter] <= answer received:
;; ->>HEADER<<- opcode: QUERY; status: NOERROR; id: 45855
;; Flags: qr rd ra cd QUERY: 1; ANSWER: 4; AUTHORITY: 0; ADDITIONAL: 1
;; EDNS PSEUDOSECTION:
;; Version: 0; flags: do; UDP size: 4096 B; ext-rcode: Unused
;; QUESTION SECTION
rate.dns-oarc.net. DS
;; ANSWER SECTION
rate.dns-oarc.net. 120 CNAME dev.dns-oarc.net.
rate.dns-oarc.net. 120 RRSIG CNAME 8 3 120 20201028101102 20200928091102 12093 dns-oarc.net. Rttx3oznjSpRgoKZPNi1vYY3fP8KDq+y82p9cs+upwXHdpx/sB8pS4LvHlME53fJ5wqf8vmcAY07kJU4z4PKFam6Qj0a7De2zX3+JEFhZhUGV4UGxgbgX1lZTfS/bRPBmO/vxwCproIiIgWxgZLOvJb4kyCtscD8QWWxn2Tijvk=
dev.dns-oarc.net. 300 DS 65191 8 2 7202E542EC7177402116BE5EABB2366EAA1EEE8196A03934B2870A11DF174102
dev.dns-oarc.net. 300 RRSIG DS 8 3 300 20201028101102 20200928091102 12093 dns-oarc.net. jwebuweys5NQ5g/QYmltaYRxs7s1pvWXtvS/yH3JounDaBklEseOvfumUv9VdxbxT7a/U/rwWDg3CNtXlOO+4W8WpQp94Tz0wSAwT/UcA5hh38hPXnqJ/nH3gvRmEUi8iEMwl2c615IZ9YX4zNXh07oNfPeWTgzjKu6nHNF8bKA=
;; ADDITIONAL SECTION
[44705.18][iter] <= rcode: NOERROR
[44705.18][vldr] <= no useful RR in authoritative answer
[44705.18][cach] => stashed packet: rank 025, TTL 120, DS rate.dns-oarc.net. (472 B)
[44705.00][resl] request failed, answering with empty SERVFAIL
[44705.18][resl] finished: 8, queries: 5, mempool: 49200 B
</pre></details>
Assignee: Vladimír Čunát <vladimir.cunat@nic.cz>

---
https://gitlab.nic.cz/knot/knot-resolver/-/issues/602
cache size exposed in Lua API can get out of sync (2020-11-04, Petr Špaček)

This is a minor nit.
The Lua call `cache.current_size` does not read the cache size from the file/LMDB environment, so the value reported in Lua can be out of sync if another process changes the cache size.
The following discussion from !1042 should be addressed:
- [ ] @pspacek started a [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1042#note_168309): (+1 comment)
> I wonder if `cache.current_size` returns correct size if some rounding took place inside the backend.
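For context, a minimal sketch of the desync, assuming two kresd instances sharing one cache directory. `cache.size` and `cache.current_size` are the real Lua properties referenced in this issue, but the exact staleness behaviour is precisely what the issue is about, so treat this as an illustration rather than guaranteed output:

```lua
-- Instance A resizes the shared cache:
cache.size = 200 * MB

-- Instance B reads its locally cached value, which may still reflect the
-- old size because it is not re-read from the LMDB environment:
print(cache.current_size)
```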