Knot Resolver issues
https://gitlab.nic.cz/knot/knot-resolver/-/issues
2020-10-22T13:58:57+02:00

https://gitlab.nic.cz/knot/knot-resolver/-/issues/551
client retry logic on TCP/TLS connection closure
2020-10-22T13:58:57+02:00, Vladimír Čunát (vladimir.cunat@nic.cz)

When a remote server closes a connection without answering some of our queries, the corresponding requests get failed too aggressively (perhaps? TODO: details, etc.)
The most interesting part of the standards is [RFC 7766](https://tools.ietf.org/html/rfc7766#section-6.2.4):
> DNS clients SHOULD retry unanswered queries if the connection closes before receiving all outstanding responses.
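
A minimal sketch of the retry idea from the RFC 7766 text quoted above, written in Python for illustration only. It is not kresd code: the names `PendingQuery`, `Connection` and the `MAX_ATTEMPTS` limit are hypothetical. It only shows how unanswered queries could be split into "retry on a new connection" and "fail" when the connection closes.

```python
from collections import deque
from dataclasses import dataclass, field

MAX_ATTEMPTS = 3  # hypothetical limit, not an actual kresd setting


@dataclass
class PendingQuery:
    msgid: int
    qname: str
    attempts: int = 0


@dataclass
class Connection:
    """Tracks queries sent on one TCP/TLS connection that still await answers."""
    outstanding: dict = field(default_factory=dict)

    def send(self, query):
        # Count the attempt and remember the query until it is answered.
        query.attempts += 1
        self.outstanding[query.msgid] = query

    def on_answer(self, msgid):
        # An answered query is no longer outstanding.
        return self.outstanding.pop(msgid, None)

    def on_close(self):
        """Split unanswered queries into those to retry and those to fail."""
        retry, failed = deque(), []
        for query in self.outstanding.values():
            if query.attempts < MAX_ATTEMPTS:
                retry.append(query)
            else:
                failed.append(query)
        self.outstanding.clear()
        return retry, failed
```

Queries returned in `retry` would be sent again on a fresh connection; only those that already used up their attempts are failed.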
On the other hand, servers SHOULD NOT close connections early without a reason in the particular case, so hopefully this won't happen that often in practice; [FRITZ!](https://forum.turris.cz/t/dns-over-tcp-just-a-single-transaction/12003/11) seems a notable case. _I'll keep copying the important points from that discussion to here._

https://gitlab.nic.cz/knot/knot-resolver/-/issues/569
clarify respdiff job names in CI
2020-10-19T11:16:35+02:00, Petr Špaček

Mostly a note for myself:
especially forwarding scenarios have confusing names
Find a better naming structure and fix it.
Renaming will break a lot of stuff, so schedule this when we have time for it.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/624
Graph not shown in web management (webmgmt)
2020-10-12T09:33:09+02:00, Ghost User

I am running the web management service on Knot Resolver, but the graph is not shown. I inspected the page element and found the following errors:
**Screenshot of Error:**
![knot-webmgmt-0](/uploads/ae028abbd19e8a33a6544c70f393970e/knot-webmgmt-0.png)
![knot-webmgmt-1](/uploads/8d37984831fc956075084c5588472ad9/knot-webmgmt-1.png)
**Error Log:**
```
DevTools failed to load SourceMap: Could not load content for http://127.0.0.1:8053/dist/dygraph.min.js.map: HTTP error: status code 404, net::ERR_HTTP_RESPONSE_CODE_FAILURE
dygraph.min.js:5 Can't plot empty data set
Q.parseArray_ @ dygraph.min.js:5
Q.start_ @ dygraph.min.js:5
Q.__init__ @ dygraph.min.js:4
Q @ dygraph.min.js:4
(anonymous) @ kresd.js:89
mightThrow @ jquery.js:2
process @ jquery.js:2
setTimeout (async)
(anonymous) @ jquery.js:2
fire @ jquery.js:2
fireWith @ jquery.js:2
fire @ jquery.js:2
fire @ jquery.js:2
fireWith @ jquery.js:2
ready @ jquery.js:2
completed @ jquery.js:2
jquery.js:2 jQuery.Deferred exception: chartElement is not defined ReferenceError: chartElement is not defined
at HTMLDocument.<anonymous> (http://127.0.0.1:8053/kresd.js:357:2)
at mightThrow (http://127.0.0.1:8053/jquery.js:2:15044)
at process (http://127.0.0.1:8053/jquery.js:2:15698) undefined
jQuery.Deferred.exceptionHook @ jquery.js:2
process @ jquery.js:2
setTimeout (async)
(anonymous) @ jquery.js:2
fire @ jquery.js:2
fireWith @ jquery.js:2
fire @ jquery.js:2
fire @ jquery.js:2
fireWith @ jquery.js:2
ready @ jquery.js:2
completed @ jquery.js:2
jquery.js:2 Uncaught ReferenceError: chartElement is not defined
at HTMLDocument.<anonymous> (kresd.js:357)
at mightThrow (jquery.js:2)
at process (jquery.js:2)
(anonymous) @ kresd.js:357
mightThrow @ jquery.js:2
process @ jquery.js:2
setTimeout (async)
jQuery.readyException @ jquery.js:2
(anonymous) @ jquery.js:2
mightThrow @ jquery.js:2
process @ jquery.js:2
setTimeout (async)
(anonymous) @ jquery.js:2
fire @ jquery.js:2
fireWith @ jquery.js:2
fire @ jquery.js:2
fire @ jquery.js:2
fireWith @ jquery.js:2
process @ jquery.js:2
setTimeout (async)
(anonymous) @ jquery.js:2
fire @ jquery.js:2
fireWith @ jquery.js:2
fire @ jquery.js:2
fire @ jquery.js:2
fireWith @ jquery.js:2
ready @ jquery.js:2
completed @ jquery.js:2
```
**Knot Resolver Configuration:**
```
-- Network interface configuration
net.listen('127.0.0.1', 53, { kind = 'dns' })
net.listen('127.0.0.1', 853, { kind = 'tls' })
net.listen('127.0.0.1', 8053, { kind = 'webmgmt' })
-- Load useful modules
modules = {
'policy',
'http'
}
-- Cache size
cache.size = 1 * GB
-- Forward to upstream servers (8.8.8.8 and 1.1.1.1) using DoT
policy.add(policy.all(policy.TLS_FORWARD({
{'8.8.8.8', hostname='dns.google'},
{'1.1.1.1', hostname='cloudflare-dns.com'}
})))
```
**Knot Resolver Version:**
```
root@engine:/etc/knot-resolver# apt-cache policy knot-resolver
knot-resolver:
Installed: 5.1.3-2
Candidate: 5.1.3-2
Version table:
*** 5.1.3-2 500
500 http://download.opensuse.org/repositories/home:/CZ-NIC:/knot-resolver-latest/xUbuntu_20.04 Packages
100 /var/lib/dpkg/status
3.2.1-3ubuntu2 500
500 http://kambing.ui.ac.id/ubuntu focal/universe amd64 Packages
```
Thank You.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/182
fuzz & fix configuration interface to avoid segfaults
2020-10-09T19:18:24+02:00, Petr Špaček

Often a typo in the config file can lead to a segfault. We might try to write a fuzzer for the config file and see what happens.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/573
net.tls() allow usage of multiple certificates
2020-10-08T11:43:59+02:00, Tomas Krizek

ECC certificates provide superior performance to RSA keys of comparable security. Supporting multiple certificate files in `net.tls()` could improve DNS-over-TLS performance without sacrificing compatibility with older clients, if both ECC and RSA certificates could be used simultaneously.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/621
always keep RRSIG and its RRset in single data structure
2020-10-07T18:04:01+02:00, Petr Špaček

Problem: At the moment, an RRset and its RRSIG are two independent `knot_rrset_t` structures.
This leads to problems like !1072 where things get mixed and weird things happen after that.
Idea: Refactor code so RRset is always tied to all associated RRSIGs (multiple of them!).
Investigation how this could be done in most efficient way is needed.
Maybe this approach could be beneficial also to libknot/Knot DNS so let's not forget to talk to them.
Cc @lpeltan @dsalzman and gang.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/471
FORMERR for bad packets
2020-10-02T11:06:36+02:00, Vladimír Čunát (vladimir.cunat@nic.cz)

Currently a request from a client is either accepted or _ignored_. We should return `FORMERR` for packets where the header looks like DNS.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/593
64-bit ARM: remaining issues
2020-10-01T10:53:36+02:00, Santiago

(EDITed)
It's still possible to run into `bad light userdata pointer` errors, possibly hidden under
`missing luajit package: cqueues`. For summary see this post below: https://gitlab.nic.cz/knot/knot-resolver/-/issues/593#note_165359
- - -
#### Original post
Hi there,
It seems to be known that kresd doesn't work on arm64, but I haven't found this particular build error documented (so sorry for the possible noise). knot-resolver 5.1.x doesn't build on Debian due to a luajit error (bad light userdata pointer). The full build log is in https://buildd.debian.org/status/fetch.php?pkg=knot-resolver&arch=arm64&ver=5.1.2-1&stamp=1596037546&raw=0
And this is the relevant part:
````
...
Message: --- config_tests dependencies ---
Running command: /usr/bin/luajit -l cqueues -e os.exit(0)
--- stdout ---
--- stderr ---
/usr/bin/luajit: bad light userdata pointer
stack traceback:
[C]: at 0xffffb6342ad0
[C]: in function 'require'
/usr/share/lua/5.1/cqueues.lua:2: in function </usr/share/lua/5.1/cqueues.lua:1>
[C]: at 0xaaaae1757d08
[C]: at 0xaaaae170a4c0
../tests/meson.build:27:4: ERROR: Problem encountered: missing luajit package: cqueues
````
Cheers,
-- Santiago

https://gitlab.nic.cz/knot/knot-resolver/-/issues/588
control socket drops long outputs
2020-09-17T13:22:45+02:00, Petr Špaček

The control socket randomly cuts long outputs. It seems to be caused by incorrect use of `fprintf` inside the `daemon/io.c` function `io_tty_process_input()`.
Version: 5.1.2
Steps to reproduce:
```
$ echo -e "string.rep('a', 1024*1024*10)\n" | socat - unix-connect:$(ls control/*) | wc -c
223362
```
I.e. the output is truncated after 223362 bytes. This value is not a constant, it varies. Expected output should be 1024*1024*10 bytes `a` + 2x2 bytes of prompt `> `.
Strace:
```
read(23, "__binary\nstring.rep('a', 1024*10"..., 65536) = 40
dup(23) = 24
fcntl(24, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fstat(24, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
write(24, "\0\240\0\1aaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 4096) = 4096
write(24, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 10481664) = 219264
write(24, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 10262400) = 109632
write(24, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 10152768) = 219264
write(24, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 9933504) = -1 EAGAIN (Resource temporarily unavailable)
close(24) = 0
```
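
The strace above shows the write loop giving up at the first `EAGAIN` on a non-blocking socket. The following Python sketch only illustrates the general remedy (wait until the socket is writable, then continue); it is not the daemon's C code and does not use libuv. The helper names `send_all_nonblocking` and `demo` are hypothetical.

```python
import select
import socket
import threading


def send_all_nonblocking(sock, data, timeout=5.0):
    """Write all of `data` to a non-blocking socket, waiting on EAGAIN instead of giving up."""
    view = memoryview(data)
    sent_total = 0
    while sent_total < len(view):
        try:
            sent_total += sock.send(view[sent_total:])
        except BlockingIOError:
            # Kernel buffer is full (EAGAIN): wait until writable, then retry.
            _, writable, _ = select.select([], [sock], [], timeout)
            if not writable:
                raise TimeoutError("peer did not drain the socket in time")
    return sent_total


def demo(size=1024 * 1024):
    """Push `size` bytes through a socket pair while a thread drains the other end."""
    writer, reader = socket.socketpair()
    writer.setblocking(False)
    received = []

    def drain():
        total = 0
        while total < size:
            chunk = reader.recv(65536)
            if not chunk:
                break
            total += len(chunk)
        received.append(total)

    thread = threading.Thread(target=drain)
    thread.start()
    sent = send_all_nonblocking(writer, b"a" * size)
    thread.join()
    writer.close()
    reader.close()
    return sent, received[0]
```

In an event-loop design the wait would be a writability callback rather than a blocking `select`, which is presumably what moving the writes to libuv would provide.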
The whole `io_tty_process_input()` function is a mess and should be refactored into smaller pieces and, most importantly, rewritten to use libuv for writes as well.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/603
cache: get rid of mdb_env_sync()
2020-09-07T17:52:07+02:00, Petr Špaček

Explicit cache sync does not seem necessary and might be counterproductive, see other comments in the thread:
The following discussion from !1042 should be addressed:
- [ ] @pspacek started a [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1042#note_169608): (+1 comment)
> Out of curiosity, why the sync is necessary here?

https://gitlab.nic.cz/knot/knot-resolver/-/issues/578
test aggressive cache on NSEC3PARAM rotation
2020-08-20T10:05:40+02:00, Vladimír Čunát (vladimir.cunat@nic.cz)

I don't think we have any tests for that in particular, though the code has been deployed for a long time. Still, most of the possible failures I can imagine should only lead to insufficient caching.
Hints around how the implementation works:
- NSEC3PARAM is the [data collected](https://tools.ietf.org/html/rfc5155#section-4.2) but it's taken from NSEC3 records directly.
- For this purpose, using NSEC is like one more possible NSEC3PARAM configuration.
- Reading from cache is designed to consider the last two NSEC3PARAMs that have been written for that zone.
- Code reference: identifiers containing `nsec_p`.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/36
lib: parallel queries
2020-07-20T13:49:49+02:00, Ghost User

Some queries can be made in parallel (A+AAAA).
The current `rplan` can only work with the current query at the top of the stack.
The change would be to store a pointer to `current` that would be chosen when the
answer comes based on following criteria: `msgid + <qname, qtype, qclass> match`,
and the query MUST NOT have a parent. This could be later used for look-ahead queries (DNSKEY),
but then care must be taken as the answers MAY come out of order, while they MUST be processed in order.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/589
document threat model
2020-07-11T22:10:59+02:00, Petr Špaček

- inputs
- trusted (config, control socket, cache, files on disk)
- untrusted (network traffic)
- decide: prefill? hints? ...
- DoS is always possible (network overload, hijack etc.)
- integrity - DNSSEC
- confidentiality - do not count on it, encrypting only DNS traffic does not hide it

https://gitlab.nic.cz/knot/knot-resolver/-/issues/590
document bug reporting procedure
2020-07-10T14:10:23+02:00, Petr Špaček

- test on latest version
- mention relevant system information
- how to capture GDB traceback
- how to limit logging to problematic names
- how to capture network traffic + keys (TLS, DoH)
...

https://gitlab.nic.cz/knot/knot-resolver/-/issues/583
new statistics for encrypted transports
2020-06-19T14:17:50+02:00, Petr Špaček

It would be interesting to see statistics for:
- [ ] number of TLS handshakes
- [ ] TLS versions
- [ ] HTTP versions
- [ ] HTTP request methods
- [ ] HTTP status codes
Question: Are these stats sufficient to gather details about connection reuse?

https://gitlab.nic.cz/knot/knot-resolver/-/issues/488
can't reliably fetch stats when using SO_REUSEPORT
2020-06-15T09:35:13+02:00, Jean-Daniel

I'm using Knot Resolver with systemd, and want to use the stats module + http module to fetch stats in Prometheus format.
My problem is that if I start more than one instance (kresd@1, kresd@2, …), stats fetching requests are distributed among the instances and return only the stats from the answering instance.
I can't find a reliable way to fetch the stats in such a configuration.
Workaround:
I can fetch and aggregate individual workers' stats from the control sockets, but the control socket is very unreliable (it cannot properly parse two successive queries and often tries to interpret them as a single query).

https://gitlab.nic.cz/knot/knot-resolver/-/issues/568
Some cases of DNS resolution from lua fail if OS provides only IPv6 resolvers
2020-04-24T10:04:07+02:00, Vladimír Čunát (vladimir.cunat@nic.cz)

Conditions:
- `resolv.conf` only containing IPv6 nameservers. Mix works OK. I believe that very few people have IPv6-only there, luckily.
- Use DNS resolution based on `lua-cqueues`, e.g. `prefill` module or root trust anchors bootstrapping – both only after !894 (kresd >= 5.0.0).
Result example:
```
[prefill] fetch of `https://www.internic.net/domain/root.zone` failed: HTTP client library error: A non-recoverable error occurred when attempting to resolve the name (-1684960053)), will retry root zone download in 09 minutes 59 seconds
```
This is a problem in the Lua libraries that we've chosen to use: https://github.com/wahern/dns/issues/23

https://gitlab.nic.cz/knot/knot-resolver/-/issues/429
negative trust anchor does not prevent NXDOMAIN from aggressive cache
2020-04-06T09:52:56+02:00, Petr Špaček

Right now aggressive cache masks "grafted" domains, e.g. fake TLDs, even if these are listed as negative trust anchors.
This is unexpected behavior and forces users to use `NO_CACHE`, which is not optimal. In the future we should exempt NTAs from aggressive cache.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/311
policy.TLS_FORWARD should hold open a connection
2020-02-28T10:01:04+01:00, Daniel Kahn Gillmor

I have an example `kresd` instance configured with the following policy:
policy.add(policy.all(policy.TLS_FORWARD({{'9.9.9.9', hostname="dns.quad9.net", ca_file="/etc/ssl/certs/ca-certificates.crt"}})))
If I make one request to this local `kresd` instance, it sets up the TLS session to `quad9`, exchanges traffic with it, and then (about 2 seconds later) it tears down the connection to `quad9`. TLS session creation and teardown is pretty high overhead, and the `quad9` servers tolerate significantly longer periods of idle time.
Barring a good reason for early teardown, a forwarding client should hold open a session for at least 20 seconds -- but this should probably also be an adjustable configuration for a forwarder as different forwarders may have different policies.
Note that the configuration choice for timeout for `kresd` as a client forwarding over TLS should be distinct from the configuration choice for the delay tolerated by `kresd` when operating as a TLS listener.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/403
Restrict how long a delegation can be refreshed in cache
2020-02-28T09:55:02+01:00, Marek Vavrusa

Currently the NS record for a domain delegation can be refreshed in cache by queries arriving near its expiration time. This is good because the NS record can be prefetched ahead of time, but it also means that when a domain moves to a different DNS provider, the resolver will never know as long as the NS record keeps getting refreshed from the child side of the delegation, as it will never go back to the TLD to check whether the zone delegation changed.
In order to fix this, the resolver will have to track how the NS record was cached. One possible solution is to add an inception time which would only be updated when the NS record first enters cache from its parent; others are to restrict the number of times a record can be updated before it expires, or just to prevent NS records from being updated until they're fully expired.
What's the best way to fix this?
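
To make the inception-time option concrete, here is a small Python sketch under stated assumptions. It is not resolver code: `CachedNS`, `refresh_from_child` and the `max_lifetime_factor` cap are hypothetical names and values chosen for illustration. A child-side refresh may extend the expiry, but never beyond a hard limit counted from the moment the NS set first entered cache from the parent.

```python
from dataclasses import dataclass


@dataclass
class CachedNS:
    inception: float     # when the NS set first entered cache from the parent side
    expires: float       # current expiry time
    original_ttl: float  # TTL seen at inception


def refresh_from_child(entry, now, ttl, max_lifetime_factor=2.0):
    """Extend expiry on a child-side refresh, but never past a cap counted from inception."""
    # Hypothetical cap: total lifetime is at most a multiple of the original TTL.
    hard_limit = entry.inception + max_lifetime_factor * entry.original_ttl
    entry.expires = min(now + ttl, hard_limit)
    return entry.expires


def is_fresh(entry, now):
    """True while the entry may still be served without revisiting the parent."""
    return now < entry.expires
```

Once the hard limit is reached the entry expires regardless of further refreshes, so the next lookup has to go back to the parent zone and would notice a changed delegation.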