Knot Resolver issues
https://gitlab.nic.cz/knot/knot-resolver/-/issues
2020-11-25T13:22:36+01:00

https://gitlab.nic.cz/knot/knot-resolver/-/issues/623
declarative config - Lua API extension
2020-11-25T13:22:36+01:00
Vaclav Sraier

I would like to open a discussion as a follow up after #536. The problem remains and this proposal attempts to fix it differently.
# Problem (re)statement
The current configuration is practically a Lua program, which is a nightmare for multiple reasons:
* non-programmers have a hard time understanding what is going on
* the Lua language makes it hard to detect mistakes in the config
* run-time reconfiguration requires doing each change N times for N processes
* it currently exposes low-level stuff and is prone to crashes on invalid use (#182)
# Proposal
## kresd
We could extend kresd API with the following function:
```lua
--- Sets the resolver to supplied state regardless of what was configured
--- before. Options that aren't specified in the argument are set to their
--- default value
---
--- @param cfg Table corresponding to the existing YANG model
function configure(cfg)
```
And optionally with this:
```lua
--- Returns a table corresponding to the existing YANG model with the current
--- configuration.
function dump_configuration()
```
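To make the "unspecified options fall back to defaults" semantics concrete, here is a minimal pure-Lua sketch. The option names (`cache.size_mb`, `logging.verbose`) and the helpers `deep_copy` and `with_defaults` are hypothetical placeholders, not keys from the YANG model, and the sketch only stores the resulting table where a real implementation would validate it and call the existing API.

```lua
-- Hypothetical defaults; the real keys would come from the YANG model.
local DEFAULTS = {
  cache = { size_mb = 100 },
  logging = { verbose = false },
}

-- Recursively copy a value so callers never share tables with DEFAULTS.
local function deep_copy(value)
  if type(value) ~= 'table' then return value end
  local copy = {}
  for k, v in pairs(value) do copy[k] = deep_copy(v) end
  return copy
end

-- Build the full target state: start from the defaults and overlay
-- whatever cfg specifies. No validation is done in this sketch.
local function with_defaults(defaults, cfg)
  local result = deep_copy(defaults)
  for k, v in pairs(cfg or {}) do
    if type(v) == 'table' and type(result[k]) == 'table' then
      result[k] = with_defaults(result[k], v)
    else
      result[k] = deep_copy(v)
    end
  end
  return result
end

local current = deep_copy(DEFAULTS)

function configure(cfg)
  -- A real implementation would apply the state via the existing API here.
  current = with_defaults(DEFAULTS, cfg)
end

function dump_configuration()
  return deep_copy(current)
end

configure({ cache = { size_mb = 500 }, logging = { verbose = true } })
configure({ cache = { size_mb = 200 } })
-- The second call omitted logging, so logging.verbose is back to its default.
```

Because every call starts again from the defaults, the result depends only on the last `cfg` passed in and never on earlier calls, which is the property that would let a central tool send the same table to N processes.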
### Motivation
* extends the existing API, so this change will not break any existing setup
* works with simple data formats, so it is quite feasible to implement the whole functionality in pure Lua
* updates of policies or other large data could still go through the existing API, side-stepping the new configuration functions and avoiding performance issues with the declarative API
* relatively simple to implement
* a new configuration file format could easily be added later on, allowing direct declarative configuration
* lays the foundation for dynamically reloadable configuration: adding it on top of the declarative configuration (see the previous bullet point) would be quite straightforward
### Known issues
* At least some validation of the data format must be present in every kresd instance. By exposing these functions publicly, there is no way around that. An option might be to make something very similar but private. Then a centralized configuration tool (see below) could do the validation, eliminating the need for validation in every instance.
### To be considered
* Is it really a good idea to use Lua tables as the configuration format? Lua is not backward compatible between releases which might lead to potential problems. Using JSON instead might be more future proof and it might integrate better with existing tools.
* Do we really want to stick to the existing Lua API? Wouldn't it be better to implement something completely new allowing us to ditch the existing API at some point in the future?
## Centralized management of multiple instances
To enable centralized management of multiple instances, a separate tool can be developed utilizing both new functions described above. It could provide any type of external API (NETCONF, REST API, sysrepo, different centralized configuration file...) and bridge it to our two new functions, calling them for all resolver instances as necessary. We could even implement this in a form of a library for commanding all kresd instances on the system at once, leaving the external API implementation up to interested parties in their specific technologies.
Basics of this were already written by @amrazek in the form of the `kres-watcher` tool.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/621
always keep RRSIG and its RRset in single data structure
2020-10-07T18:04:01+02:00
Petr Špaček

Problem: At the moment RRset and its RRSIG are two independent `knot_rrset_t` structures.
This leads to problems like !1072 where things get mixed and weird things happen after that.
Idea: Refactor code so RRset is always tied to all associated RRSIGs (multiple of them!).
We need to investigate how this could be done in the most efficient way.
This approach might also benefit libknot/Knot DNS, so let's not forget to talk to them.
Cc @lpeltan @dsalzman and gang.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/615
disallow mixing protocols in net.listen()
2022-02-16T07:24:37+01:00
Tomas Krizek

Due to our reuseport facility, it is possible to use `net.listen()` to bind multiple protocols to a single (ip, port) combination. I can't think of any valid use-case and the most likely cause - typo - will cause misbehavior instead of a crash.
```lua
-- this isn't valid or supported
net.listen('::1', 443, { kind = 'tls' })
net.listen('::1', 443, { kind = 'doh2' })
```
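A minimal sketch of how such a conflict could be detected, assuming a registry keyed by address and port. The `listeners` table and `register_listener` function are hypothetical and not part of the kresd API, and the sketch raises a Lua error where the daemon would refuse to start.

```lua
-- Hypothetical registry of active listeners, keyed by "address#port".
local listeners = {}

-- Record a listener; raise an error if the same endpoint is already
-- bound with a different kind. Repeating the same kind is allowed,
-- since that is what reuseport is meant for.
local function register_listener(addr, port, kind)
  local key = addr .. '#' .. tostring(port)
  local existing = listeners[key]
  if existing ~= nil and existing ~= kind then
    error(string.format("%s port %d already listens as '%s', refusing '%s'",
                        addr, port, existing, kind))
  end
  listeners[key] = kind
end

register_listener('::1', 443, 'tls')
local ok, err = pcall(register_listener, '::1', 443, 'doh2')
-- ok is false here and err describes the conflicting kinds.
```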
I think the resolver should crash in these cases.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/606
incorporate DNS Shotgun into kresd CI
2020-10-30T11:55:49+01:00
Petr Špaček

The following discussion from !1054 should be addressed:
- [ ] @pspacek started a [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1054#note_169587): (+1 comment)
> @tkrizek Do you see a way to add this scenario into pytests/connection tests?

https://gitlab.nic.cz/knot/knot-resolver/-/issues/605
cache: explore better ways to detect cache changes made by other processes
2020-11-04T11:53:32+01:00
Petr Špaček

kresd 5.2.0 does a periodic check, which might take too long on very busy systems. Maybe we could use some event-based mechanism?
See [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1042#note_168310).
The following discussion from !1042 should be addressed:
- [ ] @pspacek started a [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1042#note_168310): (+1 comment)
> Why not use https://docs.libuv.org/en/v1.x/guide/filesystem.html#file-change-events ?

https://gitlab.nic.cz/knot/knot-resolver/-/issues/604
cache: zero-downtime restart is not supported across versions which change cache format/version
2020-11-04T11:53:33+01:00
Petr Špaček

Currently we do not handle the case where cache format differs between two versions which are running in parallel.
- Such changes happen very rarely, so it is questionable whether we need to support it.
- At the very least, we should note in the release notes when it is necessary to stop all instances before starting new ones.
See the rest of the [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1042#note_169683).
The following discussion from !1042 should be addressed:
- [ ] @pspacek started a [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1042#note_169683): (+1 comment)
> I wonder how this magic would work in situation where:
> - kresd instances 1+2 are running version 5.y.z with cache in /var/cache/knot-resolver
> - kresd binary gets updated to version 6.0.0
> - admin restarts instance 1 first (according to https://knot-resolver.readthedocs.io/en/v5.1.2/systemd-multiinst.html#zero-downtime-restarts) and restarts instance 2 later
> I guess instance 2 would not detect this unless cache overflows, so most likely instance 2 will write data in old format into cache versioned by version 6.0.0.
>
> Am I correct?
>
> If so I think we should open issue and keep it in mind for future cache rewrite/migration to custom data structure.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/603
cache: get rid of mdb_env_sync()
2020-09-07T17:52:07+02:00
Petr Špaček

Explicit cache sync does not seem necessary and might be counterproductive, see other comments in the thread:
The following discussion from !1042 should be addressed:
- [ ] @pspacek started a [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1042#note_169608): (+1 comment)
> Out of curiosity, why the sync is necessary here?

https://gitlab.nic.cz/knot/knot-resolver/-/issues/602
cache size exposed in Lua API can get out of sync
2020-11-04T11:53:33+01:00
Petr Špaček

This is a minor nit.
Lua call `cache.current_size` does not read the cache size from file/LMDB environment so the value reported in Lua can be out-of-sync if another process changed cache size.
The following discussion from !1042 should be addressed:
- [ ] @pspacek started a [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1042#note_168309): (+1 comment)
> I wonder if `cache.current_size` returns correct size if some rounding took place inside the backend.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/598
ability to reload ssl certificate on certificate change
2020-11-25T09:26:51+01:00
TomVnz

I was looking into doing this automatically but seems there is no cohesive way within knot-resolver.
Played around with using the control socket options, but it's a bit messy...e.g. use:
```lua
net.close('0.0.0.0')
http.config({tls = true, cert = "<CERT>", key = "<KEY>"}, '<webmgmt|doh>') -- for DoH|webmgmt
net.listen('0.0.0.0', 53, { kind = 'dns' })
net.listen('0.0.0.0', 443, { kind = 'doh' })
net.listen('0.0.0.0', 853, { kind = 'tls' })
net.listen('0.0.0.0', 8453, { kind = 'webmgmt' })
net.tls("<CERT>", "<KEY>") -- for DoT
```
But if knot-resolver is running as an unprivileged user, it can't rebind to privileged ports. And this needs to be scripted somehow.
An alternative would be for the process that creates the new SSL certificates to restart knot-resolver, but then that process would need to run as root.
So for now, I'm using a custom systemd path / service combo to monitor the certificate file for any changes and then reload knot-resolver that way.
Would be keen to hear any thoughts on how to simplify this, or perhaps the ability to reload the certificate could be added to knot-resolver itself. I know rpz files are monitored and reloaded when changed, so this seems somewhat similar.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/593
64-bit ARM: remaining issues
2020-10-01T10:53:36+02:00
Santiago

(EDITed)
It's still possible to run into `bad light userdata pointer` errors, possibly hidden under
`missing luajit package: cqueues`. For summary see this post below: https://gitlab.nic.cz/knot/knot-resolver/-/issues/593#note_165359
- - -
#### Original post
Hi there,
It seems to be known that kresd doesn't work on arm64, but I haven't found this particular build error documented (so sorry for the possible noise). knot-resolver 5.1.x doesn't build on Debian due to a luajit error (bad light userdata pointer). The full build log is at https://buildd.debian.org/status/fetch.php?pkg=knot-resolver&arch=arm64&ver=5.1.2-1&stamp=1596037546&raw=0
And this is the relevant part:
````
...
Message: --- config_tests dependencies ---
Running command: /usr/bin/luajit -l cqueues -e os.exit(0)
--- stdout ---
--- stderr ---
/usr/bin/luajit: bad light userdata pointer
stack traceback:
[C]: at 0xffffb6342ad0
[C]: in function 'require'
/usr/share/lua/5.1/cqueues.lua:2: in function </usr/share/lua/5.1/cqueues.lua:1>
[C]: at 0xaaaae1757d08
[C]: at 0xaaaae170a4c0
../tests/meson.build:27:4: ERROR: Problem encountered: missing luajit package: cqueues
````
Cheers,
-- Santiago

https://gitlab.nic.cz/knot/knot-resolver/-/issues/590
document bug reporting procedure
2020-07-10T14:10:23+02:00
Petr Špaček

- test on latest version
- mention relevant system information
- how to capture GDB traceback
- how to limit logging to problematic names
- how to capture network traffic + keys (TLS, DoH)
...

https://gitlab.nic.cz/knot/knot-resolver/-/issues/589
document threat model
2020-07-11T22:10:59+02:00
Petr Špaček

- inputs
  - trusted (config, control socket, cache, files on disk)
  - untrusted (network traffic)
  - decide: prefill? hints? ...
- DoS is always possible (network overload, hijack etc.)
- integrity - DNSSEC
- confidentiality - do not count on it, encrypting only DNS traffic does not hide it

https://gitlab.nic.cz/knot/knot-resolver/-/issues/588
control socket drops long outputs
2020-09-17T13:22:45+02:00
Petr Špaček

The control socket randomly cuts long outputs. It seems to be caused by incorrect use of fprintf inside the daemon/io.c function `io_tty_process_input()`.
Version: 5.1.2
Steps to reproduce:
```
$ echo -e "string.rep('a', 1024*1024*10)\n" | socat - unix-connect:$(ls control/*) | wc -c
223362
```
I.e. the output is truncated after 223362 bytes. This value is not a constant, it varies. Expected output should be 1024*1024*10 bytes `a` + 2x2 bytes of prompt `> `.
Strace:
```
read(23, "__binary\nstring.rep('a', 1024*10"..., 65536) = 40
dup(23) = 24
fcntl(24, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fstat(24, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
write(24, "\0\240\0\1aaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 4096) = 4096
write(24, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 10481664) = 219264
write(24, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 10262400) = 109632
write(24, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 10152768) = 219264
write(24, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 9933504) = -1 EAGAIN (Resource temporarily unavailable)
close(24) = 0
```
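The trace shows short writes followed by `EAGAIN` and then `close()`, so the rest of the output is dropped. The sketch below models only the bookkeeping a correct writer needs: advance an offset after each short write, and wait for writability on `EAGAIN` instead of closing. It is neither kresd code nor libuv; `fake_socket`, `write_all` and `wait_writable` are hypothetical names, and the socket is simulated in plain Lua.

```lua
-- Fake non-blocking socket: accepts at most `capacity` bytes, then reports
-- EAGAIN until drain() is called (standing in for the peer reading data).
local function fake_socket(capacity)
  local sock = { received = {}, room = capacity }
  function sock:write(data, from)
    if self.room == 0 then return nil, 'EAGAIN' end
    local n = math.min(self.room, #data - from + 1)
    table.insert(self.received, data:sub(from, from + n - 1))
    self.room = self.room - n
    return n
  end
  function sock:drain() self.room = capacity end
  return sock
end

-- Write all of `data`, resuming after short writes and waiting on EAGAIN.
-- `wait_writable` stands in for the event loop's "socket is writable" event.
local function write_all(sock, data, wait_writable)
  local pos = 1
  while pos <= #data do
    local n, err = sock:write(data, pos)
    if n then
      pos = pos + n          -- short write: continue from the new offset
    elseif err == 'EAGAIN' then
      wait_writable(sock)    -- keep the connection open and retry later
    else
      return nil, err        -- any other error is reported to the caller
    end
  end
  return #data
end

local sock = fake_socket(4096)
local payload = string.rep('a', 1024 * 1024)
local written = write_all(sock, payload, function(s) s:drain() end)
```

In an event loop the wait would be a callback and not a blocking call, which is what handing the write over to libuv would provide.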
The whole `io_tty_process_input()` function is a mess and should be refactored into smaller pieces, and most importantly rewritten to use libuv for writes as well.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/583
new statistics for encrypted transports
2020-06-19T14:17:50+02:00
Petr Špaček

It would be interesting to see statistics for:
- [ ] number of TLS handshakes
- [ ] TLS versions
- [ ] HTTP versions
- [ ] HTTP request methods
- [ ] HTTP status codes
Question: Are these stats sufficient to gather details about connection reuse?

https://gitlab.nic.cz/knot/knot-resolver/-/issues/578
test aggressive cache on NSEC3PARAM rotation
2020-08-20T10:05:40+02:00
Vladimír Čunát (vladimir.cunat@nic.cz)

I don't think we have any tests on that in particular, though the code's been deployed for a long time. Still, most of possible failures I can imagine should only lead to insufficient caching.
Hints around how the implementation works:
- NSEC3PARAM is the [data collected](https://tools.ietf.org/html/rfc5155#section-4.2) but it's taken from NSEC3 records directly.
- For this purpose, using NSEC is like one more possible NSEC3PARAM configuration.
- Reading from cache is designed to consider the last two NSEC3PARAMs that have been written for that zone.
- Code reference: identifiers containing `nsec_p`.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/573
net.tls() allow usage of multiple certificates
2020-10-08T11:43:59+02:00
Tomas Krizek

ECC certificates provide superior performance to RSA keys of comparable security. Supporting multiple certificate files in `net.tls()` could lead to improved DNS-over-TLS performance without sacrificing compatibility with older clients, if both ECC and RSA certificates could be used simultaneously.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/569
clarify respdiff job names in CI
2020-10-19T11:16:35+02:00
Petr Špaček

Mostly a note for myself:
especially forwarding scenarios have confusing names
Find better naming structure and fix it.
Rename will break a lot of stuff so schedule this when we have time for it.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/568
Some cases of DNS resolution from lua fail if OS provides only IPv6 resolvers
2020-04-24T10:04:07+02:00
Vladimír Čunát (vladimir.cunat@nic.cz)

Conditions:
- `resolv.conf` only containing IPv6 nameservers. Mix works OK. I believe that very few people have IPv6-only there, luckily.
- Use DNS resolution based on `lua-cqueues`, e.g. `prefill` module or root trust anchors bootstrapping – both only after !894 (kresd >= 5.0.0).
Result example:
```
[prefill] fetch of `https://www.internic.net/domain/root.zone` failed: HTTP client library error: A non-recoverable error occurred when attempting to resolve the name (-1684960053)), will retry root zone download in 09 minutes 59 seconds
```
This is a problem in the Lua libraries that we've chosen to use: https://github.com/wahern/dns/issues/23

https://gitlab.nic.cz/knot/knot-resolver/-/issues/551
client retry logic on TCP/TLS connection closure
2020-10-22T13:58:57+02:00
Vladimír Čunát (vladimir.cunat@nic.cz)

When a remote server closes a connection without answering some of our queries, the corresponding requests get failed too aggressively (perhaps? TODO: details, etc.)
The most interesting part of the standards is [RFC 7766](https://tools.ietf.org/html/rfc7766#section-6.2.4):
> DNS clients SHOULD retry unanswered queries if the connection closes before receiving all outstanding responses.
On the other hand servers SHOULD not close the connections early, without reasons for the particular case... so hopefully this won't happen that often in practice; [FRITZ!](https://forum.turris.cz/t/dns-over-tcp-just-a-single-transaction/12003/11) seems a notable case. _I'll keep copying the important points from that discussion to here._

https://gitlab.nic.cz/knot/knot-resolver/-/issues/548
Support for DoQ | DNS over QUIC
2023-11-15T09:26:55+01:00
Gaspard d'Hautefeuille

Hello,
DoQ is IMHO the upgrade of DoT and is not bloated compared to DoH & DoH3.
https://tools.ietf.org/html/draft-huitema-quic-dnsoquic-07
Would you consider supporting this Internet-Draft, or would you rather wait for an RFC?
Thanks,
HLFH