Knot Resolver issues
https://gitlab.nic.cz/knot/knot-resolver/-/issues

Issue #172: query name minimization does not work with partially bad glue records
---------------------------------------------------------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/172 (Petr Špaček, updated 2017-10-09)

Let's have a zone with incomplete glue records in its delegation, like this:
```
ENTRY_BEGIN
MATCH opcode subdomain
ADJUST copy_id copy_query
REPLY QR NOERROR
SECTION QUESTION
com. IN A
SECTION AUTHORITY
; This is the offending NS (it must be ignored)
com. IN NS x.gtld-servers.net.
com. IN NS a.gtld-servers.net.
SECTION ADDITIONAL
x.gtld-servers.net. IN A 192.5.6.31
ENTRY_END
```
The server `x.gtld-servers.net.` is broken and returns REFUSED for all but NS queries. The other server, `a.gtld-servers.net.`, works.
Without query name minimization, kresd handles this fine: it detects the `x` server as `bad` and moves on to the next server:
```
[ 0][plan] plan 'www.foo.com.' type 'A'
[55398][iter] 'www.foo.com.' type 'A' id was assigned, parent id 0
[55398][resl] => using root hints
[39654][iter] 'www.foo.com.' type 'A' id was assigned, parent id 0
[39654][resl] => querying: '193.0.14.129' score: 10 zone cut: '.' m12n: 'wWw.foO.cOM.' type: 'A' proto: 'udp'
[39654][iter] <= using glue for 'x.gtld-servers.net.': '192.5.6.31'
[39654][iter] <= referral response, follow
[39654][resl] <= server: '193.0.14.129' rtt: 6 ms
[30494][iter] 'www.foo.com.' type 'A' id was assigned, parent id 0
[30494][resl] => querying: '192.5.6.31' score: 10 zone cut: 'com.' m12n: 'WWW.FoO.COM.' type: 'A' proto: 'udp'
[30494][iter] <= rcode: REFUSED
[30494][resl] <= server: '192.5.6.31' rtt: 1 ms
[18206][iter] 'www.foo.com.' type 'A' id was assigned, parent id 0
[18206][resl] => querying: '192.5.6.31' score: 111 zone cut: 'com.' m12n: 'WWw.FOo.COm.' type: 'A' proto: 'udp'
[18206][iter] <= rcode: REFUSED
[18206][resl] <= server: '192.5.6.31' rtt: 1 ms
[57219][iter] 'www.foo.com.' type 'A' id was assigned, parent id 0
[57219][resl] => querying: '192.5.6.31' score: 161 zone cut: 'com.' m12n: 'WwW.FoO.cOm.' type: 'A' proto: 'udp'
[57219][iter] <= rcode: REFUSED
[57219][resl] <= server: '192.5.6.31' rtt: 1 ms
[61022][iter] 'www.foo.com.' type 'A' id was assigned, parent id 0
[61022][resl] => querying: '192.5.6.31' score: 186 zone cut: 'com.' m12n: 'wwW.fOO.COm.' type: 'A' proto: 'udp'
[61022][iter] <= rcode: REFUSED
[61022][resl] => server: '192.5.6.31' flagged as 'bad'
[54075][iter] 'www.foo.com.' type 'A' id was assigned, parent id 0
[54075][plan] plan 'a.gtld-servers.net.' type 'AAAA'
[32802][iter] 'a.gtld-servers.net.' type 'AAAA' id was assigned, parent id 54075
[32802][resl] => using root hints
[61553][iter] 'a.gtld-servers.net.' type 'AAAA' id was assigned, parent id 54075
[61553][resl] => querying: '193.0.14.129' score: 11 zone cut: '.' m12n: 'A.gtld-SerVErS.nET.' type: 'AAAA' proto: 'udp'
[61553][iter] <= rcode: NOERROR
[61553][resl] <= server: '193.0.14.129' rtt: 2 ms
[30187][iter] 'www.foo.com.' type 'A' id was assigned, parent id 0
[30187][plan] plan 'a.gtld-servers.net.' type 'A'
[27896][iter] 'a.gtld-servers.net.' type 'A' id was assigned, parent id 30187
[27896][resl] => using root hints
[34685][iter] 'a.gtld-servers.net.' type 'A' id was assigned, parent id 30187
[34685][resl] => querying: '193.0.14.129' score: 11 zone cut: '.' m12n: 'A.gtLd-SErVeRs.nET.' type: 'A' proto: 'udp'
[34685][iter] <= rcode: NOERROR
[30187][iter] <= using glue for 'a.gtld-servers.net.': '192.5.6.30'
[34685][resl] <= server: '193.0.14.129' rtt: 2 ms
[14390][iter] 'www.foo.com.' type 'A' id was assigned, parent id 0
[14390][resl] => querying: '192.5.6.30' score: 10 zone cut: 'com.' m12n: 'WWW.foo.cOm.' type: 'A' proto: 'udp'
[14390][iter] <= referral response, follow
[14390][resl] <= server: '192.5.6.30' rtt: 1 ms
[14916][iter] 'www.foo.com.' type 'A' id was assigned, parent id 0
[14916][plan] plan 'a.gtld-servers.net.' type 'AAAA'
[24215][iter] 'a.gtld-servers.net.' type 'AAAA' id was assigned, parent id 14916
[24215][resl] => using root hints
[45906][iter] 'a.gtld-servers.net.' type 'AAAA' id was assigned, parent id 14916
[45906][resl] => querying: '193.0.14.129' score: 11 zone cut: '.' m12n: 'A.gTld-SeRVErS.nET.' type: 'AAAA' proto: 'udp'
[45906][iter] <= rcode: NOERROR
[45906][resl] <= server: '193.0.14.129' rtt: 2 ms
[57675][iter] 'www.foo.com.' type 'A' id was assigned, parent id 0
[57675][plan] plan 'a.gtld-servers.net.' type 'A'
[21748][iter] 'a.gtld-servers.net.' type 'A' id was assigned, parent id 57675
[21748][ rc ] => satisfied from cache
[21748][iter] <= rcode: NOERROR
[57675][iter] <= using glue for 'a.gtld-servers.net.': '192.5.6.30'
[49536][iter] 'www.foo.com.' type 'A' id was assigned, parent id 0
[49536][resl] => querying: '192.5.6.30' score: 11 zone cut: 'www.foo.com.' m12n: 'WwW.fOo.CoM.' type: 'A' proto: 'udp'
[49536][iter] <= rcode: NOERROR
[49536][resl] <= server: '192.5.6.30' rtt: 1 ms
```
Unfortunately, with query name minimization enabled, kresd does not move on to the next server:
```
[ 0][plan] plan 'www.foo.com.' type 'A'
[ 6555][iter] 'www.foo.com.' type 'A' id was assigned, parent id 0
[ 6555][resl] => using root hints
[39232][iter] 'www.foo.com.' type 'A' id was assigned, parent id 0
[39232][resl] => querying: '193.0.14.129' score: 10 zone cut: '.' m12n: 'coM.' type: 'NS' proto: 'udp'
[39232][iter] <= using glue for 'x.gtld-servers.net.': '192.5.6.31'
[39232][iter] <= referral response, follow
[39232][resl] <= server: '193.0.14.129' rtt: 7 ms
[17873][iter] 'www.foo.com.' type 'A' id was assigned, parent id 0
[17873][resl] => querying: '192.5.6.31' score: 10 zone cut: 'com.' m12n: 'FoO.Com.' type: 'NS' proto: 'udp'
[17873][iter] <= using glue for 'x.gtld-servers.net.': '192.5.6.31'
[17873][iter] <= referral response, follow
[17873][resl] <= server: '192.5.6.31' rtt: 4 ms
[ 8362][iter] 'www.foo.com.' type 'A' id was assigned, parent id 0
[ 8362][resl] => querying: '192.5.6.31' score: 11 zone cut: 'foo.com.' m12n: 'Www.fOo.cOM.' type: 'A' proto: 'udp'
[ 8362][iter] <= rcode: REFUSED
[ 8362][resl] <= server: '192.5.6.31' rtt: 3 ms
[ 6889][iter] 'www.foo.com.' type 'A' id was assigned, parent id 0
[ 6889][resl] => querying: '192.5.6.31' score: 111 zone cut: 'foo.com.' m12n: 'WWw.fOO.cOm.' type: 'A' proto: 'udp'
[ 6889][iter] <= rcode: REFUSED
[ 6889][resl] <= server: '192.5.6.31' rtt: 2 ms
[43963][iter] 'www.foo.com.' type 'A' id was assigned, parent id 0
[43963][resl] => querying: '192.5.6.31' score: 161 zone cut: 'foo.com.' m12n: 'Www.FOo.cOM.' type: 'A' proto: 'udp'
[43963][iter] <= rcode: REFUSED
[43963][resl] <= server: '192.5.6.31' rtt: 2 ms
[60355][iter] 'www.foo.com.' type 'A' id was assigned, parent id 0
[60355][resl] => querying: '192.5.6.31' score: 186 zone cut: 'foo.com.' m12n: 'WWw.foo.COm.' type: 'A' proto: 'udp'
[60355][iter] <= rcode: REFUSED
[60355][resl] => server: '192.5.6.31' flagged as 'bad'
[26974][iter] 'www.foo.com.' type 'A' id was assigned, parent id 0
[26974][resl] => no valid NS left
```
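For reference, the minimization schedule visible in the log above can be sketched as follows. This is only an illustration of the RFC 7816 idea (the function name is hypothetical, not kresd's implementation): expose one more label per step, asking type NS for the intermediate names and the real qtype only for the full name.

```python
def minimized_queries(qname: str, qtype: str):
    """Yield (name, qtype) pairs for the qname-minimization steps.

    Intermediate names are queried with type NS; only the full name
    uses the original qtype. Illustrative sketch, not kresd's code.
    """
    labels = qname.rstrip(".").split(".")
    for i in range(len(labels) - 1, -1, -1):
        name = ".".join(labels[i:]) + "."
        yield (name, "NS" if i > 0 else qtype)
```

This reproduces the sequence seen in the log: `com. NS`, then `foo.com. NS`, then `www.foo.com. A` (with 0x20 case randomization applied on top).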
Versions
-----------
Kresd: f9352bee195996c65bb764ec0ba3a2ad7683824d
This is covered by the (fixed) test sets/resolver/iter_ns_badglue.rpl from commit deckard@ebcc8b59c29652af83266abbae6e5ae512e66f45 (temporary branch iter_ns_badglue).

Issue #173: kresd answers from a different IP
---------------------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/173 (Dan Rimal, updated 2021-09-08)

Hello,
I hit weird behaviour of kresd and I think it is a bug. I have one public IP on the network interface and another two public IPs on a dummy interface. Kresd listens on all interfaces, and when I send a query to an IP sitting on the dummy interface, kresd sends back the response with a source IP (probably) resolved from the routing table, which in this case is the IP of the real network interface.
I think this is not correct behaviour; a client cannot accept a response arriving from a different address than the one it queried. I tried the Unbound DNS server in the same situation and it works correctly: the response comes from the requested IP.
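The usual fix for this class of bug is to ask the kernel which local address each datagram was delivered to, and then send the reply from that same address. A minimal sketch of the receive side, using the Linux `IP_PKTINFO` ancillary data (illustrative Python, not kresd's code; kresd itself is built on libuv):

```python
import socket

IP_PKTINFO = getattr(socket, "IP_PKTINFO", 8)  # value 8 on Linux

def recv_with_dst(sock, bufsize=512):
    """Receive a datagram and also report which local address it was sent to."""
    data, ancdata, _flags, peer = sock.recvmsg(bufsize, socket.CMSG_SPACE(12))
    dst = None
    for level, ctype, cdata in ancdata:
        if level == socket.IPPROTO_IP and ctype == IP_PKTINFO:
            # struct in_pktinfo: ipi_ifindex (4 B), ipi_spec_dst (4 B),
            # ipi_addr (4 B); ipi_addr is the destination from the IP header.
            dst = socket.inet_ntoa(cdata[8:12])
    return data, peer, dst
```

A server would then pass `dst` back as `IP_PKTINFO` ancillary data on `sendmsg()`, so the reply's source address matches the query's destination even on a dummy interface.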
My config is:
```
-- vim:syntax=lua:
-- Refer to manual: http://knot-resolver.readthedocs.org/en/latest/daemon.html#configuration
-- interfaces
net.ipv4 = true
net.ipv6 = true
net.listen({ '0.0.0.0', '::' }, 53)
-- drop privileges
user('kresd', 'kresd')
-- Load Useful modules
modules = {
'policy', -- Block queries to local zones/bad sites
'view', --
'stats' -- Track internal statistics
}
-- ACL
view:addr('15.62.0.0/15', function (req, qry) return policy.PASS end)
view:addr('128.13.5.67', function (req, qry) return policy.PASS end)
view:addr('2a01:bbbb:2:312:2222:2222::/64', function (req, qry) return policy.PASS end)
-- view:addr('0.0.0.0/0', function (req, qry) return policy.DROP end)
-- unmanaged DNSSEC root TA
trust_anchors.config('/etc/kresd/root.keys', nil)
cache.size = 2 * GB
```
Traffic dump:
```
14:38:37.908237 IP 128.15.1.67.42957 > 25.62.162.162.53: 17304+ [1au] A? centrum.cz. (39)
14:38:37.908443 IP 15.62.162.98.53 > 128.15.1.67.42957: 17304$ 1/0/1 A 46.255.231.48 (55)
```
IPs:
```
3: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
inet 15.62.162.98/30 brd 85.162.162.99 scope global ens192
valid_lft forever preferred_lft forever
4: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
inet 15.62.162.162/32 brd 85.162.162.162 scope global dummy0
valid_lft forever preferred_lft forever
5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
inet 15.62.162.85/32 brd 85.162.162.85 scope global dummy1
```
Routes:
```
default via 15.62.162.97 dev ens192 proto bird
15.62.162.96/30 dev ens192 proto kernel scope link src 15.62.162.98
```
Regards,
Daniel

Issue #88: TA bootstrap doesn't work without external resolver
--------------------------------------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/88 (Ondřej Surý, updated 2020-01-07)

If `/etc/resolv.conf` contains `nameserver 127.0.0.1` and the nameserver running on `127.0.0.1` is the Knot Resolver instance bootstrapping the root TA, then the bootstrapping fails with a name resolution error, because kresd doesn't start resolving until the root TA is bootstrapped.
Knot Resolver should be able to resolve at least `data.iana.org` when doing the bootstrap, and it should probably fail to start if it can't bootstrap the root TA.

Issue #47: tools to migrate configuration from other resolvers
--------------------------------------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/47 (Marek Vavrusa, updated 2018-12-17)

I wrote an Ansible script for Knot authoritative which might be a good starting point:
https://github.com/vavrusa/ansible-role-knotauth
... or something else, possibly a YANG model.

Issue #36: lib: parallel queries
--------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/36 (Ghost User, updated 2020-07-20)

Some queries can be made in parallel (A+AAAA). The current `rplan` can only work with the current query at the top of the stack. The change would be to store a pointer to `current` that would be chosen when the answer comes, based on the following criteria: `msgid + <qname, qtype, qclass> match`, and the query MUST NOT have a parent. This could later be used for look-ahead queries (DNSKEY), but then care must be taken, as the answers MAY come out of order while they MUST be processed in order.

Issue #32: lib: child-side NS records are not always fetched
------------------------------------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/32 (Ghost User, updated 2021-01-04)

Issue #182: fuzz & fix configuration interface to avoid segfaults
-----------------------------------------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/182 (Petr Špaček, updated 2020-10-09)

Often a typo in the config file can lead to a segfault. We might try to write a fuzzer for the config file and see what happens.

Issue #185: test graphite module
--------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/185 (Petr Špaček, updated 2019-12-18)

This might be lower priority than other tests.

Issue #187: test etcd module
----------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/187 (Petr Špaček, updated 2017-10-09)

An open question is how to mock etcd.

Issue #195: support RPZ from zone transfer
------------------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/195 (Petr Špaček, updated 2023-07-12)

It is painful to retrieve RPZ data as files. IXFR would help a lot operationally.

Issue #207: workarounds: log using generic workarounds
------------------------------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/207 (Vladimír Čunát, updated 2020-02-03)

When using generic workarounds, it would be nice to have the possibility to log them, so that the protocol violations might be collected and reported, e.g. at https://github.com/dns-violations/, to the server operator, etc. ([Suggested by Anand.](https://ripe74.ripe.net/archives/video/159/))
It's probably of no use for the specific cases in the workarounds module, as those are known.

Issue #210: RFC 4035 sec 5.2: downgrade to insecure when only unknown algorithms are used (provably)
----------------------------------------------------------------------------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/210 (Vladimír Čunát, updated 2019-07-09)

Currently these lead to SERVFAIL, as detected by https://rootcanary.org/test.html
This will probably be about handling the `DNSSEC_INVALID_DS_ALGORITHM` return code from libdnssec.
https://tools.ietf.org/html/rfc4035#section-5.2
> If the validator does not support any of the algorithms listed in an
> authenticated DS RRset, then the resolver has no supported
> authentication path leading from the parent to the child.  The
> resolver should treat this case as it would the case of an
> authenticated NSEC RRset proving that no DS RRset exists, as
> described above.

Issue #218: dns64 is broken with policy.STUB
--------------------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/218 (Vladimír Čunát, updated 2024-02-28)

See e.g. 0b748e0e49. Related: https://gitlab.nic.cz/knot/knot-resolver/issues/217

Issue #224: validate: support mixing NSEC and NSEC3 in a single packet
----------------------------------------------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/224 (Vladimír Čunát, updated 2017-10-10)

Issue #225: opcode IQUERY returns SERVFAIL instead of NOTIMP
------------------------------------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/225 (Štěpán Kotek, updated 2019-07-09)

An unsupported opcode must lead to `RCODE=NOTIMP`. This will get back and bite us when the session signalling draft comes by.
Clarification: the response to an unknown OPCODE must contain only the DNS message header and nothing else, not even EDNS. The reason is that different OPCODEs might potentially use very different message formats, so it is risky to return anything beyond the DNS header.
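The header-only reply described in the clarification can be sketched like this (an illustration, not kresd's code): copy the query ID and opcode bits, set QR, set RCODE=4 (NOTIMP), and zero all four section counts.

```python
import struct

NOTIMP = 4  # RCODE for "Not Implemented"

def notimp_reply(query: bytes) -> bytes:
    """Build a 12-byte header-only NOTIMP response to `query`."""
    qid, flags = struct.unpack("!HH", query[:4])
    opcode = flags & 0x7800                  # preserve the (unknown) opcode bits
    reply_flags = 0x8000 | opcode | NOTIMP   # QR=1, RCODE=NOTIMP
    return struct.pack("!HHHHHH", qid, reply_flags, 0, 0, 0, 0)
```

Because the counts are all zero, no question, answer, authority, or EDNS OPT record is echoed back, exactly as the clarification requires.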
Test failing: `sets/resolver/iter_opcode_notimp.rpl` in deckard, branch `unknown-opcode`.
Blocks deckard#11. (Milestone: 2019 Q1)

Issue #226: handling out-of-bailiwick CNAME chains from authoritative servers
-----------------------------------------------------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/226 (Vladimír Čunát, updated 2017-10-10)

Some servers incorrectly answer like this:
```
$ kdig @2a02:4a8:ac24:100::96:2 www.rozpocetverejne.cz.
;; ->>HEADER<<- opcode: QUERY; status: NXDOMAIN; id: 41711
;; Flags: qr aa rd; QUERY: 1; ANSWER: 1; AUTHORITY: 1; ADDITIONAL: 0
;; QUESTION SECTION:
;; www.rozpocetverejne.cz. IN A
;; ANSWER SECTION:
www.rozpocetverejne.cz. 600 IN CNAME ghs.google.com.
;; AUTHORITY SECTION:
google.com. 3600 IN SOA alfa.ns.active24.cz. hostmaster.active24.cz. 2017042405 10800 1800 1209600 3600
;; Received 132 B
;; Time 2017-07-28 10:26:52 CEST
;; From 2a02:4a8:ac24:100::96:2@53(UDP) in 5.3 ms
```
That claims two wrong things: that the server is authoritative for google.com, and that the name ghs.google.com doesn't exist. (For the meaning of RCODE with CNAME chains, see https://tools.ietf.org/html/rfc6604#section-3.) We found multiple instances of this, e.g. also from wedos: www.silvidesign.cz.
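For illustration, the bailiwick test involved can be sketched as follows (hypothetical helper names, not kresd's API): in the answer above, the zone cut is `rozpocetverejne.cz.`, so the CNAME is in bailiwick while the `google.com.` SOA is not.

```python
def in_bailiwick(name: str, zone: str) -> bool:
    """True if `name` is at or below `zone` (absolute names, case-insensitive)."""
    name, zone = name.lower(), zone.lower()
    return zone == "." or name == zone or name.endswith("." + zone)

def filter_answer(records, zone):
    """Keep only (owner, rtype, rdata) records inside the server's bailiwick."""
    return [rr for rr in records if in_bailiwick(rr[0], zone)]
```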
Kresd currently SERVFAILs on this (during validation); it would be better to use the in-bailiwick information (the CNAME) and discard the rest of the information, even in this case.

Issue #244: track network changes and reconfigure as validating stub / resolver automatically
---------------------------------------------------------------------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/244 (Petr Špaček, updated 2021-06-24)

Taken from https://github.com/CZ-NIC/knot-resolver/issues/7
There should be a module to track changes in the network and environment to detect when the resolver is in an:
- Environment that blocks DNS queries altogether (and revert to stub mode)
- Environment with DNSSEC-unaware resolver (do validation)
- Open environment (full recursive resolver)
This would make it as painless as possible for the end users with frequent network transitions (hotel wifi, workplace, home, ...)
Fall back to https://github.com/fcambus/rrda if the DNS is filtered/unreachable.

Issue #252: Test DNS64 module with weird answers
------------------------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/252 (Petr Špaček, updated 2017-12-17)

The presentation [DNS64 at scale – Turning off IPv4](https://indico.dns-oarc.net/event/27/session/2/contribution/0) contains, on slide 14, queries which return intentionally weird answers. We should test that our DNS64 module reacts to them reasonably.
If there is something in our behaviour which is not RFC-compliant, let's fix it in the DNS64 module. If there is something worth fixing in how we handle non-compliant answers, it should probably go into the workarounds module.
Please talk to me before introducing workarounds for non-compliant cases.
Also, this might require some new Deckard tests.

Issue #262: simplify DNS64 code
-------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/262 (Petr Špaček, updated 2017-10-22)

The new code introduced in #203 seems ugly, because it introduced FFI spaghetti into the DNS64 module. When you have some time, we should refactor it so that it is readable again.

Issue #264: errors from Lua module interface are not developer friendly
-----------------------------------------------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/264 (Petr Špaček, updated 2019-12-18)

I'm creating a "Hello world" Lua plugin, and the process is not as straightforward as I would wish.
Interestingly, if a Lua module does not return a table (which is easy to forget when you start), it spits out a quite confusing error message:
```
> modules.load('test')
attempt to index a boolean value
```
I was looking into the C code which loads the Lua modules, and it does not have any super-easy fix because of the Lua-C integration. This led me to the idea that we might rewrite Lua-module loading in Lua, so it is not such long spaghetti. (Or not, if it does not simplify the code; I'm just thinking aloud.)

Issue #291: refactor excessively long functions
-----------------------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/291 (Marek Vavrusa, updated 2018-12-17)
The screen size is ~80 lines, some functions are >300 lines, which makes it easier to make mistakes.
The !432 added an upper bound limit of 400 statements / 500 lines, but we should do better.
These functions exceed the 200 statements / 300 lines limit:
* [ ] layer/validate.c:824 function 'validate' 337 statements (threshold 200)
* [ ] resolve.c:1310 function 'kr_resolve_produce' 250 statements (threshold 200)
* [x] worker.c:1406 function 'qr_task_step' 221 statements (threshold 200)
* [x] worker.c:1872 function 'worker_process_tcp' 260 statements (threshold 200)
* [ ] main.c:425 function 'main' 247 statements (threshold 200)

Issue #292: tls forwarding: high likelihood of msg-id duplication for active queries under heavy load
-----------------------------------------------------------------------------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/292 (Grigorii Demidov, updated 2018-02-16)

Issue #295: validator might better ignore out-of-bailiwick crap
---------------------------------------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/295 (Vladimír Čunát, updated 2018-01-22)

Real-life example: `www.vikhockey.se. AAAA` fails in the validator, due to the server returning:
```
kdig @195.74.39.30 www.vikhockey.se. AAAA +dnssec
;; ->>HEADER<<- opcode: QUERY; status: NXDOMAIN; id: 50218
;; Flags: qr aa rd; QUERY: 1; ANSWER: 2; AUTHORITY: 8; ADDITIONAL: 1
;; EDNS PSEUDOSECTION:
;; Version: 0; flags: do; UDP size: 1680 B; ext-rcode: NOERROR
;; QUESTION SECTION:
;; www.vikhockey.se. IN AAAA
;; ANSWER SECTION:
www.vikhockey.se. 600 IN CNAME vvik1-vvik.ramses.nu.
www.vikhockey.se. 600 IN RRSIG CNAME 8 3 600 20180201000000 20180111000000 34296 vikhockey.se. mnn7gL0v3BupFGZi4N/CV6vINkNOFy2y4H0Vx0ukrYDScxCubeLA0YCYCIE3thu13DCkOFuijUbWtaA9KSMivfJUb1q5yX0jdT0b5nvwK1/YSk2YnXMEbrjWqTu4rig+KsrZ0XSb76E0d/9wN5VtFxNkhfZypu5HSj85Isy46Bw=
;; AUTHORITY SECTION:
ramses.nu. 3600 IN SOA ns3.binero.se. registry.binero.se. 1516233600 86400 5400 604800 3600
ramses.nu. 3600 IN RRSIG SOA 8 2 3600 20180201000000 20180111000000 34296 ramses.nu. g4KxoD6HuieeEBgG6Z6oUTlhwdGelcUWRUq3Jd9osVaFzvn8XscQDdmcGh4maK0yofoz8t/ShRVjC4XQGnj5//eejMXY1jgra39VMbJ9P+7JOvGUuETw0WJL8oT7YehfFkCv1CRL5IoM6d9SYdYkmcDt/aoDMeoG+WgEZ6QHW5Y=
v8ssphenr3p30k9a4dpae5pr9ib7m3l1.ramses.nu. 3600 IN NSEC3 1 1 1 AB 18AJT6FFNC06017DT70ELSCVH3763P1C NS SOA MX RRSIG DNSKEY NSEC3PARAM
v8ssphenr3p30k9a4dpae5pr9ib7m3l1.ramses.nu. 3600 IN RRSIG NSEC3 8 3 3600 20180201000000 20180111000000 34296 ramses.nu. wSFv8izGquRzjaZJSnXn+7hgpaqfKGEr3l5OwtEI0KlBRPFmXGv8RD1d9dhJqp1QeaDK67rZqzFHioA/p13RP7kYDUCiOHX8VoA9hbQr3nFHeerkt+zSiYNaAH43sWT7oHpnrN9ODUIIB0s4Tbm1+U2G7tJ90JyjCjmMEXu+UQQ=
3dnbf1prkcm9234cr9atsv8a2gfs2oua.ramses.nu. 3600 IN NSEC3 1 1 1 AB 71O8H4PM96IP6HK4FDMQ2G34KD9KKGV4 A RRSIG
3dnbf1prkcm9234cr9atsv8a2gfs2oua.ramses.nu. 3600 IN RRSIG NSEC3 8 3 3600 20180201000000 20180111000000 34296 ramses.nu. dFKDMKzdwDmNEFfItTlEIIhAqqbk13WEO/etgywJLzEt3PRW1s70jfFCWqTeOjAUdeF6JEfLWklPYkhpBe0UwmYEVqlQcYJ37AKX7gUyN/iBKTtMfQWTXfdHMyjj1fyfEoeFh2SMk1Vl5bys1HKajB0SkOnKmzDKnZjBftDuimE=
j8qedtq6ned9n5sl7e99incs8s1m29sb.ramses.nu. 3600 IN NSEC3 1 1 1 AB MUE5EI8JM7A860A6HCDO7LQ42OSF6V55 A RRSIG
j8qedtq6ned9n5sl7e99incs8s1m29sb.ramses.nu. 3600 IN RRSIG NSEC3 8 3 3600 20180201000000 20180111000000 34296 ramses.nu. HTN4XXRy53RX8p2wksZ5HwW8gYisHHCWwbD/yjiUc4CC+q2tc9jiX9NTriGuKd32BCKqceHlPrAeU62Bn1fujCCKvmctVavr0oUXw4XSl0sJblyH5FitapCBwSW2rmiFY53Jup8oUQLpuNeNP8euADbai//gUiBl9UwHR0qR65c=
;; Received 1224 B
;; Time 2018-01-19 13:42:17 CET
;; From 195.74.39.30@53(UDP) in 130.5 ms
```
The part about the CNAME is OK, but the NXDOMAIN on the target is BOGUS. (It looks like an outdated `ramses.nu.` zone remaining on the server.)

Issue #298: improving latency of nameserver chasing
---------------------------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/298 (Vladimír Čunát, updated 2021-01-04)

When chasing addresses of nameservers, kresd by default only trusts glue addresses that are in bailiwick of the zone we asked. This isn't optimal even in some common cases; e.g. the `com` and `net` TLD zones are served by the same set of servers, so when a delegation from either has `NS` records from the other, we *could* safely trust the glue. Doing this check generally won't be trivial, but it might be worth the latency gains on a cold cache; some nameservers cause us to chase through multiple zones until we find trusted glue.
On a related note, we might also accept the glue if the child zone is signed (that seems easier to implement).

Issue #311: policy.TLS_FORWARD should hold open a connection
------------------------------------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/311 (Daniel Kahn Gillmor, updated 2020-02-28)

I have an example `kresd` instance configured with the following policy:
```
policy.add(policy.all(policy.TLS_FORWARD({{'9.9.9.9', hostname="dns.quad9.net", ca_file="/etc/ssl/certs/ca-certificates.crt"}})))
```
If I make one request to this local `kresd` instance, it sets up the TLS session to `quad9`, exchanges traffic with it, and then (about 2 seconds later) tears down the connection to `quad9`. TLS session creation and teardown is pretty high overhead, and the `quad9` servers tolerate significantly longer periods of idle time.
Barring a good reason for early teardown, a forwarding client should hold open a session for at least 20 seconds -- but this should probably also be an adjustable configuration for a forwarder as different forwarders may have different policies.
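A sketch of the requested behaviour (hypothetical names, not kresd's API): track the last use of each upstream session and only tear it down after a configurable idle period, instead of ~2 seconds after the last query.

```python
import time

class UpstreamSession:
    """Toy model of a forwarder's upstream TLS session with an idle timeout."""

    def __init__(self, idle_timeout=20.0):
        self.idle_timeout = idle_timeout
        self._last_used = time.monotonic()

    def touch(self):
        """Record that a query/answer just used this session."""
        self._last_used = time.monotonic()

    def expired(self, now=None):
        """Only sessions idle longer than the timeout are candidates for teardown."""
        now = time.monotonic() if now is None else now
        return now - self._last_used > self.idle_timeout
```

Making `idle_timeout` a per-forwarder configuration knob would address the second half of the request.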
Note that the configuration choice for the timeout of `kresd` as a client forwarding over TLS should be distinct from the configuration choice for the delay tolerated by `kresd` when operating as a TLS listener.

Issue #314: forwarding policy should be able to specify EDNS0 Client Subnet
---------------------------------------------------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/314 (Daniel Kahn Gillmor, updated 2019-12-18)

The [EDNS0 Client Subnet extension](https://tools.ietf.org/html/rfc7871#section-7.1.2) describes a way that a "stub resolver" can specify its preferred limit on how much the resolver should reveal to the authoritative servers about the client's IP address.
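For context, the option itself is simple to construct; a sketch of the RFC 7871 wire format (illustrative, not kresd's code) shows that a SOURCE PREFIX-LENGTH of 0 sends no address bytes at all:

```python
import struct
import ipaddress

OPT_ECS = 8  # EDNS option code for Client Subnet (RFC 7871)

def ecs_option(addr: str, source_prefix_len: int) -> bytes:
    """Encode one EDNS0 Client Subnet option (option code + length + body)."""
    ip = ipaddress.ip_address(addr)
    family = 1 if ip.version == 4 else 2
    # ADDRESS carries only ceil(prefix/8) bytes; with prefix length 0
    # (the privacy-preserving setting) no address octets are sent at all.
    address = ip.packed[: (source_prefix_len + 7) // 8]
    body = struct.pack("!HBB", family, source_prefix_len, 0) + address
    return struct.pack("!HH", OPT_ECS, len(body)) + body
```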
A user may have a configured resolver that they trust enough to forward to, but not want the resolver to leak their IP address to the authoritative servers it looks up. If such a user is running `kresd` as a local caching stub with a forwarding policy, they might want to configure something like:
```
policy.FORWARD({'192.0.2.15', ecs_prefix_len=0})
```

Issue #316: improve cache performance with qname minimization
-------------------------------------------------------------
https://gitlab.nic.cz/knot/knot-resolver/-/issues/316 (Petr Špaček, updated 2018-07-11)

It seems that the resolver sends more queries than necessary.
The following list summarizes queries made by resolver 2.1.1 and sent to upstream servers.
Apparently `corp.microsoft.com. NS` (which does not exist and is denied by an insecure SOA RR with TTL 3600 s) is not cached properly. @vcunat told me that this is related to caching of qname-minimization steps, and that a fix might not be trivial because of some interdependency with the iterator implementation.
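A minimal sketch of the missing behaviour (illustrative only, not kresd's cache design): cache the denial under a case-normalized key, so the 0x20-randomized repeats in the list below would all hit the same entry until the SOA TTL expires.

```python
import time

class NegCache:
    """Toy negative cache keyed on case-normalized (qname, qtype)."""

    def __init__(self):
        self._store = {}

    def put(self, qname, qtype, ttl, now=None):
        """Record that (qname, qtype) was denied, valid for `ttl` seconds."""
        now = time.monotonic() if now is None else now
        self._store[(qname.lower(), qtype)] = now + ttl

    def is_denied(self, qname, qtype, now=None):
        """True while a still-valid denial is cached; no upstream query needed."""
        now = time.monotonic() if now is None else now
        expiry = self._store.get((qname.lower(), qtype))
        return expiry is not None and expiry > now
```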
Format: (count, qname, qtype as number)
```
1 BiTLOckErRecoVeRY.CORp.MICRoSofT.cOM 1
1 BiTloCKErreCoVERY.RMB.CORP.miCrOSoFt.COM 1
1 CO1-Na-DC-01.NOrthAmErICa.corp.MICRoSoFT.cOm 1
1 CO1w7fS01b.cOrP.MIcRoSOfT.cOM 1
1 Co1vfScluSt02.CORP.MIcRoSOFT.coM 1
1 Cy1-eU-dc-02.eUrope.CORp.MicrOsoFt.CoM 1
1 DB3-REd-dC-01.COrp.mICRosOfT.COm 1
1 DB3-eu-Dc-08.EuROpe.coRp.microsOFT.com 1
1 DB3WefpRoD1.eurOPe.corP.mIcrOSofT.COm 1
1 DB3wEFPROd10.EUROPe.CORp.MICROSOFT.cOM 1
1 Db3-Red-dC-04.CORp.MiCroSofT.COM 1
1 Db3-af-DC-02.aFRIcA.cORP.mICrOsOFt.com 1
1 Db3WEfpROD3.EuropE.COrP.MiCRosOFT.COM 1
1 Db3WeFPROd6.EUroPE.cOrP.MICrosofT.COm 1
1 Db3WeFprOd9.EuroPe.corP.MiCroSofT.COm 1
1 Db3wefPrOD4.eURoPE.CorP.micrOsofT.Com 1
1 Db3wefPrOd8.eUROpe.CoRp.mIcrosOfT.com 1
1 EURopE.CORp.micrOsOft.COm 6
1 EmeAcAT.EUROPE.corP.micROSOfT.COM 1
1 LS2WeB.RedMOnD.CorP.MiCrosOft.coM 1
1 UDE.GuesT.coRp.MIcroSOft.coM 1
1 UDE.LHWKsta.cOrP.MIcRosOfT.COm 1
1 UDE.SoUTHPaciFiC.coRp.mIcroSOFT.COM 1
1 UDE.rMB.CoRP.mICROSOFt.CoM 1
1 UDe.NoRTHaMerIcA.corp.MICROSOFt.COm 1
1 UDe.ReDmoND.coRP.mICrOsOFT.com 1
1 UDe.Sys-WiNGROUP.ntDEv.coRp.micROSoFT.coM 1
1 UdE.MIdDlEEaSt.cORp.miCroSoFT.cOM 1
1 UdE.SeGROup.wiNSE.cORP.MicROsOft.cOm 1
1 UdE.sOuthaMERiCa.CorP.MiCROsOFt.CoM 1
1 Ude.AfRiCA.coRP.miCROsoFT.COm 1
1 WPAD.NTDev.cOrp.mIcRoSofT.COM 1
1 WPaD.NOrthaMErIca.coRP.miCroSoFt.CoM 1
1 WPaD.afRica.coRp.mICROsofT.Com 1
1 WpAd.Sys-wIngROup.NTdEv.cOrP.mICRoSoFT.com 1
1 WpAd.midDlEeAST.cOrp.MICrosOft.COm 1
1 WpaD.MslPa.corp.MICROSofT.COM 1
1 WpaD.euroPe.cOrp.miCRoSofT.COM 1
1 Wpad.reDMond.cORP.micrOsOFT.COM 1
1 _LDAp._TcP.eU-iE-DuBdC._sItES.DC._MSdCs.fAreAst.cOrP.MiCrOsOFT.CoM 33
1 _LDaP._TcP.EU-IE-dUBdc._siTes.Dc._MsDCs.nOrTHaMEriCA.coRp.MicrOsoFT.COm 33
1 _LdAp._tcP.eu-iE-DUbDC._SItes.AFRIcA.corp.micRoSoFT.COM 33
1 _Ldap._TCp.eu-ie-DubDc._SitEs.farEast.CORp.miCroSOft.cOM 33
1 _lDaP._tcp.EU-ie-DUbdC._siTES.dC._MsDcs.a-jINOvo-NB2.EuropE.COrP.mIcRosoft.COM 33
1 _lDap._TCP.PDC._MSDCS.EuroPE.CorP.MicRoSOFT.COM 33
1 _ldAp._Tcp.Eu-Ie-DuBdc._SitEs.COrp.MiCROsoFT.cOm 33
1 a-jinoVO-nB2.EuRoPe.cOrP.MicrOSOfT.com 6
1 aZeu1MP03.EUrOPe.CoRP.MICrOsOfT.cOm 1
1 biTLOCKerrEcOvEry.GuEst.corp.mICroSOFT.com 1
1 cO1-fE-dC-05.fArEASt.cOrp.MicROSoFt.cOM 1
1 cY1Cdmvfs1.cOrp.MicrOsoFT.cOm 1
1 corp.MICROsOft.COm 6
1 dB3WEFprOD7.eURoPe.CoRP.miCROSoFt.cOM 1
1 dB3WefProd2.EuROPE.coRp.MICRosOft.COm 1
1 dR._dns-SD._UDp.COrP.MicrosOft.CoM 12
1 db3-eU-DC-03.eurOPe.cORP.Microsoft.CoM 1
1 db3WEfprOd5.europe.CoRP.MICRosOft.cOM 1
1 suhriN-dEvopS.eURope.cOrP.mICrOsoft.cOM 6
1 tRYlEK-z240.eUrOpE.COrp.MicROSoFt.CoM 6
1 uDE.CorP.MICRosOft.Com 1
1 uDE.MSlpA.CORP.MicROsoFt.cOm 1
1 uDE.wINSE.coRp.MICroSofT.Com 1
1 uDe.faREASt.corP.micRoSOfT.Com 1
1 udE.eUROpE.CORP.MiCROSoFt.Com 1
1 udE.ntDev.CORp.mICrosOFt.COm 1
1 wPAD.GUest.CoRp.mICROSOFt.cOm 1
1 wPAD.Lhwksta.cORP.MicRoSOfT.cOm 1
1 wPaD.sOUtHAmERiCa.CORP.MIcRosoFt.CoM 1
1 wPaD.wiNSe.corP.MICroSOft.COm 1
1 wpAD.cORP.microsOfT.cOM 1
1 wpAd.SouThpACiFIc.CORp.MICrOSOFt.cOm 1
1 wpAd.rMb.CORP.MIcROSOft.COm 1
1 wpaD.fAReAst.cORp.MIcROsOfT.CoM 1
1 wpaD.seGrOuP.wINsE.coRP.mICROsoFt.cOM 1
57 CORP.MIcRosoft.com 2
```https://gitlab.nic.cz/knot/knot-resolver/-/issues/318map_set is used incorrectly on some places2018-05-03T17:06:32+02:00Vladimír Čunátvladimir.cunat@nic.czmap_set is used incorrectly on some placesProbably due to misleading API docs; when it returns 1, it has replaced the value, but sometimes we free the value afterwards assuming ENOMEM. Some `set_add` call sites might also be affected.https://gitlab.nic.cz/knot/knot-resolver/-/issues/326Use connected UDP sockets for outgoing queries2019-04-02T17:26:25+02:00Marek VavrusaUse connected UDP sockets for outgoing queriesIt'd be nice to use connected UDP sockets for outgoing queries, as it makes spoofing a little bit harder and is a bit more efficient, since the kernel can discard datagrams coming from source addresses other than the connected one.
Currently libuv doesn't have a facility for connected UDP sockets, so I'm creating the issue to track this when it gets it: https://github.com/libuv/leps/pull/10https://gitlab.nic.cz/knot/knot-resolver/-/issues/346www.nrl.navy.mil. validation broken without query minimization2018-09-04T16:29:06+02:00Filip Sirokywww.nrl.navy.mil. validation broken without query minimizationValidation is broken without query minimization for www.nrl.navy.mil. after it was fixed with it in merge !543.
Kresd log:
[server.log](/uploads/199eaec49170e46882d23c12e6db646b/server.log)
Deckard scenario:
[gen_navy.rpl](/uploads/aaa46e764a232e811ee9d32813953325/gen_navy.rpl)https://gitlab.nic.cz/knot/knot-resolver/-/issues/347knot-resolver fails to build from source on hurd due to missing MAXPATHLEN2018-05-03T12:48:02+02:00Daniel Kahn Gillmorknot-resolver fails to build from source on hurd due to missing MAXPATHLENthe [debian hurd build daemon](https://buildd.debian.org/status/fetch.php?pkg=knot-resolver&arch=hurd-i386&ver=2.3.0-2&stamp=1524785893&raw=0) shows:
```
daemon/engine.c: In function 'engine_set_moduledir':
daemon/engine.c:231:15: error: 'MAXPATHLEN' undeclared (first use in this function); did you mean 'MAXNAMLEN'?
char l_paths[MAXPATHLEN] = { 0 };
^~~~~~~~~~
MAXNAMLEN
```
See [Justus Winter's thoughts on MAXPATHLEN](https://lists.debian.org/debian-hurd/2012/01/msg00166.html) about why this might not be something worth relying on.https://gitlab.nic.cz/knot/knot-resolver/-/issues/349How to localize, not forward queries for a domain name?2019-12-18T19:15:02+01:00jodaHow to localize, not forward queries for a domain name?Hi all,
I own a Turris router; let's assume I also own the domain name example.org.
My router's external interface has a public IP.
I set up a dynamic DNS record for the router, home.example.org.
Any unknown sub-domain record like x.y.example.org resolves to the IP of example.org.
Now I want a local DNS overlay provided by Knot Resolver (dnsmasq supports this feature).
All my devices at home get a non-public network IP address and a name like:
```
host1.lan.example.org
host2.lan.example.org
host3.lan.example.org
```
Every internal network client gets an internal DNS name with the suffix lan.example.org.
knot-resolver automatically sets up the hostnames from DHCP leases provided by odhcpd.
I was unable to localize the queries for my local network. By localizing I mean that Knot Resolver resolves the queries locally from its "DHCP cache" but does not forward them to the upstream DNS servers.
Solutions that fail to work:
1. Always forwarding: an unknown (local) hostname like "does-not-exist.lan.example.org" returns the external IP of example.org (instead of NXDOMAIN).
1. Blocking lan.example.org: if I use `policy.DENY` to block resolving hosts with the suffix lan.example.org, then "does-not-exist.lan.example.org" gets NXDOMAIN, but all other hosts fail to resolve, too :-/
1. I tried using `policy.PASS` for lan.example.org (in order to just look the DNS name up from the "hints cache"), but that did not work properly. Either queries are not properly answered (not resolved against the hints cache) or the DNS requests are sent upstream again (getting a reply with the external IP).
It's been a while since I spent time troubleshooting my setup.
Please tell me what information you need. I read a lot of your documentation, but failed to find a solution.
My custom.conf for kresd (aka. knot-resolver):
```
-- Comment
--local trace_rule = policy.add(policy.suffix(policy.QTRACE, {todname('lan.example.org.')}))
--policy.del(trace_rule.id)
--table.insert(policy.rules, 1, trace_rule)
-- local hints_rule = policy.add(policy.suffix(policy.all(kres.YIELD),{todname('lan.example.org.')}))
local hints_rule = policy.add(policy.suffix(function(state, req)
local qry = req:current()
--print('Local DNS ',qry.sname,' hints "','"')
--print('Local DNS "',ffi.string(qry.sname,sizeof(qry.sname)), '" hints ""')
--print('Local DNS ',qry.sname,' hints "',hints.get(qry.sname),'"')
--ffi.C.knot_dname_is_sub(qry.sname, todname('ip6.arpa.'))
--if hints.get(qry.sname) == '{ result: [] }' then
return policy.DENY
--else
-- return policy.PASS
--end
end
,{todname('lan.example.org.')}))
--local hints_rule = policy.add(policy.suffix(policy.FLAGS('',kres.query.ALWAYS_CUT),{todname('lan.example.org.')}))
policy.del(hints_rule.id)
table.insert(policy.rules, 1, hints_rule)
--policy.add(policy.all(policy.FORWARD({ '<upstream IPv4 DNS>' })))
```
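For what it's worth, one direction that might work is deciding per query based on what the hints module knows. This is an untested sketch; `hints.get`, `kres.dname2str`, and the exact shape of the hints result are taken from the kresd documentation of that era and should be treated as assumptions:

```lua
-- Untested sketch: answer names under lan.example.org only from local
-- hints; return NXDOMAIN for local names the hints module doesn't know,
-- instead of forwarding them upstream.
policy.add(policy.suffix(function (state, req)
	local qry = req:current()
	local rec = hints.get(kres.dname2str(qry.sname))
	-- NOTE: whether hints.get returns nil or an empty table for an
	-- unknown name needs checking against your kresd version.
	if rec and next(rec) then
		return policy.PASS -- known local name: let the hints module answer
	else
		return policy.DENY -- unknown local name: NXDOMAIN
	end
end, { todname('lan.example.org.') }))
-- Everything else can still be forwarded:
-- policy.add(policy.all(policy.FORWARD({ '<upstream IPv4 DNS>' })))
```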
Current Knot DNS Resolver, version 1.5.1 (turris OS v3.9.6).https://gitlab.nic.cz/knot/knot-resolver/-/issues/360make sure contrib/ does not get out of sync with libknot upstream2018-05-24T19:08:06+02:00Petr Špačekmake sure contrib/ does not get out of sync with libknot upstreamThis needs some clever idea how to compare against correct branch etc. See !588 for an example.This needs some clever idea how to compare against correct branch etc. See !588 for an example.https://gitlab.nic.cz/knot/knot-resolver/-/issues/363Avoid usage of DNS64_MARK query flag. Lua variable should be used instead as...2019-12-18T19:56:41+01:00Grigorii DemidovAvoid usage of DNS64_MARK query flag. Lua variable should be used instead as described in !533.https://gitlab.nic.cz/knot/knot-resolver/-/issues/364policy and statistics: improvements?2019-12-18T19:15:02+01:00Vladimír Čunátvladimir.cunat@nic.czpolicy and statistics: improvements?- [ ] UX. Each rule in the list of policies has a `.count`, but it's not much useful as it is. It's not exported in usual statistics and introspecting by hand makes it hard to read the list.
```
[rules] => {
[1] => {
[count] => 40698
[id] => 0
[cb] => function: 0xb69374b0
}
}
```
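Until the counters are exported properly, a small helper could at least make them readable (a sketch against the `policy.rules` structure shown above; not an existing API):

```lua
-- Sketch: collect rule-id => match-count pairs from policy.rules
-- (layout as printed above) into a plain table for logging or export.
function rule_counts()
	local out = {}
	for _, rule in ipairs(policy.rules) do
		out[rule.id] = rule.count
	end
	return out
end
```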
##### Consider collecting more statistics:
- [ ] RPZ rules might additionally collect a counter of matches for each RPZ file line. That seems relatively cheap on the performance side, but it's difficult with the way the abstractions are done now, as the `[cb]` (above) knows nothing about the "parent table".
- [ ] A count of "secure" answers would be interesting, i.e. those that would set the AD flag if requested. (ATM the state isn't well visible unless the request had DO or AD.)
- [ ] e.g. inspiration https://pi-hole.nethttps://gitlab.nic.cz/knot/knot-resolver/-/issues/380Losing requests on quit()2019-01-11T14:20:09+01:00Vladimír Čunátvladimir.cunat@nic.czLosing requests on quit()`quit()` and `SIGTERM` currently stop the UV loop immediately, so requests in progress will be lost. Plan:
- stop listening for new requests (might be tricky for TCP/TLS),
- wait for requests to finish (normally bounded by the 10s timeout already),
- add `quit(true)` for immediate quit.https://gitlab.nic.cz/knot/knot-resolver/-/issues/391DNSSEC turned off for the NS TTL when transitioning from insecure to secure zone2019-12-19T09:34:51+01:00Marek VavrusaDNSSEC turned off for the NS TTL when transitioning from insecure to secure zoneThe current validator mode is that it turns off DNSSEC on insecure delegations, which is fine, but it doesn't turn it back on after the DS is refetched, possibly because it's served from packet cache.
See https://lists.dns-oarc.net/pipermail/dns-operations/2018-August/017869.htmlVladimír Čunátvladimir.cunat@nic.czVladimír Čunátvladimir.cunat@nic.czhttps://gitlab.nic.cz/knot/knot-resolver/-/issues/392improve protection from NTP attacks2018-08-06T11:38:00+02:00Petr Špačekimprove protection from NTP attacksMaybe we can tune some parameters introduced in !392 to be more resilient. This needs more thought.
Sources:
* https://nlnetlabs.nl/downloads/presentations/The-impact-of-NTP-security-weaknesses-on-DNSSEC.pdf
* https://tools.ietf.org/html/draft-aanchal-time-implementation-guidance-00https://gitlab.nic.cz/knot/knot-resolver/-/issues/393cache open: handle EAGAIN2018-08-17T11:31:16+02:00Vladimír Čunátvladimir.cunat@nic.czcache open: handle EAGAIN... probably via random exponential backoff or something. Details: https://gitter.im/CZ-NIC/knot-resolver?at=5b73e162a37112689c21348b... probably via random exponential backoff or something. Details: https://gitter.im/CZ-NIC/knot-resolver?at=5b73e162a37112689c21348bhttps://gitlab.nic.cz/knot/knot-resolver/-/issues/394cache.get(): resurrect the lua API2018-09-12T15:32:37+02:00Vladimír Čunátvladimir.cunat@nic.czcache.get(): resurrect the lua APIThis wasn't finished in https://gitlab.labs.nic.cz/knot/knot-resolver/merge_requests/633, but there are some ideas how the API might look like.This wasn't finished in https://gitlab.labs.nic.cz/knot/knot-resolver/merge_requests/633, but there are some ideas how the API might look like.https://gitlab.nic.cz/knot/knot-resolver/-/issues/395Validating glue information before using it2020-11-24T16:48:50+01:00Vladimír Čunátvladimir.cunat@nic.czValidating glue information before using itThe NS and address records given on delegation are not signed (except in special cases where that's possible). There's still some slight risk of attacking that place. For on-path attackers that gives no advantage (as far as I know), fo...The NS and address records given on delegation are not signed (except in special cases where that's possible). There's still some slight risk of attacking that place. For on-path attackers that gives no advantage (as far as I know), for off-path attackers we have the shared protection via randomization of query ID + port + qname case (= 30–40 bits of entropy usually).
If an attacker manages to change glue to IPs they control, they basically get elevated to an on-path attacker, i.e. they can:
- observe all queries,
- easily extend the attack to whole subtree,
- arbitrarily change the legitimately unsigned parts,
- reliably DoS any chosen signed parts.
Most of this could be mitigated by first using the glue to validate itself and only using it for real queries after that. With [parallel queries](https://gitlab.labs.nic.cz/knot/knot-resolver/issues/36) this seems realistic, as fetching the NS and addresses could be done all at once with fetching the zone's DNSKEY, so it would probably not add any latency in the usual setting. _Before that I'd expect a few-RTT slow-down for each uncached zone cut, i.e. way too much._https://gitlab.nic.cz/knot/knot-resolver/-/issues/403Restrict how long a delegation can be refreshed in cache2020-02-28T09:55:02+01:00Marek VavrusaRestrict how long a delegation can be refreshed in cacheCurrently the NS record for a domain delegation can be refreshed in cache by queries arriving near its expiration time. This is good because the NS record can be prefetched ahead of time, but it also means that when a domain moves to a different DNS provider, the resolver will never know as long as the NS record keeps getting refreshed from the child side of the delegation, since it will never go back to the TLD to check whether the zone delegation changed.
In order to fix this, the resolver would have to track how the NS record was cached. One possible solution is to add an inception time which would only be updated when the NS record first enters the cache from its parent; another is to restrict the number of times a record can be updated before it expires, or simply to prevent NS records from being updated until they have fully expired.
What's the best way to fix this?https://gitlab.nic.cz/knot/knot-resolver/-/issues/404incorrect handling of EDNS version 1+2019-07-09T17:12:25+02:00Petr Špačekincorrect handling of EDNS version 1+Apparently we do not return BADVERS as we should:
```
$ dig +nocookie +rec +noad +edns=1 +noednsneg +ednsopt=100 soa isc.org. @1.1.1.1
; <<>> DiG 9.13.0-dev <<>> +nocookie +rec +noad +edns=1 +noednsneg +ednsopt=100 soa isc.org. @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20124
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1452
;; QUESTION SECTION:
;isc.org. IN SOA
;; ANSWER SECTION:
isc.org. 6914 IN SOA ns-int.isc.org. hostmaster.isc.org. 2018092500 7200 3600 24796800 3600
;; Query time: 16 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Mon Oct 01 13:40:13 CEST 2018
;; MSG SIZE rcvd: 90
```
Test suite:
https://gitlab.isc.org/isc-projects/DNS-Compliance-Testing
run `genreport -R` with input like:
`nic.cz. resolver.test. 1.1.1.1`
Output at the moment:
```
nic.cz. @1.1.1.1 (resolver.test.): dns=ok edns=ok edns1=noerror,badversion,soa edns@512=ok ednsopt=ok edns1opt=noerror,badversion,soa do=ok ednsflags=ok optlist=ok signed=ok,yes ednstcp=ok
```https://gitlab.nic.cz/knot/knot-resolver/-/issues/405Improving TCP/TLS timer logic for long-lived connections2018-10-31T15:51:37+01:00BaptisteImproving TCP/TLS timer logic for long-lived connectionsI am testing long-lived client connections to Knot resolver over TCP or TLS.
Currently, the idle timeout is quite short: `kresd` closes a client TCP connection after just a few seconds when no request is made. While investigating this part of the code, I found that the idle timeout strategy is quite complex, and mixes up the timeout values for "downstream" TCP connections and "upstream" TCP connections (while in reality, they have very different requirements).
Below is an attempt at documenting the current behaviour, so that we can discuss how to improve it.
This is related to #311 (short idle timeout for outgoing TLS connections) and #378 ("unificate processing of inbound and outbound TCP connections where it possible")https://gitlab.nic.cz/knot/knot-resolver/-/issues/417support prefilling for arbitrary zone2024-02-28T12:12:23+01:00Petr Špačeksupport prefilling for arbitrary zoneUlrich from IIS requested feature which would allow them to prefill resolver's cache with arbitrary zone, i.e. not only root zone.
Technical note:
Simple removal of the checks for the zone name does not work, because `DS` records are missing from the cache and this leads to failing validation. Maybe we can just wrap the import in a function which requests the `DS` and calls the import from a query callback?https://gitlab.nic.cz/knot/knot-resolver/-/issues/425Too many requests for DNSKEY2018-11-29T17:36:42+01:00Ivana KrumlovaToo many requests for DNSKEYwhen it uses an unsupported algorithm (DSA).
Happens on this rpl test:
[val_noadwhennodo.rpl](/uploads/e3e52c6d62772621faa8047cd247ea00/val_noadwhennodo.rpl)
Server log:
[server.log](/uploads/27aff279562f79da370b5b2de67a1d5d/server.log)https://gitlab.nic.cz/knot/knot-resolver/-/issues/429negative trust anchor does not prevent NXDOMAIN from aggressive cache2020-04-06T09:52:56+02:00Petr Špačeknegative trust anchor does not prevent NXDOMAIN from aggressive cacheRight now aggressive cache masks "grafted" domains, e.g. fake TLDs, even if these are listed as negative trust anchors.
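For context, marking such a grafted domain as a negative trust anchor looks roughly like this in the configuration (`trust_anchors.set_insecure` per the kresd docs; the TLD name is a placeholder):

```lua
-- Disable DNSSEC validation under a fake/grafted TLD; the bug is that
-- the aggressive cache may still synthesize NXDOMAIN beneath it anyway.
trust_anchors.set_insecure({ 'faketld.' })
```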
This is unexpected behavior and forces users to use `NO_CACHE`, which is not optimal. In the future we should exempt NTAs from the aggressive cache.https://gitlab.nic.cz/knot/knot-resolver/-/issues/430"=> going insecure because there's no covering TA" message2018-12-14T12:48:27+01:00Ivana Krumlova"=> going insecure because there's no covering TA" messageDeckard often prints this at the beginning of the log, even on tests where data are DNSSEC-validated correctly.
Maybe this is a problem in kresd logging or something like that.
for example:
log:
```deckard.py 364 DEBUG [00000.00][plan] plan 'b.example.com.' type 'DS' uid [36622.00]
deckard.py 364 DEBUG [36622.00][iter] 'b.example.com.' type 'DS' new uid was assigned .01, parent uid .00
deckard.py 364 DEBUG [36622.01][resl] => going insecure because there's no covering TA
deckard.py 364 DEBUG [36622.01][resl] => using root hints
deckard.py 364 DEBUG [36622.01][iter] 'b.example.com.' type 'DS' new uid was assigned .02, parent uid .00
deckard.py 364 DEBUG [36622.02][resl] => id: '50568' querying: '193.0.14.129' score: 10 zone cut: '.' qname: 'b.EXampLe.COm.' qtype: 'DS' proto: 'udp'
deckard.py 364 DEBUG [36622.02][iter] <= answer received:
deckard.py 364 DEBUG ;; ->>HEADER<<- opcode: QUERY; status: NOERROR; id: 50568
deckard.py 364 DEBUG ;; Flags: qr QUERY: 1; ANSWER: 0; AUTHORITY: 1; ADDITIONAL: 2
deckard.py 364 DEBUG
deckard.py 364 DEBUG ;; EDNS PSEUDOSECTION:
deckard.py 364 DEBUG ;; Version: 0; flags: ; UDP size: 1280 B; ext-rcode: Unused
deckard.py 364 DEBUG
deckard.py 364 DEBUG ;; QUESTION SECTION
deckard.py 364 DEBUG b.example.com. DS
deckard.py 364 DEBUG
deckard.py 364 DEBUG ;; AUTHORITY SECTION
deckard.py 364 DEBUG com. 3600 NS a.gtld-servers.net.
deckard.py 364 DEBUG
deckard.py 364 DEBUG [36622.02][iter] <= loaded 1 glue addresses
deckard.py 364 DEBUG [36622.02][iter] <= referral response, follow
deckard.py 364 DEBUG [36622.02][cach] => stashed com. NS, rank 002, 36 B total, incl. 0 RRSIGs
deckard.py 364 DEBUG [36622.02][cach] => stashed also 1 nonauth RRsets
deckard.py 364 DEBUG [36622.02][resl] <= server: '193.0.14.129' rtt: 103 ms
deckard.py 364 DEBUG [36622.02][iter] 'b.example.com.' type 'DS' new uid was assigned .03, parent uid .00
deckard.py 364 DEBUG [36622.03][resl] => id: '52885' querying: '192.5.6.30' score: 10 zone cut: 'com.' qname: 'b.EXampLe.CoM.' qtype: 'DS' proto: 'udp'
deckard.py 364 DEBUG [36622.03][iter] <= answer received:
deckard.py 364 DEBUG ;; ->>HEADER<<- opcode: QUERY; status: NOERROR; id: 52885
deckard.py 364 DEBUG ;; Flags: qr QUERY: 1; ANSWER: 0; AUTHORITY: 1; ADDITIONAL: 2
deckard.py 364 DEBUG
deckard.py 364 DEBUG ;; EDNS PSEUDOSECTION:
deckard.py 364 DEBUG ;; Version: 0; flags: ; UDP size: 1280 B; ext-rcode: Unused
deckard.py 364 DEBUG
deckard.py 364 DEBUG ;; QUESTION SECTION
deckard.py 364 DEBUG b.example.com. DS
deckard.py 364 DEBUG
deckard.py 364 DEBUG ;; AUTHORITY SECTION
deckard.py 364 DEBUG example.com. 3600 NS ns.example.com.
deckard.py 364 DEBUG
deckard.py 364 DEBUG [36622.03][iter] <= loaded 1 glue addresses
deckard.py 364 DEBUG [36622.03][iter] <= referral response, follow
deckard.py 364 DEBUG [36622.03][cach] => stashed example.com. NS, rank 002, 32 B total, incl. 0 RRSIGs
deckard.py 364 DEBUG [36622.03][cach] => stashed also 1 nonauth RRsets
deckard.py 364 DEBUG [36622.03][resl] <= server: '192.5.6.30' rtt: 5 ms
deckard.py 364 DEBUG [36622.03][iter] 'b.example.com.' type 'DS' new uid was assigned .04, parent uid .00
deckard.py 364 DEBUG [36622.04][resl] >< TA: 'example.com.'
deckard.py 364 DEBUG [36622.04][plan] plan 'example.com.' type 'DNSKEY' uid [36622.05]
deckard.py 364 DEBUG [36622.05][iter] 'example.com.' type 'DNSKEY' new uid was assigned .06, parent uid .04
deckard.py 364 DEBUG [36622.06][cach] => no NSEC* cached for zone: example.com.
deckard.py 364 DEBUG [36622.06][cach] => skipping zone: example.com., NSEC, hash 0;new TTL -123456789, ret -2
deckard.py 364 DEBUG [36622.06][cach] => skipping zone: example.com., NSEC, hash 0;new TTL -123456789, ret -2
deckard.py 364 DEBUG [36622.06][resl] => id: '19571' querying: '1.2.3.4' score: 10 zone cut: 'example.com.' qname: 'EXaMPlE.Com.' qtype: 'DNSKEY' proto: 'udp'
deckard.py 364 DEBUG [36622.06][iter] <= answer received:
deckard.py 364 DEBUG ;; ->>HEADER<<- opcode: QUERY; status: NOERROR; id: 19571
deckard.py 364 DEBUG ;; Flags: qr QUERY: 1; ANSWER: 2; AUTHORITY: 2; ADDITIONAL: 3
deckard.py 364 DEBUG
deckard.py 364 DEBUG ;; EDNS PSEUDOSECTION:
deckard.py 364 DEBUG ;; Version: 0; flags: do; UDP size: 1280 B; ext-rcode: Unused
deckard.py 364 DEBUG
deckard.py 364 DEBUG ;; QUESTION SECTION
deckard.py 364 DEBUG example.com. DNSKEY
deckard.py 364 DEBUG
deckard.py 364 DEBUG ;; ANSWER SECTION
deckard.py 364 DEBUG example.com. 3600 DNSKEY 256 3 7 AwEAAef0Gt81KzrbFGbFmk6VeEzLLcRbnKiDjdMBO7R+HsQWCO9YpPGx20mBEV7ISCLva+LZulf584i30ga7qMeVsarsdh9xCYtyMXd4Ex5nMEXxV9f2Or+FjihPduL2TnAlWpvL8oc1oKVI2RISTT1yf8IYy6X/FpfmMP819WBN2Kit
deckard.py 364 DEBUG example.com. 3600 RRSIG DNSKEY 7 2 3600 20181230101851 20181130101851 16907 example.com. RPXAcaVjBdtk/geHTdTg9ZOKREpAdjZAopRE/5Kk9fdFYQWwg0uRxexLPJ11jXjnp9MKOp1FehctyvE/mm1lB/J6+YepHu3tRAzzJ9YfjVxJjUppQv/nA/fU55MHWYhdhXwKn7F+PXD8+MFlAqPyFz9mYZEO89lI4P2/Wf4xpv4=
deckard.py 364 DEBUG
deckard.py 364 DEBUG ;; AUTHORITY SECTION
deckard.py 364 DEBUG example.com. 3600 NS ns.example.com.
deckard.py 364 DEBUG example.com. 3600 RRSIG NS 7 2 3600 20181230101851 20181130101851 16907 example.com. KXsKhCme80OQl4qekE+q0KvymkhEelk+OdOsajCsGmfG5eeCEkN58gVw5fBgtR2Ekp15KLsV1elsyVL8i7W5Hp5f2G70/plqSQ+78n3Al5jXONgNoVFSOuf8N179F2uf3k20MpnlxQQ7W/VX6SpuAOejyVpp6il6dm2YwRHHnX4=
deckard.py 364 DEBUG
deckard.py 364 DEBUG [36622.06][iter] <= loaded 1 glue addresses
deckard.py 364 DEBUG [36622.06][iter] <= rcode: NOERROR
deckard.py 364 DEBUG [36622.06][vldr] <= parent: updating DNSKEY
deckard.py 364 DEBUG [36622.06][vldr] <= answer valid, OK
deckard.py 364 DEBUG [36622.06][cach] => stashed example.com. DNSKEY, rank 060, 314 B total, incl. 1 RRSIGs
deckard.py 364 DEBUG [36622.06][cach] => not overwriting A ns.example.com.
deckard.py 364 DEBUG [36622.06][resl] <= server: '1.2.3.4' rtt: 7 ms
deckard.py 364 DEBUG [36622.04][iter] 'b.example.com.' type 'DS' new uid was assigned .07, parent uid .00
deckard.py 364 DEBUG [36622.07][resl] => id: '04066' querying: '1.2.3.4' score: 11 zone cut: 'example.com.' qname: 'b.EXAmPLE.cOM.' qtype: 'DS' proto: 'udp'
deckard.py 364 DEBUG [36622.07][iter] <= answer received:
deckard.py 364 DEBUG ;; ->>HEADER<<- opcode: QUERY; status: NOERROR; id: 4066
deckard.py 364 DEBUG ;; Flags: qr aa QUERY: 1; ANSWER: 0; AUTHORITY: 4; ADDITIONAL: 1
deckard.py 364 DEBUG
deckard.py 364 DEBUG ;; EDNS PSEUDOSECTION:
deckard.py 364 DEBUG ;; Version: 0; flags: do; UDP size: 1280 B; ext-rcode: Unused
deckard.py 364 DEBUG
deckard.py 364 DEBUG ;; QUESTION SECTION
deckard.py 364 DEBUG b.example.com. DS
deckard.py 364 DEBUG
deckard.py 364 DEBUG ;; AUTHORITY SECTION
deckard.py 364 DEBUG example.com. 86394 SOA ns.iana.org. nstld.iana.org. 2007092000 1800 900 604800 86400
deckard.py 364 DEBUG example.com. 86394 RRSIG SOA 7 2 86394 20181230101851 20181130101851 16907 example.com. uQjgfvlcxQLPfqetqWjTgKTbDOK3BoqbdmrqudrEl/X/S3OR8uhTQu7PEsrJm7IP7lmKcsbF4LAFjBNRp28G4at8v5cnCpvZfKFDzO3JzCubaVnn18rSZj9gM1e4CN5ms/aAlr5I2hDhIQnsKmhxQBTrngyTcpGgf/YQuruMRKw=
deckard.py 364 DEBUG *.example.com. 3600 NSEC *.b.example.com. A MX RRSIG NSEC
deckard.py 364 DEBUG *.example.com. 86400 RRSIG NSEC 7 2 86400 20181230101851 20181130101851 16907 example.com. 5NyjMTv7p0jvYrfxQzTJXvTlf1Uy2tMSmYKEWZoBq87u6mLNBtRgpKl91gpVvT8o+uA2XAznujnFZYgLdE9Swk87KqQQSWkyM81458SuSVwB5hma9afCrB38FH9D9aOCN1nfqIuoEsQi3Bu3Uvtr+eV7oE97ViROSy/1pyyKg9A=
deckard.py 364 DEBUG
deckard.py 364 DEBUG [36622.07][iter] <= rcode: NOERROR
deckard.py 364 DEBUG [36622.07][vldr] <= DS doesn't exist, going insecure
deckard.py 364 DEBUG [36622.07][vldr] <= answer valid, OK
deckard.py 364 DEBUG [36622.07][cach] => stashed *.example.com. NSEC, rank 060, 204 B total, incl. 1 RRSIGs
deckard.py 364 DEBUG [36622.07][cach] => stashed example.com. SOA, rank 060, 228 B total, incl. 1 RRSIGs
deckard.py 364 DEBUG [36622.07][cach] => nsec_p stashed for example.com. (new, hash: 0)
deckard.py 364 DEBUG [36622.07][resl] <= server: '1.2.3.4' rtt: 7 ms
deckard.py 364 DEBUG [36622.07][resl] AD: request classified as SECURE
deckard.py 364 DEBUG [36622.07][resl] finished: 4, queries: 2, mempool: 16400 B
scenario.py 536 INFO [ RANGE 0-100 ] {'192.5.6.30'} received: 1 sent: 1
scenario.py 536 INFO [ RANGE 0-100 ] {'193.0.14.129'} received: 1 sent: 1
scenario.py 536 INFO [ RANGE 0-100 ] {'1.2.3.4'} received: 2 sent: 2
. [100%]
1 passed, 1 skipped in 1.32 seconds```
from test [val_mal_wc.rpl](https://gitlab.labs.nic.cz/knot/deckard/blob/master/sets/resolver/val_mal_wc.rpl)https://gitlab.nic.cz/knot/knot-resolver/-/issues/433DNSSEC validation failing for empty subsubdomain2021-12-13T14:29:10+01:00Ivana KrumlovaDNSSEC validation failing for empty subsubdomaintest: [val_anchor_nx.rpl](/uploads/48b7d6e4bd7cea788a7497622812280e/val_anchor_nx.rpl)
zone:[example.com.zone.signed](/uploads/7d89cae3239747a49b306a182ef80531/example.com.zone.signed)
log: [server.log](/uploads/947010f4266c53e8cd6940b3441e30bf/server.log)Vladimír Čunátvladimir.cunat@nic.czVladimír Čunátvladimir.cunat@nic.czhttps://gitlab.nic.cz/knot/knot-resolver/-/issues/446test huge pages2019-02-06T17:37:48+01:00Petr Špačektest huge pagesWe might test if some variant of huge pages can help with performance ... it is of uncertain value, but it is one more idea we can test during benchmarking.
See https://fosdem.org/2019/schedule/event/hugepages_databases/https://gitlab.nic.cz/knot/knot-resolver/-/issues/455ugly uv_foo_t * casts all over the place2019-03-12T12:47:18+01:00Vladimír Čunátvladimir.cunat@nic.czugly uv_foo_t * casts all over the placeThe following discussion from !786 should be addressed:
- [ ] @pspacek started a [discussion](https://gitlab.labs.nic.cz/knot/knot-resolver/merge_requests/786#note_100831): (+2 comments)
> Wondering out loud: Would it be nicer if we used union for this? That would avoid explicit retyping all over the place ...https://gitlab.nic.cz/knot/knot-resolver/-/issues/459maintenance daemon2022-05-08T12:13:54+02:00Petr Špačekmaintenance daemonKnot Resolver has a bunch of tasks which need to be done only once, so it does not make much sense to do them from all workers independently.
Examples:
- [x] cache cleanup - #257
- [ ] cache import - `zimport` into cache
- [ ] TLS certificate maintenance (DNS-over-TLS, HTTP module)
- [ ] TLS ticket rotation
- [ ] RFC 5011
- [ ] TA bootstrap
... and possibly others.
In the long term we might create a "maintenance" daemon which could take care of these tasks so they would not block worker threads (it would also avoid duplication of tasks).
This would require a means to communicate between the maintenance daemon and the workers.https://gitlab.nic.cz/knot/knot-resolver/-/issues/471FORMERR for bad packets2020-10-02T11:06:36+02:00Vladimír Čunátvladimir.cunat@nic.czFORMERR for bad packetsCurrently a request from a client is either accepted or _ignored_. We should return `FORMERR` for packets whose header looks like DNS.https://gitlab.nic.cz/knot/knot-resolver/-/issues/473validate: NSEC proofs can confuse NXDOMAIN with NODATA2019-04-30T12:39:49+02:00Vladimír Čunátvladimir.cunat@nic.czvalidate: NSEC proofs can confuse NXDOMAIN with NODATA[Real-life example](https://gitlab.labs.nic.cz/knot/knot-resolver/issues/462#note_104852).
The records get into the aggressive cache, which doesn't suffer from this bug, so only the first answer can be wrong. So far I can see no security implications of exchanging NODATA with NXDOMAIN.https://gitlab.nic.cz/knot/knot-resolver/-/issues/475daemon: support AF_UNIX for Do53 and DoT sockets?2020-11-24T16:29:44+01:00Vladimír Čunátvladimir.cunat@nic.czdaemon: support AF_UNIX for Do53 and DoT sockets?I split that away from [AF_UNIX for the other sockets](https://gitlab.labs.nic.cz/knot/knot-resolver/merge_requests/811) because I saw some assumptions in worker and/or session code, and there has been no demand so far. In particular, a different libuv handle type would have to be used for AF_UNIX (a third case added to UDP and TCP).https://gitlab.nic.cz/knot/knot-resolver/-/issues/479NOERROR from pre-RFC 2308 servers is treated as lame2019-05-23T16:40:41+02:00Petr ŠpačekNOERROR from pre-RFC 2308 servers is treated as lameKnot Resolver 4.0.0 does not accept NOERROR answers from pre-RFC 2308 auths, i.e. auths which do not send an SOA RR in the AUTHORITY section of a NOERROR answer.
Example from live Internet:
```
resolve('blogs.cisco.com', kres.type.AAAA, kres.class.IN, {}, function(pkt) print(pkt) end)
```
...
```
[65537.22][iter] 'blogs.glb-ext.cisco.com.' type 'AAAA' new uid was assigned .25, parent uid .00
[65537.25][resl] => id: '43849' querying: '72.163.5.22#00053' score: 10 zone cut: 'glb-ext.cisco.com.' qname: 'BLogS.glb-eXT.CiscO.Com.' qtype: 'AAAA' proto: 'udp'
[65537.25][iter] <= answer received:
;; ->>HEADER<<- opcode: QUERY; status: NOERROR; id: 43849
;; Flags: qr cd QUERY: 1; ANSWER: 0; AUTHORITY: 0; ADDITIONAL: 1
;; EDNS PSEUDOSECTION:
;; Version: 0; flags: do; UDP size: 1280 B; ext-rcode: Unused
;; QUESTION SECTION
blogs.glb-ext.cisco.com. AAAA
[65537.25][iter] <= rcode: NOERROR
[65537.25][iter] <= lame response: non-auth sent negative response
```
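The misclassification in the log above can be illustrated with a small sketch (Python, purely illustrative; kresd's real check is `is_authoritative()` in C, and all names below are hypothetical). The heuristic accepts an empty NOERROR answer as NODATA only when the server looks authoritative (SOA in AUTHORITY per RFC 2308, or the AA flag), so the cisco.com answer, which carries neither, lands in the "lame" bucket:

```python
# Purely illustrative heuristic, not kresd's actual code.
NOERROR = 0

def classify(rcode, aa_flag, answer_rrs, authority_rrtypes):
    """Classify a received answer as 'other', 'nodata' or 'lame'."""
    if rcode != NOERROR or answer_rrs:
        return "other"                    # not an empty NOERROR answer
    if "SOA" in authority_rrtypes or aa_flag:
        return "nodata"                   # RFC 2308-style negative answer
    # Pre-RFC 2308 auths send neither SOA nor (here) AA -> judged as lame.
    return "lame"

# The answer from the log above: NOERROR, no AA flag, all sections empty.
print(classify(NOERROR, False, [], []))   # -> lame
```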
This seems to be caused by `is_authoritative()` in lib/layer/iterate.c.https://gitlab.nic.cz/knot/knot-resolver/-/issues/481FORWARD/TLS_FORWARD: support forwarding to hostname, DANE2019-12-18T19:15:02+01:00Tomas KrizekFORWARD/TLS_FORWARD: support forwarding to hostname, DANEThe `FORWARD` / `TLS_FORWARD` policies currently require an IP address as a target. Instead, a hostname could be provided. However, the initial bootstrap + handling TTL could be quite complex.
If the bootstrap + TTL problem were solved, `TLS_FORWARD` could also support DANE [RFC8310#section8.2](https://tools.ietf.org/html/rfc8310#section-8.2)https://gitlab.nic.cz/knot/knot-resolver/-/issues/483DNS64 does not synthesise if AAAA query fails but A query works2019-12-18T19:56:41+01:00Petr ŠpačekDNS64 does not synthesise if AAAA query fails but A query worksQuery for `internetbanken.privat.nordea.se. AAAA` ends up with SERVFAIL because it is broken on the authoritative side, but the query `internetbanken.privat.nordea.se. A` succeeds.
https://tools.ietf.org/html/rfc6147#section-5.1.2 seems to specify (using pretty convoluted language) that any failure in AAAA resolution should trigger an A subquery and DNS64 synthesis.
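The synthesis step itself is cheap; a sketch of the RFC 6052 well-known-prefix mapping that DNS64 performs (Python with the stdlib `ipaddress` module, illustrative only, not the module's code):

```python
import ipaddress

# RFC 6052 well-known NAT64 prefix, 64:ff9b::/96.
WKP = ipaddress.IPv6Address("64:ff9b::")

def synthesize_aaaa(a_record: str) -> str:
    """Embed an IPv4 address into the NAT64 prefix, as DNS64 synthesis does."""
    v4 = ipaddress.IPv4Address(a_record)
    return str(ipaddress.IPv6Address(int(WKP) | int(v4)))

print(synthesize_aaaa("192.0.2.1"))   # -> 64:ff9b::c000:201
```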
This was reported during RIPE 78 meeting because some people were not able to reach their bank website.
I can see two problems with the current DNS64 module (as in Knot Resolver 4.0.0):
- A failed AAAA query does not trigger synthesis, e.g. if we get SERVFAIL. This should be easy to fix.
- An AAAA query which fails because none of the NS servers respond to the AAAA query will not call the `consume()` layer in the module, and thus the DNS64 module does not get a chance to do the A query and synthesis. This will be harder to fix.https://gitlab.nic.cz/knot/knot-resolver/-/issues/488can't reliably fetch stats when using SO_REUSEPORT2020-06-15T09:35:13+02:00Jean-Danielcan't reliably fetch stats when using SO_REUSEPORTI'm using knot resolver with systemd, and want to use the stats module + http module to fetch stats in prometheus format.
My problem is that if I start more than one instance (kresd@1, kresd@2, …), stats-fetching requests are distributed among the instances, and each request returns only the stats from the answering instance.
I can't find a reliable way to fetch the stats in such a configuration.
Workaround:
I can fetch and aggregate individual workers' stats from the control sockets, but the control socket is very unreliable (it is not able to parse two successive queries properly and often tries to interpret them as a single query).https://gitlab.nic.cz/knot/knot-resolver/-/issues/494support running behind NAT64?2019-07-31T13:35:41+02:00Vladimír Čunátvladimir.cunat@nic.czsupport running behind NAT64?Minor use case: _running_ kresd on a machine without native IPv4. (maybe)
While DNS servers tend to have a much higher rate of IPv6 support than (say) HTTP servers, there are still problems; e.g. [Fastly CDN](https://fastly.net) is a long-standing example that doesn't have any IPv6 glue.
- - -
So far one wants to at least improve the NS selection algorithm by:
```lua
net.ipv4 = false
```
or work around the problem by forwarding to someplace with native IPv4.https://gitlab.nic.cz/knot/knot-resolver/-/issues/511Add VRF support2020-02-05T14:57:09+01:00krombelAdd VRF supportI try to run knot-resolver in a vrf.
I want the /metric endpoint to be accessible only internally, while resolving DoH/DoT over a VRF interface which is meant for external requests.
Ideas:
- Gitlab shell executor in a VM with sudo access (yuck!)
- shell executor to a VM with a systemd build which contains https://github.com/systemd/systemd/pul...Implement https://gitlab.labs.nic.cz/knot/maze/ into Knot Resolver's CI.
Ideas:
- Gitlab shell executor in a VM with sudo access (yuck!)
- shell executor to a VM with a systemd build which contains https://github.com/systemd/systemd/pull/138232020 Q1Štěpán BalážikŠtěpán Balážikhttps://gitlab.nic.cz/knot/knot-resolver/-/issues/535declarative policy module and other user-supplied DNS data2024-02-28T12:15:39+01:00Petr Špačekdeclarative policy module and other user-supplied DNS dataCurrent problem
---------------
Our current imperative policy module uses a chain of Lua functions. This is quite slow and hard to use for non-programmers.
Proposal
--------
Design a new method to configure "policies", preferably in a declarative way. By "policies" I mean a generic way to influence resolving and inject user-supplied data into the DNS tree, or to block other stuff.
A declarative way should be more intuitive to use than writing Lua functions, and also faster if we design it right.
Here is an incomplete list of things we might want to express.
- [x] ability to also block sub-queries, e.g. when following CNAMEs (#217)
- [ ] ability to block RR data - e.g. rebinding protection, blacklist of NS names etc. (#523)
- [x] ACLs (including negative ACLs, #370)
- [x] merge views with other policies (see also #445)?
- [x] redirecting specific zones to user-configured servers (#428, !651)
- [ ] beware that we also need the port number, not just the IP address
- [x] theoretical "helper" NS+glue records from kresd config should not be retrievable from outside
- FORWARDing
- TLS forwarding has many knobs and might need even more: #481
- do we still need STUB policy? If so see #218
- FORWARDing might need exceptions for some subtrees (see e.g. https://lists.nlnetlabs.nl/pipermail/unbound-users/2019-December/006560.html)
- generally special EDNS tricks: #314, #303; also improve #657
- special cache semantics (do not cache this sub-tree, limit TTL in this sub-tree)
- maybe DNS64 module should be merged with policies and ACLs: #368
- [x] maybe hints module should be merged in as well (see also #205, #349)
- [x] maybe also a way to provide other user-supplied data - #540
* (well, more ways can always be added)
- maybe prefill module should be merged as well (see also #417)
- think of interaction with daf module (beware of #183)
* `@vcunat` would prefer to deprecate DAF,
but theoretically we could think of translating DAF rules into the new policy rules
- design should be able to support full strength of RPZ (example of a problem: #194)
* the most common features are in 6.0.x – CNAME redirection in particular, and interacting well with other rules (multiple rules of different kinds can trigger when jumping through CNAME chains)
- design needs to support an efficient mechanism which mimics RPZ with zone transfer including IXFR(!) (#195)
- build mechanism for better visibility into policies (#364)
- it needs to work with huge lists (apparently users want to have long block lists, see https://lists.nlnetlabs.nl/pipermail/unbound-users/2019-December/006559.html)
* improved in 6.0.x: shared inside LMDB across all processes, but efficiency of restarts/reloads/updates could be significantly improved (as of 6.0.6)
- [x] open question: at which stage should the module kick in? Can it be e.g. used to implement `ignore-cd-flag` policy as seen in Unbound?
* the `view:` part can be used to set such options, though there's no ignore-cd in particular so far
- per-domain setting for rate-limits e.g. like `ratelimit-below-domain`, `ratelimit-for-domain` etc. like in Unbound
* [ ] first per-user changes in rate-limits in `views:` (when we have any rate-limiting)
- [x] special handling for reserved and local-only names: see #205 and think it through2020 Q2https://gitlab.nic.cz/knot/knot-resolver/-/issues/537module API redesign2020-11-30T17:52:59+01:00Petr Špačekmodule API redesignProblem statement
-----------------
- Current module API is not well defined and does not provide sufficient abstraction
- As a result, modules are not isolated and must know about internals of other modules (e.g. modules resetting request state must also reset `req.*_selected` arrays)
- Mixing wire-format-generating modules with modules relying on `req.*_selected` arrays leads to weird bugs (one example: !842, !851, !859)
- Lua modules seem to be slow (because of the way C code calls Lua?)
Related tickets
---------------
- #363 Modules need generic way to persist own state
- #432 Modules need ability to not respond at all (for response rate limiting)
- #483 Modules currently cannot generate answer if no NS is responding
- #447 New server selection system should expose and use API instead of being hard-wired
- #396 SERVFAIL answer can still contain bogus RRsets
- #471 low-level protocol stuff is hard-coded (incorrectly)
- #36 make sure new API does not get in the way when implementing parallel queries
- #527 modules need a way to cooperate with fine-grained logging
- #418 engine object access - I don't know if this requirement will be still valid after redesign, but let's think about it
- #264 error reporting from modules sucks
- #234 a way to cooperate between modules??? e.g. for DNAME support???
- attempt to move `reorder_RR()` into a module, ideally in the form of a policy action so it can be triggered on a per-client basis - what API would be necessary?
Objective
---------
Design a new API for modules in a way which prevents bugs stemming from bad API usage from ever repeating again.
Implementation is expected to be a long-term project, but we need proper design first. Hopefully #447, #535 and other tasks planned for 2020 will provide us sufficient experience for better API design.2020 Q4https://gitlab.nic.cz/knot/knot-resolver/-/issues/548Support for DoQ | DNS over QUIC2023-11-15T09:26:55+01:00Gaspard d'HautefeuilleSupport for DoQ | DNS over QUICHello,
DoQ is IMHO the upgrade of DoT and is not bloated compared to DoH & DoH3.
https://tools.ietf.org/html/draft-huitema-quic-dnsoquic-07
Do you consider support this Internet Draft or would your rather wait for a RFC?
Thanks,
HLFHHello,
DoQ is IMHO the upgrade of DoT and is not bloated compared to DoH & DoH3.
https://tools.ietf.org/html/draft-huitema-quic-dnsoquic-07
Would you consider supporting this Internet Draft, or would you rather wait for an RFC?
Thanks,
HLFHhttps://gitlab.nic.cz/knot/knot-resolver/-/issues/551client retry logic on TCP/TLS connection closure2020-10-22T13:58:57+02:00Vladimír Čunátvladimir.cunat@nic.czclient retry logic on TCP/TLS connection closureWhen remote server closes a connection without answering a part of our queries, the corresponding requests get failed too aggressively (perhaps? TODO: details, etc.)
The most interesting part of the standards is [RFC 7766](https://tools.ietf.org/html/rfc7766#section-6.2.4):
> DNS clients SHOULD retry unanswered queries if the connection closes before receiving all outstanding responses.
On the other hand, servers SHOULD NOT close connections early without a reason in the particular case... so hopefully this won't happen that often in practice; [FRITZ!](https://forum.turris.cz/t/dns-over-tcp-just-a-single-transaction/12003/11) seems a notable case. _I'll keep copying the important points from that discussion here._https://gitlab.nic.cz/knot/knot-resolver/-/issues/568Some cases of DNS resolution from lua fail if OS provides only IPv6 resolvers2020-04-24T10:04:07+02:00Vladimír Čunátvladimir.cunat@nic.czSome cases of DNS resolution from lua fail if OS provides only IPv6 resolversConditions:
- `resolv.conf` only containing IPv6 nameservers. Mix works OK. I believe that very few people have IPv6-only there, luckily.
- Use DNS resolution based on `lua-cqueues`, e.g. `prefill` module or root trust anchors bootstrapping – both only after !894 (kresd >= 5.0.0).
Result example:
```
[prefill] fetch of `https://www.internic.net/domain/root.zone` failed: HTTP client library error: A non-recoverable error occurred when attempting to resolve the name (-1684960053)), will retry root zone download in 09 minutes 59 seconds
```
This is a problem in lua libraries that we've chosen to use: https://github.com/wahern/dns/issues/23https://gitlab.nic.cz/knot/knot-resolver/-/issues/569clarify respdiff job names in CI2020-10-19T11:16:35+02:00Petr Špačekclarify respdiff job names in CIMostly note for myself:
Especially the forwarding scenarios have confusing names.
Find a better naming structure and fix it.
Renaming will break a lot of stuff, so schedule this when we have time for it.https://gitlab.nic.cz/knot/knot-resolver/-/issues/573net.tls() allow usage of multiple certificates2020-10-08T11:43:59+02:00Tomas Krizeknet.tls() allow usage of multiple certificatesECC certificates provide superior performance to RSA keys of comparable security. Supporting multiple certificate files in `net.tls()` could lead to improved DNS-over-TLS performance without sacrificing compatibility with older clients, if both ECC and RSA certificates could be used simultaneously.https://gitlab.nic.cz/knot/knot-resolver/-/issues/578test aggressive cache on NSEC3PARAM rotation2020-08-20T10:05:40+02:00Vladimír Čunátvladimir.cunat@nic.cztest aggressive cache on NSEC3PARAM rotationI don't think we have any tests on that in particular, though the code's been deployed for a long time. Still, most of the possible failures I can imagine should only lead to insufficient caching.
Hints around how the implementation works:
- NSEC3PARAM is the [data collected](https://tools.ietf.org/html/rfc5155#section-4.2) but it's taken from NSEC3 records directly.
- For this purpose, using NSEC is like one more possible NSEC3PARAM configuration.
- Reading from the cache is designed to consider the last two NSEC3PARAMs that have been written for that zone.
- Code reference: identifiers containing `nsec_p`.https://gitlab.nic.cz/knot/knot-resolver/-/issues/583new statistics for encrypted transports2020-06-19T14:17:50+02:00Petr Špačeknew statistics for encrypted transportsIt would be interesting to see statistics for:
- [ ] number of TLS handshakes
- [ ] TLS versions
- [ ] HTTP versions
- [ ] HTTP request methods
- [ ] HTTP status codes
Question: Are these stats sufficient to gather details about connection reuse?https://gitlab.nic.cz/knot/knot-resolver/-/issues/588control socket drops long outputs2020-09-17T13:22:45+02:00Petr Špačekcontrol socket drops long outputsThe control socket randomly cuts long outputs. It seems to be caused by incorrect use of fprintf inside the daemon/io.c function `io_tty_process_input()`.
Version: 5.1.2
Steps to reproduce:
```
$ echo -e "string.rep('a', 1024*1024*10)\n" | socat - unix-connect:$(ls control/*) | wc -c
223362
```
I.e., the output is truncated after 223362 bytes; this value is not constant, it varies. The expected output is 1024*1024*10 bytes of `a` plus 2x2 bytes of the prompt `> `.
Strace:
```
read(23, "__binary\nstring.rep('a', 1024*10"..., 65536) = 40
dup(23) = 24
fcntl(24, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fstat(24, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
write(24, "\0\240\0\1aaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 4096) = 4096
write(24, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 10481664) = 219264
write(24, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 10262400) = 109632
write(24, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 10152768) = 219264
write(24, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 9933504) = -1 EAGAIN (Resource temporarily unavailable)
close(24) = 0
```
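The strace shows `write()` stopping at EAGAIN with the unsent tail silently dropped. A correct sender resumes from the unsent offset once the descriptor is writable again; a minimal sketch in Python (illustrative only; the proper fix in kresd is to queue the buffer through libuv):

```python
import select
import socket
import threading

def write_all(sock: socket.socket, data: bytes) -> None:
    """Write a whole buffer on a non-blocking socket, resuming after EAGAIN
    instead of dropping the unsent tail (the bug shown in the strace above)."""
    view = memoryview(data)
    while view:
        try:
            sent = sock.send(view)
        except BlockingIOError:            # EAGAIN: kernel buffer is full
            select.select([], [sock], [])  # block until writable again
            continue
        view = view[sent:]                 # resume from the unsent offset

# Demo: push 10 MiB through a socketpair without losing a byte.
a, b = socket.socketpair()
a.setblocking(False)
received = bytearray()

def drain():
    while len(received) < 10 * 1024 * 1024:
        received.extend(b.recv(65536))

t = threading.Thread(target=drain)
t.start()
write_all(a, b"a" * (10 * 1024 * 1024))
t.join()
print(len(received))   # -> 10485760
```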
The whole `io_tty_process_input()` function is a mess and should be refactored into smaller pieces, and most importantly rewritten to use libuv for writes as well.https://gitlab.nic.cz/knot/knot-resolver/-/issues/589document threat model2020-07-11T22:10:59+02:00Petr Špačekdocument threat model- inputs
- trusted (config, control socket, cache, files on disk)
- untrusted (network traffic)
- decide: prefill? hints? ...
- DoS is always possible (network overload, hijack etc.)
- integrity - DNSSEC
- confidentiality - do not count on it, encrypting only DNS traffic does not hide ithttps://gitlab.nic.cz/knot/knot-resolver/-/issues/590document bug reporting procedure2020-07-10T14:10:23+02:00Petr Špačekdocument bug reporting procedure- test on latest version
- mention relevant system information
- how to capture GDB traceback
- how to limit logging to problematic names
- how to capture network traffic + keys (TLS, DoH)
...https://gitlab.nic.cz/knot/knot-resolver/-/issues/59364-bit ARM: remaining issues2020-10-01T10:53:36+02:00Santiago64-bit ARM: remaining issues(EDITed)
It's still possible to run into `bad light userdata pointer` errors, possibly hidden under
`missing luajit package: cqueues`. For a summary, see this post below: https://gitlab.nic.cz/knot/knot-resolver/-/issues/593#note_165359
- - -
#### Original post
Hi there,
It seems to be known that kresd doesn't work on arm64, but I haven't found this particular build error documented (so sorry for the possible noise). knot-resolver 5.1.x doesn't build on Debian due to a luajit error (bad light userdata pointer). The full build log is in https://buildd.debian.org/status/fetch.php?pkg=knot-resolver&arch=arm64&ver=5.1.2-1&stamp=1596037546&raw=0
And this is the relevant part:
````
...
Message: --- config_tests dependencies ---
Running command: /usr/bin/luajit -l cqueues -e os.exit(0)
--- stdout ---
--- stderr ---
/usr/bin/luajit: bad light userdata pointer
stack traceback:
[C]: at 0xffffb6342ad0
[C]: in function 'require'
/usr/share/lua/5.1/cqueues.lua:2: in function </usr/share/lua/5.1/cqueues.lua:1>
[C]: at 0xaaaae1757d08
[C]: at 0xaaaae170a4c0
../tests/meson.build:27:4: ERROR: Problem encountered: missing luajit package: cqueues
````
Cheers,
-- Santiagohttps://gitlab.nic.cz/knot/knot-resolver/-/issues/598ability to reload ssl certificate on certificate change2020-11-25T09:26:51+01:00TomVnzability to reload ssl certificate on certificate changeI was looking into doing this automatically, but it seems there is no cohesive way within knot-resolver.
Played around with using the control socket options, but it's a bit messy...e.g. use:
```lua
net.close('0.0.0.0')
http.config({tls = true, cert = "<CERT>", key = "<KEY>"}, '<webmgmt|doh>') -- for DoH|webmgmt
net.listen('0.0.0.0', 53, { kind = 'dns' })
net.listen('0.0.0.0', 443, { kind = 'doh' })
net.listen('0.0.0.0', 853, { kind = 'tls' })
net.listen('0.0.0.0', 8453, { kind = 'webmgmt' })
net.tls("<CERT>", "<KEY>") -- for DoT
```
But if knot-resolver is running as an unprivileged user, it can't rebind to privileged ports. And this needs to be scripted somehow.
An alternative way would be for the process that creates the new SSL certificates to restart knot-resolver but then that process would need to run as root.
So for now, I'm using a custom systemd path / service combo to monitor certificate file for any changes and then reload knot-resolver that way.
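The systemd path/service combination described above can be sketched roughly like this; the unit names, certificate path, and instance name are examples, not anything the project ships:

```ini
# kresd-certwatch.path (example name): fire whenever the certificate changes
[Path]
PathChanged=/etc/knot-resolver/tls.crt

[Install]
WantedBy=multi-user.target

# kresd-certwatch.service (example name): triggered by the path unit above;
# restarting the instance re-reads the certificate. (A restart is the blunt
# variant; issuing net.tls() over the control socket would avoid it.)
[Service]
Type=oneshot
ExecStart=/usr/bin/systemctl restart kresd@1.service
```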
I would be keen to know of any thoughts on how to simplify this; perhaps the ability to reload the certificate could even be added into knot-resolver itself - I know rpz files are monitored and reloaded when changed, so this seems somewhat similar.https://gitlab.nic.cz/knot/knot-resolver/-/issues/602cache size exposed in Lua API can get out of sync2020-11-04T11:53:33+01:00Petr Špačekcache size exposed in Lua API can get out of syncThis is a minor nit.
The Lua call `cache.current_size` does not read the cache size from the file/LMDB environment, so the value reported in Lua can be out of sync if another process changed the cache size.
The following discussion from !1042 should be addressed:
- [ ] @pspacek started a [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1042#note_168309): (+1 comment)
> I wonder if `cache.current_size` returns correct size if some rounding took place inside the backend.https://gitlab.nic.cz/knot/knot-resolver/-/issues/603cache: get rid of mdb_env_sync()2020-09-07T17:52:07+02:00Petr Špačekcache: get rid of mdb_env_sync()Explicit cache sync does not seem necessary and might be counterproductive, see other comments in the thread:
The following discussion from !1042 should be addressed:
- [ ] @pspacek started a [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1042#note_169608): (+1 comment)
> Out of curiosity, why the sync is necessary here?https://gitlab.nic.cz/knot/knot-resolver/-/issues/604cache: zero-downtime restart is not supported across versions which change ca...2020-11-04T11:53:33+01:00Petr Špačekcache: zero-downtime restart is not supported across versions which change cache format/versionCurrently we do not handle the case where cache format differs between two versions which are running in parallel.
- Such changes happen very rarely, so it is questionable whether we need to support it.
- At least we should make a note in the release notes when it is necessary to stop all instances before starting new ones.
See rest of the [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1042#note_169683).
The following discussion from !1042 should be addressed:
- [ ] @pspacek started a [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1042#note_169683): (+1 comment)
> I wonder how this magic would work in situation where:
> - kresd instances 1+2 are running version 5.y.z with cache in /var/cache/knot-resolver
> - kresd binary gets updated to version 6.0.0
> - admin restarts instance 1 first (according to https://knot-resolver.readthedocs.io/en/v5.1.2/systemd-multiinst.html#zero-downtime-restarts) and restarts instance 2 later
> I guess instance 2 would not detect this unless cache overflows, so most likely instance 2 will write data in old format into cache versioned by version 6.0.0.
>
> Am I correct?
>
> If so, I think we should open an issue and keep it in mind for the future cache rewrite/migration to a custom data structure.https://gitlab.nic.cz/knot/knot-resolver/-/issues/605cache: explore better ways to detect cache changes made by other processes2020-11-04T11:53:32+01:00Petr Špačekcache: explore better ways to detect cache changes made by other processeskresd 5.2.0 does a periodic check which might take too long on very busy systems. Maybe we could use some event-based mechanism?
See [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1042#note_168310).
The following discussion from !1042 should be addressed:
- [ ] @pspacek started a [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1042#note_168310): (+1 comment)
> Why not use https://docs.libuv.org/en/v1.x/guide/filesystem.html#file-change-events ?

**[incorporate DNS Shotgun into kresd CI](https://gitlab.nic.cz/knot/knot-resolver/-/issues/606)** (2020-10-30, Petr Špaček)

The following discussion from !1054 should be addressed:
- [ ] @pspacek started a [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1054#note_169587): (+1 comment)
> @tkrizek Do you see a way to add this scenario into pytests/connection tests?

**[disallow mixing protocols in net.listen()](https://gitlab.nic.cz/knot/knot-resolver/-/issues/615)** (2022-02-16, Tomas Krizek)

Due to our reuseport facility, it is possible to use `net.listen()` to bind multiple protocols to a single (IP, port) combination. I can't think of any valid use case, and the most likely cause, a typo, will cause misbehavior instead of a crash.
```
-- this isn't valid or supported
net.listen('::1', 443, { kind = 'tls' })
net.listen('::1', 443, { kind = 'doh2' })
```
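Until such a check exists in kresd itself, a config-side guard can be sketched. This is hypothetical, not a kresd API: it assumes `net.listen` can be wrapped from the configuration file, and the wrapper name and bookkeeping are purely illustrative.

```lua
-- Hypothetical guard (sketch, not a kresd feature): remember every
-- (address, port) pair we bound and refuse a second, different kind on it.
local bound = {}
local raw_listen = net.listen
net.listen = function (addr, port, opts)
	local key = tostring(addr) .. '#' .. tostring(port)
	local kind = (opts and opts.kind) or 'dns'
	if bound[key] and bound[key] ~= kind then
		error(string.format("refusing to bind %s as '%s': already bound as '%s'",
			key, kind, bound[key]))
	end
	bound[key] = kind
	return raw_listen(addr, port, opts)
end
```

With this in place, the second `net.listen('::1', 443, ...)` call from the example above would raise an error instead of silently splitting traffic between incompatible protocol handlers.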
I think the resolver should crash in these cases.

**[always keep RRSIG and its RRset in single data structure](https://gitlab.nic.cz/knot/knot-resolver/-/issues/621)** (2020-10-07, Petr Špaček)

Problem: At the moment, an RRset and its RRSIG are two independent `knot_rrset_t` structures.
This leads to problems like !1072 where things get mixed and weird things happen after that.
Idea: Refactor the code so an RRset is always tied to all of its associated RRSIGs (there can be multiple!).
Investigation into how this could be done most efficiently is needed.
Maybe this approach could also benefit libknot/Knot DNS, so let's not forget to talk to them.
Cc @lpeltan @dsalzman and gang.

**[declarative config - Lua API extension](https://gitlab.nic.cz/knot/knot-resolver/-/issues/623)** (2020-11-25, Vaclav Sraier)

I would like to open a discussion as a follow-up to #536. The problem remains, and this proposal attempts to fix it differently.
# Problem (re)statement
Current configuration is practically a Lua program, which is a nightmare for multiple reasons:
* non-programmers have a hard time understanding what is going on
* the Lua language makes it hard to detect mistakes in the config
* run-time reconfiguration requires doing each change N times for N processes
* it currently exposes low-level stuff and is prone to crashes on invalid use (#182)
# Proposal
## kresd
We could extend kresd API with the following function:
```lua
--- Sets the resolver to supplied state regardless of what was configured
--- before. Options that aren't specified in the argument are set to their
--- default value
---
--- @param cfg Table corresponding to the existing YANG model
function configure(cfg)
```
And optionally with this:
```lua
--- Returns a table corresponding to the existing YANG model with the current
--- configuration.
function dump_configuration()
```
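For illustration, a hypothetical call to the proposed `configure()`. The table layout below is purely an assumption for the sake of the example; the actual field names would come from the YANG model, which is not defined here.

```lua
-- Illustrative only: field names are assumptions, not the real YANG model.
configure({
	cache = { size = 100 * MB },
	network = {
		listen = {
			{ ip = '127.0.0.1', port = 53, kind = 'dns' },
		},
	},
})
-- Anything omitted here (e.g. policies) would be reset to its default,
-- which is the key difference from today's incremental Lua configuration.
```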
### Motivation
* extends the existing API; this change will not break any existing setup
* works with simple data formats, so it is quite feasible to implement the whole functionality in pure Lua
* updates of policies or other large data might be performed through the existing API, side-stepping the new configuration functions and alleviating performance issues with the declarative API
* relatively simple to implement
* a new configuration file format might easily be added later on, allowing direct declarative configuration
* implements a foundation for dynamically reloadable configuration - adding it on top of the declarative configuration (in the previous bullet point) would be quite straightforward
### Known issues
* At least some validation of the data format must be present in every kresd instance. By exposing these functions publicly, there is no way around that. An option might be to make something very similar but private. Then a centralized configuration tool (see below) could do the validation, eliminating the need for validation by every instance.
### To be considered
* Is it really a good idea to use Lua tables as the configuration format? Lua is not backward compatible between releases, which might lead to problems. Using JSON instead might be more future-proof, and it might integrate better with existing tools.
* Do we really want to stick to the existing Lua API? Wouldn't it be better to implement something completely new allowing us to ditch the existing API at some point in the future?
## Centralized management of multiple instances
To enable centralized management of multiple instances, a separate tool can be developed utilizing both new functions described above. It could provide any type of external API (NETCONF, REST API, sysrepo, different centralized configuration file...) and bridge it to our two new functions, calling them for all resolver instances as necessary. We could even implement this in a form of a library for commanding all kresd instances on the system at once, leaving the external API implementation up to interested parties in their specific technologies.
Basics of this were already written by @amrazek in the form of the `kres-watcher` tool.

**[Graph not shown in web management (webmgmt)](https://gitlab.nic.cz/knot/knot-resolver/-/issues/624)** (2020-10-12, Ghost User)

I am running the web management service on Knot Resolver, but there is a problem: the graph is not shown. I inspected the element and found the problem:
**Screenshot of Error:**
![knot-webmgmt-0](/uploads/ae028abbd19e8a33a6544c70f393970e/knot-webmgmt-0.png)
![knot-webmgmt-1](/uploads/8d37984831fc956075084c5588472ad9/knot-webmgmt-1.png)
**Error Log:**
```
DevTools failed to load SourceMap: Could not load content for http://127.0.0.1:8053/dist/dygraph.min.js.map: HTTP error: status code 404, net::ERR_HTTP_RESPONSE_CODE_FAILURE
dygraph.min.js:5 Can't plot empty data set
Q.parseArray_ @ dygraph.min.js:5
Q.start_ @ dygraph.min.js:5
Q.__init__ @ dygraph.min.js:4
Q @ dygraph.min.js:4
(anonymous) @ kresd.js:89
mightThrow @ jquery.js:2
process @ jquery.js:2
setTimeout (async)
(anonymous) @ jquery.js:2
fire @ jquery.js:2
fireWith @ jquery.js:2
fire @ jquery.js:2
fire @ jquery.js:2
fireWith @ jquery.js:2
ready @ jquery.js:2
completed @ jquery.js:2
jquery.js:2 jQuery.Deferred exception: chartElement is not defined ReferenceError: chartElement is not defined
at HTMLDocument.<anonymous> (http://127.0.0.1:8053/kresd.js:357:2)
at mightThrow (http://127.0.0.1:8053/jquery.js:2:15044)
at process (http://127.0.0.1:8053/jquery.js:2:15698) undefined
jQuery.Deferred.exceptionHook @ jquery.js:2
process @ jquery.js:2
setTimeout (async)
(anonymous) @ jquery.js:2
fire @ jquery.js:2
fireWith @ jquery.js:2
fire @ jquery.js:2
fire @ jquery.js:2
fireWith @ jquery.js:2
ready @ jquery.js:2
completed @ jquery.js:2
jquery.js:2 Uncaught ReferenceError: chartElement is not defined
at HTMLDocument.<anonymous> (kresd.js:357)
at mightThrow (jquery.js:2)
at process (jquery.js:2)
(anonymous) @ kresd.js:357
mightThrow @ jquery.js:2
process @ jquery.js:2
setTimeout (async)
jQuery.readyException @ jquery.js:2
(anonymous) @ jquery.js:2
mightThrow @ jquery.js:2
process @ jquery.js:2
setTimeout (async)
(anonymous) @ jquery.js:2
fire @ jquery.js:2
fireWith @ jquery.js:2
fire @ jquery.js:2
fire @ jquery.js:2
fireWith @ jquery.js:2
process @ jquery.js:2
setTimeout (async)
(anonymous) @ jquery.js:2
fire @ jquery.js:2
fireWith @ jquery.js:2
fire @ jquery.js:2
fire @ jquery.js:2
fireWith @ jquery.js:2
ready @ jquery.js:2
completed @ jquery.js:2
```
**Knot Resolver Configuration:**
```
-- Network interface configuration
net.listen('127.0.0.1', 53, { kind = 'dns' })
net.listen('127.0.0.1', 853, { kind = 'tls' })
net.listen('127.0.0.1', 8053, { kind = 'webmgmt' })
-- Load useful modules
modules = {
	'policy',
	'http'
}
-- Cache size
cache.size = 1 * GB
-- Forward to upstream servers (8.8.8.8 and 1.1.1.1) using DoT
policy.add(policy.all(policy.TLS_FORWARD({
	{'8.8.8.8', hostname='dns.google'},
	{'1.1.1.1', hostname='cloudflare-dns.com'}
})))
```
**Knot Resolver Version:**
```
root@engine:/etc/knot-resolver# apt-cache policy knot-resolver
knot-resolver:
Installed: 5.1.3-2
Candidate: 5.1.3-2
Version table:
*** 5.1.3-2 500
500 http://download.opensuse.org/repositories/home:/CZ-NIC:/knot-resolver-latest/xUbuntu_20.04 Packages
100 /var/lib/dpkg/status
3.2.1-3ubuntu2 500
500 http://kambing.ui.ac.id/ubuntu focal/universe amd64 Packages
```
Thank You.

**[DNS64: recognize ipv4only.arpa (RFC 8880)](https://gitlab.nic.cz/knot/knot-resolver/-/issues/625)** (2022-01-04, Vladimír Čunát)

> Forwarding or iterative recursive resolvers that have been explicitly configured to perform DNS64 address synthesis in support of a companion NAT64 gateway (i.e., "DNS64 recursive resolvers") MUST recognize 'ipv4only.arpa' as special.
but there might be more requirements to follow in the [RFC 8880](https://www.rfc-editor.org/rfc/rfc8880.html).
EDIT: to be clear, resolvers not configured to do DNS64 synthesis SHOULD NOT recognize these names as special.

**[Can't validate `k.root-servers.net A` with minimization off and cold cache.](https://gitlab.nic.cz/knot/knot-resolver/-/issues/626)** (2020-12-20, Štěpán Balážik)

Reproducer:
```lua
option('NO_MINIMIZE', true)
-- maybe wait a bit for priming to end
cache.clear()
verbose(true)
-- dig +dnssec @resolver k.root-servers.net A
```
```
[00000.00][plan] plan 'k.root-servers.net.' type 'A' uid [35628.00]
[35628.00][iter] 'k.root-servers.net.' type 'A' new uid was assigned .01, parent uid .00
[35628.01][resl] => using root hints
[35628.01][iter] 'k.root-servers.net.' type 'A' new uid was assigned .02, parent uid .00
[35628.02][resl] >< TA: '.'
[35628.02][plan] plan '.' type 'DNSKEY' uid [35628.03]
[35628.03][iter] '.' type 'DNSKEY' new uid was assigned .04, parent uid .02
[35628.04][resl] => id: '54250' querying: '2001:500:a8::e#00053' score: 10 zone cut: '.' qname: '.' qtype: 'DNSKEY' proto: 'udp'
[35628.04][resl] => id: '54250' querying: '192.203.230.10#00053' score: 10 zone cut: '.' qname: '.' qtype: 'DNSKEY' proto: 'udp'
[35628.04][iter] <= rcode: NOERROR
[35628.04][vldr] <= parent: updating DNSKEY
[35628.04][vldr] <= answer valid, OK
[35628.04][cach] => stashed . DNSKEY, rank 060, 824 B total, incl. 1 RRSIGs
[ta_signal_query] signalling query trigered: _ta-4f66.
[35628.04][resl] <= server: '2001:500:a8::e' rtt: >= 229 ms
[35628.04][resl] <= server: '192.203.230.10' rtt: 29 ms
[35628.02][iter] 'k.root-servers.net.' type 'A' new uid was assigned .05, parent uid .00
[35628.05][resl] => id: '03562' querying: '192.203.230.10#00053' score: 29 zone cut: '.' qname: 'K.roOt-seRVers.NEt.' qtype: 'A' proto: 'udp'
[00000.00][plan] plan '_ta-4f66.' type 'NULL' uid [65566.00]
[65566.00][iter] '_ta-4f66.' type 'NULL' new uid was assigned .01, parent uid .00
[65566.01][resl] => using root hints
[65566.01][iter] '_ta-4f66.' type 'NULL' new uid was assigned .02, parent uid .00
[65566.02][resl] >< TA: '.'
[65566.02][plan] plan '.' type 'DNSKEY' uid [65566.03]
[65566.03][iter] '.' type 'DNSKEY' new uid was assigned .04, parent uid .02
[65566.04][cach] => satisfied by exact RRset: rank 060, new TTL 172800
[65566.04][iter] <= rcode: NOERROR
[65566.04][vldr] <= parent: updating DNSKEY
[65566.04][vldr] <= answer valid, OK
[65566.02][iter] '_ta-4f66.' type 'NULL' new uid was assigned .05, parent uid .00
[65566.05][resl] => id: '37696' querying: '2001:500:2f::f#00053' score: 10 zone cut: '.' qname: '_ta-4F66.' qtype: 'NULL' proto: 'udp'
[35628.05][iter] <= rcode: NOERROR
[35628.05][vldr] >< cut changed, needs revalidation
[35628.05][resl] <= server: '192.203.230.10' rtt: 21 ms
[35628.05][resl] => resuming yielded answer
[35628.05][vldr] >< no valid RRSIGs found: k.root-servers.net. A (0 matching RRSIGs, 0 expired, 0 not yet valid, 0 invalid signer, 0 invalid label count, 0 invalid key, 0 invalid crypto, 0 invalid NSEC)
[35628.05][plan] plan 'net.' type 'DS' uid [35628.06]
[35628.06][iter] 'net.' type 'DS' new uid was assigned .07, parent uid .05
[35628.07][resl] => id: '15869' querying: '2001:500:1::53#00053' score: 10 zone cut: '.' qname: 'NEt.' qtype: 'DS' proto: 'udp'
[65566.05][resl] => id: '37696' querying: '192.5.5.241#00053' score: 10 zone cut: '.' qname: '_ta-4F66.' qtype: 'NULL' proto: 'udp'
[35628.07][resl] => id: '15869' querying: '198.97.190.53#00053' score: 10 zone cut: '.' qname: 'NEt.' qtype: 'DS' proto: 'udp'
[65566.05][iter] <= rcode: NXDOMAIN
[65566.05][vldr] <= answer valid, OK
[65566.05][cach] => stashed . NSEC, rank 060, 308 B total, incl. 1 RRSIGs
[65566.05][cach] => stashed . SOA, rank 060, 358 B total, incl. 1 RRSIGs
[65566.05][cach] => nsec_p stashed for . (new, hash: 0)
[65566.05][resl] <= server: '2001:500:2f::f' rtt: >= 225 ms
[65566.05][resl] <= server: '192.5.5.241' rtt: 25 ms
[65566.05][resl] AD: request classified as SECURE
[65566.05][resl] finished: 4, queries: 2, mempool: 98352 B
[35628.07][iter] <= rcode: NOERROR
[35628.07][vldr] <= DS: OK
[35628.07][vldr] <= parent: updating DS
[35628.07][vldr] <= answer valid, OK
[35628.07][cach] => stashed net. DS, rank 060, 330 B total, incl. 1 RRSIGs
[35628.07][resl] <= server: '2001:500:1::53' rtt: >= 250 ms
[35628.07][resl] <= server: '198.97.190.53' rtt: 50 ms
[35628.05][resl] >< TA: '.'
[35628.05][resl] => resuming yielded answer
[35628.05][vldr] >< no valid RRSIGs found: k.root-servers.net. A (0 matching RRSIGs, 0 expired, 0 not yet valid, 0 invalid signer, 0 invalid label count, 0 invalid key, 0 invalid crypto, 0 invalid NSEC)
[35628.05][plan] plan 'net.' type 'DS' uid [35628.08]
[35628.08][iter] 'net.' type 'DS' new uid was assigned .09, parent uid .05
[35628.09][cach] => satisfied by exact RRset: rank 060, new TTL 86400
[35628.09][iter] <= rcode: NOERROR
[35628.09][vldr] <= DS: OK
[35628.09][vldr] <= parent: updating DS
[35628.09][vldr] <= answer valid, OK
[35628.05][resl] >< TA: '.'
[35628.05][resl] => resuming yielded answer
[35628.05][vldr] >< no valid RRSIGs found: k.root-servers.net. A (0 matching RRSIGs, 0 expired, 0 not yet valid, 0 invalid signer, 0 invalid label count, 0 invalid key, 0 invalid crypto, 0 invalid NSEC)
[35628.05][vldr] <= continuous revalidation, fails
[35628.05][cach] => stashed k.root-servers.net. A, rank 027, 20 B total, incl. 0 RRSIGs
[35628.05][cach] => not overwriting A k.root-servers.net.
[35628.00][resl] request failed, answering with empty SERVFAIL
[35628.05][resl] finished: 8, queries: 3, mempool: 49200 B
```
And we get an empty SERVFAIL as an answer. :(

**[early detection for dropped answers over TCP connection](https://gitlab.nic.cz/knot/knot-resolver/-/issues/629)** (2021-12-08, Petr Špaček)

Problem
=======
Currently, individual DNS queries over a TCP connection do not have a per-query timer; we leave it to the TCP stack to handle packet loss. This works fine for network-level problems but does not work for queries dropped at the application level.
Issue seen in the field: #551
I.e. queries are dropped on the server side and clients get SERVFAIL once the whole TCP connection times out.
Another instance of this problem is Unbound's default limit for number of queries resolved in parallel over a single TCP connection: Before commit https://github.com/NLnetLabs/unbound/commit/f81d0ac0474cc8904e1240a512b935c8e466f81b Unbound would process only 32 queries in parallel and keep other queries on the same TCP connection hanging, potentially leading to long periods without responses.
Vague proposal
==============
- Use per-query timeout also for queries over TCP/TLS/HTTPS and evaluate if the query should be resent using other transport if it times out.
- Detect "suspicious" TCP connection states when deduplicating connections and skip over "suspicious" connections. For example, do not reuse connection if it has queries hanging on it for longer than 3 seconds.
TODO: Is there some other TCP-level tuning we can do?
Related: #447

**[daf: improve multi-instance support](https://gitlab.nic.cz/knot/knot-resolver/-/issues/630)** (2020-10-23, Tomas Krizek)

Currently, the DAF module can work when using multiple instances, but only as long as:
- all the instances are started before any rules are configured
- no instance is ever separately restarted (or crashes)
This could be improved by:
- using deterministic IDs that are tied to the rule (e.g. a hash)
- have some mechanism that can be used to pull/push the entire current configuration instead of a single update (to sync an instance's state with the others after a restart)

**[remove deprecated -f/--forks option](https://gitlab.nic.cz/knot/knot-resolver/-/issues/631)** (2020-10-27, Tomas Krizek)

Problems with the `--forks` feature:
- Does not support dynamic restart (related: #268)
- Does not support watchdog
- First process is single point of failure
- Per-instance configuration via environment variables is harder
- Fixing this practically means re-implementing systemd or supervisord, which is obviously a bad idea.
Related: #529
Task list:
- [ ] remove `-f` option and related forking code
- [ ] `worker.count` should also be removed
- [ ] remove -f usage from all testing scripts, deckard, respdiff etc.
- [ ] update our benchmarking docker image to be able to run multiple kresd instances without `-f`

Milestone: 6.0.0

**[control protocol redesign](https://gitlab.nic.cz/knot/knot-resolver/-/issues/632)** (2020-10-27, Petr Špaček)

Version affected: 5.2.0
The current control protocol has several deficiencies:
- Input commands are read as text; individual commands are delimited with a `\n` byte. This prevents the user from sending multi-line commands or parameters, because the embedded `\n` breaks the implicit command boundaries.
- Output is always a string from `table_print()`. Consequently:
  - the control protocol cannot represent e.g. Lua errors - these lead to empty output,
  - sending structured data to another instance is a pain, as it has to be serialized into a string before it is returned to `table_print()`, and this serialized string is then (again) decorated by `table_print()` with `'` string delimiters.
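The first deficiency is usually solved with explicit length-prefixed framing, so that commands and replies may contain arbitrary bytes, including `\n`. A minimal sketch in plain Lua 5.1 (not kresd code) using a 4-byte big-endian length prefix:

```lua
-- Sketch: encode a message with a 4-byte big-endian length prefix.
local function frame(msg)
	local n = #msg
	return string.char(
		math.floor(n / 2^24) % 256,
		math.floor(n / 2^16) % 256,
		math.floor(n / 2^8) % 256,
		n % 256) .. msg
end

-- Decode one message from a buffer; returns the message and the
-- remaining bytes, or nil when more data is needed.
local function unframe(buf)
	if #buf < 4 then return nil end
	local b1, b2, b3, b4 = buf:byte(1, 4)
	local n = ((b1 * 256 + b2) * 256 + b3) * 256 + b4
	if #buf < 4 + n then return nil end
	return buf:sub(5, 4 + n), buf:sub(5 + n)
end
```

This only addresses framing; representing errors and structured replies would still need a serialization format on top (protobuf, JSON, ...).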
I don't know what the best approach to address this is, but I think it is worth exploring existing solutions (protobuf? something else?) before inventing our own serialization format and control protocol.

**[ci: add respdiff tests for XDP](https://gitlab.nic.cz/knot/knot-resolver/-/issues/635)** (2020-10-30, Tomas Krizek)

XDP should be tested on real interfaces, which requires some changes to the respdiff configuration (using a real interface instead of loopback, root privileges, ...). This might be easier to achieve once we simplify our testing infrastructure. (https://gitlab.nic.cz/knot/knot-resolver-ansible/-/issues/3)

**[cache: sharing across containers requires special options](https://gitlab.nic.cz/knot/knot-resolver/-/issues/637)** (2022-11-18, Petr Špaček)

Version: 5.1.3 originally, but any version really
Error
=====
```
[cache] LMDB error: Resource temporarily unavailable
[cache] LMDB error: Resource temporarily unavailable
[cache] incompatible cache database detected, purging
[cache] reading version returned: -11
[system] interactive mode
[00000.00][plan] plan '.' type 'NS' uid [65536.00]
[65536.00][iter] '.' type 'NS' new uid was assigned .01, parent uid .00
[cache] LMDB error: Resource temporarily unavailable
[65536.01][cach] => exact hit error: -11 Resource temporarily unavailable
```
Reproducer
==========
Attempt to share cache across two or more Docker containers:
```
docker run -P -w /tmp/kresd -v /tmp/shared:/tmp/kresd -ti cznic/knot-resolver:v5.1.3
```
Minimal reproducer without Docker: Run two processes using command
```
unshare -Up --fork kresd
```
Root cause
==========
This is caused by LMDB's dependency on unique PID numbers (for reader slots?). This assumption does not hold for Docker containers (because of their use of PID namespaces). LMDB upstream [does not seem to care](https://lists.openldap.org/hyperkitty/list/openldap-technical@openldap.org/thread/TL4XPCHRRGBV6SWBQIARC6E5XZNJ4SDX/).
Workaround
==========
Disable PID namespace, i.e. run Docker containers using `docker run --pid=host`, which prevents non-unique PIDs among containers.
An alternative is to run additional containers in the same PID namespace as the first container using `docker run --pid=container:name_of_the_first_container`, but the disadvantage is that exiting the first container will terminate all the others as well, i.e. this prevents dynamic instance restarts.

**[\[discussion\] cache backend redesign](https://gitlab.nic.cz/knot/knot-resolver/-/issues/638)** (2020-12-04, Petr Špaček)

Let's discuss the problems we have with the current LMDB-based cache backend. We need to analyze whether these are fixable or whether we need to redesign the cache backend.
Problems with LMDB itself
- Database overfill leads to an irrecoverable state where the DB practically becomes read-only, and the only ways forward are to either enlarge the database or delete it. Together with the inability to detect whether committing a transaction will lead to this state, this prevents us from reliably keeping the cache at a constant size, leading to race conditions in overflow handling etc. (#605)
- Transactions have [undefined limits](https://lists.openldap.org/hyperkitty/list/openldap-technical@openldap.org/message/VI7K5NWV46J6DACITXVS7X2SM3HZIXVB/) on them, forcing us to [jump through hoops](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1042/diffs?commit_id=c651fbf24017f26435b86e69e9ce73c7f5976b97).
- LMDB depends on unique PID values - this assumption does not hold when sharing cache across containers (#637).
Other cache-related problems: #602, #604

**[server selection: collect and use TCP connection information](https://gitlab.nic.cz/knot/knot-resolver/-/issues/647)** (2021-11-08, Štěpán Balážik)

The following discussion from !1030 should be addressed:
- [ ] @pspacek started a [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1030#note_184337): (+3 comments)
> I'm either blind or it is not used anywhere. Can you point me to the place where it gets used, please?
`tcp_waiting` and `tcp_connected`, the respective function, and its calls have been commented out (in 6ef74faf922c5962401747b5aa3a9e01e92e50ff) until we use this information in the server selection process.
This will ultimately be related to #629, for example.

**[server selection: implement a way to do asynchronous NS name resolution](https://gitlab.nic.cz/knot/knot-resolver/-/issues/648)** (2020-11-30, Štěpán Balážik)

The following discussion from !1030 should be addressed:
- [ ] @pspacek started a [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1030#note_184348): (+6 comments)
> I do not see this flag in use. Is it intentional?

**[dnstap module spawns a thread](https://gitlab.nic.cz/knot/knot-resolver/-/issues/651)** (2020-12-07, Vladimír Čunát)

That's not consistent with the kresd architecture, though I can't think of a particular reason why it might cause a problem. Note that this thread will get spawned for each kresd process, so it might be a bit wasteful.
We might prefer to rewrite the module to utilize the shared libuv loop (to know when the socket is ready to receive more data), but maybe the [fstrm tools](https://farsightsec.github.io/fstrm/overview.html) don't provide good support for that. If we drop the thread, this library might not be worth depending on anymore (as the framing is trivial).

**[insufficient caching of some uncommon wildcards](https://gitlab.nic.cz/knot/knot-resolver/-/issues/654)** (2020-12-11, Vladimír Čunát)

In an NSEC3-signed zone, if a wildcard is nested deeper than directly under the apex, positive expansions from it may not be cached properly (but they succeed). Testing example: `foo.t.cunat.cz AAAA`.
The issue is that aggressive cache thinks it needs to additionally provide an NSEC3 record matching the closest (provable) encloser, but that's not true in this case (because the wildcard record proves encloser's existence). This NSEC3 record must exist but resolver probably hasn't obtained it, so synthesis from cache (usually) fails.
Fortunately, the typical wildcard usage I see is directly under the apex (`*.example.com`). We may also be "saved" by queries for non-existing types on the same name (e.g. AAAA), as those need this NSEC3 record, and thus the only downside would be its "unneeded" addition into the corresponding positive wildcard expansions.