Knot Resolver issues (https://gitlab.nic.cz/knot/knot-resolver/-/issues)

failure to start the manager
============================
https://gitlab.nic.cz/knot/knot-resolver/-/issues/769 (2023-07-04, Vaclav Sraier)

Happens just about once in a while in our CI, nothing regular. Don't know how to reproduce. Rerunning the job always fixes the issue.
```
Oct 10 11:56:00 runner-114-project-147-concurrent-1-799966 env[5260]: 428ms:INFO:knot_resolver_manager.server:Loading initial configuration from /etc/knot-resolver/config.yml
Oct 10 11:56:00 runner-114-project-147-concurrent-1-799966 env[5260]: 437ms:INFO:knot_resolver_manager.server:Validating initial configuration...
Oct 10 11:56:00 runner-114-project-147-concurrent-1-799966 env[5260]: 439ms:WARNING:knot_resolver_manager.log:Changing logging level to 'INFO'
Oct 10 11:56:00 runner-114-project-147-concurrent-1-799966 env[5260]: 440ms:INFO:knot_resolver_manager.kresd_controller:Starting service manager auto-selection...
Oct 10 11:56:00 runner-114-project-147-concurrent-1-799966 env[5260]: 440ms:INFO:knot_resolver_manager.kresd_controller:Available subprocess controllers are ('supervisord',)
Oct 10 11:56:00 runner-114-project-147-concurrent-1-799966 env[5260]: 440ms:INFO:knot_resolver_manager.kresd_controller:Selected controller 'supervisord'
Oct 10 11:56:00 runner-114-project-147-concurrent-1-799966 env[5260]: 441ms:INFO:knot_resolver_manager.kresd_controller.supervisord:Supervisord is already running, we will just update its config...
Oct 10 11:56:05 runner-114-project-147-concurrent-1-799966 systemd[1]: knot-resolver.service: Main process exited, code=exited, status=1/FAILURE
Oct 10 11:56:05 runner-114-project-147-concurrent-1-799966 systemd[1]: knot-resolver.service: Failed with result 'exit-code'.
Oct 10 11:56:05 runner-114-project-147-concurrent-1-799966 systemd[1]: Failed to start Knot Resolver Manager.
```

Expired gpg key in OBS
======================
https://gitlab.nic.cz/knot/knot-resolver/-/issues/747 (2022-09-03, Vladimír Čunát <vladimir.cunat@nic.cz>)

.deb users of our [upstream repo](https://www.knot-resolver.cz/download/) can't update anymore (Debian, Ubuntu).
Message examples:
```
# apt update
[...]
W: GPG error: http://download.opensuse.org/repositories/home:/CZ-NIC:/knot-resolver-latest/Debian_11 InRelease: The following signatures were invalid: EXPKEYSIG 74062DB36A1F4009 home:CZ-NIC OBS Project <home:CZ-NIC@build.opensuse.org>
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
```
The key:
```
pub rsa2048 2018-02-15 [SC] [expired: 2022-06-21]
45737F9C8BC3F3ED2791818274062DB36A1F4009
uid [ expired] home:CZ-NIC OBS Project <home:CZ-NIC@build.opensuse.org>
```

daemon/http: returning status 400 to handshake with dnscrypt-proxy
==================================================================
https://gitlab.nic.cz/knot/knot-resolver/-/issues/746 (2022-06-23, Oto Šťáva)

When [`dnscrypt-proxy`](https://github.com/DNSCrypt/dnscrypt-proxy) attempts a handshake with `kresd`, status code 400 is returned.
On Gitter, user `jlongua` reported getting this log message:
```
Jun 16 13:41:55 draco.plan9-ns2.com dnscrypt-proxy[5775]: [2022-06-16 13:41:55] [ERROR] Webserver returned code 400
```
When I try it locally with a simple Docker image of dnscrypt-proxy, I get this:
```
dnscrypt-proxy-dnsdist-1 | [2022-06-17 06:58:33] [NOTICE] dnscrypt-proxy 2.1.1
dnscrypt-proxy-dnsdist-1 | [2022-06-17 06:58:33] [NOTICE] Network connectivity detected
dnscrypt-proxy-dnsdist-1 | [2022-06-17 06:58:33] [NOTICE] Now listening to 0.0.0.0:53 [UDP]
dnscrypt-proxy-dnsdist-1 | [2022-06-17 06:58:33] [NOTICE] Now listening to 0.0.0.0:53 [TCP]
dnscrypt-proxy-dnsdist-1 | [2022-06-17 06:58:33] [NOTICE] Source [relays] loaded
dnscrypt-proxy-dnsdist-1 | [2022-06-17 06:58:33] [NOTICE] Source [public-resolvers] loaded
dnscrypt-proxy-dnsdist-1 | [2022-06-17 06:58:33] [NOTICE] Firefox workaround initialized
dnscrypt-proxy-dnsdist-1 | [2022-06-17 06:58:33] [ERROR] 400 Bad Request
dnscrypt-proxy-dnsdist-1 | [2022-06-17 06:58:33] [NOTICE] dnscrypt-proxy is waiting for at least one server to be reachable
```

DNS64: PTR synthesis yields SERVFAIL for some cache contents
============================================================
https://gitlab.nic.cz/knot/knot-resolver/-/issues/727 (2022-03-21, Ondřej Caletka)

Summary
-------
When the cache is cold, PTR synthesis by the DNS64 module works well. When the cache gets populated by querying with DNS64 synthesis off, PTR synthesis stops working and SERVFAIL is returned instead.
Steps to reproduce
------------------
```
# cat /etc/knot-resolver/kresd.conf
-- SPDX-License-Identifier: CC0-1.0
-- vim:syntax=lua:set ts=4 sw=4:
-- Refer to manual: https://knot-resolver.readthedocs.org/en/stable/
-- Network interface configuration
net.listen('127.0.0.1', 53, { kind = 'dns' })
net.listen('127.0.0.1', 853, { kind = 'tls' })
--net.listen('127.0.0.1', 443, { kind = 'doh2' })
net.listen('::1', 53, { kind = 'dns', freebind = true })
net.listen('::1', 853, { kind = 'tls', freebind = true })
--net.listen('::1', 443, { kind = 'doh2' })
-- Load useful modules
modules = {
'hints > iterate', -- Allow loading /etc/hosts or custom root hints
'stats', -- Track internal statistics
'predict', -- Prefetch expiring/frequent records
'dns64',
'view',
}
-- Disable DNS64 for IPv4
view:addr('0.0.0.0/0', policy.all(policy.FLAGS('DNS64_DISABLE')))
-- Cache size
cache.size = 100 * MB
```
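Note that the `dns64` module above is loaded without parameters, so it uses the default well-known prefix; spelled out explicitly (a sketch following the kresd dns64 module documentation), that would be:

```lua
-- equivalent to loading 'dns64' with no options: the well-known NAT64 prefix
dns64.config('64:ff9b::')
```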
First query over IPv6 works as expected:
```
# kdig @::1 -x 64:ff9b::101:101 +noall +answer
1.0.1.0.1.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.b.9.f.f.4.6.0.0.ip6.arpa. 60 IN CNAME 1.1.1.1.in-addr.arpa.
1.1.1.1.in-addr.arpa. 1265 IN PTR one.one.one.one.
```
Query over IPv4, where DNS64 is disabled, also works properly with `NXDOMAIN`:
```
# kdig @127.0.0.1 -x 64:ff9b::101:101
;; ->>HEADER<<- opcode: QUERY; status: NXDOMAIN; id: 41713
;; Flags: qr rd ra; QUERY: 1; ANSWER: 0; AUTHORITY: 1; ADDITIONAL: 0
;; QUESTION SECTION:
;; 1.0.1.0.1.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.b.9.f.f.4.6.0.0.ip6.arpa. IN PTR
;; AUTHORITY SECTION:
ip6.arpa. 3600 IN SOA b.ip6-servers.arpa. nstld.iana.org. 2021111921 1800 900 604800 3600
```
After this query, PTR synthesis does not work anymore and yields `SERVFAIL`:
```
# kdig @::1 -x 64:ff9b::101:101
;; ->>HEADER<<- opcode: QUERY; status: SERVFAIL; id: 25807
;; Flags: qr rd ra; QUERY: 1; ANSWER: 0; AUTHORITY: 0; ADDITIONAL: 0
;; QUESTION SECTION:
;; 1.0.1.0.1.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.b.9.f.f.4.6.0.0.ip6.arpa. IN PTR
```
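For reference, wiping the cache at runtime (e.g. over the control socket) uses the standard kresd call:

```lua
cache.clear()  -- drop the whole cache contents
```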
Clearing the cache restores correct behavior for a while.

manager: race condition with watchdog while starting workers
============================================================
https://gitlab.nic.cz/knot/knot-resolver/-/issues/723 (2022-03-04, Vaclav Sraier)

How to reproduce:
1. disable worker count limit
2. set worker count to 1000
3. run the manager
4. watch the world burn (like really, these steps will trash your system)
5. manager crashes (on my machine after starting 183 instances of kresd)
I am guessing that the same behavior could be reproduced by hammering the manager with worker-count change requests, but I haven't tested that.

server selection: practical issues with some Microsoft domains
==============================================================
https://gitlab.nic.cz/knot/knot-resolver/-/issues/722 (2022-03-14, Vladimír Čunát)

With some Microsoft domains (outlook.com, office.com, office365.com) a small part of the nameservers is non-responsive, but kresd (sometimes) does not gracefully fall back to the other servers.
The same issue can surely happen with someone else's names as well, but this set seems by far the most commonly encountered in practice. It might be related to the NS server names being served by the same partially broken set.

trust_anchors.set_insecure may miss some names
==============================================
https://gitlab.nic.cz/knot/knot-resolver/-/issues/673 (2021-05-21, Vladimír Čunát)

If the same authoritative server IPs serve names both above and below the configured negative trust anchors, the downgrade to insecure may not happen in some cases.

TLS_FORWARD can get stuck on broken addresses (v5.3.0)
======================================================
https://gitlab.nic.cz/knot/knot-resolver/-/issues/671 (2021-03-24, Vladimír Čunát)

With normal TLS-forwarding config, e.g.:
```lua
policy.add(policy.all(policy.TLS_FORWARD({
{ '8.8.8.8', hostname='dns.google' },
{ '8.8.4.4', hostname='dns.google' },
{ '2001:4860:4860::8888', hostname='dns.google' },
{ '2001:4860:4860::8844', hostname='dns.google' },
})))
```
but part of addresses disabled, e.g.
```bash
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
```
some queries get stuck in a very long "loop" of attempting connections to the non-working IPs, even though half of them work. Example log snippet: [tls_forward.log](/uploads/a5716360f9a3e6879160ff0766e37add/tls_forward.log)
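For comparison, kresd itself can also be told not to use IPv6, which would sidestep the disabled addresses entirely (standard kresd option, useful as a diagnostic here):

```lua
-- stop contacting upstream servers over IPv6 altogether
net.ipv6 = false
```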
_!1143 doesn't trigger here; it wasn't meant for forwarding and individual addresses might be broken for other reasons anyway._ (milestone: 5.3.1)

policy: actions don't populate OPT when they should
===================================================
https://gitlab.nic.cz/knot/knot-resolver/-/issues/657 (2021-11-23, Vladimír Čunát)

[RFC 6891](https://tools.ietf.org/html/rfc6891#section-6.1.1):
> If an OPT record is present in a received request, compliant responders MUST include an OPT record in their respective responses.
Original report: https://forum.turris.cz/t/kresd-response-missing-opt-pseudo-rr/14437
It causes practical issues with systemd-resolved (see the report).

FORMERR does not trigger EDNS fallback
======================================
https://gitlab.nic.cz/knot/knot-resolver/-/issues/645 (2021-10-11, Petr Špaček)

Version: 5.2.0
Domain `spam.molax.co.kr.` qtype `A` does not work with EDNS. Auth servers correctly return FORMERR but kresd 5.2.0 does not fall back to non-EDNS and SERVFAILs the request from the client.
[spam.molax.co.kr.A.log](/uploads/edde70e988fcf6ab810e693802c8896d/spam.molax.co.kr.A.log)
We need to:
- fix kresd
- investigate why the test https://gitlab.nic.cz/knot/deckard/-/blob/master/sets/resolver/iter_formerr.rpl did not detect this, and fix it!

map() command mangles return values
===================================
https://gitlab.nic.cz/knot/knot-resolver/-/issues/622 (2020-11-24, Petr Špaček)

Affected version: 5.1.3 and most likely all older versions as well
Problem
=======
In short, the current `map()` command is broken for return-value arrays with length != 1, and also for certain data types.
Example with three return values. This is wildly inconsistent even between consecutive executions on the same `kresd -f4` instances:
```
> map('1, 2, 3')
[1] => 1
[2] => 3
[3] => 1, 2, 3
[4] => 3
> map('1, 2, 3')
[1] => 1
[2] => 3
[3] => 1, 2, 3
[4] => 3
> map('1, 2, 3')
[1] => 1
[2] => 3
[3] => 1, 2, 3
[4] => 3
> map('1, 2, 3')
[1] => 1
[2] => 3
[3] => 1, 2, 3
[4] => 3
> map('1, 2, 3')
[1] => 1
[2] => 3
[3] => 1, 2, 3
[4] => 3
> map('1, 2, 3')
[1] => 1
[2] => 3
[3] => 3
[4] => 3
> map('1, 2, 3')
[1] => 1
[2] => 3
[3] => 3
[4] => 3
> map('1, 2, 3')
[1] => 1
[2] => 3
[3] => 3
[4] => 3
> map('1, 2, 3')
[1] => 1
[2] => 3
[3] => 3
[4] => 3
```
Example with nil - it gets turned into empty tables _but not for all four instances_:
```
> map('nil')
[1] => {
}
[2] => {
}
[3] => {
}
```
Example with errors - errors are incorrectly turned into strings so the map() caller cannot distinguish them from real string return values:
```
> map('error("test")')
[1] => test
[2] => test
[3] => test
[4] => test
```
In short it is utterly broken.
Proposal
========
`map()` is mostly broken and also undocumented, so we can change it more or less freely. Here is how I would do it:
- Follower instances will use `pcall` to determine whether the single call inside map() succeeded or not, and send the results back to the leader.
- map() running on the leader instance will receive the full outputs from `pcall` on the followers and check whether all calls succeeded. If not, `map()` will throw an error.
- The leader will check whether the number of return values from each follower is exactly one; if not, map() on the leader will throw an error.
- If no error occurred, map() will return two tables:
  - a first table with results from each instance in the format `{return value from first instance, return value from second instance, return value from third instance, ...}` - the order of return values is not defined and cannot be relied on
  - a second table with the corresponding names of control sockets (? maybe we can skip this and add it later?)
- Usage if the map() caller wants to receive multiple return values: `map('{expression to be evaluated}')` will produce a table of tables `{{return values from first instance}, {return values from second instance}, ...}, {name of first instance, name of second instance, ...}`
- Usage if the map() caller wants to also receive errors:
  - the caller can either wrap the intended expression in pcall: `map('{pcall(expression to be evaluated)}')`
  - or we can define a table format which will be used by errors thrown from map()
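The follower-side step above could be sketched like this (plain Lua 5.1/LuaJIT; `evaluate_for_leader` and the result-table shape are illustrative, not an existing kresd API):

```lua
-- Hypothetical follower-side evaluation for the proposed map():
-- run the expression under pcall and return a success flag plus packed
-- results, so the leader can tell real errors from string return values.
local function evaluate_for_leader(expr)
    local chunk, err = loadstring('return ' .. expr)  -- Lua 5.1 / LuaJIT
    if not chunk then
        return { ok = false, err = err }
    end
    -- pack() captures pcall's success flag and *all* return values,
    -- counting them with select('#', ...) so embedded nils survive
    local function pack(ok, ...)
        return { ok = ok, n = select('#', ...), ... }
    end
    return pack(pcall(chunk))
end
```

The leader would then reject the whole batch if any follower's `ok` is false, or if any `n` differs from one.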
This approach also allows handling a variable number of return values, and `nil` values, in a safe manner. Let's define two helper functions:
```lua
function tab_pack(...)
    local tab = {...}
    tab.n = select('#', ...)
    return tab
end

function tab_unpack(tab)
    return unpack(tab, 1, tab.n)
end
```
Now the map() caller can do: `map('tab_pack(nil,2,nil)')` and receive results like:
```
> map('tab_pack(nil,2,nil)')
[1] => {
[2] => 2
[n] => 3
}
[2] => {
[2] => 2
[n] => 3
}
```
The auto-generated table field `n` tells the caller that the original result contained 3 values, so the caller can iterate over [1], [2], [3] and safely find that the first and third return values were nil. (milestone: 5.2.0)

failure: forwarding + signatures + CNAME to sibling
===================================================
https://gitlab.nic.cz/knot/knot-resolver/-/issues/614 (2020-10-23, Vladimír Čunát)

Maybe there are some additional conditions needed to trigger the problem.
Example log:
<details><pre>
[44705.16][resl] => id: '14071' querying: '193.17.47.1#00053' score: 21 zone cut: 'dns-oarc.net.' qname: 'Rate.dnS-OaRc.NET.' qtype: 'A' proto: 'udp'
[44705.16][iter] <= answer received:
;; ->>HEADER<<- opcode: QUERY; status: NOERROR; id: 14071
;; Flags: qr rd ra cd QUERY: 1; ANSWER: 4; AUTHORITY: 0; ADDITIONAL: 1
;; EDNS PSEUDOSECTION:
;; Version: 0; flags: do; UDP size: 4096 B; ext-rcode: Unused
;; QUESTION SECTION
rate.dns-oarc.net. A
;; ANSWER SECTION
rate.dns-oarc.net. 120 CNAME dev.dns-oarc.net.
rate.dns-oarc.net. 120 RRSIG CNAME 8 3 120 20201028101102 20200928091102 12093 dns-oarc.net. Rttx3oznjSpRgoKZPNi1vYY3fP8KDq+y82p9cs+upwXHdpx/sB8pS4LvHlME53fJ5wqf8vmcAY07kJU4z4PKFam6Qj0a7De2zX3+JEFhZhUGV4UGxgbgX1lZTfS/bRPBmO/vxwCproIiIgWxgZLOvJb4kyCtscD8QWWxn2Tijvk=
dev.dns-oarc.net. 300 A 77.72.225.245
dev.dns-oarc.net. 300 RRSIG A 8 3 300 20201026021701 20200928021701 25608 dev.dns-oarc.net. faVj+wxFlFAjg2PQm9Dj4zjA2Ad57cXKv62YFVZ1x0zAUaBXnN+95YEfHwuBanu9P7REBKyaL47NmRo9BPOIzwxTbD610lUEWjx9OkMzmZwJOr5EddraB523q2BLToqJX344NBPNywtMUuYKPQQlBKzFN+Av/gstXGUCfyccU2E=
;; ADDITIONAL SECTION
[44705.16][iter] <= rcode: NOERROR
[44705.16][vldr] >< bogus signatures: dev.dns-oarc.net. A (3 matching RRSIGs, 0 expired, 0 not yet valid, 3 invalid signer, 0 invalid label count, 0 invalid key, 0 invalid crypto, 0 invalid NSEC)
[44705.16][vldr] >< cut changed (new signer), needs revalidation
[44705.16][resl] <= server: '185.43.135.1' rtt: 135 ms
[44705.16][plan] plan 'rate.dns-oarc.net.' type 'DS' uid [44705.17]
[44705.17][iter] 'rate.dns-oarc.net.' type 'DS' new uid was assigned .18, parent uid .16
[44705.18][cach] => skipping exact packet: rank 025 (min. 030), new TTL -501
[44705.18][cach] => trying zone: ., NSEC, hash 0
[44705.18][cach] => NSEC sname: range search miss (!covers)
[44705.18][cach] => skipping zone: ., NSEC, hash 0;new TTL -123456789, ret -2
[ ][nsre] score 21 for 185.43.135.1#00053; cached RTT: 120
[ ][nsre] score 21 for 193.17.47.1#00053; cached RTT: 11
[44705.18][resl] => id: '45855' querying: '185.43.135.1#00053' score: 21 zone cut: 'dev.dns-oarc.net.' qname: 'rAtE.dNs-oArC.nET.' qtype: 'DS' proto: 'udp'
[44705.18][resl] => id: '45855' querying: '193.17.47.1#00053' score: 21 zone cut: 'dev.dns-oarc.net.' qname: 'rAtE.dNs-oArC.nET.' qtype: 'DS' proto: 'udp'
[44705.18][iter] <= answer received:
;; ->>HEADER<<- opcode: QUERY; status: NOERROR; id: 45855
;; Flags: qr rd ra cd QUERY: 1; ANSWER: 4; AUTHORITY: 0; ADDITIONAL: 1
;; EDNS PSEUDOSECTION:
;; Version: 0; flags: do; UDP size: 4096 B; ext-rcode: Unused
;; QUESTION SECTION
rate.dns-oarc.net. DS
;; ANSWER SECTION
rate.dns-oarc.net. 120 CNAME dev.dns-oarc.net.
rate.dns-oarc.net. 120 RRSIG CNAME 8 3 120 20201028101102 20200928091102 12093 dns-oarc.net. Rttx3oznjSpRgoKZPNi1vYY3fP8KDq+y82p9cs+upwXHdpx/sB8pS4LvHlME53fJ5wqf8vmcAY07kJU4z4PKFam6Qj0a7De2zX3+JEFhZhUGV4UGxgbgX1lZTfS/bRPBmO/vxwCproIiIgWxgZLOvJb4kyCtscD8QWWxn2Tijvk=
dev.dns-oarc.net. 300 DS 65191 8 2 7202E542EC7177402116BE5EABB2366EAA1EEE8196A03934B2870A11DF174102
dev.dns-oarc.net. 300 RRSIG DS 8 3 300 20201028101102 20200928091102 12093 dns-oarc.net. jwebuweys5NQ5g/QYmltaYRxs7s1pvWXtvS/yH3JounDaBklEseOvfumUv9VdxbxT7a/U/rwWDg3CNtXlOO+4W8WpQp94Tz0wSAwT/UcA5hh38hPXnqJ/nH3gvRmEUi8iEMwl2c615IZ9YX4zNXh07oNfPeWTgzjKu6nHNF8bKA=
;; ADDITIONAL SECTION
[44705.18][iter] <= rcode: NOERROR
[44705.18][vldr] <= no useful RR in authoritative answer
[44705.18][cach] => stashed packet: rank 025, TTL 120, DS rate.dns-oarc.net. (472 B)
[44705.00][resl] request failed, answering with empty SERVFAIL
[44705.18][resl] finished: 8, queries: 5, mempool: 49200 B
</pre></details>

garbage collector does not handle cache overflow
================================================
https://gitlab.nic.cz/knot/knot-resolver/-/issues/597 (2020-09-10, Petr Špaček)

GC exits if it fails to delete some records from the cache _because the cache is overfull_. In other words, the GC exits when the resolver needs it most.
```
Usage: 95.80%
Cache analyzed in 0.01 secs, 7764 records, limit category is 54.
854 records to be deleted using 0.15 MBytes of temporary memory, 0 records skipped due to memory limit.
Warning: skipping deletion because of error (not enough space provided)
Warning: skipping deletion because of error (not enough space provided)
Warning: skipping deletion because of error (not enough space provided)
Warning: skipping deletion because of error (not enough space provided)
Warning: skipping deletion because of error (not enough space provided)
Warning: skipping deletion because of error (not enough space provided)
Error: transaction failed (not enough space provided)
Deleted 200 records (0 already gone) types TYPE29154 TYPE59527 TYPE44187 TYPE36047 TYPE45693 TYPE21714 TYPE50204 TYPE51332 TYPE44444 TYPE29269 TYPE3130 TYPE46908 TYPE42383 TYPE45769 TYPE44996 TYPE52982 TYPE3964 TYPE27428 TYPE48741 TYPE41612 TYPE6865 TYPE27634 TYPE11442 TYPE59684 TYPE7524 TYPE35361 TYPE54929 TYPE35156 TYPE41909 TYPE47722 TYPE39628 TYPE41358 TYPE34831 TYPE14502 TYPE44987 TYPE31613 TYPE54239 TYPE48620 TYPE38013 TYPE18764 TYPE14055 TYPE44216 TYPE59777 TYPE44082 TYPE54725 TYPE22458 TYPE24155 TYPE28478 TYPE54779 TYPE36220 TYPE51461 TYPE25536 TYPE52075 TYPE34900 TYPE56540 TYPE20833 TYPE53735 TYPE36127 TYPE1196 TYPE41734 TYPE52917 TYPE47446 TYPE2667 TYPE46684 TYPE53393 TYPE51980 TYPE48422 TYPE8606 TYPE60457 TYPE17578 TYPE42928 TYPE2558 TYPE50788 TYPE56583 TYPE53266 TYPE7786 TYPE23574 TYPE42124 TYPE30050 TYPE48447 TYPE2899 TYPE56431 TYPE2027 TYPE10014 TYPE30069 TYPE10495 TYPE8553 TYPE27614 TYPE30114 TYPE1749 TYPE30103 TYPE39247 TYPE52317 TYPE12223 TYPE15458 TYPE29030 TYPE14759 TYPE18893 TYPE54959 TYPE23394 TYPE34964 TYPE50367 TYPE49032 TYPE3520 TYPE47228 TYPE45727 TYPE53351 TYPE10951 TYPE48483 TYPE55134 TYPE15948 TYPE11818 TYPE41057 TYPE27592 TYPE39439 TYPE44299 TYPE20265 TYPE37406 TYPE49793 TYPE37190 TYPE34190 TYPE52182 TYPE51724 TYPE37423 TYPE40471 TYPE33384 TYPE26887 TYPE24555 TYPE9772 TYPE40292 TYPE1881 TYPE28454 TYPE16893 TYPE12828 TYPE35800 TYPE48615 TYPE23795 TYPE17868 TYPE52707 TYPE29353 TYPE15356 TYPE17423 TYPE43681 TYPE29103 TYPE6193 TYPE10192 TYPE1533 TYPE52733 TYPE25324 TYPE18661 TYPE8028 TYPE15130 TYPE1390 TYPE20894 TYPE46928 TYPE20775 TYPE34785 TYPE35033 TYPE25865 TYPE29467 TYPE35999 TYPE19689 TYPE36486 TYPE26812 TYPE47835 TYPE12544 TYPE59178 TYPE40217 TYPE48360 TYPE6498 TYPE27278 TYPE20611 TYPE33166 TYPE26178 TYPE29607 TYPE41861 TYPE46627 TYPE46523 TYPE39303 TYPE31506 TYPE29641 TYPE57230 TYPE45646 TYPE18200 TYPE15259 TYPE9469 TYPE38975 TYPE35866 TYPE56761 TYPE7671 TYPE2275 TYPE8828 TYPE38107 TYPE40836 TYPE28134 TYPE46671 TYPE32355 TYPE38288 
TYPE42284 TYPE26703
It took 0.00 secs, 1 transactions (not enough space provided)
Error (not enough space provided)
```
Version: kresd 5.1.2

garbage collector does not handle on-line cache resize
======================================================
https://gitlab.nic.cz/knot/knot-resolver/-/issues/595 (2020-11-24, Petr Špaček)

Cache resize underneath a running GC leads to a fatal error:
```
Error starting DB transaction (MDB_MAP_RESIZED: Database contents grew beyond environment mapsize).
Error (MDB_MAP_RESIZED: Database contents grew beyond environment mapsize)
```

[graphite] Prevents kresd to start if graphite server is not available
======================================================================
https://gitlab.nic.cz/knot/knot-resolver/-/issues/585 (2020-06-29, Fre)

kresd fails to start up when the graphite server is not available:
`kresd[2281]: [system] error while loading config: /usr/lib/knot-resolver/kres_modules/graphite.lua:102: socket:connect: No route to host (workdir '/var/lib/knot-resolver')`
This should be a warning, not a critical error preventing startup of kresd.
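As a user-side workaround until that changes, the module configuration can be wrapped in `pcall` in the config file, so a connection failure is logged instead of aborting startup; a sketch (the graphite host/port values are illustrative placeholders):

```lua
modules.load('graphite')
-- swallow connection errors instead of letting them kill kresd's startup
local ok, err = pcall(graphite.config, {
    host = '192.0.2.10',  -- hypothetical graphite server
    port = 2003,
})
if not ok then
    print('[graphite] disabled: ' .. tostring(err))
end
```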
This is Knot Resolver 5.1.1 running on Debian Buster.

fix locking around cache preallocation
======================================
https://gitlab.nic.cz/knot/knot-resolver/-/issues/582 (2020-06-25, Petr Špaček)

Caveats in [LMDB docs](http://www.lmdb.tech/doc/index.html) suggest that our cache preallocation might break LMDB locking:
> Do not have open an LMDB database twice in the same process at the same time. Not even from a plain open() call - close()ing it breaks flock() advisory locking.

[tls_client] session resumption does not work properly
======================================================
https://gitlab.nic.cz/knot/knot-resolver/-/issues/542 (2022-02-18, Vladimír Čunát)

It doesn't break the handshake, but resumption never happens. Maybe it's broken just on TLS 1.3, or some similar condition. I tried this with quad-{1,8,9} and it looks the same in the verbose log.
We do receive resumption tickets from upstream
```
[gnutls] (4) HSK[0x1644310]: NEW SESSION TICKET (4) was received. Length 246[496], frag offset 0, frag length: 246, sequence: 0
```
but we never send it on re-connection (no idea why so far)
```
[gnutls] (4) EXT[0x1644310]: Preparing extension (Session Ticket/35) for 'client hello'
[gnutls] (4) EXT[0x1644310]: Sending extension Session Ticket/35 (0 bytes)
```
and thus the session can't resume.
```
[tls_client] TLS session has not resumed
```
_Tested with latest releases: Knot Resolver 4.3.0 and GnuTLS 3.6.11.1._

Segfault on 4.2.1 on armv7
==========================
https://gitlab.nic.cz/knot/knot-resolver/-/issues/514 (2019-10-04, Jonathan Coetzee)

I updated my personal knot-based container from 4.2.0 to 4.2.1 and it now fails to start up. Running `kresd` manually from inside the container shows that it's segfaulting on startup ([core](/uploads/daae8f89a99a18c67ce5e2ab92e9b518/core)). Turning on verbose logging doesn't seem to reveal anything. This is on my RPi 4 running up-to-date Raspbian Buster (the image runs without error on my MacBook Pro). If you have an armv7 environment to test with, I've pushed two tags `jonocoetzee/private-dns:v4.2.0` and `jonocoetzee/private-dns:v4.2.1` ([repo](https://gitlab.com/jonocoetzee/private-dns)). Let me know if there's any other info you need. (milestone: 4.2.2)

prefill: deadlock issue
=======================
https://gitlab.nic.cz/knot/knot-resolver/-/issues/512 (2019-12-20, Vladimír Čunát)

The https download of the (root) zone is blocking and it uses OS DNS. That combination will dead-lock e.g. in the case when kresd is the (only) resolver for the OS on which it runs. _Originally discovered in #506._
Plan – implementation details: I expect we'd better convert the fetch to use `lua-http` library, as it's asynchronous and has a relatively [convenient API for this](https://daurnimator.github.io/lua-http/0.3/#retrieving-a-document).

Resolver stops working and returns SERVFAIL until restarted
===========================================================
https://gitlab.nic.cz/knot/knot-resolver/-/issues/493 (2022-02-04, ValdikSS)

Some time after normal operation, knot-resolver stops resolving any domains and returns SERVFAIL on all DNS queries.
I have the following configuration:
```
# cat /etc/knot-resolver/kresd.conf
user('knot-resolver','knot-resolver')
cache.size = 300 * MB
net.ipv6 = false
modules = {
'hints > iterate', -- Load /etc/hosts and allow custom root hints
'stats', -- Track internal statistics
'predict', -- Prefetch expiring/frequent records
}
-- minimum TTL = 2 minutes
cache.min_ttl(120)
dofile("/etc/knot-resolver/knot-aliases-alt.conf")
policy.add(
policy.suffix(
policy.STUB(
{'127.0.0.4'}
),
policy.todnames(blocked_hosts)
)
)
# cat /etc/knot-resolver/knot-aliases-alt.conf
blocked_hosts = {
"0000a-fast-proxy.de.",
"002cc20.icu.",
"007ingyenletoltes.hu.",
"007rc.biz.",
"007slots.com.",
"00seeds.com.",
"010119azino777.com.",
"010119azino777.ru.",
…
"zzzes.ru.",
"zzztorrent.net.",
"zzzz1.live.",
"zzzz2.live.",
}
```
Both normal recursive queries and queries which should be forwarded to 127.0.0.4 (from blocked_hosts) fail to work.
I've just enabled verbose logging to monitor the issue, but the log seems to buffer a lot. I see new information in journald's journalctl in spikes, a large batch of log lines every 30 seconds or so. I'm not sure if this is some sort of buffering that is to be expected, or if it shows some kind of locking problem.
It even triggered a watchdog once:
```
systemd[1]: kresd@1.service: Watchdog timeout (limit 10s)!
systemd[1]: kresd@1.service: Killing process 23036 (kresd) with signal SIGABRT.
systemd[1]: kresd@1.service: Main process exited, code=killed, status=6/ABRT
systemd[1]: kresd@1.service: Unit entered failed state.
systemd[1]: kresd@1.service: Failed with result 'watchdog'.
systemd[1]: kresd@1.service: Service hold-off time over, scheduling restart.
```
The issue happens irregularly. It used to work fine for weeks, but in the last 3 days it happened 3 times. Sometimes it takes dozens of hours, sometimes only several minutes. I did not update the configuration, and updated the software only after the second time. It happens on 4.1.0.
Right now I'm running verbose logging and will update this issue when it happens again.
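For the record, verbose logging in kresd of this era can be toggled at runtime from the control socket, without a restart:

```lua
verbose(true)   -- enable full verbose logging
verbose(false)  -- back to normal
```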