Hey, I can confirm that the issue is still present in Turris OS 7.0 on Turris Mox running kernel 5.15.151.
In my setup, Knot Resolver occasionally gives a wrong answer to a DoH client querying the A record of an IPv4-only name while the DNS64 module is active. It happens only when all of these conditions are met:
- the queried name has an A but no AAAA record
- the dns64 module is loaded
- the client uses doh2 and asks concurrently for the A and AAAA record (the queries can come via completely independent HTTP/2 sessions though)
If all these conditions are fulfilled, Knot Resolver sometimes answers the A query with the referral received from the parent zone of the queried name. I was able to reproduce the issue on these names:
github.com
duckduckgo.com
liberec.cz
ipv4only.arpa
I reproduced the issue on Knot Resolver 5.7.1 installed from the EPEL repository on Fedora 39 with this configuration:
(cache size is set to lowest possible value to increase the probability of hitting the issue)
modules = {'dns64'}
net.listen('::1', 443, { kind = 'doh2' })
cache.size = 32768
user('knot-resolver','knot-resolver')
I use this script to keep repeating queries using the doh utility until A records are missing from the response. That happens after at most ca. 15 minutes:
#!/bin/bash
# Name to query; defaults to github.com
domain=${1-github.com}
# Enable debugging
socat - unix-connect:/run/knot-resolver/control/1 <<EOF
policy.add(policy.suffix(policy.DEBUG_ALWAYS, policy.todnames({'$domain'})))
EOF
# Repeat the query until the response has no A records
while true
do
    date
    out="$(doh -k "$domain" 'https://[::1]/dns-query')"
    echo "$out"
    grep -q "^A:" <<<"$out" || break
    sleep 1
done
date
I was not able to reproduce the issue using the kdig tool, possibly because it sends queries sequentially and my shell was not fast enough to spawn a second instance of kdig before the first one finished.
I am attaching a packet capture together with a TLS key log, as well as kresd syslogs of the issue demonstrated when querying ipv4only.arpa. The issue is clearly visible with the Wireshark filter set to: lower(dns.qry.name) == "ipv4only.arpa"
Packets 31 - 188 show correct behavior, packets 256 - 422 show the issue, particularly packet 359, which contains the referral from packet 354 instead of the answer from packet 417:
No. Protocol Info
31 DoH Standard query 0x0000 A ipv4only.arpa
36 DoH Standard query 0x0000 AAAA ipv4only.arpa
65 DNS Standard query 0x53fb AAAA ipV4oNlY.arpa OPT
66 DNS Standard query 0x9e3d A iPv4onLY.ARPA OPT
67 DNS Standard query response 0x53fb AAAA ipV4oNlY.arpa NS a.iana-servers.net NS b.iana-servers.net NS c.iana-servers.net NS ns.icann.org NSEC iris.arpa RRSIG OPT
69 DNS Standard query response 0x9e3d A iPv4onLY.ARPA NS a.iana-servers.net NS b.iana-servers.net NS c.iana-servers.net NS ns.icann.org NSEC iris.arpa RRSIG OPT
108 DNS Standard query 0xb804 AAAA iPV4oNLY.aRpa OPT
124 DNS Standard query response 0xb804 AAAA iPV4oNLY.aRpa SOA sns.dns.icann.org OPT
142 DNS Standard query 0x4de9 A Ipv4onlY.aRPa OPT
144 DNS Standard query response 0x4de9 A Ipv4onlY.aRPa NS a.iana-servers.net NS b.iana-servers.net NS c.iana-servers.net NS ns.icann.org NSEC iris.arpa RRSIG OPT
174 DNS Standard query 0xc998 A IpV4oNly.ARPa OPT
179 DNS Standard query response 0xc998 A IpV4oNly.ARPa A 192.0.0.170 A 192.0.0.171 NS a.iana-servers.net NS b.iana-servers.net NS c.iana-servers.net NS ns.icann.org OPT
184 DoH Standard query response 0x0000 AAAA ipv4only.arpa AAAA 64:ff9b::c000:aa AAAA 64:ff9b::c000:ab SOA sns.dns.icann.org
188 DoH Standard query response 0x0000 A ipv4only.arpa A 192.0.0.170 A 192.0.0.171
256 DoH Standard query 0x0000 A ipv4only.arpa
261 DoH Standard query 0x0000 AAAA ipv4only.arpa
287 DNS Standard query 0x23b6 AAAA ipV4oNlY.arPa OPT
288 DNS Standard query 0x8503 A IpV4ONLy.ARpA OPT
292 DNS Standard query response 0x23b6 AAAA ipV4oNlY.arPa NS a.iana-servers.net NS b.iana-servers.net NS c.iana-servers.net NS ns.icann.org NSEC iris.arpa RRSIG OPT
293 DNS Standard query response 0x8503 A IpV4ONLy.ARpA NS b.iana-servers.net NS ns.icann.org NS a.iana-servers.net NS c.iana-servers.net NSEC iris.arpa RRSIG OPT
328 DNS Standard query 0x4ab4 AAAA iPV4ONLy.arpa OPT
330 DNS Standard query response 0x4ab4 AAAA iPV4ONLy.arpa SOA sns.dns.icann.org OPT
350 DNS Standard query 0x17fa A ipv4ONLY.ARpa OPT
354 DNS Standard query response 0x17fa A ipv4ONLY.ARpa NS a.iana-servers.net NS b.iana-servers.net NS c.iana-servers.net NS ns.icann.org NSEC iris.arpa RRSIG OPT
359 DoH Standard query response 0x0000 A ipv4only.arpa NS ns.icann.org NS a.iana-servers.net NS b.iana-servers.net NS c.iana-servers.net
407 DNS Standard query 0x0f40 A IPv4oNly.arpA OPT
417 DNS Standard query response 0x0f40 A IPv4oNly.arpA A 192.0.0.170 A 192.0.0.171 NS a.iana-servers.net NS b.iana-servers.net NS c.iana-servers.net NS ns.icann.org OPT
422 DoH Standard query response 0x0000 AAAA ipv4only.arpa AAAA 64:ff9b::c000:aa AAAA 64:ff9b::c000:ab SOA sns.dns.icann.org
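The faulty response in packet 359 can be told apart from the correct one in packet 188 mechanically: it answers an A query with NS records only. A minimal Python sketch of such a check, working on the Info strings from the listing above, might look like this. The function names and the list of record types are assumptions made for illustration, and the check is only a heuristic for responses sent to the DoH client.

```python
import re

# Record type mnemonics that appear in the packet listing (assumed sufficient here)
RR_TYPES = {"A", "AAAA", "NS", "SOA", "NSEC", "RRSIG", "OPT", "CNAME", "PTR"}

def parse_response(info):
    """Split a Wireshark-style Info string into (qtype, qname, [record types])."""
    m = re.match(r"Standard query response 0x[0-9a-f]+ (\S+) (\S+)(.*)", info)
    if m is None:
        raise ValueError("not a DNS response line: " + info)
    qtype, qname, rest = m.groups()
    # Keep only tokens that are record type mnemonics; record data is skipped
    types = [tok for tok in rest.split() if tok in RR_TYPES]
    return qtype, qname, types

def looks_like_referral(info):
    """True if the response carries NS records but none of the queried type."""
    qtype, _, types = parse_response(info)
    return qtype not in types and "NS" in types

bad = ("Standard query response 0x0000 A ipv4only.arpa NS ns.icann.org "
       "NS a.iana-servers.net NS b.iana-servers.net NS c.iana-servers.net")
good = "Standard query response 0x0000 A ipv4only.arpa A 192.0.0.170 A 192.0.0.171"
print(looks_like_referral(bad), looks_like_referral(good))
```

Responses from upstream servers such as packet 354 are legitimate referrals and would be flagged too, so the check only makes sense for the DoH lines.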
One of my network interfaces is named mtg-dns. If I put it into the declarative config like this:
network:
  listen:
    - interface: mtg-dns
    - interface: mtg-dns
      kind: dot
    - interface: mtg-dns
      kind: doh2
kresd fails to start, logging this error:
kresd0[7036]: [system] error while loading config: kresd0.conf:137: attempt to perform arithmetic on field 'mtg' (a nil value) (workdir '/run/knot-resolver')
I am running kresd 6.0.4 from Fedora COPR on Oracle Linux 9.
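One plausible reading of the error, not verified against the generated kresd0.conf, is that the interface name ends up in the Lua config as a plain field access such as net.mtg-dns, which Lua parses as the subtraction (net.mtg) - dns. The Python sketch below shows how a config generator could avoid that by using bracket indexing whenever the name is not a valid Lua identifier. The helper name lua_field is hypothetical and is not part of Knot Resolver.

```python
import re

# Reserved words of Lua 5.x; they cannot be used as field names after a dot
LUA_KEYWORDS = {
    "and", "break", "do", "else", "elseif", "end", "false", "for",
    "function", "goto", "if", "in", "local", "nil", "not", "or",
    "repeat", "return", "then", "true", "until", "while",
}
LUA_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def lua_field(table, key):
    """Return a Lua expression that indexes `table` with the string `key`."""
    if LUA_IDENT.fullmatch(key) and key not in LUA_KEYWORDS:
        return f"{table}.{key}"
    # Anything else (hyphens, dots, keywords) needs a quoted string index
    escaped = key.replace("\\", "\\\\").replace("'", "\\'")
    return f"{table}['{escaped}']"

print(lua_field("net", "eth0"))
print(lua_field("net", "mtg-dns"))
```

With this approach, eth0 stays as net.eth0 while mtg-dns becomes net['mtg-dns'], which Lua treats as a single table lookup.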
On Turris 1.X running 6.4.2 with rainbow version 0.1.4-1, when brightness is set to less than or equal to 223 (on the precise scale), weird things start to happen:
# rainbow brightness -p 223
# uci show rainbow
rainbow.all=led
rainbow.all.brightness='064'
# rainbow reset -n
# uci show rainbow
rainbow.all=led
rainbow.all.brightness='004'
# rainbow reset -n
# uci show rainbow
rainbow.all=led
rainbow.all.brightness='000'
After the second restart, the LEDs are completely dark. When brightness is set to more than 223, it gets stored as 255 and this value survives any number of restarts. But that is just too bright.
In kresd version 5.6.0 with the DNS64 module enabled, DNS64 does not kick in when resolving tudelft.account.worldcat.org:
$ dig tudelft.account.worldcat.org a
; <<>> DiG 9.16.37 <<>> tudelft.account.worldcat.org a
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52064
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;tudelft.account.worldcat.org. IN A
;; ANSWER SECTION:
tudelft.account.worldcat.org. 2459 IN CNAME emea.account.worldcat.org.
emea.account.worldcat.org. 28 IN A 193.240.184.98
$ dig tudelft.account.worldcat.org aaaa
; <<>> DiG 9.16.37 <<>> tudelft.account.worldcat.org aaaa
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63626
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; EDE: 4 (Forged Answer): (BHD4: DNS64 synthesis)
;; QUESTION SECTION:
;tudelft.account.worldcat.org. IN AAAA
;; AUTHORITY SECTION:
worldcat.org. 653 IN SOA michelle.ns.cloudflare.com. dns.cloudflare.com. 2312413286 10000 2400 604800 1800
The zone in question is hosted by Cloudflare and has DNSSEC enabled so my wild guess is that it has something to do with the way Cloudflare signs negative answers.
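For reference, the AAAA record that synthesis would be expected to produce can be computed by hand. This is a small Python sketch, assuming the well-known prefix 64:ff9b::/96 is the one configured (the dig output above does not show the prefix). The function name is made up for illustration.

```python
import ipaddress

def dns64_synthesize(ipv4, prefix="64:ff9b::/96"):
    """Embed an IPv4 address into the low 32 bits of a /96 DNS64 prefix."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(net.network_address) | int(v4))

# The A record returned for emea.account.worldcat.org above
print(dns64_synthesize("193.240.184.98"))  # 64:ff9b::c1f0:b862
```

So a working synthesis would be expected to return the CNAME followed by an AAAA record for 64:ff9b::c1f0:b862 instead of the bare SOA.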
Great find, although this is more related to #355. The broadcast leak between VLANs (this issue) seems to be fixed in kernel version 6.1, but is still present in the current version 5.15.
Possibly related to #355.
In my setup, the Turris Mox has all interfaces in a bridge, and VLAN filtering is used to set up different roles for different ports. In this output of the bridge vlan command, port lan1 has only one allowed VLAN, number 60, which is also its PVID. This port is used as an access port for computers.
# bridge vlan
port vlan-id
eth0 20 PVID Egress Untagged
21
60
lan1 60 PVID Egress Untagged
lan2 62 PVID Egress Untagged
lan3 20 PVID Egress Untagged
21
60
lan4 22 PVID Egress Untagged
br-guest_turris 1 PVID Egress Untagged
br-lan 20
21
22
60
62
wlan1 22 PVID Egress Untagged
wlan0 22 PVID Egress Untagged
wlan0-1 62 PVID Egress Untagged
Despite this setup, I can see some tagged frames with VLAN tag 20 or 22 leaking into the lan1 port. Only multicast traffic leaks like this. This is especially harmful for Windows, since that OS mostly ignores the 802.1q header and receives data from all VLANs, breaking IPv6 configuration every time an RA is sent into one of the other VLANs.
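To make the expected behavior explicit, the bridge vlan table can be turned into a per-port membership map, from which it follows which ports may ever carry a given VLAN. A minimal Python sketch, assuming the plain-text layout shown above (continuation lines carry only a VLAN ID) and port names that are never purely numeric; the function names are illustrative:

```python
def parse_bridge_vlan(text):
    """Return {port: {vid: set_of_flags}} parsed from `bridge vlan` output."""
    table = {}
    port = None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] == "port":
            continue  # blank line or header
        if not tokens[0].isdigit():
            port = tokens.pop(0)  # a new port starts on this line
            table.setdefault(port, {})
        if port is None or not tokens:
            continue
        flags = set()
        if "PVID" in tokens[1:]:
            flags.add("pvid")
        if "Untagged" in tokens[1:]:
            flags.add("untagged")
        table[port][int(tokens[0])] = flags
    return table

def ports_for_vlan(table, vid):
    """Ports that are members of VLAN `vid`; frames should go nowhere else."""
    return sorted(p for p, vlans in table.items() if vid in vlans)

# Excerpt of the output above
SAMPLE = """\
port vlan-id
eth0 20 PVID Egress Untagged
21
60
lan1 60 PVID Egress Untagged
lan3 20 PVID Egress Untagged
21
60
br-lan 20
21
22
60
62
"""

table = parse_bridge_vlan(SAMPLE)
print(ports_for_vlan(table, 20))  # ['br-lan', 'eth0', 'lan3']
```

Since lan1 is not in the list for VLAN 20 or 22, any frame with those tags seen on lan1 contradicts the configured filtering.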
Hey! I am suffering from this issue as well. It happens only on the 1 GB version of MOX A; the 512 MB versions seem to be rock stable. Only TOS 6.0 is affected. The kernel panics happen every few hours, regardless of which modules are attached. Here is a list of what I have already tried:
The last crash today is this:
[22242.897755] SError Interrupt on CPU0, code 0xbf000001 -- SError
[22242.897780] CPU: 0 PID: 4528 Comm: foris-controlle Not tainted 5.15.74 #0
[22242.897791] Hardware name: CZ.NIC Turris Mox Board (DT)
[22242.897795] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[22242.897804] pc : el0_da+0x18/0x50
[22242.897822] lr : el0t_64_sync_handler+0x60/0xb0
[22242.897831] sp : ffffffc00b833e80
[22242.897834] x29: ffffffc00b833e80 x28: ffffff8000240000 x27: 0000000000000000
[22242.897850] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
[22242.897860] x23: 0000000080000000 x22: 0000007fa9cad740 x21: 00000000ffffffff
[22242.897872] x20: ffffffc03714d000 x19: ffffffc00b833eb0 x18: 0000000000000000
[22242.897883] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[22242.897894] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[22242.897904] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
[22242.897914] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[22242.897924] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffffffc00896f3ac
[22242.897934] x2 : ffffffc00896f3c4 x1 : 0000000092000018 x0 : 0000007fa9f3b510
[22242.897949] Kernel panic - not syncing: Asynchronous SError Interrupt
[22242.897953] SMP: stopping secondary CPUs
[22242.897963] Kernel Offset: disabled
[22242.897965] CPU features: 0x00000000,20000802
[22242.897971] Memory Limit: none
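The code in the first line of the panic can be split into its fields. The Python sketch below assumes the printed value is the raw AArch64 exception syndrome register (ESR_ELx) with the usual layout: EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and, for an SError, bit 24 of the ISS marking an implementation-defined syndrome.

```python
def decode_esr(esr):
    """Split a 32-bit ESR_ELx value into its architectural fields."""
    return {
        "ec": (esr >> 26) & 0x3F,    # exception class
        "il": (esr >> 25) & 0x1,     # instruction length bit
        "iss": esr & 0x1FFFFFF,      # instruction specific syndrome
        "ids": (esr >> 24) & 0x1,    # SError only: implementation-defined syndrome
    }

fields = decode_esr(0xBF000001)
print({k: hex(v) for k, v in fields.items()})
# {'ec': '0x2f', 'il': '0x1', 'iss': '0x1000001', 'ids': '0x1'}
```

EC 0x2f is the SError class, which matches the text printed by the kernel. With IDS set, the remaining syndrome bits are implementation defined, so the code alone does not identify the source of the error.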
Hey! I discovered that the same problem appears in Omnia running HBL too. In this setup:
root@omnia:~# bridge vlan show
port vlan-id
lan0 21 PVID Egress Untagged
lan1 60 PVID Egress Untagged
lan2 60 PVID Egress Untagged
lan3 60
lan4 20 PVID Egress Untagged
21
60
br-guest-turris 1 PVID Egress Untagged
br-lan 20
21
60
wlan1 20 PVID Egress Untagged
wlan0 20 PVID Egress Untagged
wlan0-1 60 PVID Egress Untagged
wlan0-2 21 PVID Egress Untagged
Packets received on wlan0-1 and forwarded to lan2 or lan1 get tagged with VLAN ID 60, while traffic between the router itself (the br-lan.60 interface) and ports lan2 or lan1 goes untagged. Also, ingress traffic from lan1 or lan2 to wlan0-1 flows without any tags.
With the current HBL kernel 5.15.59, there is no access to the Real Time Clock:
# ls /dev/rtc*
ls: /dev/rtc*: No such file or directory
# ls /sys/bus/i2c/devices/
#
Possibly related: the time jump during boot probably triggers some bug in lighttpd that breaks its TLS capability. Restarting lighttpd works around this issue.
# cat /var/log/lighttpd/error.log
2022-08-04 18:27:45: (../src/server.c.1588) server started (lighttpd/1.4.65)
2022-08-04 18:27:54: (../src/server.c.267) warning: clock jumped 3570 secs
2022-08-04 18:27:54: (../src/server.c.275) attempting graceful restart in < ~5 seconds, else hard restart
2022-08-04 19:27:24: (../src/server.c.1019) [note] graceful shutdown started
2022-08-04 19:27:24: (../src/server.c.2097) server stopped by UID = 0 PID = 4432
2022-08-04 19:27:24: (../src/server.c.1588) server started (lighttpd/1.4.65)
2022-08-04 19:27:28: (../src/connections.c.716) unexpected TLS ClientHello on clear port (2001:db8::fe5)
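The size of the jump reported by lighttpd is consistent with the timestamps in the log itself, as this short Python check shows. It only uses the two timestamps on either side of the jump; the function name is illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def jump_seconds(before, after):
    """Difference between two log timestamps in whole seconds."""
    delta = datetime.strptime(after, FMT) - datetime.strptime(before, FMT)
    return int(delta.total_seconds())

# Last line before and first line after the jump
print(jump_seconds("2022-08-04 18:27:54", "2022-08-04 19:27:24"))  # 3570
```

The result, 3570 seconds, is just under one hour and equals the value in the "clock jumped" warning.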
From a computer:
# curl https://turris.example/ -v
* Trying 2001:db8::1:443...
* Connected to … port 443 (#0)
* ALPN: offers h2
* ALPN: offers http/1.1
* CAfile: /etc/ssl/certs/ca-certificates.crt
* CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* error:1408F10B:SSL routines:ssl3_get_record:wrong version number
* Closing connection 0
curl: (35) error:1408F10B:SSL routines:ssl3_get_record:wrong version number
Hey, thanks! I can confirm that LEDs now do work. I worked around missing rainbow by a small shell script like this:
#!/bin/sh
cd /sys/class/leds/
for n in rgb\:*/multi_intensity;
do
# default color
echo 0 255 187 > "$n"
done
echo 255 255 255 > rgb\:wan/multi_intensity
echo 255 255 255 > rgb\:indicator-1/multi_intensity
echo 255 17 0 > rgb\:power/multi_intensity
echo 255 100 0 > rgb\:wlan-1/multi_intensity
echo 100 255 0 > rgb\:wlan-2/multi_intensity
echo 255 0 34 > rgb\:wlan-3/multi_intensity
But it seems that in order to actually change the color, one has to overwrite the trigger. So I set up software-based triggers for all LEDs in /etc/config/system like this:
config led 'led_lan4'
option name 'lan4'
option sysfs 'rgb:lan-4'
option trigger 'netdev'
option mode 'link tx rx'
option dev 'lan4'
In my setup, the Mox acts both as a router for some VLANs and as a switch for others. I put all ports, including eth0, into one bridge and use VLAN filtering.
config device
option name 'br-lan'
option type 'bridge'
option bridge_empty '1'
option force_link '1'
list ports 'lan1'
list ports 'lan2'
list ports 'lan3'
list ports 'lan4'
list ports 'eth0'
config bridge-vlan
option device 'br-lan'
option vlan '22'
list ports 'lan3'
list ports 'lan4'
config bridge-vlan
option device 'br-lan'
option vlan '60'
list ports 'eth0:t'
list ports 'lan1'
config bridge-vlan
option device 'br-lan'
option vlan '62'
list ports 'lan2'
config bridge-vlan
option device 'br-lan'
option vlan '20'
list ports 'eth0'
config bridge-vlan
option device 'br-lan'
option vlan '21'
list ports 'eth0:t'
config interface 'wan'
option device 'br-lan.20'
…
config interface 'lan'
option device 'br-lan.22'
…
# bridge vlan show
port vlan-id
eth0 20 PVID Egress Untagged
21
60
lan1 60 PVID Egress Untagged
lan2 62 PVID Egress Untagged
lan3 22 PVID Egress Untagged
lan4 22 PVID Egress Untagged
br-lan 20
21
22
60
62
wlan0 22 PVID Egress Untagged
wlan0-1 62 PVID Egress Untagged
# uname -r
5.4.203
After the upgrade to kernel 5.15.50 in HBL, I am having trouble with VLAN 60, which just traverses the bridge, tagged on eth0 and untagged on the lan1 port. The ingress traffic on lan1 gets tagged and delivered to eth0, but in the egress direction, the tag is not stripped when leaving the lan1 interface despite bridge vlan show showing Egress Untagged. There are no problems with other VLANs that don't traverse between Ethernet ports. Also, if I use a different LAN port in place of eth0 for the uplink, the problem is gone.
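The relation between the bridge-vlan sections and the flags that bridge vlan show should report can be written down as a small Python sketch. It assumes the port suffix convention where t means tagged, u means untagged, * means PVID, and a port listed without any suffix is untagged and PVID. That last assumption matches the output shown above, but is not taken from the netifd sources. The function names are illustrative.

```python
def expected_flags(port_spec):
    """Map a UCI port entry such as 'eth0:t' to (port, expected flags)."""
    name, _, suffix = port_spec.partition(":")
    if not suffix:
        return name, {"pvid", "untagged"}  # assumed default without suffix
    flags = set()
    if "u" in suffix:
        flags.add("untagged")
    if "*" in suffix:
        flags.add("pvid")
    return name, flags  # 't' alone leaves the set empty, meaning tagged

def expected_table(bridge_vlans):
    """Build {port: {vid: flags}} from {vid: [port specs]}."""
    table = {}
    for vid, specs in bridge_vlans.items():
        for spec in specs:
            port, flags = expected_flags(spec)
            table.setdefault(port, {})[vid] = flags
    return table

# The bridge-vlan sections from the config above
CONFIG = {
    22: ["lan3", "lan4"],
    60: ["eth0:t", "lan1"],
    62: ["lan2"],
    20: ["eth0"],
    21: ["eth0:t"],
}

table = expected_table(CONFIG)
print(table["lan1"][60], table["eth0"][60])
```

For VLAN 60 this gives untagged egress with PVID on lan1 and tagged egress on eth0, which is what bridge vlan show reports, so the frames leaving lan1 with a tag contradict the configured state.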
After upgrade of the kernel from version 5.4.203-1-da0ddeb89bb0e25d2a575f62263f6300 to version 5.15.50-1-bbde583666f0d21706f8da40fd4a8532, LEDs stopped working on Omnia. It seems that the driver is missing:
# ls /sys/class/leds/
ath10k-phy0 ath9k-phy1 mmc0::
# rainbow all enable
Failed to open file: No such file or directory
Reverting to previous kernel fixes the issue.
When the cache is cold, PTR synthesis of the DNS64 module works well. Once the cache gets populated by querying with DNS64 synthesis turned off, PTR synthesis stops working and SERVFAIL is returned instead.
# cat /etc/knot-resolver/kresd.conf
-- SPDX-License-Identifier: CC0-1.0
-- vim:syntax=lua:set ts=4 sw=4:
-- Refer to manual: https://knot-resolver.readthedocs.org/en/stable/
-- Network interface configuration
net.listen('127.0.0.1', 53, { kind = 'dns' })
net.listen('127.0.0.1', 853, { kind = 'tls' })
--net.listen('127.0.0.1', 443, { kind = 'doh2' })
net.listen('::1', 53, { kind = 'dns', freebind = true })
net.listen('::1', 853, { kind = 'tls', freebind = true })
--net.listen('::1', 443, { kind = 'doh2' })
-- Load useful modules
modules = {
'hints > iterate', -- Allow loading /etc/hosts or custom root hints
'stats', -- Track internal statistics
'predict', -- Prefetch expiring/frequent records
'dns64',
'view',
}
-- Disable DNS64 for IPv4
view:addr('0.0.0.0/0', policy.all(policy.FLAGS('DNS64_DISABLE')))
-- Cache size
cache.size = 100 * MB
First query over IPv6 works as expected:
# kdig @::1 -x 64:ff9b::101:101 +noall +answer
1.0.1.0.1.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.b.9.f.f.4.6.0.0.ip6.arpa. 60 IN CNAME 1.1.1.1.in-addr.arpa.
1.1.1.1.in-addr.arpa. 1265 IN PTR one.one.one.one.
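The CNAME in this answer follows directly from the address: the low 32 bits of the queried IPv6 address are the embedded IPv4 address, and the CNAME target is its reverse name. A short Python sketch of that mapping, assuming the /96 prefix 64:ff9b::/96 used in the query (the function name is illustrative):

```python
import ipaddress

def dns64_ptr_names(ipv6, prefix="64:ff9b::/96"):
    """Return (ip6.arpa owner name, in-addr.arpa CNAME target) for a DNS64 address."""
    addr = ipaddress.IPv6Address(ipv6)
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 96 or addr not in net:
        raise ValueError("address is not inside a /96 DNS64 prefix")
    # The embedded IPv4 address occupies the low 32 bits
    v4 = ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)
    return addr.reverse_pointer + ".", v4.reverse_pointer + "."

owner, target = dns64_ptr_names("64:ff9b::101:101")
print(owner)
print(target)  # 1.1.1.1.in-addr.arpa.
```

Both names match the owner and target of the synthesized CNAME in the kdig output.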
The query over IPv4, where DNS64 is disabled, also works properly and returns NXDOMAIN:
# kdig @127.0.0.1 -x 64:ff9b::101:101
;; ->>HEADER<<- opcode: QUERY; status: NXDOMAIN; id: 41713
;; Flags: qr rd ra; QUERY: 1; ANSWER: 0; AUTHORITY: 1; ADDITIONAL: 0
;; QUESTION SECTION:
;; 1.0.1.0.1.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.b.9.f.f.4.6.0.0.ip6.arpa. IN PTR
;; AUTHORITY SECTION:
ip6.arpa. 3600 IN SOA b.ip6-servers.arpa. nstld.iana.org. 2021111921 1800 900 604800 3600
After this query, PTR synthesis does not work anymore and yields SERVFAIL:
# kdig @::1 -x 64:ff9b::101:101
;; ->>HEADER<<- opcode: QUERY; status: SERVFAIL; id: 25807
;; Flags: qr rd ra; QUERY: 1; ANSWER: 0; AUTHORITY: 0; ADDITIONAL: 0
;; QUESTION SECTION:
;; 1.0.1.0.1.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.b.9.f.f.4.6.0.0.ip6.arpa. IN PTR
Clearing the cache restores correct behavior for a while.