When the remote server closes a connection without answering some of our queries, the corresponding requests are failed too aggressively (perhaps? TODO: details, etc.)
DNS clients SHOULD retry unanswered queries if the connection closes before receiving all outstanding responses.
On the other hand, servers SHOULD NOT close connections early without a reason specific to the case, so hopefully this won't happen that often in practice; FRITZ! seems a notable case. I'll keep copying the important points from that discussion here.
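To make the retry idea concrete, here is a minimal sketch, not kresd's actual logic. The names resolve_with_retry, send_batch, max_connections and one_answer_per_connection are hypothetical: send_batch stands in for "open one TCP connection, pipeline the queries, and return whatever answers arrived before the peer closed".

```python
from typing import Callable, Dict, Iterable, List


def resolve_with_retry(
    queries: Iterable[str],
    send_batch: Callable[[List[str]], Dict[str, str]],
    max_connections: int = 5,  # assumption: arbitrary cap for this sketch
) -> Dict[str, str]:
    """Pipeline queries and re-send the unanswered ones on a fresh connection."""
    pending = list(queries)
    answers: Dict[str, str] = {}
    for _ in range(max_connections):
        if not pending:
            break
        # One call models one connection; the peer may close it early.
        answers.update(send_batch(pending))
        # Keep only the queries that are still unanswered and retry those.
        pending = [q for q in pending if q not in answers]
    return answers


def one_answer_per_connection(batch: List[str]) -> Dict[str, str]:
    """Hypothetical stand-in for a server that answers one query, then closes."""
    return {batch[0]: "answer-for-" + batch[0]} if batch else {}
```

With such a server, three queries need three connections; with a cap of two connections, the third query stays unanswered instead of being retried forever.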
FRITZ!Box 7590 with FRITZ!OS 07.12 as router, DHCP server, and DNS resolver (default behavior)
Turris MOX 4.0.5 in mode (host) computer, DHCP client (default behavior)
DNSSEC disabled for simplicity
$ ssh root@turris.local
$ pkgupdate
The SSH console gives:
INFO:Target Turris OS: 4.0.5
line not found
line not found
line not found
ERROR:runtime: [string "requests"]:395: [string "utils"]:427: URI download failed: Couldn't resolve host 'repo.turris.cz'
FRITZ!OS accepts only one DNS transaction per TCP connection. FRITZ!OS closes the TCP connection after 15 seconds. AVM ‘confirmed’ the issue but classified support for several DNS transactions in one TCP connection as a feature request and rejected it.
I look closely at the first TCP flow, starting at packet numbered "8" in wireshark, and the behavior of FRITZ! (192.168.0.1) seems really weird to me. It answers the first two queries basically immediately (A and AAAA pair for the same name), and then there comes a looong period (~15 seconds) where it ACKs all the queries coming but never replies... and after that long time it closes the connection.
I mean, if they don't want to answer anything anymore, why keep the connection open? Even if we improved our side, there's "unavoidable" large delay due to them keeping it open, as we can have no idea that they won't reply.
Yepp, that is the problem. It looks like AVM expects the querier to close the TCP connection. But then, as you state, how should a querier (supporting multiple queries) know that? dig on the command line sits there for those 15 seconds and cannot do anything else.
Sorry, I don't really see what a standard-compliant client could do (at least without previous knowledge, probably hand-configured). https://tools.ietf.org/html/rfc7766 says that both sides should support query pipelining...
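For readers unfamiliar with what pipelining means on the wire: over TCP, every DNS message is preceded by a two-byte length field, and a pipelining client simply writes several framed queries before reading any reply. The sketch below only builds and splits such frames and opens no socket; the helper names build_query, frame and split_frames are made up for illustration.

```python
import struct
from typing import List, Tuple


def build_query(qid: int, name: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query (RD set, one question, class IN)."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def frame(message: bytes) -> bytes:
    """Prefix a DNS message with its two-byte length, as used over TCP."""
    return struct.pack("!H", len(message)) + message


def split_frames(stream: bytes) -> Tuple[List[bytes], bytes]:
    """Split a byte stream into complete messages plus the unparsed rest."""
    messages = []
    offset = 0
    while offset + 2 <= len(stream):
        (length,) = struct.unpack_from("!H", stream, offset)
        if offset + 2 + length > len(stream):
            break  # incomplete message, wait for more data
        messages.append(stream[offset + 2 : offset + 2 + length])
        offset += 2 + length
    return messages, stream[offset:]


# Two queries written back to back on one connection, as a pipelining client would.
pipelined = frame(build_query(1, "example.com")) + frame(build_query(2, "example.net"))
```

The 16-bit ID at the start of each message is what lets a client match replies to queries when a server answers pipelined queries out of order.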
From my point of view and knowledge, there is no alternative to never timing out. That is what dig does, so everyone should do it. Some day the other party, here AVM, will move.
AVM does not have a public bug tracker. Anyway, the ID is 3301056. They simply rejected it as a feature request. They speculated about a rate limiter within FRITZ!OS. However, I have no clue how that could kick in when even things like dig @fritz.box +short +tcp +keepopen example.com A example.net A fail.
@vcunat I am not sure I understand that part. A super-section states SHOULD (for Connection Reuse), a sub-section states MUST (for Query Pipelining). What is that about? If a super-section states SHOULD nothing below can be a MUST. Puh. I am really too stupid for RFCs …
My understanding is that the server MUST expect that clients will attempt pipelining and SHOULD support answering all requests that come that way – in particular, servers SHOULD NOT close the connection right after the first answer (that part is very old, actually). It's not clear what exactly this "expect" means about the behavior, so I suppose FRITZ! might still say they're compliant-ish; they apparently don't care about this feature.
From my point of view and knowledge, there is no alternative to never timing out. That is what dig does, so everyone should do it. Some day the other party, here AVM, will move.
What do you mean? I'm confused.
In any case, "never timeout" is not an option when it comes to anything on TCP. It would open a path to attacks like SYN flood, SlowLoris etc.
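A bounded wait can be sketched as follows. This is purely illustrative and not how kresd tracks its queries; the function name expired_queries and the 5-second default are assumptions. The idea is to record when each query was sent and to treat anything older than a deadline as unanswered, so it can be retried elsewhere instead of waiting indefinitely.

```python
from typing import Dict, List


def expired_queries(
    sent_at: Dict[int, float],
    now: float,
    deadline: float = 5.0,  # assumption: illustrative value, not a kresd default
) -> List[int]:
    """Return the IDs of queries that have waited longer than `deadline` seconds."""
    return sorted(qid for qid, t in sent_at.items() if now - t > deadline)
```

Taking the clock value as a parameter instead of reading it inside the function keeps the logic easy to test.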
Guys, I did my job by reporting and tracking down the cause as much as I could. Now, it is your job to find a solution. That is not my job, especially because I do not understand why a TCP client can be attacked. Anyway, just to re-emphasise the severity: Because of this, Turris OS is not able to update! This is not a special, rare configuration but the default of both Turris OS and FRITZ!OS. Finally, when kresd waits these 15 seconds – it does wait already – all subsequent queries for the same domain within those 15 seconds fail (they return a good-looking empty answer). Therefore, the update script fails so severely because it asks for repo.turris.cz several times in a row.
I am still tracking down why kresd is switching to TCP, and whether that is also something specific to the combination with FRITZ!OS. Until then, this issue affects a vast majority in Germany, as FRITZ!Box is the IAD here.
@traud: well, yes, you're in the unfortunate position between these two implementations whose positions (honestly) seem dead-locked in a state that this configuration just won't work.
My practical recommendation is... just avoid this setup, i.e. is there a reason why you need Turris to forward DNS to FRITZ!Box? In my opinion the most natural mode of operation is to not use forwarding at all – one click in Foris GUI – and there are also a few other easy options to forward to some public services (usually secured by TLS).
@traud Do you remember when it started happening? I doubt you are the only Turris OS user in Germany so I want to find out what changed to see if we can fix that.
One more thing: Please provide model number + firmware version so we have enough information when talking to AVM. Thank you for your time and patience!
Sure … you are telling the wrong one. I could not care less because I ‘solved’ this issue for me long ago. However, it took me a lot of effort to find the root cause (DNS) and to find a workaround. There are a lot of other approaches, like changing the default in Turris OS, adding more verbose logs, or even removing that CNAME/A to proxy.turris.cz and just using an A record. Again, it is not my job to dictate a solution. If you do not care either, that’s it. All I can do is offer my help if you need more testing or answers to subsequent questions.
model number + firmware
FRITZ!Box 7590 FRITZ!OS 07.12
FRITZ!Box 7490 FRITZ!OS 07.19-76429 (that is the current head, a beta version)
Both are mentioned in the ticket which AVM has. If you like, I can check other current branches of FRITZ!OS like 06.8x.
when it started
Day 1. In January, I got this Turris MOX used. It was never unpacked. It was still on Turris OS 4.0 – not sure if the Web interface printed the exact version back then. If you need the serial number to track which was the shipping version, I can provide that. Actually, on the forum, a lot of other users face the same symptom. Again, it is just the symptom; we do not know if it is the same cause. However, I confirmed with this user that his Turris MOX failed to update and that he is using a FRITZ!Box as well. Actually, he did not notice that his Turris MOX was not at the latest version and still created a report. Go figure!
By the way, the rescue modes use the resolver of FRITZ!Box as well. Although those modes should be affected too, at least rescue mode 6 is not trapped by this; actually, it never switches to TCP.
Thank you for keeping track! Did AVM tell you this? Interesting, neither did they notify me as original ticket creator nor via the release notes.
Anyway, yes, both issues are solved by that update (multiple TCP queries and the caching behavior in case of query-case randomization). Because of the latter, Turris OS is not going to run into DNS over TCP at all. Nevertheless, this is something to watch out for, because other DNS implementations might behave similarly. And, worse, many FRITZ!Box models are not receiving this major update. It is questionable whether AVM is going to backport this to their older FRITZ!OS branches 07.1x, 07.0x, 06.8x, 06.5x, and 06.3x, which still get security updates (full list). AVM is not that transparent and behaves similarly to Microsoft with its Windows feature upgrades.
Actually, even I am affected by this policy because my (normal) main FRITZ!Box (and my spare/backup box) will not get that major update. Consequently, my out-of-box experience with Turris OS would not change. Luckily, we know a zillion workarounds for Turris OS now, like disabling the query-case randomization, choosing DNS over TLS, or going for a new FRITZ!Box …