Timeouts with some PROXY v2 encapsulated queries in Knot 3.2.0
Hi,
I've been tracking down some differences in behavior between the original patch implementing PROXY v2 support and the PROXYv2 support in Knot DNS 3.2.0.
I have dnstap-replay set up to replay dnstap-logged traffic from Knot 3.0.x to Knot 3.2.0 with a PROXY v2 header. dnstap-replay can detect, count, and log timeouts, and when the "test" server (the target of the replayed, PROXY v2 encoded traffic) gets upgraded to Knot 3.2.0 I'm sometimes seeing a significant increase in the number of timeouts reported by dnstap-replay. (Even though it's an increase, it's from a very low base rate, so the absolute number of timeouts is still very, very low compared to all the "good" traffic being processed.)
Looking at some samples of the queries that are causing timeouts when sent to Knot 3.2.0 with a PROXY v2 header, they appear to be queries that are broken somehow and cause a FORMERR response from both Knot 3.0.x and 3.2.0 when sent without a PROXY v2 header.
I think the correct behavior should be to process a PROXY v2 encapsulated query as if it were a normal, non-encapsulated query, so long as the PROXY v2 header is decoded successfully. So there shouldn't be any difference in behavior with regard to whether a response is sent by Knot or not determined solely by whether the query was encapsulated or not.
Looking into the code, I think the problem is here:
It looks like the knot_pkt_t result of the call to knot_pkt_parse() on the inner DNS query payload isn't propagated to the caller when the inner knot_pkt_parse() fails, so the followup processing that would determine that the correct response is a FORMERR can't take place. Instead the original, outer knot_pkt_parse() failure when attempting to parse the whole PROXY v2 packet as a DNS message will get processed after control returns to the packet handler, e.g. here:
(Side note: I think the new knot_pkt_t object q might be leaking in the failure return on line 61? Side note 2: The comment block on lines 54-58 doesn't match the following block of code, since the offset was already calculated and checked at lines 34-37.)
I wrote a patch (see attached) that seems to fix this non-responsive behavior and results in the expected FORMERR responses to malformed DNS queries that are PROXY v2 encapsulated, but I'm not sure it's 100% correct.
Thanks!
0001-proxyv2_header_strip-Operate-correctly-on-malformed-.patch
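An added Python sketch (not Knot's code or the attached patch) of the behaviour argued for above: the PROXY v2 header is stripped first, and then the inner query's parse result alone decides whether a FORMERR goes back, so an encapsulated malformed query is answered exactly like an unencapsulated one. All packets below are constructed samples, not captured traffic.

```python
import struct

# PROXY protocol v2: 12-byte signature, then ver/cmd, fam/proto, address length.
PROXY_V2_SIG = b"\r\n\r\n\x00\r\nQUIT\n"

def strip_proxy_v2(wire):
    """Return the inner payload when `wire` carries a valid PROXY v2 header,
    the bytes unchanged when there is no header, or None if the header is bad."""
    if not wire.startswith(PROXY_V2_SIG):
        return wire
    if len(wire) < 16:
        return None
    ver_cmd, _fam_proto, addr_len = struct.unpack("!BBH", wire[12:16])
    if ver_cmd >> 4 != 2 or len(wire) < 16 + addr_len:
        return None
    return wire[16 + addr_len:]

def parse_question(msg):
    """Minimal DNS parse (header + one question). Returns the QNAME, or None
    when the message is malformed and the right answer is a FORMERR."""
    if len(msg) < 12 or struct.unpack("!H", msg[4:6])[0] != 1:
        return None
    off, labels = 12, []
    while True:
        if off >= len(msg):
            return None                     # name runs off the end of the message
        ln = msg[off]; off += 1
        if ln == 0:
            break
        if ln > 63 or off + ln > len(msg):
            return None                     # bad label length / truncated label
        labels.append(msg[off:off + ln].decode("ascii", "replace")); off += ln
    return ".".join(labels) + "." if off + 4 <= len(msg) else None  # QTYPE+QCLASS

def handle(wire):
    """Framing only decides whether a header must be removed; the inner parse
    alone decides between FORMERR and normal processing."""
    inner = strip_proxy_v2(wire)
    if inner is None:
        return "DROP"                       # undecodable PROXY header: nothing to answer
    return "FORMERR" if parse_question(inner) is None else "PROCESS"

# Constructed samples: a query for example.com (ID 0x1234, RD set), and a
# PROXY v2 header for TCP over IPv4 with 12 placeholder address bytes.
GOOD_QUERY = bytes.fromhex("123401000001000000000000"
                           "076578616d706c6503636f6d00" "00010001")
BAD_QUERY = GOOD_QUERY[:-3]                 # QTYPE/QCLASS cut off -> FORMERR
PROXY_HDR = PROXY_V2_SIG + bytes([0x21, 0x11]) + struct.pack("!H", 12) + bytes(12)
```

Feeding `BAD_QUERY` and `PROXY_HDR + BAD_QUERY` through `handle()` both yield "FORMERR", which is the parity between encapsulated and plain traffic that the replay comparison above expects.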