Very long kdig responses over TLS
I am testing kdig +tls
+ dnsdist on localhost, and the kdig is reporting response time ~40 ms, even when there is no load on the dnsdist server. Verbose log of the dnsdist says that getting answer took hunders of microseconds, so there is something strange. When I try dig +tls
(from bind9-dnsutils debian package) with the same query, I got Query time: 0 msec
.
If you look at strace from the dnsdist process, you see that after sending server hello, it received some 6 bytes and then waited 43 ms for the query.
[pid 1363633] 0.006441 write(120, "\26\3\3\0\233\2\0\0\227\3\3\350JArya\367V\0\21NL\7\272H\313\272\34\37\373\261"..., 2162) = 2162
[pid 1363633] 0.000105 read(120, 0x7fc5f8007e93, 5) = -1 EAGAIN (Resource temporarily unavailable)
[pid 1363633] 0.000087 epoll_ctl(108, EPOLL_CTL_ADD, 120, {EPOLLIN, {u32=120, u64=120}}) = 0
[pid 1363633] 0.000051 epoll_wait(108, [{EPOLLIN, {u32=120, u64=120}}], 1024, 500) = 1
[pid 1363633] 0.000796 read(120, "\24\3\3\0\1", 5) = 5
[pid 1363633] 0.000100 read(120, "\1", 1) = 1
[pid 1363633] 0.000097 read(120, 0x7fc5f8007e93, 5) = -1 EAGAIN (Resource temporarily unavailable)
[pid 1363633] 0.000111 epoll_wait(108, [{EPOLLIN, {u32=120, u64=120}}], 1024, 500) = 1
[pid 1363633] 0.042519 read(120, "\27\3\3\0E", 5) = 5
[pid 1363633] 0.000112 read(120, "/\334\200T\224\307\315\204u\322\0317o\367\376\200\263\366\277\226\205\207l\214\n\211\214\327\303\235 C"..., 69) = 69
This confirms tcp dump (also note that the 6 bytes is TLS Change Cipher Spec message):
6 0.008818 853 127.0.0.1 TLSv1.3 2228 Server Hello, Change Cipher Spec, Application Data, Application Data, Application Data, Application Data
7 0.008824 38723 127.0.0.1 TCP 66 38723 → 853 [ACK] Seq=374 Ack=2163 Win=65536 Len=0 TSval=2827511561 TSecr=2827511561
8 0.009691 38723 127.0.0.1 TLSv1.3 72 Change Cipher Spec
9 0.052419 853 127.0.0.1 TCP 66 853 → 38723 [ACK] Seq=2163 Ack=380 Win=65536 Len=0 TSval=2827511605 TSecr=2827511562
10 0.052433 38723 127.0.0.1 TLSv1.3 292 Application Data, Application Data
But strace output for kdig says, that it sent TLS Change Cipher Spec message, then immediately DNS query using sendmsg
. And they waited 42 ms for a DNS answer:
0.000118 sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\27\3\3\0E/\334\200T\224\307\315\204u\322\0317o\367\376\200\263\366\277\226\205\207l\214\n\211\214"..., iov_len=74}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 74
0.000155 sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\27\3\3\0\223\211\v\r8\216\235\323\25\217\370K>\363d{\370\364\337Y\r%Af\252\5\6\276"..., iov_len=152}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 152
0.000089 recvfrom(3, 0x556092194843, 5, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
0.000093 poll([{fd=3, events=POLLIN}], 1, 5000) = 1 ([{fd=3, revents=POLLIN}])
0.042279 recvfrom(3, "\27\3\3\0\352", 5, 0, NULL, NULL) = 5
0.000131 recvfrom(3, "sRr\3\317N\241\357\26\245\265 \217\22\233\227\254\356\1\314\212\326E \th\234p9\177\274\355"..., 234, 0, NULL, NULL) = 234
So it seems that the reason of the delay is that the Linux TCP stack for some reason delays ACK for the 6 bytes of the TLS Change Cipher Spec message.
dig +tls
does not have the same issue very likely because it uses write
syscall instead of sendmsg
, and so kernel concats TLS Change Cipeher Spec message and DNS query data into one packet. See tcp dump:
6 0.007710 853 127.0.0.1 TLSv1.3 2189 Server Hello, Change Cipher Spec, Application Data, Application Data, Application Data, Application Data
7 0.007716 39623 127.0.0.1 TCP 66 39623 → 853 [ACK] Seq=294 Ack=2124 Win=65536 Len=0 TSval=2830770825 TSecr=2830770825
8 0.008180 39623 127.0.0.1 TLSv1.3 146 Change Cipher Spec, Application Data
9 0.008320 853 127.0.0.1 TLSv1.3 321 Application Data