Verified Commit a39ebf18 authored by Tomas Krizek

docs: add proper documentation

parent 21189c77
1 merge request: !28 add documentation
Showing 892 additions and 158 deletions
@@ -2,188 +2,91 @@
Realistic DNS benchmarking tool which supports multiple transport protocols:
- **DNS-over-TLS (DoT)**
- **DNS-over-HTTPS (DoH)**
- UDP
- TCP
- DNS-over-TLS (DoT)
- DNS-over-HTTPS (DoH)
*DNS Shotgun is capable of simulating hundreds of thousands of clients.*
*DNS Shotgun is capable of simulating hundreds of thousands of DoT/DoH
clients.*
Every client establishes its own connection when communicating over TCP-based
protocol. This makes the tool uniquely suited for realistic benchmarking since
its traffic patterns are very similar to real clients.
Every client establishes its own connection(s) when communicating over
TCP-based protocol. This makes the tool uniquely suited for realistic DoT/DoH
benchmarks since its traffic patterns are very similar to real clients.
## Current status (2020-09-14)
DNS Shotgun exports a number of statistics, such as query latencies, number of
handshakes and connections, response rate, response codes etc. in JSON format.
The toolchain also provides scripts that can plot these into readable charts.
- fully supported UDP, TCP and DNS-over-TLS with
[dnsjit](https://github.com/DNS-OARC/dnsjit) 1.0.0
- fully supported DNS-over-HTTPS with development version of dnsjit
- traffic can be replayed only over IPv6
- user interface
- may be unstable
- only very basic UI available
- more complex scenarios are not supported yet
(e.g. simultaneously using multiple protocols)
- pellet.py is functional, but it is very slow and requires python-dpkt from
master
## Features
## Overview
- Supports DNS over UDP, TCP, TLS and HTTP/2
- Allows mixed-protocol simultaneous benchmark/testing
- Can bind to multiple source IP addresses
- Customizable client behaviour (idle time, TLS versions, HTTP method, ...)
- Replays captured queries over selected protocol(s) while keeping original timing
- Suitable for high-performance realistic benchmarks
- Tools to plot charts from output data to evaluate results
DNS Shotgun is capable of simulating real client behaviour by replaying
captured traffic over selected protocol(s). The timing of original queries as
well as their content is kept intact.
## Caveats
This tool requires large amount of source PCAPs. These are ideally captured
directly on your network to simulate the behaviour of your own clients. The
captured PCAPs are then pre-processed into DNS Shotgun "pellets", which are
input files that contain the selected amount of simulated clients based on the
original traffic.
- Requires captured traffic from clients
- Setup for proper benchmarks can be quite complex
- Isn't suitable for testing with a very low number of clients/queries
- Backward compatibility between versions isn't kept
Realistic high-performance benchmarking requires complex setup, especially for
TCP-based protocols. However, the authors of this tool have successfully used it
to benchmark and test various DNS implementations with up to hundreds of
thousands of clients (meaning _connections_ for TCP-based transports) using
commodity hardware.
## Documentation
## Input data
[https://knot.pages.nic.cz/shotgun](https://knot.pages.nic.cz/shotgun)
To have a realistic simulation of clients, no synthetic queries are created.
Instead, an input PCAP must be provided. There are the following assumptions:
## Showcase
- Each IP address represents a unique client.
- The packets are ordered by ascending time.
- Only UDP packets arriving to port 53 are used.
The following charts highlight the unique capabilities of DNS Shotgun.
Measurements are demonstrated using DNS over TCP. In our test setup, DNS
Shotgun was able to keep sending/receiving:
The PCAP is then sliced into the requested time periods, and DNS queries are
collected for each client. The output PCAP contains the exact same queries,
only the msgid is renumbered to be sequential (to avoid issues with multiple
in-flight TCP queries with potentially the same msgid).
- 400k queries per second over
- **500k simultaneously active TCP connections**, with about
- 25k handshakes per second, which amounts to
- 1.6M total established connections during the 60s test runtime.
The input data can be created with:
![Active Connections](docs/showcase/connections.png)
![Handshakes](docs/showcase/handshakes.png)
```
./pellet.py input.pcap -c CLIENTS -t TIME -r RESOLVER_IP
```
where `CLIENTS` is the number of required clients and `TIME` is the selected
time period. `RESOLVER_IP` is necessary to extract only the traffic towards the
resolver and not other upstream servers.
### Socket statistics on server

```
# ss -s
Total: 498799 (kernel 0)
TCP:   498678 (estab 498466, closed 52, orphaned 0, synrecv 0, timewait 54/0), ports 0

Transport Total     IP        IPv6
*         0         -         -
RAW       4         1         3
UDP       19        2         17
TCP       498626    5         498621
INET      498649    8         498641
FRAG      0         0         0
```

## Replaying the traffic

### UDP

```
./shotgun.lua -P udp -p 53 -s "::1" pellets.pcap
```

### TCP

```
./shotgun.lua -P tcp -p 53 -s "::1" pellets.pcap
./shotgun.lua -P tcp -p 53 -s "::1" -e 0 pellets.pcap  # no idle timeout
```

### DNS-over-TLS (DoT)

```
./shotgun.lua -P dot -p 853 -s "::1" pellets.pcap
./shotgun.lua -P dot -p 853 -s "::1" --tls-priority "NORMAL:-VERS-ALL:+VERS-TLS1.3" pellets.pcap
./shotgun.lua -P dot -p 853 -s "::1" --tls-priority "NORMAL:%NO_TICKETS" pellets.pcap
```

### DNS-over-HTTPS (DoH)

```
./shotgun.lua -P doh -p 443 -s "::1" --tls-priority "NORMAL:-VERS-ALL:+VERS-TLS1.3" pellets.pcap
./shotgun.lua -P doh -p 443 -s "::1" --tls-priority "NORMAL:-VERS-ALL:+VERS-TLS1.3" -M POST pellets.pcap
```
### High-performance benchmarking
```
./shotgun.lua \
-P tcp \
-s "fd00:dead:beef::cafe" \
-T 15 \
--bind-pattern "fd00:dead:beef::%x" \
--bind-num 8 \
pellets.pcap
```
To be able to scale-up to hundreds of thousands of TCP connections, multiple
source IP addresses are needed. It's possible to utilize [unique-local
addresses](https://en.wikipedia.org/wiki/Unique_local_address) in IPv6. Our rule
of thumb is to use one IP per every 30k clients (when the port range is extended
to allow 60k ephemeral ports).
Check out the kernel documentation for tuning the network stack for TCP. Other tips:
```
ulimit -n 1000000
sysctl -w net.ipv4.ip_local_port_range="1025 60999"
sysctl -w net.core.rmem_default="8192000"
```
The entire setup process is quite complex and repetitive when taking multiple
measurements. There is some ansible automation for DNS Shotgun in the
[resolver-benchmarking](https://gitlab.nic.cz/knot/resolver-benchmarking)
repository.
## Docker container
For ease of use, a docker container with shotgun is available. Note that running
with ``--privileged`` can improve its performance by a few percent, if you don't
mind the security risk.
```
docker run registry.nic.cz/knot/shotgun:v20200914 --help
```
The following example can be used to test the prototype by simulating UDP clients.
Process the captured PCAP and extract 50k clients within 30 seconds of traffic:
```
docker run \
-v "$PWD:/data:rw" \
registry.nic.cz/knot/shotgun/pellet:v20200914 \
-o /data/pellets.pcap \
-c 1000 \
-t 10 \
-r $RESOLVER_IP \
/data/captured.pcap
```
Replay the clients against IPv6 localhost server:
```
docker run \
--network host \
-v "$PWD:/data:rw" \
registry.nic.cz/knot/shotgun:v20200914 \
-O /data \
-s "::1" \
/data/pellets.pcap
```
## Interpreting the results
DNS Shotgun's output is one JSON file per thread. These can be merged
together, and then various plots describing the latencies, connection statistics
etc. can be generated using our utility scripts in the `tools/` directory.
### Test setup
## Dependencies
- DNS over TCP against [TCP echo server](https://gitlab.nic.cz/knot/echo-server)
- two physical servers: one for DNS Shotgun, another for the echo server
- both servers have 16 CPUs, 32 GB RAM, 10GbE network card (up to 64 queues)
- servers were connected directly to each other - no latency
- TCP network stack was tuned and there was no firewall
When using the sources, the following dependencies are needed.
## License
### pellet.py
DNS Shotgun is released under GPLv3 or later.
- python3
- python-dpkt (latest from git, commit 2c6aada35 or newer)
- python-dnspython
## Thanks
### shotgun.lua
We'd like to thank the [Comcast Innovation
Fund](https://innovationfund.comcast.com) for sponsoring the work to support
the use of TCP, DoT and DoH protocols.
- dnsjit 1.0.0 for UDP, TCP and DoT
- development version of dnsjit for DoH
DNS Shotgun is built on top of the [dnsjit](https://github.com/DNS-OARC/dnsjit)
engine. We'd like to thank DNS-OARC and Jerry Lundström for the development and
continued support of dnsjit.
# Analyzing Clients
When you've created a pellets file that is ready to use for DNS Shotgun replay,
you may want to verify you didn't distort the original client population. There
is a tool that can be used to compare client distribution and activity between
the original traffic capture and the pellets file.
!!! note
This step is optional and may not be necessary for larger client
populations or for client populations with similar behaviour. Nevertheless,
it's better to check your assumptions.
First, you need to run the client analysis script for both the original capture (or
rather the `filtered.pcap` file) and the processed pellets file.
```
$ pcap/count-packets-per-ip.lua -r filtered.pcap --csv filtered.csv
$ pcap/count-packets-per-ip.lua -r pellets.pcap --csv pellets.csv
```
Then, you can use another tool to plot a chart of these results.
```
$ tools/plot-client-distribution.py -o clients.png filtered.csv pellets.csv
```
## Client distribution chart
The following chart demonstrates how queries are distributed among clients. It
can be used to read how active your clients are and how much of the overall
query traffic your resolver receives from which clients.
!!! warning
The following chart displays the absolute number of queries, not QPS. When
comparing multiple distributions, always make sure to use PCAPs of the same
duration.
![Client distribution chart](img/clients.png)
There are several blobs on the chart that represent groups of clients. The area
of the blob visually signifies the total amount of queries that were received
from these clients.
For each blob, you can locate its center and read the X- and Y-axis values.
Please note that both axes are logarithmic. On the Y-axis you can read the mean
number of queries that a client represented in the blob has sent. On the
X-axis, you can read the percentage of clients that are represented by this
blob.
In the example above, the first blob from the left shows that almost 80 % of
clients send less than 10 queries. Around 20 % of clients send between 10 and
100 queries. Even though the remaining clients represent around 1 % of the
total client population, we can see that these clients generate significant
query traffic.
The comparison shows the two samples are quite similar. If the differences
are significant, you may want to consider changes to the pellets files.
If you used `pcap/limit-clients.lua` to generate these, using a different
`-s/--seed` might help.
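For example, a minimal sketch of regenerating a limited pellets file with an explicit seed; the flags mirror the "Limiting the traffic" example elsewhere in these docs, and the output filename is only an illustration:

```
$ pcap/limit-clients.lua -r pellets.pcap -w limited-seed42.pcap -l 0.3 -s 42
```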
# Capturing Traffic
When replaying traffic using DNS Shotgun, you need to provide it with a PCAP
that contains extracted client data, or "*pellets*". You may not use an
arbitrary PCAP file. Instead, you must pre-process the raw PCAP capture into
pellets as described in the following sections.
!!! note
DNS Shotgun's measurements are only as good as the data you feed it.
Input data that accurately represents your clients is
crucial for realistic benchmarking. Results can vary greatly for different
client populations.
## Raw capture assumptions
To start, you need a traffic capture from your network to work with. It only
needs to contain UDP DNS queries from clients towards your resolver. Other
traffic may be present as well, but it will be filtered out.
### Packets must be sorted by increasing timestamp
Some network or hardware conditions may cause the packets to appear in a
different order. To ensure the correct order, use the `reordercap` command from
tshark/wireshark.
```
$ reordercap raw.pcap ordered.pcap
```
### Unique IP means unique client
Clients need to be identified somehow in the captured traffic. We decided to
use the IP address to tell clients apart. This should be a reasonable assumption,
unless your clients are behind NAT.
!!! warning
If your real clients are behind NAT, this has major consequences and should
be accounted for, since multiple real clients will be bundled into a single
simulated one.
### Only UDP packets are used
If a large number of your clients already use DoT, DoH or TCP, you need to
somehow get their queries into plain UDP format. For example, Knot Resolver can
[mirror](https://knot-resolver.readthedocs.io/en/v5.2.1/modules-policy.html#policy.MIRROR)
incoming queries to UDP.
## Filtering DNS queries
In this step, UDP DNS queries from clients are extracted from the raw PCAP. If
the raw capture includes queries from the resolver to upstream servers, it is
_crucial_ to provide the script with the resolver IP address(es) to filter out
outgoing queries.
```
$ pcap/filter-dnsq.lua -r ordered.pcap -w filtered.pcap -a $RESOLVER_IP
```
!!! tip
You may also use this script to work with traffic directly captured from
an interface chosen with `-i`. See `--help` for usage.
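For instance, a sketch of capturing directly from a network interface instead of reading a file; the interface name is a placeholder and the remaining flags mirror the file-based example above:

```
$ pcap/filter-dnsq.lua -i eth0 -w filtered.pcap -a $RESOLVER_IP
```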
# Configuration File
!!! tip
You can find configuration files for presets in
[`config/`](https://gitlab.nic.cz/knot/shotgun/-/tree/master/config). They
are an excellent starting point to create your own configurations.
Configuration is written in [TOML](https://toml.io/en/). There are multiple sections that may have additional subsections.
- `[traffic]` contains one or more subsections that each define client behaviour, including protocol
- `[charts]` is an optional section which can contain subsections that define charts that should be automatically plotted
- `[defaults.traffic]` is an optional section that makes it possible to specify defaults shared by all traffic senders
## [traffic] section
You can define one or more traffic senders with specific client behaviour. Every traffic sender has a name and may have multiple parameters. At the very least, each traffic sender must define `protocol`.
This is an example of minimal configuration file sending all traffic as DNS-over-TLS using defaults for everything. The name of the traffic sender here is "DoT".
```
[traffic]
[traffic.DoT]
protocol = "dot"
```
The following configuration parameters for traffic senders are supported.
### protocol
- `udp`: DNS over UDP
- `tcp`: DNS over TCP
- `dot`: DNS over TLS over TCP
- `doh`: DNS over HTTP/2 over TLS over TCP
### weight
When multiple traffic senders are defined, weight affects the client
distribution between them. Weight is relative to the sum of all weights.
Integer or float. Defaults to 1.
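For example, a minimal sketch of two traffic senders whose clients are split 3:1 between UDP and DoT (the sender names and weights are illustrative):

```
[traffic]
[traffic.UDP]
protocol = "udp"
weight = 3    # receives 3/4 of the clients

[traffic.DoT]
protocol = "dot"
weight = 1    # receives 1/4 of the clients
```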
### idle_time_s
Determines how long clients keep the connection in an idle state, i.e. leave it
established after they have received all answers and currently have no more
queries to send. Idle time of 0 means the client will close the connection as
soon as possible.
Integer. Defaults to 10 seconds.
### gnutls_priority
[GnuTLS priority string](https://gnutls.org/manual/html_node/Priority-Strings.html)
which can be used to select TLS protocol version and features, for example:
```
gnutls_priority = "NORMAL:%NO_TICKETS" # don't use TLS Session Resumption
gnutls_priority = "NORMAL:-VERS-ALL:+VERS-TLS1.3" # only use TLS 1.3
```
String. Defaults to `NORMAL`, whose exact meaning is determined by the system's GnuTLS library.
### http_method
- `GET`
- `POST`
### timeout_s
Individual query timeout in seconds.
Integer. Defaults to 2 seconds.
!!! warning
Increasing the query timeout can negatively impact DNS Shotgun's
performance and is not recommended.
### handshake_timeout_s
Timeout for establishing a connection in seconds.
Integer. Defaults to 5 seconds.
### Advanced settings
You shouldn't use these unless you need to.
- `cpu_factor`: override the default CPU thread distribution (UDP: 1, TCP: 2, DoT/DoH: 3)
- `max_clients`: number of clients each dnssim instance can hold (per-thread setting)
- `channel_size`: number of queries that can be buffered before the thread starts to block
- `batch_size`: number of queries processed in each loop
### CLI overrides
The following options can be used to override the CLI options for `replay.py`.
Values in the configuration file always take precedence over CLI options (an example follows the list).
- `server`: target server's IPv4/IPv6 address
- `dns_port`: target server's port for plain DNS (UDP and TCP)
- `dot_port`: target server's port for DNS-over-TLS
- `doh_port`: target server's port for DNS-over-HTTPS
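For example, a minimal sketch that pins the target server and a non-standard DoT port in the configuration itself, so the corresponding CLI options don't need to be passed (the address and port values are illustrative):

```
[traffic]
[traffic.DoT]
protocol = "dot"
server = "::1"      # overrides the server address given on the CLI
dot_port = 8853     # non-standard DNS-over-TLS port
```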
## [charts] section
This section is optional and is only provided as a convenience to automate
plotting charts after the test. Anything defined in this section can be
achieved by using the plotting scripts directly.
Similarly to the `[traffic]` section, it also contains named subsections. Every
such subsection must contain `type`, which determines the chart that should be
plotted. For example:
```
[charts]
[charts.response-rate]
type = "response-rate"
```
### type
Type determines which chart will be plotted. The following charts are supported:
- `response-rate`: [Response Rate Chart](response-rate-chart.md)
- `latency`: [Latency Histogram](latency-histogram.md)
- `connections`: [Connection Chart](connection-chart.md)
### title
Title of the chart.
### output
Output filename for the chart. Various file extensions can be used. Defaults to using svg.
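For example, a small sketch of a chart subsection using the keys documented above (the title and filename are illustrative):

```
[charts]
[charts.latency]
type = "latency"
title = "Client Latency"
output = "latency.png"   # plot a PNG instead of the default svg
```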
### Other parameters
These depend on the specific chart type. Generally, any option that can be
passed directly to the plotting scripts can also be specified in the config.
Refer to the tools' `--help` for possible options.
## [defaults] section
### [defaults.traffic] section
This section can provide defaults for all traffic senders. If a specific
traffic sender re-defines the same parameter, the traffic sender-specific value
takes precedence over the default value.
Any parameter that can be specified for traffic senders in `[traffic]` section
can also be specified in this section. For example, to override the default
behavior to not use TLS Session Resumption, you can use:
```
[defaults]
[defaults.traffic]
gnutls_priority = "NORMAL:%NO_TICKETS"
```
# Configuration Presets
You can either use a configuration preset or create your own configuration. It
is possible to replay the original traffic over various different protocols
with different client behaviours simultaneously. For example, you can split
your traffic into 60 % UDP, 20 % DoT and 20 % DoH.
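As a sketch, such a 60/20/20 split could look like the following configuration; the sender names are arbitrary and the parameters are described on the Configuration File page:

```
[traffic]
[traffic.UDP]
protocol = "udp"
weight = 6

[traffic.DoT]
protocol = "dot"
weight = 2

[traffic.DoH]
protocol = "doh"
weight = 2
```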
The following use-cases are predefined for convenience, without the need to
create a configuration file. You can pass these values instead of a filepath
to the `-c/--config` option of the `replay.py` utility.
- `udp`
- 100 % DNS-over-UDP clients
- `tcp`
- 100 % well-behaved DNS-over-TCP clients
- `dot`
- 100 % well-behaved DNS-over-TLS clients using TLS Session Resumption
- `doh`
- 50 % well-behaved DNS-over-HTTPS GET clients using TLS Session Resumption
- 50 % well-behaved DNS-over-HTTPS POST clients using TLS Session Resumption
- `mixed`
- 60 % DNS-over-UDP clients
- 5 % well-behaved DNS-over-TCP clients
- 5 % aggressive DNS-over-TCP clients
- 10 % well-behaved DNS-over-TLS clients using TLS Session Resumption
- 5 % well-behaved DNS-over-TLS clients without TLS Session Resumption
- 10 % well-behaved DNS-over-HTTPS GET clients using TLS Session Resumption
- 5 % well-behaved DNS-over-HTTPS POST clients using TLS Session Resumption
!!! note
You can find configuration files for presets in
[`config/`](https://gitlab.nic.cz/knot/shotgun/-/tree/master/config). They
are an excellent starting point to create your own configurations.
# Connection Chart
The connection chart can be used to visualize connection-related information,
such as the number of active established connections, handshake attempts,
successful TLS Session Resumptions or failed handshakes.
```
$ tools/plot-connections.py -k active -- DoT.json
$ tools/plot-connections.py -k tcp_hs tls_resumed failed_hs -t "Handshakes over Time" DoT.json
```
The optional parameter `-k/--kind` can be used to select which data should be
plotted. The following values are supported.
- `active` means the number of currently active established connections
- `tcp_hs` means the number of TCP handshake attempts in the last second
- `failed_hs` means the number of failed handshakes. All kinds of connection
setup failures will be included, whether it's TCP handshake timeout, TLS
negotiation failure or anything else.
- `tls_resumed` means the number of connections that were resumed with TLS
Session Resumption during the last second
!!! tip
Using the `--` to separate a list of JSON files after specifying
`-k/--kind` might be needed in some cases.
![connections](img/connections.png)
![handshakes](img/handshakes.png)
# Extracting Clients
Once you have the `filtered.pcap` with DNS queries from clients, you can
process them into *pellets* - the pre-processed input files for DNS Shotgun.
All the content of these files will be used during the replay stage - all
clients for the entire duration of the file.
The following example takes the entire `filtered.pcap` and transforms it into
pellets. The pellets file will contain all the clients and it will have the
same duration as the original file.
```
$ pcap/extract-clients.lua -r filtered.pcap -O $OUTPUT_DIR
```
The produced pellets file is ready to be used as the input for DNS Shotgun
replay.
## Splitting original capture into multiple pellets files
It can be useful to have a long original capture file, which contains more
clients and queries. However, since the pellets file will be replayed in its
entirety, you may want to split the original file into multiple pellets files
with shorter duration.
For example, if your initial capture file is 30 minutes long, you could split
it into fifteen two-minute pellets files with the `-d/--duration` option.
```
$ pcap/extract-clients.lua -r filtered.pcap -O $OUTPUT_DIR -d 120
```
!!! tip
It is useful to keep a collection of these original pellets files of the same
duration. They can later be combined to create different test cases.
## Scaling-up the traffic
If you want to stress-test your infrastructure, you can combine these pellets
files together to effectively scale-up the traffic. The pellets files are
created in such a way that you can simply use the `mergecap` utility to combine them.
```
$ mergecap -w scaled.pcap $OUTPUT_DIR/*
```
## Limiting the traffic
It is also possible to take a pellets file and scale-down its traffic. This is
done on a per-client basis. Either a client's entire query stream will be
present, or the client won't be present at all.
To limit the overall traffic, you can select the portion of the clients that
should be included. This can range from 0 to 1. For example, let's suppose we
want to scale-down the number of clients in the pellets file to 30 %.
```
$ pcap/limit-clients.lua -r pellets.pcap -w limited.pcap -l 0.3
```
docs/img/clients.png (118 KiB)
docs/img/connections.png (30.4 KiB)
docs/img/handshakes.png (42.4 KiB)
docs/img/latency.png (109 KiB)
docs/img/response-rate.png (41.9 KiB)
# DNS Shotgun
Realistic DNS benchmarking tool which supports multiple transport protocols:
- **DNS-over-TLS (DoT)**
- **DNS-over-HTTPS (DoH)**
- UDP
- TCP
*DNS Shotgun is capable of simulating hundreds of thousands of DoT/DoH
clients.*
Every client establishes its own connection(s) when communicating over
TCP-based protocol. This makes the tool uniquely suited for realistic DoT/DoH
benchmarks since its traffic patterns are very similar to real clients.
DNS Shotgun exports a number of statistics, such as query latencies, number of
handshakes and connections, response rate, response codes etc. in JSON format.
The toolchain also provides scripts that can plot these into readable charts.
## Features
- Supports DNS over UDP, TCP, TLS and HTTP/2
- Allows mixed-protocol simultaneous benchmark/testing
- Can bind to multiple source IP addresses
- Customizable client behaviour (idle time, TLS versions, HTTP method, ...)
- Replays captured queries over selected protocol(s) while keeping original timing
- Suitable for high-performance realistic benchmarks
- Tools to plot charts from output data to evaluate results
## Caveats
- Requires captured traffic from clients
- Setup for proper benchmarks can be quite complex
- Isn't suitable for testing with a very low number of clients/queries
- Backward compatibility between versions isn't kept
## Code Repository
[https://gitlab.nic.cz/knot/shotgun](https://gitlab.nic.cz/knot/shotgun)
# Installation
There are two options for using DNS Shotgun. You can either install the
dependencies and use the scripts from the repository directly, or use a
pre-built docker image.
## Using script directly
You can use the toolchain scripts directly from the git repository. You need to
ensure you have the required dependencies installed. Also make sure to check
out a tagged version, as development happens in the master branch.
```
$ git clone https://gitlab.nic.cz/knot/shotgun.git
$ cd shotgun
$ git checkout v20210203
```
### Dependencies
When using the scripts directly, the following dependencies are needed. If you
only wish to process shotgun JSON output (e.g. plot charts), then dnsjit isn't
required.
- [dnsjit](https://github.com/DNS-OARC/dnsjit): Can be installed from [DNS-OARC
repositories](https://dev.dns-oarc.net/packages/).
- Python 3.6 or later
- Python dependencies from [requirements.txt](https://gitlab.nic.cz/knot/shotgun/-/blob/master/requirements.txt)
- (optional) tshark/wireshark for some PCAP pre-processing
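For example, the Python dependencies can typically be installed with pip from inside the cloned repository (a sketch; using a virtual environment is optional):

```
$ pip3 install -r requirements.txt
```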
## Docker Image
A pre-built image can be obtained from the [CZ.NIC DNS Shotgun
Registry](https://gitlab.nic.cz/knot/shotgun/container_registry/65).
```
$ docker pull registry.nic.cz/knot/shotgun:v20210203
```
Alternatively, you can build the image yourself from the Dockerfile in the repository.
### Docker Usage
- Make sure to run with `--network host`.
- Mount input/output directories and files with `-v/--volume`.
- Using `--privileged` might slightly improve performance if you don't mind the security risk.
```
$ docker run \
--network host \
-v "$PWD:/mnt" \
registry.nic.cz/knot/shotgun:v20210203 \
$COMMAND
```
# Key Concepts
DNS Shotgun is capable of simulating real client behaviour by replaying
captured traffic over selected protocol(s). The timing of original queries as
well as their content is kept intact.
Realistic high-performance benchmarking requires complex setup, especially for
TCP-based protocols. However, the authors of this tool have successfully used it
to benchmark and test various DNS implementations with up to hundreds of
thousands of clients (meaning _connections_ for TCP-based transports) using
commodity hardware. This requires [performance tuning](performance-tuning.md),
which is described in a later section.
## Client
These docs often mention a "*client*", and we use the number of clients to
describe DNS infrastructure throughput in addition to queries per second (QPS).
What is considered a client and why does it matter?
A client is the origin of one or more queries and it is supposed to represent a
single device, i.e. anything from a CPE such as home/office router to a mobile
device. Since traffic patterns of various devices can vary greatly, it is
crucial to use traffic that most accurately represents your real clients.
In plain DNS sent over UDP the concept of client doesn't matter, since UDP is a
stateless protocol and a packet is just a packet. Thus, QPS throughput may be
a sufficient metric for UDP.
In stateful DNS protocols, such as DoT, DoH or TCP, much of the overhead and
performance cost is caused by establishing the connection over which queries
are subsequently sent. Therefore, the concept of client becomes crucial for
benchmarking stateful protocols.
!!! note
As an extreme example, consider 10k QPS sent over a single DoH connection
versus establishing 10k DoH connections, each with 1 QPS. While both
scenarios have the same overall QPS, the second one will consume vastly more
resources, especially when establishing the connections.
### Client replay guarantees
DNS Shotgun aims to provide the most realistic client behaviour when replaying
the traffic. When you run DNS Shotgun, there are the following guarantees when
using a stateful protocol.
- **Multiple clients never share a single connection.**
- **Each client attempts to establish at least one connection.**
- **A client may have zero, one or more (rarely) active established connections
at any time**, depending on its traffic and behavior.
## Real traffic
A key focus of this toolchain is to make the benchmarks as realistic as
possible. Therefore, no synthetic queries or clients are generated. To
effectively use this tool, you need to have a large amount of source PCAPs.
Ideally, these contain the traffic from your own network.
!!! note
In case you'd prefer to use synthetic clients/queries anyway, you can just
generate the traffic and capture it in a PCAP for further processing. Doing that
is outside the scope of this documentation.
### Traffic replay guarantees
- **Content of DNS messages is left intact.** Messages without proper DNS header
or question section will be discarded.
- **Timing of the DNS messages is kept as close to the original traffic as
possible.** If the tool detects a time skew larger than one second, it aborts the
test. However, the real time difference may be slightly longer due to various
buffers.
# Latency Histogram
This very useful chart is a bit difficult to read and understand, but it
provides a great deal of information about the overall latency from the client-side
perspective. We use the logarithmic percentile histogram to display this data.
[This
article](https://blog.powerdns.com/2017/11/02/dns-performance-metrics-the-logarithmic-percentile-histogram/)
provides an in-depth explanation about the chart and how to interpret it.
```
$ tools/plot-latency.py -t "DNS Latency Overhead" UDP.json TCP.json DoT.json DoH.json
```
![latency overhead](img/latency.png)
The chart above illustrates why comparing just the response rate isn't a
sufficient metric. For all protocols compared in this case, you'd get around
99.5 % response rate. However, when you examine the client latency, you can see
clear differences.
In the chart, 80 % of all queries are represented by the rightmost part of the
chart - between the "slowest percentile" of 20 % and 100 %. For these
queries, the latency for UDP, TCP, DoT or DoH is the same, which is one
round trip. These represent immediate answers from the resolver (e.g. cached or
refused), which are sent either over UDP or over an already established
connection (for stateful protocols). The latency is 10 ms, or 1 RTT.
The most interesting part is between the 5 % and 20 % slowest percentile. For
these 15 % of all queries, there are major differences between the latency of
UDP, TCP and DoT/DoH. This illustrates the latency cost of setting up a
connection where none is present. UDP is stateless and requires just 1 RTT. TCP
requires an extra round trip to establish the connection and the latency for the
client becomes 2 RTTs. Finally, both DoT and DoH require an additional round
trip for the TLS handshake and thus the overall latency cost becomes 3 RTTs.
The trailing 5 % of queries show no difference between protocols, since these
are queries that aren't answered from cache and the delay is introduced by the
communication between the resolver and the upstream servers. The last 0.5 % of
queries aren't answered by the resolver within 2 seconds and are considered a
timeout by the client.
# Performance Tuning
Any high-performance benchmark setup requires a separate server for generating
traffic, which is then sent to the target server under test. In order to scale
up DNS Shotgun so that it performs well under heavy load, some performance
tuning and network adjustments are needed.
!!! tip
An example of performance tuning we use in our benchmarks can be found in
our [ansible
role](https://gitlab.nic.cz/knot/resolver-benchmarking/-/tree/master/roles/tuning).
## Number of file descriptors
Make sure the number of available file descriptors is sufficient. Raising the
limit is typically necessary when running DNS Shotgun from a terminal. When
using docker, the defaults are usually sufficient.
```
$ ulimit -n 1000000
```
## Ephemeral port range
Extending the ephemeral port range gives the tool more outgoing ports to work with.
```
$ sysctl -w net.ipv4.ip_local_port_range="1025 60999"
```
## NIC queues
High-end network cards typically have multiple queues. Ideally, you want to set
their number to be the same as the number of available CPUs.
```
$ ethtool -L $INTERFACE combined $NCPU
```
!!! note
It's important that the NIC interrupts from different queues are handled
by different CPUs. If there are throughput issues, you may want to verify
this is the case.
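One quick way to check this (a sketch; replace the interface name with your own) is to look at `/proc/interrupts` and confirm that the per-queue interrupt counters are growing on different CPUs:

```
$ grep $INTERFACE /proc/interrupts
```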
## UDP
DNS Shotgun can generate quite bursty traffic. Increasing the receiving
server's socket memory can help to absorb these bursts. If this buffer isn't
sufficient, packets may be lost.
```
$ sysctl -w net.core.rmem_default="8192000"
```
## TCP, DoT, DoH
Tuning the network stack for TCP isn't as straightforward and it's network-card
specific. It's best to refer to [kernel
documentation](https://www.kernel.org/doc/html/latest/networking/device_drivers/ethernet/intel/ixgb.html#improving-performance)
for your specific network card.
## conntrack
For our benchmarks, we don't use iptables or any firewall. In particular, the
`conntrack` module probably won't be able to handle serious load. Make sure the
conntrack module isn't loaded by the kernel if you're not using it.
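For example, a hedged sketch of checking for and unloading the module, assuming it is exposed as `nf_conntrack` on your kernel (unloading fails if the module is still in use):

```
$ lsmod | grep conntrack        # check whether conntrack modules are loaded
$ modprobe -r nf_conntrack      # unload the module if it isn't needed
```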
# Raw Output
In the output directory of DNS Shotgun's `replay.py` tool, the following
structure is created. Let's assume we ran a configuration that defines two
traffic senders - `DoT` and `DoH`.
```
$OUTDIR
├── .config # ignore this directory
│ └── luaconfig.lua # for debugging purposes only
├── data # directory with raw JSON output
│ ├── DoH # "DoH" traffic sender data
│ │ ├── DoH-01.json # raw data from first thread of DoH traffic sender
│ │ ├── DoH-02.json # raw data from second thread of DoH traffic sender
│ │ └── ... # raw data from other threads of DoH traffic sender
│ ├── DoH.json # merged raw data from all DoH sender threads
│ ├── DoT # "DoT" traffic sender data
│ │ ├── DoT-01.json # raw data from first thread of DoT traffic sender
│ │ ├── DoT-02.json # raw data from second thread of DoT traffic sender
│ │ └── ... # raw data from other threads of DoT traffic sender
│ └── DoT.json # merged raw data from all DoT sender threads
└── charts # directory with automatically plotted charts (if configured)
├── latency.svg # chart comparing latency of DoT and DoH clients
└── response-rate.svg # chart comparing the response rate of DoT and DoH clients
```
## data directory
This directory contains the raw JSON data. Since DNS Shotgun typically operates
with multiple threads, the results for each traffic sender are also provided
per thread. However, since you typically don't care about how the clients were
split among threads, but only about their aggregate behaviour, a data file that contains
the combined results of all threads belonging to the configured traffic sender
is also provided.
Every configured traffic sender will have its own output directory of the same
name. Inside, per-thread raw data are available. The aggregate file is directly
in the `data/` directory as a JSON file with the name of the configured traffic
sender. The aggregate file is the one you typically want to use.
!!! note
The raw JSON file is versioned and is not intended to be forward or
backward compatible with various DNS Shotgun versions. You should use the
same version of the toolchain for both replay and interpreting the data.
!!! tip
If you wish to explore, format or interpret the raw JSON data,
[jq](https://stedolan.github.io/jq/) utility can be useful for some
rudimentary processing.
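For example, a minimal sketch listing the top-level keys of an aggregate file from the layout above (the exact fields depend on the DNS Shotgun version):

```
$ jq 'keys' data/DoT.json
```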
## charts directory
This directory may not be present if you didn't configure any charts to be
automatically plotted in the configuration file. If it is available, it
contains the plotted charts that are described in the following sections.
When charts are plotted automatically, they always display data for all the
configured traffic senders with their predefined names. If you wish to customize
the charts, omit certain senders etc., you can use the plotting scripts
directly from the CLI. These can be found in the `tools/` directory and you can
refer to their `--help` for usage.
# Replaying Traffic
Once you've prepared the input pellets file with clients and either have your
own configuration file or know which preset you want to use, you can use the
following script to run DNS Shotgun.
```
$ replay.py -r pellets.pcap -c udp -s ::1
```
!!! tip
Use the `--help` option to explore other options.
During the replay, there is quite a bit of logging information that looks like
this.
```
UDP-01 notice: total processed: 267; answers: 0; discarded: 2; ongoing: 172
```
The important thing to look out for is the number of `discarded` packets. If
nearly all the packets, or a large portion of them, are discarded, it almost
certainly indicates improper setup or input data. The test should be
aborted and the reason investigated. Increasing the `-v/--verbosity`
level might help.
## Binding to multiple source addresses
When sending traffic against a single IP/port combination of the target server,
the source IP address has a limited number of ports it can utilize. A single
IP address is insufficient to achieve hundreds of thousands of clients.
DNS Shotgun can bind to multiple source addresses with the `-b/--bind-net`
option. You can specify either an IP address or a network range using CIDR
notation. Multiple values (either IPs, ranges or any combination of those) can
be specified. When using CIDR notation, the network and broadcast address won't
be used.
```
$ replay.py -r pellets.pcap -c tcp -s fd00:dead:beef::cafe -b fd00:dead:beef::/124
```
!!! tip
Our rule of thumb is to use at least one source IP address per every 30k
clients. However, using more addresses is certainly better and can help to
avoid weird behaviour, slow performance and other issues that require
in-depth troubleshooting.
!!! note
If you're limited by the number of source addresses you can use, utilizing
either IPv6 unique-local addresses (fd00::/8) or private IPv4 ranges could
be helpful.
## Emulating link latency
!!! warning
This is an advanced topic and emulating latency isn't necessary for many
scenarios.
Overall latency will affect the user's experience with DNS resolution. It also
becomes much more relevant when using TCP and TLS, since the handshakes
introduce additional round trips. When benchmarks are done in the data center
with two servers that are directly connected to each other with practically no
latency, it can provide a skewed view of the expected end user latency.
Luckily, the `netem` Network Emulator makes it very simple to emulate various
network conditions. For example, emulating latency on the sender side can be
done quite easily. The following command adds 10 ms latency to outgoing
packets, effectively simulating an RTT of 10 ms.
```
$ tc qdisc add dev $INTERFACE root netem limit 10000000 delay 10ms
```
!!! tip
For more possibilities, refer to `man netem.8`. Using a sufficiently large
buffer (limit) is essential for proper operation.
However, beware that the settings affect the entire interface. If you're going
to emulate latency, it's best if the resolver-client traffic is on a separate
interface, so the resolver-upstream traffic isn't negatively impacted.
# Response Rate Chart
This basic chart can display the overall response rate over time. It is also
possible to plot the rate of specific response codes, such as `NOERROR`.
```
$ tools/plot-response-rate.py -r 0 -o rr.png UDP.json
```
!!! tip
The image format depends on the output filename extension chosen with
`-o/--output`. `svg` is used by default, but other formats such as `png`
are supported as well.
The following chart displays the answer rate and the rate of `NOERROR` answers.
In this measurement, the resolver was started with a cold cache. We can see the
overall response rate is close to 100 %. The `NOERROR` response rate slightly
increases over time from 72 % to around 75 % as the cache warms up.
![UDP response rate](img/response-rate.png)