Verified Commit a39ebf18 authored by Tomas Krizek

docs: add proper documentation

parent 21189c77
1 merge request: !28 add documentation
Showing 892 additions and 158 deletions
@@ -2,188 +2,91 @@
Realistic DNS benchmarking tool which supports multiple transport protocols:
- **DNS-over-TLS (DoT)**
- **DNS-over-HTTPS (DoH)**
- UDP
- TCP
- DNS-over-TLS (DoT)
- DNS-over-HTTPS (DoH)
*DNS Shotgun is capable of simulating hundreds of thousands of clients.*
*DNS Shotgun is capable of simulating hundreds of thousands of DoT/DoH
clients.*
Every client establishes its own connection when communicating over TCP-based
protocol. This makes the tool uniquely suited for realistic benchmarking since
its traffic patterns are very similar to real clients.
Every client establishes its own connection(s) when communicating over
TCP-based protocol. This makes the tool uniquely suited for realistic DoT/DoH
benchmarks since its traffic patterns are very similar to real clients.
## Current status (2020-09-14)
DNS Shotgun exports a number of statistics, such as query latencies, number of
handshakes and connections, response rate, response codes etc. in JSON format.
The toolchain also provides scripts that can plot these into readable charts.
- fully supported UDP, TCP and DNS-over-TLS with
[dnsjit](https://github.com/DNS-OARC/dnsjit) 1.0.0
- fully supported DNS-over-HTTPS with development version of dnsjit
- traffic can be replayed only over IPv6
- user interface
- may be unstable
- only very basic UI available
- more complex scenarios are not supported yet
(e.g. simultaneously using multiple protocols)
- pellet.py is functional, but it is very slow and requires python-dpkt from
master
## Features
## Overview
- Supports DNS over UDP, TCP, TLS and HTTP/2
- Allows mixed-protocol simultaneous benchmark/testing
- Can bind to multiple source IP addresses
- Customizable client behaviour (idle time, TLS versions, HTTP method, ...)
- Replays captured queries over selected protocol(s) while keeping original timing
- Suitable for high-performance realistic benchmarks
- Tools to plot charts from output data to evaluate results
DNS Shotgun is capable of simulating real client behaviour by replaying
captured traffic over selected protocol(s). The timing of original queries as
well as their content is kept intact.
## Caveats
This tool requires large amount of source PCAPs. These are ideally captured
directly on your network to simulate the behaviour of your own clients. The
captured PCAPs are then pre-processed into DNS Shotgun "pellets", which are
input files that contain the selected amount of simulated clients based on the
original traffic.
- Requires captured traffic from clients
- Setup for proper benchmarks can be quite complex
- Isn't suitable for testing with a very low number of clients/queries
- Backward compatibility between versions isn't kept
Realistic high-performance benchmarking requires complex setup, especially for
TCP-based protocols. However, the authors of this tool have successfully used it
to benchmark and test various DNS implementations with up to hundreds of
thousands of clients (meaning _connections_ for TCP-based transports) using
commodity hardware.
## Documentation
## Input data
[https://knot.pages.nic.cz/shotgun](https://knot.pages.nic.cz/shotgun)
To have a realistic simulation of clients, no synthetic queries are created.
Instead, an input PCAP must be provided. There are the following assumptions:
## Showcase
- Each IP address represents a unique client.
- The packets are ordered by ascending time.
- Only UDP packets arriving to port 53 are used.
The following charts highlight the unique capabilities of DNS Shotgun.
Measurements are demonstrated using DNS over TCP. In our test setup, DNS
Shotgun was able to keep sending/receiving:
The PCAP is then sliced into the requested time periods, and DNS queries are
collected for each client. The output PCAP contains the exact same queries,
only the msgid is renumbered to be sequential (to avoid issues with multiple
in-flight TCP queries with potentially the same msgid).
- 400k queries per second over
- **500k simultaneously active TCP connections**, with about
- 25k handshakes per second, which amounts to
- 1.6M total established connections during the 60s test runtime.
The input data can be created with:
![Active Connections](docs/showcase/connections.png)
![Handshakes](docs/showcase/handshakes.png)
```
./pellet.py input.pcap -c CLIENTS -t TIME -r RESOLVER_IP
```
where `CLIENTS` is the number of required clients and `TIME` is the selected
time period. `RESOLVER_IP` is necessary to extract only the traffic towards the
resolver and not other upstream servers.
### Socket statistics on server

```
# ss -s
Total: 498799 (kernel 0)
TCP:   498678 (estab 498466, closed 52, orphaned 0, synrecv 0, timewait 54/0), ports 0

Transport Total     IP        IPv6
*         0         -         -
RAW       4         1         3
UDP       19        2         17
TCP       498626    5         498621
INET      498649    8         498641
FRAG      0         0         0
```

## Replaying the traffic

### UDP

```
./shotgun.lua -P udp -p 53 -s "::1" pellets.pcap
```

### TCP

```
./shotgun.lua -P tcp -p 53 -s "::1" pellets.pcap
./shotgun.lua -P tcp -p 53 -s "::1" -e 0 pellets.pcap  # no idle timeout
```

### DNS-over-TLS (DoT)

```
./shotgun.lua -P dot -p 853 -s "::1" pellets.pcap
./shotgun.lua -P dot -p 853 -s "::1" --tls-priority "NORMAL:-VERS-ALL:+VERS-TLS1.3" pellets.pcap
./shotgun.lua -P dot -p 853 -s "::1" --tls-priority "NORMAL:%NO_TICKETS" pellets.pcap
```

### DNS-over-HTTPS (DoH)

```
./shotgun.lua -P doh -p 443 -s "::1" --tls-priority "NORMAL:-VERS-ALL:+VERS-TLS1.3" pellets.pcap
./shotgun.lua -P doh -p 443 -s "::1" --tls-priority "NORMAL:-VERS-ALL:+VERS-TLS1.3" -M POST pellets.pcap
```
### High-performance benchmarking
```
./shotgun.lua \
-P tcp \
-s "fd00:dead:beef::cafe" \
-T 15 \
--bind-pattern "fd00:dead:beef::%x" \
--bind-num 8 \
pellets.pcap
```
To be able to scale-up to hundreds of thousands of TCP connections, multiple
source IP addresses are needed. It's possible to utilize [unique-local
addresses](https://en.wikipedia.org/wiki/Unique_local_address) in IPv6. Our rule
of thumb is to use one IP per every 30k clients (when the port range is extended
to allow 60k ephemeral ports).
Check out the kernel documentation for tuning the network stack for TCP. Other tips:
```
ulimit -n 1000000
sysctl -w net.ipv4.ip_local_port_range="1025 60999"
sysctl -w net.core.rmem_default="8192000"
```
The entire setup process is quite complex and repetitive when taking multiple
measurements. There is some ansible automation for DNS Shotgun in the
[resolver-benchmarking](https://gitlab.nic.cz/knot/resolver-benchmarking)
repository.
## Docker container
For ease of use, a docker container with shotgun is available. Note that running
with ``--privileged`` can improve its performance by a few percent, if you don't
mind the security risk.
```
docker run registry.nic.cz/knot/shotgun:v20200914 --help
```
The following example can be used to test the prototype by simulating UDP clients.
Process the captured PCAP and extract 50k clients within 30 seconds of traffic:
```
docker run \
-v "$PWD:/data:rw" \
registry.nic.cz/knot/shotgun/pellet:v20200914 \
-o /data/pellets.pcap \
-c 1000 \
-t 10 \
-r $RESOLVER_IP \
/data/captured.pcap
```
Replay the clients against IPv6 localhost server:
```
docker run \
--network host \
-v "$PWD:/data:rw" \
registry.nic.cz/knot/shotgun:v20200914 \
-O /data \
-s "::1" \
/data/pellets.pcap
```
## Interpreting the results
DNS Shotgun's output is one JSON file per thread. These can be merged
together, and then various plots describing the latencies, connection statistics
etc. can be generated using our utility scripts in the `tools/` directory.
### Test setup
## Dependencies
- DNS over TCP against [TCP echo server](https://gitlab.nic.cz/knot/echo-server)
- two physical servers: one for DNS Shotgun, another for the echo server
- both servers have 16 CPUs, 32 GB RAM, 10GbE network card (up to 64 queues)
- servers were connected directly to each other - no latency
- TCP network stack was tuned and there was no firewall
When using the sources, the following dependencies are needed.
## License
### pellet.py
DNS Shotgun is released under GPLv3 or later.
- python3
- python-dpkt (latest from git, commit 2c6aada35 or newer)
- python-dnspython
## Thanks
### shotgun.lua
We'd like to thank the [Comcast Innovation
Fund](https://innovationfund.comcast.com) for sponsoring the work to support
the use of TCP, DoT and DoH protocols.
- dnsjit 1.0.0 for UDP, TCP and DoT
- development version of dnsjit for DoH
DNS Shotgun is built on top of the [dnsjit](https://github.com/DNS-OARC/dnsjit)
engine. We'd like to thank DNS-OARC and Jerry Lundström for the development and
continued support of dnsjit.
# Analyzing Clients
When you've created a pellets file that is ready to use for DNS Shotgun replay,
you may want to verify you didn't distort the original client population. There
is a tool that can be used to compare client distribution and activity between
the original traffic capture and the pellets file.
!!! note
This step is optional and may not be necessary for larger client
populations or for client populations with similar behaviour. Nevertheless,
it's better to check your assumptions.
First, you need to run the client analysis script for both the original capture (or
rather the `filtered.pcap` file) and the processed pellets file.
```
$ pcap/count-packets-per-ip.lua -r filtered.pcap --csv filtered.csv
$ pcap/count-packets-per-ip.lua -r pellets.pcap --csv pellets.csv
```
Then, you can use another tool to plot a chart of these results.
```
$ tools/plot-client-distribution.py -o clients.png filtered.csv pellets.csv
```
## Client distribution chart
The following chart demonstrates how queries are distributed among clients. It
can be used to read how active your clients are and how much of the overall
query traffic your resolver receives from which clients.
!!! warning
The following chart displays the absolute number of queries, not QPS. When
comparing multiple distributions, always make sure to use PCAPs of the same
duration.
![Client distribution chart](img/clients.png)
There are several blobs on the chart that represent groups of clients. The area
of the blob visually signifies the total amount of queries that were received
from these clients.
For each blob, you can locate its center and read the X- and Y-axis values.
Please note that both axes are logarithmic. On the Y-axis you can read the mean
number of queries that a client represented in the blob has sent. On the
X-axis, you can read the percentage of clients that are represented by this
blob.
In the example above, the first blob from the left shows that almost 80 % of
clients send less than 10 queries. Around 20 % of clients send between 10 and
100 queries. Even though the remaining clients represent around 1 % of the
total client population, we can see that these clients generate significant
query traffic.
The comparison shows the two samples are quite similar. If the differences
are significant, you may want to consider changes to the pellets files.
If you used `pcap/limit-clients.lua` to generate these, using a different
`-s/--seed` might help.
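For example, a minimal sketch of regenerating a limited pellets file with an explicit seed; the flags mirror the "Limiting the traffic" example elsewhere in these docs, and the output filename is only an illustration:

```
$ pcap/limit-clients.lua -r pellets.pcap -w limited-seed42.pcap -l 0.3 -s 42
```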
# Capturing Traffic
When replaying traffic using DNS Shotgun, you need to provide it with a PCAP
that contains extracted client data, or "*pellets*". You may not use an
arbitrary PCAP file. Instead, you must pre-process the raw PCAP capture into
pellets as described in the following sections.
!!! note
DNS Shotgun's measurements are only as good as the data you feed it.
Input data that accurately represents your clients is
crucial for realistic benchmarking. Results can vary greatly for different
client populations.
## Raw capture assumptions
To start, you need a traffic capture from your network to work with. It only
needs to contain UDP DNS queries from clients towards your resolver. Other
traffic may be present as well, but it will be filtered out.
### Packets must be sorted by increasing timestamp
Some network or hardware conditions may cause the packets to appear in a
different order. To ensure the correct order, use the `reordercap` command from
tshark/wireshark.
```
$ reordercap raw.pcap ordered.pcap
```
### Unique IP means unique client
Clients need to be identified somehow in the captured traffic. We decided to
use the IP address to tell clients apart. This should be a reasonable assumption,
unless your clients are behind NAT.
!!! warning
If your real clients are behind NAT, this has major consequences and should
be accounted for, since multiple real clients will be bundled into a single
simulated one.
### Only UDP packets are used
If a large number of your clients already use DoT, DoH or TCP, you need to
somehow get their queries into plain UDP format. For example, Knot Resolver can
[mirror](https://knot-resolver.readthedocs.io/en/v5.2.1/modules-policy.html#policy.MIRROR)
incoming queries to UDP.
## Filtering DNS queries
In this step, UDP DNS queries from clients are extracted from the raw PCAP. If
the raw capture includes queries from the resolver to upstream servers, it is
_crucial_ to provide the script with the resolver IP address(es) to filter out
outgoing queries.
```
$ pcap/filter-dnsq.lua -r ordered.pcap -w filtered.pcap -a $RESOLVER_IP
```
!!! tip
You may also use this script to work with traffic directly captured from
an interface chosen with `-i`. See `--help` for usage.
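For instance, a sketch of capturing directly from a network interface instead of reading a file; the interface name is a placeholder and the remaining flags mirror the file-based example above:

```
$ pcap/filter-dnsq.lua -i eth0 -w filtered.pcap -a $RESOLVER_IP
```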
# Configuration File
!!! tip
You can find configuration files for presets in
[`config/`](https://gitlab.nic.cz/knot/shotgun/-/tree/master/config). They
are an excellent starting point to create your own configurations.
Configuration is written in [TOML](https://toml.io/en/). There are multiple sections that may have additional subsections.
- `[traffic]` contains one or more subsections that each define client behaviour, including protocol
- `[charts]` is an optional section which can contain subsections that define charts that should be automatically plotted
- `[defaults.traffic]` is an optional section that makes it possible to specify defaults shared by all traffic senders
## [traffic] section
You can define one or more traffic senders with specific client behaviour. Every traffic sender has a name and may have multiple parameters. At the very least, each traffic sender must define `protocol`.
This is an example of minimal configuration file sending all traffic as DNS-over-TLS using defaults for everything. The name of the traffic sender here is "DoT".
```
[traffic]
[traffic.DoT]
protocol = "dot"
```
The following configuration parameters for traffic senders are supported.
### protocol
- `udp`: DNS over UDP
- `tcp`: DNS over TCP
- `dot`: DNS over TLS over TCP
- `doh`: DNS over HTTP/2 over TLS over TCP
### weight
When multiple traffic senders are defined, weight affects the client
distribution between them. Weight is relative to the sum of all weights.
Integer or float. Defaults to 1.
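For example, a minimal sketch of two traffic senders whose clients are split 3:1 between UDP and DoT (the sender names and weights are illustrative):

```
[traffic]
[traffic.UDP]
protocol = "udp"
weight = 3    # receives 3/4 of the clients

[traffic.DoT]
protocol = "dot"
weight = 1    # receives 1/4 of the clients
```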
### idle_time_s
Determines how long clients keep the connection in an idle state, i.e. leave it
established after they have received all answers and currently have no more
queries to send. Idle time of 0 means the client will close the connection as
soon as possible.
Integer. Defaults to 10 seconds.
### gnutls_priority
[GnuTLS priority string](https://gnutls.org/manual/html_node/Priority-Strings.html)
which can be used to select TLS protocol version and features, for example:
```
gnutls_priority = "NORMAL:%NO_TICKETS" # don't use TLS Session Resumption
gnutls_priority = "NORMAL:-VERS-ALL:+VERS-TLS1.3" # only use TLS 1.3
```
String. Defaults to `NORMAL`, whose exact meaning is determined by the system's GnuTLS library.
### http_method
- `GET`
- `POST`
### timeout_s
Individual query timeout in seconds.
Integer. Defaults to 2 seconds.
!!! warning
Increasing the query timeout can negatively impact DNS Shotgun's
performance and is not recommended.
### handshake_timeout_s
Timeout for establishing a connection in seconds.
Integer. Defaults to 5 seconds.
### Advanced settings
You shouldn't use these unless you need to.
- `cpu_factor`: override the default CPU thread distribution (UDP: 1, TCP: 2, DoT/DoH: 3)
- `max_clients`: number of clients each dnssim instance can hold (per-thread setting)
- `channel_size`: number of queries that can be buffered before the thread starts to block
- `batch_size`: number of queries processed in each loop
### CLI overrides
The following options can be used to override the CLI options for `replay.py`.
Values in the configuration file always take precedence over CLI options (an example follows the list).
- `server`: target server's IPv4/IPv6 address
- `dns_port`: target server's port for plain DNS (UDP and TCP)
- `dot_port`: target server's port for DNS-over-TLS
- `doh_port`: target server's port for DNS-over-HTTPS
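For example, a minimal sketch that pins the target server and a non-standard DoT port in the configuration itself, so the corresponding CLI options don't need to be passed (the address and port values are illustrative):

```
[traffic]
[traffic.DoT]
protocol = "dot"
server = "::1"      # overrides the server address given on the CLI
dot_port = 8853     # non-standard DNS-over-TLS port
```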
## [charts] section
This section is optional and is only provided as a convenience to automate
plotting charts after the test. Anything defined in this section can be
achieved by using the plotting scripts directly.
Similarly to the `[traffic]` section, it also contains named subsections. Every
such subsection must contain `type`, which determines the chart that should be
plotted. For example:
```
[charts]
[charts.response-rate]
type = "response-rate"
```
### type
Type determines which chart will be plotted. The following charts are supported:
- `response-rate`: [Response Rate Chart](response-rate-chart.md)
- `latency`: [Latency Histogram](latency-histogram.md)
- `connections`: [Connection Chart](connection-chart.md)
### title
Title of the chart.
### output
Output filename for the chart. Various file extensions can be used. Defaults to using svg.
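For example, a small sketch of a chart subsection using the keys documented above (the title and filename are illustrative):

```
[charts]
[charts.latency]
type = "latency"
title = "Client Latency"
output = "latency.png"   # plot a PNG instead of the default svg
```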
### Other parameters
These depend on the specific chart type. Generally, any option that can be
passed directly to the plotting scripts can also be specified in the config.
Refer to the tools' `--help` for possible options.
## [defaults] section
### [defaults.traffic] section
This section can provide defaults for all traffic senders. If a specific
traffic sender re-defines the same parameter, the traffic sender-specific value
takes precedence over the default value.
Any parameter that can be specified for traffic senders in `[traffic]` section
can also be specified in this section. For example, to override the default
behavior to not use TLS Session Resumption, you can use:
```
[defaults]
[defaults.traffic]
gnutls_priority = "NORMAL:%NO_TICKETS"
```
# Configuration Presets
You can either use a configuration preset or create your own configuration. It
is possible to replay the original traffic over various different protocols
with different client behaviours simultaneously. For example, you can split
your traffic into 60 % UDP, 20 % DoT and 20 % DoH.
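As a sketch, such a 60/20/20 split could look like the following configuration; the sender names are arbitrary and the parameters are described on the Configuration File page:

```
[traffic]
[traffic.UDP]
protocol = "udp"
weight = 6

[traffic.DoT]
protocol = "dot"
weight = 2

[traffic.DoH]
protocol = "doh"
weight = 2
```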
The following use-cases are predefined for convenience, without the need to
create a configuration file. You can pass these values instead of a filepath
to the `-c/--config` option of the `replay.py` utility.
- `udp`
- 100 % DNS-over-UDP clients
- `tcp`
- 100 % well-behaved DNS-over-TCP clients
- `dot`
- 100 % well-behaved DNS-over-TLS clients using TLS Session Resumption
- `doh`
- 50 % well-behaved DNS-over-HTTPS GET clients using TLS Session Resumption
- 50 % well-behaved DNS-over-HTTPS POST clients using TLS Session Resumption
- `mixed`
- 60 % DNS-over-UDP clients
- 5 % well-behaved DNS-over-TCP clients
- 5 % aggressive DNS-over-TCP clients
- 10 % well-behaved DNS-over-TLS clients using TLS Session Resumption
- 5 % well-behaved DNS-over-TLS clients without TLS Session Resumption
- 10 % well-behaved DNS-over-HTTPS GET clients using TLS Session Resumption
- 5 % well-behaved DNS-over-HTTPS POST clients using TLS Session Resumption
!!! note
You can find configuration files for presets in
[`config/`](https://gitlab.nic.cz/knot/shotgun/-/tree/master/config). They
are an excellent starting point to create your own configurations.
# Connection Chart
The connection chart can be used to visualize connection-related information,
such as the number of active established connections, handshake attempts,
successful TLS Session Resumptions or failed handshakes.
```
$ tools/plot-connections.py -k active -- DoT.json
$ tools/plot-connections.py -k tcp_hs tls_resumed failed_hs -t "Handshakes over Time" DoT.json
```
The optional parameter `-k/--kind` can be used to select which data should be
plotted. The following values are supported.
- `active` means the number of currently active established connections
- `tcp_hs` means the number of TCP handshake attempts in the last second
- `failed_hs` means the number of failed handshakes. All kinds of connection
setup failures will be included, whether it's TCP handshake timeout, TLS
negotiation failure or anything else.
- `tls_resumed` means the number of connections that were resumed with TLS
Session Resumption during the last second
!!! tip
Using the `--` to separate a list of JSON files after specifying
`-k/--kind` might be needed in some cases.
![connections](img/connections.png)
![handshakes](img/handshakes.png)
# Extracting Clients
Once you have the `filtered.pcap` with DNS queries from clients, you can
process them into *pellets* - the pre-processed input files for DNS Shotgun.
All the content of these files will be used during the replay stage - all
clients for the entire duration of the file.
The following example takes the entire `filtered.pcap` and transforms it into
pellets. The pellets file will contain all the clients and it will have the
same duration as the original file.
```
$ pcap/extract-clients.lua -r filtered.pcap -O $OUTPUT_DIR
```
The produced pellets file is ready to be used as the input for DNS Shotgun
replay.
## Splitting original capture into multiple pellets files
It can be useful to have a long original capture file, which contains more
clients and queries. However, since the pellets file will be replayed in its
entirety, you may want to split the original file into multiple pellets files
with shorter duration.
For example, if your initial capture file is 30 minutes long, you could split
it into fifteen two-minute pellets files with the `-d/--duration` option.
```
$ pcap/extract-clients.lua -r filtered.pcap -O $OUTPUT_DIR -d 120
```
!!! tip
It is useful to keep a collection of these original pellets files of the same
duration. They can later be combined to create different test cases.
## Scaling-up the traffic
If you want to stress-test your infrastructure, you can combine these pellets
files together to effectively scale-up the traffic. The pellets files are
created in such a way that you can simply use the `mergecap` utility to combine them.
```
$ mergecap -w scaled.pcap $OUTPUT_DIR/*
```
## Limiting the traffic
It is also possible to take a pellets file and scale-down its traffic. This is
done on a per-client basis. Either a client's entire query stream will be
present, or the client won't be present at all.
To limit the overall traffic, you can select the portion of the clients that
should be included. This can range from 0 to 1. For example, let's suppose we
want to scale-down the number of clients in the pellets file to 30 %.
```
$ pcap/limit-clients.lua -r pellets.pcap -w limited.pcap -l 0.3
```
docs/img/clients.png (118 KiB)
docs/img/connections.png (30.4 KiB)
docs/img/handshakes.png (42.4 KiB)
docs/img/latency.png (109 KiB)
docs/img/response-rate.png (41.9 KiB)
# DNS Shotgun
Realistic DNS benchmarking tool which supports multiple transport protocols:
- **DNS-over-TLS (DoT)**
- **DNS-over-HTTPS (DoH)**
- UDP
- TCP
*DNS Shotgun is capable of simulating hundreds of thousands of DoT/DoH
clients.*
Every client establishes its own connection(s) when communicating over
TCP-based protocol. This makes the tool uniquely suited for realistic DoT/DoH
benchmarks since its traffic patterns are very similar to real clients.
DNS Shotgun exports a number of statistics, such as query latencies, number of
handshakes and connections, response rate, response codes etc. in JSON format.
The toolchain also provides scripts that can plot these into readable charts.
## Features
- Supports DNS over UDP, TCP, TLS and HTTP/2
- Allows mixed-protocol simultaneous benchmark/testing
- Can bind to multiple source IP addresses
- Customizable client behaviour (idle time, TLS versions, HTTP method, ...)
- Replays captured queries over selected protocol(s) while keeping original timing
- Suitable for high-performance realistic benchmarks
- Tools to plot charts from output data to evaluate results
## Caveats
- Requires captured traffic from clients
- Setup for proper benchmarks can be quite complex
- Isn't suitable for testing with a very low number of clients/queries
- Backward compatibility between versions isn't kept
## Code Repository
[https://gitlab.nic.cz/knot/shotgun](https://gitlab.nic.cz/knot/shotgun)
# Installation
There are two options for using DNS Shotgun. You can either install the
dependencies and use the scripts from the repository directly, or use a
pre-built docker image.
## Using script directly
You can use the toolchain scripts directly from the git repository. You need to
ensure you have the required dependencies installed. Also make sure to check
out a tagged version, as development happens in the master branch.
```
$ git clone https://gitlab.nic.cz/knot/shotgun.git
$ cd shotgun
$ git checkout v20210203
```
### Dependencies
When using the scripts directly, the following dependencies are needed. If you
only wish to process shotgun JSON output (e.g. plot charts), then dnsjit isn't
required.
- [dnsjit](https://github.com/DNS-OARC/dnsjit): Can be installed from [DNS-OARC
repositories](https://dev.dns-oarc.net/packages/).
- Python 3.6 or later
- Python dependencies from [requirements.txt](https://gitlab.nic.cz/knot/shotgun/-/blob/master/requirements.txt)
- (optional) tshark/wireshark for some PCAP pre-processing
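For example, the Python dependencies can typically be installed with pip from inside the cloned repository (a sketch; using a virtual environment is optional):

```
$ pip3 install -r requirements.txt
```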
## Docker Image
A pre-built image can be obtained from the [CZ.NIC DNS Shotgun
Registry](https://gitlab.nic.cz/knot/shotgun/container_registry/65).
```
$ docker pull registry.nic.cz/knot/shotgun:v20210203
```
Alternatively, you can build the image yourself from the Dockerfile in the repository.
### Docker Usage
- Make sure to run with `--network host`.
- Mount input/output directories and files with `-v/--volume`.
- Using `--privileged` might slightly improve performance if you don't mind the security risk.
```
$ docker run \
--network host \
-v "$PWD:/mnt" \
registry.nic.cz/knot/shotgun:v20210203 \
$COMMAND
```
# Key Concepts
DNS Shotgun is capable of simulating real client behaviour by replaying
captured traffic over selected protocol(s). The timing of original queries as
well as their content is kept intact.
Realistic high-performance benchmarking requires complex setup, especially for
TCP-based protocols. However, the authors of this tool have successfully used it
to benchmark and test various DNS implementations with up to hundreds of
thousands of clients (meaning _connections_ for TCP-based transports) using
commodity hardware. This requires [performance tuning](performance-tuning.md),
which is described in a later section.
## Client
These docs often mention a "*client*", and we use the number of clients to
describe DNS infrastructure throughput in addition to queries per second (QPS).
What is considered a client and why does it matter?
A client is the origin of one or more queries and it is supposed to represent a
single device, i.e. anything from a CPE such as home/office router to a mobile
device. Since traffic patterns of various devices can vary greatly, it is
crucial to use traffic that most accurately represents your real clients.
In plain DNS sent over UDP the concept of client doesn't matter, since UDP is a
stateless protocol and a packet is just a packet. Thus, QPS throughput may be
a sufficient metric for UDP.
In stateful DNS protocols, such as DoT, DoH or TCP, much of the overhead and
performance cost is caused by establishing the connection over which queries
are subsequently sent. Therefore, the concept of client becomes crucial for
benchmarking stateful protocols.
!!! note
As an extreme example, consider 10k QPS sent over a single DoH connection
versus establishing 10k DoH connections, each with 1 QPS. While both
scenarios have the same overall QPS, the second one will consume vastly more
resources, especially when establishing the connections.
### Client replay guarantees
DNS Shotgun aims to provide the most realistic client behaviour when replaying
the traffic. When you run DNS Shotgun, there are the following guarantees when
using a stateful protocol.
- **Multiple clients never share a single connection.**
- **Each client attempts to establish at least one connection.**
- **A client may have zero, one or more (rarely) active established connections
at any time**, depending on its traffic and behavior.
## Real traffic
A key focus of this toolchain is to make the benchmarks as realistic as
possible. Therefore, no synthetic queries or clients are generated. To
effectively use this tool, you need to have a large amount of source PCAPs.
Ideally, these contain the traffic from your own network.
!!! note
In case you'd prefer to use synthetic clients/queries anyway, you can just
generate the traffic and capture it in a PCAP for further processing. Doing that
is outside the scope of this documentation.
### Traffic replay guarantees
- **Content of DNS messages is left intact.** Messages without proper DNS header
or question section will be discarded.
- **Timing of the DNS messages is kept as close to the original traffic as
possible.** If the tool detects a time skew larger than one second, it aborts the
test. However, the real time difference may be slightly longer due to various
buffers.
# Latency Histogram
This very useful chart is a bit difficult to read and understand, but it
provides a great deal of information about the overall latency from the client-side
perspective. We use the logarithmic percentile histogram to display this data.
[This
article](https://blog.powerdns.com/2017/11/02/dns-performance-metrics-the-logarithmic-percentile-histogram/)
provides an in-depth explanation about the chart and how to interpret it.
```
$ tools/plot-latency.py -t "DNS Latency Overhead" UDP.json TCP.json DoT.json DoH.json
```
![latency overhead](img/latency.png)
The chart above illustrates why comparing just the response rate isn't a
sufficient metric. For all protocols compared in this case, you'd get around
99.5 % response rate. However, when you examine the client latency, you can see
clear differences.
In the chart, 80 % of all queries are represented by the rightmost part of the
chart - between the "slowest percentile" of 20 % and 100 %. For these
queries, the latency for UDP, TCP, DoT or DoH is the same, which is one
round trip. These represent immediate answers from the resolver (e.g. cached or
refused), which are sent either over UDP or over an already established
connection (for stateful protocols). The latency is 10 ms, or 1 RTT.
The most interesting part is between the 5 % and 20 % slowest percentile. For
these 15 % of all queries, there are major differences between the latency of
UDP, TCP and DoT/DoH. This illustrates the latency cost of setting up a
connection where none is present. UDP is stateless and requires just 1 RTT. TCP
requires an extra round trip to establish the connection and the latency for the
client becomes 2 RTTs. Finally, both DoT and DoH require an additional round
trip for the TLS handshake and thus the overall latency cost becomes 3 RTTs.
The trailing 5 % of queries show no difference between protocols, since these
are queries that aren't answered from cache and the delay is introduced by the
communication between the resolver and the upstream servers. The last 0.5 % of
queries aren't answered by the resolver within 2 seconds and are considered a
timeout by the client.
# Performance Tuning
Any high-performance benchmark setup requires a separate server for generating
traffic, which is then sent to the target server under test. In order to scale
up DNS Shotgun so that it performs well under heavy load, some performance
tuning and network adjustments are needed.
!!! tip
An example of performance tuning we use in our benchmarks can be found in
our [ansible
role](https://gitlab.nic.cz/knot/resolver-benchmarking/-/tree/master/roles/tuning).
## Number of file descriptors
Make sure the number of available file descriptors is sufficient. Raising the
limit is typically necessary when running DNS Shotgun from a terminal. When
using docker, the defaults are usually sufficient.
```
$ ulimit -n 1000000
```
## Ephemeral port range
Extending the ephemeral port range gives the tool more outgoing ports to work with.
```
$ sysctl -w net.ipv4.ip_local_port_range="1025 60999"
```
## NIC queues
High-end network cards typically have multiple queues. Ideally, you want to set
their number to be the same as the number of available CPUs.
```
$ ethtool -L $INTERFACE combined $NCPU
```
!!! note
It's important that the NIC interrupts from different queues are handled
by different CPUs. If there are throughput issues, you may want to verify
this is the case.
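One quick way to check this (a sketch; replace the interface name with your own) is to look at `/proc/interrupts` and confirm that the per-queue interrupt counters are growing on different CPUs:

```
$ grep $INTERFACE /proc/interrupts
```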
## UDP
DNS Shotgun can generate quite bursty traffic. Increasing the receiving
server's socket memory can help to absorb these bursts. If this buffer isn't
sufficient, packets may be lost.
```
$ sysctl -w net.core.rmem_default="8192000"
```
## TCP, DoT, DoH
Tuning the network stack for TCP isn't as straightforward and it's network-card
specific. It's best to refer to [kernel
documentation](https://www.kernel.org/doc/html/latest/networking/device_drivers/ethernet/intel/ixgb.html#improving-performance)
for your specific network card.
## conntrack
For our benchmarks, we don't use iptables or any firewall. In particular, the
`conntrack` module probably won't be able to handle serious load. Make sure the
conntrack module isn't loaded by the kernel if you're not using it.
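For example, a hedged sketch of checking for and unloading the module, assuming it is exposed as `nf_conntrack` on your kernel (unloading fails if the module is still in use):

```
$ lsmod | grep conntrack        # check whether conntrack modules are loaded
$ modprobe -r nf_conntrack      # unload the module if it isn't needed
```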
# Raw Output
In the output directory of DNS Shotgun's `replay.py` tool, the following
structure is created. Let's assume we ran a configuration that defines two
traffic senders - `DoT` and `DoH`.
```
$OUTDIR
├── .config # ignore this directory
│ └── luaconfig.lua # for debugging purposes only
├── data # directory with raw JSON output
│ ├── DoH # "DoH" traffic sender data
│ │ ├── DoH-01.json # raw data from first thread of DoH traffic sender
│ │ ├── DoH-02.json # raw data from second thread of DoH traffic sender
│ │ └── ... # raw data from other threads of DoH traffic sender
│ ├── DoH.json # merged raw data from all DoH sender threads
│ ├── DoT # "DoT" traffic sender data
│ │ ├── DoT-01.json # raw data from first thread of DoT traffic sender
│ │ ├── DoT-02.json # raw data from second thread of DoT traffic sender
│ │ └── ... # raw data from other threads of DoT traffic sender
│ └── DoT.json # merged raw data from all DoT sender threads
└── charts # directory with automatically plotted charts (if configured)
├── latency.svg # chart comparing latency of DoT and DoH clients
└── response-rate.svg # chart comparing the response rate of DoT and DoH clients
```
## data directory
This directory contains the raw JSON data. Since DNS Shotgun typically operates
with multiple threads, the results for each traffic sender are also provided
per thread. However, since you typically don't care about how the clients were
split among threads, but only about their aggregate behaviour, a data file that contains
the combined results of all threads belonging to the configured traffic sender
is also provided.
Every configured traffic sender will have its own output directory of the same
name. Inside, per-thread raw data are available. The aggregate file is directly
in the `data/` directory as a JSON file with the name of the configured traffic
sender. The aggregate file is the one you typically want to use.
!!! note
The raw JSON file is versioned and is not intended to be forward or
backward compatible with various DNS Shotgun versions. You should use the
same version of the toolchain for both replay and interpreting the data.
!!! tip
If you wish to explore, format or interpret the raw JSON data,
[jq](https://stedolan.github.io/jq/) utility can be useful for some
rudimentary processing.
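For example, a minimal sketch listing the top-level keys of an aggregate file from the layout above (the exact fields depend on the DNS Shotgun version):

```
$ jq 'keys' data/DoT.json
```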
## charts directory
This directory may not be present if you didn't configure any charts to be
automatically plotted in the configuration file. If it is available, it
contains the plotted charts that are described in the following sections.
When charts are plotted automatically, they always display data for all the
configured traffic senders with their predefined names. If you wish to customize
the charts, omit certain senders etc., you can use the plotting scripts
directly from the CLI. These can be found in the `tools/` directory and you can
refer to their `--help` for usage.
# Replaying Traffic
Once you've prepared the input pellets file with clients and either have your
own configuration file or know which preset you want to use, you can use the
following script to run DNS Shotgun.
```
$ replay.py -r pellets.pcap -c udp -s ::1
```
!!! tip
Use the `--help` option to explore other options.
During the replay, there is quite a bit of logging information that looks like
this.
```
UDP-01 notice: total processed: 267; answers: 0; discarded: 2; ongoing: 172
```
The important thing to look out for is the number of `discarded` packets. If
nearly all the packets, or a large portion of them, are discarded, it almost
certainly indicates improper setup or input data. The test should be
aborted and the reason investigated. Increasing the `-v/--verbosity`
level might help.
## Binding to multiple source addresses
When sending traffic against a single IP/port combination of the target server,
the source IP address has a limited number of ports it can utilize. A single
IP address is insufficient to achieve hundreds of thousands of clients.
DNS Shotgun can bind to multiple source addresses with the `-b/--bind-net`
option. You can specify either an IP address or a network range using CIDR
notation. Multiple values (either IPs, ranges or any combination of those) can
be specified. When using CIDR notation, the network and broadcast address won't
be used.
```
$ replay.py -r pellets.pcap -c tcp -s fd00:dead:beef::cafe -b fd00:dead:beef::/124
```
!!! tip
Our rule of thumb is to use at least one source IP address per every 30k
clients. However, using more addresses is certainly better and can help to
avoid weird behaviour, slow performance and other issues that require
in-depth troubleshooting.
!!! note
If you're limited by the number of source addresses you can use, utilizing
either IPv6 unique-local addresses (fd00::/8) or private IPv4 ranges could
be helpful.
## Emulating link latency
!!! warning
This is an advanced topic and emulating latency isn't necessary for many
scenarios.
Overall latency will affect the user's experience with DNS resolution. It also
becomes much more relevant when using TCP and TLS, since the handshakes
introduce additional round trips. When benchmarks are done in the data center
with two servers that are directly connected to each other with practically no
latency, it can provide a skewed view of the expected end user latency.
Luckily, the `netem` Network Emulator makes it very simple to emulate various
network conditions. For example, emulating latency on the sender side can be
done quite easily. The following command adds 10 ms latency to outgoing
packets, effectively simulating an RTT of 10 ms.
```
$ tc qdisc add dev $INTERFACE root netem limit 10000000 delay 10ms
```
!!! tip
For more possibilities, refer to `man netem.8`. Using a sufficiently large
buffer (limit) is essential for proper operation.
However, beware that the settings affect the entire interface. If you're going
to emulate latency, it's best if the resolver-client traffic is on a separate
interface, so the resolver-upstream traffic isn't negatively impacted.
# Response Rate Chart
This basic chart can display the overall response rate over time. It is also
possible to plot the rate of specific response codes, such as `NOERROR`.
```
$ tools/plot-response-rate.py -r 0 -o rr.png UDP.json
```
!!! tip
The image format depends on the output filename extension chosen with
`-o/--output`. `svg` is used by default, but other formats such as `png`
are supported as well.
The following chart displays the answer rate and the rate of `NOERROR` answers.
In this measurement, the resolver was started with a cold cache. We can see the
overall response rate is close to 100 %. The `NOERROR` response rate slightly
increases over time from 72 % to around 75 % as the cache warms up.
![UDP response rate](img/response-rate.png)