# Capture data compression

## Data rates for various storage

Data source: `akuma.20150106.145000.018146`

Sizes are in kB, with bytes per query in parentheses.

| storage | raw | lz4 -0 | lz4 -4 | lz4 -9 | xz -6 |
| --- | --- | --- | --- | --- | --- |
| pcap (req+resp/q) | 18132 (491/q) | 7889 (213/q) | 6514 (176/q) | 6386 (173/q) | 3093 (84/q) |
| pcap (requests only/q) | 3354 (91/q) | 1633 (44/q) | 1290 (35/q) | 1260 (34/q) | 873 (24/q) |
| csv (req+resp*/q) | 3468 (94/q) | 1668 (45/q) | 1224 (33/q) | 1120 (30/q) | 749 (20/q) |
| protobuf (req+resp*/q) | 2408 (65/q) | 1392 (38/q) | 1084 (29/q) | 944 (26/q) | 770 (21/q) |

Note: `req+resp*` records contain the following fields: `flags client_addr client_port id qname qtype qclass request_time_us request_flags response_time_us response_flags`

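
The table is easier to compare against the percentages used elsewhere in these notes when every cell is expressed relative to the raw req+resp pcap. A minimal sketch of that arithmetic follows; the sizes are copied from the table, while the names `SIZES_KB` and `percent_of_raw_pcap` are only illustrative.

```python
# Sizes copied from the table above, in column order.
COLUMNS = ["raw", "lz4 -0", "lz4 -4", "lz4 -9", "xz -6"]
SIZES_KB = {
    "pcap (req+resp)": [18132, 7889, 6514, 6386, 3093],
    "pcap (requests only)": [3354, 1633, 1290, 1260, 873],
    "csv (req+resp*)": [3468, 1668, 1224, 1120, 749],
    "protobuf (req+resp*)": [2408, 1392, 1084, 944, 770],
}

# Baseline: the uncompressed pcap with requests and responses.
BASELINE_KB = SIZES_KB["pcap (req+resp)"][0]


def percent_of_raw_pcap(size_kb):
    """Return a size as a percentage of the raw req+resp pcap."""
    return 100.0 * size_kb / BASELINE_KB


for storage, sizes in SIZES_KB.items():
    cells = ", ".join(
        f"{col}: {percent_of_raw_pcap(size):.1f}%"
        for col, size in zip(COLUMNS, sizes)
    )
    print(f"{storage}: {cells}")
```

The relative sizes do not depend on the unit of the table, so the sketch works whether the sizes are read as kB or KiB.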
### Postgres with CSV data

(Hex 16-bit numbers are stored inefficiently as TEXT.)

Sizes are given as a percentage of the raw pcap, followed by the absolute size and bytes per query:

* plain Postgres: 21% (3900 kB, 107 B/q)
* with cstore_fdw, no compression: 20.0% (3716 kB, 102 B/q)
* with cstore_fdw, 'pglz' compression: 6.0% (1120 kB, 30 B/q)

Commands:

```sql
-- psql meta-command: report the elapsed time of every statement
\timing

CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

-- One plain table and two cstore_fdw tables with identical columns
CREATE TABLE dns_queries ( ts TEXT, addr INET, n1 INTEGER, n2 TEXT, qname TEXT, q1 TEXT, q2 TEXT );
CREATE FOREIGN TABLE dns_queries_none ( ts TEXT, addr INET, n1 INTEGER, n2 TEXT, qname TEXT, q1 TEXT, q2 TEXT ) SERVER cstore_server OPTIONS(compression 'none');
CREATE FOREIGN TABLE dns_queries_pglz ( ts TEXT, addr INET, n1 INTEGER, n2 TEXT, qname TEXT, q1 TEXT, q2 TEXT ) SERVER cstore_server OPTIONS(compression 'pglz');

-- Load the same CSV file into each table
COPY dns_queries FROM '/home/gavento/nic/data-akuma/akuma.20150106.145000.018146-all.csv' WITH DELIMITER '|' CSV;
-- Time: 130.634 ms
COPY dns_queries_none FROM '/home/gavento/nic/data-akuma/akuma.20150106.145000.018146-all.csv' WITH DELIMITER '|' CSV;
-- Time: 241.682 ms
COPY dns_queries_pglz FROM '/home/gavento/nic/data-akuma/akuma.20150106.145000.018146-all.csv' WITH DELIMITER '|' CSV;
-- Time: 385.194 ms

-- Run the same aggregation on each table, discarding the output
COPY (SELECT (count(*), min(ts), max(ts), qname) FROM dns_queries GROUP BY qname) TO '/dev/null' WITH BINARY;
-- Time: 75-85 ms
COPY (SELECT (count(*), min(ts), max(ts), qname) FROM dns_queries_none GROUP BY qname) TO '/dev/null' WITH BINARY;
-- Time: 80-90 ms
COPY (SELECT (count(*), min(ts), max(ts), qname) FROM dns_queries_pglz GROUP BY qname) TO '/dev/null' WITH BINARY;
-- Time: 80-90 ms

-- Clean up
DROP TABLE dns_queries;
DROP FOREIGN TABLE dns_queries_none;
DROP FOREIGN TABLE dns_queries_pglz;

DROP SERVER cstore_server;
DROP EXTENSION cstore_fdw;
```

## Pcap compression

Small sample from akuma (`akuma.20150106.145000.018146`, approx. 20 MB). The speeds are not reliable and are only meaningful relative to each other: tested on an X220 (i5, 2.5 GHz), measuring user time on a cached file, with output piped to `wc` (not written to disk).

| algorithm | compressed size | speed (MB/s) |
| --- | --- | --- |
| pcap | 100% | - |
| snappy | 46% | 25 |
| lz4 -1 | 43% | 23 |
| lz4 -9 | 35% | 13 |
| gz -1 | 41% | 12 |
| gz -9 | 37% | 5.3 |
| bzip2 -1 | 39% | 4.7 |
| bzip2 -9 | 29% | 4.6 |
| xz -1 | 22% | 7.5 |
| xz -6 | 17% | 1.6 |
| xz -9 | 16% | 1.6 |

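
A comparable measurement can be sketched in Python with the standard library only. This is not the procedure used for the table above (which ran the command-line tools); it is a rough equivalent under stated assumptions: `zlib` stands in for `gz`, snappy and lz4 are omitted because they need third-party packages, and `time.process_time()` counts user plus system CPU time rather than user time alone. The names `CODECS` and `benchmark` are illustrative.

```python
import bz2
import lzma
import time
import zlib

# Stdlib codecs only; zlib uses the same deflate algorithm as gzip,
# so its ratios should be close to (not identical with) "gz -N".
CODECS = {
    "gz -1": lambda data: zlib.compress(data, 1),
    "gz -9": lambda data: zlib.compress(data, 9),
    "bzip2 -1": lambda data: bz2.compress(data, 1),
    "bzip2 -9": lambda data: bz2.compress(data, 9),
    "xz -1": lambda data: lzma.compress(data, preset=1),
    "xz -6": lambda data: lzma.compress(data, preset=6),
}


def benchmark(data):
    """Return {codec: (compressed size in % of input, input MB per CPU second)}."""
    results = {}
    for name, compress in CODECS.items():
        start = time.process_time()
        packed = compress(data)
        elapsed = time.process_time() - start
        ratio = 100.0 * len(packed) / len(data)
        # Very small inputs can finish below the clock resolution.
        speed = len(data) / 1e6 / elapsed if elapsed > 0 else float("inf")
        results[name] = (ratio, speed)
    return results
```

Reading a capture into memory with `open(path, "rb").read()` and passing the bytes to `benchmark` yields one `(ratio, speed)` pair per codec. The data is compressed in memory, so disk I/O is excluded, much like the cached-file setup described above.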
# Numerical estimates (throughput, volumes, RAM, ...)

## DNS volume estimates

### DNS data from nic.cz:

* [CZ.NIC server stats](https://dsc.nic.cz/?window=86400&server=all&plot=qtype)
* Total requests: 20-25k q/s, [last month](https://dsc.nic.cz/?server=all&plot=qtype&window=2000480&binsize=60), [one peak](https://dsc.nic.cz/?plot=qtype&window=3600&binsize=60&server=all&end=1450795156)
* Requests per server: relatively balanced (500-2000 q/s, peak 3000 q/s)
* QNAME length: 10-20 characters, average seems to be 15, [last month](https://dsc.nic.cz/?window=2004800&server=all&plot=qtype_vs_qnamelen)
* QNAME entropy: the English alphabet has [4-5 bits/char](http://people.seas.harvard.edu/~jones/cscie129/papers/stanford_info_paper/entropy_of_english_9.htm), Czech has approx. 5 bits/char; what is the entropy of DNS names?
* Request packet length: 45-90 bytes (20 or 40 bytes IPv4/6 header, 8/20 bytes UDP/TCP header, 12 bytes DNS header, approx. 10-20 bytes QNAME) [protocol headers ref](http://www.networksorcery.com/enp/)
* Reply packet length: 300-500 bytes, average <400, [last month](https://dsc.nic.cz/?yaxis=percent&plot=rcode_vs_replylen&window=2004800&binsize=60&server=all)
* IPv4 vs IPv6: [currently](https://dsc.nic.cz/?server=all&binsize=60&plot=dns_ip_version_vs_qtype&window=604800) 82%/18%
* TCP vs UDP: [currently](https://dsc.nic.cz/?window=604800&yaxis=percent&binsize=60&server=all&plot=direction_vs_ipproto) <2% TCP, >98% UDP
* Stored info per packet: ~12 bytes without IP/port, at most 30 bytes total
    * IP (4 or 16 bytes), port (2 bytes)
    * DNS header (2 bytes code, ~2 bytes flags, ~0 bytes counts)
    * QNAME (15*4.5/8 = 8 bytes compressed)
    * Reply status (~0.5 bytes)

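
The per-packet estimate above is a sum of a few terms, sketched here so the inputs can be varied. The defaults (15 characters, 4.5 bits/char, the header and status sizes) are the figures from the list; the function names are illustrative only.

```python
def qname_bytes(avg_chars=15, bits_per_char=4.5):
    """Entropy-based estimate of a compressed QNAME, in bytes."""
    return avg_chars * bits_per_char / 8


def record_bytes(ip_bytes=0, port_bytes=0):
    """Estimated stored size of one query record, in bytes."""
    dns_header = 2 + 2 + 0   # code, flags, counts (figures from the list above)
    reply_status = 0.5
    return ip_bytes + port_bytes + dns_header + qname_bytes() + reply_status


print(f"QNAME:               {qname_bytes():.2f} B")
print(f"without IP/port:     {record_bytes():.2f} B")
print(f"IPv4 + port:         {record_bytes(4, 2):.2f} B")
print(f"IPv6 + port (worst): {record_bytes(16, 2):.2f} B")
```

Without rounding, the QNAME term is 8.4375 bytes, so the totals come out slightly above the rounded "~12" and "30" used in the notes.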
### Consider nic.cz with average load (recording IP addrs)

* One server: 2k q/s, 8 Mbit/s to process (incl. replies), 60 kB/s to store, approx. 160 GB/month
* All servers: 20k q/s, 600 kB/s to store, approx. 1.6 TB/month

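
These load figures follow from the query rate and the per-packet sizes estimated earlier. The sketch below reproduces the arithmetic under explicit assumptions: 30 bytes stored per query (the upper bound with IP addrs), 70-byte requests and 400-byte replies (values picked from within the ranges quoted above), and a 30-day month. All names are illustrative.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month


def store_rate_bytes_per_s(queries_per_s, record_bytes=30):
    """Bytes per second written to storage."""
    return queries_per_s * record_bytes


def process_rate_mbit_per_s(queries_per_s, request_bytes=70, reply_bytes=400):
    """Traffic to inspect, requests plus replies, in Mbit/s."""
    return queries_per_s * (request_bytes + reply_bytes) * 8 / 1e6


def monthly_volume_gb(bytes_per_s):
    """Stored volume per month in GB (10^9 bytes)."""
    return bytes_per_s * SECONDS_PER_MONTH / 1e9


one_server = store_rate_bytes_per_s(2000)
print(f"store:   {one_server / 1e3:.0f} kB/s")
print(f"process: {process_rate_mbit_per_s(2000):.1f} Mbit/s")
print(f"volume:  {monthly_volume_gb(one_server):.0f} GB/month")
```

At 60 kB/s a 30-day month comes to roughly 155 GB. Passing a different query rate to the same functions covers the other load scenarios.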
### Consider nic.cz with very high load (recording IP addrs)

* One server: 5k q/s, 20 Mbit/s to process (incl. replies), 150 kB/s to store
* All servers: 100k q/s, 3 MB/s to store

### Consider 1 Gbit/s request stream (attack, one server, ignoring replies)

* Minimum packet size is 56 bytes, so at most approx. 2M q/s
* With IP addrs and QNAME: 60 MB/s to store
* With QNAME but without IP addrs: 30 MB/s to store
* Only the basic ~5 bytes per query: 10 MB/s to store
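
The attack-scenario bound is the link rate divided by the minimum packet size. A short sketch of that calculation follows, with the caveat that the 15 bytes used for a record with QNAME but no address is an assumption back-computed from the 30 MB/s figure, and that the function names are illustrative.

```python
def max_queries_per_s(link_bits_per_s=1e9, min_packet_bytes=56):
    """Upper bound on request packets per second on a saturated link."""
    return link_bits_per_s / 8 / min_packet_bytes


def storage_mb_per_s(queries_per_s, record_bytes):
    """Storage rate in MB/s for a given stored record size."""
    return queries_per_s * record_bytes / 1e6


print(f"max rate: {max_queries_per_s() / 1e6:.2f}M q/s")

# The notes round the bound down to 2M q/s; record sizes are in bytes.
for label, size in [("IP addrs + QNAME", 30), ("QNAME only", 15), ("basic", 5)]:
    print(f"{label}: {storage_mb_per_s(2e6, size):.0f} MB/s")
```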