|
|
# Capture data compression
|
|
|
# Capture data compression and estimates
|
|
|
|
|
|
## Data rates for various storage
|
|
|
|
|
|
## Data compression libraries
|
|
|
|
|
|
Independent frames within a compressed file allow block random access to the compressed data.
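The idea of independently compressed frames plus an index can be sketched in a few lines. The following is only an illustration, not the format of any of the libraries listed here: it uses `zlib` from the Python standard library as a stand-in compressor, and the names `compress_blocks`, `read_block`, `read_range` and the 64 KiB `BLOCK_SIZE` are hypothetical choices made for this example.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # hypothetical uncompressed block size


def compress_blocks(data, block_size=BLOCK_SIZE):
    """Compress each block on its own; return (blob, index).

    index[n] is (offset, length) of block n inside the compressed blob,
    i.e. an external index kept next to the compressed data.
    """
    chunks = []
    index = []
    offset = 0
    for start in range(0, len(data), block_size):
        comp = zlib.compress(data[start:start + block_size], 1)
        index.append((offset, len(comp)))
        chunks.append(comp)
        offset += len(comp)
    return b"".join(chunks), index


def read_block(blob, index, block_no):
    """Decompress a single block without touching any other block."""
    offset, length = index[block_no]
    return zlib.decompress(blob[offset:offset + length])


def read_range(blob, index, start, length, block_size=BLOCK_SIZE):
    """Return `length` uncompressed bytes starting at uncompressed offset `start`."""
    if length <= 0:
        return b""
    first = start // block_size
    last = min((start + length - 1) // block_size, len(index) - 1)
    parts = [read_block(blob, index, n) for n in range(first, last + 1)]
    skip = start - first * block_size
    return b"".join(parts)[skip:skip + length]
```

Only the blocks that overlap the requested range are decompressed, which is the property that makes block random access cheap. The price is a somewhat worse compression ratio, because each block starts with an empty dictionary.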
|
|
|
|
|
|
Some compression is necessary; even the fastest compression smooths out the major differences between storage formats (PCAP, CSV, binary, ...).
|
|
|
|
|
|
### Fast streaming libs
|
|
|
* [lz4](https://github.com/Cyan4973/lz4/) - configurable, fast, comes with a benchmark, supports independent frames (needs an external index), has a file format and command-line tools
* [snappy](https://google.github.io/snappy/) - by Google, used in Hadoop and elsewhere, no file format
* [LZO](http://www.oberhumer.com/opensource/lzo/) - older, less active
|
|
|
|
|
|
### Other
|
|
|
* gzip - slower, also supports independent frames and an embeddable frame index (needs special compressors)
* [xz](http://tukaani.org/xz/), [file format](http://tukaani.org/xz/format.html), [liblzma git](http://git.tukaani.org/?p=xz.git;a=tree;f=src/liblzma/api;hb=HEAD) - slow, best compression, independent frames and an embedded block index, xz format and command-line tools
* for entropy encoding inspiration: [text compression comparison](http://mattmahoney.net/dc/text.html) - too slow for our purpose
|
|
|
|
|
|
## Experimental data rates for various storage
|
|
|
|
|
|
Data source: `akuma.20150106.145000.018146`
|
|
|
|
... | ... | @@ -53,26 +70,8 @@ DROP SERVER cstore_server; |
|
|
DROP EXTENSION cstore_fdw;
|
|
|
```
|
|
|
|
|
|
## Pcap compression
|
|
|
|
|
|
Small data sample from akuma (`akuma.20150106.145000.018146`, approx. 20 MB). The speeds are not reliable in absolute terms and are meaningful only relative to each other: tested on an X220 (i5, 2.5 GHz), measuring user time, with the input file cached and the output piped to `wc` (not written to disk).
|
|
|
|
|
|
| Algorithm | Compressed size (% of original) | Speed (MB/s) |
| --- | --- | --- |
| pcap | 100% | - |
| snappy | 46% | 25 |
| lz4 -1 | 43% | 23 |
| lz4 -9 | 35% | 13 |
| gz -1 | 41% | 12 |
| gz -9 | 37% | 5.3 |
| bzip2 -1 | 39% | 4.7 |
| bzip2 -9 | 29% | 4.6 |
| xz -1 | 22% | 7.5 |
| xz -6 | 17% | 1.6 |
| xz -9 | 16% | 1.6 |
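
A comparable measurement can be sketched with the Python standard library. This is not how the numbers above were obtained (those came from the command-line tools); it is a minimal sketch assuming in-memory input, and it covers only the codecs shipped with Python (`zlib` as a DEFLATE stand-in for gzip, `bz2`, `lzma` for xz), since lz4 and snappy need third-party bindings. The names `CODECS`, `measure` and `run_all` are hypothetical.

```python
import bz2
import lzma
import time
import zlib

# Hypothetical codec table; labels mirror the rows above where a stdlib codec exists.
CODECS = {
    "zlib -1": lambda d: zlib.compress(d, 1),
    "zlib -9": lambda d: zlib.compress(d, 9),
    "bzip2 -1": lambda d: bz2.compress(d, 1),
    "bzip2 -9": lambda d: bz2.compress(d, 9),
    "xz -1": lambda d: lzma.compress(d, preset=1),
    "xz -6": lambda d: lzma.compress(d, preset=6),
}


def measure(data, compress):
    """Return (compressed size in % of original, speed in MB/s of input)."""
    # process_time() counts user + system CPU time of this process,
    # which is close to, but not the same as, the user time used above.
    start = time.process_time()
    out = compress(data)
    elapsed = time.process_time() - start
    ratio = 100.0 * len(out) / len(data)
    speed = len(data) / 1e6 / elapsed if elapsed > 0 else float("inf")
    return ratio, speed


def run_all(data):
    """Measure every codec in CODECS on the same input."""
    return {name: measure(data, fn) for name, fn in CODECS.items()}


if __name__ == "__main__":
    # Synthetic stand-in input; replace with the bytes of a real capture file.
    sample = b"example.cz. IN A 192.0.2.1\n" * 200000
    for name, (ratio, speed) in run_all(sample).items():
        print(f"| {name} | {ratio:.0f}% | {speed:.1f} |")
```

As with the table, results from a single small input on one machine are only useful for comparing the codecs with each other.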
|
|
|
# Numerical estimates (throughput, volumes, RAM, ...)
|
|
|
|
|
|
## DNS volume estimates
|
|
|
|
|
|
## DNS volume estimates (older)
|
|
|
|
|
|
### DNS data from nic.cz:
|
|
|
* [CZ.NIC server stats](https://dsc.nic.cz/?window=86400&server=all&plot=qtype)
|
... | ... | |