# Technical tools/libs choices

## Parsing DNS messages in C

### libknot

* [knot gitlab](https://gitlab.labs.nic.cz/labs/knot), [packet API header](https://gitlab.labs.nic.cz/labs/knot/blob/master/src/libknot/packet/pkt.h)
* DNS library of the Knot DNS server, incl. DNSSEC, actively developed, uses libUCW
* docs are only built locally: `git clone ...; cd ...; doxygen`
* [as used in knot-resolver](https://github.com/CZ-NIC/knot-resolver/blob/master/daemon/worker.c)

### ldns

* [ldns website](http://nlnetlabs.nl/projects/ldns/), [git](http://git.nlnetlabs.nl/ldns/), [docs](http://nlnetlabs.nl/projects/ldns/doc/)
* DNS packet manipulation library, incl. DNSSEC, actively developed
* [parser usage in dnstap-ldns](https://github.com/dnstap/dnstap-ldns/blob/master/host2str.c)

### others

* wdns, resolver libs, ... - mostly a very simple parsing API
* Wireshark - based on ASN.1 grammar descriptions, probably not fast (also, ASN.1 is kinda ugly)
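
For a sense of what the simplest parsing layer looks like, here is a minimal sketch that decodes only the fixed 12-byte DNS header (RFC 1035, section 4.1.1) in plain C. It is not the libknot or ldns API; the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed 12-byte DNS message header (RFC 1035, section 4.1.1).
 * Hypothetical struct, not taken from any of the libraries above. */
struct dns_header {
    uint16_t id;
    uint16_t flags;
    uint16_t qdcount;
    uint16_t ancount;
    uint16_t nscount;
    uint16_t arcount;
};

/* Read a big-endian (network order) 16-bit value. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Parse the header; returns 0 on success, -1 if the message is too short. */
int dns_parse_header(const uint8_t *msg, size_t len, struct dns_header *out)
{
    assert(msg != NULL && out != NULL);
    if (len < 12)
        return -1;
    out->id      = read_be16(msg + 0);
    out->flags   = read_be16(msg + 2);
    out->qdcount = read_be16(msg + 4);
    out->ancount = read_be16(msg + 6);
    out->nscount = read_be16(msg + 8);
    out->arcount = read_be16(msg + 10);
    return 0;
}

/* QR is the top bit of the flags word: 0 = query, 1 = response. */
int dns_is_response(const struct dns_header *h)
{
    return (h->flags & 0x8000) != 0;
}

/* RCODE is the low 4 bits of the flags word. */
int dns_rcode(const struct dns_header *h)
{
    return h->flags & 0x000F;
}
```

Names, compression pointers, EDNS and DNSSEC records are where the real work is, which is the reason to prefer libknot or ldns over hand-written parsing.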

## Data serialization

What to consider:

* Speed
* Compactness (beware: wasteful "string" field names in encoded structs)
* Stable implementations in C (needs a good one!), Java, then JS, C++, Python
* Accepted by the community, tools
* Dynamic vs static typing (schema-less, JSON-like formats are harder to read from statically typed languages such as C, C++, Java)

(The compactness numbers below are serialized sizes in bytes from the [jvm-serializers benchmark](https://github.com/eishay/jvm-serializers/wiki).)

### CBOR

[Concise Binary Object Representation web](http://cbor.io/), [RFC 7049](http://tools.ietf.org/html/rfc7049)

* type: binary JSON (with extras), no schema, dynamic types
* Clib: [libcbor](http://libcbor.org/) nice, streaming, simple refcounting for cleanup, (allocates a lot?)
* Clib: [tinycbor](https://github.com/01org/tinycbor) no streaming, a bit too simple
* plus: versatile format (streaming arrays, tags)
* minus: field names in structs as literal strings (wasteful)
* speed: probably ok
* compact: bad, ok (386) with numbered attributes
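
To make the "literal strings vs numbered attributes" point concrete, here is a small hand-rolled sketch of the CBOR item encoding from RFC 7049 (not libcbor or tinycbor). It encodes the same one-entry map once with a text key and once with an integer key; the key name `client_port` and the key number 1 are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a CBOR item head: major type in the top 3 bits, argument after it.
 * Only arguments below 2^16 are handled in this sketch. Returns bytes written. */
size_t cbor_put_head(uint8_t *out, unsigned major, uint32_t value)
{
    assert(out != NULL && major <= 7 && value <= 0xFFFF);
    uint8_t top = (uint8_t)(major << 5);
    if (value < 24) {                 /* argument fits in the initial byte */
        out[0] = (uint8_t)(top | value);
        return 1;
    }
    if (value <= 0xFF) {              /* additional info 24: one byte follows */
        out[0] = (uint8_t)(top | 24);
        out[1] = (uint8_t)value;
        return 2;
    }
    out[0] = (uint8_t)(top | 25);     /* additional info 25: two bytes follow */
    out[1] = (uint8_t)(value >> 8);
    out[2] = (uint8_t)(value & 0xFF);
    return 3;
}

/* Major type 3: UTF-8 text string. */
size_t cbor_put_text(uint8_t *out, const char *s)
{
    size_t n = strlen(s);
    size_t h = cbor_put_head(out, 3, (uint32_t)n);
    memcpy(out + h, s, n);
    return h + n;
}

/* Encodes {"client_port": port}: map head, text key, unsigned value. */
size_t encode_named(uint8_t *out, uint32_t port)
{
    size_t n = cbor_put_head(out, 5, 1);          /* map with 1 pair */
    n += cbor_put_text(out + n, "client_port");
    n += cbor_put_head(out + n, 0, port);         /* major type 0: unsigned */
    return n;
}

/* Encodes {1: port}: same data, but the key is a small integer. */
size_t encode_numbered(uint8_t *out, uint32_t port)
{
    size_t n = cbor_put_head(out, 5, 1);
    n += cbor_put_head(out + n, 0, 1);
    n += cbor_put_head(out + n, 0, port);
    return n;
}
```

For port 53 the named variant takes 15 bytes and the numbered one 4 bytes, so for short records the key names dominate the encoded size.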

### Protobuf

[Protocol buffers](https://developers.google.com/protocol-buffers/?hl=en)

* type: schema and code generators, static types
* Clib: [protobuf-c](https://github.com/protobuf-c/protobuf-c) - TODO: test (API/speed)
* Clib: [protobluff](https://github.com/squidfunk/protobluff) - TODO: test (API/speed)
* minus: only contrib C libraries
* plus: streaming arrays/maps (via repeated fields)
* plus: C/C++/Java struct/class generation (typewise-easy parsing)
* plus: used by [dnstap DNS logs](http://dnstap.info/)
* speed: probably good
* compact: great (239)
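
The compactness comes largely from the wire format: each field is a small numeric key followed by a varint. A minimal sketch of that encoding, written by hand rather than with protobuf-c or protobluff (the function names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Base-128 varint: 7 payload bits per byte, least significant group first,
 * high bit set on every byte except the last. Returns bytes written. */
size_t pb_put_varint(uint8_t *out, uint64_t v)
{
    assert(out != NULL);
    size_t n = 0;
    while (v >= 0x80) {
        out[n++] = (uint8_t)(v | 0x80);
        v >>= 7;
    }
    out[n++] = (uint8_t)v;
    return n;
}

/* Encode one unsigned integer field: key = (field_number << 3) | wire_type,
 * with wire type 0 (varint), followed by the value itself. */
size_t pb_put_uint_field(uint8_t *out, uint32_t field_number, uint64_t value)
{
    size_t n = pb_put_varint(out, ((uint64_t)field_number << 3) | 0);
    n += pb_put_varint(out + n, value);
    return n;
}
```

Field 1 with value 150 becomes the three bytes `08 96 01`; the field name from the schema never appears on the wire.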

### BSON

[Binary JSON web](http://bsonspec.org/)

* type: binary JSON, no schema
* Clib: [mongo libbson](https://github.com/mongodb/libbson) - nice, own allocation, easier construction ([BCON](http://api.mongodb.org/c/0.7.1/bcon.html))
* plus: arrays/maps streaming (unknown array size in advance)
* minus: field names in structs as strings (wasteful)
* plus: used natively in MongoDB ([nice C api](http://api.mongodb.org/c/0.7.1/index.html))
* MongoDB: max 16MB BSON chunks, [GridFS](https://docs.mongodb.org/manual/core/gridfs/) for more (fragmented, ...)
* speed: probably ok
* compact: bad (495), better (?) with numbered attributes

### MessagePack

[MessagePack web](http://msgpack.org/index.html)

* type: binary JSON, no schema, dynamic
* Clib: [msgpack-c](https://github.com/msgpack/msgpack-c), [C docs](https://github.com/msgpack/msgpack-c/wiki/v1_1_c_overview) - parses only whole trees (problems with bigger data?), mempools, refcounts
* minus: no streaming of maps/arrays (size must be known in advance)
* minus: field names in structs as literal strings (wasteful)
* speed: probably ok
* compact: bad, great (233) with numbered attributes

### Thrift

[Thrift](https://thrift.apache.org/)

* type: schema and code generators, static types
* Clib: [c_glib](https://github.com/apache/thrift/tree/master/lib/c_glib) - GLib based!
* minus: bad C library support (only GLib!!!), bad docs
* minus: RPC focus ...
* speed: probably ok
* compact: ok (349)

### Cap'n Proto

[Cap'n Proto](https://capnproto.org/)

* type: schema and code generators, static types
* Clib: [c-capnproto](https://github.com/jmckaskill/c-capnproto) - only writes!
* minus: bad C library support (only writing)
* speed: probably great
* compact: probably very bad

## Data compression

* fast stream: [snappy](https://google.github.io/snappy/) (100+ MB/s, used in Hadoop and elsewhere)
* for entropy encoding inspiration: [text compression comparison](http://mattmahoney.net/dc/text.html) - too slow for our purpose
* a good encoding (ProtoBuf etc.) will make compression harder and less needed

## Network packet capture

### Mirror UDP stream to/from port 53 (both dirs) to some port of collector

* plus: UDP defragmentation and header parsing done by the kernel, can use recvmsg, libuv or similar
* minus: UDP only
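
On the collector side this could be as small as the sketch below: a bound UDP socket read with `recvmsg`. How the stream gets mirrored to the collector is left open here; the loopback bind address and the function names are assumptions for illustration only.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Open a UDP socket bound to 127.0.0.1:port (port 0 lets the kernel pick).
 * Returns the file descriptor, or -1 on error. */
int collector_open(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Receive one already-defragmented datagram. Returns the payload length or
 * -1 on error; *truncated is set when the buffer was too small (MSG_TRUNC). */
ssize_t collector_recv(int fd, void *buf, size_t cap,
                       struct sockaddr_in *from, int *truncated)
{
    assert(buf != NULL && from != NULL);
    struct iovec iov = { .iov_base = buf, .iov_len = cap };
    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_name = from;
    msg.msg_namelen = sizeof *from;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    ssize_t n = recvmsg(fd, &msg, 0);
    if (n >= 0 && truncated != NULL)
        *truncated = (msg.msg_flags & MSG_TRUNC) != 0;
    return n;
}
```

Note that the sender address seen here is whatever the mirroring mechanism produces, so the original client address may need to be carried separately.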

### TcpTrace

* [web](http://www.tcptrace.org/download.html) - last release in 2003
* can reconstruct a TCP stream from pcap
* minus: old, ugly code base, unmaintained, [CVS repository ported to GitHub](https://github.com/blitz/tcptrace)
* minus: program-like structure, global variables ...

### Wireshark/EPAN

* GLib-based library, core of Wireshark, [doxygen](https://www.wireshark.org/docs/wsar_html/epan/index.html)
* ASN.1 and analysis based - what about speed?
* minus: GLib

### libNIDS

* [web](http://libnids.sourceforge.net/), [github import](https://github.com/korczis/libnids)
* IP defragmentation and TCP reassembly lib
* minus: last development in 2010, created in 2003 based on Linux kernel 2.0.x
* minus: IPv4 only

### libNtoH

* [github](https://github.com/sch3m4/libntoh/)
* IPv4+IPv6 defragmentation and TCP reassembly, quite new, stability?

### DPDK

* [web](http://dpdk.org/), libs (incl. low-level) for fast packet processing
* [IP fragment reassembly](http://dpdk.org/doc/guides/prog_guide/ip_fragment_reassembly_lib.html) - both IPv4 and IPv6

### standard netinet headers

* `netinet/in.h`, `netinet/ip.h`, `netinet/ip6.h` - manual header matching, partial (best effort) defragmentation
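
A rough sketch of what "manual header matching" means for the IPv4/UDP case, using `struct ip` and `struct udphdr` from the netinet headers. The input is assumed to start at the IP header (link-layer header already stripped); the function name and result codes are made up, and fragments are only detected here, not reassembled.

```c
#define _DEFAULT_SOURCE   /* for the BSD-style struct ip on glibc */
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/udp.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum match_result {
    MATCH_DNS_UDP = 0,    /* unfragmented UDP datagram to/from port 53 */
    MATCH_NOT_DNS = 1,    /* valid packet, but not UDP port 53 */
    MATCH_FRAGMENT = 2,   /* IP fragment, would need reassembly */
    MATCH_MALFORMED = 3   /* truncated or inconsistent headers */
};

/* Match an IPv4 packet against "UDP, port 53, not fragmented".
 * On MATCH_DNS_UDP, *payload and *payload_len describe the DNS message. */
int match_ipv4_udp_dns(const uint8_t *pkt, size_t len,
                       const uint8_t **payload, size_t *payload_len)
{
    assert(pkt != NULL && payload != NULL && payload_len != NULL);
    struct ip iph;
    struct udphdr udph;

    if (len < sizeof iph)
        return MATCH_MALFORMED;
    memcpy(&iph, pkt, sizeof iph);            /* copy to avoid unaligned access */
    size_t ihl = (size_t)iph.ip_hl * 4;       /* header length in bytes */
    if (iph.ip_v != 4 || ihl < sizeof iph || len < ihl)
        return MATCH_MALFORMED;

    uint16_t off = ntohs(iph.ip_off);
    if ((off & IP_MF) || (off & IP_OFFMASK))  /* more-fragments or offset set */
        return MATCH_FRAGMENT;
    if (iph.ip_p != IPPROTO_UDP)
        return MATCH_NOT_DNS;

    if (len < ihl + sizeof udph)
        return MATCH_MALFORMED;
    memcpy(&udph, pkt + ihl, sizeof udph);
    if (ntohs(udph.uh_sport) != 53 && ntohs(udph.uh_dport) != 53)
        return MATCH_NOT_DNS;

    size_t ulen = ntohs(udph.uh_ulen);        /* UDP header + payload */
    if (ulen < sizeof udph || ihl + ulen > len)
        return MATCH_MALFORMED;
    *payload = pkt + ihl + sizeof udph;
    *payload_len = ulen - sizeof udph;
    return MATCH_DNS_UDP;
}
```

IPv6 (`netinet/ip6.h`) needs the same treatment plus walking the extension header chain, which is where most of the "best effort" part comes in.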

# Existing tools, libs and formats

A comparison of some similarly aimed projects.

## DnsTap (format+tools)

* [DnsTap web](http://dnstap.info/) by Robert Edmonds from [farsightsec](https://github.com/farsightsec)
* Capture within the DNS server process (implemented for Unbound, Knot)
* Logging with a [dnstap ProtoBuf](https://github.com/dnstap/dnstap.pb/blob/master/dnstap.proto)
* larger messages (not optimized for size, roughly 50+ bytes vs. 20 possible otherwise; would compression help?)
* seems to store compatible information, but only the raw DNS message
* does not specify storage (either frame stream (below) or just ProtoBuf messages)
* implementation using:
  * frame stream [fstrm](https://github.com/farsightsec/fstrm) for reliable frame dropping under load
  * DNS parser [ldns](http://www.nlnetlabs.nl/projects/ldns/), [parser usage in dnstap-ldns](https://github.com/dnstap/dnstap-ldns/blob/master/host2str.c)

## DNSCap (tool)

* [dnscap web](https://www.dns-oarc.net/tools/dnscap), [dnscap git](https://github.com/verisign/dnscap)
* captures DNS packets (query+response), output in pcap, basic filtering options
* no query/response matching
* inspiration for simple pcap/parsing?
* does not do defragmentation / TCP stream reconstruction (see [source comments](https://github.com/verisign/dnscap/blob/3f3468f0c9ed7d2d554a23813b769b9b2924eaf1/dnscap.c#L1797))

## DNS Stats Collector (tool)

* [dsc git](https://github.com/DNS-OARC/dsc)
* captures packets, basic DNS parsing, counting stats, plus some XML and graph presentation
* inspiration for simple pcap/parsing?

## DNSTable, nmsg etc. from Farsightsec (format/lib/tool group)

* [dnstable](https://github.com/farsightsec/dnstable) - file based tables for DNS domain information (not queries)
  * not directly useful, but very fast indexed storage of DNS records
  * [blog entry on storage](https://www.farsightsecurity.com/Blog/20151028-ziegast-realtime-dnsdb/)
* [nmsg](https://github.com/farsightsec/nmsg) - format/library for storing various network message types (based on protobufs)
  * common container for various data types, some tools to transfer, merge, convert, compress, ... data
  * based on ProtoBufs, basic protocols such as: [DNS record](https://github.com/farsightsec/nmsg/blob/master/nmsg/base/dns.proto), [DNS query](https://github.com/farsightsec/nmsg/blob/master/nmsg/base/dnsqr.proto), HTTP, email, generic packet, ...
  * blog posts: [intro](https://www.farsightsecurity.com/Blog/20150128-nmsg-intro/), [nmsgtool](https://www.farsightsecurity.com/Blog/20150204-mschiffm-nmsg-nmsgtool/), [format](https://www.farsightsecurity.com/Blog/20150211-mschiffm-nmsg-internals/)
  * their DNS query ProtoBuf does not quite match our needs, we could add our own
  * the data storage is inefficient (extra information of roughly 30 bytes/packet)
  * could nmsgtool or libnmsg be useful? probably not if we store captured data in some DB
* [ncap](https://www.dns-oarc.net/tools/ncap) - obsolete DNS-only capture format, non-extensible

## Zendesk DDoS detection (solution)

* [Slides from RIPE](https://ripe71.ripe.net/presentations/42-zendesk-ddos.pdf) (info from Jan and Petr)
* metrics based solution - no DNS inspection, much lower data flows
* based on: [FastNetMon](https://github.com/pavel-odintsov/fastnetmon) (metrics from traffic), [InfluxDB](https://influxdata.com/time-series-platform/influxdb/) (time-series data DB), [Morgoth](http://docs.morgoth.io/) (time-series anomaly detection)
* Morgoth as an inspiration for anomaly detection?
  * ["lossy event counting" algorithm](http://docs.morgoth.io/docs/concepts/detection_framework/)
  * Idea of "exceptional fingerprints" (time window aggregations/statistics compared to previously seen windows)
  * written in Go, not very well documented
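
For reference, a compact sketch of textbook lossy counting (Manku and Motwani) in C. This is the generic algorithm, not Morgoth's code; the fixed-size table, the `uint32_t` keys (e.g. a fingerprint hash) and all names are assumptions made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LC_MAX_ENTRIES 1024   /* arbitrary table size for this sketch */

struct lc_entry {
    uint32_t key;     /* counted item, e.g. a fingerprint hash */
    uint64_t count;   /* occurrences seen since the entry was created */
    uint64_t delta;   /* maximum possible undercount at creation time */
};

struct lossy_counter {
    uint64_t width;   /* bucket width w = ceil(1 / epsilon) */
    uint64_t seen;    /* number of items processed so far (N) */
    size_t used;
    struct lc_entry entries[LC_MAX_ENTRIES];
};

void lc_init(struct lossy_counter *lc, uint64_t width)
{
    assert(lc != NULL && width > 0);
    lc->width = width;
    lc->seen = 0;
    lc->used = 0;
}

/* Drop entries whose count + delta does not exceed the current bucket id. */
static void lc_prune(struct lossy_counter *lc, uint64_t bucket)
{
    size_t kept = 0;
    for (size_t i = 0; i < lc->used; i++)
        if (lc->entries[i].count + lc->entries[i].delta > bucket)
            lc->entries[kept++] = lc->entries[i];
    lc->used = kept;
}

/* Process one item. Returns 0, or -1 if a new key did not fit the table. */
int lc_add(struct lossy_counter *lc, uint32_t key)
{
    uint64_t bucket = lc->seen / lc->width + 1;   /* 1-based current bucket */
    int rc = 0;
    size_t i;

    lc->seen++;
    for (i = 0; i < lc->used; i++)
        if (lc->entries[i].key == key)
            break;
    if (i < lc->used)
        lc->entries[i].count++;
    else if (lc->used < LC_MAX_ENTRIES)
        lc->entries[lc->used++] = (struct lc_entry){ key, 1, bucket - 1 };
    else
        rc = -1;

    if (lc->seen % lc->width == 0)                /* bucket boundary reached */
        lc_prune(lc, bucket);
    return rc;
}

/* Estimated count of a key; 0 if it was never seen or has been pruned. */
uint64_t lc_estimate(const struct lossy_counter *lc, uint32_t key)
{
    for (size_t i = 0; i < lc->used; i++)
        if (lc->entries[i].key == key)
            return lc->entries[i].count;
    return 0;
}
```

The estimate undercounts by at most N / width, so frequent keys survive while rare ones are pruned at each bucket boundary, keeping memory bounded.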

## DNS packet deduplication (from Jan Vcelak)

* [gitlab](https://gitlab.labs.nic.cz/knot/smoke-tests)
* simple pcap (TCP+UDP) parsing - but no defragmentation
* deduplication via nice HAT-trie (inspiration?)

# Stores for captured and prefiltered data

*Tomas:* I am looking for (prefiltered) packet data stores with good speed and a C API - these may be very different from the main processing databases.

* A NoSQL comparison (Cassandra, Couchbase, HBase, MongoDB; but only 100 byte packets): [PDF](http://www.datastax.com/wp-content/themes/datastax-2014-08/files/NoSQL_Benchmarks_EndPoint.pdf)

## LMDB

* [lmdb web](http://symas.com/mdb/), [docs](http://symas.com/mdb/doc/), [github](https://github.com/LMDB/lmdb)
* mmapped DB, key-value (unique keys by default), very fast
* idea/question: use for request/response matching? bad: value rewrites are fast, but the on-disk size keeps accumulating :(
* what would be useful (unique?) keys?
* plus: indexable individual packets
* minus: not compressed

## Protobuf file

* Protobuf with metainfo and stream of query PBs
* small overhead (2 bytes/message), super fast :)
* plus: optionally compress (snappy, lzo, xz, ...)
* minus: sequential - messages not indexable
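
One way to read "2 bytes/message" is a 16-bit length prefix in front of every serialized message. The sketch below assumes exactly that (big-endian length, messages up to 65535 bytes); the actual framing is not specified in these notes and the function names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append one frame (2-byte big-endian length + payload) at offset off.
 * Returns the new end offset, or 0 if the message is too long or
 * does not fit into the buffer. */
size_t frame_append(uint8_t *buf, size_t cap, size_t off,
                    const void *msg, size_t len)
{
    assert(buf != NULL && msg != NULL && off <= cap);
    if (len > 0xFFFF || cap - off < len + 2)
        return 0;
    buf[off] = (uint8_t)(len >> 8);
    buf[off + 1] = (uint8_t)(len & 0xFF);
    memcpy(buf + off + 2, msg, len);
    return off + 2 + len;
}

/* Read the frame starting at off from a buffer holding `used` bytes.
 * Sets *msg and *len and returns the offset of the next frame,
 * or 0 at the end of the data or on a truncated frame. */
size_t frame_next(const uint8_t *buf, size_t used, size_t off,
                  const uint8_t **msg, size_t *len)
{
    assert(buf != NULL && msg != NULL && len != NULL);
    if (off > used || used - off < 2)
        return 0;
    size_t n = ((size_t)buf[off] << 8) | buf[off + 1];
    if (used - off - 2 < n)
        return 0;
    *msg = buf + off + 2;
    *len = n;
    return off + 2 + n;
}
```

Reading is strictly sequential: finding the k-th message means walking k length prefixes, which is the "not indexable" drawback noted above.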