|
|
# Collector notes
|
|
|
|
|
|
* C99 with [libUCW](http://www.ucw.cz/libucw/doc/ucw/) and protobuf-c
|
|
|
* PLANNED: Time frames (approx. 1-300 s), soft rate-limiting
|
|
|
|
|
|
## Input
|
|
|
* Currently only PCAP
|
|
|
* Looking at libtrace and SOCK_RAW sockets
|
|
|
* Supports truncated packets (length checks in all the code)
|
|
|
|
|
|
## TCP/IP status and assumptions
|
|
|
* Accepts both IPv4 and IPv6
|
|
|
* Currently drops IPv6 with extra headers (TODO: skip them, detect fragmentation headers) (none encountered in `akuma` data)
|
|
|
* No IP fragment reconstruction
|
|
|
* Not planned (rather technical, separate for IPv4 and IPv6, ...)
|
|
|
* Opening a SOCK_RAW socket handles IP reconstruction in the kernel
|
|
|
* Should not happen too much anyway (very few requests have >100 bytes, very few responses have >1000 bytes)
|
|
|
* TCP is limited to (single request, single response) streams, TCP options accepted but ignored
|
|
|
* These short TCP connections seem to be (almost?) all of the cases in the `akuma` data
|
|
|
* Find out: how many long TCP conns are there?
|
|
|
* PLANNED: TCP flow reconstruction, keeping open connections (currently ignores SYN/ACK/FIN)
|
|
|
* UDP fully supported
|
|
|
* Dropping all packets with data size mismatches etc.
|
|
|
|
|
|
## DNS status
|
|
|
* Dropping packets with `OPCODE != QUERY`
|
|
|
* Store some other opcode? (IQUERY is obsolete, STATUS?)
|
|
|
* Dropping packets with QNAME length above 254 (by RFC)
|
|
|
* Only accepting packets with exactly 1 QNAME
|
|
|
* Dropping packets with "compressed" QNAME, see [RFC section](https://tools.ietf.org/html/rfc1035#section-4.1.4)
|
|
|
* Find out: are those still used?
|
|
|
* Dropping packets with the snapshot (captured part) ending before the entire DNS QNAME part (should not happen with reasonable snaplen)
|
|
|
* TODO NEXT: Actually match the queries and responses
|
|
|
|
|
|
## Output
|
|
|
|
|
|
* Modular, not dependent on protobufs (Can include CBOR or other if needed.)
|
|
|
* PLANNED: separate threads for:
|
|
|
* 1x packet collection, parsing, dumping and matching responses with requests (hash table)
|
|
|
* (1+)x time frame serialization and writing (file, socket or database)
|
|
|
|
|
|
### Protobuf
|
|
|
* Implemented a message for request+response pair writing (`dnsquery.proto`)
|
|
|
* PLANNED: Configurable which attributes are included
|
|
|
|
|
|
### Dumping dropped packets
|
|
|
* Configurable dump/drop by category
|
|
|
* PLANNED: Rotate pcap files with time frames
|
|
|
* PLANNED: Soft rate-limiting to prevent choking

#### DNS collector design draft
|
|
|
|
|
|
* Author: Tomáš Gavenčiak, tomas.gavenciak@nic.cz
|
|
|
* Date: 1st Mar 2016
|
|
|
|
|
|
### Operation and main structures
|
|
|
|
|
|
## Struct collector
|
|
|
|
|
|
Main container for a collector instance (try to avoid global state).
|
|
|
|
|
|
# Has
|
|
|
* Configuration structure (given / loaded before init) (incl. outputs)
|
|
|
* Current and previous timeframe
|
|
|
* Queue of timeframes to write (thread safe) and writer thread(s)
|
|
|
* Basic stats on program run (time, packets collected/dropped)
|
|
|
|
|
|
# Setup
|
|
|
Gets a configuration struct, initializes itself and opens a packet capture
|
|
|
(file list or live, applying capture length, promiscuous settings and BPF filters).
|
|
|
|
|
|
# Main thread operation
|
|
|
Main thread collects a packet from the input and parses its data (IP/UDP/DNS headers).
|
|
|
If the packet's timestamp is past the current timeframe, a frame rotation is performed (see below).
|
|
|
When a packet is invalid (malformed, unsupported network feature, ...), it is dropped and optionally dumped via one of the outputs.
|
|
|
|
|
|
# Frame rotation
|
|
|
The timeframes are approx. 0.1-10 s long time windows (configurable). Any response packet is matched to a
|
|
|
request packet in the current or the previous timeframe (so a response delayed up to the
|
|
|
frame length is always matched). When a packet beyond the current timeframe is read, the
|
|
|
frames are rotated: The previous timeframe is enqueued for writeout, the current timeframe
|
|
|
becomes the previous one and a new timeframe is created.
|
|
|
|
|
|
If the new timeframe is beyond the current period of an output file, this output is rotated as well before
|
|
|
writing the frame (see below).
|
|
|
|
|
|
If a packet arrives out of order (with time smaller than the previous packet, as in wrong ordering of PCAP files),
|
|
|
a warning is issued and it is processed as if it had the time of the last in-order packet.
|
|
|
|
|
|
# Writer thread
|
|
|
One or more writer threads picking up timeframes from the queue and writing their packets to the outputs.
|
|
|
Destroy the packets and timeframes afterwards. If a timeframe is the last one to use an output file, that file
|
|
|
is closed.
|
|
|
|
|
|
The timeframes have to be processed in the order of creation.
|
|
|
|
|
|
# Current state
|
|
|
* The writeout is done in the same thread.
|
|
|
* Only one output file per configured output is open.
|
|
|
* Stats to keep track of are not finalised.
|
|
|
|
|
|
## Struct config
|
|
|
|
|
|
Holds collector configuration and configured inputs and outputs.
|
|
|
Configured via [libucw configuration system](http://www.ucw.cz/libucw/doc/ucw/conf.html).
|
|
|
|
|
|
## Struct timeframe
|
|
|
|
|
|
Structure for queries within a time window (approx. 1-10 s, configurable). Contains all requests within
|
|
|
that window, their matching responses within that or the next timeframe, and responses within this
|
|
|
timeframe without a matching request. This limits the writer threads to one (simpler situation and code
|
|
|
structure) or to one per configured output (giving each output a timeframe queue with read-only
|
|
|
timeframes, destroying each frame once it has been processed by all outputs, which may require refcounting).
|
|
|
|
|
|
The preferred direction seems to be one thread per output, separating their different runtime requirements.
|
|
|
This way it may be more natural to drop timeframes only for "slow" outputs (e.g. PCAP) when their queue gets too long,
|
|
|
and not for "fast" ones (e.g. counting-only statistics).
|
|
|
|
|
|
Shared state (with locks) should be accessed only a few times per timeframe, not per packet.
|
|
|
|
|
|
# Has
|
|
|
* List of packets to write - possibly with rate-limiting per timeframe (linked list).
|
|
|
* List of dropped packets to dump - likely with rate-limiting per timeframe (linked list).
|
|
|
* Hash containing unmatched requests (by IPver, TCP/UDP, client/server port numbers, client/server IPs, DNS ID and QNAME)
|
|
|
* Possibly: a memory pool for all the packet data
|
|
|
|
|
|
# Query hash
|
|
|
The hash is a fixed-size table of configurable order. Rationale: rehashing could cause a lot of latency in the main thread.
|
|
|
A big enough hash for the upper limit of packets in the timeframe (hard limit or just estimated) takes about 3% of the memory used by the packets,
|
|
|
so a big enough table can be easily afforded within the expected memory usage.
|
|
|
|
|
|
The hash is a linked list of packets in each bucket (with the "next" ptr within the packet struct).
|
|
|
|
|
|
# Limiting memory use
|
|
|
The number of requests (and unmatched responses) in the frame should be bounded by a configurable constant.
|
|
|
This should be a soft limit (e.g. packets should be dropped more frequently when approaching the limit).
|
|
|
When a request is accepted, its response should always be accepted.
|
|
|
|
|
|
**Question:** What to do with the (not dropped) responses to dropped requests?
|
|
|
|
|
|
**Rationale:** The packets in the timeframes take up most of collector memory. Since the memory use of a single packet
|
|
|
is bounded by the packet capture bound plus a fixed overhead, bounding the packet number per timeframe is an easy and
|
|
|
deterministic way to control memory usage (together with the number of timeframes).
|
|
|
|
|
|
**Alternatives:** Total packet count could better accommodate short-time bursts (spanning, say, 1-2 timeframes), but
|
|
|
keeping these numbers in sync between the threads adds complexity. Also, this behaviour is less predictable.
|
|
|
Another alternative is considering the total memory usage of the program. Not sure how technically viable and
|
|
|
reliable (what to measure? would such memory usage shrink on `free()`?), and might not be very predictable.
|
|
|
|
|
|
## Struct packet
|
|
|
|
|
|
Holds data about a single query packet. Uses libtrace to handle packet data management and dissection.
|
|
|
The DNS parsing is done by a simple header and QNAME label reading without compression. The remaining
|
|
|
parts of the DNS message (various RRs) are not parsed (until we figure out how would they be useful).
|
|
|
|
|
|
**Rationale:** The data in RRs can be quite large, and it is hard to know in advance which parts we might want.
|
|
|
The data from DNS header + QNAME + QType + QClass seem to carry enough information for statistics.
|
|
|
Replies should in principle be recomputable from requests.
|
|
|
If it is necessary to store all the information, a full PCAP (in a separate process)
|
|
|
could be more appropriate.
|
|
|
|
|
|
# Has
|
|
|
* Raw packet data: timestamp, real length, capture length, packet data
|
|
|
* Addresses, ports, transport info
|
|
|
* DNS header data, qname as a printable string (dot notation)
|
|
|
* Request may have a matching response packet. In this case the response is owned by the request
|
|
|
* (Next packet in hash bucket, next packet in timeframe)
|
|
|
|
|
|
# Packet network features
|
|
|
Handles both IPv4 and IPv6, as well as UDP.
|
|
|
|
|
|
Does not currently handle packet defragmentation. This would be nontrivial to do right and to manage resources for, while fragmentation on the IP level is rare for DNS packets. Fragmented packets can all be dumped for later analysis.
|
|
|
**Alternative solution:** Capturing via Linux RAW `socket()` gives us IP-defragmented packets.
|
|
|
|
|
|
TCP flow could be reconstructed, but it seems less of a priority. Currently only one-data-packet TCP streams
|
|
|
(not counting SYN, ACK and FIN packets) are processed, longer streams are dropped. Longer packets and
|
|
|
long-open TCP connections seem to be uncommon.
|
|
|
|
|
|
## Stats
|
|
|
|
|
|
Very basic statistics for the collector (time, dropped/read packets, dropped frames), the timeframes (dropped/read packets),
|
|
|
the outputs (dropped/read packets, dropped timeframes, written items and bytes before/after compression).
|
|
|
It is not yet clear what exactly to measure. Any DNS data statistics should be handled by an output plugin.
|
|
|
|
|
|
Currently partially implemented.
|
|
|
|
|
|
## Outputs
|
|
|
|
|
|
Each output type extends a basic output structure. This basic structure contains the current open file and filename
|
|
|
(or socket, etc.), time of opening, rotation period, compression settings, basic statistics (bytes written, frames dropped, ...)
|
|
|
and hooks for packet writing, dumping, file closing and opening.
|
|
|
|
|
|
Each output type (currently CSV, ProtoBuf, PCAP) extends this type with additional struct fields and sets the hooks
|
|
|
appropriately on config (configuration handled by libucw). The current fields are:
|
|
|
|
|
|
```
flags(IPv4/6,TCP/UDP) client-addr client-port server-addr server-port id qname qtype qclass
request-time-us request-flags request-ans-rrs request-auth-rrs request-add-rrs request-length
response-time-us response-flags response-ans-rrs response-auth-rrs response-add-rrs response-length
```
|
|
|
|
|
|
|
|
|
Every output has a pathname template with strftime() replacement. An output can be compressed on the fly (which saves
|
|
|
disk space and also write time). Fast compression (LZ4, ...) is preferred.
|
|
|
|
|
|
# Memory usage limits
|
|
|
The maximum length of the timeframe queue of every output should be bounded (and configurable).
|
|
|
When exceeded, the oldest timeframe not currently being processed should be dropped.
|
|
|
Rationale: Together with timeframe size this predictably limits
|
|
|
total memory usage. Dropping data on lagging (e.g. IO-bound) outputs is preferable to dropping packets on input
|
|
|
and therefore missing them on fast (e.g. counting) outputs.
|
|
|
|
|
|
# Disk usage limits
|
|
|
Optional. When approaching a per-output-file size limit, softly introduce query skipping.
|
|
|
|
|
|
# CSV output
|
|
|
Optional header line, configurable separator, configurable field set.
|
|
|
Actually not much larger than Protocol Buffers when compressed (e.g. with just the very fast "lz4 -4": 33 B/query CSV, 29 B/query ProtoBuf).
|
|
|
Most commonly accepted format. No quoting necessary with e.g. "|" delimiter.
|
|
|
|
|
|
# Protocol Buffer output
|
|
|
Similar to CSV, configurable field set, one length-prefixed (16 bits) protobuf message per query.
|
|
|
The `protobuf-c` library seems to use reflection when serialising rather than fully generated code (as protobuf does in C++),
|
|
|
so the speed is not great (comparable to CSV?).
|
|
|
|
|
|
# PCAP
|
|
|
Currently only used for dropped packets. Should be rate-limited (with softly increasing drop-rate).
|
|
|
|
|
|
# Current state
|
|
|
Timeframes ready for output are processed immediately in the main thread (no output queue, no rate limiting).
|
|
|
|
|
|
## Inputs
|
|
|
|
|
|
The input is either a single interface, or a list of pcap files to be processed in the given order.
|
|
|
When reading pcap files, the "current" time follows the recorded times.
|
|
|
|
|
|
Multiple specified input interfaces (and not just "all") would require multiple PCAPs (or traces) open, but libtrace
|
|
|
does not seem to support polling on multiple traces. Advanced setups can be obtained by listening to "all" interfaces
|
|
|
with kernel BPF filter.
|
|
|
|
|
|
Multiple reader threads are hard to support, as the access to the query hash would have to be somehow guarded.
|
|
|
Since the main congestion is expected to be at the outputs, this may not be a problem. If required in the future,
|
|
|
can be a (very) advanced feature.
|
|
|
|
|
|
[Libtrace](http://research.wand.net.nz/software/libtrace.php) is preferred to tcpdump's libpcap for its larger
|
|
|
feature set, built-in header and layer skipping, and a larger set of inputs (including kernel ring buffers).
|
|
|
|
|
|
# Current state
|
|
|
libpcap is used to read pcap files; live capture could be implemented easily, but a switch to libtrace is expected (and would be as easy to implement there, with additional benefits when parsing the layers).
|
|
|
|
|
|
### Configuration / options
|
|
|
|
|
|
Configuration is read by the [libucw configuration system](http://www.ucw.cz/libucw/doc/ucw/conf.html).
|
|
|
Configuration should allow setting predictable limits on memory usage and potentially disk usage.
|
|
|
CPU usage should be regulated by means of the OS (nice, cpulimit, cgroups).
|
|
|
|
|
|
Reloading is supported only via program restart. Optionally, the program could wait until the outputs are rotated
|
|
|
(or at least until timeframe rotation). Shortly after program start, unmatched responses should be ignored.
|
|
|
The amount of missed packets should not be significant relative to the frequency of such changes.
|
|
|
|
|
|
Supporting online reconfiguration would greatly increase program complexity and could introduce bugs and memory leaks.
|
|
|
A potential exception could be the BPF filter string. What would be good use-cases or easily tunable parameters?
|
|
|
|
|
|
### Language and libraries
|
|
|
|
|
|
The language standard is C99. The proposed libraries are:
|
|
|
* [Libtrace](http://research.wand.net.nz/software/libtrace.php) for packet capture, dissection and dumping.
|
|
|
* [libUCW](http://www.ucw.cz/libucw/) for configuration parsing, logging, mempools (in the future?) and some data structures (currently doubly linked lists). Replaceable but convenient.
|
|
|
* libLZ4, libgz, ... for online (de)compression of input pcaps and output files. Partially implemented separately, but also part of libtrace.
|
|
|
* protobuf-c for writing protocol buffers.
|
|
|
|
|
|
### Logging and reports
|
|
|
|
|
|
Currently using libucw logging system and configured via the same config file. Includes optional log file rotation.
|
|
|
A sub-logger for potentially frequent messages with rate-limiting is also configured by default.
|
|
|
|
|
|
Input and output statistics should be logged (e.g. on output file rotation).
|
|
|
Statistics outputs might include further summaries. No other reporting mechanism is currently designed.
|
|
|
|
|
|
### Questions
|
|
|
|
|
|
* Libtrace vs libPCAP.
|
|
|
Currently: libPCAP. Tomas: in favor of libtrace.
|
|
|
|
|
|
* One thread per output vs one writer thread.
|
|
|
Currently: No threads (WIP). Tomas: in favor of one thread per output.
|
|
|
|
|
|
* Runtime control and reconfiguration - how much control is desired and useful? How to implement it?
|
|
|
Currently: No runtime control.
|
|
|
|
|
|
* Which output modules to support? CSV, Protobuf, counting stats (DSC-like?), CBOR, ...
|
|
|
|
|
|
|
|
|
|