WIP: Querying the aggregator
This branch does two main things:
- It stores the last hour of history in memory (storing on disk will be done in some future branch).
- It allows querying the stored history.
A few notes, however. The querying is done in a way that will allow it to be generalized to the on-disk storage as well. For that, some traits are introduced; they may seem a bit unnecessary at the moment ‒ they'll become more useful once we query multiple different storages.
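Just to illustrate the idea (the actual trait and type names in the branch may differ ‒ this is a made-up sketch, not the real code), the shape is roughly this: both the in-memory history and a future on-disk backend implement the same querying interface.

```rust
// Rough sketch only: the names Storage, Query, Record and InMemory are
// invented for illustration and need not match the branch exactly.
pub struct Query { /* filters, aggregation columns, … */ }
pub struct Record { /* one aggregated entry of the result */ }

pub trait Storage {
    /// Run a query against this backend and return the matching entries.
    fn query(&self, query: &Query) -> Vec<Record>;
}

/// The backend added in this branch: the last hour of history kept in memory.
pub struct InMemory { /* recent records, … */ }

impl Storage for InMemory {
    fn query(&self, _query: &Query) -> Vec<Record> {
        Vec::new() // placeholder; the real implementation filters and aggregates
    }
}
```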
Also, the querying is by no means complete ‒ there's a TODO listing the missing parts. In short, the output will carry more information in the future and will likely be formatted differently; I also don't want to list all the header fields by default unless asked for, but that too will come later. Still, I'd like to get this reviewed and merged, because the branch is already a bit large and having some querying capability is a reasonable checkpoint.
If you want to try it, run it with (obviously, after updating the paths):
```
cargo run -- -d socket -u ~/work/pakon/guts/socket
```
After it gathers some history, it is possible to connect to the socket and send queries like these (no, they are not documented yet, and they may still change a bit):
{"jsonrpc":"2.0","method":"query","params":{"query":{"filter":[{"direction":["OUT"]}]}},"id":42}
or:
{"jsonrpc":"2.0","method":"query","params":{"query":{"filter":[{"direction":["OUT"]}],"aggregate":[{"remote":"ip"}]}},"id":42}
One thing that may be surprising is how the aggregation works. The aggregate field specifies which „columns“ (a somewhat fuzzy term here, unlike with SQL tables) are considered ‒ there's an entry in the result for every distinct tuple of values of these columns. But unlike in SQL, aggregation happens even when aggregate is not present ‒ it is just done with an empty set of columns. Therefore, if no aggregate is given, the result is a single entry with everything aggregated together. This is mostly a side effect of how it is implemented, but I'd like to keep it that way ‒ for one, changing it would require non-trivial work without any obvious advantage, and besides, if someone doesn't request specific information, I don't think we should spam them with too much of it. This way it can be used to list high-level information (e.g. which IP addresses anyone communicated with), while dumping the whole database (which would be the SQL-like behaviour) seems a bit useless (and verbose).
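To make that concrete, the grouping works roughly like the following simplified sketch (field names and the single byte-count statistic are made up for illustration; the real code handles more columns and more values per entry):

```rust
// Simplified sketch of the grouping described above; field names are invented.
// Each flow gets a key built from the values of the requested columns and the
// statistics (here just a byte count) are summed per key.
use std::collections::HashMap;

struct Flow {
    remote_ip: String,
    direction: String,
    bytes: u64,
}

fn aggregate(flows: &[Flow], columns: &[&str]) -> HashMap<Vec<String>, u64> {
    let mut result: HashMap<Vec<String>, u64> = HashMap::new();
    for flow in flows {
        // The key is the tuple of values of the requested columns.
        let key: Vec<String> = columns
            .iter()
            .map(|col| match *col {
                "remote_ip" => flow.remote_ip.clone(),
                "direction" => flow.direction.clone(),
                _ => String::new(),
            })
            .collect();
        *result.entry(key).or_insert(0) += flow.bytes;
    }
    result
}

fn main() {
    let flows = vec![
        Flow { remote_ip: "192.0.2.1".into(), direction: "OUT".into(), bytes: 100 },
        Flow { remote_ip: "192.0.2.2".into(), direction: "OUT".into(), bytes: 50 },
    ];
    // No columns requested: every flow maps to the empty key, so everything
    // collapses into a single aggregated entry.
    println!("{:?}", aggregate(&flows, &[]));
    // Group by remote IP: one entry per distinct address.
    println!("{:?}", aggregate(&flows, &["remote_ip"]));
}
```

With an empty column list the key is always the empty tuple, which is exactly the "single entry with everything aggregated together" behaviour described above.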