It's probably better to read just the total diff instead of the individual commits. Maybe it will be best to squash.
- Switchability: currently the tries are used through hat-trie API, and the implementations can be switched by a define.
- Portability: the code focuses on x86_64, also performance-wise; details are in code: 1 2. Running
-DNDEBUGon every platform should be enough to check for related problems.
- Licensing: I used copyright usual for the rest of knot. The original qp sources were almost all rewritten and they were under CC0 anyway.
- Simplicity: I'm not convinced qp-trie is significantly simpler than hat-trie if disregarding dependence on hash tables, but I tried hard to make the code understandable and to explain in comments a lot.
- Speed (UPDATED): I haven't done much benchmarking yet, as our infrastructure seems to be currently suffering from some new bottleneck anyway. Still, it seems that dnssec queries got a little faster and non-dnssec a little slower, i.e. it seems to move closer to NSD speed. (I think the name lookup speed could be improved significantly, but that would be for another time and lookup takes typically only 5–10% of total time anyway, IIRC.) The ddns test shows significant speedup on updating large zones, measuring roughly halved time in comparison to master (the parent commit). Startup time is shown unchanged (measured in the memory tests).
Memory (NEW & UPDATED): there seem to be huge savings if the zones are small – in the hosting-100k scenario the total RSS of knot drops to 28 % of original and 37 % with dnssec. I re-tried each in another round to guard against random mistakes, as I didn't know the tries themselves were taking such a large relative portion of all memory. However, with large zones the memory requirements seem to grow slightly by ~1–2 % with the current
malloc()implementation we use. (I tried with tcmalloc, causing memory usage to drop by ~ 15 % and the slight difference between QP and HAT gets reversed.)
- Testing (UPDATED):
./tests-extra/runtests.pyis OK on x86_64. Jenkins seems OK, and basic testing on a big-endian machine as well.