Why is IXFR so memory-hungry and what to do about it
Useful notes
We could reproduce this if someone writes a test. There are zone snapshots on knot-server.labs.nic.cz in /data/selu-knot-debug. What we need is to load the zones, enable ixfr-from-differences and let it run, tracking the memory consumption while it does.
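For such a test, one simple way to track memory consumption is to sample the resident set size of the knotd process while the transfer runs. The following is a minimal sketch that assumes Linux and reads VmRSS from /proc/<pid>/status. The function names are hypothetical and not part of Knot.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Resident set size of process `pid` in KiB, or -1 on error.
 * Linux-only: parses the "VmRSS:" line of /proc/<pid>/status. */
static long read_rss_kib(pid_t pid)
{
	char path[64];
	char line[256];
	long rss = -1;

	snprintf(path, sizeof(path), "/proc/%ld/status", (long)pid);
	FILE *f = fopen(path, "r");
	if (f == NULL) {
		return -1;
	}
	while (fgets(line, sizeof(line), f) != NULL) {
		/* The line looks like "VmRSS:     12345 kB". */
		if (strncmp(line, "VmRSS:", 6) == 0) {
			if (sscanf(line + 6, "%ld", &rss) != 1) {
				rss = -1;
			}
			break;
		}
	}
	fclose(f);
	return rss;
}

/* Sample `pid` once per second, `seconds` times; return the peak RSS. */
static long peak_rss_kib(pid_t pid, unsigned seconds)
{
	long peak = -1;
	for (unsigned i = 0; i < seconds; i++) {
		long rss = read_rss_kib(pid);
		assert(rss >= -1);
		if (rss > peak) {
			peak = rss;
		}
		sleep(1);
	}
	return peak;
}
```

Run this from a separate process with knotd's PID for the duration of the IXFR. The peak is the number to compare against the estimate below.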
Expectations
I have briefly looked at the IXFR finalization; somebody who knows it better should correct me here.
I gathered some quick statistics on how much extra memory RRSIGs take in our cz. zone snapshot.
It appears that RRSIGs and their NSEC3 records need 43% extra memory, measured as Rss(signed zone) - Rss(unsigned zone).
Note that RRSIGs are not nodes, so there is no overhead other than their RRSet, and the estimate should be fairly accurate.
I did not count the NSEC3s of other RRs.
Suppose we have an incoming IXFR that re-signs the zone, so all old signatures are to be removed and new signatures are to be added.
So at the end of the IXFR (when the changesets have been applied to the zone in zones_store_and_apply_chgsets()), we have the following in memory:
- Old zone (baseline, say M bytes)
- Zone deep copy (M bytes)
- Changesets containing:
  - RRSIGs+NSEC3s to be removed (0.43*M)
  - RRSIGs+NSEC3s to be added (0.43*M)
- Now look at how xfrin_apply_remove_{normal/rrsigs} work:
  - Each unique RRSet about to be removed is duplicated, the new copy is updated and the old copy is put on the freelist (0.43*M).
- Now look at how xfrin_apply_add_{normal/rrsigs} work:
  - It is basically the same as remove. Nodes/RRSets that didn't exist before are simply inserted (no extra memory, except for the node). If an RRSet (or its RRSIG) existed before, it is duplicated, the old copy is put on the freelist and the new copy is updated. Since this is a re-sign, I expect every RRSet to have been signed before, while the NSEC3s are new. So let's say half of the RRSets are changed (0.215*M).
  - I don't count the memory used by new nodes, but it shouldn't be that significant.
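Both the remove and the add path follow the same copy-on-write pattern. Here is a minimal sketch with hypothetical, heavily simplified types: rrset_t, freelist_t and the apply_* functions are illustrative names, not Knot's actual code. The new copy replaces the old one in the zone, but the old copy is only parked on the freelist. Both therefore stay resident until the freelist is flushed at the end of the transfer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified RRSet: one RDATA blob. */
typedef struct rrset {
	size_t rdata_len;
	unsigned char *rdata;
	struct rrset *next_free; /* link while parked on the freelist */
} rrset_t;

/* Superseded copies wait here until the transfer is finalized. */
typedef struct {
	rrset_t *head;
	size_t bytes; /* RDATA bytes held by parked copies */
} freelist_t;

static rrset_t *rrset_new(size_t len)
{
	rrset_t *rr = calloc(1, sizeof(*rr));
	if (rr == NULL) return NULL;
	rr->rdata = calloc(len > 0 ? len : 1, 1);
	if (rr->rdata == NULL) { free(rr); return NULL; }
	rr->rdata_len = len;
	return rr;
}

static void park(rrset_t *old, freelist_t *fl)
{
	old->next_free = fl->head;
	fl->head = old;
	fl->bytes += old->rdata_len;
}

/* Remove path: duplicate without the last `remove_len` bytes, park the
 * old RRSet. */
static rrset_t *apply_remove(rrset_t *old, size_t remove_len, freelist_t *fl)
{
	assert(remove_len <= old->rdata_len);
	rrset_t *copy = rrset_new(old->rdata_len - remove_len);
	if (copy == NULL) return NULL;
	memcpy(copy->rdata, old->rdata, copy->rdata_len);
	park(old, fl);
	return copy;
}

/* Add path: a brand-new RRSet (old == NULL) is inserted as-is; an existing
 * one is duplicated with `add` appended and the old copy is parked. */
static rrset_t *apply_add(rrset_t *old, rrset_t *add, freelist_t *fl)
{
	if (old == NULL) return add;
	rrset_t *copy = rrset_new(old->rdata_len + add->rdata_len);
	if (copy == NULL) return NULL;
	memcpy(copy->rdata, old->rdata, old->rdata_len);
	memcpy(copy->rdata + old->rdata_len, add->rdata, add->rdata_len);
	park(old, fl);
	return copy;
}

/* Finalization: only now is the memory of the old copies released. */
static void freelist_flush(freelist_t *fl)
{
	while (fl->head != NULL) {
		rrset_t *next = fl->head->next_free;
		free(fl->head->rdata);
		free(fl->head);
		fl->head = next;
	}
	fl->bytes = 0;
}
```

Between the first apply_* call and freelist_flush(), fl.bytes grows by the full size of every touched RRSet. That growth is the extra 0.43*M and 0.215*M counted above.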
Grand total: 2*M + 3*0.43*M + 0.215*M = 3.505*M
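The estimate can be written as a function of the RRSIG+NSEC3 share r: 2*M + 2*r*M + r*M + (r/2)*M = (2 + 3.5*r)*M, which gives 3.505*M for r = 0.43. The function name below is hypothetical. It only restates the arithmetic above, so it is convenient for trying other values of r.

```c
#include <assert.h>

/* Peak memory at the end of a re-signing IXFR, in units of the old zone
 * size M, following the estimate above. `r` is the RRSIG+NSEC3 share
 * (0.43 for the cz. snapshot). */
static double ixfr_peak_factor(double r)
{
	assert(r >= 0.0);
	double zones      = 2.0;     /* old zone + deep copy              */
	double changesets = 2.0 * r; /* RRSIGs+NSEC3s removed and added   */
	double remove_dup = r;       /* RRSets duplicated by remove path  */
	double add_dup    = r / 2.0; /* half of RRSets dup'd by add path  */
	return zones + changesets + remove_dup + add_dup;
}
```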
An even worse case would be changing the NSEC3 hashes of all names in the zone as well.
Okay, but why doesn't it go down after the transfer?
Because this is how malloc works for small allocations. A process has an address space which it can extend or shrink.
But because the old zone was allocated earlier than the new zone and the changesets, it likely resides at lower addresses than the newly allocated data. If we now free the old zone, the result is a hole of free memory in the middle of the address space.
That memory can be reused for later allocations, but it can't be returned to the system because the higher (newer) addresses are likely still in use. And because the latest allocations are for the new zone, memory usage behaves like a high-water mark and rarely drops below its peak. I have added malloc_trim() in 1.4.1. In theory, it should cut some of the excess memory on subsequent smaller IXFRs, because the new zone should be allocated from the large hole created by the previous transfer, but it isn't a miracle cure.
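The allocation pattern described above can be reproduced in isolation. This is a minimal, glibc-specific sketch: malloc_trim() is a GNU extension declared in <malloc.h>. The block count and size are arbitrary illustration values, and whether the "old zone" really lands at lower addresses is up to the allocator.

```c
#include <assert.h>
#include <malloc.h> /* malloc_trim(): glibc extension */
#include <stdlib.h>

#define N_BLOCKS   100000
#define BLOCK_SIZE 64

/* Simulate a transfer: allocate the "old zone" first, then the "new zone",
 * then free the old zone, which leaves a hole below live data. Returns the
 * result of malloc_trim(0): 1 if any memory was released to the system,
 * 0 otherwise. */
static int simulate_transfer(void)
{
	static void *old_zone[N_BLOCKS];
	static void *new_zone[N_BLOCKS];

	for (size_t i = 0; i < N_BLOCKS; i++) {
		old_zone[i] = malloc(BLOCK_SIZE);
	}
	for (size_t i = 0; i < N_BLOCKS; i++) {
		new_zone[i] = malloc(BLOCK_SIZE);
	}
	for (size_t i = 0; i < N_BLOCKS; i++) {
		free(old_zone[i]); /* the hole in the middle of the heap */
	}

	int released = malloc_trim(0);
	assert(released == 0 || released == 1);

	for (size_t i = 0; i < N_BLOCKS; i++) {
		free(new_zone[i]); /* clean up the "new zone" */
	}
	return released;
}
```

According to the glibc manual page, since glibc 2.8 malloc_trim() also releases whole free pages inside the heap, not only at its top. That is why it helps somewhat even with the hole. Pages that still share space with live data cannot be returned.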
Things we could do
- Fake it till you make it: free the old zone and somehow postpone answering until the new zone is built 😿
- Use an mmap-based mempool allocator for each zone.
  - mmap is good because we can free memory per page. But we can't free parts of the memory (old RDATA and whatever); we can only free the whole chunk at once.
  - See #187 (closed)
- Reduce the number of RRSet copies in the IXFR, since we do a whole-zone deep copy anyway (even if it is named shallow).
- Do not keep all RRSets of the changeset in memory, or at least free them iteratively as each RRSet is processed.
- Something completely new?
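The mmap-based mempool idea can be sketched as a per-zone bump arena. This is a hypothetical illustration, not an existing Knot allocator. Everything a zone needs is carved out of one anonymous mapping, and the only way to give memory back is munmap() of the whole region. That is exactly the limitation noted for this option.

```c
#define _DEFAULT_SOURCE /* MAP_ANONYMOUS on glibc with strict -std */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical per-zone arena: one anonymous mapping plus a bump pointer. */
typedef struct {
	unsigned char *base;
	size_t size;
	size_t used;
} zone_arena_t;

static int arena_init(zone_arena_t *a, size_t size)
{
	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
	               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		return -1;
	}
	a->base = p;
	a->size = size;
	a->used = 0;
	return 0;
}

/* Bump allocation; individual objects can never be freed. */
static void *arena_alloc(zone_arena_t *a, size_t len)
{
	size_t align = _Alignof(max_align_t);
	size_t off = (a->used + align - 1) & ~(align - 1);
	if (off > a->size || len > a->size - off) {
		return NULL; /* arena full */
	}
	a->used = off + len;
	return a->base + off;
}

/* Drop the whole zone at once; the pages go straight back to the system. */
static void arena_destroy(zone_arena_t *a)
{
	assert(a->base != NULL);
	munmap(a->base, a->size);
	a->base = NULL;
	a->size = a->used = 0;
}
```

With one arena per zone version, dropping the old zone after a transfer would return all of its pages regardless of what was allocated later elsewhere. The price is that superseded RDATA inside a live arena stays allocated until the whole zone is discarded.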
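The item about not keeping the whole changeset in memory can be illustrated with a minimal sketch of the iterative variant. It assumes, hypothetically, that the changeset is a singly linked list and that an entry is no longer needed once it has been applied to the zone copy. change_t and apply_one() are illustrative stand-ins, not Knot's API. Each entry is detached and freed right after it is processed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical changeset entry in a singly linked list. */
typedef struct change {
	struct change *next;
	size_t size; /* bytes this entry holds */
} change_t;

/* Stand-in for applying one entry to the zone copy (hypothetical). */
static void apply_one(const change_t *c, size_t *applied_bytes)
{
	*applied_bytes += c->size;
}

/* Apply and release entries one at a time, so the changeset shrinks while
 * the zone copy grows, instead of both being resident at full size. */
static size_t apply_and_release(change_t **head)
{
	assert(head != NULL);
	size_t applied = 0;
	while (*head != NULL) {
		change_t *c = *head;
		*head = c->next; /* detach before freeing */
		apply_one(c, &applied);
		free(c);
	}
	return applied;
}
```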