Knot loses NOTIFY if it's in the middle of an AXFR
Hi Knot developers,
One of our Knot 1.6.2 instances alerted us because one of its slave zones had fallen behind the master server. The logs speak for themselves:
2015-03-20T13:07:48 info: [86.in-addr.arpa] NOTIFY, incoming, 193.0.0.198@51841: received serial 1426856859
2015-03-20T13:07:56 info: [86.in-addr.arpa] refresh, outgoing, 193.0.0.198@53: master has newer serial 1426847139 -> 1426856859
2015-03-20T13:07:56 notice: [86.in-addr.arpa] IXFR, incoming, 193.0.0.198@53: receiving AXFR-style IXFR
2015-03-20T13:07:56 info: [86.in-addr.arpa] AXFR, incoming, 193.0.0.198@53: starting
2015-03-20T13:07:57 info: [86.in-addr.arpa] NOTIFY, incoming, 193.0.0.198@56028: received serial 1426856890
2015-03-20T13:07:57 info: [86.in-addr.arpa] AXFR, incoming, 193.0.0.198@53: finished, serial 1426847139 -> 1426856859, 0.68 seconds, 93 messages, 4048258 bytes
2015-03-20T13:29:48 info: [86.in-addr.arpa] zone file updated, serial 1426847139 -> 1426856859
To summarise, Knot received a notify for serial 1426856859 and began an AXFR. While it was busy with this AXFR, it received another notify with a newer serial, 1426856890. I don't know what Knot did with that second notify, but it looks like it got lost instead of being queued, because when Knot finished the transfer of serial 1426856859, it did not immediately try a second AXFR to update to the newer serial.
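For anyone who wants to check their own logs for the same pattern, here is a minimal sketch. It assumes the Knot 1.6 log format quoted above and input that covers a single zone; the function names are mine and are not part of Knot.

```python
import re

# Patterns follow the log lines quoted in this mail; other Knot versions may differ.
NOTIFY_RE = re.compile(r"NOTIFY, incoming, \S+: received serial (\d+)")
START_RE = re.compile(r"AXFR, incoming, \S+: starting")
FINISH_RE = re.compile(r"AXFR, incoming, \S+: finished, serial \d+ -> (\d+)")


def serial_newer(a, b):
    """RFC 1982 style comparison of 32-bit SOA serials: True if a is newer than b."""
    return a != b and ((a - b) & 0xFFFFFFFF) < 0x80000000


def missed_notifies(lines):
    """Return serials announced during an AXFR that are newer than the serial it delivered."""
    in_transfer = False
    announced = []
    missed = []
    for line in lines:
        notify = NOTIFY_RE.search(line)
        finish = FINISH_RE.search(line)
        if START_RE.search(line):
            in_transfer, announced = True, []
        elif notify and in_transfer:
            announced.append(int(notify.group(1)))
        elif finish:
            delivered = int(finish.group(1))
            missed += [s for s in announced if serial_newer(s, delivered)]
            in_transfer, announced = False, []
    return missed


LOG = """\
2015-03-20T13:07:48 info: [86.in-addr.arpa] NOTIFY, incoming, 193.0.0.198@51841: received serial 1426856859
2015-03-20T13:07:56 info: [86.in-addr.arpa] AXFR, incoming, 193.0.0.198@53: starting
2015-03-20T13:07:57 info: [86.in-addr.arpa] NOTIFY, incoming, 193.0.0.198@56028: received serial 1426856890
2015-03-20T13:07:57 info: [86.in-addr.arpa] AXFR, incoming, 193.0.0.198@53: finished, serial 1426847139 -> 1426856859, 0.68 seconds, 93 messages, 4048258 bytes
"""

print(missed_notifies(LOG.splitlines()))  # prints [1426856890]
```

A non-empty result only means a newer serial was announced mid-transfer; whether the server followed up afterwards still has to be checked in the log lines that come next.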
This is not a big deal because:
- The zone will refresh itself upon expiry of the SOA refresh timer; or
- the zone is updated again and the master sends another notify.
In fact, in this specific instance, Knot recovered from this by doing a refresh itself:
2015-03-20T14:07:57 info: [86.in-addr.arpa] refresh, outgoing, 193.0.0.198@53: master has newer serial 1426856859 -> 1426856890
2015-03-20T14:07:57 notice: [86.in-addr.arpa] IXFR, incoming, 193.0.0.198@53: receiving AXFR-style IXFR
2015-03-20T14:07:57 info: [86.in-addr.arpa] AXFR, incoming, 193.0.0.198@53: starting
2015-03-20T14:07:58 info: [86.in-addr.arpa] AXFR, incoming, 193.0.0.198@53: finished, serial 1426856859 -> 1426856890, 0.41 seconds, 93 messages, 4051937 bytes
This is of course a rare case: the zone is updated twice in quick succession, so the second notify arrives while the slave is still busy with an AXFR.
However, if the notify was meant to be queued but wasn't, then it is probably a bug. If the notify was meant to be discarded because Knot was busy with an AXFR, then Knot should probably log this. I would prefer that the notify be queued, so that Knot can immediately refresh to the newer copy of the zone.
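To make the behaviour I have in mind concrete, here is a rough sketch of the queueing logic. It is a hypothetical model, not Knot's actual code or data structures: the `SlaveZone` class and its method names are invented for illustration, and it skips the SOA query that Knot performs before it starts a transfer.

```python
def serial_newer(a, b):
    """RFC 1982 style comparison of 32-bit SOA serials: True if a is newer than b."""
    return a != b and ((a - b) & 0xFFFFFFFF) < 0x80000000


class SlaveZone:
    """Hypothetical slave zone state; remembers a NOTIFY that arrives mid-transfer."""

    def __init__(self, serial):
        self.serial = serial
        self.transferring = False
        self.pending_serial = None  # newest serial announced during a transfer

    def on_notify(self, serial):
        """Handle a NOTIFY; return True if a transfer should start now."""
        if self.transferring:
            # Queue instead of dropping: keep only the newest announced serial.
            if self.pending_serial is None or serial_newer(serial, self.pending_serial):
                self.pending_serial = serial
            return False
        if serial_newer(serial, self.serial):
            self.transferring = True
            return True
        return False

    def on_transfer_finished(self, new_serial):
        """Record the transferred serial; return True if another transfer is due."""
        self.serial = new_serial
        self.transferring = False
        pending, self.pending_serial = self.pending_serial, None
        if pending is not None and serial_newer(pending, self.serial):
            self.transferring = True
            return True
        return False


# Replay of the sequence from the logs above.
zone = SlaveZone(1426847139)
print(zone.on_notify(1426856859))             # True: start the first AXFR
print(zone.on_notify(1426856890))             # False: busy, so the serial is queued
print(zone.on_transfer_finished(1426856859))  # True: refresh again immediately
```

Keeping only the newest pending serial means that a burst of notifies during one transfer results in a single follow-up refresh rather than one per notify.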