Knot 1.6.0 forgets about .BIND zone after reload
Early this morning, one of our Knot 1.6.0 servers was reloaded as part of the nightly log rotation:
2014-10-31T03:50:57 info: remote control, received command 'reload'
2014-10-31T03:50:57 info: reloading configuration
...
...
2014-10-31T03:51:32 info: configuration reloaded
Immediately after this, it began giving REFUSED responses for CH/TXT/hostname.bind queries. Our Icinga monitoring uses this query to test if the name server is answering queries, and because of the REFUSED response, it generated an alert. My colleague got the alert early in the morning, and when he ran the same query by hand, he also got a REFUSED response. However, he did not reload or restart Knot, because he wanted to keep the state for debugging. Since the server was answering queries for all the loaded zones just fine, he did not deem this critical, and went back to sleep.
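For anyone who wants to reproduce the check without Icinga, here is a minimal sketch of the query in Python, using only the standard library. It is not our actual monitoring plugin. The function names (`build_chaos_txt_query`, `rcode_of`, `check_server`) are made up for illustration, and the server address in the usage note is a placeholder from the documentation range.

```python
import socket
import struct

# Response codes from the DNS header (RFC 1035, section 4.1.1).
RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
               3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}

def build_chaos_txt_query(name="hostname.bind", query_id=0x1234):
    """Build a raw DNS query for CH/TXT/<name> (hypothetical helper)."""
    # Header: ID, flags (0 = standard query, no recursion desired),
    # QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.strip(".").split(".")
    ) + b"\x00"
    # QTYPE 16 = TXT, QCLASS 3 = CH (Chaos).
    return header + qname + struct.pack("!HH", 16, 3)

def rcode_of(response):
    """Extract the RCODE (low 4 bits of the flags word) from a response."""
    if len(response) < 12:
        raise ValueError("response shorter than a DNS header")
    flags = struct.unpack("!H", response[2:4])[0]
    return flags & 0x000F

def check_server(server, port=53, timeout=3.0):
    """Send the query over UDP and return the response code as a string."""
    query = build_chaos_txt_query()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, port))
        response, _ = sock.recvfrom(4096)
    code = rcode_of(response)
    return RCODE_NAMES.get(code, str(code))
```

Calling `check_server("192.0.2.1")` with the address replaced by the real name server returns the response code as a string. A server in the state described above would be expected to return "REFUSED", and a healthy one "NOERROR".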
Later this morning, however, our zone configuration changed, so our configuration management issued another "reload" command:
2014-10-31T10:37:28 info: remote control, received command 'reload'
2014-10-31T10:37:28 info: reloading configuration
...
...
2014-10-31T10:37:29 info: configuration reloaded
At this point, Knot again began answering properly for CH/TXT/hostname.bind queries, and our Icinga monitoring sent us a recovery message. So now the server has recovered, and we don't have the error condition any more.
This looks like a bug, perhaps a race condition, that caused Knot to forget about the internal .BIND zone at one reload and then remember it again at the next.