crash after AXFR

knot-2.7.2 crashed twice, each time right after an incoming AXFR. The logs give little context, and the two crashes are in different places, which (if confirmed) could suggest memory corruption. I don't have backtraces, and since this specific server is a master, I cannot gather information on it with tools such as gdb.

First crash (note negative time after "finished"):

Sep 18 10:13:02 knot[2730]: info: refresh, outgoing, xxx@53: remote serial 2018091802, zone is outdated
Sep 18 10:13:02 knot[2730]: info: IXFR, incoming, xxx@53: receiving AXFR-style IXFR
Sep 18 10:13:02 knot[2730]: info: AXFR, incoming, xxx@53: starting
Sep 18 10:13:02 knot[2730]: info: AXFR, incoming, xxx@53: finished, -7717964.48 seconds, 1 messages, 320 bytes
Sep 18 10:13:02 knot[2730]: info: refresh, outgoing, xxx@53: zone updated, serial 2018091801 -> 2018091802
Sep 18 10:13:02 knot[2730]: info: zone file updated, serial 2018091801 -> 2018091802
Sep 18 10:13:02 kernel: [7717836.767581] traps: knotd[2751] general protection ip:7f2c3452a5a6 sp:7f2c0c963cb0 error:0 in liburcu.so.6.0.0[7f2c34527000+6000]

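The second crash (log further below) seems to be inside glibc's malloc, at offset 0x94f57 in libc-2.27.so (ip 7fa190ec0f57 minus mapping base 7fa190e2c000):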

   94f1b:       48 8b 44 24 08          mov    0x8(%rsp),%rax
   94f20:       48 39 05 a9 f3 34 00    cmp    %rax,0x34f3a9(%rip)        # 3e42d0 <mp_+0x50>
   94f27:       0f 86 ff fe ff ff       jbe    94e2c <__libc_malloc+0x6c>
   94f2d:       64 48 8b 4d 00          mov    %fs:0x0(%rbp),%rcx
   94f32:       48 85 c9                test   %rcx,%rcx
   94f35:       0f 84 f1 fe ff ff       je     94e2c <__libc_malloc+0x6c>
   94f3b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
   94f40:       48 8d 34 c1             lea    (%rcx,%rax,8),%rsi
   94f44:       48 8b 56 40             mov    0x40(%rsi),%rdx
   94f48:       48 85 d2                test   %rdx,%rdx
   94f4b:       0f 84 db fe ff ff       je     94e2c <__libc_malloc+0x6c>
   94f51:       48 83 f8 3f             cmp    $0x3f,%rax
   94f55:       77 19                   ja     94f70 <__libc_malloc+0x1b0>
*  94f57:       48 8b 3a                mov    (%rdx),%rdi                 
   94f5a:       48 89 7e 40             mov    %rdi,0x40(%rsi)
   94f5e:       80 2c 01 01             subb   $0x1,(%rcx,%rax,1)
   94f62:       48 83 c4 18             add    $0x18,%rsp
   94f66:       48 89 d0                mov    %rdx,%rax
   94f69:       5b                      pop    %rbx
   94f6a:       5d                      pop    %rbp
   94f6b:       c3                      retq

Second crash:

Sep 18 16:38:53 knot[18092]: info: control, received command 'zone-retransfer'
Sep 18 16:38:54 knot[18092]: info: AXFR, incoming, xxx@53: starting
Sep 18 16:38:54 kernel: [7740988.047323] traps: knotd[18099] general protection ip:7fa190ec0f57 sp:7fa16a6fb290 error:0 in libc-2.27.so[7fa190e2c000+1e0000]
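
For reference, the offsets used in the disassembly listings are the faulting ip from each traps line minus the start of the mapping shown in brackets. A minimal sketch of that arithmetic in Python (the trap_offset helper is only for illustration, and it assumes the reported mapping starts at file offset 0 of the library, which may not hold for every build):

```python
import re

# Matches the "ip:<hex> ... in <lib>[<base>+<size>]" part of a kernel "traps:" line.
TRAP_RE = re.compile(
    r"ip:(?P<ip>[0-9a-f]+) .* in (?P<lib>[^\[\s]+)"
    r"\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def trap_offset(line):
    """Return (library, offset of the faulting ip inside the reported mapping)."""
    m = TRAP_RE.search(line)
    if m is None:
        raise ValueError("not a recognizable traps: line")
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    size = int(m.group("size"), 16)
    # Sanity check: the ip must fall inside the mapping the kernel reported.
    if not base <= ip < base + size:
        raise ValueError("ip is outside the reported mapping")
    return m.group("lib"), ip - base


first = ("traps: knotd[2751] general protection ip:7f2c3452a5a6 "
         "sp:7f2c0c963cb0 error:0 in liburcu.so.6.0.0[7f2c34527000+6000]")
second = ("traps: knotd[18099] general protection ip:7fa190ec0f57 "
          "sp:7fa16a6fb290 error:0 in libc-2.27.so[7fa190e2c000+1e0000]")

for line in (first, second):
    lib, off = trap_offset(line)
    print(f"{lib}: {off:#x}")
```

With the two lines from the logs this gives 0x35a6 in liburcu.so.6.0.0 for the first crash and 0x94f57 in libc-2.27.so for the second.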

Just after this crash, I noticed we didn't have symbols for uRCU, so I recompiled the exact same release with debugging symbols. This is not guaranteed to yield the same layout, but in my experience it reproduces well enough as long as all the tools involved (libs, compilers, etc.) are the same.

So here is a possible location of the first crash (offset 0x35a6 in liburcu.so.6.0.0), in call_rcu_thread:

    3558:       e8 53 e6 ff ff          callq  1bb0 <synchronize_rcu_memb@plt>
    355d:       48 8b 44 24 30          mov    0x30(%rsp),%rax
    3562:       48 85 c0                test   %rax,%rax
    3565:       0f 84 bd 00 00 00       je     3628 <call_rcu_thread+0x2a8>
    356b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
    3570:       31 c0                   xor    %eax,%eax
    3572:       48 8b 4c 24 30          mov    0x30(%rsp),%rcx
    3577:       48 85 c9                test   %rcx,%rcx
    357a:       75 18                   jne    3594 <call_rcu_thread+0x214>
    357c:       83 c0 01                add    $0x1,%eax
    357f:       83 f8 09                cmp    $0x9,%eax
    3582:       0f 8f b8 00 00 00       jg     3640 <call_rcu_thread+0x2c0>
    3588:       f3 90                   pause  
    358a:       48 8b 4c 24 30          mov    0x30(%rsp),%rcx
    358f:       48 85 c9                test   %rcx,%rcx
    3592:       74 e8                   je     357c <call_rcu_thread+0x1fc>
    3594:       4c 8b 39                mov    (%rcx),%r15
    3597:       4d 85 ff                test   %r15,%r15
    359a:       0f 84 b0 01 00 00       je     3750 <call_rcu_thread+0x3d0>
    35a0:       45 31 e4                xor    %r12d,%r12d
    35a3:       48 89 cf                mov    %rcx,%rdi
 *  35a6:       ff 51 08                callq  *0x8(%rcx)
    35a9:       49 83 c4 01             add    $0x1,%r12
    35ad:       4d 85 ff                test   %r15,%r15
    35b0:       0f 84 ea fe ff ff       je     34a0 <call_rcu_thread+0x120>

Is there any more information I could provide? Knot is running on a Slackware64-14.2 with updates:

glibc-2.27
uRCU-0.10.1
lmdb-0.9.14
libmaxminddb-1.0.2
libedit-20170329_3.1
Linux kernel 4.15.15

Nothing special was used in Knot's configure; only directories were set to accommodate Slackware's hierarchy.
