A3720 CPU random crashes
Random reboots have been observed on some Mox devices.
The output below is from a Mox AGFED running Turris OS 6.0. It crashes approximately once every 30 minutes after most of the available package lists were installed (see the attached reForis Packages screen).
Intentionally causing full CPU load, or frequently switching the CPU frequency (between 200 or 500 MHz and 1000 MHz), doesn't seem to trigger these reboots. They occur while the software mentioned above is operating normally, without any other significant load.
root@turris:/# [ 1846.232423] Internal error: synchronous parity or ECC error: 96000018 [#1] SMP
[ 1846.239904] Modules linked in: xt_connlimit pppoe ppp_async nf_conncount iptable_nat ath9k xt_state xt_nat xt_helper xt_conntrack xt_connmark xt_connbytes xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD xt_CT pppox ppp_generic nf_natk
[ 1846.240214] ebt_limit ebt_among ebt_802_3 crc_ccitt btusb btmrvl_sdio btmrvl btintel br_netfilter bnep bluetooth ath9k_hw ath10k_pci ath10k_core ath crypto_safexcel sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcinh
[ 1846.330235] raid456 async_raid6_recov async_pq async_xor async_memcpy async_tx raid10 raid1 raid0 linear md_mod nls_utf8 nls_koi8_r nls_cp1255 nls_iso8859_6 nls_iso8859_2 nls_iso8859_15 nls_iso8859_13 nls_iso8859_1 nls_cp932 r
[ 1846.481740] CPU: 1 PID: 12247 Comm: kworker/1:2 Not tainted 5.15.63 #0
[ 1846.488477] Hardware name: CZ.NIC Turris Mox Board (DT)
[ 1846.493873] Workqueue: 0x0 (events)
[ 1846.497570] pstate: a04000c5 (NzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 1846.504752] pc : update_curr+0xd4/0x120
[ 1846.508713] lr : update_curr+0xcc/0x120
[ 1846.512668] sp : ffffffc00bd5ba40
[ 1846.516082] x29: ffffffc00bd5ba40 x28: 0000000000000001 x27: 0000000000000000
[ 1846.523449] x26: 0000000000000006 x25: 0000000000000009 x24: 0000000000000000
[ 1846.530815] x23: ffffff803fdbcac0 x22: ffffff8001bf0cc0 x21: 000000000004a380
[ 1846.538181] x20: ffffff803fdbcac0 x19: ffffff8001bf0d80 x18: 00000000506d864d
[ 1846.545548] x17: 000000000e5a0fd8 x16: 000000004f2566db x15: 00000000e625934d
[ 1846.552915] x14: 0000000000000000 x13: 0000000000000030 x12: 0000000000000000
[ 1846.560281] x11: ffffffc008b904b0 x10: ffffff803fdc8668 x9 : ffffff80000f5100
[ 1846.567648] x8 : 0000000000000001 x7 : 0000000000000000 x6 : ffffffc008c78b10
[ 1846.575015] x5 : 0000000000000000 x4 : ffffffc037159000 x3 : 000000154ffb2840
[ 1846.582382] x2 : ffffff80030c2a38 x1 : ffffffc037169000 x0 : ffffff80030c2940
[ 1846.589750] Call trace:
[ 1846.592269] update_curr+0xd4/0x120
[ 1846.595865] dequeue_entity+0x24/0x5c8
[ 1846.599731] dequeue_task_fair+0x8c/0x5f8
[ 1846.603867] deactivate_task+0x50/0x68
[ 1846.607736] load_balance+0x3c4/0x9e8
[ 1846.611513] newidle_balance.isra.156+0x1fc/0x368
[ 1846.616365] pick_next_task_fair+0x48/0x318
[ 1846.620679] __schedule+0x10c/0x660
[ 1846.624279] schedule+0x58/0xc0
[ 1846.627518] worker_thread+0xe4/0x4e0
[ 1846.631296] kthread+0x11c/0x128
[ 1846.634629] ret_from_fork+0x10/0x20
[ 1846.638322] Code: 9401aaf5 9400c840 f942ce60 9103e002 (b9415801)
[ 1846.644609] ---[ end trace 9929ad1c16fffc48 ]---
[ 1846.649369] Kernel panic - not syncing: synchronous parity or ECC error: Fatal exception
[ 1846.657713] SMP: stopping secondary CPUs
[ 1847.701754] SMP: failed to stop secondary CPUs 0-1
[ 1847.706695] Kernel Offset: disabled
[ 1847.710287] CPU features: 0x00000000,20000802
[ 1847.714779] Memory Limit: none
[ 1847.717926] Rebooting in 3 seconds..
[ 1850.721756] SMP: stopping secondary CPUs
[ 1851.765797] SMP: failed to stop secondary CPUs 0-1
Another crash output:
[ 1416.417978] Internal error: synchronous parity or ECC error: 86000018 [#1] SMP
[ 1416.425459] Modules linked in: xt_connlimit pppoe ppp_async nf_conncount iptable_nat ath9k xt_state xt_nat xt_helper xt_conntrack xt_connmark xt_connbytes xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD xt_CT pppox ppp_generic nf_natk
[ 1416.425778] ebt_limit ebt_among ebt_802_3 crc_ccitt btusb btmrvl_sdio btmrvl btintel br_netfilter bnep bluetooth ath9k_hw ath10k_pci ath10k_core ath crypto_safexcel sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcin4
[ 1416.515802] dns_resolver multipath raid456 async_raid6_recov async_pq async_xor async_memcpy async_tx raid10 raid1 raid0 linear md_mod nls_utf8 nls_koi8_r nls_cp1255 nls_iso8859_6 nls_iso8859_2 nls_iso8859_15 nls_iso8859_13 nr
[ 1416.668843] CPU: 0 PID: 11 Comm: ksoftirqd/0 Not tainted 5.15.64 #0
[ 1416.675310] Hardware name: CZ.NIC Turris Mox Board (DT)
[ 1416.680698] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 1416.687880] pc : ath10k_htt_rx_pktlog_completion_handler+0x1048/0x2e88 [ath10k_core]
[ 1416.695897] lr : ath10k_htt_txrx_compl_task+0x68/0x1158 [ath10k_core]
[ 1416.702554] sp : ffffffc008e7ba80
[ 1416.705967] x29: ffffffc008e7ba80 x28: ffffff8004de27c0 x27: ffffffc008c86000
[ 1416.713335] x26: ffffffc008b91cb0 x25: 000000010001b40f x24: ffffffc037149000
[ 1416.720701] x23: ffffff8004de3cb8 x22: 0000000000000040 x21: 0000000000000040
[ 1416.728068] x20: 0000000000000000 x19: 0000000000000040 x18: 0000000000000000
[ 1416.735433] x17: 0000000000000000 x16: 0000000000000000 x15: fffa33d179b00cbf
[ 1416.742800] x14: 4000000000800001 x13: 01c72e2e2e0204c3 x12: 0000000000000000
[ 1416.750166] x11: ffffffc008b91cb0 x10: ffffff803fdb8668 x9 : ffffffc008f7de78
[ 1416.757533] x8 : 00000000000014f8 x7 : 0000000000000000 x6 : 0000000000000000
[ 1416.764898] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 1416.772263] x2 : 0000000000000040 x1 : 00000000000016c0 x0 : ffffff8004de3e80
[ 1416.779631] Call trace:
[ 1416.782149] ath10k_htt_rx_pktlog_completion_handler+0x1048/0x2e88 [ath10k_core]
[ 1416.789792] ath10k_htt_txrx_compl_task+0x68/0x1158 [ath10k_core]
[ 1416.796090] ath10k_pci_enable_legacy_irq+0xc8/0x280 [ath10k_pci]
[ 1416.802381] __napi_poll+0x34/0x1a8
[ 1416.805987] net_rx_action+0xf8/0x248
[ 1416.809764] _stext+0x11c/0x278
[ 1416.813003] run_ksoftirqd+0x4c/0x60
[ 1416.816693] smpboot_thread_fn+0x144/0x188
[ 1416.820923] kthread+0x11c/0x128
[ 1416.824256] ret_from_fork+0x10/0x20
[ 1416.827949] Code: 95cc4518 17ffffd6 a9425bf5 a94363f7 (2a1403e0)
[ 1416.834235] ---[ end trace 34c6ab41c8a37133 ]---
[ 1416.838996] Kernel panic - not syncing: synchronous parity or ECC error: Fatal exception in interrupt
[ 1416.848505] SMP: stopping secondary CPUs
[ 1417.892545] SMP: failed to stop secondary CPUs 0-1
[ 1417.897487] Kernel Offset: disabled
[ 1417.901079] CPU features: 0x00000000,20000802
[ 1417.905570] Memory Limit: none
[ 1417.908718] Rebooting in 3 seconds..
[ 1420.912549] SMP: stopping secondary CPUs
[ 1421.956589] SMP: failed to stop secondary CPUs 0-1
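
For reference, the two ESR values printed in the "Internal error" lines (96000018 and 86000018) can be decoded into their fields. The following is a minimal sketch, assuming the ESR_ELx layout from the Arm Architecture Reference Manual (EC in bits [31:26], IL in bit 25, ISS in bits [24:0], fault status code in ISS[5:0] for aborts). The function name `decode_esr` and the lookup tables are illustrative only and cover just the codes relevant to these traces.

```python
# Illustrative decoder for the ESR values in the traces above.
# Assumption: ESR_ELx field layout as described in the Arm ARM.
# The tables are deliberately incomplete; unknown codes are reported as such.

EXCEPTION_CLASSES = {
    0x20: "Instruction Abort from a lower Exception level",
    0x21: "Instruction Abort taken without a change in Exception level",
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
}

FAULT_STATUS = {
    0x10: "synchronous external abort, not on translation table walk",
    0x18: "synchronous parity or ECC error on memory access, "
          "not on translation table walk",
}


def decode_esr(esr):
    """Split a 32-bit ESR value into EC, IL, ISS and the fault status code."""
    ec = (esr >> 26) & 0x3F      # exception class, bits [31:26]
    il = (esr >> 25) & 0x1       # instruction length bit, bit 25
    iss = esr & 0x1FFFFFF        # instruction specific syndrome, bits [24:0]
    fsc = iss & 0x3F             # fault status code (valid for aborts)
    return {
        "ec": ec,
        "ec_name": EXCEPTION_CLASSES.get(ec, "unknown (not in this table)"),
        "il": il,
        "iss": iss,
        "fsc": fsc,
        "fsc_name": FAULT_STATUS.get(fsc, "unknown (not in this table)"),
    }


if __name__ == "__main__":
    # The two values taken from the crash outputs in this report.
    for value in (0x96000018, 0x86000018):
        fields = decode_esr(value)
        print(f"{value:#010x}: EC={fields['ec']:#x} ({fields['ec_name']}), "
              f"FSC={fields['fsc']:#x} ({fields['fsc_name']})")
```

Under these assumptions, the first crash decodes as a data abort and the second as an instruction abort, both taken in kernel context and both with the same fault status code (0x18), which matches the "synchronous parity or ECC error" text the kernel prints.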