On Wed, May 22, 2019 at 10:22 AM mark <m.roth at 5-cent.us> wrote:
> It seems unlikely. It's a 4U server, with 36 disks (and the dual root
> disks), in a machine room, and ipmitool sel list shows nada, nor are there
> any warnings, as I've seen on other systems occasionally, that the CPU is
> overheating, and is being throttled.

If this is a recent server (Ivy Bridge/Haswell/Broadwell), then I’ve seen the “edac” kernel module prevent the SEL from showing faults when an MCE (machine check exception) occurs. Disable edac and, poof, the server stops crashing and/or the SEL shows something useful (ECC/MCE).

Did you check /var/log/mcelog?
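
For what it's worth, here is a rough sketch of how one might check whether any edac modules are loaded and whether an mcelog file exists. It is only an illustration: the helper names (loaded_edac_modules, blacklist_lines) are made up for this example, the match on "edac" in the module name is a guess at what is relevant on your box, and the "blacklist" lines it prints are the usual modprobe.d syntax that you would still have to place in a file under /etc/modprobe.d/ yourself. It changes nothing on the system.

```python
import os


def loaded_edac_modules(proc_modules_text):
    """Return names of loaded modules whose name contains 'edac'.

    Expects text in /proc/modules format, where the module name is the
    first whitespace-separated field of each line.
    """
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and "edac" in fields[0]:
            names.append(fields[0])
    return names


def blacklist_lines(modules):
    """Build modprobe.d-style 'blacklist' lines (hypothetical helper)."""
    return ["blacklist " + name for name in modules]


if __name__ == "__main__":
    # Read the list of loaded modules; fall back to empty text if the
    # file is missing or unreadable (e.g. not running on Linux).
    try:
        with open("/proc/modules") as handle:
            text = handle.read()
    except OSError:
        text = ""

    found = loaded_edac_modules(text)
    print("loaded edac modules:", ", ".join(found) if found else "none")

    # Only prints suggestions; nothing is written to /etc/modprobe.d/.
    for line in blacklist_lines(found):
        print(line)

    # Path taken from the question above; it may not exist on every system.
    print("/var/log/mcelog present:", os.path.exists("/var/log/mcelog"))
```

If it reports something like sb_edac, that is the module to try disabling before the next crash, then look at the SEL and mcelog again.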