[CentOS] C 7: smpboot: CPU 16 is now offline, and slabs...

Wed Jun 13 17:36:29 UTC 2018
m.roth at 5-cent.us <m.roth at 5-cent.us>

m.roth at 5-cent.us wrote:
> m.roth at 5-cent.us wrote:
>> m.roth at 5-cent.us wrote:
>>> Current kernel, just booted, and dmesg shows that, of the 32 cores, 0,
>>> 2, 4 and 6 are ok, and *all* the others show "is now offline".
>>>
>>> What's happening here?
> <snip>
> Ok, more info. I found out how to online a CPU:
> echo 1 > /sys/devices/system/cpu/cpu23/online
>
> Perhaps I should have started with 1, 3, etc., but I was doing the 20s
> instead. Got to CPU27... and the system rebooted.
>
> Now I'm wondering if the offlined CPUs have something to do with the fact
> that this machine (and an identical one in the datacenter) is rebooting
> around 04:00 every day. Btw, they're Dell PE R530s from 2016....
>
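A minimal sketch for checking each CPU's state before onlining any of them
by hand, assuming the usual sysfs layout under /sys/devices/system/cpu. The
function name list_cpu_state and its optional directory argument are my own
illustration, not an existing tool. The kernel also exposes summary ranges
in /sys/devices/system/cpu/online and /sys/devices/system/cpu/offline.

```shell
#!/bin/sh
# Print "cpuN STATE" for every CPU directory found under sysfs.
# The optional first argument (hypothetical, for testing) overrides the
# sysfs CPU directory.
list_cpu_state() {
    dir="${1:-/sys/devices/system/cpu}"
    for d in "$dir"/cpu[0-9]*; do
        # Skip the unexpanded glob if nothing matched.
        [ -d "$d" ] || continue
        name=${d##*/}
        if [ -r "$d/online" ]; then
            state=$(cat "$d/online")
        else
            # Assumption: a CPU with no "online" file (often cpu0 on x86)
            # cannot be offlined, so report it as online.
            state=1
        fi
        echo "$name $state"
    done
}
```

Running list_cpu_state with no argument reads the live system; 1 means
online and 0 means offline.
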
Still more info (come on, folks, help me out!): these two machines that
keep rebooting, and only one other that doesn't, have Intel E5-2630s in
them. The two that reboot are v3, while the other one is a v2. The latter's
microcode is
microcode: CPU0 sig=0x306e4, pf=0x1, revision=0x428
while the two that reboot have
microcode: CPU0 sig=0x306f2, pf=0x1, revision=0x3a
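
To compare these across hosts, here is a rough sketch that pulls the
signature and revision out of microcode lines in the format shown above.
The function name parse_microcode is hypothetical. Note that the two
signatures differ (0x306e4 vs. 0x306f2), so the CPUs load different
microcode files and the revision numbers are not directly comparable
between the v2 and the v3 boxes.

```shell
#!/bin/sh
# Read kernel log lines on stdin and print "SIG REVISION" for each
# "microcode: CPUn sig=..., pf=..., revision=..." line, as quoted above.
# Lines in any other format are silently ignored.
parse_microcode() {
    sed -n 's/.*microcode: CPU[0-9]* sig=\(0x[0-9a-f]*\), pf=0x[0-9a-f]*, revision=\(0x[0-9a-f]*\).*/\1 \2/p'
}
```

Typical use would be "dmesg | parse_microcode | sort -u", which should
print a single line per host if all cores carry the same revision.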

Anyone think I might be going down the wrong path? Any thoughts at all? If
not, any comments on my downgrading to the previous microcode? This happened
once a week ago, and then, starting last Friday, began happening every day
at around 04:00, if not more often.

    mark