[CentOS-virt] kernel-4.9.37-29.el7 (and el6)

Thu Jul 20 10:31:55 UTC 2017
Piotr Gackiewicz <p.gackiewicz at intertele.pl>

On Wed, 19 Jul 2017, Johnny Hughes wrote:

> On 07/19/2017 09:23 AM, Johnny Hughes wrote:
>> On 07/19/2017 04:27 AM, Piotr Gackiewicz wrote:
>>> On Mon, 17 Jul 2017, Johnny Hughes wrote:
>>>
>>>> Are the testing kernels (kernel-4.9.37-29.el7 and kernel-4.9.37-29.el6,
>>>> with the one config file change) working for everyone:
>>>>
>>>> (turn off: CONFIG_IO_STRICT_DEVMEM)
>>>
>>> Hello.
>>> Maybe it's not the most appropriate thread or time, but I have
>>> raised this before:
>>>
>>> 4.9.* kernels no longer work well for me (nor, as far as I know, for
>>> other people). The last stable kernel was 4.9.13-22.

I think I have nailed down the faulty combination.
My tests showed that the SLUB allocator does not work well in a Xen Dom0 running on top of the Xen hypervisor.
It fails at least on one of my test servers (an old AMD K8 with 1 processor and 1 core, running only 1 paravirtualized guest).
When the SLUB kernel is booted on its own (without the Xen hypervisor), it works well.
When it is booted as a module under the Xen hypervisor (as Dom0), it hits a page allocation failure almost instantly.


The kernel config was switched from SLAB to SLUB starting with 4.9.25. From then on, problems
multiplied in my production environment and on the test server mentioned
above.

After recompiling the recent 4.9.34 with SLAB, everything works well on that test machine.
I will try to test 4.9.38 with the same config on my production servers.
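For anyone who wants to reproduce the SLAB rebuild, here is a minimal sketch of the config change, assuming a standard 4.9 `.config` in which the allocator is picked by the Kconfig choice between `CONFIG_SLAB`, `CONFIG_SLUB` and `CONFIG_SLOB`. The helper name `select_slab` is hypothetical; it only rewrites those three lines and leaves everything else untouched.

```python
import re

# Matches only the three allocator symbols themselves, e.g. "CONFIG_SLUB=y" or
# "# CONFIG_SLAB is not set"; longer names such as CONFIG_SLUB_DEBUG do not match.
ALLOCATOR_RE = re.compile(r"^(?:# )?(CONFIG_SL[AUO]B)(?:=.*| is not set)$")


def select_slab(config_text: str) -> str:
    """Return a copy of a kernel .config text with SLAB selected and SLUB/SLOB off."""
    out = []
    for line in config_text.splitlines():
        m = ALLOCATOR_RE.match(line)
        if m:
            name = m.group(1)
            # Enable SLAB, write the other allocators in Kconfig's "not set" form.
            line = f"{name}=y" if name == "CONFIG_SLAB" else f"# {name} is not set"
        out.append(line)
    return "\n".join(out) + "\n"
```

After writing the result back to `.config`, running `make olddefconfig` should let Kconfig drop options that depend on SLUB (for example `CONFIG_SLUB_DEBUG`) before the rebuild.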

Moreover, digging into the logs of memory allocation failures on my production
Supermicro servers turned up some interesting findings:

Jul  9 05:02:47 xen kernel: [3040088.089379] gzip: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC)
Jul 10 12:18:01 xen kernel: [3152495.802565] 2.xvda5-0: page allocation failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
Jul 10 12:18:01 xen kernel: [3152495.815871] SLUB: Unable to allocate memory on node -1, gfp=0x2000000(GFP_NOWAIT)
Jul 10 12:18:01 xen kernel: [3152495.816826] 2.xvda5-0: page allocation failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
Jul 10 12:18:01 xen kernel: [3152495.832477] SLUB: Unable to allocate memory on node -1, gfp=0x2000000(GFP_NOWAIT)
Jul 10 12:20:20 xen kernel: [3152635.070680] 1.xvda5-0: page allocation failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
Jul 10 12:20:20 xen kernel: [3152635.083952] SLUB: Unable to allocate memory on node -1, gfp=0x2000000(GFP_NOWAIT)
Jul 12 09:15:15 xen kernel: [118420.343615] 10.xvda5-0: page allocation failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
Jul 12 09:15:15 xen kernel: [118420.359779] SLUB: Unable to allocate memory on node -1, gfp=0x2000000(GFP_NOWAIT)

What is node "-1"?
8-/

I think it should be reported to the Xen and/or SLUB developers.
I suggest releasing new Xen kernels built with SLAB until the issue is resolved.

Regards,

-- 
Piotr Gackiewicz
Intertele S.A. - operator of the ITL.PL and DOMENY.ITL.PL systems
al. T. Rejtana 10, 35-310 Rzeszów
TEL: +48 17 8507580, FAX: +48 17 8520275

http://www.itl.pl       - reliable hosting services
http://domeny.itl.pl    - cheap internet domains
http://www.intertele.pl