[CentOS-virt] kernel-4.9.37-29.el7 (and el6)

Fri Jul 21 10:48:33 UTC 2017
Johnny Hughes <johnny at centos.org>

On 07/20/2017 03:14 PM, Piotr Gackiewicz wrote:
> On Thu, 20 Jul 2017, Kevin Stange wrote:
> 
>> On 07/20/2017 05:31 AM, Piotr Gackiewicz wrote:
>>> On Wed, 19 Jul 2017, Johnny Hughes wrote:
>>>
>>>> On 07/19/2017 09:23 AM, Johnny Hughes wrote:
>>>>> On 07/19/2017 04:27 AM, Piotr Gackiewicz wrote:
>>>>>> On Mon, 17 Jul 2017, Johnny Hughes wrote:
>>>>>>
>>>>>>> Are the testing kernels (kernel-4.9.37-29.el7 and
>>>>>>> kernel-4.9.37-29.el6,
>>>>>>> with the one config file change) working for everyone:
>>>>>>>
>>>>>>> (turn off: CONFIG_IO_STRICT_DEVMEM)
>>>>>>
>>>>>> Hello.
>>>>>> Maybe this is not the most appropriate thread or time, but I have
>>>>>> raised this before:
>>>>>>
>>>>>> 4.9.* kernels no longer work well for me (nor for other people, as
>>>>>> far as I know). The last stable kernel was 4.9.13-22.
>>>
>>> I think I have nailed down the faulty combo.
>>> My tests showed that the SLUB allocator does not work well in a Xen
>>> Dom0, on top of the Xen hypervisor.
>>> It fails on at least one of my testing servers (an old AMD K8 with
>>> 1 processor and 1 core, running only 1 paravirt guest).
>>> If the kernel with SLUB is booted directly (without the Xen
>>> hypervisor), it works well.
>>> If booted as a Xen hypervisor module, it almost instantly gets a page
>>> allocation failure.
>>>
>>>
>>> SLAB=>SLUB was changed in the kernel config starting from 4.9.25. Then
>>> problems started to explode in my production environment and on the
>>> testing server mentioned above.
>>>
>>> After recompiling the recent 4.9.34 with SLAB, everything works well on
>>> that testing machine.
>>> I will try to test 4.9.38 with the same config on my production servers.
>>
>> I was having page allocation failures on 4.9.25 with SLUB, but these
>> problems seem to be gone with 4.9.34 (still with SLUB).   Have you
>> checked this build?  It was moved to the stable repo on July 4th.
> 
> Yes, 4.9.34 was failing too. And this was actually the worst case, with
> I/O errors on the guest:
> 
> Jul 16 06:01:03 dom0 kernel: [452360.743312] CPU: 0 PID: 28450 Comm:
> 12.xvda3-0 Tainted: G           O    4.9.34-29.el6.x86_64 #1
> Jul 16 06:01:03 guest kernel: end_request: I/O error, dev xvda3, sector
> 9200640
> Jul 16 06:01:03 dom0 kernel: [452360.758931] SLUB: Unable to allocate
> memory on node -1, gfp=0x2000000(GFP_NOWAIT)
> Jul 16 06:01:03 guest kernel: Buffer I/O error on device xvda3, logical
> block 1150080
> Jul 16 06:01:03 guest kernel: lost page write due to I/O error on xvda3
> Jul 16 06:01:03 guest kernel: Buffer I/O error on device xvda3, logical
> block 1150081
> Jul 16 06:01:03 guest kernel: lost page write due to I/O error on xvda3
> Jul 16 06:01:03 guest kernel: Buffer I/O error on device xvda3, logical
> block 1150082
> Jul 16 06:01:03 guest kernel: lost page write due to I/O error on xvda3
> Jul 16 06:01:03 guest kernel: Buffer I/O error on device xvda3, logical
> block 1150083
> Jul 16 06:01:03 guest kernel: lost page write due to I/O error on xvda3
> Jul 16 06:01:03 guest kernel: Buffer I/O error on device xvda3, logical
> block 1150084
> Jul 16 06:01:03 guest kernel: lost page write due to I/O error on xvda3
> Jul 16 06:01:03 dom0 kernel: [452361.449389] 12.xvda3-0: page allocation
> failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
> Jul 16 06:01:03 dom0 kernel: [452361.449685] CPU: 1 PID: 28450 Comm:
> 12.xvda3-0 Tainted: G           O    4.9.34-29.el6.x86_64 #1
> Jul 16 06:01:03 dom0 kernel: [452361.449934] Hardware name: Supermicro
> X8SIL/X8SIL, BIOS 1.0c 02/25/2010
> Jul 16 06:01:03 guest kernel: end_request: I/O error, dev xvda3, sector
> 6102784
> Jul 16 06:01:03 dom0 kernel: [452361.462103] SLUB: Unable to allocate
> memory on node -1, gfp=0x2000000(GFP_NOWAIT)
> Jul 16 06:01:03 dom0 kernel: [452361.676257] 12.xvda3-0: page allocation
> failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
> Jul 16 06:01:03 dom0 kernel: [452361.676531] CPU: 0 PID: 28450 Comm:
> 12.xvda3-0 Tainted: G           O    4.9.34-29.el6.x86_64 #1
> Jul 16 06:01:03 guest kernel: end_request: I/O error, dev xvda3, sector
> 6127872
> Jul 16 06:01:03 dom0 kernel: [452361.692171] SLUB: Unable to allocate
> memory on node -1, gfp=0x2000000(GFP_NOWAIT)
> Jul 16 06:01:07 dom0 kernel: [452365.438565] 12.xvda3-0: page allocation
> failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
> Jul 16 06:01:07 dom0 kernel: [452365.438870] CPU: 0 PID: 28450 Comm:
> 12.xvda3-0 Tainted: G           O    4.9.34-29.el6.x86_64 #1
> Jul 16 06:01:07 dom0 kernel: [452365.454213] SLUB: Unable to allocate
> memory on node -1, gfp=0x2000000(GFP_NOWAIT)
> Jul 16 06:01:07 guest kernel: end_request: I/O error, dev xvda3, sector
> 6477112
> Jul 16 06:01:09 dom0 kernel: [452366.732994] 12.xvda3-0: page allocation
> failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
> Jul 16 06:01:09 dom0 kernel: [452366.733306] CPU: 0 PID: 28450 Comm:
> 12.xvda3-0 Tainted: G           O    4.9.34-29.el6.x86_64 #1
> Jul 16 06:01:09 dom0 kernel: [452366.746362] SLUB: Unable to allocate
> memory on node -1, gfp=0x2000000(GFP_NOWAIT)
> Jul 16 06:01:09 guest kernel: end_request: I/O error, dev xvda3, sector
> 6546488
> Jul 16 06:01:09 guest kernel: Buffer I/O error on device xvda3, logical
> block 818311
> Jul 16 06:01:09 guest kernel: lost page write due to I/O error on xvda3
> Jul 16 06:01:09 guest kernel: Buffer I/O error on device xvda3, logical
> block 818312
> Jul 16 06:01:09 guest kernel: lost page write due to I/O error on xvda3
> Jul 16 06:01:09 guest kernel: Buffer I/O error on device xvda3, logical
> block 818313
> Jul 16 06:01:09 guest kernel: lost page write due to I/O error on xvda3
> Jul 16 06:01:09 guest kernel: Buffer I/O error on device xvda3, logical
> block 818314
> Jul 16 06:01:09 guest kernel: lost page write due to I/O error on xvda3
> Jul 16 06:01:09 guest kernel: Buffer I/O error on device xvda3, logical
> block 818315
> Jul 16 06:01:09 dom0 kernel: [452366.913734] 12.xvda3-0: page allocation
> failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
> Jul 16 06:01:09 dom0 kernel: [452366.914002] CPU: 1 PID: 28450 Comm:
> 12.xvda3-0 Tainted: G           O    4.9.34-29.el6.x86_64 #1
> Jul 16 06:01:09 guest kernel: end_request: I/O error, dev xvda3, sector
> 6366208
> Jul 16 06:01:09 dom0 kernel: [452366.929809] SLUB: Unable to allocate
> memory on node -1, gfp=0x2000000(GFP_NOWAIT)
> Jul 16 06:01:09 dom0 kernel: [452367.288193] 12.xvda3-0: page allocation
> failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
> Jul 16 06:01:09 dom0 kernel: [452367.288455] CPU: 1 PID: 28450 Comm:
> 12.xvda3-0 Tainted: G           O    4.9.34-29.el6.x86_64 #1
> Jul 16 06:01:09 dom0 kernel: [452367.301690] SLUB: Unable to allocate
> memory on node -1, gfp=0x2000000(GFP_NOWAIT)
> Jul 16 06:01:09 guest kernel: end_request: I/O error, dev xvda3, sector
> 6630656
> Jul 16 06:01:10 dom0 kernel: [452368.253435] 12.xvda3-0: page allocation
> failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
> Jul 16 06:01:10 dom0 kernel: [452368.253701] CPU: 0 PID: 28450 Comm:
> 12.xvda3-0 Tainted: G           O    4.9.34-29.el6.x86_64 #1
> Jul 16 06:01:10 guest kernel: end_request: I/O error, dev xvda3, sector
> 6708224
> 

I will happily create a test kernel with SLAB. What is your config
file diff?
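
For anyone who wants to produce such a diff, here is a minimal sketch in
Python that compares two kernel config files option by option. The
function names (parse_config, diff_configs) are made up for this
illustration and are not part of any CentOS or kernel tooling. The
sketch assumes the usual config format: "CONFIG_FOO=value" for set
options and "# CONFIG_FOO is not set" for disabled ones.

```python
import re

# Assumed line formats of a kernel .config file.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map each CONFIG_ option to its value ('n' for 'is not set' lines)."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_configs(old_text, new_text):
    """Return sorted (name, old_value, new_value) tuples for changed options.

    A value of None means the option does not appear in that file at all.
    """
    old, new = parse_config(old_text), parse_config(new_text)
    changes = []
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes.append((name, before, after))
    return changes
```

To use it, read the two config files (on CentOS they are typically
installed as /boot/config-<version>) and pass their contents to
diff_configs(). A SLUB-to-SLAB change should show up as CONFIG_SLUB and
CONFIG_SLAB swapping values, along with any SLUB-only options dropping
out. The kernel source tree also ships scripts/diffconfig for the same
purpose.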
