[CentOS-virt] kernel-4.9.37-29.el7 (and el6)

Mon Jul 24 20:05:50 UTC 2017
Kevin Stange <kevin at steadfast.net>

On 07/20/2017 03:14 PM, Piotr Gackiewicz wrote:
> On Thu, 20 Jul 2017, Kevin Stange wrote:
> 
>> On 07/20/2017 05:31 AM, Piotr Gackiewicz wrote:
>>> On Wed, 19 Jul 2017, Johnny Hughes wrote:
>>>
>>>> On 07/19/2017 09:23 AM, Johnny Hughes wrote:
>>>>> On 07/19/2017 04:27 AM, Piotr Gackiewicz wrote:
>>>>>> On Mon, 17 Jul 2017, Johnny Hughes wrote:
>>>>>>
>>>>>>> Are the testing kernels (kernel-4.9.37-29.el7 and
>>>>>>> kernel-4.9.37-29.el6,
>>>>>>> with the one config file change) working for everyone:
>>>>>>>
>>>>>>> (turn off: CONFIG_IO_STRICT_DEVMEM)
>>>>>>
>>>>>> Hello.
>>>>>> Maybe it's not the most appropriate thread or time, but I have been
>>>>>> signalling it before:
>>>>>>
>>>>>> 4.9.* kernels do not work well for me any more (nor for other people,
>>>>>> as far as I know). The last stable kernel was 4.9.13-22.
>>>
>>> I think I have nailed down the faulty combo.
>>> My tests showed that the SLUB allocator does not work well in Xen Dom0, on
>>> top of the Xen hypervisor.
>>> It does not work on at least one of my testing servers (old AMD K8, 1 proc,
>>> 1 core, only 1 paravirt guest).
>>> If the kernel with SLUB is booted directly (w/o the Xen hypervisor), it
>>> works well.
>>> If booted as a Xen hypervisor module, it almost instantly gets a page
>>> allocation failure.
>>>
>>>
>>> SLAB=>SLUB was changed in the kernel config, starting from 4.9.25. Then
>>> problems started to explode in my production environment, and on the
>>> testing server mentioned above.
>>>
>>> After recompiling recent 4.9.34 with SLAB - everything works well on
>>> that testing machine.
>>> I will try to test 4.9.38 with the same config on my production servers.
>>
>> I was having page allocation failures on 4.9.25 with SLUB, but these
>> problems seem to be gone with 4.9.34 (still with SLUB).   Have you
>> checked this build?  It was moved to the stable repo on July 4th.
> 
> Yes, 4.9.34 was failing too. And this was actually the worst case, with
> I/O error on guest:

I did find one server running 4.9.34 that was still throwing SLUB page
allocation errors, but oddly, the only servers ever to have this issue
for me are spares that are running no domains.  I've just tried booting
that box up on 4.9.39, but I may not know if the switch back to SLAB
fixes anything for several weeks.

Otherwise, the other server I've been running 4.9.39 on for the past 72
hours has been stable with running domains.
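
For anyone comparing builds, here is a minimal sketch of how to check which
slab allocator a given kernel config selects, and whether
CONFIG_IO_STRICT_DEVMEM is on. It assumes the config text uses the usual
"CONFIG_FOO=y" / "# CONFIG_FOO is not set" lines. The function names are
made up for illustration and are not part of any existing tool.

```python
import platform
import re


def kernel_allocator(config_text):
    """Return 'SLAB', 'SLUB' or 'SLOB' if the config selects one, else None."""
    for line in config_text.splitlines():
        # Match only the allocator choice itself, not e.g. CONFIG_SLUB_DEBUG.
        m = re.match(r"^CONFIG_(SLAB|SLUB|SLOB)=y\s*$", line)
        if m:
            return m.group(1)
    return None


def option_enabled(config_text, name):
    """True if the option is built in (=y); commented-out lines do not count."""
    pattern = r"^%s=y\s*$" % re.escape(name)
    return re.search(pattern, config_text, re.MULTILINE) is not None


def running_kernel_config_path():
    # Assumption: the distro installs the config next to the kernel image,
    # as /boot/config-<release>. Adjust if your layout differs.
    return "/boot/config-" + platform.release()
```

To use it on a live box, read the file named by running_kernel_config_path()
and pass its contents to kernel_allocator() and
option_enabled(text, "CONFIG_IO_STRICT_DEVMEM"). Under Xen, run this in Dom0
so it reports the Dom0 kernel rather than a guest's.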

-- 
Kevin Stange
Chief Technology Officer
Steadfast | Managed Infrastructure, Datacenter and Cloud Services
800 S Wells, Suite 190 | Chicago, IL 60607
312.602.2689 X203 | Fax: 312.602.2688
kevin at steadfast.net | www.steadfast.net