On 07/20/2017 03:14 PM, Piotr Gackiewicz wrote:
> On Thu, 20 Jul 2017, Kevin Stange wrote:
>
>> On 07/20/2017 05:31 AM, Piotr Gackiewicz wrote:
>>> On Wed, 19 Jul 2017, Johnny Hughes wrote:
>>>
>>>> On 07/19/2017 09:23 AM, Johnny Hughes wrote:
>>>>> On 07/19/2017 04:27 AM, Piotr Gackiewicz wrote:
>>>>>> On Mon, 17 Jul 2017, Johnny Hughes wrote:
>>>>>>
>>>>>>> Are the testing kernels (kernel-4.9.37-29.el7 and
>>>>>>> kernel-4.9.37-29.el6, with the one config file change)
>>>>>>> working for everyone:
>>>>>>>
>>>>>>> (turn off: CONFIG_IO_STRICT_DEVMEM)
>>>>>>
>>>>>> Hello.
>>>>>> Maybe it's not the most appropriate thread or time, but I have been
>>>>>> signalling it before:
>>>>>>
>>>>>> 4.9.* kernels do not work well for me any more (and for other people
>>>>>> neither, as I know). Last stable kernel was 4.9.13-22.
>>>
>>> I think I have nailed down the faulty combo.
>>> My tests showed, that SLUB allocator does not work well in Xen Dom0, on
>>> top of Xen Hypervisor.
>>> It does not work at least on one of my testing servers (old AMD K8
>>> (1 proc, 1 core), only 1 paravirt guest).
>>> If kernel with SLUB booted as main (w/o Xen hypervisor), it works well.
>>> If booted as Xen hypervisor module - it almost instantly gets page
>>> allocation failure.
>>>
>>>
>>> SLAB=>SLUB was changed in kernel config, starting from 4.9.25. Then
>>> problems started to explode in my production environment, and on
>>> testing server mentioned above.
>>>
>>> After recompiling recent 4.9.34 with SLAB - everything works well on
>>> that testing machine.
>>> I will try to test 4.9.38 with the same config on my production servers.
>>
>> I was having page allocation failures on 4.9.25 with SLUB, but these
>> problems seem to be gone with 4.9.34 (still with SLUB). Have you
>> checked this build? It was moved to the stable repo on July 4th.
>
> Yes, 4.9.34 was failing too.
> And this was actually the worst case, with
> I/O error on guest:

I did find one server running 4.9.34 that was still throwing SLUB page
allocation errors, but oddly, the only servers ever to have this issue
for me are spares that are running no domains. I've just tried booting
that box up on 4.9.39, but I may not know if the switch back to SLAB
fixes anything for several weeks.

Otherwise, the other server I'm running 4.9.39 on for the past 72 hours
has been stable with running domains.

--
Kevin Stange
Chief Technology Officer
Steadfast | Managed Infrastructure, Datacenter and Cloud Services
800 S Wells, Suite 190 | Chicago, IL 60607
312.602.2689 X203 | Fax: 312.602.2688
kevin at steadfast.net | www.steadfast.net