On Tue, Feb 17, 2015 at 10:02 PM, Jason Pyeron jpyeron@pdinc.us wrote:
I can say we have about 20 identical systems doing the same work: PE2970s running RHEL6/CentOS6 and libvirtd.
Twenty other identical systems doing the same work strongly suggests a hardware problem when there's a single outlier.
If it's a kernel bug though, you could maybe clobber it with a substantially newer kernel. You might check out the elrepo kernels. 2.6.32 is really old; granted, the CentOS one you're running has a huge pile of backports that makes it less "ancient" from a stability perspective, but anything really new that's hard to backport likely isn't in that kernel.
We should start looking at CentOS7/RHEL7, ugh, systemd..... But these machines are ancient too.
I've been using it since Fedora 15, and I find it easier to use for troubleshooting boot and service startup problems. systemd-analyze blame/plot are quite useful for optimizing boot performance. The journal on Fedora these days is persistent; on CentOS it's volatile, with rsyslog running by default. But I like being able to run journalctl -b -2 or -b -3 to view previous boots, point all systems to a single server, seal the journal logs against tampering, etc. It's certainly different, but it wasn't onerous to get used to, and these days I prefer it.
While you're waiting for Dell you could try either:
kernel-ml-3.18.6-1.el6.elrepo.x86_64.rpm kernel-ml-3.19.0-1.el6.elrepo.x86_64.rpm
Unlikely, since I do not have a test plan. If I could reproduce the error on demand then it would be a valid experiment. Some of the systems are running RHEL6, which is under support, while the others are CentOS6. The configs are kept as close as possible to each other.
I'd say it's unnecessary at this point. It's almost certainly a hardware problem given the numerous identical setups not having this problem. But, seeing as it panics every 30-40 hours, it can hardly be much worse with a new kernel running for a couple days... but my bet is there'd be no change.
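If you do want to try it anyway, a rough sketch of the steps on an EL6 box follows. It assumes the ELRepo repository is already configured (the elrepo-release package installed) and that the machine uses the stock legacy GRUB config path; neither is confirmed anywhere in this thread, so adjust to match the actual system.

```shell
# Assumption: elrepo-release is already installed. The elrepo-kernel repo
# is disabled by default, so enable it only for this one transaction.
yum --enablerepo=elrepo-kernel install kernel-ml

# Assumption: legacy GRUB on EL6. Check which entry boots by default;
# the newly installed kernel-ml entry may not be selected automatically.
grep -E '^(default|title)' /boot/grub/grub.conf

# After rebooting, confirm which kernel is actually running.
uname -r
```

The stock CentOS kernel stays installed alongside kernel-ml, so going back is a matter of picking the old entry at the GRUB menu.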