Hi,
I have a server with a Supermicro X7DVL-3 (P9) motherboard, 16 GB of ECC RAM, and an LSI SAS 1068e RAID controller. I installed 64-bit CentOS 6.5 on the machine without any problems, then followed the Xen setup steps at
http://wiki.centos.org/HowTos/Xen/Xen4QuickStart
which installed kernel 3.10.32-11.el6.centos.alt.x86_64. After that I ran into a problem: after "Starting certmonger [OK]" the screen went black and the system became unresponsive. The keyboard did not work (NumLock did not respond) and SSH did not respond either. After the first lockup I increased the dom0 maximum memory to 2G, but the next reboot produced the same result. The strange thing is that on the third reboot everything worked: the screen went black for a moment after "Starting certmonger [OK]", but then the graphical login screen appeared and I could use the system normally. The fourth reboot went fine as well.
Any ideas what could cause this kind of behaviour?
Regards, Peter
On Tue, Mar 04, 2014 at 02:50:40PM +0200, Peter Peltonen wrote:
[...] Any ideas what could cause this kind of behaviour?
No idea, really, but what you should do is enable/configure a serial console, probably by using the IPMI SOL, so you can capture and log all the Xen and dom0 kernel boot messages.
That way we can hopefully *see* what the issue is, and not have to guess :)
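A minimal sketch of what enabling the serial console could look like on the GRUB side, assuming a GRUB legacy grub.conf whose Xen entry has a `kernel /xen.gz ...` line followed by `module /vmlinuz-...` lines. The helper name `add_serial_console`, the choice of `com1`, and the 115200 baud rate are assumptions for illustration; the port and speed have to match whatever the BMC's SOL is configured for. The function only rewrites text, so the result can be reviewed before anything is written back.

```python
# Assumed values: adjust the COM port and baud rate to match the BMC's SOL setup.
XEN_SERIAL_OPTS = ["com1=115200,8n1", "console=com1,vga",
                   "loglvl=all", "guest_loglvl=all"]
# dom0 kernel options: hvc0 is the Xen console device as seen from dom0.
DOM0_SERIAL_OPTS = ["console=hvc0", "earlyprintk=xen"]


def _append_missing(line, opts):
    """Append the options that are not already present on the line."""
    present = line.split()
    missing = [opt for opt in opts if opt not in present]
    return " ".join([line.rstrip()] + missing)


def add_serial_console(grub_conf_text):
    """Return grub.conf text with serial console options added to Xen entries."""
    out = []
    in_xen_entry = False
    for line in grub_conf_text.splitlines():
        words = line.split()
        if words[:1] == ["title"]:
            in_xen_entry = False  # a new boot entry starts here
        elif words[:1] == ["kernel"] and len(words) > 1 and "xen" in words[1]:
            line = _append_missing(line, XEN_SERIAL_OPTS)
            in_xen_entry = True
        elif (in_xen_entry and words[:1] == ["module"]
              and len(words) > 1 and "vmlinuz" in words[1]):
            line = _append_missing(line, DOM0_SERIAL_OPTS)
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical grub.conf entry, for illustration only.
SAMPLE_GRUB_CONF = (
    "title CentOS (3.10.32-11.el6.centos.alt.x86_64)\n"
    "\troot (hd0,0)\n"
    "\tkernel /xen.gz dom0_mem=2048M\n"
    "\tmodule /vmlinuz-3.10.32-11.el6.centos.alt.x86_64 ro root=/dev/sda1\n"
    "\tmodule /initramfs-3.10.32-11.el6.centos.alt.x86_64.img\n"
)

print(add_serial_console(SAMPLE_GRUB_CONF))
```

Running the function twice gives the same text, so it does not pile up duplicate options. With the console redirected, the boot messages could then be captured from another machine over SOL.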
-- Pasi
On 3/4/2014 8:45 PM, Pasi Kärkkäinen wrote:
[...] enable/configure a serial console, probably by using the IPMI SOL, so you can capture and log all the Xen and dom0 kernel boot messages [...]
Also, are you sure your hardware is stable? It sounds a lot like a hardware issue... Try to run memtest for a while.
Z.
I'd look in the logs for Xorg failures.
On Tue, 4 Mar 2014, Pasi Kärkkäinen wrote:
Date: Tue, 4 Mar 2014 21:45:01 +0200
From: Pasi Kärkkäinen pasik@iki.fi
To: Discussion about the virtualization on CentOS centos-virt@centos.org
Subject: Re: [CentOS-virt] Xen4CentOS installation strangeness
[...] enable/configure a serial console, probably by using the IPMI SOL, so you can capture and log all the Xen and dom0 kernel boot messages [...]
Hi,
On Tue, Mar 4, 2014 at 10:08 PM, Robert Dinse nanook@eskimo.com wrote:
I'd look in the logs for Xorg failures.
Thanks Robert, it seems Xorg is indeed to blame in this case. Apparently the network issues I experienced at the same time were just bad luck and unrelated: on a later boot I was able to SSH into the system.
These errors are visible in the Xorg logs:
"
[ 128.186] (EE)
[ 128.186] (EE) Backtrace:
[ 128.186] (EE) 0: X (xorg_backtrace+0x36) [0x46d196]
[ 128.186] (EE) 1: X (0x400000+0x72f99) [0x472f99]
[ 128.186] (EE) 2: /lib64/libpthread.so.0 (0x7fb7b6fa3000+0xf710) [0x7fb7b6fb2710]
[ 128.186] (EE) 3: /usr/lib64/xorg/modules/drivers/xgi_drv.so (0x7fb7b4459000+0x21a95) [0x7fb7b447aa95]
[ 128.186] (EE) 4: /usr/lib64/xorg/modules/drivers/xgi_drv.so (0x7fb7b4459000+0x23159) [0x7fb7b447c159]
[ 128.186] (EE) 5: X (dixSaveScreens+0x15e) [0x4695ee]
[ 128.186] (EE) 6: X (0x400000+0x7d061) [0x47d061]
[ 128.186] (EE) 7: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x7fb7b56a3d1d]
[ 128.187] (EE) 8: X (0x400000+0x26189) [0x426189]
[ 128.187] (EE)
[ 128.187] (EE) Segmentation fault at address 0x0
[ 128.187]
Fatal server error:
[ 128.187] Caught signal 11 (Segmentation fault). Server aborting
[ 128.187]
[ 128.187] (EE)
Please consult the CentOS support
"
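For anyone hitting something similar, here is a rough sketch of how a log like the one above could be summarized automatically. It assumes the backtrace format shown in this thread (`(EE) N: object (...)`); the function name `summarize_xorg_log` and the "first frame under /xorg/modules/ is the suspect" rule are only a heuristic of mine, not something Xorg provides. The sample lines are copied from the log above; on a real system the text would normally come from /var/log/Xorg.0.log.

```python
import re

# Backtrace frames look like:
# [ 128.186] (EE) 3: /usr/lib64/xorg/modules/drivers/xgi_drv.so (0x...+0x21a95) [0x...]
FRAME_RE = re.compile(r"\(EE\)\s+(\d+):\s+(\S+)\s+\(")
SIGNAL_RE = re.compile(r"Caught signal (\d+)")


def summarize_xorg_log(text):
    """Collect backtrace frames, the fatal signal, and a suspect module."""
    frames = []
    signal = None
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((int(match.group(1)), match.group(2)))
            continue
        match = SIGNAL_RE.search(line)
        if match:
            signal = int(match.group(1))
    # Heuristic: the first frame that lives under Xorg's module directory
    # is the most likely culprit (typically a video driver).
    suspect = None
    for _, obj in frames:
        if "/xorg/modules/" in obj:
            suspect = obj
            break
    return {"signal": signal, "frames": frames, "suspect": suspect}


# Lines copied from the log quoted in this thread.
SAMPLE_LOG = """\
[ 128.186] (EE) Backtrace:
[ 128.186] (EE) 0: X (xorg_backtrace+0x36) [0x46d196]
[ 128.186] (EE) 2: /lib64/libpthread.so.0 (0x7fb7b6fa3000+0xf710) [0x7fb7b6fb2710]
[ 128.186] (EE) 3: /usr/lib64/xorg/modules/drivers/xgi_drv.so (0x7fb7b4459000+0x21a95) [0x7fb7b447aa95]
[ 128.186] (EE) 4: /usr/lib64/xorg/modules/drivers/xgi_drv.so (0x7fb7b4459000+0x23159) [0x7fb7b447c159]
[ 128.187] (EE) Segmentation fault at address 0x0
[ 128.187] Caught signal 11 (Segmentation fault). Server aborting
"""

summary = summarize_xorg_log(SAMPLE_LOG)
print("signal:", summary["signal"])
print("suspect module:", summary["suspect"])
```

On the log above this points at xgi_drv.so, the XGI video driver, which is where the backtrace enters driver code before the segmentation fault.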
Some more googling revealed that other people have had similar problems. This is most likely caused by either Xen or the new Xen-enabled kernel that was installed, since it does not happen with the stock CentOS 6 kernel. I ran hardware diagnostics on the motherboard and CPUs, and they passed.
I tried creating a new Xorg configuration file and starting X with it, but that did not help. Changing the server's runlevel to 3 let the server boot fine in non-graphical mode. As I can run virt-manager remotely, there is no reason for me to debug this further; I just wanted to report the issues I've experienced here on the list.
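In case it helps someone, the runlevel change can be sketched as a small text rewrite. On CentOS 6 the default runlevel comes from the `id:N:initdefault:` line in /etc/inittab; the helper name `set_default_runlevel` is made up for this example, and it works on a string so the result can be checked before touching the real file.

```python
import re

# Matches the active (uncommented) default-runlevel line, e.g. "id:5:initdefault:".
INITDEFAULT_RE = re.compile(r"^id:(\d):initdefault:[ \t]*$", re.MULTILINE)


def set_default_runlevel(inittab_text, runlevel):
    """Return inittab text with the default runlevel changed."""
    # Runlevels 0 and 6 mean halt and reboot, so they are refused here.
    if runlevel not in range(1, 6):
        raise ValueError("runlevel must be between 1 and 5")
    new_text, count = INITDEFAULT_RE.subn(
        "id:%d:initdefault:" % runlevel, inittab_text)
    if count != 1:
        raise ValueError(
            "expected exactly one initdefault line, found %d" % count)
    return new_text


# Shortened, hypothetical inittab content for illustration.
SAMPLE_INITTAB = (
    "# Default runlevel.\n"
    "#   3 - Full multiuser mode\n"
    "#   5 - X11\n"
    "id:5:initdefault:\n"
)

print(set_default_runlevel(SAMPLE_INITTAB, 3))
```

Comment lines are left alone because only a line that starts with `id:` matches. Setting the value back to 5 would restore the graphical login.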
Regards,
Peter