 
            I run VMware vSphere 4 Essentials with three almost identically configured ESXi 4.1 hosts and a mix of 32 and 64 bit guests including Windows 2003 and 2008 as well as CentOS 5 and 6. Recently I updated one of the hosts to build 800380. The new build runs Windows and CentOS 5 VMs fine, but CentOS 6 guests won't come up.
I tried two different CentOS 6 VMs. Both have the latest standard kernel (2.6.32-279.5.2.el6.x86_64). Both run perfectly fine on one of the other VMware hosts still running ESXi 4.1.0 build 702113. On build 800380, both display the GRUB menu alright but freeze immediately afterwards, emitting the message
PANIC: early exception 0d rip 10:ffffffff81038879 error 0 cr2 0
on the bottom of the virtual console. Both run perfectly fine again once I move them back to the host with the older ESXi build.
From one of the failed boot attempts, I captured a VMware debug log which shows:
Sep 11 17:21:19.628: vcpu-0| RDMSR: unknown MSR[0x1a0] (read as zero): rip=0xffffffff810388db count=1
Sep 11 17:21:19.628: vcpu-0| RDMSR: unknown MSR[0x1a0] (read as zero): rip=0xffffffff810388db count=2
Sep 11 17:21:19.629: vcpu-0| X86Fault_Warning: vmcore/vmm64/cpu/interp.c:427: cs:eip=0x10:0xffffffff81038879 fault=13
Sep 11 17:21:19.632: vcpu-0| Vix: [1125838 vmxCommands.c:9609]: VMAutomation_HandleCLIHLTEvent. Do nothing.
Sep 11 17:21:19.632: vcpu-0| MsgHint: msg.monitorevent.halt (sent)
Sep 11 17:21:19.632: vcpu-0| The CPU has been disabled by the guest operating system. Power off or reset the virtual machine.
Ideas?
aTdHvAaNnKcSe, Tilman
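For reference, the fields of the panic line can be decoded mechanically. What follows is a minimal Python sketch, assuming the line has exactly the format quoted above with all numbers in hexadecimal. The function name decode_early_panic and the small vector table are illustrative only and not part of any tool mentioned in this thread.

```python
import re

# A few x86 exception vectors; the table is deliberately incomplete.
X86_VECTORS = {
    0x00: "#DE divide error",
    0x06: "#UD invalid opcode",
    0x08: "#DF double fault",
    0x0D: "#GP general protection fault",
    0x0E: "#PF page fault",
}

# Matches the early-boot panic line as quoted in this thread.
PANIC_RE = re.compile(
    r"PANIC: early exception (?P<vec>[0-9a-f]+) "
    r"rip (?P<cs>[0-9a-f]+):(?P<rip>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) cr2 (?P<cr2>[0-9a-f]+)"
)

def decode_early_panic(line):
    """Return the decoded fields of an early-exception panic line, or None."""
    m = PANIC_RE.search(line)
    if m is None:
        return None
    vec = int(m.group("vec"), 16)
    return {
        "vector": vec,
        "name": X86_VECTORS.get(vec, "unknown"),
        "cs": int(m.group("cs"), 16),
        "rip": int(m.group("rip"), 16),
        "error_code": int(m.group("err"), 16),
        "cr2": int(m.group("cr2"), 16),
    }

info = decode_early_panic(
    "PANIC: early exception 0d rip 10:ffffffff81038879 error 0 cr2 0"
)
print(info["vector"], info["name"], hex(info["rip"]))
```

Vector 0x0d is decimal 13, the general protection fault, which agrees with the fault=13 at the same address in the VMware log.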
 
            On 09/11/12 12:06 PM, Tilman Schmidt wrote:
I run VMware vSphere 4 Essentials with three almost identically configured ESXi 4.1 hosts and a mix of 32 and 64 bit guests including Windows 2003 and 2008 as well as CentOS 5 and 6. Recently I updated one of the hosts to build 800380. The new build runs Windows and CentOS 5 VMs fine, but CentOS 6 guests won't come up.
I tried two different CentOS 6 VMs. Both have the latest standard kernel (2.6.32-279.5.2.el6.x86_64). Both run perfectly fine on one of the other VMware hosts still running ESXi 4.1.0 build 702113. On build 800380, both display the GRUB menu alright but freeze immediately afterwards, emitting the message
PANIC: early exception 0d rip 10:ffffffff81038879 error 0 cr2 0
on the bottom of the virtual console. Both run perfectly fine again once I move them back to the host with the older ESXi build.
From one of the failed boot attempts, I captured a VMware debug log which shows:
Sep 11 17:21:19.628: vcpu-0| RDMSR: unknown MSR[0x1a0] (read as zero): rip=0xffffffff810388db count=1
Sep 11 17:21:19.628: vcpu-0| RDMSR: unknown MSR[0x1a0] (read as zero): rip=0xffffffff810388db count=2
Sep 11 17:21:19.629: vcpu-0| X86Fault_Warning: vmcore/vmm64/cpu/interp.c:427: cs:eip=0x10:0xffffffff81038879 fault=13
Sep 11 17:21:19.632: vcpu-0| Vix: [1125838 vmxCommands.c:9609]: VMAutomation_HandleCLIHLTEvent. Do nothing.
Sep 11 17:21:19.632: vcpu-0| MsgHint: msg.monitorevent.halt (sent)
Sep 11 17:21:19.632: vcpu-0| The CPU has been disabled by the guest operating system. Power off or reset the virtual machine.
Ideas?
From here, it appears to be a hardware or VMware issue. NOTHING the guest OS does should crash the hypervisor. I'd file a bug report with VMware.
 
            On 11.09.2012 21:14, John R Pierce wrote:
On 09/11/12 12:06 PM, Tilman Schmidt wrote:
I tried two different CentOS 6 VMs. Both have the latest standard kernel (2.6.32-279.5.2.el6.x86_64). Both run perfectly fine on one of the other VMware hosts still running ESXi 4.1.0 build 702113. On build 800380, both display the GRUB menu alright but freeze immediately afterwards, emitting the message
PANIC: early exception 0d rip 10:ffffffff81038879 error 0 cr2 0
[...]
from here, it appears to be a hardware or vmware issue.
I tend to exclude hardware issues. The host in question was working fine before the update.
NOTHING the guest OS does should crash the hypervisor.
Perhaps I wasn't quite clear. It's the guest OS that panics. The hypervisor continues quite unperturbed by its guest's fate.
Thanks, Tilman
 
            On 11.09.2012 21:14, John R Pierce wrote:
On 09/11/12 12:06 PM, Tilman Schmidt wrote:
I tried two different CentOS 6 VMs. Both have the latest standard kernel (2.6.32-279.5.2.el6.x86_64). Both run perfectly fine on one of the other VMware hosts still running ESXi 4.1.0 build 702113. On build 800380, both display the GRUB menu alright but freeze immediately afterwards, emitting the message
PANIC: early exception 0d rip 10:ffffffff81038879 error 0 cr2 0
I'd file a bug report with vmware.
Well, yes, I'm working on that. It's a tedious process trying to convince VMware support that I really have bought support.
Meanwhile I'd like to understand what's going wrong here, and ideally how to work around it. I found this blog post
http://www.basemont.com/panic_early_exception_i3_i5_i7_vmware_virtualbox_par...
which seems to hint that the Linux kernel might be involved in the problem after all. The processor in the problem host is a Xeon E3-1270V2, while the other one which works fine has an E3-1230. Alas the "nosmep" boot option did not have any effect.
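One way to narrow this down would be to compare the CPU feature flags that a guest actually sees on each host, for example from /proc/cpuinfo in a guest that still boots on both. The Python sketch below assumes /proc/cpuinfo-style text. The two excerpts are made up and heavily shortened for illustration; they were not captured from the hosts in this thread, and whether ESXi 4.1 passes a flag such as smep through to the guest is not known here.

```python
def parse_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()

# Hypothetical, shortened excerpts (NOT real captures from these hosts).
working = "model name : Intel(R) Xeon(R) CPU E3-1230\nflags : fpu sse2 avx aes\n"
failing = (
    "model name : Intel(R) Xeon(R) CPU E3-1270 V2\n"
    "flags : fpu sse2 avx aes smep erms fsgsbase\n"
)

# Flags visible only on the host where the newer kernel panics.
only_on_failing = sorted(parse_flags(failing) - parse_flags(working))
print(only_on_failing)
```

Any flag that shows up only on the failing side is a candidate for a feature the newer kernel tries to enable early in boot.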
 
            On 2012-09-11 21:06, Tilman Schmidt wrote:
I run VMware vSphere 4 Essentials with three almost identically configured ESXi 4.1 hosts and a mix of 32 and 64 bit guests including Windows 2003 and 2008 as well as CentOS 5 and 6. Recently I updated one of the hosts to build 800380. The new build runs Windows and CentOS 5 VMs fine, but CentOS 6 guests won't come up.
I tried two different CentOS 6 VMs. Both have the latest standard kernel (2.6.32-279.5.2.el6.x86_64). Both run perfectly fine on one of the other VMware hosts still running ESXi 4.1.0 build 702113. On build 800380, both display the GRUB menu alright but freeze immediately afterwards, emitting the message
I've found what is probably your post on VMware Communities. http://communities.vmware.com/message/2112173?tstart=0
It seems there's a second 4.1 update 3 build (811144): http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd...
It fixes another panic, so trying this build may help.
--
PR722061: When a Linux kernel crashes, the linux kexec feature is used to enable booting into a special kdump kernel and gathering crash dump files. An SMP Linux guest configured with kexec might cause the virtual machine to fail with a monitor panic during this reboot. Error messages such as the following might be logged:
vcpu-0| CPU reset: soft (mode 2)
vcpu-0| MONITOR PANIC: vcpu-0:VMM fault 14: src=MONITOR rip=0xfffffffffc28c30d regs=0xfffffffffc008b50
--
 
            On 11.09.2012 21:57, Laurent wrote:
I've found what is probably your post on VMware Communities. http://communities.vmware.com/message/2112173?tstart=0
Indeed.
It seems there's a second 4.1 update 3 build (811144): http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd...
Thanks, I'll give it a try.
 
            On 11.09.2012 21:57, Laurent wrote:
On 2012-09-11 21:06, Tilman Schmidt wrote:
I tried two different CentOS 6 VMs. Both have the latest standard kernel (2.6.32-279.5.2.el6.x86_64). Both run perfectly fine on one of the other VMware hosts still running ESXi 4.1.0 build 702113. On build 800380, both display the GRUB menu alright but freeze immediately afterwards,
[...]
It seems there's a second 4.1 update 3 build (811144): http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd...
I can't find that second build anywhere.
I have installed patch ESXi410-201208201-UG on the problem host. It refers to http://kb.vmware.com/kb/2020373 for details, which states:
Build 800380 811144 (security-only)
I have also installed ESXi410-Update03 ("VMware ESXi 4.1 Complete Update 3") which is named in the title of that KB article. Update Manager does not offer me anything else to install. Still vSphere Client reports build 800380.
 
            On 11.09.2012 21:06, Tilman Schmidt wrote:
I tried two different CentOS 6 VMs. Both have the latest standard kernel (2.6.32-279.5.2.el6.x86_64). Both run perfectly fine on one of the other VMware hosts still running ESXi 4.1.0 build 702113. On build 800380, both display the GRUB menu alright but freeze immediately afterwards, emitting the message
PANIC: early exception 0d rip 10:ffffffff81038879 error 0 cr2 0
on the bottom of the virtual console. Both run perfectly fine again once I move them back to the host with the older ESXi build.
Two and a half new data points:
- The problem host has a Xeon E3-1270V2 processor while the one which runs the CentOS 6 guests fine has an E3-1230. I'm not sufficiently up to date with Intel processor types to tell whether this would make a difference.
- Another CentOS 6 VM with older kernel 2.6.32-220.7.1.el6.x86_64 does come up on the problem host. It does a panic blink (Caps Lock and Scroll Lock blinking in unison while the VM has the keyboard) but I get a working login prompt (I don't get any further because I don't have a logon for the machine) and I can shut it down normally by sending Ctrl-Alt-Del.
- (the half point, no idea if it matters) The CentOS 6 VMs which die with "PANIC: early exception 0d" do *not* do a panic blink.
So it would seem that something related to the problem was changed in the CentOS kernel between releases 2.6.32-220.7.1 and 2.6.32-279.5.2.
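To see which kernel function the faulting address belongs to, the rip value can be looked up in the System.map that matches the failing kernel (normally found under /boot in the guest). Below is a minimal Python sketch, assuming the usual three-column "address type name" layout. The symbol names in the sample are invented placeholders, not taken from a real CentOS System.map.

```python
import bisect

def load_symbols(system_map_text):
    """Parse 'address type name' lines into a sorted list of (addr, name)."""
    syms = []
    for line in system_map_text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            syms.append((int(parts[0], 16), parts[2]))
    syms.sort()
    return syms

def resolve(syms, addr):
    """Return (symbol, offset) for the last symbol at or below addr, or None."""
    addrs = [a for a, _ in syms]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = syms[i]
    return name, addr - base

# Made-up sample; a real lookup would read the System.map file instead.
SAMPLE = """\
ffffffff81038000 T example_func_a
ffffffff81038800 t example_func_b
ffffffff81038a00 T example_func_c
"""

syms = load_symbols(SAMPLE)
print(resolve(syms, 0xffffffff81038879))  # faulting rip from the panic line
print(resolve(syms, 0xffffffff810388db))  # rip of the RDMSR warnings
```

The faulting rip and the rip of the RDMSR warnings are only 0x62 bytes apart, so in the real map they may well resolve to the same function.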
 
            Alright, it's the CPU. Subject adapted accordingly.
In a fit of recklessness, I updated VMware on one of the hosts on which the CentOS 6 machines were still able to run. Lo and behold, they still work fine there. So now I have:
ESXi Build    582267    800380    800380
Processor     E5620     E3-1230   E3-1270V2
Windows       ok        ok        ok          (all versions)
CentOS 5.8    ok        ok        ok          2.6.18-308.13.1.el5
CentOS 6.2    ok        ok        ok(*)       2.6.32-220.7.1.el6.x86_64
CentOS 6.3    ok        ok        Panic       2.6.32-279.2.1.el6.x86_64
(*) except for the irritating keyboard blink
More ideas?
Thx T.
 
            On 13.09.2012 11:30, Tilman Schmidt wrote:
In a fit of recklessness, I updated VMware on one of the hosts on which the CentOS 6 machines were still able to run. Lo and behold, they still work fine there. So now I have:
ESXi Build    582267    800380    800380
Processor     E5620     E3-1230   E3-1270V2
Windows       ok        ok        ok          (all versions)
CentOS 5.8    ok        ok        ok          2.6.18-308.13.1.el5
CentOS 6.2    ok        ok        ok(*)       2.6.32-220.7.1.el6.x86_64
CentOS 6.3    ok        ok        Panic       2.6.32-279.2.1.el6.x86_64
(*) except for the irritating keyboard blink
In the meantime, one other user with the same problem has turned up on the VMware forum. He reports that Windows 8 x64 doesn't work on the E3-1270V2 host either, but a 32 bit install of CentOS 6.3 does.
Also, I have updated the last host and it continues to run all VMs fine, so the ESXi version is definitely not the culprit.
What happened between kernel releases 2.6.32-220.7.1.el6.x86_64 and 2.6.32-279.2.1.el6.x86_64 that would cause a CPU dependent early exception?
 
            Having read that RHEL/CentOS 6.4 came with new VMware drivers I checked whether this problem might perchance be fixed. It isn't.
In the meantime I got a report that Windows 8 (64 bit) showed a similar problem.
So, updated problem matrix (ESXi build omitted as it has no influence):
Processor               E5620   E3-1230   E3-1270V2
Windows XP/2003/2008    ok      ok        ok
Windows 8               ?       ?         Panic
CentOS 5.8              ok      ok        ok          2.6.18-308.13.1.el5
CentOS 6.2              ok      ok        ok(*)       2.6.32-220.7.1.el6.x86_64
CentOS 6.3              ok      ok        Panic       2.6.32-279.2.1.el6.x86_64
CentOS 6.4              ok      ok        Panic       2.6.32-358.0.1.el6.x86_64
(*) keyboard shows panic blink but system works fine otherwise
Reminder of the problem description: trying to boot a VM with CentOS 6.3 or later on a VMware ESXi 4 host with a Xeon E3-1270V2 processor fails immediately after GRUB, with the VM locking up, console message:
Sep 11 17:21:31.498: vmx| PANIC: early exception 0d rip 10:ffffffff81038879 error 0 cr2 0
and ESXi log messages:
Sep 11 17:21:19.628: vcpu-0| RDMSR: unknown MSR[0x1a0] (read as zero): rip=0xffffffff810388db count=1
Sep 11 17:21:19.628: vcpu-0| RDMSR: unknown MSR[0x1a0] (read as zero): rip=0xffffffff810388db count=2
Sep 11 17:21:19.629: vcpu-0| X86Fault_Warning: vmcore/vmm64/cpu/interp.c:427: cs:eip=0x10:0xffffffff81038879 fault=13
Sep 11 17:21:19.632: vcpu-0| Vix: [1125838 vmxCommands.c:9609]: VMAutomation_HandleCLIHLTEvent. Do nothing.
Sep 11 17:21:19.632: vcpu-0| MsgHint: msg.monitorevent.halt (sent)
Sep 11 17:21:19.632: vcpu-0| The CPU has been disabled by the guest operating system. Power off or reset the virtual machine.
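When log excerpts like this get run together on one line, as easily happens when pasting into mail, they can be split again on the timestamp prefix and scanned for the MSRs the monitor does not know. The Python sketch below assumes the "Mon DD HH:MM:SS.mmm: " prefix seen above. The helper names are made up for illustration, and the name table holds only the one MSR that appears in this thread (0x1a0 is IA32_MISC_ENABLE on Intel processors).

```python
import re

# Timestamp prefix that starts every entry in the excerpts quoted here.
TS = re.compile(r"[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}: ")
MSR_RE = re.compile(r"RDMSR: unknown MSR\[(0x[0-9a-f]+)\]")

# Only the MSR seen in this thread; extend as needed.
KNOWN_MSR_NAMES = {0x1A0: "IA32_MISC_ENABLE"}

def split_entries(blob):
    """Split a run-together log excerpt into one string per entry."""
    starts = [m.start() for m in TS.finditer(blob)]
    ends = starts[1:] + [len(blob)]
    return [blob[s:e].strip() for s, e in zip(starts, ends)]

def unknown_msrs(entries):
    """Collect the MSR numbers reported as unknown by the monitor."""
    found = set()
    for entry in entries:
        m = MSR_RE.search(entry)
        if m:
            found.add(int(m.group(1), 16))
    return found

blob = (
    "Sep 11 17:21:19.628: vcpu-0| RDMSR: unknown MSR[0x1a0] (read as zero): "
    "rip=0xffffffff810388db count=1 "
    "Sep 11 17:21:19.628: vcpu-0| RDMSR: unknown MSR[0x1a0] (read as zero): "
    "rip=0xffffffff810388db count=2 "
    "Sep 11 17:21:19.629: vcpu-0| X86Fault_Warning: "
    "vmcore/vmm64/cpu/interp.c:427: cs:eip=0x10:0xffffffff81038879 fault=13"
)

entries = split_entries(blob)
for msr in sorted(unknown_msrs(entries)):
    print(hex(msr), KNOWN_MSR_NAMES.get(msr, "unnamed"))
```

The log shows two reads of that MSR returning zero immediately before the fault at a nearby address, which makes the kernel code around those reads a plausible place to look.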


