[CentOS-virt] Problem with Xen4CentOS
T.Weyergraf at virtfinity.de
Tue Nov 18 16:39:47 UTC 2014
On 11/18/2014 04:59 PM, George Dunlap wrote:
> On Sun, Nov 16, 2014 at 12:39 AM, Thomas Weyergraf
> <T.Weyergraf at virtfinity.de> wrote:
>> Hi folks,
>> we (the company I am working for) are running several dozen
>> virtualisation servers using CentOS 6 + Xen4CentOS as the virtualisation
>> With the latest versions of all packages installed, we see failures in
>> live-migration of stock-CentOS6 HVM guests, leaving a "Domain-unnamed" on
>> the source host, while the migrated guest runs fine on the target host.
>> Domain-0 0 2048 64 r----- 6791.5
>> Domain-Unnamed 1 4099 4 --ps-- 94.8
>> The failure is not consistently reproducible: some guests (of the same type)
>> live-migrate just fine, until eventually some seemingly random guest fails,
>> leaving a "Domain-unnamed" zombie.
> Thanks for this report.
Good to know, they are appreciated. I have other issues, which I will be
reporting soon as well.
> It looks like for some reason xend has asked Xen to shut down the
> domain, but Xen is saying, "Sorry, can't do that yet." That's why
> restarting xend and removing things from xenstore don't work: xend is
> just saying what it sees, and what it sees is a zombie domain that
> refuses to die. :-)
Right. That's what I figured out as well. Everything in tearing down the
migrated DomU in source-host context works fine until the actual
deconstruction takes place - and fails.
> Do you have a serial port connected to any of your servers?
> * If so, could you:
> - Send the output just after you notice a domain in this state
> - Type "Ctrl-A" three times on the console to switch to Xen, and then
> type 'q' (And send the resulting output)
I know, you were (rightfully) going to ask for that. However, I have
seen this problem only in our production environment, where such changes
are next to impossible for policy reasons. I am currently trying to
get hold of some spare production servers to configure them accordingly
and re-create the problem. If that happens, I will happily
provide the dump. However, I cannot guarantee that I will get the required
resources anytime soon. It may take weeks to actually get spare machines.
> * If not, could you:
> - send the output of "xl dmesg"
> - Run "xl debug-keys q" and again take the output of "xl dmesg"?
I actually did that, but the result was not saved. IIRC, you basically
saw all the bits of the DomU in place in the xenstore-part of the dump.
I will try to catch that dump asap.
The host has already been rebooted, so catching the dump for the
reported case is not possible anymore.
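
To avoid losing the output again, the capture could be scripted so it is
written to disk as soon as a zombie domain shows up. A minimal sketch,
assuming a Python 3 interpreter is available on the host (stock CentOS 6
ships Python 2.6, so this may need adapting). The commands are the ones
requested in this thread; the function names, the log file name, and the
output layout are my own assumptions:

```python
import datetime
import pathlib
import subprocess

# Commands requested in this thread. The second element is an optional
# substring filter, used here to mimic "ps ax | grep qemu".
COMMANDS = [
    (["xl", "dmesg"], None),
    (["xl", "debug-keys", "q"], None),
    (["xl", "dmesg"], None),
    (["ps", "ax"], "qemu"),
]


def run(argv):
    """Run one command and return its combined stdout/stderr as text."""
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except OSError as exc:
        # Keep going if a tool is missing, but record the failure.
        return "failed to run: {}\n".format(exc)
    return proc.stdout


def collect(outdir, runner=run, stamp=None):
    """Write the output of every command to one log file; return its path.

    The file name "xen-zombie-<stamp>.log" is a hypothetical convention.
    """
    if stamp is None:
        now = datetime.datetime.now(datetime.timezone.utc)
        stamp = now.strftime("%Y%m%dT%H%M%SZ")
    outdir = pathlib.Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    path = outdir / "xen-zombie-{}.log".format(stamp)
    with path.open("w") as fh:
        for argv, needle in COMMANDS:
            text = runner(argv)
            if needle is not None:
                # Keep only the lines that contain the filter substring.
                lines = text.splitlines(True)
                text = "".join(line for line in lines if needle in line)
            fh.write("### {}\n{}\n".format(" ".join(argv), text))
    return path
```

Calling collect("/var/tmp") as root on the source host, right after a
failed migration, should leave a single file that can be attached to a
reply. The runner argument only exists so the collection logic can be
exercised without a Xen host.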
> Can you also do "ps ax | grep qemu" to check to see if the qemu
> instance associated with this domain has actually been destroyed, or
> if it's still around?
Yes, the qemu-dm process (btw: called with correct parameters) was
> Also, have you tried running "xl destroy" on the domain and seeing
> what happens? xl is stateless, so it can often do things alongside
> xend. This is not a good idea in general as they can frequently end
> up stepping on each other's toes; but in this case I think it
> shouldn't be a problem.
Yes, I did, but to no avail. I even shut down xend for this attempt to
make sure I do not trigger any code paths in xl & friends that might
take extra steps for the "xend is running" case.
Thanks for your time and consideration. If you happen to have any hints
on things to try or look after, I'd be a happy consumer ;)