[CentOS-virt] Problem with Xen4CentOS

Tue Nov 18 15:59:17 UTC 2014
George Dunlap <dunlapg at umich.edu>

On Sun, Nov 16, 2014 at 12:39 AM, Thomas Weyergraf
<T.Weyergraf at virtfinity.de> wrote:
> Hi folks,
>
> we (the company I am working for) are running several dozen
> virtualisation servers using CentOS 6 + Xen4CentOS as the virtualisation
> infrastructure.
>
> With the latest versions of all packages installed [1], we see failures in
> live-migration of stock-CentOS6 HVM guests, leaving a "Domain-Unnamed" on
> the source host, while the migrated guest runs fine on the target host.
>
> Domain-0                                     0  2048    64 r----- 6791.5
> Domain-Unnamed                               1  4099     4 --ps--     94.8
>
> The failure is not consistently reproducible: some guests (of the same type)
> live-migrate just fine, until eventually some seemingly random guest fails,
> leaving a "Domain-Unnamed" zombie.

Thanks for this report.

It looks like for some reason xend has asked Xen to shut down the
domain, but Xen is saying, "Sorry, can't do that yet."  That's why
restarting xend and removing things from xenstore don't work: xend is
just saying what it sees, and what it sees is a zombie domain that
refuses to die. :-)

Do you have a serial port connected to any of your servers?
* If so, could you:
 - Send the serial console output from just after you notice a domain
   in this state
 - Type "Ctrl-A" three times on the console to switch to Xen, then
   type 'q', and send the resulting output
* If not, could you:
 - Send the output of "xl dmesg"
 - Run "xl debug-keys q" and then send the output of "xl dmesg" again
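
For the no-serial-port case, the two steps above can be sketched as a small
shell function. This is only a convenience wrapper around the two commands
already named; the XL and OUT_DIR variables and the output file names are
made up for illustration and are not part of any Xen tooling.

```shell
#!/bin/sh
# Sketch: save "xl dmesg" before and after "xl debug-keys q".
# XL and OUT_DIR are illustrative variables (assumptions), so the
# command and the output location can be overridden.
XL="${XL:-xl}"
OUT_DIR="${OUT_DIR:-/tmp/xen-zombie-debug}"

collect_xen_debug() {
    mkdir -p "$OUT_DIR" || return 1
    # Console ring as it is when the zombie domain is first noticed.
    "$XL" dmesg > "$OUT_DIR/dmesg-before.txt" || return 1
    # Ask Xen to dump domain information into the console ring.
    "$XL" debug-keys q || return 1
    # Console ring again, now including the 'q' output.
    "$XL" dmesg > "$OUT_DIR/dmesg-after.txt" || return 1
    echo "$OUT_DIR"
}
```

Calling collect_xen_debug as root on the source host should leave two files
in OUT_DIR that can be attached to a reply.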

Can you also do "ps ax | grep qemu" to check to see if the qemu
instance associated with this domain has actually been destroyed, or
if it's still around?
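
A plain "ps ax | grep qemu" also lists every other guest's qemu, so a
slightly narrower filter may help. The sketch below assumes the device model
is started with either "-d <domid>" or "-xen-domid <domid>" on its command
line; that is an assumption about how the toolstack launches qemu, so check
it against the unfiltered output first. The function name is made up.

```shell
#!/bin/sh
# Sketch: read "ps ax" style lines on stdin and print those that look
# like a qemu device model for the given domain ID.
# ASSUMPTION: the domain ID appears after "-d" or "-xen-domid".
qemu_for_domid() {
    domid="$1"
    # '[q]emu' keeps the grep process itself out of the matches.
    grep '[q]emu' | grep -E "(-d|-xen-domid) +${domid}( |\$)"
}
```

For the listing quoted above that would be: ps ax | qemu_for_domid 1
No output suggests the qemu instance is already gone.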

Also, have you tried running "xl destroy" on the domain and seeing
what happens?  xl is stateless, so it can often do things alongside
xend.  This is not a good idea in general, as they can frequently end
up stepping on each other's toes; but in this case I think it
shouldn't be a problem.
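
To find the domain ID to pass to "xl destroy", something like the following
could be used. It is a sketch that assumes a domain listing in the column
layout quoted above (name first, ID second) and the literal name
"Domain-Unnamed"; a different toolstack may print a different name for such
a domain. The function name is made up.

```shell
#!/bin/sh
# Sketch: print the IDs of domains named "Domain-Unnamed" from a
# domain listing on stdin.
# ASSUMPTION: column 1 is the name and column 2 is the domain ID.
unnamed_domids() {
    awk '$1 == "Domain-Unnamed" { print $2 }'
}
```

Fed the two lines quoted above, this prints the zombie's ID, which can then
be passed to "xl destroy" by hand.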

Thanks,

 -George