On 11/18/2014 04:59 PM, George Dunlap wrote:
> On Sun, Nov 16, 2014 at 12:39 AM, Thomas Weyergraf
> <T.Weyergraf at virtfinity.de> wrote:
>> Hi folks,
>>
>> we (the company i am working for) are running several dozens of
>> virtualisation servers using CentOS 6 + Xen4CentOS as the virtualisation
>> infrastructure.
>>
>> With the latest versions of all packages installed [1], we see failures in
>> live-migration of stock-CentOS6 HVM guests, leaving a "Domain-unnamed" on
>> the source host, while the migrated guest runs fine on the target host.
>>
>> Domain-0            0   2048   64   r-----   6791.5
>> Domain-Unnamed      1   4099    4   --ps--     94.8
>>
>> The failure is not consistently reproducible, some guests (of the same type)
>> live-migrate just fine, until eventually some seemingly random guest fails,
>> leaving a "Domain-unnamed" Zombie.
> Thanks for this report.

Good to know that reports are appreciated. I have other issues, which I
will be reporting soon as well.

>
> It looks like for some reason xend has asked Xen to shut down the
> domain, but Xen is saying, "Sorry, can't do that yet." That's why
> restarting xend and removing things from xenstore don't work: xend is
> just saying what it sees, and what it sees is a zombie domain that
> refuses to die. :-)

Right. That's what I figured out as well. Everything in tearing down the
migrated DomU in source-host context works fine until the actual
deconstruction takes place - and fails.

>
> Do you have a serial port connected to any of your servers?
> * If so, could you:
> - Send the output just after you notice a domain in this state
> - Type "Ctrl-A" three times on the console to switch to Xen, and then
> type 'q' (And send the resulting output)

I know you were (rightfully) going to ask for that. However, I have seen
this problem only in our production environment, where such changes are
next to impossible for policy reasons.
I am currently trying to get hold of some spare production servers to
configure them accordingly and re-create the problem. If that happens, I
will happily provide the dump. However, I cannot guarantee that I will
get the required resources anytime soon; it may take weeks to actually
get spare machines.

> * If not, could you:
> - send the output of "xl dmesg"
> - Run "xl debug-keys q" and again take the output of "xl dmesg"?

I actually did that, but the result was not saved. IIRC, you basically
saw all the bits of the DomU in place in the xenstore-part of the dump.
I will try to catch that dump asap. The host has already been rebooted,
so catching the dump for the reported case is not possible anymore.

>
> Can you also do "ps ax | grep qemu" to check to see if the qemu
> instance associated with this domain has actually been destroyed, or
> if it's still around?

Yes, the qemu-dm process (btw: called with correct parameters) was
already gone.

>
> Also, have you tried running "xl destroy" on the domain and seeing
> what happens? xl is stateless, so it can often do things along side
> of xend. This is not a good idea in general as they can frequently end
> up stepping on each others' toes; but in this case I think it
> shouldn't be a problem.

Yes, I did, but to no avail. I even shut down xend for this attempt, to
make sure I do not trigger any code paths in xl & friends that might
take extra steps for the "xend is running" case.

>
> Thanks,
>
> -George

Thanks for your time and consideration. If you happen to have any hints
on things to try or look for, I'd be a happy consumer ;)

Regards,
Thomas
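
The checks discussed in this thread could be scripted so that the output is saved the next time a leftover domain shows up. The following is only a minimal sketch, not something either poster ran. It assumes the six-column domain listing quoted above (name, ID, memory, VCPUs, state, time), and the function names and the zombie heuristic (a "Domain-Unnamed" name, or an 's' or 'd' flag in the state column) are hypothetical choices made for illustration. The commands it runs are the ones George asked for.

```python
import subprocess


def parse_domain_list(text):
    """Parse rows of a six-column domain listing into dicts.

    Header rows and anything that does not look like a domain row are skipped.
    """
    domains = []
    for line in text.splitlines():
        fields = line.split()
        # A domain row has exactly six fields and a numeric ID in column two.
        if len(fields) != 6 or not fields[1].isdigit():
            continue
        name, domid, mem, vcpus, state, cpu_time = fields
        domains.append({
            "name": name,
            "id": int(domid),
            "mem": int(mem),
            "vcpus": int(vcpus),
            "state": state,
            "time": float(cpu_time),
        })
    return domains


def suspected_zombies(domains):
    """Return domains that look like leftovers of a failed teardown.

    Assumption (illustrative heuristic only): a domain is suspicious if it is
    named "Domain-Unnamed" or its state flags contain 's' or 'd'.
    """
    return [
        d for d in domains
        if d["name"].lower() == "domain-unnamed"
        or "s" in d["state"]
        or "d" in d["state"]
    ]


def collect_diagnostics():
    """Run the commands requested in the thread and return (command, output) pairs.

    Meant to be run as root in dom0; a missing binary is recorded, not raised.
    """
    commands = [
        ["xl", "dmesg"],
        ["xl", "debug-keys", "q"],
        ["xl", "dmesg"],  # second dmesg picks up the debug-key dump
        ["ps", "ax"],     # inspect this for a leftover qemu process
    ]
    results = []
    for cmd in commands:
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
            output = proc.stdout + proc.stderr
        except FileNotFoundError:
            output = "command not found"
        results.append((" ".join(cmd), output))
    return results
```

A wrapper could feed the domain listing to `parse_domain_list`, and call `collect_diagnostics` and write the results to a file only when `suspected_zombies` returns something, so the dump is not lost if the host has to be rebooted.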