Xen VMs won’t start: “Cannot plug VIF”

I've been unable to solve this problem for a few weeks now, so I'll try posting it to my blog; maybe someone out there knows the answer. I would very much appreciate a clue on this one.

I've got a VM on a Citrix XenServer 5.0.0 machine that won't start. It was running fine until it was rebooted on a Friday afternoon a few weeks ago.

Starting it from the (Windows-only) XenCenter GUI results in:

   "This VM cannot be started, as its network interfaces could not be
   connected. One of the NICs is in use elsewhere."

Starting it from the command line results in:

# xe vm-start uuid=88915b63-d794-e021-4f78-b03f46e352b0
Cannot plug VIF
VIF: 5dfd3886-8b48-20e5-4231-30284d7b185d

Has anyone seen this before? Know how to fix it?

I've tried deleting and re-creating the network interface for the VM. I've also tried booting the VM without any network interface at all. I've tried "xe vif-unplug --force uuid=5dfd3886-8b48-20e5-4231-30284d7b185d" (doesn't work: it expects the VM to be running, not halted). In fact, I've lost track of all the things I've tried that haven't worked.

I haven't rebooted the Xen server and *really* don't want to have to do that. I'd rather create a new VM and copy the OS over (Debian, fortunately, so relatively easy to do) than reboot the server. (NOTE: I tried creating a new VM; that didn't work either.)

Since the initial problem, I now have a total of three VMs that won't restart (with the same error message), and I can't even create a new VM. So it looks like a problem with the host rather than the guest. I think I'm going to have to migrate all the VMs off this machine to another box, then rebuild this server. Maybe with a newer Xen (on Debian, rather than proprietary Citrix junk), or maybe as a KVM box.

The absence of this kind of crap is one of the main reasons why I prefer KVM to Xen... and if I have to rebuild the server anyway, I may as well rebuild it as a KVM server.

7 Comments

Glenn

November 10, 2011 at 9:04 pm

Obvious perhaps, but is it using the same MAC address for that NIC in the configuration file? You should be able to edit that text file directly to check.

cas

November 11, 2011 at 7:12 am

Nope. Good idea, but each VM has its own unique MAC address (and IP addresses are assigned by DHCP). I've tried giving it a new MAC, but that doesn't help.

Ian Campbell

November 11, 2011 at 8:08 am

It would be worth confirming which network the VIF is connected to and that the configuration of that network and the PIFs on it look sensible. It is also worth checking that no other network conflicts with that one, e.g. a network with a bond which uses the same PIF or something like that. If such a network does exist and another VM has a VIF on it, then they will conflict and prevent you from starting this one.
There might be more info in the logs, e.g. /var/log/xensource.log. If you collect a bugtool (run xen-bugtool or "Server Status Report" in the GUI) I could take a look if you like.

cas

November 11, 2011 at 10:23 am

Now that's interesting. xen-bugtool won't even run on this server; it bombs out because there's no /sys. In particular: "OSError: [Errno 2] No such file or directory: '/sys/class/net/'"

Huh? WTF? How the hell did /sys get unmounted?

I've remounted it and can now run xen-bugtool. Even better, the VMs will now boot up. Problem solved.

Thanks for getting me to notice there was no /sys; it never even occurred to me.
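
For anyone hitting the same symptom, here is a minimal sketch of the check-and-remount step. The comment above does not say exactly how /sys was remounted, so the commands are an assumption, and the function names sysfs_mounted and ensure_sysfs are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; not taken from the original post.

# Succeeds if the given mount table lists a sysfs filesystem mounted on /sys.
# The mount table path is a parameter (default: /proc/mounts) so the check
# can be tried against a sample file first.
sysfs_mounted() {
    awk '$2 == "/sys" && $3 == "sysfs" { found = 1 } END { exit !found }' \
        "${1:-/proc/mounts}"
}

# Mounts sysfs on /sys only if it is missing. Needs root.
ensure_sysfs() {
    if sysfs_mounted; then
        echo "/sys is already mounted"
    else
        mount -t sysfs sysfs /sys
    fi
}
```

Running ensure_sysfs as root on the host should leave /sys populated again, after which /sys/class/net/ can be listed to confirm that the network devices are visible.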

Ian Campbell

November 11, 2011 at 4:41 pm

No /sys? That's pretty weird and would never have occurred to me either. You mentioned a reboot a while back: was that a controlled reboot or a crash, power cut, etc.? I'm wondering if there is some sort of corruption to whichever script was supposed to have mounted /sys because I can't think of any other reason it wouldn't be there.

cas

November 12, 2011 at 8:58 am

It was the VM that was rebooted. Crashed, actually.

The server itself has been up for 534 days. I semi-inherited it a few months back, and it has some problems (it's not a complete mess, but there are some weird misconfigurations on it, like hostname = localhost, and several of the VMs have fs corruption in their LVM volumes). The server's an IBM x3655, and the warranty expired about a year ago. Rather than risk it breaking completely, we're just letting it limp along until we can replace it (the purchase was finally approved just a few days ago).

By 'semi-inherited' I mean I got stuck with it, but there are other people who have access to it (web developers and the like) who seem to think that force-reboot is not only an acceptable solution but is THE solution to every problem on a VM. The inevitable result is fs corruption: I've had to use losetup on several of the VMs' LVs (while they're down) in order to run fsck on them, and then loopback mount them to add "FSCKFIX=yes" to /etc/default/rcS (the Linux VMs are all Debian, fortunately :) before I reboot them.
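
The repair routine described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the exact commands that were used: the helper name set_fsckfix, the LV path and the /mnt mount point are all hypothetical, and the outline assumes the LV holds a bare filesystem (an LV containing a partition table would need an offset or a tool such as kpartx).

```shell
#!/bin/sh
# Hypothetical helper: make sure the given rcS file contains FSCKFIX=yes,
# replacing an existing (possibly commented-out) FSCKFIX line if present,
# otherwise appending one. Uses GNU sed's -i option.
set_fsckfix() {
    rcs=$1
    if grep -q '^#\{0,1\}FSCKFIX=' "$rcs"; then
        sed -i 's/^#\{0,1\}FSCKFIX=.*/FSCKFIX=yes/' "$rcs"
    else
        echo 'FSCKFIX=yes' >> "$rcs"
    fi
}

# Outline of the whole procedure, to be run as root while the VM is halted.
# Left commented out because the paths are placeholders:
#
#   lv=/dev/VG_name/LV_name            # placeholder LV path
#   loop=$(losetup -f --show "$lv")    # attach the LV to a free loop device
#   fsck -y "$loop"                    # repair the filesystem
#   mount "$loop" /mnt
#   set_fsckfix /mnt/etc/default/rcS
#   umount /mnt
#   losetup -d "$loop"                 # detach the loop device
```

With FSCKFIX=yes in place, the Debian guest's boot-time fsck repairs errors automatically instead of stopping for manual intervention after the next forced reboot.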

Ian Campbell

November 12, 2011 at 8:33 pm

Ouch, I certainly don't envy you that situation. I suppose under the circumstances there's any number of ways /sys might have gotten unmounted.
You might be interested in /opt/xensource/debug/with-vdi (I think that's the path), which automates the losetup stuff for you and dumps you into a shell with /dev/xvda == the VDI you gave it.
