I've had the following happen a couple of times now, and my internet searches are failing to locate an answer to the problem.
We've got a few servers that primarily house VMs using KVM. They've got E3 CPUs and 32 GB RAM, and they run stock CentOS 6.4, fully patched (not yet migrated to 6.5). The VM disk images are housed on an NFS server. None of the VMs is particularly resource-hungry. They run a variety of Linux distros: CentOS 5/6, Debian 6/7.
I'll start to see the VMs fail to write files to their local filesystems. No machine in the chain has been rebooted or updated in any significant way, but the root filesystem becomes read-only. (This happens on just one of our servers; the other VM platforms run without issue.)
In /var/log/messages, I'll see the following entry for each impacted VM:
<date> <host> kernel: kvm: <pid>: cpu0 disabled perfctr wrmsr: 0xc1 data 0xabcd
In /var/log/libvirt/qemu/<vm-name>.log, I'll see
block I/O error in device 'drive-virtio-disk0': Stale file handle (116)
Oddly, the underlying host might be running, say, five VMs, but only four of them will get the log messages and show the read-only symptoms, while the fifth just keeps chugging along.
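For reference, here's roughly how I confirm the symptom when it shows up (the image path below is just an example of where our NFS-backed images live, not the real one):

    # inside an affected guest: has root been remounted read-only?
    grep ' / ' /proc/mounts

    # on the KVM host: stat the disk image over NFS; a stale handle
    # shows up as 'Stale file handle' here as well
    stat /var/lib/libvirt/images/vm-disk0.img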
Googling suggests that the "disabled perfctr wrmsr" message is harmless, but my experience suggests otherwise.
Any hints, workarounds, or relevant information would be very welcome.
Thanks!
<date> <host> kernel: kvm: <pid>: cpu0 disabled perfctr wrmsr: 0xc1 data 0xabcd
In /var/log/libvirt/qemu/<vm-name>.log, I'll see
block I/O error in device 'drive-virtio-disk0': Stale file handle (116)
Are you sure your network is sound? If you turn on debug logging on the NFS server, do you see anything interesting at the time you get the Stale file handle errors?
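Something like this, assuming the stock Linux NFS client and server, will turn that logging on and off (output lands in the kernel log / syslog):

    # on the KVM host (NFS client)
    rpcdebug -m nfs -s all
    rpcdebug -m rpc -s all

    # on the NFS server
    rpcdebug -m nfsd -s all

    # reproduce the problem, then turn it back off
    rpcdebug -m nfs -c all
    rpcdebug -m rpc -c all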
On 02.12.2013 23:29, Paul Heinlein wrote:
I'll start to see the VMs fail to write files to their local filesystems. [...] the root filesystem becomes read-only.
[...]
block I/O error in device 'drive-virtio-disk0': Stale file handle (116)
I think CentOS ext4 filesystems do remount read-only when the underlying device has problems; if your network has any timeouts or is maxed out, that could explain the problem. Ignoring the errors is probably unwise, but it can be done by specifying errors=continue in /etc/fstab. I would run some network/throughput tests between your hosts, though, and check that all drives are healthy, have available space, and so on. Also check the logs, dmesg, etc.
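A minimal sketch of both suggestions; the device, mount point, and hostname below are illustrative, not taken from your setup:

    # /etc/fstab inside a guest: keep going on ext4 errors instead of
    # remounting read-only (risks further damage; use with care)
    /dev/vda1  /  ext4  defaults,errors=continue  1 1

    # rough throughput check between the KVM host and the NFS server
    iperf -s                           # run on the NFS server
    iperf -c nfs-server.example.com    # run on the KVM host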
On 03.Dec.2013, at 00:29, Paul Heinlein wrote:
I'll start to see the VMs fail to write files to their local filesystems. [...] the root filesystem becomes read-only.
[...]
Oddly, the underlying host might be running, say, five VMs, but only four of them will get the log messages and show the read-only symptoms, while the fifth just keeps chugging along.
I have seen a non-root ext4 filesystem go read-only after mistakenly attaching it to two virtual machines at the same time. It went read-only on only one of the virtual machines.
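If it's useful, here's a quick way to spot that kind of double attachment on a host; this assumes libvirt's virsh is available and simply flags any disk source listed by more than one running domain:

    # print the source path of every disk in every running domain,
    # then show only the paths that appear more than once
    for dom in $(virsh list --name); do
        virsh domblklist "$dom" | awk 'NR > 2 && NF { print $2 }'
    done | sort | uniq -d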