On 02.12.2013 23:29, Paul Heinlein wrote:
I've had the following happen a couple of times now, and my Internet searches haven't turned up an answer to the problem.
We've got a few servers that primarily house VMs using KVM. They've got E3 CPUs and 32 GB of RAM, and they run stock CentOS 6.4, fully patched (not yet migrated to 6.5). The VM disk images are housed on an NFS server. None of the VMs is particularly resource-hungry. They run a variety of Linux distros: CentOS 5/6, Debian 6/7.
I'll start to see the VMs fail to write files to their local filesystems. No machine in the chain has rebooted or been updated in any significant way, yet the root filesystem becomes off-limits. (This happens on just one of our servers; the other VM platforms run without issue.)
In /var/log/messages, I'll see the following entry for each impacted VM:
<date> <host> kernel: kvm: <pid>: cpu0 disabled perfctr wrmsr: 0xc1 data 0xabcd
In /var/log/libvirt/qemu/<vm-name>.log, I'll see
block I/O error in device 'drive-virtio-disk0': Stale file handle (116)
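To check quickly which disks on a host are hitting this error, a small script can scan the per-VM QEMU logs. The following is a minimal sketch, assuming the log lines keep exactly the format quoted above; the function name find_stale_handle_errors is hypothetical. On Linux, errno 116 is ESTALE ("Stale file handle"), which is the error an NFS client returns when a file handle it holds is no longer valid on the server.

```python
import errno
import re

# Matches QEMU log lines in the format quoted above, e.g.
# block I/O error in device 'drive-virtio-disk0': Stale file handle (116)
BLOCK_IO_ERROR = re.compile(
    r"block I/O error in device '(?P<device>[^']+)': "
    r"(?P<message>.+?) \((?P<code>\d+)\)"
)


def find_stale_handle_errors(lines):
    """Return the device names whose block I/O errors carry errno ESTALE.

    errno.ESTALE is 116 on Linux; other platforms use different numbers,
    so the comparison uses the symbolic constant of the running system.
    """
    devices = []
    for line in lines:
        match = BLOCK_IO_ERROR.search(line)
        if match and int(match.group("code")) == errno.ESTALE:
            devices.append(match.group("device"))
    return devices


if __name__ == "__main__":
    sample = [
        "block I/O error in device 'drive-virtio-disk0': Stale file handle (116)",
        "an unrelated log line",
    ]
    print(find_stale_handle_errors(sample))
```

In practice you would feed it the lines of each /var/log/libvirt/qemu/<vm-name>.log file on the affected host.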
Oddly, the underlying host might be running, say, five VMs, but only four of them will log these messages and show the read-only symptoms, while the fifth just keeps chugging along.
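To confirm from inside each guest which filesystems have actually gone read-only, one option is to look at the mount options the kernel reports in /proc/mounts. This is a minimal sketch, assuming the standard whitespace-separated /proc/mounts layout (device, mount point, type, options, ...); read_only_mounts is a hypothetical helper name. Mount points containing spaces appear escaped (for example \040) and are returned as-is.

```python
import os


def read_only_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs whose options include 'ro'.

    Expects text in /proc/mounts format: device, mount point,
    filesystem type, comma-separated options, dump, pass.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if "ro" in options.split(","):
            result.append((mount_point, fs_type))
    return result


if __name__ == "__main__":
    # /proc/mounts only exists on Linux, so guard the live check.
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as handle:
            print(read_only_mounts(handle.read()))
```

Running this in each of the five VMs would show whether only the four affected guests report their root filesystem as ro.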
I think CentOS ext4 filesystems do remount read-only when the underlying device has problems; if your network has any timeouts or is maxed out, that could explain the problem. Ignoring the errors is probably unwise, but it can be done by specifying errors=continue in /etc/fstab. I would still run some network throughput tests between your hosts, check that all drives are healthy and have free space, and go through the logs, dmesg and so on.
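Before changing anything, it may help to see which errors= behaviour each ext4 entry in a guest's /etc/fstab currently requests, and how much free space the relevant filesystems have. The sketch below is illustrative only; ext4_error_behaviour and free_fraction are hypothetical helper names. A value of None means fstab does not set errors=, in which case the behaviour stored in the filesystem superblock applies (tune2fs -l reports it as "Errors behavior").

```python
import shutil


def ext4_error_behaviour(fstab_text):
    """Map each ext4 mount point in fstab-format text to its errors= value.

    Returns None for entries that do not set errors= explicitly.
    """
    behaviour = {}
    for raw_line in fstab_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4 or fields[2] != "ext4":
            continue  # only ext4 entries are of interest here
        mount_point = fields[1]
        setting = None
        for option in fields[3].split(","):
            if option.startswith("errors="):
                setting = option.split("=", 1)[1]
        behaviour[mount_point] = setting
    return behaviour


def free_fraction(path):
    """Return the fraction of the filesystem holding `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


if __name__ == "__main__":
    example = (
        "UUID=1234 / ext4 defaults,errors=remount-ro 1 1\n"
        "/dev/vdb1 /data ext4 defaults 1 2\n"
    )
    print(ext4_error_behaviour(example))
    print(f"free on /: {free_fraction('/'):.1%}")
```

On the guests themselves you would pass the contents of /etc/fstab instead of the example string; on the host, running free_fraction against the NFS mount point that holds the disk images would cover the "available space" check.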