On 02.12.2013 23:29, Paul Heinlein wrote:
> I've had the following happen a couple times now, and my internet
> searches are failing to locate an answer to the problem.
>
> We've got a few servers that primarily house VMs using KVM. They've
> got E-3 cpus and 32 GB RAM, and they run stock CentOS 6.4, fully
> patched (not yet migrated to 6.5). The VM disk images are housed on an
> NFS server. None of the VMs is particularly resource-hungry. They run
> a variety of Linux distros: CentOS 5/6, Debian 6/7.
>
> I'll start to see the VMs fail to write files to their local
> filesystems. No machine in the chain has rebooted or been updated in
> any significant way, but the root filesystem is off-limits. (This will
> happen on just one of our servers; the other VM platforms run without
> issue.)
>
> In /var/log/messages, I'll see the following entry for each impacted
> VM:
>
> <date> <host> kernel: kvm: <pid>: cpu0 disabled perfctr wrmsr: 0xc1
> data 0xabcd
>
> In /var/log/libvirt/qemu/<vm-name>.log, I'll see
>
> block I/O error in device 'drive-virtio-disk0': Stale file handle
> (116)
>
> Oddly, the underlying host might be running, say, five VMs, but only
> four of them will get the log messages, and show the read-only
> symptoms, while the fifth just keeps chugging along.

I think ext4 filesystems on CentOS do remount read-only when the
underlying device has problems. If your network has any timeouts or is
maxed out, that could explain what you are seeing in the guests.

Ignoring this would probably be unwise, but it can be done by specifying
errors=continue in the mount options in /etc/fstab.

I would run some network/throughput tests between your hosts though, and
check that all drives are fine, that they have available space, etc.
Also check the logs, dmesg and so on.

-- 
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro
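
A rough sketch of how the read-only remount could be spotted from inside a guest, assuming Python is available there. The function name find_readonly_mounts and the sample data are made up for illustration; the only thing taken as given is the usual /proc/mounts layout (device, mountpoint, fstype, options, dump, pass).

```python
def find_readonly_mounts(mounts_text, fstypes=("ext3", "ext4")):
    """Return (device, mountpoint) pairs that are mounted read-only.

    mounts_text is expected in the /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        # "ro" must match a whole option, not a substring of another one
        if fstype in fstypes and "ro" in options.split(","):
            hits.append((device, mountpoint))
    return hits


# Hypothetical sample resembling a guest whose root was remounted read-only
sample = """\
/dev/vda1 / ext4 ro,relatime,barrier=1,data=ordered 0 0
/dev/vda2 /home ext4 rw,relatime,barrier=1,data=ordered 0 0
proc /proc proc rw,relatime 0 0
"""
print(find_readonly_mounts(sample))
```

On a real guest the same function could be fed the contents of /proc/mounts instead of the sample string. It only reports the symptom; the cause would still have to be tracked down on the NFS and network side.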