Today I suddenly have two VMs that have read only file systems. The host is CentOS 6, as are the two VMs with this problem.
The first symptom was on a new VM I installed ISPConfig onto. I got through the entire process with only a dependency issue between php-pecl_apc and php-accelerator.
After completing the installation I noticed some funny things, but I assumed it might be the addition of quotas and remounting with quotas on, so I didn't think much of it and rebooted the VM.
It failed to reboot because the file system could not be switched to read-write. Since it was a new VM and installing ISPConfig was an experiment, I just wiped it with the intention of starting over.
While I was creating another clone of a CentOS 6 image on the host, I looked into one of the other VMs running on that host, which has been up and running for 47 days. Same problem, without rebooting. For example, running yum gives this:
[root@dev log]# yum update
Loaded plugins: fastestmirror, presto
Cannot open logfile /var/log/yum.log
Could not create lock at /var/run/yum.pid: [Errno 30] Read-only file system: '/var/run/yum.pid'
Another app is currently holding the yum lock; waiting for it to exit...
Traceback (most recent call last):
  File "/usr/bin/yum", line 29, in <module>
    yummain.user_main(sys.argv[1:], exit_code=True)
  File "/usr/share/yum-cli/yummain.py", line 254, in user_main
    errcode = main(args)
  File "/usr/share/yum-cli/yummain.py", line 103, in main
    show_lock_owner(e.pid, logger)
  File "/usr/share/yum-cli/utils.py", line 106, in show_lock_owner
    ps = get_process_info(pid)
  File "/usr/share/yum-cli/utils.py", line 61, in get_process_info
    if (not os.path.exists("/proc/%d/status" % pid) or
TypeError: %d format: a number is required, not str
------------------------------------
And running mount gives this:
[root@dev log]# mount
/dev/mapper/VolGroup-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sda1 on /boot type ext4 (rw)
/web on /NFS/web type none (rw,bind)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)
/etc/named on /var/named/chroot/etc/named type none (rw,bind)
/etc/named.rfc1912.zones on /var/named/chroot/etc/named.rfc1912.zones type none (rw,bind)
/usr/lib64/bind on /var/named/chroot/usr/lib64/bind type none (rw,bind)
/etc/named.iscdlv.key on /var/named/chroot/etc/named.iscdlv.key type none (rw,bind)
/etc/named on /var/named/chroot/etc/named type none (rw,bind)
/usr/lib64/bind on /var/named/chroot/usr/lib64/bind type none (rw,bind)
mount: warning: /etc/mtab is not writable (e.g. read-only filesystem). It's possible that information reported by mount(8) is not up to date. For actual information about system mount points check the /proc/mounts file.
-------------------------------
The VM is running, serving web pages and responding to DNS queries, but it is clear, given my earlier experience with the ISPConfig machine, that I won't be able to reboot it until I figure out the problem.
Now that I am looking at the output from the mount command, I wonder where all those named-related mounts came from. Could it be webmin? Both VMs have webmin installed, mostly to allow me to configure bind, since system-config-bind is no more.
Anybody have any idea what happened, or better yet, any ideas on how to fix this?
Emmett
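The mount warning above points at /proc/mounts as the authoritative list, because /etc/mtab cannot be updated once the root file system has gone read-only. A minimal Python sketch of that check follows. It assumes the usual /proc/mounts layout (device, mount point, type, comma-separated options); the function name read_only_mounts and the SAMPLE text are hypothetical and are not taken from the affected VM.

```python
def read_only_mounts(proc_mounts_text):
    """Return (device, mount_point, fstype) for every entry whose options include 'ro'."""
    found = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fstype, options = fields[:4]
        # Options are comma-separated; match 'ro' exactly, not as a substring.
        if "ro" in options.split(","):
            found.append((device, mount_point, fstype))
    return found


# Hypothetical sample in /proc/mounts format, for illustration only.
SAMPLE = """\
/dev/mapper/VolGroup-lv_root / ext4 ro,relatime,barrier=1,data=ordered 0 0
proc /proc proc rw,relatime 0 0
/dev/sda1 /boot ext4 rw,relatime,barrier=1,data=ordered 0 0
"""

for device, mount_point, fstype in read_only_mounts(SAMPLE):
    print("%s on %s (%s) is mounted read-only" % (device, mount_point, fstype))
```

On a live system the same function could be fed the contents of /proc/mounts instead of SAMPLE, for example with open("/proc/mounts").read().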
On 08/09/11 07:23, Emmett Culley wrote:
Today I suddenly have two VMs that have read only file systems. The host is CentOS 6, as are the two VMs with this problem.
Disclaimer: I can't claim this matches your circumstance exactly, but it is something you might check.
I have seen problems with LVM partitions in KVM guests being unwritable, despite being mounted read-write, on CentOS5.6 (host and guest). Specifically, I was booting a guest from the CentOS live CD in order to fix /etc/fstab on the root partition, which was LVM, but I could not save my changes.
Executing 'vgscan' resolved my problem; I'm not clear exactly why, but I could then remount and write successfully. You might also try 'vgchange -ay' if that doesn't work.
Perhaps:
- boot with liveCD .iso in a virtual CD drive
- check whether VolGroup-lv_root is mounted and writeable (use mount, lvdisplay, touch etc.)
- if not, unmount it, run vgscan
- remount it, and check for writability again
N
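The "touch" part of the check suggested above can be sketched in Python as a probe that tries to create and remove a scratch file. This is only an illustration: the name probe_writable is hypothetical, the vgscan and vgchange -ay steps are not covered, and the probe says nothing about why a write fails beyond the error the kernel returns.

```python
import errno
import os
import tempfile


def probe_writable(directory):
    """Try to create and delete a scratch file in directory; return (ok, reason)."""
    try:
        fd, path = tempfile.mkstemp(prefix=".rw-probe-", dir=directory)
    except OSError as exc:
        if exc.errno == errno.EROFS:
            # Errno 30, the same error yum reported for /var/run/yum.pid.
            return False, "read-only file system"
        return False, exc.strerror or str(exc)
    os.close(fd)
    os.remove(path)
    return True, "writable"


ok, reason = probe_writable(tempfile.gettempdir())
print("temp dir: %s (%s)" % (ok, reason))
```

Pointing the probe at the mount point of the logical volume before and after remounting gives a quick yes/no answer without editing any real file.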
On 09/08/2011 02:25 AM, Nick wrote:
<snip>
I will give that a try. However I remembered something about both failed VMs from my investigations yesterday and checked it out this morning on the one that is still available. The last lines of syslog (/var/log/messages) are:
Sep 6 19:42:49 dev squid[2885]: Ready to serve requests.
Sep 6 19:42:50 dev squid[2885]: storeLateRelease: released 0 objects
Sep 7 00:16:55 dev fail2ban.actions: WARNING [apache-pma] Ban 82.165.150.194
Sep 7 15:47:06 dev mountd[1658]: authenticated unmount request from 192.168.6.12:603 for /web (/web)
Sep 7 15:49:33 dev mountd[1658]: authenticated mount request from 192.168.6.12:699 for /web (/web)
Sep 7 15:49:34 dev mountd[1658]: authenticated mount request from 192.168.6.12:863 for /web (/web)
Sep 7 21:19:59 dev init: tty (/dev/tty1) main process ended, respawning
Sep 7 21:59:23 dev kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 7 21:59:23 dev kernel: ata1.00: failed command: WRITE DMA
Sep 7 21:59:23 dev kernel: ata1.00: cmd ca/00:08:a0:02:31/00:00:00:00:00/e3 tag 0 dma 4096 out
Sep 7 21:59:23 dev kernel: res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Sep 7 21:59:23 dev kernel: ata1.00: status: { DRDY }
Sep 7 21:59:23 dev kernel: ata1: soft resetting link
Sep 7 21:59:23 dev kernel: ata1.00: configured for MWDMA2
Sep 7 21:59:23 dev kernel: ata1.00: device reported invalid CHS sector 0
Sep 7 21:59:23 dev kernel: ata1: EH complete
So it looks like there is a bug I can report. But where to report it? I don't see how it can be a CentOS 6 bug.
Emmett
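Kernel lines like the ones quoted in this message can be pulled out of a long /var/log/messages mechanically, which makes it easier to run the same search on another machine's log. The Python sketch below assumes the "kernel: ataN[.NN]: message" layout seen in the excerpt; the names ATA_LINE, ata_events and failed_commands are hypothetical, and SAMPLE reuses a few lines from the excerpt. Continuation lines such as the "res ..." line carry no ata prefix and are deliberately not matched.

```python
import re

# Matches e.g. "kernel: ata1.00: failed command: WRITE DMA" or "kernel: ata1: EH complete".
ATA_LINE = re.compile(r"kernel: (ata\d+(?:\.\d+)?): (.*)$")


def ata_events(log_text):
    """Return (port, message) for each syslog line that carries an ataN prefix."""
    events = []
    for line in log_text.splitlines():
        match = ATA_LINE.search(line)
        if match:
            events.append((match.group(1), match.group(2)))
    return events


def failed_commands(log_text):
    """Return the command names from 'failed command: ...' lines."""
    marker = "failed command: "
    return [msg[len(marker):] for _, msg in ata_events(log_text) if msg.startswith(marker)]


# A few lines copied from the syslog excerpt in this message.
SAMPLE = """\
Sep 7 21:19:59 dev init: tty (/dev/tty1) main process ended, respawning
Sep 7 21:59:23 dev kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 7 21:59:23 dev kernel: ata1.00: failed command: WRITE DMA
Sep 7 21:59:23 dev kernel: ata1: soft resetting link
"""

for port, message in ata_events(SAMPLE):
    print(port, "->", message)
```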
Emmett Culley wrote:
On 09/08/2011 02:25 AM, Nick wrote:
On 08/09/11 07:23, Emmett Culley wrote:
Today I suddenly have two VMs that have read only file systems. The host is CentOS 6, as are the two VMs with this problem.
Disclaimer: I can't claim this matches your circumstance exactly, but it is something you might check.
<snip>
I will give that a try. However I remembered something about both failed VMs from my investigations yesterday and checked it out this morning on the one that is still available. The last lines of syslog (/var/log/messages) are:
<snip>
Sep 7 21:59:23 dev kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 7 21:59:23 dev kernel: ata1.00: failed command: WRITE DMA
Sep 7 21:59:23 dev kernel: ata1.00: cmd ca/00:08:a0:02:31/00:00:00:00:00/e3 tag 0 dma 4096 out
Sep 7 21:59:23 dev kernel: res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Sep 7 21:59:23 dev kernel: ata1.00: status: { DRDY }
Sep 7 21:59:23 dev kernel: ata1: soft resetting link
Sep 7 21:59:23 dev kernel: ata1.00: configured for MWDMA2
Sep 7 21:59:23 dev kernel: ata1.00: device reported invalid CHS sector 0
Sep 7 21:59:23 dev kernel: ata1: EH complete
<snip>
Bad news, IMO: I think you have a hardware problem - looks like sector 0 of your h/d has gone bad.
Got backups? Got spare drive?
mark
On 09/08/2011 09:07 AM, m.roth@5-cent.us wrote:
<snip>
Except that this "hardware" is on the guest and so is virtual. The image is actually an LVM logical volume. So, it must be either a kvm/qemu or a kernel bug. I am working on getting a bug reported, as soon as I figure out where to report it.
Emmett
Emmett Culley wrote:
<snip>
Are you sure that the host o/s isn't passing a real error up? Are there errors in the host's logfile?
mark
On 09/08/2011 10:34 AM, m.roth@5-cent.us wrote:
<snip>
Turns out you were correct. I did see the same error on the host, though with an hour earlier time stamp.
I replaced that drive and all seems well now.
Thanks for your insight.
Emmett