I have several CentOS5 hosts in a VMware ESX 3.5.0 226117 environment using iSCSI storage. Recently we've begun to experience journal aborts resulting in remounted-read-only filesystems as well as other filesystem issues - I can unmount a filesystem and force a check with "fsck -f" and occasionally find errors.
I've found - https://bugzilla.redhat.com/show_bug.cgi?id=228108 http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=51306 - which seem related but I believe I am running a kernel that contains these fixes.
My kernel is 2.6.18-194.32.1.el5 on one of the most affected hosts.
Does anyone else have experience with similar issues or know of the status of this Bug/KB?
I can install, boot, run, and then at some seemingly random moment -
init_special_inode: bogus i_mode (50632)
init_special_inode: bogus i_mode (137147)
init_special_inode: bogus i_mode (172036)
init_special_inode: bogus i_mode (175720)
init_special_inode: bogus i_mode (72350)
init_special_inode: bogus i_mode (174751)
EXT3-fs error (device sdb2): ext3_lookup: unlinked inode 19698169 in dir #19696695
Aborting journal on device sdb2.
init_special_inode: bogus i_mode (165661)
EXT3-fs error (device sdb2): ext3_lookup: unlinked inode 19698131 in dir #19696695
init_special_inode: bogus i_mode (76763)
init_special_inode: bogus i_mode (3116)
init_special_inode: bogus i_mode (75363)
init_special_inode: bogus i_mode (77034)
init_special_inode: bogus i_mode (132237)
EXT3-fs error (device sdb2): ext3_lookup: unlinked inode 19698139 in dir #19696695
init_special_inode: bogus i_mode (53031)
init_special_inode: bogus i_mode (33361)
init_special_inode: bogus i_mode (77546)
init_special_inode: bogus i_mode (6516)
EXT3-fs error (device sdb2): ext3_lookup: unlinked inode 19698143 in dir #19696695
init_special_inode: bogus i_mode (6442)
init_special_inode: bogus i_mode (72554)
EXT3-fs error (device sdb2): ext3_lookup: unlinked inode 19698142 in dir #19696695
EXT3-fs error (device sdb2): ext3_lookup: unlinked inode 19698164 in dir #19696695
init_special_inode: bogus i_mode (73171)
init_special_inode: bogus i_mode (154432)
init_special_inode: bogus i_mode (34302)
init_special_inode: bogus i_mode (131733)
init_special_inode: bogus i_mode (30773)
ext3_abort called.
EXT3-fs error (device sdb2): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
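For anyone trying to work out which guests have hit this, a rough sketch along the following lines may help. It only scans text for the "Aborting journal" message shown in the log above and for ext3 filesystems listed as read-only in /proc/mounts-format input. The function names are hypothetical (not part of any distribution tool), and the device-name pattern assumes names without spaces or dots, such as sdb2.

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical.

# Read kernel log text on stdin and print, once each, every device
# that logged an "Aborting journal on device ..." message.
journal_abort_devices() {
    sed -n 's/.*Aborting journal on device \([^ .]*\).*/\1/p' | sort -u
}

# Read /proc/mounts-format text on stdin and print the mount points of
# ext3 filesystems whose option list contains the plain "ro" option.
readonly_ext3_mounts() {
    awk '$3 == "ext3" && $4 ~ /(^|,)ro(,|$)/ { print $2 }'
}
```

Inside a guest this could be run as `dmesg | journal_abort_devices` and `readonly_ext3_mounts < /proc/mounts`. It reports symptoms only and says nothing about the cause.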
On Sun, Feb 13, 2011 at 9:09 AM, Adam Tauno Williams awilliam@whitemice.org wrote:
I have several CentOS5 hosts in a VMware ESX 3.5.0 226117 environment using iSCSI storage. Recently we've begun to experience journal aborts resulting in remounted-read-only filesystems as well as other filesystem issues - I can unmount a filesystem and force a check with "fsck -f" and occasionally find errors.
I've found - https://bugzilla.redhat.com/show_bug.cgi?id=228108 http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=51306
- which seem related but I believe I am running a kernel that contains these fixes.
I ran into a similar problem, but it was not specifically iSCSI. We ended up changing a setting in sysctl.conf. Give me a few minutes and I will find the setting.
On Sun, Feb 13, 2011 at 9:09 AM, Adam Tauno Williams awilliam@whitemice.org wrote:
I have several CentOS5 hosts in a VMware ESX 3.5.0 226117 environment using iSCSI storage. Recently we've begun to experience journal aborts resulting in remounted-read-only filesystems as well as other filesystem issues - I can unmount a filesystem and force a check with "fsck -f" and occasionally find errors.
http://communities.vmware.com/message/245983
The setting we used to resolve it was vm.min_free_kbytes = 8192
Before this, we were seeing the error pop up every week or so.
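To make a setting like this survive a reboot, it normally goes into /etc/sysctl.conf. The following is a minimal sketch of one way to add or update the entry without duplicating it. The helper name `set_sysctl_conf` is hypothetical, and the file path is passed as an argument so it can be tried on a scratch copy first.

```shell
#!/bin/sh
# Sketch only: set_sysctl_conf is a hypothetical helper.
# Usage: set_sysctl_conf FILE KEY VALUE
# Replaces an existing "KEY = ..." line in FILE, or appends one if absent.
set_sysctl_conf() {
    conf_file=$1
    key=$2
    value=$3
    # Escape the dots so the key is matched literally in the patterns below.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -Eq "^[[:space:]]*${key_re}[[:space:]]*=" "$conf_file" 2>/dev/null; then
        # Rewrite through a temporary file rather than relying on sed -i.
        sed "s/^[[:space:]]*${key_re}[[:space:]]*=.*/${key} = ${value}/" "$conf_file" > "$conf_file.tmp" &&
            mv "$conf_file.tmp" "$conf_file"
    else
        printf '%s = %s\n' "$key" "$value" >> "$conf_file"
    fi
}
```

As root, `set_sysctl_conf /etc/sysctl.conf vm.min_free_kbytes 8192` followed by `sysctl -p` would then load the value. The figure 8192 is simply the one quoted above, not a general recommendation.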
Hi
Also seeing this issue with CentOS 5.4 and 5.5 with NFS shared storage; according to the VMware knowledge base article, this should have been resolved in the v5.1 update?
Does changing the vm.min_free_kbytes value also apply to CentOS 5.4 and 5.5 to resolve the issue?
On 13 Feb 2011, at 14:40, "Kwan Lowe" kwan.lowe@gmail.com wrote:
On Sun, Feb 13, 2011 at 9:09 AM, Adam Tauno Williams awilliam@whitemice.org wrote:
I have several CentOS5 hosts in a VMware ESX 3.5.0 226117 environment using iSCSI storage. Recently we've begun to experience journal aborts resulting in remounted-read-only filesystems as well as other filesystem issues - I can unmount a filesystem and force a check with "fsck -f" and occasionally find errors.
http://communities.vmware.com/message/245983
The setting we used to resolve was vm.min_free_kbytes = 8192
Previous to this we were seeing the error pop up every week or so.
On Sun, 2011-02-13 at 20:28 +0000, Keith Beeby wrote:
Also seeing this issue with CentOS 5.4 and 5.5 with NFS shared storage; according to the VMware knowledge base article, this should have been resolved in the v5.1 update? Does changing the vm.min_free_kbytes value also apply to CentOS 5.4 and 5.5 to resolve the issue?
I guess we'll see [this issue has become extremely frustrating].
I suppose it is 'good' to see that someone else sees the issue as well. One issue with virtualization is that debugging these types of issues is an order-of-magnitude more difficult [virtualized OS, virtualized storage, virtualization platform, or some interaction of all the above... ugh].
On Sun, Feb 13, 2011 at 4:03 PM, Adam Tauno Williams awilliam@whitemice.org wrote:
On Sun, 2011-02-13 at 20:28 +0000, Keith Beeby wrote:
Also seeing this issue with CentOS 5.4 and 5.5 with NFS shared storage; according to the VMware knowledge base article, this should have been resolved in the v5.1 update? Does changing the vm.min_free_kbytes value also apply to CentOS 5.4 and 5.5 to resolve the issue?
I guess we'll see [this issue has become extremely frustrating].
I suppose it is 'good' to see that someone else sees the issue as well. One issue with virtualization is that debugging these types of issues is an order-of-magnitude more difficult [virtualized OS, virtualized storage, virtualization platform, or some interaction of all the above... ugh].
I am experiencing the same issue.
CentOS: current
ESXi: v3.5 update 5
storage: NFS
I am in the process of rebuilding the virtual server using a different OS, thinking it was just filesystem errors.
-bazooka
On Mon, 2011-02-14 at 13:01 -0800, Bazooka Joe wrote:
On Sun, Feb 13, 2011 at 4:03 PM, Adam Tauno Williams awilliam@whitemice.org wrote:
On Sun, 2011-02-13 at 20:28 +0000, Keith Beeby wrote:
Also seeing this issue with CentOS 5.4 and 5.5 with NFS shared storage; according to the VMware knowledge base article, this should have been resolved in the v5.1 update? Does changing the vm.min_free_kbytes value also apply to CentOS 5.4 and 5.5 to resolve the issue?
I guess we'll see [this issue has become extremely frustrating]. I suppose it is 'good' to see that someone else sees the issue as well. One issue with virtualization is that debugging these types of issues is an order-of-magnitude more difficult [virtualized OS, virtualized storage, virtualization platform, or some interaction of all the above... ugh].
I am experiencing the same issue. CentOS: current; ESXi: v3.5 update 5; storage: NFS. I am in the process of rebuilding the virtual server using a different OS, thinking it was just filesystem errors.
What other OS?
I've experienced one [possibly unrelated] corruption of the /tmp filesystem on an openSUSE 11.1 VM. So far Windows VMs seem immune to the issue.
On Sun, 2011-02-13 at 09:40 -0500, Kwan Lowe wrote:
On Sun, Feb 13, 2011 at 9:09 AM, Adam Tauno Williams awilliam@whitemice.org wrote:
I have several CentOS5 hosts in a VMware ESX 3.5.0 226117 environment using iSCSI storage. Recently we've begun to experience journal aborts resulting in remounted-read-only filesystems as well as other filesystem issues - I can unmount a filesystem and force a check with "fsck -f" and occasionally find errors.
http://communities.vmware.com/message/245983 The setting we used to resolve was vm.min_free_kbytes = 8192 Previous to this we were seeing the error pop up every week or so.
You made this change to the *virtual machine* [not the host OS]?
This thread indicates this was with VMware Workstation and not ESX (correct)?
On Sun, Feb 13, 2011 at 7:00 PM, Adam Tauno Williams awilliam@whitemice.org wrote:
[...] and force a check with "fsck -f" and occasionally find errors.
http://communities.vmware.com/message/245983 The setting we used to resolve was vm.min_free_kbytes = 8192 Previous to this we were seeing the error pop up every week or so.
You made this change to the *virtual machine* [not the host OS]?
This thread indicates this was with VMware Workstation and not ESX (correct)?
This was done on the CentOS and RHEL guests on VMWare ESX hosts.
Hi,
So the 'fix' is applied directly to the host os, is this the correct thing to do?
sysctl -w vm.min_free_kbytes = 8192
Keith
On 14 Feb 2011, at 10:36, Kwan Lowe wrote:
On Sun, Feb 13, 2011 at 7:00 PM, Adam Tauno Williams awilliam@whitemice.org wrote:
[...] and force a check with "fsck -f" and occasionally find errors.
http://communities.vmware.com/message/245983 The setting we used to resolve was vm.min_free_kbytes = 8192 Previous to this we were seeing the error pop up every week or so.
You made this change to the *virtual machine* [not the host OS]?
This thread indicates this was with VMware Workstation and not ESX (correct)?
This was done on the CentOS and RHEL guests on VMWare ESX hosts.
On Mon, 2011-02-14 at 12:08 +0000, Keith Beeby wrote:
Hi, So the 'fix' is applied directly to the host os,
no, to the *guest* OS instances. [please, do not top-post].
is this the correct thing to do? sysctl -w vm.min_free_kbytes = 8192
No space(s) I believe.
sysctl -w vm.min_free_kbytes=8192
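One way to confirm that the value actually took effect is to read it back through /proc/sys, where sysctl keys appear with the dots replaced by slashes. A small sketch, with a hypothetical helper name and assuming a simple key that has no dots inside any one component:

```shell
#!/bin/sh
# Sketch only: sysctl_proc_path is a hypothetical helper.
# Print the /proc/sys path that corresponds to a sysctl key.
sysctl_proc_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}
```

In the guest, `cat "$(sysctl_proc_path vm.min_free_kbytes)"` should then print the value that was just written with `sysctl -w`.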
I'm still not entirely clear as to why this setting should/will make a difference in maintaining filesystem integrity.
On "Jun 20, 2007" in the aforementioned thread there is the comment: "RHEL5 still needs a "fix" as well, and since it's not yet officially supported from VMware for ESX my guess is it won't get a formal fix until it is certified. I plan to post a patched driver for RHEL5 on my website in the next day or so." - but the comment is from *2007* and RHEL5 is now certified.
http://communities.vmware.com/message/881727#881727 seems like an update that describes my issue; but even that is from 2008.
Reference: VMware KB#1001778 (Note: RHEL5U1 is long since released)
On Mon, Feb 14, 2011 at 8:00 AM, Adam Tauno Williams awilliam@whitemice.org wrote:
On Mon, 2011-02-14 at 12:08 +0000, Keith Beeby wrote:
Hi, So the 'fix' is applied directly to the host os,
no, to the *guest* OS instances. [please, do not top-post].
is this the correct thing to do? sysctl -w vm.min_free_kbytes = 8192
No space(s) I believe.
sysctl -w vm.min_free_kbytes=8192
I'm still not entirely clear as to why this setting should/will make a difference in maintaining filesystem integrity.
It's certainly possible that the error I was receiving had a different cause, though with similar symptoms. We started seeing filesystems go read-only, and only rebooting would clear it up.
On 02/14/2011 07:31 AM, Kwan Lowe wrote:
On Mon, Feb 14, 2011 at 8:00 AM, Adam Tauno Williams awilliam@whitemice.org wrote:
On Mon, 2011-02-14 at 12:08 +0000, Keith Beeby wrote:
Hi, So the 'fix' is applied directly to the host os,
no, to the *guest* OS instances. [please, do not top-post].
is this the correct thing to do? sysctl -w vm.min_free_kbytes = 8192
No space(s) I believe.
sysctl -w vm.min_free_kbytes=8192
I'm still not entirely clear as to why this setting should/will make a difference in maintaining filesystem integrity.
It's certainly possible that the error I was receiving was a different reason, though similar symptoms. We started seeing filesystems go read-only, and only rebooting would clear it up.
I use that setting on the "Host OS" for VMware to prevent a whole VM from getting killed.
That setting maintains a minimum amount of free memory, which prevents a large program that requests memory quickly from depleting all available memory and causing the out-of-memory (OOM) killer to kill the process using the most RAM.
If you are on a Host OS box, the biggest memory consumers are your VMs, and getting one killed off because free memory reaches zero is not good.
I don't have any idea how it would fix journal errors on a drive, but I guess it could.
I set it much higher than 8192 on the host machines ... I set it to 131072.
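To put figures such as 8192 and 131072 in perspective, it can help to express the threshold as a share of total RAM. Both vm.min_free_kbytes and the MemTotal line in /proc/meminfo are in kB, so the arithmetic is direct. This is a rough sketch with a hypothetical helper name; the percentage is for comparison only and is not a tuning rule.

```shell
#!/bin/sh
# Sketch only: min_free_pct is a hypothetical helper.
# Usage: min_free_pct THRESHOLD_KB < /proc/meminfo
# Prints THRESHOLD_KB as a percentage of MemTotal, to two decimal places.
min_free_pct() {
    awk -v threshold="$1" '/^MemTotal:/ { printf "%.2f\n", threshold * 100 / $2 }'
}
```

For example, `min_free_pct 131072 < /proc/meminfo` on the host, or `min_free_pct 8192 < /proc/meminfo` inside a guest.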
It's certainly possible that the error I was receiving was a different reason, though similar symptoms. We started seeing filesystems go read-only, and only rebooting would clear it up.
I use that setting on the "Host OS" for VMWare to prevent a whole vm from getting killed.
That setting will maintain a minimum amount of free memory available to prevent a large program that requests memory quick from depleting all available memory and causing the program killer from killing the highest RAM process.
If you are on a Host OS box, the biggest Memory processes are your VMs, and getting one killed off because memory reaches zero is not good.
I don't have any idea how it would fix journal errors on a drive, but I guess it could.
It's been a few years since I put in the tuning, but here's some info that might be useful:
http://communities.vmware.com/thread/20690?start=0&tstart=0
In particular, others had reported seeing this error:
"kernel: journal_get_undo_access: No memory for committed data".
I don't recall that error in my case, but it might explain why the tuning fixed the problem. There's a bugzilla for this: