On Tuesday 24 April 2007, Jim Perrin wrote:
Okay. Why are you rebuilding, rather than using the already compiled version in the repository?
Especially with such crucial things as the kernel, I like to be able to react on my own if security issues arise that the upstream distro does not fix in time. Therefore I like the idea of at least being able to rebuild the kernel on my own if I have to.
While you should in theory be able to rebuild the module and have it work just fine, sometimes there are issues with the build environment that are difficult to track down.
That's strange because we didn't change anything regarding the build environment. It is a plain CentOS 5 installation from scratch using the DVD ISO image. Besides some configuration of iptables and the NFS service, NOTHING has been changed.
Do you get the same results with the precompiled modules?
I will try it tomorrow evening because we need the image server today for a bunch of new installations. Lucky for us that ext3 is working ;-) As soon as the student rooms are installed I will test the precompiled module and come back to the list with the results.
I don't use xfs that extensively on i686 hardware because I don't trust the 4k stack limitation issue, but I did a quick run-through with the pre-built modules and I don't seem to have this issue. (moved a bunch of copies of the centos-5 dvd around, on and off the fs, md5suming each time)
That doesn't make me feel better ;-) We wanted to migrate a machine with almost 7000 user homes on XFS to CentOS 5 and I'm now a "little" bit scared unless we can track down the problem.
Thanks for your hints, Gernot
Hi CentOS-dev team,
On Tuesday 24 April 2007, Gernot Stocker wrote: [...]
I don't use xfs that extensively on i686 hardware because I don't trust the 4k stack limitation issue, but I did a quick run-through with the pre-built modules and I don't seem to have this issue. (moved a bunch of copies of the centos-5 dvd around, on and off the fs, md5suming each time)
1.) I unloaded the xfs modules, removed the self-compiled xfs-module rpm and installed the binary package from devs.centos.org. It was loaded with:
Apr 25 21:15:30 sz-linux02 kernel: SGI XFS with ACLs, security attributes, realtime, large block numbers, no debug enabled
Apr 25 21:16:46 sz-linux02 kernel: SGI XFS Quota Management subsystem
2.) I created a new filesystem on the softraid mirror with "mkfs.xfs /dev/md10" and mounted it:
Apr 25 21:16:46 sz-linux02 kernel: Filesystem "md10": Disabling barriers, not supported by the underlying device
Apr 25 21:16:46 sz-linux02 kernel: XFS mounting filesystem md10
Apr 25 21:16:46 sz-linux02 kernel: SELinux: initialized (dev md10, type xfs), uses xattr
3.) I'm able to reproduce the same corruption of the XFS filesystem that I got with my self-compiled package, just by copying bigger files onto the xfs volume. The copied files are corrupted (the md5sum gives a different result) and have sizes between 10 MB and 3 GB. Finally, "xfs_logprint /dev/md10" results in:
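Bad log record header
xfs_logprint:
data device: 0x90a
log device: 0x90a daddr: 57062816 length: 55720
Header 0x4 wanted 0xfeedbabe
**********************************************************************
* ERROR: header cycle=4 block=45454 *
**********************************************************************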
Hth Gernot
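The reproduction in step 3 (copy a large file onto the volume, then compare md5sums) could be scripted roughly as follows. This is only a minimal sketch: the function names `md5_of` and `copy_and_verify` are made up for illustration, and the demo uses temporary directories as hypothetical stand-ins for the real source directory and the XFS mount point. Note also that a read-back right after the copy may be served from the page cache, so unmounting and remounting before the second checksum is probably the safer test.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, target_dir):
    """Copy source into target_dir and report whether both checksums match."""
    copy = Path(target_dir) / Path(source).name
    shutil.copyfile(source, copy)
    return md5_of(source) == md5_of(copy)


if __name__ == "__main__":
    # Hypothetical stand-ins: a real run would use a large file (10 MB to 3 GB)
    # and the mount point of the XFS volume as the target directory.
    with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as dst_dir:
        sample = Path(src_dir) / "sample.bin"
        sample.write_bytes(b"\0" * (4 * 1024 * 1024))
        print("checksums match:", copy_and_verify(sample, dst_dir))
```

On a healthy filesystem this should report a match for every file; a single mismatch is enough to reproduce the problem described above.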
My suggestion would be to try it on the raw disks without md10 and see if that still gives corruption...
On 4/25/07, Gernot Stocker gernot.stocker@tugraz.at wrote: [...]
Hth Gernot -- Gernot Stocker, Institute for Genomics and Bioinformatics(IGB) Petersgasse 14, 8010 Graz, Austria Tel.: ++43 316 873 5345 http://genome.tugraz.at
CentOS-devel mailing list CentOS-devel@centos.org http://lists.centos.org/mailman/listinfo/centos-devel
On Wed, 2007-04-25 at 22:39 +0200, Gernot Stocker wrote: [...]
For a test, can you try it without using the MD device?
Is this on an i386 or x86_64 distro?
On Wednesday 25 April 2007, Johnny Hughes wrote:
For a test, can you try it without using the MD device?
I repeated the test without the md device and I get the same results:
1.) Different md5sums for bigger files
2.) logprint gives errors
xfs_logprint /dev/sda4
xfs_logprint:
xfs_logprint: /dev/sda4 contains a mounted and writable filesystem
data device: 0x804
log device: 0x804 daddr: 57062880 length: 55720
Header 0x2 wanted 0xfeedbabe
**********************************************************************
* ERROR: header cycle=2 block=822 *
**********************************************************************
Bad log record header
Is this on an i386 or x86_64 distro?
It is an i386 system and distro:
uname -a
Linux sz-linux02.tugraz.at 2.6.18-8.1.1.el5 #1 SMP Mon Apr 9 09:46:54 EDT 2007 i686 i686 i386 GNU/Linux
I can almost rule out a hardware problem because the machine has been working as an image server for 2 years already. Additionally, I can repeat the same copy process onto the same partition/disk formatted with ext3 and the md5sums are fine.
Gernot
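As a side note on the "data device" values in the xfs_logprint output: they look like plain Linux device numbers, so they can be used to double-check which block device was actually inspected. The sketch below assumes the legacy layout with an 8-bit major and an 8-bit minor number, which only holds for small numbers like these; the helper name `split_dev` is made up for illustration.

```python
def split_dev(dev):
    """Split a legacy 16-bit Linux device number into (major, minor).

    Assumption: the old 8-bit major / 8-bit minor layout, which is only
    valid for small device numbers such as the ones in this thread.
    """
    return (dev >> 8) & 0xFF, dev & 0xFF


if __name__ == "__main__":
    # The two values reported by xfs_logprint in this thread.
    for dev in (0x90A, 0x804):
        major, minor = split_dev(dev)
        print(f"{dev:#x}: major {major}, minor {minor}")
```

Under that assumption, 0x90a decodes to major 9 (the md driver), minor 10, which fits /dev/md10, and 0x804 decodes to major 8 (SCSI disks), minor 4, which fits /dev/sda4.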
Could you give the machine a good solid couple hours of memtest?
On 4/26/07, Gernot Stocker gernot.stocker@tugraz.at wrote: [...]
On Thu, 2007-04-26 at 22:31 -0700, Maciej Żenczykowski wrote:
Could you give the machine a good solid couple hours of memtest?
I have set up an i686 test box to grab the XFS test suite out of SGI's CVS.
I will run this test suite for i686 (and also for x86_64) and work with SGI to address failures.
I will also test the specific issue brought up below.
Should have some progress on this issue today.
Thanks for testing, Johnny Hughes
On 4/26/07, Gernot Stocker gernot.stocker@tugraz.at wrote: [...]
On Friday 27 April 2007, Johnny Hughes wrote:
On Thu, 2007-04-26 at 22:31 -0700, Maciej Żenczykowski wrote:
Could you give the machine a good solid couple hours of memtest?
This weekend I will try to run memtest overnight, but I don't expect any results from this test. We have already been running a scientific application in test mode, which generally reports almost immediately if RAM is not working properly. It stress-tests CPU and RAM and cross-checks the numerical outcome against precalculated results. It stops as soon as the results differ from the expected ones.
I have set up an i686 test box to grab the XFS test suite out of SGI's CVS.
I will run this test suite for i686 (and also for x86_64) and work with SGI to address failures.
That sounds like a good plan! Let me know when you have a fix for the module or if I can help you with a module which gives you more debug information. Just let me know what to do.
I will also test the specific issue brought up below.
Should have some progress on this issue today.
Cool!
Thanks for testing,
No problem, Gernot
On Fri, 2007-04-27 at 14:48 +0200, Gernot Stocker wrote:
On Friday 27 April 2007, Johnny Hughes wrote:
On Thu, 2007-04-26 at 22:31 -0700, Maciej Żenczykowski wrote:
Could you give the machine a good solid couple hours of memtest?
This weekend I will try to run memtest overnight, but I don't expect any results from this test. We have already been running a scientific application in test mode, which generally reports almost immediately if RAM is not working properly. It stress-tests CPU and RAM and cross-checks the numerical outcome against precalculated results. It stops as soon as the results differ from the expected ones.
OK ... I can confirm that I copied a large file (650 MB ISO file) from ext3 to an XFS partition ... and the MD5 sums were indeed different
(This is on an i686 kernel, with the standard module loaded)
However, I did not have any errors in:
"xfs_logprint /dev/hda3"
so this is quite strange.
I have set up an i686 test box to grab the XFS test suite out of SGI's CVS.
I will run this test suite for i686 (and also for x86_64) and work with SGI to address failures.
That sounds like a good plan! Let me know when you have a fix for the module or if I can help you with a module which gives you more debug information. Just let me know what to do.
I have the test suite compiled now and I will kick off this test in a little while.
I will also test the specific issue brought up below.
Should have some progress on this issue today.
Cool!
-- Johnny Hughes
On Friday 27 April 2007, Johnny Hughes wrote:
OK ... I can confirm that I copied a large file (650 MB ISO file) from ext3 to an XFS partition ... and the MD5 sums were indeed different
Bad for the module but good for my system ;-) I would suggest removing the i686 module from the devs-testing site until the problem is solved. We have lost files, which we could restore somehow, but this shouldn't happen to other people as well.
(This is on an i686 kernel, with the standard module loaded)
My system is also an i686 system. In the other mail, by i386 I meant the i386 family.
However, I did not have any errors in:
"xfs_logprint /dev/hda3"
so this is quite strange.
I think that the errors in xfs_logprint are just a matter of time and of the number of transferred files. I'm syncing a directory tree (using rsync) of about 2000 files of different sizes and I'm checking it with a "find ./ -type f -exec md5sum {} \; | sort" against the original tree.
I have the test suite compiled now and I will kick off this test in a little while.
Good hunting! Gernot
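The tree check described above (checksum every file after the rsync and compare against the original tree) can also be sketched in a few lines of Python. This is a rough equivalent rather than the exact procedure used here; the names `manifest` and `differing_files` are hypothetical, and symlinks are skipped to mimic "find -type f".

```python
import hashlib
import os


def manifest(root):
    """Map each regular file's path (relative to root) to its hex MD5 digest."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Mimic "find -type f": regular files only, no symlinks.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            digest = hashlib.md5()
            with open(full, "rb") as handle:
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            sums[os.path.relpath(full, root)] = digest.hexdigest()
    return sums


def differing_files(original_root, copy_root):
    """Return sorted relative paths that are missing or differ between the trees."""
    original = manifest(original_root)
    copy = manifest(copy_root)
    return sorted(
        path
        for path in original.keys() | copy.keys()
        if original.get(path) != copy.get(path)
    )
```

Calling `differing_files` with the original tree and the synced copy on the XFS volume should return an empty list if nothing was corrupted; any path it returns is either missing on one side or has a different checksum.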
<snip>
OK ... Working with Eric Sandeen, we have some new xfs modules for testing ... these seem to work much better.
Thanks Eric :P
xfs-kmod version 0.4-1.2.6.18_8* is now in the Testing Repo and waiting.
--Johnny Hughes
On Saturday 28 April 2007, Johnny Hughes wrote: [...]
Hi Johnny, thanks for the new modules. I don't want to disappoint you, but my tests have been only partly successful.
1.) The binary module kmod-xfs-0.4-1.2.6.18_8.1.1.el5.i686.rpm from the dev site does not load properly on the standard updated CentOS 5 kernel 2.6.18-8.1.1.el5. Loading the module results in "undefined symbols".
2.) After recompiling the rpm module from the source rpm, I synced my directory tree with the new module and the md5sum test has been just fine! Only xfs_logprint ended up in a bunch of errors ... see below
3.) Is there a special reason why the xfs_quota module has disappeared? I'm not that deep into the XFS modules, but in the kernel config file in the src rpm I have seen a y instead of an m at the xfs_quota config entry. Does this mean that the quota module is simply integrated into the main xfs module, or do I need a complete rebuild of the kernel?
I think I will hold off on the RAID1 XFS tests until xfs_logprint returns no errors ;-)
So far from here, thanks Gernot
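Regarding point 1, one quick sanity check is to compare the kernel release encoded in the kmod file name with the running kernel. The sketch below assumes the naming seen in this thread, where the rpm release field is a build number followed by the kernel release with "-" replaced by "_"; that convention is inferred from the single file name above, and the function name `kernel_for_kmod` is made up for illustration.

```python
import platform


def kernel_for_kmod(rpm_filename):
    """Derive the kernel release a kmod package targets from its file name.

    Assumption (inferred from this thread only): the rpm release field is
    "<build>.<kernel release with '-' replaced by '_'>".
    """
    stem = rpm_filename[: -len(".rpm")] if rpm_filename.endswith(".rpm") else rpm_filename
    stem, _arch = stem.rsplit(".", 1)                # drop the architecture
    _name, _version, release = stem.rsplit("-", 2)   # name-version-release
    _build, kernel = release.split(".", 1)           # strip the build number
    return kernel.replace("_", "-")


if __name__ == "__main__":
    wanted = kernel_for_kmod("kmod-xfs-0.4-1.2.6.18_8.1.1.el5.i686.rpm")
    print("module built for:", wanted)
    print("running kernel:  ", platform.release())
```

If both values agree, as they appear to for the kernel shown by uname earlier in the thread, a plain version mismatch is an unlikely explanation for the "undefined symbols" message and the cause has to be looked for elsewhere.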