On Mon, 2006-10-23 at 17:50 +0400, Kirill Korotaev wrote:
J.J. Garcia,
the bug you are facing looks exactly like ours. I thought it was memory corruption, since %eax is 8 when it should be 0. (BTW, can you run memtest to make sure your memory is really OK? http://wiki.openvz.org/Hardware_testing ) But the fact that it is always 8 in both your case and ours makes me believe it is something else...
If I provide a debugging patch, will you be able to apply it to your kernel, rebuild it and test the issue?
Your help is very much appreciated.
Thanks, Kirill
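A side note on the %eax reasoning above, as a minimal sketch: 8 and 0 differ in exactly one bit (bit 3), which is the kind of difference a flipped RAM bit would produce. The helper name differing_bits is hypothetical and not from this thread, and the check by itself cannot tell a hardware fault from a software bug that happens to set the same bit:

```shell
# Hypothetical helper, not from the thread: count how many bits differ
# between an expected and an observed value (non-negative integers only).
differing_bits() {
    diff=$(( $1 ^ $2 ))      # XOR leaves a 1 wherever the two values differ
    count=0
    while [ "$diff" -ne 0 ]; do
        count=$(( count + (diff & 1) ))
        diff=$(( diff >> 1 ))
    done
    echo "$count"
}

differing_bits 0 8   # expected %eax = 0, observed %eax = 8; prints 1
```

A result of 1 is consistent with a single flipped bit, but the same value showing up on two different machines is what points away from bad RAM.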
Sure, I'll do my best. If you provide the patch I can test it on the current host; it's not a very critical host on the network, and I think the bug is relevant enough to justify taking it down for a while.
I've started by installing memtest86+ on the affected host with the following steps, for your information:
<...>
=============================================================================
 Package                 Arch       Version          Repository        Size
=============================================================================
Installing:
 memtest86+              i386       1.26-2           base               53 k

Transaction Summary
=============================================================================
Install      1 Package(s)
Update       0 Package(s)
Remove       0 Package(s)
Total download size: 53 k
Is this ok [y/N]: y
Downloading Packages:
(1/1): memtest86+-1.26-2. 100% |=========================|  53 kB    00:00
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing: memtest86+                   ######################### [1/1]

Installed: memtest86+.i386 0:1.26-2
Complete!

[root@fattybox ~]# rpm -ql memtest86+
/boot/memtest86+-1.26
/sbin/new-memtest-pkg
/usr/sbin/memtest-setup
/usr/share/doc/memtest86+-1.26
/usr/share/doc/memtest86+-1.26/README

[root@fattybox ~]# rpm -qi memtest86+
Name        : memtest86+                   Relocations: (not relocatable)
Version     : 1.26                              Vendor: CentOS
Release     : 2                             Build Date: lun 21 feb 2005 20:35:44 CET
Install Date: lun 23 oct 2006 16:25:57 CEST    Build Host: bhrama.build.karan.org
Group       : System Environment/Base       Source RPM: memtest86+-1.26-2.src.rpm
Size        : 123633                           License: GPL
Signature   : DSA/SHA1, sáb 26 feb 2005 21:59:06 CET, Key ID a53d0bab443e1821
Packager    : Karanbir Singh kbsingh@centos.org
URL         : http://www.memtest.org
Summary     : Stand-alone memory tester for x86 and x86-64 computers
Description :
Memtest86+ is a thorough stand-alone memory test for x86 and x86-64
architecture computers. BIOS based memory tests are only a quick check
and often miss many of the failures that are detected by Memtest86+.

Run 'memtest-setup' to add to your GRUB or lilo boot menu.
[root@fattybox ~]#
Proceeding with the boot menu installation:
[root@fattybox ~]# memtest-setup
Setup complete.
This led to the following entry in /etc/grub.conf, which I'll use to launch the tests:
title Memtest86+ (1.26)
        root (hd0,0)
        kernel /memtest86+-1.26 ro root=/dev/VolGroup00/LogVol00 ACPI=off vga=0x307 selinux=0
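A small sketch for confirming that the entry is really present before rebooting. The function name has_memtest_entry is hypothetical; the default path is the /etc/grub.conf mentioned above. It only looks for a kernel line that starts with /memtest86+, so it assumes the GRUB legacy layout that memtest-setup produced here:

```shell
# Hypothetical helper: succeed if a GRUB legacy config file contains
# a "kernel /memtest86+..." line, fail otherwise.
has_memtest_entry() {
    conf="${1:-/etc/grub.conf}"   # default path taken from this thread
    grep -Eq '^[[:space:]]*kernel[[:space:]]+/memtest86\+' "$conf"
}
```

It could be used as `has_memtest_entry && echo "memtest entry found"`; the exit status is all it reports, it does not validate the rest of the stanza.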
From here on, memtest is running with its default configuration; feel free to tell me to change the default parameters if you are looking for something specific. I'll leave it running for 48 hours to look for anything strange in memory.
I should note that this host uses shared memory for graphics; in other words, there is no graphics card, only the one embedded on the motherboard. It's a DFI CM33T3-100 board (CM33-TL) with an up-to-date BIOS according to DFI, running an Intel Celeron. I can't vouch for the Kingston memory modules... but 22.0.2 worked fine on this hardware for a long time (months, and a year of uptime under heavy load)...
We'll keep in touch,
Jose.
On Mon, 2006-10-23 at 16:56 +0200, J.J. Garcia wrote:
<...>
Hi again,
After almost 24 hours of running memtest86+ on the affected host, I think it has found a memory corruption problem, as you suggested. You can see it at http://img206.imageshack.us/my.php?image=dscn2284xj4.jpg
I'm trying to fix it with a new PC133 memory module. At the same time, I may use an old video card to avoid the memory sharing of the motherboard's embedded one, to simplify things.
I'll then check ASAP whether the EAX register still holds the noted value after the panic, if I can reproduce it again.
Sorry for the inconvenience. What is strange is that there was no sign of memory corruption while 22.0.2 was in use for months; I was really surprised this morning!
Jose.
J.J. Garcia spake the following on 10/24/2006 6:00 AM:
<...>
Memory can fail over time. Also look for any swollen or leaky tops on the motherboard capacitors. If this is an older board, which I assume from the PLE133 chipset, there were a lot of issues with bad capacitors in the 2000 to 2003 timeframe. This can be a symptom of drying electrolyte in the filter caps.
On Tue, 2006-10-24 at 08:19 -0700, Scott Silva wrote:
<...>
Scott
Sometimes I forget that nothing lasts forever... :) It's my own fault for being in my maniac mood sometimes... yes, I have to admit it, and I should write it on the blackboard 1000 times! :)
By the way, thanks for the hint. The capacitors seem to be OK on that board, with no drying so far, and the host is well cooled; it has been in service for no more than 2 years, with a board that was new (0 hours) at the start.
I hope to find the bad RAM module ASAP so I can get further.
J.