J.J. Garcia spake the following on 10/24/2006 6:00 AM:
> On Mon, 23-10-2006 at 16:56 +0200, J.J. Garcia wrote:
>> On Mon, 23-10-2006 at 17:50 +0400, Kirill Korotaev wrote:
>>> J.J. Garcia,
>>>
>>> the bug you face looks exactly like ours.
>>> I thought it was memory corruption since %eax is 8, while it should be 0.
>>> (BTW, can you run memtest to make sure your memory is really ok?
>>> http://wiki.openvz.org/Hardware_testing ),
>>> but the fact that it is always 8 in both your case and ours makes me believe
>>> it is something else...
>>>
>>> If I provide a debugging patch for you, will you be able to apply it to your
>>> kernel, rebuild it and test the issue?
>>>
>>> Your help is very much appreciated.
>>>
>>> Thanks,
>>> Kirill
>>>
>> Sure, I'll do my best. If you provide me the patch I can check it on the
>> current host; it's not a very critical host on the network and I think
>> the bug is relevant enough to stop it for a while.
>>
>> I've started by installing memtest86+ on the affected host with the
>> following steps, for your info:
>>
>> <...>
>>
>> =============================================================================
>>  Package            Arch       Version        Repository        Size
>> =============================================================================
>> Installing:
>>  memtest86+         i386       1.26-2         base              53 k
>>
>> Transaction Summary
>> =============================================================================
>> Install      1 Package(s)
>> Update       0 Package(s)
>> Remove       0 Package(s)
>> Total download size: 53 k
>> Is this ok [y/N]: y
>> Downloading Packages:
>> (1/1): memtest86+-1.26-2. 100% |=========================|  53 kB    00:00
>> Running Transaction Test
>> Finished Transaction Test
>> Transaction Test Succeeded
>> Running Transaction
>>   Installing: memtest86+                   ######################### [1/1]
>>
>> Installed: memtest86+.i386 0:1.26-2
>> Complete!
>> [root@fattybox ~]# rpm -ql memtest86+
>> /boot/memtest86+-1.26
>> /sbin/new-memtest-pkg
>> /usr/sbin/memtest-setup
>> /usr/share/doc/memtest86+-1.26
>> /usr/share/doc/memtest86+-1.26/README
>>
>> [root@fattybox ~]# rpm -qi memtest86+
>> Name        : memtest86+              Relocations: (not relocatable)
>> Version     : 1.26                    Vendor: CentOS
>> Release     : 2                       Build Date: lun 21 feb 2005 20:35:44 CET
>> Install Date: lun 23 oct 2006 16:25:57 CEST   Build Host: bhrama.build.karan.org
>> Group       : System Environment/Base Source RPM: memtest86+-1.26-2.src.rpm
>> Size        : 123633                  License: GPL
>> Signature   : DSA/SHA1, sáb 26 feb 2005 21:59:06 CET, Key ID a53d0bab443e1821
>> Packager    : Karanbir Singh <kbsingh-IFYaIzF+flcdnm+yROfE0A at public.gmane.org>
>> URL         : http://www.memtest.org
>> Summary     : Stand-alone memory tester for x86 and x86-64 computers
>> Description :
>> Memtest86+ is a thorough stand-alone memory test for x86 and x86-64
>> architecture computers. BIOS based memory tests are only a quick
>> check and often miss many of the failures that are detected by
>> Memtest86+.
>>
>> Run 'memtest-setup' to add to your GRUB or lilo boot menu.
>> [root@fattybox ~]#
>>
>> Proceeding with the boot setup:
>>
>> [root@fattybox ~]# memtest-setup
>> Setup complete.
>>
>> This led to the following entry in /etc/grub.conf, which I'll use to
>> launch the tests:
>>
>> title Memtest86+ (1.26)
>>         root (hd0,0)
>>         kernel /memtest86+-1.26 ro root=/dev/VolGroup00/LogVol00 ACPI=off vga=0x307 selinux=0
>>
>> From here on, memtest is running with the default config. Feel free to
>> tell me to change the default params if you are looking for something
>> specific. I'll leave it running for 48 hours to look for anything
>> strange in memory.
>>
>> I should note that this host uses shared memory for the graphics; in
>> other words, there's no graphics card, only the one embedded on the
>> motherboard. It's a DFI CM33T3-100 motherboard (CM33-TL) with an
>> up-to-date BIOS according to DFI, running an Intel Celeron.
>> I can't vouch for the Kingston memory modules... but 22.0.2 previously
>> worked fine with this hardware for a long time (months, and a year of
>> uptime with heavy loads)...
>>
>> We'll keep in touch,
>>
>> Jose.
>>
>
> Hi again,
>
> After almost 24 hours of running memtest86+ on the affected host, I think
> it discovered a memory corruption issue as you mentioned; it can be
> checked at http://img206.imageshack.us/my.php?image=dscn2284xj4.jpg
>
> I'm trying to solve it with a new PC133 memory module. At the same
> time, maybe I can use an old video card to avoid sharing memory with the
> motherboard's embedded one, to simplify things.
>
> I'll then check ASAP whether the EAX register still holds the noted
> value after a panic, if I can reproduce it again.
>
> Sorry about the inconvenience, but what is strange is not having any
> kind of memory corruption when 22.0.2 was used for months; really, this
> morning I was surprised!
>
> Jose.

Memory can fail over time. Also look for any swollen or leaky tops on
motherboard capacitors. If this is an older board, which I assume from the
PLE133 chipset, there were a lot of issues with bad capacitors in the
2000 to 2003 timeframe. This can be a symptom of drying electrolyte in
the filter caps.

-- 
MailScanner is like deodorant...
You hope everybody uses it, and you notice quickly if they don't!!!!