On Tue, 24-10-2006 at 08:19 -0700, Scott Silva wrote:
> J.J. Garcia spake the following on 10/24/2006 6:00 AM:
> > On Mon, 23-10-2006 at 16:56 +0200, J.J. Garcia wrote:
> >> On Mon, 23-10-2006 at 17:50 +0400, Kirill Korotaev wrote:
> >>> J.J. Garcia,
> >>>
> >>> the bug you face looks exactly like ours.
> >>> I thought it was memory corruption since %eax is 8, while it should be 0.
> >>> (BTW, can you run memtest to make sure your memory is really ok?
> >>> http://wiki.openvz.org/Hardware_testing ),
> >>> but the fact that it is always 8 in your case and ours makes me believe
> >>> it is something else...
> >>>
> >>> If I provide some debugging patch for you, will you be able to apply it to your
> >>> kernel, rebuild it and test the issue?
> >>>
> >>> Your help is very much appreciated.
> >>>
> >>> Thanks,
> >>> Kirill
> >>>
> >> Sure, I'll do my best. If you provide me the patch I can check it on the
> >> current host; it's not a very critical host on the network, and I think
> >> the bug is relevant enough to stop it for a while.
> >>
> >> I've started by installing memtest86+ on the affected host following the
> >> next steps, for your info:
> >>
> >> <...>
> >>
> >> =============================================================================
> >>  Package                 Arch       Version          Repository        Size
> >> =============================================================================
> >> Installing:
> >>  memtest86+              i386       1.26-2           base               53 k
> >>
> >> Transaction Summary
> >> =============================================================================
> >> Install      1 Package(s)
> >> Update       0 Package(s)
> >> Remove       0 Package(s)
> >> Total download size: 53 k
> >> Is this ok [y/N]: y
> >> Downloading Packages:
> >> (1/1): memtest86+-1.26-2. 100% |=========================|  53 kB    00:00
> >> Running Transaction Test
> >> Finished Transaction Test
> >> Transaction Test Succeeded
> >> Running Transaction
> >>   Installing: memtest86+                   ######################### [1/1]
> >>
> >> Installed: memtest86+.i386 0:1.26-2
> >> Complete!
> >> [root@fattybox ~]# rpm -ql memtest86+
> >> /boot/memtest86+-1.26
> >> /sbin/new-memtest-pkg
> >> /usr/sbin/memtest-setup
> >> /usr/share/doc/memtest86+-1.26
> >> /usr/share/doc/memtest86+-1.26/README
> >>
> >> [root@fattybox ~]# rpm -qi memtest86+
> >> Name        : memtest86+                   Relocations: (not relocatable)
> >> Version     : 1.26                              Vendor: CentOS
> >> Release     : 2                             Build Date: lun 21 feb 2005 20:35:44 CET
> >> Install Date: lun 23 oct 2006 16:25:57 CEST Build Host: bhrama.build.karan.org
> >> Group       : System Environment/Base       Source RPM: memtest86+-1.26-2.src.rpm
> >> Size        : 123633                           License: GPL
> >> Signature   : DSA/SHA1, sáb 26 feb 2005 21:59:06 CET, Key ID a53d0bab443e1821
> >> Packager    : Karanbir Singh <kbsingh-IFYaIzF+flcdnm+yROfE0A at public.gmane.org>
> >> URL         : http://www.memtest.org
> >> Summary     : Stand-alone memory tester for x86 and x86-64 computers
> >> Description :
> >> Memtest86+ is a thorough stand-alone memory test for x86 and x86-64
> >> architecture computers. BIOS based memory tests are only a quick
> >> check and often miss many of the failures that are detected by
> >> Memtest86+.
> >>
> >> Run 'memtest-setup' to add to your GRUB or lilo boot menu.
> >> [root@fattybox ~]#
> >>
> >> Proceeding with the boot menu setup,
> >>
> >> [root@fattybox ~]# memtest-setup
> >> Setup complete.
> >>
> >> This added the following entry to /etc/grub.conf, which I'll use to
> >> launch the tests:
> >>
> >> title Memtest86+ (1.26)
> >>         root (hd0,0)
> >>         kernel /memtest86+-1.26 ro root=/dev/VolGroup00/LogVol00
> >> ACPI=off vga=0x307 selinux=0
> >>
> >>
> >> From here on, memtest is running with the default config. Feel free to
> >> tell me to change the default params if you are looking for something
> >> in particular. I'll leave it running for 48 hours looking for anything
> >> strange in memory.
> >>
> >> I have to note that this host uses shared memory for graphics; IOW,
> >> there's no graphics card, only the one embedded on the mobo. It's a DFI
> >> CM33T3-100 mobo (CM33-TL) with an up-to-date BIOS according to DFI,
> >> running an Intel Celeron. I can't vouch for the Kingston memory... but
> >> 22.0.2 worked fine with this hardware for a long time (months, and a
> >> year of uptime with heavy loads)...
> >>
> >> We'll keep in touch,
> >>
> >> Jose.
> >>
> >
> > Hi again,
> >
> > After almost 24 hours running memtest86+ on the affected host, I think it
> > has found a memory corruption issue, as you mentioned. It can be
> > checked at http://img206.imageshack.us/my.php?image=dscn2284xj4.jpg
> >
> > I'm trying to solve it with a new PC133 memory module. At the same
> > time, maybe I can use an old video card to avoid sharing memory with the
> > mobo's embedded one, to simplify things.
> >
> > I'll then check ASAP whether the EAX register still holds the noted
> > value after the panic, if I can reproduce it again.
> >
> > Sorry about the inconvenience, but what is strange is that there was no
> > memory corruption of any kind while 22.0.2 was in use for months. I was
> > really surprised this morning!
> >
> > Jose.
> Memory can fail over time, and also look for any swollen or leaky tops on
> motherboard capacitors. If this is an older board, which I assumed by the
> PLE133 chipset, there were a lot of issues with bad capacitors in the 2000 to
> 2003 timeframe.
> This can be a symptom of drying electrolyte in the filter caps.
>
> Scott

Sometimes I forget that "nothing is/lasts forever"... :) My fault for
being in my maniac mood sometimes... yes, I have to admit it, and I must
write it on the blackboard 1000 times! :)

By the way, thx for the hint. The capacitors seem to be OK on that board,
with no drying at the moment, and the host is well cooled. It has been in
service for no more than 2 years, with a "new/0 hours" board at the start.

I hope to find the bad RAM module ASAP so we can get further.

J.