[CentOS] Re: BUG in fs/bio.c:99

Tue Oct 24 15:19:47 UTC 2006
Scott Silva <ssilva at sgvwater.com>

J.J. Garcia spake the following on 10/24/2006 6:00 AM:
> On Mon, 23-10-2006 at 16:56 +0200, J.J. Garcia wrote:
>> On Mon, 23-10-2006 at 17:50 +0400, Kirill Korotaev wrote:
>>> J.J. Garcia,
>>>
>>> the bug you are facing looks exactly like ours.
>>> I thought it was memory corruption, since %eax is 8 while it should be 0.
>>> (BTW, can you run memtest to make sure your memory is really OK?
>>> http://wiki.openvz.org/Hardware_testing ),
>>> but the fact that it is always 8 in both your case and ours makes me
>>> believe it is something else...
>>>
>>> If I provide some debugging patch for you, will you be able to apply it to your
>>> kernel, rebuild it and test the issue?
>>>
>>> Your help is very much appreciated.
>>>
>>> Thanks,
>>> Kirill
>>>
>> Sure, I'll do my best. If you send me the patch, I can test it on the
>> current host. It's not a very critical host on the network, and I think
>> the bug is important enough to justify taking it down for a while.
>>
>> I've started by installing memtest86+ on the affected host with the
>> following steps, for your information:
>>
>> <...>
>>
>> =============================================================================
>>  Package                 Arch       Version          Repository       Size
>> =============================================================================
>> Installing:
>>  memtest86+              i386       1.26-2           base             53 k
>>
>> Transaction Summary
>> =============================================================================
>> Install      1 Package(s)
>> Update       0 Package(s)
>> Remove       0 Package(s)
>> Total download size: 53 k
>> Is this ok [y/N]: y
>> Downloading Packages:
>> (1/1): memtest86+-1.26-2. 100% |=========================|  53 kB    00:00
>> Running Transaction Test
>> Finished Transaction Test
>> Transaction Test Succeeded
>> Running Transaction
>>   Installing: memtest86+                   ######################### [1/1]
>>
>> Installed: memtest86+.i386 0:1.26-2
>> Complete!
>> [root@fattybox ~]# rpm -ql memtest86+
>> /boot/memtest86+-1.26
>> /sbin/new-memtest-pkg
>> /usr/sbin/memtest-setup
>> /usr/share/doc/memtest86+-1.26
>> /usr/share/doc/memtest86+-1.26/README
>>
>> [root@fattybox ~]# rpm -qi memtest86+
>> Name        : memtest86+                   Relocations: (not relocatable)
>> Version     : 1.26                              Vendor: CentOS
>> Release     : 2                             Build Date: lun 21 feb 2005 20:35:44 CET
>> Install Date: lun 23 oct 2006 16:25:57 CEST      Build Host: bhrama.build.karan.org
>> Group       : System Environment/Base       Source RPM: memtest86+-1.26-2.src.rpm
>> Size        : 123633                           License: GPL
>> Signature   : DSA/SHA1, sáb 26 feb 2005 21:59:06 CET, Key ID a53d0bab443e1821
>> Packager    : Karanbir Singh <kbsingh-IFYaIzF+flcdnm+yROfE0A at public.gmane.org>
>> URL         : http://www.memtest.org
>> Summary     : Stand-alone memory tester for x86 and x86-64 computers
>> Description :
>> Memtest86+ is a thorough stand-alone memory test for x86 and x86-64
>> architecture computers. BIOS based memory tests are only a quick
>> check and often miss many of the failures that are detected by
>> Memtest86+.
>>
>> Run 'memtest-setup' to add to your GRUB or lilo boot menu.
>> [root@fattybox ~]#
>>
>> Proceeding with the install on boot,
>>
>> [root@fattybox ~]# memtest-setup
>> Setup complete.
>>
>> This added the following entry to /etc/grub.conf, which I'll use to
>> launch the tests:
>>
>> title Memtest86+ (1.26)
>>         root (hd0,0)
>>         kernel /memtest86+-1.26 ro root=/dev/VolGroup00/LogVol00 ACPI=off vga=0x307 selinux=0
>>
>>
>> From this point, memtest is running with the default configuration. Feel
>> free to tell me to change the default parameters if you are looking for
>> something specific. I'll leave it running for 48 hours to look for
>> anything strange in memory.
>>
>> I should note that this host uses shared memory for graphics. In other
>> words, there is no graphics card, only the one embedded on the
>> motherboard. It's a DFI CM33T3-100 board (CM33-TL) with an up-to-date BIOS
>> according to DFI, running an Intel Celeron. I can't vouch for the Kingston
>> memory modules, but 22.0.2 previously worked fine on this hardware for a
>> long time (months, even a year of uptime under heavy load)...
>>
>> We'll keep in touch,
>>
>> Jose.
>>
>>
>>
> 
> Hi again,
> 
> After almost 24 hours of running memtest86+ on the affected host, I think it
> has found a memory corruption issue, as you suspected. A photo of the result
> is at http://img206.imageshack.us/my.php?image=dscn2284xj4.jpg
> 
> I'm going to try to fix it with a new PC133 memory module. At the same time,
> I may fit an old video card so the onboard graphics no longer shares system
> memory, to simplify things.
> 
> Then I'll check as soon as possible whether the EAX register still holds the
> value noted above after a panic, if I can reproduce it.
> 
> Sorry for the inconvenience. What is strange is that there was no sign of
> memory corruption while 22.0.2 was in use for months; I was really surprised
> this morning!
> 
> Jose. 
Memory can fail over time. Also look for any swollen or leaking tops on the
motherboard capacitors. If this is an older board, which I assume from the
PLE133 chipset, there were a lot of issues with bad capacitors in the 2000 to
2003 timeframe. This can be a symptom of drying electrolyte in the filter caps.

-- 

MailScanner is like deodorant...
You hope everybody uses it, and
you notice quickly if they don't!!!!