[CentOS] Re: BUG in fs/bio.c:99

Wed Oct 25 11:08:57 UTC 2006
J.J. Garcia <stigmatedbrain at gmail.com>

On Tue, 24-10-2006 at 19:20 +0400, Kirill Korotaev wrote:
> J.J. Garcia,
> 
> thanks a lot for the detailed answer and for taking the time to help!
> 

Morning Kirill,

I finally managed to solve the memory problem by replacing a 128 MB PC133
module. The memory configuration is the same as before (1x256 + 1x128 MB
on that motherboard), so the environment is unchanged. Running memtest
for almost 24 hours showed no memory errors. I booted the 42.0.3 kernel
a few hours ago, and the system is up and running.

[root at fattybox ~]# iostat
Linux 2.6.9-42.0.3.EL (fattybox.stigmatedbrain.net)     25/10/06

avg-cpu:  %user   %nice    %sys %iowait   %idle
           2.02   78.23   18.66    0.20    0.90

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
hda               2.91        33.04        28.87    1700018    1485314
hda1              0.01         0.02         0.00       1040        106
hda2              5.42        33.00        28.86    1697882    1485208
hdd               1.95       101.58         1.23    5226772      63472
hdd1              2.29       101.55         1.23    5225204      63472
dm-0              5.41        32.98        28.86    1697138    1484888
dm-1              0.00         0.01         0.01        360        320
sda               1.40        71.35         0.87    3671370      44896
sda1              1.83        71.34         0.87    3671106      44896
sdb               7.51       613.80        13.20   31583426     679032
sdb1             12.87       613.79        13.20   31583290     679032
sdc               0.00         0.02         0.00        786        168
sdc1              0.01         0.01         0.00        650        168
sdd              19.61      1020.24      1824.04   52497330   93857456
sdd1            244.30      1020.24      1824.04   52497194   93857456
sde               0,00         0,00         0,00          8          0


> 1. do you use md devices in your system?
> 

Not at the moment. There is no RAID configuration on that host, only IDE
disks and USB 2.0 external hard disks:

[root at fattybox ~]# cat /proc/mdstat
Personalities :
unused devices: <none>

[root at fattybox ~]# dmesg | grep SCSI
parport0 (addr 0): SCSI adapter, IMG VP1
SCSI subsystem initialized
scsi0 : SCSI emulation for USB Mass Storage devices
  Type:   Direct-Access                      ANSI SCSI revision: 02
scsi1 : SCSI emulation for USB Mass Storage devices
SCSI device sda: 39070080 512-byte hdwr sectors (20004 MB)
SCSI device sda: 39070080 512-byte hdwr sectors (20004 MB)
  Type:   Direct-Access                      ANSI SCSI revision: 02
SCSI device sdb: 586114704 512-byte hdwr sectors (300091 MB)
SCSI device sdb: 586114704 512-byte hdwr sectors (300091 MB)
scsi2 : SCSI emulation for USB Mass Storage devices
  Type:   Direct-Access                      ANSI SCSI revision: 02
SCSI device sdc: 78140160 512-byte hdwr sectors (40008 MB)
SCSI device sdc: 78140160 512-byte hdwr sectors (40008 MB)
scsi3 : SCSI emulation for USB Mass Storage devices
  Type:   Direct-Access                      ANSI SCSI revision: 02
SCSI device sdd: 490234752 512-byte hdwr sectors (251000 MB)
SCSI device sdd: 490234752 512-byte hdwr sectors (251000 MB)
  Type:   Direct-Access                      ANSI SCSI revision: 02
SCSI device sde: 196608 512-byte hdwr sectors (101 MB)
SCSI device sde: drive cache: write back
SCSI device sde: 196608 512-byte hdwr sectors (101 MB)
SCSI device sde: drive cache: write back


> 2. try applying diff-bio-debug-on-orig-rhel4 patch first:
>  # patch -p1 < diff-bio-debug-on-orig-rhel4
>  It will print more details in case the bug happens again.
>  Please note that it should not panic due to the bug, so you will
>  need to check dmesg to see whether the bug was hit or not.
> 
> 3. As an additional check, you can back out the debug patch and apply the 2nd patch, diff-bio.
>   This is the only change in block I/O since the .22 kernel that I can see which could
>   have some influence. So apply it as:
>  # patch -p1 -R < diff-bio-debug-on-orig-rhel4
>  # patch -p1 < diff-bio
> 
> Check whether the bug is reproducible now or not.

I've launched several I/O operations on the USB disks to see if I can
reproduce the bug. I haven't "succeeded" so far, but let me keep checking
for several days. If I get the panic again, I'll patch the kernel and send
you the results. Hope this helps.

Thanks a lot for the hints,

Jose.

> 
> Thanks,
> Kirill
>