[CentOS] Re: BUG in fs/bio.c:99

Mon Oct 23 13:13:26 UTC 2006
J.J. Garcia <stigmatedbrain at gmail.com>

El lun, 23-10-2006 a las 16:28 +0400, Kirill Korotaev escribió:
> J.J.Garcia,
> 
> in july 2006 you reported hit of bio.c:99 bug on CentOs kernel:
> http://lists.centos.org/pipermail/centos/2006-July/067539.html
> 
> Recently, we also hit the same bug, so maybe you can provide some details on:
> 1. whether it is doesn't work for you or was fixed.
> 2. how it was triggered
> 3. whether you know the fast way to reproduce
> 
> Any help would be appreciated.
> 
> Thanks,
> Kirill

Hi Kirill,

Since then no more news from this subject, sorry about it. You can
follow the full story of my testing at CentOS Bug Tracker with 1419 id,
http://bugs.centos.org/view.php?id=1419

Actually i'm running the stock kernel 42.0.3 in affected host with a lil
more stability appreciated than previous kernels released but not really
stable in certain situations (heavy i/o load on external usb 2.0 disk). 

[root at fattybox ~]# cat /etc/grub.conf | grep vmlinuz-`uname -r`
        kernel /vmlinuz-2.6.9-42.0.3.EL ro root=/dev/VolGroup00/LogVol00
ACPI=off vga=0x307 selinux=0


[root at fattybox ~]# dmesg | grep ACPI
ACPI: Unable to locate RSDP
Kernel command line: ro root=/dev/VolGroup00/LogVol00 ACPI=off vga=0x307
selinux=0
ACPI: Subsystem revision 20040816
ACPI: Interpreter disabled.


What's sure, the system was running rock solid with 4.3 release and
22.0.2 kernel. From that point, any update to kernel or pkgs to 4.4
(34.x.x to 42.x.x) i get the same issue.

The call trace (taken from 1419 id. at bug tracker and re-pasted here)
is almost allways the same, and it happends normally when lot of i/o
work is performed on an USB external disk, locally or via NFS/exported.

Call Trace:
 [<c016deb4>] bio_put+0x27/0x28
 [<c016d698>] end_bio_bh_io_sync+0x33/0x37
 [<c016ea55>] bio_endio+0x4f/0x54
 [<c0251ce5>] __end_that_request_first+0xea/0x1ab
 [<d841cdd1>] scsi_end_request+0x1b/0x174 [scsi_mod]
 [<d841d268>] scsi_io_completion+0x20b/0x417 [scsi_mod]
 [<d841800c>] scsi_finish_command+0xad/0xb1 [scsi_mod]
 [<d8417f31>] scsi_softirq+0xba/0xc2 [scsi_mod]
 [<c0126a9d>] __do_softirq+0x35/0x79
 [<c010934c>] do_softirq+0x46/0x4d


Don't know for you, but im using on that host an PCI/USB 2.0 card, im
not having issues/tested yet same situation for hosts with native USB
2.0. 

[root at fattybox ~]# lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT8601 [Apollo ProMedia]
(rev 05)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8601 [Apollo ProMedia AGP]
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South]
(rev 40)
00:07.1 IDE interface: VIA Technologies, Inc.
VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev
40)
00:07.5 Multimedia audio controller: VIA Technologies, Inc. VT82C686
AC97 Audio Controller (rev 50)
00:09.0 USB Controller: NEC Corporation USB (rev 43)
00:09.1 USB Controller: NEC Corporation USB (rev 43)
00:09.2 USB Controller: NEC Corporation USB 2.0 (rev 04)
00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL-8139/8139C/8139C+ (rev 10)
01:00.0 VGA compatible controller: Trident Microsystems CyberBlade/i1
(rev 6a)


For stock kernel 42.0.3 i had 2 panics, one of them related to fs/bio.c:

[root at spoolbox crash]# grep -R 2.6.9-42.0.3.EL 192.168.0.6-2006*
192.168.0.6-2006-10-10-05:39/log:EFLAGS: 00010082   (2.6.9-42.0.3.EL)
192.168.0.6-2006-10-12-10:21/log:EFLAGS: 00010206   (2.6.9-42.0.3.EL)

[root at spoolbox crash]# tail -n 38 192.168.0.6-2006-10-12-10:21/log
------------[ cut here ]------------
kernel BUG at fs/bio.c:99!
invalid operand: 0000 [#1]
Modules linked in: vfat fat imm eeprom i2c_viapro nfsd exportfs nfs_acl
lp netconsole netdump autofs4 via686a i2c_sensor i2c_isa i2c_dev
i2c_core rfcomm l2cap lockd sunrpc ip_nat_ftp ip_conntrack_ftp
ipt_MASQUERADE iptable_nat ipt_REJECT ipt_state ip_conntrack
iptable_filter ip_tables dm_multipath hci_usb bluetooth sd_mod
usb_storage scsi_mod ohci_hcd ehci_hcd parport_pc parport snd_via82xx
snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer
snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore
8139too mii dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod
CPU:    0
EIP:    0060:[<c0171399>]    Not tainted VLI
EFLAGS: 00010206   (2.6.9-42.0.3.EL)
EIP is at bio_destructor+0x19/0x3a
eax: 00000008   ebx: c2356b00   ecx: cddf1f50   edx: c036d880
esi: 00001000   edi: c0170d59   ebp: 00000000   esp: c03fff24
ds: 007b   es: 007b   ss: 0068
Process usb-storage (pid: 2030, threadinfo=c03ff000 task=d5c9e130)
Stack: c2356b00 c01715a8 c0170d8c c2356b00 c0172149 00001000 c2356b00
00000000
       c03fff6c c0258da9 c8c4386c 00000000 00000000 00005000 00002000
c8c4386c
       d56de720 d57c6340 00000001 d840add1 00000001 d5509560 c8c4386c
d57c6340
Call Trace:
 [<c01715a8>] bio_put+0x27/0x28
 [<c0170d8c>] end_bio_bh_io_sync+0x33/0x37
 [<c0172149>] bio_endio+0x4f/0x54
 [<c0258da9>] __end_that_request_first+0xea/0x1ab
 [<d840add1>] scsi_end_request+0x1b/0x174 [scsi_mod]
 [<d840b268>] scsi_io_completion+0x20b/0x417 [scsi_mod]
 [<d840600c>] scsi_finish_command+0xad/0xb1 [scsi_mod]
 [<d8405f31>] scsi_softirq+0xba/0xc2 [scsi_mod]
 [<c0129a8d>] __do_softirq+0x35/0x79
 [<c0109446>] do_softirq+0x46/0x4d
 =======================
 [<c01089f7>] do_IRQ+0x2b3/0x2bf
 [<c031983c>] common_interrupt+0x18/0x20
 [<c0316019>] __down_interruptible+0x13f/0x24a
 [<c0120049>] default_wake_function+0x0/0xc
 [<c0316137>] __down_failed_interruptible+0x7/0xc
 [<d83f4b13>] .text.lock.usb+0xf/0x78 [usb_storage]
 [<c0318d7e>] ret_from_fork+0x6/0x14
 [<d83f3d24>] usb_stor_control_thread+0x0/0x417 [usb_storage]
 [<d83f3d24>] usb_stor_control_thread+0x0/0x417 [usb_storage]
 [<c01041dd>] kernel_thread_helper+0x5/0xb
Code: 4d 1a 00 e9 cc c9 ff ff e8 cd 4d 1a 00 e9 26 ca ff ff 53 89 c3 8b
40 10 c1 e8 1c 89 c2 c1 e2 04 81 c2 00 d8 36 c0 83 f8 05 7e 08 <0f> 0b
63 00 f0 d8 32 c0 8b 43 30 8b 52 0c e8 90 ce fd ff 89 d8




[root at spoolbox crash]# tail -n 31 192.168.0.6-2006-10-10-05:39/log
kernel BUG at kernel/panic.c:75!
invalid operand: 0000 [#1]
Modules linked in: nls_utf8 ppp_deflate zlib_deflate ppp_async crc_ccitt
ppp_generic slhc vfat fat imm eeprom i2c_viapro nfsd exportfs nfs_acl lp
netconsole netdump autofs4 via686a i2c_sensor i2c_isa i2c_dev i2c_core
rfcomm l2cap lockd sunrpc ip_nat_ftp ip_conntrack_ftp ipt_MASQUERADE
iptable_nat ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables
dm_multipath hci_usb bluetooth sd_mod usb_storage scsi_mod ohci_hcd
ehci_hcd parport_pc parport snd_via82xx snd_ac97_codec snd_pcm_oss
snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart
snd_rawmidi snd_seq_device snd soundcore 8139too mii dm_snapshot dm_zero
dm_mirror ext3 jbd dm_mod
CPU:    0
EIP:    0060:[<c0123db2>]    Not tainted VLI
EFLAGS: 00010082   (2.6.9-42.0.3.EL)
EIP is at panic+0x47/0x142
eax: 00000065   ebx: c2356b00   ecx: c032a703   edx: d75fbe40
esi: d75fbe8c   edi: d75fbe94   ebp: 00000000   esp: d75fbe48
ds: 007b   es: 007b   ss: 0068
Process kswapd0 (pid: 41, threadinfo=d75fb000 task=d75fc050)
Stack: c2356b00 c014b87e c0322158 c032c3b3 00000295 c2356b10 00000000
d75fbe8c
       c036e120 c0155ba7 d75fbe94 c2356a20 00000000 c015602f 0000000e
ffffffff
       c2356b00 00000000 00000000 c1290400 c01557c6 c036c9b4 00000286
00000000
Call Trace:
 [<c014b87e>] find_get_pages+0xb6/0xf4
 [<c0155ba7>] pagevec_lookup+0x17/0x1d
 [<c015602f>] invalidate_mapping_pages+0xb2/0xc5
 [<c01557c6>] __pagevec_release+0x15/0x1d
 [<c016eceb>] remove_inode_buffers+0x12/0x112
 [<c01899a8>] prune_icache+0x1a7/0x354
 [<c0189b69>] shrink_icache_memory+0x14/0x2b
 [<c015630b>] shrink_slab+0xf7/0x14c
 [<c015794b>] balance_pgdat+0x1b3/0x2cb
 [<c0157b1c>] kswapd+0xb9/0xbb
 [<c0121853>] autoremove_wake_function+0x0/0x2d
 [<c0318d7e>] ret_from_fork+0x6/0x14
 [<c0121853>] autoremove_wake_function+0x0/0x2d
 [<c0157a63>] kswapd+0x0/0xbb
 [<c01041dd>] kernel_thread_helper+0x5/0xb
Code: 40 c0 e8 3b 5d 0c 00 68 60 7f 40 c0 68 03 a7 32 c0 e8 64 0b 00 00
83 c4 0c 83 3d 1c a8 42 c0 00 75 09 83 3d 18 a8 42 c0 00 74 08 <0f> 0b
4b 00 26 a7 32 c0 31 c0 e8 53 97 ff ff 31 d2 b9 60 7f 40


Finally, to reproduce iirc i simply have to perform two r/w commands
concurrently, i.e an 'dd' command writing to that external usb disk and
at the same time a second r/w operation operation to that disk, i.e. via
nfs or even on console, after hours/secs the condition is triggered.

I know usb external disks are used with scsi emulation from the point of
view of kernel and they are not a really good approach for massive data
storing (rather using scsi units mainly/directly) but i have almost a TB
on that disks... 

By the way i use the workaround of not writing concurrently to that disk
for a long time or for long chunks... heh, no more clues unless i rescue
the old 4.3 and 22.0.2 raw hd/image for that old host serving as NAS.

Don't know if all of this will be helping, anyway...

Good luck!

Jose.