Hi,
Since upgrading to "2.6.18-194" I am getting odd messages in the logs, such as:
sraid3 kernel: INFO: task pdflush:259 blocked for more than 120 seconds.
The output from

grep '120 seconds' /var/log/messages | tr : ' ' | awk '{print $10}' | sort | uniq -c

is:

      6 nfsd
      4 pdflush
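To sanity-check what that pipeline is counting, it can be run against a couple of fabricated log lines (the hostnames, PIDs, and timestamps below are made up, not from the real server):

```shell
# Build a throwaway sample in the same syslog format as /var/log/messages.
cat > /tmp/messages.sample <<'EOF'
Jun  7 19:45:21 sraid3 kernel: INFO: task nfsd:3369 blocked for more than 120 seconds.
Jun  7 19:46:02 sraid3 kernel: INFO: task nfsd:3370 blocked for more than 120 seconds.
Jun  7 19:50:11 sraid3 kernel: INFO: task pdflush:259 blocked for more than 120 seconds.
EOF
# After `tr : ' '` splits the timestamp on its colons, field 10 is the task name.
grep '120 seconds' /tmp/messages.sample | tr : ' ' | awk '{print $10}' | sort | uniq -c
```

which reports 2 for nfsd and 1 for pdflush, matching the shape of the counts above.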
This is from an NFS server that has been flaky since the upgrade. I have an identical server running "2.6.18-164.11.1" and no such messages are seen.
Is anyone else seeing this, and/or does anyone know what is going on?
Steve
The messages appear as follows.
============================================================================================
Jun 7 19:45:21 sraid3 kernel: INFO: task nfsd:3369 blocked for more than 120 seconds.
Jun 7 19:45:21 sraid3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 7 19:45:21 sraid3 kernel: nfsd          D ffff8101062a9100     0  3369      1          3370  3368 (L-TLB)
Jun 7 19:45:21 sraid3 kernel:  ffff8101053519e0 0000000000000046 0000000000000020 ffff810105ea0780
Jun 7 19:45:21 sraid3 kernel:  ffff8101180e203a 000000000000000a ffff81013ecb77a0 ffff8101062a9100
Jun 7 19:45:21 sraid3 kernel:  00000cab54499f77 00000000000026c3 ffff81013ecb7988 00000001000002d8
Jun 7 19:45:21 sraid3 kernel: Call Trace:
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff80064b29>] _spin_lock_bh+0x9/0x14
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff800ec2a2>] inode_wait+0x0/0xd
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff800ec2ab>] inode_wait+0x9/0xd
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff80063a16>] __wait_on_bit+0x40/0x6e
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff800ec2a2>] inode_wait+0x0/0xd
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff80063ab0>] out_of_line_wait_on_bit+0x6c/0x78
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff800a0aec>] wake_bit_function+0x0/0x23
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff8003dbbf>] ifind_fast+0x6e/0x83
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff80023290>] iget_locked+0x59/0x149
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff88cf0b0f>] :ext4:ext4_iget+0x16/0x65a
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff800a0abe>] autoremove_wake_function+0x0/0x2e
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff88cf8171>] :ext4:ext4_get_dentry+0x3b/0x87
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff88dcf36b>] :exportfs:find_exported_dentry+0x43/0x480
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff88ddc753>] :nfsd:nfsd_acceptable+0x0/0xd8
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff88de074f>] :nfsd:exp_get_by_name+0x5b/0x71
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff88de0d3e>] :nfsd:exp_find_key+0x89/0x9c
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff8008b4b1>] __wake_up_common+0x3e/0x68
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff8009b1dd>] set_current_groups+0x159/0x164
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff88dcf7f3>] :exportfs:export_decode_fh+0x4b/0x50
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff88ddcac5>] :nfsd:fh_verify+0x29a/0x4bd
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff88dddccf>] :nfsd:nfsd_open+0x20/0x184
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff88dddffb>] :nfsd:nfsd_write+0x89/0xd5
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff88de4b1a>] :nfsd:nfsd3_proc_write+0xea/0x109
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff88dda1db>] :nfsd:nfsd_dispatch+0xd8/0x1d6
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff88d5a651>] :sunrpc:svc_process+0x454/0x71b
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff80064644>] __down_read+0x12/0x92
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff88dda5a1>] :nfsd:nfsd+0x0/0x2cb
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff88dda746>] :nfsd:nfsd+0x1a5/0x2cb
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff88dda5a1>] :nfsd:nfsd+0x0/0x2cb
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff88dda5a1>] :nfsd:nfsd+0x0/0x2cb
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11
============================================================================================
on 6-7-2010 1:12 PM Steve Brooks spake the following:
Maybe boot back to the older kernel and see if the messages stop... That would eliminate any sudden coincidental hardware problems.
On Mon, 7 Jun 2010, Scott Silva wrote:
Maybe boot back to the older kernel and see if the messages stop... That would eliminate any sudden coincidental hardware problems.
Thanks Scott,
Yes, this is a good idea; however, being a production server, it can't be done immediately. Also, with the kernel messages being labelled "INFO", it could possibly be just the new kernel being informative and nothing serious, but it does look alarming. Having done more research, it appears that upstream distros have had problems with this too and have experienced software lockups and hangs.
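As the message itself hints, the 120-second threshold lives in /proc/sys/kernel/hung_task_timeout_secs. Rather than echoing 0 into it (which silences the warning entirely), the threshold can be raised persistently with a sysctl.conf line; the 300 below is an arbitrary example value, not a recommendation:

```
# /etc/sysctl.conf -- warn only after 5 minutes instead of 2
kernel.hung_task_timeout_secs = 300
```

Apply it with "sysctl -p" (or at the next boot). This only changes when the kernel complains; it does nothing about whatever is actually blocking the tasks.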
Steve
Hi,
(2010/06/08 5:12), Steve Brooks wrote:
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff800ec2a2>] inode_wait+0x0/0xd
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff80063ab0>] out_of_line_wait_on_bit+0x6c/0x78
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff800a0aec>] wake_bit_function+0x0/0x23
Jun 7 19:45:21 sraid3 kernel:  [<ffffffff8003dbbf>] ifind_fast+0x6e/0x83
This message was generated at Linux/fs/inode.c:ifind_fast(). The source code is below:
Linux/fs/inode.c:

912 static struct inode *ifind_fast(struct super_block *sb,
913         struct hlist_head *head, unsigned long ino)
914 {
915         struct inode *inode;
916
917         spin_lock(&inode_lock);                  /* *LOCK*   <= this takes      */
918         inode = find_inode_fast(sb, head, ino);  /*          <= more than 120s. */
919         if (inode) {
920                 __iget(inode);
921                 spin_unlock(&inode_lock);        /* *UNLOCK* */
922                 wait_on_inode(inode);
923                 return inode;
924         }
925         spin_unlock(&inode_lock);
926         return NULL;
927 }
I guess your file system has trouble with inode (file number) resources. Possible causes: hard disk trouble (bit errors, RAID trouble); inode trouble (overflow, etc.); memory/CPU trouble (&inode_lock).
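One quick, non-destructive check for the inode-exhaustion theory (output columns vary slightly by platform):

```shell
# Show inode (not block) usage per mounted filesystem.
# An IUse% near 100% on the exported filesystem would point at
# inode exhaustion rather than a lock or hardware problem.
LC_ALL=C df -i
```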
Buying fresh hard disks and rebuilding on them is the convenient way. memtest86 can find DIMM trouble (or CPU, motherboard). Or, if it is an ext4 bug in the 194.3.1 kernel, go back to ext3!
-tsuyoshi
On 2010-06-08 09:54, Tsuyoshi Nagata wrote:
OK, then I will test all of my 32 CentOS 5.5 nodes: CPU, RAM, disks, etc. This came with the CentOS 5.5 kernel; before it there weren't such errors/warnings. Red Hat Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=573106
I.Piasecki
-tsuyoshi
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
Hello,
I'm getting the same thing on one of our servers since upgrading to CentOS 5.5:
INFO: task pdflush:21249 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
pdflush       D 00001EE1  3540 21249     11         21226 (L-TLB)
       f0b59f24 00000046 af2b024f 00001ee1 c04ec500 00000000 c041e314 0000000a
       f76ae550 af2b2566 00001ee1 00002317 00000001 f76ae65c c180dc44 c198e200
       00000000 c180e5e4 f0b59fa8 f76ae550 c180dc44 c061bbc0 f76ae550 ffffffff
Call Trace:
 [<c04ec500>] __next_cpu+0x12/0x21
 [<c041e314>] find_busiest_group+0x177/0x462
 [<c061bbc0>] schedule+0xbc/0xa55
 [<c061d981>] rwsem_down_read_failed+0x128/0x143
 [<c04389ef>] .text.lock.rwsem+0x35/0x3a
 [<c047c014>] sync_supers+0x2f/0xb8
 [<c045df9c>] wb_kupdate+0x36/0x10f
 [<c045e431>] pdflush+0x0/0x1a3
 [<c045e53c>] pdflush+0x10b/0x1a3
 [<c045df66>] wb_kupdate+0x0/0x10f
 [<c0435f43>] kthread+0xc0/0xed
 [<c0435e83>] kthread+0x0/0xed
 [<c0405c53>] kernel_thread_helper+0x7/0x10
From the bugs already filed, it seems to happen to many (or any?) processes, and some people notice hangups and performance drops. Our system seems okay, though, probably because it has low traffic and is mostly idle. I'll still reboot to the previous kernel version tonight.
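For anyone doing the same, the previous kernel can be made the persistent default on CentOS 5 with a one-line change in /boot/grub/grub.conf; the index below assumes the older kernel is the second "title" entry (entries are numbered from 0 — check your own file first):

```
# /boot/grub/grub.conf -- boot the second listed kernel by default
default=1
```

This survives reboots, unlike picking the entry interactively at the GRUB menu.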
dianne
On Jun 8, 2010, at 12:08 PM, Dianne Yumul wrote:
Oh crud, I did a top post! So sorry, won't happen again.
dianne