I am seeing this in dmesg on a Dell PowerEdge 860:
INFO: task tail:17872 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
tail          D ffff81021ac87040     0 17872  31199 (NOTLB)
 ffff810124fc7bd8 0000000000000082 ffff81021c8ad200 ffff8101714f4d80
 0000000000000000 0000000000000007 ffff81016471b820 ffff81021ac87040
 0000e805f8d72d5b 0000000000002d53 ffff81016471ba08 0000000388055b77
Call Trace:
 [<ffffffff8006ecd9>] do_gettimeofday+0x40/0x90
 [<ffffffff8005a412>] getnstimeofday+0x10/0x29
 [<ffffffff80028bb2>] sync_page+0x0/0x43
 [<ffffffff800637de>] io_schedule+0x3f/0x67
 [<ffffffff80028bf0>] sync_page+0x3e/0x43
 [<ffffffff80063922>] __wait_on_bit_lock+0x36/0x66
 [<ffffffff8003f980>] __lock_page+0x5e/0x64
 [<ffffffff800a34d5>] wake_bit_function+0x0/0x23
 [<ffffffff8000c425>] do_generic_mapping_read+0x1df/0x359
 [<ffffffff8000d251>] file_read_actor+0x0/0x159
 [<ffffffff8000c6eb>] __generic_file_aio_read+0x14c/0x198
 [<ffffffff80016eb7>] generic_file_aio_read+0x36/0x3b
 [<ffffffff8000cf39>] do_sync_read+0xc7/0x104
 [<ffffffff800a34a7>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80063002>] thread_return+0x62/0xfe
 [<ffffffff8000b721>] vfs_read+0xcb/0x171
 [<ffffffff80011d15>] sys_read+0x45/0x6e
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
What should be done?
jerry
On Mon, 3 Nov 2014 08:33:27 -0500 Jerry Geis geisj@pagestation.com wrote:
I am seeing this in dmesg on Dell Poweredge 860
INFO: task tail:17872 blocked for more than 120 seconds.
It's a generic "something might be wrong" alert, issued because (in this case) a tail process was blocked on I/O for more than 120 seconds. Possible causes include I/O overload, flaky remote filesystems, badly behaving drivers, or I/O hardware that is too slow.
/Peter
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
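For anyone who wants to see which tasks are stuck at a given moment: the "D" in the "tail D ..." line of the trace is the uninterruptible-sleep state, and the same state letter is the third field of /proc/<pid>/stat. Below is a minimal Python sketch, assuming a Linux /proc filesystem. The function names parse_stat_line and blocked_tasks are made up for this illustration and are not part of any tool mentioned in the thread. It only looks at the main thread of each process, not at every thread.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for processes currently in state D."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []  # no /proc here (not Linux, or not mounted)
    found = []
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the line was unreadable
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling blocked_tasks() a few times in a row gives a rough picture: a task that shows up briefly in D state is normal, while one that stays there across many calls is the kind of task the 120-second message is about.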
On Mon, Nov 03, 2014 at 03:03:16PM +0100, Peter Kjellström wrote:
On Mon, 3 Nov 2014 08:33:27 -0500 Jerry Geis geisj@pagestation.com wrote:
I am seeing this in dmesg on Dell Poweredge 860
INFO: task tail:17872 blocked for more than 120 seconds.
It's a generic "something might be wrong" alert, issued because (in this case) a tail process was blocked on I/O for more than 120 seconds. Possible causes include I/O overload, flaky remote filesystems, badly behaving drivers, or I/O hardware that is too slow.
In my recent experience (~ 5 years) with a big fleet of servers (800 right now), this always indicates a hardware problem with the cpu/mobo/something-like-that.
Then again, I haven't used a remote filesystem in the past 5 years! :-)
-- greg