[CentOS] CentOS 5 - problem with kernel/process: task blocked for more than 120 seconds.

Wed Apr 11 09:48:15 UTC 2012
Rafał Radecki <radecki.rafal at gmail.com>

Hi All.

I have a server which hanged two times because we could not connect to the
box, it was not responding. In /var/log/messages I saw:

Apr 11 10:13:29 server kernel: INFO: task imap:5855 blocked for more than
120 seconds.
Apr 11 10:13:29 server kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 11 10:13:29 server kernel: imap          D 000124D0  1716  5855
14393          6320  5839 (NOTLB)
Apr 11 10:13:29 server kernel: d1adeed0 00000082 632f304d 000124d0 000124cf
0000000e 00000000 00000009
Apr 11 10:13:29 server kernel: c750b000 632f379a 000124d0 0000074d 00000000
c750b10c c17f6d00 d2637580
Apr 11 10:13:29 server kernel: f75d3e68 00000000 00000000 d1adeecc c041ec40
00000000 d1adeed8 f75d3e50
Apr 11 10:13:29 server kernel: Call Trace:
Apr 11 10:13:29 server kernel: [<c041ec40>] __wake_up+0x2a/0x3d
Apr 11 10:13:29 server kernel: [<c0436d23>] prepare_to_wait+0x24/0x46
Apr 11 10:13:29 server kernel: [<f885f1e2>] log_wait_commit+0x80/0xc7 [jbd]
Apr 11 10:13:29 server kernel: [<c0436bd7>]
autoremove_wake_function+0x0/0x2d
Apr 11 10:13:29 server kernel: [<f885a661>] journal_stop+0x195/0x1ba [jbd]
Apr 11 10:13:29 server kernel: [<c0495caa>]
__writeback_single_inode+0x197/0x2a3
Apr 11 10:13:29 server kernel: [<c045ee18>] do_writepages+0x2b/0x32
Apr 11 10:13:29 server kernel: [<c045a96c>]
__filemap_fdatawrite_range+0x66/0x72
Apr 11 10:13:29 server kernel: [<c0496346>] sync_inode+0x19/0x24
Apr 11 10:13:29 server kernel: [<f8983009>] ext3_sync_file+0xb1/0xdc [ext3]
Apr 11 10:13:29 server kernel: [<c047956c>] do_fsync+0x41/0x83
Apr 11 10:13:29 server kernel: [<c04795cb>] __do_fsync+0x1d/0x2b
Apr 11 10:13:29 server kernel: [<c0404f4b>] syscall_call+0x7/0xb
Apr 11 10:13:29 server kernel: =======================
Apr 11 10:13:29 server kernel: INFO: task imap:6882 blocked for more than
120 seconds.
Apr 11 10:13:29 server kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 11 10:13:29 server kernel: imap          D 000124D0  1772  6882
14393          6999  6053 (NOTLB)
Apr 11 10:13:29 server kernel: cf1d0ed0 00000082 632f379a 000124d0 000124cc
0000000e 00000000 00000009
Apr 11 10:13:29 server kernel: f52e9000 632f3f0a 000124d0 00000770 00000000
f52e910c c17f6d00 cbe44740
Apr 11 10:13:29 server kernel: f75d3e68 00000000 00000000 cf1d0ecc c041ec40
00000000 cf1d0ed8 f75d3e50
Apr 11 10:13:29 server kernel: Call Trace:
Apr 11 10:13:29 server kernel: [<c041ec40>] __wake_up+0x2a/0x3d
Apr 11 10:13:29 server kernel: [<c0436d23>] prepare_to_wait+0x24/0x46
Apr 11 10:13:31 server kernel: [<f885f1e2>] log_wait_commit+0x80/0xc7 [jbd]
Apr 11 10:13:31 server kernel: [<c0436bd7>]
autoremove_wake_function+0x0/0x2d
Apr 11 10:13:31 server kernel: [<f885a661>] journal_stop+0x195/0x1ba [jbd]
Apr 11 10:13:31 server kernel: [<c0495caa>]
__writeback_single_inode+0x197/0x2a3
Apr 11 10:13:31 server kernel: [<c045ee18>] do_writepages+0x2b/0x32
Apr 11 10:13:31 server kernel: [<c045a96c>]
__filemap_fdatawrite_range+0x66/0x72
Apr 11 10:13:31 server kernel: [<c0496346>] sync_inode+0x19/0x24
Apr 11 10:13:31 server kernel: [<f8983009>] ext3_sync_file+0xb1/0xdc [ext3]
Apr 11 10:13:31 server kernel: [<c047956c>] do_fsync+0x41/0x83
Apr 11 10:13:31 server kernel: [<c04795cb>] __do_fsync+0x1d/0x2b
Apr 11 10:13:31 server kernel: [<c0404f4b>] syscall_call+0x7/0xb
Apr 11 10:13:31 server kernel: =======================

I have found:
http://www.nico.schottelius.org/blog/reboot-linux-if-task-blocked-for-more-than-n-seconds/,
so I think that the processes took all resources of the machine.

What should be done to troubleshoot the situation?
I have information that probably imap:5855 and imap:6882 caused the
problem. How the output:

Apr 11 10:13:29 server kernel: imap          D 000124D0  1772  6882
14393          6999  6053 (NOTLB)
Apr 11 10:13:29 server kernel: cf1d0ed0 00000082 632f379a 000124d0 000124cc
0000000e 00000000 00000009
Apr 11 10:13:29 server kernel: f52e9000 632f3f0a 000124d0 00000770 00000000
f52e910c c17f6d00 cbe44740
Apr 11 10:13:29 server kernel: f75d3e68 00000000 00000000 cf1d0ecc c041ec40
00000000 cf1d0ed8 f75d3e50
Apr 11 10:13:29 server kernel: Call Trace:
Apr 11 10:13:29 server kernel: [<c041ec40>] __wake_up+0x2a/0x3d
Apr 11 10:13:29 server kernel: [<c0436d23>] prepare_to_wait+0x24/0x46
Apr 11 10:13:31 server kernel: [<f885f1e2>] log_wait_commit+0x80/0xc7 [jbd]
Apr 11 10:13:31 server kernel: [<c0436bd7>]
autoremove_wake_function+0x0/0x2d
Apr 11 10:13:31 server kernel: [<f885a661>] journal_stop+0x195/0x1ba [jbd]
Apr 11 10:13:31 server kernel: [<c0495caa>]
__writeback_single_inode+0x197/0x2a3
Apr 11 10:13:31 server kernel: [<c045ee18>] do_writepages+0x2b/0x32
Apr 11 10:13:31 server kernel: [<c045a96c>]
__filemap_fdatawrite_range+0x66/0x72
Apr 11 10:13:31 server kernel: [<c0496346>] sync_inode+0x19/0x24
Apr 11 10:13:31 server kernel: [<f8983009>] ext3_sync_file+0xb1/0xdc [ext3]
Apr 11 10:13:31 server kernel: [<c047956c>] do_fsync+0x41/0x83
Apr 11 10:13:31 server kernel: [<c04795cb>] __do_fsync+0x1d/0x2b
Apr 11 10:13:31 server kernel: [<c0404f4b>] syscall_call+0x7/0xb
should be read? Any howto?

Best regards,
Rafal Radecki.