Hi All.
I have a server which hanged two times because we could not connect to the box, it was not responding. In /var/log/messages I saw:
Apr 11 10:13:29 server kernel: INFO: task imap:5855 blocked for more than 120 seconds. Apr 11 10:13:29 server kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 11 10:13:29 server kernel: imap D 000124D0 1716 5855 14393 6320 5839 (NOTLB) Apr 11 10:13:29 server kernel: d1adeed0 00000082 632f304d 000124d0 000124cf 0000000e 00000000 00000009 Apr 11 10:13:29 server kernel: c750b000 632f379a 000124d0 0000074d 00000000 c750b10c c17f6d00 d2637580 Apr 11 10:13:29 server kernel: f75d3e68 00000000 00000000 d1adeecc c041ec40 00000000 d1adeed8 f75d3e50 Apr 11 10:13:29 server kernel: Call Trace: Apr 11 10:13:29 server kernel: [<c041ec40>] __wake_up+0x2a/0x3d Apr 11 10:13:29 server kernel: [<c0436d23>] prepare_to_wait+0x24/0x46 Apr 11 10:13:29 server kernel: [<f885f1e2>] log_wait_commit+0x80/0xc7 [jbd] Apr 11 10:13:29 server kernel: [<c0436bd7>] autoremove_wake_function+0x0/0x2d Apr 11 10:13:29 server kernel: [<f885a661>] journal_stop+0x195/0x1ba [jbd] Apr 11 10:13:29 server kernel: [<c0495caa>] __writeback_single_inode+0x197/0x2a3 Apr 11 10:13:29 server kernel: [<c045ee18>] do_writepages+0x2b/0x32 Apr 11 10:13:29 server kernel: [<c045a96c>] __filemap_fdatawrite_range+0x66/0x72 Apr 11 10:13:29 server kernel: [<c0496346>] sync_inode+0x19/0x24 Apr 11 10:13:29 server kernel: [<f8983009>] ext3_sync_file+0xb1/0xdc [ext3] Apr 11 10:13:29 server kernel: [<c047956c>] do_fsync+0x41/0x83 Apr 11 10:13:29 server kernel: [<c04795cb>] __do_fsync+0x1d/0x2b Apr 11 10:13:29 server kernel: [<c0404f4b>] syscall_call+0x7/0xb Apr 11 10:13:29 server kernel: ======================= Apr 11 10:13:29 server kernel: INFO: task imap:6882 blocked for more than 120 seconds. Apr 11 10:13:29 server kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 11 10:13:29 server kernel: imap D 000124D0 1772 6882 14393 6999 6053 (NOTLB) Apr 11 10:13:29 server kernel: cf1d0ed0 00000082 632f379a 000124d0 000124cc 0000000e 00000000 00000009 Apr 11 10:13:29 server kernel: f52e9000 632f3f0a 000124d0 00000770 00000000 f52e910c c17f6d00 cbe44740 Apr 11 10:13:29 server kernel: f75d3e68 00000000 00000000 cf1d0ecc c041ec40 00000000 cf1d0ed8 f75d3e50 Apr 11 10:13:29 server kernel: Call Trace: Apr 11 10:13:29 server kernel: [<c041ec40>] __wake_up+0x2a/0x3d Apr 11 10:13:29 server kernel: [<c0436d23>] prepare_to_wait+0x24/0x46 Apr 11 10:13:31 server kernel: [<f885f1e2>] log_wait_commit+0x80/0xc7 [jbd] Apr 11 10:13:31 server kernel: [<c0436bd7>] autoremove_wake_function+0x0/0x2d Apr 11 10:13:31 server kernel: [<f885a661>] journal_stop+0x195/0x1ba [jbd] Apr 11 10:13:31 server kernel: [<c0495caa>] __writeback_single_inode+0x197/0x2a3 Apr 11 10:13:31 server kernel: [<c045ee18>] do_writepages+0x2b/0x32 Apr 11 10:13:31 server kernel: [<c045a96c>] __filemap_fdatawrite_range+0x66/0x72 Apr 11 10:13:31 server kernel: [<c0496346>] sync_inode+0x19/0x24 Apr 11 10:13:31 server kernel: [<f8983009>] ext3_sync_file+0xb1/0xdc [ext3] Apr 11 10:13:31 server kernel: [<c047956c>] do_fsync+0x41/0x83 Apr 11 10:13:31 server kernel: [<c04795cb>] __do_fsync+0x1d/0x2b Apr 11 10:13:31 server kernel: [<c0404f4b>] syscall_call+0x7/0xb Apr 11 10:13:31 server kernel: =======================
I have found: http://www.nico.schottelius.org/blog/reboot-linux-if-task-blocked-for-more-t..., so I think that the processes took all resources of the machine.
What should be done to troubleshoot the situation? I have information that probably imap:5855 and imap:6882 caused the problem. How the output:
Apr 11 10:13:29 server kernel: imap D 000124D0 1772 6882 14393 6999 6053 (NOTLB) Apr 11 10:13:29 server kernel: cf1d0ed0 00000082 632f379a 000124d0 000124cc 0000000e 00000000 00000009 Apr 11 10:13:29 server kernel: f52e9000 632f3f0a 000124d0 00000770 00000000 f52e910c c17f6d00 cbe44740 Apr 11 10:13:29 server kernel: f75d3e68 00000000 00000000 cf1d0ecc c041ec40 00000000 cf1d0ed8 f75d3e50 Apr 11 10:13:29 server kernel: Call Trace: Apr 11 10:13:29 server kernel: [<c041ec40>] __wake_up+0x2a/0x3d Apr 11 10:13:29 server kernel: [<c0436d23>] prepare_to_wait+0x24/0x46 Apr 11 10:13:31 server kernel: [<f885f1e2>] log_wait_commit+0x80/0xc7 [jbd] Apr 11 10:13:31 server kernel: [<c0436bd7>] autoremove_wake_function+0x0/0x2d Apr 11 10:13:31 server kernel: [<f885a661>] journal_stop+0x195/0x1ba [jbd] Apr 11 10:13:31 server kernel: [<c0495caa>] __writeback_single_inode+0x197/0x2a3 Apr 11 10:13:31 server kernel: [<c045ee18>] do_writepages+0x2b/0x32 Apr 11 10:13:31 server kernel: [<c045a96c>] __filemap_fdatawrite_range+0x66/0x72 Apr 11 10:13:31 server kernel: [<c0496346>] sync_inode+0x19/0x24 Apr 11 10:13:31 server kernel: [<f8983009>] ext3_sync_file+0xb1/0xdc [ext3] Apr 11 10:13:31 server kernel: [<c047956c>] do_fsync+0x41/0x83 Apr 11 10:13:31 server kernel: [<c04795cb>] __do_fsync+0x1d/0x2b Apr 11 10:13:31 server kernel: [<c0404f4b>] syscall_call+0x7/0xb should be read? Any howto?
Best regards, Rafal Radecki.