[CentOS] CentOS 5 - problem with kernel/process: task blocked for more than 120 seconds.

Wed Apr 11 09:53:02 UTC 2012
Rafał Radecki <radecki.rafal at gmail.com>

The host is:
Linux server 2.6.18-238.19.1.el5 #1 SMP Fri Jul 15 07:32:29 EDT 2011 i686
i686 i386 GNU/Linux
CentOS release 5.6 (Final)
dovecot 1.2.16

Best regards,
R.

W dniu 11 kwietnia 2012 11:48 użytkownik Rafał Radecki <
radecki.rafal at gmail.com> napisał:

> Hi All.
>
> I have a server which hanged two times because we could not connect to the
> box, it was not responding. In /var/log/messages I saw:
>
> Apr 11 10:13:29 server kernel: INFO: task imap:5855 blocked for more than
> 120 seconds.
> Apr 11 10:13:29 server kernel: "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Apr 11 10:13:29 server kernel: imap          D 000124D0  1716  5855
> 14393          6320  5839 (NOTLB)
> Apr 11 10:13:29 server kernel: d1adeed0 00000082 632f304d 000124d0
> 000124cf 0000000e 00000000 00000009
> Apr 11 10:13:29 server kernel: c750b000 632f379a 000124d0 0000074d
> 00000000 c750b10c c17f6d00 d2637580
> Apr 11 10:13:29 server kernel: f75d3e68 00000000 00000000 d1adeecc
> c041ec40 00000000 d1adeed8 f75d3e50
> Apr 11 10:13:29 server kernel: Call Trace:
> Apr 11 10:13:29 server kernel: [<c041ec40>] __wake_up+0x2a/0x3d
> Apr 11 10:13:29 server kernel: [<c0436d23>] prepare_to_wait+0x24/0x46
> Apr 11 10:13:29 server kernel: [<f885f1e2>] log_wait_commit+0x80/0xc7 [jbd]
> Apr 11 10:13:29 server kernel: [<c0436bd7>]
> autoremove_wake_function+0x0/0x2d
> Apr 11 10:13:29 server kernel: [<f885a661>] journal_stop+0x195/0x1ba [jbd]
> Apr 11 10:13:29 server kernel: [<c0495caa>]
> __writeback_single_inode+0x197/0x2a3
> Apr 11 10:13:29 server kernel: [<c045ee18>] do_writepages+0x2b/0x32
> Apr 11 10:13:29 server kernel: [<c045a96c>]
> __filemap_fdatawrite_range+0x66/0x72
> Apr 11 10:13:29 server kernel: [<c0496346>] sync_inode+0x19/0x24
> Apr 11 10:13:29 server kernel: [<f8983009>] ext3_sync_file+0xb1/0xdc [ext3]
> Apr 11 10:13:29 server kernel: [<c047956c>] do_fsync+0x41/0x83
> Apr 11 10:13:29 server kernel: [<c04795cb>] __do_fsync+0x1d/0x2b
> Apr 11 10:13:29 server kernel: [<c0404f4b>] syscall_call+0x7/0xb
> Apr 11 10:13:29 server kernel: =======================
> Apr 11 10:13:29 server kernel: INFO: task imap:6882 blocked for more than
> 120 seconds.
> Apr 11 10:13:29 server kernel: "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Apr 11 10:13:29 server kernel: imap          D 000124D0  1772  6882
> 14393          6999  6053 (NOTLB)
> Apr 11 10:13:29 server kernel: cf1d0ed0 00000082 632f379a 000124d0
> 000124cc 0000000e 00000000 00000009
> Apr 11 10:13:29 server kernel: f52e9000 632f3f0a 000124d0 00000770
> 00000000 f52e910c c17f6d00 cbe44740
> Apr 11 10:13:29 server kernel: f75d3e68 00000000 00000000 cf1d0ecc
> c041ec40 00000000 cf1d0ed8 f75d3e50
> Apr 11 10:13:29 server kernel: Call Trace:
> Apr 11 10:13:29 server kernel: [<c041ec40>] __wake_up+0x2a/0x3d
> Apr 11 10:13:29 server kernel: [<c0436d23>] prepare_to_wait+0x24/0x46
> Apr 11 10:13:31 server kernel: [<f885f1e2>] log_wait_commit+0x80/0xc7 [jbd]
> Apr 11 10:13:31 server kernel: [<c0436bd7>]
> autoremove_wake_function+0x0/0x2d
> Apr 11 10:13:31 server kernel: [<f885a661>] journal_stop+0x195/0x1ba [jbd]
> Apr 11 10:13:31 server kernel: [<c0495caa>]
> __writeback_single_inode+0x197/0x2a3
> Apr 11 10:13:31 server kernel: [<c045ee18>] do_writepages+0x2b/0x32
> Apr 11 10:13:31 server kernel: [<c045a96c>]
> __filemap_fdatawrite_range+0x66/0x72
> Apr 11 10:13:31 server kernel: [<c0496346>] sync_inode+0x19/0x24
> Apr 11 10:13:31 server kernel: [<f8983009>] ext3_sync_file+0xb1/0xdc [ext3]
> Apr 11 10:13:31 server kernel: [<c047956c>] do_fsync+0x41/0x83
> Apr 11 10:13:31 server kernel: [<c04795cb>] __do_fsync+0x1d/0x2b
> Apr 11 10:13:31 server kernel: [<c0404f4b>] syscall_call+0x7/0xb
> Apr 11 10:13:31 server kernel: =======================
>
> I have found:
> http://www.nico.schottelius.org/blog/reboot-linux-if-task-blocked-for-more-than-n-seconds/, so I think that the processes took all resources of the machine.
>
> What should be done to troubleshoot the situation?
> I have information that probably imap:5855 and imap:6882 caused the
> problem. How the output:
>
> Apr 11 10:13:29 server kernel: imap          D 000124D0  1772  6882
> 14393          6999  6053 (NOTLB)
> Apr 11 10:13:29 server kernel: cf1d0ed0 00000082 632f379a 000124d0
> 000124cc 0000000e 00000000 00000009
> Apr 11 10:13:29 server kernel: f52e9000 632f3f0a 000124d0 00000770
> 00000000 f52e910c c17f6d00 cbe44740
> Apr 11 10:13:29 server kernel: f75d3e68 00000000 00000000 cf1d0ecc
> c041ec40 00000000 cf1d0ed8 f75d3e50
> Apr 11 10:13:29 server kernel: Call Trace:
> Apr 11 10:13:29 server kernel: [<c041ec40>] __wake_up+0x2a/0x3d
> Apr 11 10:13:29 server kernel: [<c0436d23>] prepare_to_wait+0x24/0x46
> Apr 11 10:13:31 server kernel: [<f885f1e2>] log_wait_commit+0x80/0xc7 [jbd]
> Apr 11 10:13:31 server kernel: [<c0436bd7>]
> autoremove_wake_function+0x0/0x2d
> Apr 11 10:13:31 server kernel: [<f885a661>] journal_stop+0x195/0x1ba [jbd]
> Apr 11 10:13:31 server kernel: [<c0495caa>]
> __writeback_single_inode+0x197/0x2a3
> Apr 11 10:13:31 server kernel: [<c045ee18>] do_writepages+0x2b/0x32
> Apr 11 10:13:31 server kernel: [<c045a96c>]
> __filemap_fdatawrite_range+0x66/0x72
> Apr 11 10:13:31 server kernel: [<c0496346>] sync_inode+0x19/0x24
> Apr 11 10:13:31 server kernel: [<f8983009>] ext3_sync_file+0xb1/0xdc [ext3]
> Apr 11 10:13:31 server kernel: [<c047956c>] do_fsync+0x41/0x83
> Apr 11 10:13:31 server kernel: [<c04795cb>] __do_fsync+0x1d/0x2b
> Apr 11 10:13:31 server kernel: [<c0404f4b>] syscall_call+0x7/0xb
> should be read? Any howto?
>
> Best regards,
> Rafal Radecki.
>