Hi All,
I've been trying to trace the cause of a hang on a 5.6 i386 system.
After running for almost a year, it hung last week. When I plugged in a
screen it was blank, and the machine was unresponsive to the keyboard.
Over the network, ssh and other daemons didn't respond, but the machine
has two network cards and routing from one to the other was still
working. So the kernel was up, and I suspected a dying disk, but
smartctl -a revealed nothing untoward. I didn't see anything significant
in the log files at the time; logging had stopped when the machine hung.
The machine rebooted normally, ran for almost a week, and then hung
again with the same symptoms. I rebooted it again; there was nothing
untoward in the logs and smartctl was still OK. But shortly after I left
the site, this was logged in /var/log/messages:
Mar 27 16:52:04 cjcsrv kernel: INFO: task hald-addon-stor:2179 blocked for more than 120 seconds.
Mar 27 16:52:04 cjcsrv kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 27 16:52:04 cjcsrv kernel: hald-addon-st D 00000197 2552 2179 2160 2170 (NOTLB)
Mar 27 16:52:04 cjcsrv kernel: d895bbbc 00000086 a3374d80 00000197 e89436f0 d895bbbc c084e894 0000000a
Mar 27 16:52:04 cjcsrv kernel: d891aaa0 a3468fc0 00000197 000f4240 00000000 d891abac c1506800 e43383c0
Mar 27 16:52:04 cjcsrv kernel: 00000000 00000086 00000000 e7ad2b80 c061f7ca 00000000 c1506844 d895bc0c
Mar 27 16:52:04 cjcsrv kernel: Call Trace:
Mar 27 16:52:04 cjcsrv kernel: [<e89436f0>] cdrom_do_pc_continuation+0x0/0x2c [ide_cd]
Mar 27 16:52:04 cjcsrv kernel: [<c061f7ca>] schedule+0x9c6/0xa4f
Mar 27 16:52:04 cjcsrv kernel: [<c061f905>] wait_for_completion+0x6b/0x8f
Mar 27 16:52:04 cjcsrv kernel: [<c041f80f>] default_wake_function+0x0/0xc
Mar 27 16:52:04 cjcsrv kernel: [<c0575b1b>] ide_do_drive_cmd+0xd7/0xfa
Mar 27 16:52:04 cjcsrv kernel: [<e894071c>] cdrom_queue_packet_command+0x35/0xbc [ide_cd]
Mar 27 16:52:05 cjcsrv kernel: [<c0488406>] poll_freewait+0x18/0x4c
Mar 27 16:52:05 cjcsrv kernel: [<c048874e>] do_sys_poll+0x314/0x339
Mar 27 16:52:05 cjcsrv kernel: [<e8940c16>] cdrom_check_status+0x52/0x5d [ide_cd]
Mar 27 16:52:05 cjcsrv kernel: [<c04e29ee>] blk_end_sync_rq+0x0/0x1d
Mar 27 16:52:05 cjcsrv kernel: [<e8940c3b>] ide_cdrom_check_media_change_real+0x1a/0x34 [ide_cd]
Mar 27 16:52:05 cjcsrv kernel: [<e88da06e>] media_changed+0x40/0x6e [cdrom]
Mar 27 16:52:05 cjcsrv kernel: [<c047de20>] check_disk_change+0x13/0x3b
Mar 27 16:52:05 cjcsrv kernel: [<e88ddfe4>] cdrom_open+0x833/0x876 [cdrom]
Mar 27 16:52:05 cjcsrv kernel: [<c04c95c3>] avc_has_perm+0x3c/0x46
Mar 27 16:52:05 cjcsrv kernel: [<c04c95c3>] avc_has_perm+0x3c/0x46
Mar 27 16:52:05 cjcsrv kernel: [<c048c42f>] __d_lookup+0x98/0xdb
Mar 27 16:52:05 cjcsrv kernel: [<c04c95c3>] avc_has_perm+0x3c/0x46
Mar 27 16:52:05 cjcsrv kernel: [<c04c9c29>] inode_has_perm+0x54/0x5c
Mar 27 16:52:05 cjcsrv kernel: [<c04eef8a>] kobject_get+0xf/0x13
Mar 27 16:52:05 cjcsrv kernel: [<c04e5e51>] get_disk+0x35/0x6e
Mar 27 16:52:05 cjcsrv kernel: [<c04e5e91>] exact_lock+0x7/0xd
Mar 27 16:52:05 cjcsrv kernel: [<c056291d>] kobj_lookup+0x10d/0x168
Mar 27 16:52:05 cjcsrv kernel: [<e8941042>] idecd_open+0x7b/0xa8 [ide_cd]
Mar 27 16:52:05 cjcsrv kernel: [<c047e448>] do_open+0x89/0x2cc
Mar 27 16:52:05 cjcsrv kernel: [<c047e7f7>] blkdev_open+0x0/0x44
Mar 27 16:52:05 cjcsrv kernel: [<c047e813>] blkdev_open+0x1c/0x44
Mar 27 16:52:05 cjcsrv kernel: [<c0475937>] __dentry_open+0xc7/0x1ab
Mar 27 16:52:05 cjcsrv kernel: [<c0475a7f>] nameidata_to_filp+0x19/0x28
Mar 27 16:52:05 cjcsrv kernel: [<c0475ab9>] do_filp_open+0x2b/0x31
Mar 27 16:52:05 cjcsrv kernel: [<c0475afd>] do_sys_open+0x3e/0xae
Mar 27 16:52:05 cjcsrv kernel: [<c0475b9a>] sys_open+0x16/0x18
Mar 27 16:52:05 cjcsrv kernel: [<c0404f4b>] syscall_call+0x7/0xb
Mar 27 16:52:05 cjcsrv kernel: =======================
Mar 27 16:52:19 cjcsrv kernel: ide1: reset timed-out, status=0xd0
ide1 has a CD drive attached. It is not essential, so the CD drive could
be unplugged. ide0 has the hard disk, hda, attached. Looking back
through the logs, there is another of these traces recorded before the
previous hang. Any clues as to what this is telling me, other than that
something crashed?
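For checking the older logs for earlier occurrences of the same report,
here is a minimal sketch in Python 3. The function names are
hypothetical, and the pattern assumes the syslog layout seen in the
trace above, with each kernel message on a single line in the log file
itself (not wrapped as it is in this mail).

```python
import re

# Pattern for the kernel's hung-task report line, modelled on the
# "INFO: task hald-addon-stor:2179 blocked for more than 120 seconds."
# entry above. Assumes one message per line, standard syslog prefix.
HUNG_TASK = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+kernel:\s+"
    r"INFO: task (?P<task>\S+):(?P<pid>\d+) blocked for more than "
    r"(?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Return (timestamp, task, pid, seconds) for each hung-task line."""
    hits = []
    for line in lines:
        m = HUNG_TASK.match(line)
        if m:
            hits.append(
                (m.group("stamp"), m.group("task"),
                 int(m.group("pid")), int(m.group("secs")))
            )
    return hits


def scan_log(path):
    """Scan one log file (hypothetical helper); tolerate odd bytes."""
    with open(path, errors="replace") as fh:
        return find_hung_tasks(fh)
```

Calling scan_log("/var/log/messages"), and the same on any rotated
copies, would list when each report was logged, which can then be
compared against the times of the hangs.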
Thanks
Ken