On 03/27/2012 11:59 PM, Ken Smith wrote: > Hi All, I've been trying to trace the cause of a hang on a 5.6 i386 system. > > After running for almost a year, it hung last week, when I plugged in a > screen it was blank, machine was unresponsive to the keyboard, over the > network ssh and other daemons didn't respond but the thing has two > network cards and routing from one to the other was still working. So > the kernel was up and I suspected a dying disk. But smartctl -a revealed > nothing untoward. Didn't see anything significant in the log files at > the time. Logging had stopped when the machine hung. > > The machine rebooted normally and has run for almost a week and hung > again with the same symptoms. Again rebooted, nothing untoward in the > logs and smartctl still OK. But shortly after I left site this was > logged in /var/log/messages > > > Mar 27 16:52:04 cjcsrv kernel: INFO: task hald-addon-stor:2179 blocked > for more than 120 seconds. > Mar 27 16:52:04 cjcsrv kernel: "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > Mar 27 16:52:04 cjcsrv kernel: hald-addon-st D 00000197 2552 2179 > 2160 2170 (NOTLB) > Mar 27 16:52:04 cjcsrv kernel: d895bbbc 00000086 a3374d80 > 00000197 e89436f0 d895bbbc c084e894 0000000a > Mar 27 16:52:04 cjcsrv kernel: d891aaa0 a3468fc0 00000197 > 000f4240 00000000 d891abac c1506800 e43383c0 > Mar 27 16:52:04 cjcsrv kernel: 00000000 00000086 00000000 > e7ad2b80 c061f7ca 00000000 c1506844 d895bc0c > Mar 27 16:52:04 cjcsrv kernel: Call Trace: > Mar 27 16:52:04 cjcsrv kernel: [<e89436f0>] > cdrom_do_pc_continuation+0x0/0x2c [ide_cd] > Mar 27 16:52:04 cjcsrv kernel: [<c061f7ca>] schedule+0x9c6/0xa4f > Mar 27 16:52:04 cjcsrv kernel: [<c061f905>] wait_for_completion+0x6b/0x8f > Mar 27 16:52:04 cjcsrv kernel: [<c041f80f>] default_wake_function+0x0/0xc > Mar 27 16:52:04 cjcsrv kernel: [<c0575b1b>] ide_do_drive_cmd+0xd7/0xfa > Mar 27 16:52:04 cjcsrv kernel: [<e894071c>] > cdrom_queue_packet_command+0x35/0xbc [ide_cd] > Mar 27 16:52:05 cjcsrv kernel: [<c0488406>] poll_freewait+0x18/0x4c > Mar 27 16:52:05 cjcsrv kernel: [<c048874e>] do_sys_poll+0x314/0x339 > Mar 27 16:52:05 cjcsrv kernel: [<e8940c16>] > cdrom_check_status+0x52/0x5d [ide_cd] > Mar 27 16:52:05 cjcsrv kernel: [<c04e29ee>] blk_end_sync_rq+0x0/0x1d > Mar 27 16:52:05 cjcsrv kernel: [<e8940c3b>] > ide_cdrom_check_media_change_real+0x1a/0x34 [ide_cd] > Mar 27 16:52:05 cjcsrv kernel: [<e88da06e>] media_changed+0x40/0x6e [cdrom] > Mar 27 16:52:05 cjcsrv kernel: [<c047de20>] check_disk_change+0x13/0x3b > Mar 27 16:52:05 cjcsrv kernel: [<e88ddfe4>] cdrom_open+0x833/0x876 [cdrom] > Mar 27 16:52:05 cjcsrv kernel: [<c04c95c3>] avc_has_perm+0x3c/0x46 > Mar 27 16:52:05 cjcsrv kernel: [<c04c95c3>] avc_has_perm+0x3c/0x46 > Mar 27 16:52:05 cjcsrv kernel: [<c048c42f>] __d_lookup+0x98/0xdb > Mar 27 16:52:05 cjcsrv kernel: [<c04c95c3>] avc_has_perm+0x3c/0x46 > Mar 27 16:52:05 cjcsrv kernel: [<c04c9c29>] inode_has_perm+0x54/0x5c > Mar 27 16:52:05 cjcsrv kernel: [<c04eef8a>] kobject_get+0xf/0x13 > Mar 27 16:52:05 cjcsrv kernel: [<c04e5e51>] get_disk+0x35/0x6e > Mar 27 16:52:05 cjcsrv kernel: [<c04e5e91>] exact_lock+0x7/0xd > Mar 27 16:52:05 cjcsrv kernel: [<c056291d>] kobj_lookup+0x10d/0x168 > Mar 27 16:52:05 cjcsrv kernel: [<e8941042>] idecd_open+0x7b/0xa8 [ide_cd] > Mar 27 16:52:05 cjcsrv kernel: [<c047e448>] do_open+0x89/0x2cc > Mar 27 16:52:05 cjcsrv kernel: [<c047e7f7>] blkdev_open+0x0/0x44 > Mar 27 16:52:05 cjcsrv kernel: [<c047e813>] blkdev_open+0x1c/0x44 > Mar 27 16:52:05 cjcsrv kernel: [<c0475937>] __dentry_open+0xc7/0x1ab > Mar 27 16:52:05 cjcsrv kernel: [<c0475a7f>] nameidata_to_filp+0x19/0x28 > Mar 27 16:52:05 cjcsrv kernel: [<c0475ab9>] do_filp_open+0x2b/0x31 > Mar 27 16:52:05 cjcsrv kernel: [<c0475afd>] do_sys_open+0x3e/0xae > Mar 27 16:52:05 cjcsrv kernel: [<c0475b9a>] sys_open+0x16/0x18 > Mar 27 16:52:05 cjcsrv kernel: [<c0404f4b>] syscall_call+0x7/0xb > Mar 27 16:52:05 cjcsrv kernel: ======================= > Mar 27 16:52:19 cjcsrv kernel: ide1: reset timed-out, status=0xd0 > > > ide1 has a CD attached. Not essential, the CD could be unplugged. ide0 > has the hard disk, hda, attached. Looking back through the logs there is > another of these recorded before the previous hang. Any clues as to what > this is telling me - other than something crashed. The process "hald-addon-storage" got stuck while trying to access the cd ("ide-cd"). It probably tried to poll the drive to check if there was a cd inserted. Unplugging the drive should do the trick although you could try to disable the polling by creating a file "/etc/hal/fdi/policy/99-custom.fdi" with the following content: <?xml version="1.0" encoding="UTF-8"?> <deviceinfo version="0.2"> <device> <match key="storage.removable" bool="true"> <remove key="info.addons" type="strlist">hald-addon-storage</remove> </match> </device> </deviceinfo> After doing so restart hald or reboot. hald should no longer poll the drive after this. Regards, Dennis