[CentOS] Odd hang on a 5.6 system

Tue Mar 27 22:08:16 UTC 2012
Dennis Jacobfeuerborn <dennisml at conversis.de>

On 03/27/2012 11:59 PM, Ken Smith wrote:
> Hi All, I've been trying to trace the cause of a hang on a 5.6 i386 system.
> 
> After running for almost a year, it hung last week, when I plugged in a 
> screen it was blank, machine was unresponsive to the keyboard, over the 
> network ssh and other daemons didn't respond but the thing has two 
> network cards and routing from one to the other was still working. So 
> the kernel was up and I suspected a dying disk. But smartctl -a revealed 
> nothing untoward. Didn't see anything significant in the log files at 
> the time. Logging had stopped when the machine hung.
> 
> The machine rebooted normally and has run for almost a week and hung 
> again with the same symptoms. Again rebooted, nothing untoward in the 
> logs and smartctl still OK. But shortly after I left site this was 
> logged in /var/log/messages
> 
> 
> Mar 27 16:52:04 cjcsrv kernel: INFO: task hald-addon-stor:2179 blocked 
> for more than 120 seconds.
> Mar 27 16:52:04 cjcsrv kernel: "echo 0 > 
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 27 16:52:04 cjcsrv kernel: hald-addon-st D 00000197  2552  2179   
> 2160                2170 (NOTLB)
> Mar 27 16:52:04 cjcsrv kernel:        d895bbbc 00000086 a3374d80 
> 00000197 e89436f0 d895bbbc c084e894 0000000a
> Mar 27 16:52:04 cjcsrv kernel:        d891aaa0 a3468fc0 00000197 
> 000f4240 00000000 d891abac c1506800 e43383c0
> Mar 27 16:52:04 cjcsrv kernel:        00000000 00000086 00000000 
> e7ad2b80 c061f7ca 00000000 c1506844 d895bc0c
> Mar 27 16:52:04 cjcsrv kernel: Call Trace:
> Mar 27 16:52:04 cjcsrv kernel:  [<e89436f0>] 
> cdrom_do_pc_continuation+0x0/0x2c [ide_cd]
> Mar 27 16:52:04 cjcsrv kernel:  [<c061f7ca>] schedule+0x9c6/0xa4f
> Mar 27 16:52:04 cjcsrv kernel:  [<c061f905>] wait_for_completion+0x6b/0x8f
> Mar 27 16:52:04 cjcsrv kernel:  [<c041f80f>] default_wake_function+0x0/0xc
> Mar 27 16:52:04 cjcsrv kernel:  [<c0575b1b>] ide_do_drive_cmd+0xd7/0xfa
> Mar 27 16:52:04 cjcsrv kernel:  [<e894071c>] 
> cdrom_queue_packet_command+0x35/0xbc [ide_cd]
> Mar 27 16:52:05 cjcsrv kernel:  [<c0488406>] poll_freewait+0x18/0x4c
> Mar 27 16:52:05 cjcsrv kernel:  [<c048874e>] do_sys_poll+0x314/0x339
> Mar 27 16:52:05 cjcsrv kernel:  [<e8940c16>] 
> cdrom_check_status+0x52/0x5d [ide_cd]
> Mar 27 16:52:05 cjcsrv kernel:  [<c04e29ee>] blk_end_sync_rq+0x0/0x1d
> Mar 27 16:52:05 cjcsrv kernel:  [<e8940c3b>] 
> ide_cdrom_check_media_change_real+0x1a/0x34 [ide_cd]
> Mar 27 16:52:05 cjcsrv kernel:  [<e88da06e>] media_changed+0x40/0x6e [cdrom]
> Mar 27 16:52:05 cjcsrv kernel:  [<c047de20>] check_disk_change+0x13/0x3b
> Mar 27 16:52:05 cjcsrv kernel:  [<e88ddfe4>] cdrom_open+0x833/0x876 [cdrom]
> Mar 27 16:52:05 cjcsrv kernel:  [<c04c95c3>] avc_has_perm+0x3c/0x46
> Mar 27 16:52:05 cjcsrv kernel:  [<c04c95c3>] avc_has_perm+0x3c/0x46
> Mar 27 16:52:05 cjcsrv kernel:  [<c048c42f>] __d_lookup+0x98/0xdb
> Mar 27 16:52:05 cjcsrv kernel:  [<c04c95c3>] avc_has_perm+0x3c/0x46
> Mar 27 16:52:05 cjcsrv kernel:  [<c04c9c29>] inode_has_perm+0x54/0x5c
> Mar 27 16:52:05 cjcsrv kernel:  [<c04eef8a>] kobject_get+0xf/0x13
> Mar 27 16:52:05 cjcsrv kernel:  [<c04e5e51>] get_disk+0x35/0x6e
> Mar 27 16:52:05 cjcsrv kernel:  [<c04e5e91>] exact_lock+0x7/0xd
> Mar 27 16:52:05 cjcsrv kernel:  [<c056291d>] kobj_lookup+0x10d/0x168
> Mar 27 16:52:05 cjcsrv kernel:  [<e8941042>] idecd_open+0x7b/0xa8 [ide_cd]
> Mar 27 16:52:05 cjcsrv kernel:  [<c047e448>] do_open+0x89/0x2cc
> Mar 27 16:52:05 cjcsrv kernel:  [<c047e7f7>] blkdev_open+0x0/0x44
> Mar 27 16:52:05 cjcsrv kernel:  [<c047e813>] blkdev_open+0x1c/0x44
> Mar 27 16:52:05 cjcsrv kernel:  [<c0475937>] __dentry_open+0xc7/0x1ab
> Mar 27 16:52:05 cjcsrv kernel:  [<c0475a7f>] nameidata_to_filp+0x19/0x28
> Mar 27 16:52:05 cjcsrv kernel:  [<c0475ab9>] do_filp_open+0x2b/0x31
> Mar 27 16:52:05 cjcsrv kernel:  [<c0475afd>] do_sys_open+0x3e/0xae
> Mar 27 16:52:05 cjcsrv kernel:  [<c0475b9a>] sys_open+0x16/0x18
> Mar 27 16:52:05 cjcsrv kernel:  [<c0404f4b>] syscall_call+0x7/0xb
> Mar 27 16:52:05 cjcsrv kernel:  =======================
> Mar 27 16:52:19 cjcsrv kernel: ide1: reset timed-out, status=0xd0
> 
> 
> ide1 has a CD attached. Not essential, the CD could be unplugged. ide0 
> has the hard disk, hda, attached. Looking back through the logs there is 
> another of these recorded before the previous hang. Any clues as to what 
> this is telling me - other than something crashed.

The process "hald-addon-storage" got stuck while trying to access the CD
drive through the "ide_cd" driver. The trace shows it opening the device
and checking for a media change, so it was most likely polling the drive to
see whether a disc had been inserted. Unplugging the drive should do the
trick. Alternatively, you could try to disable the polling by creating a
file "/etc/hal/fdi/policy/99-custom.fdi" with the following content:

<?xml version="1.0" encoding="UTF-8"?>
<deviceinfo version="0.2">
  <device>
    <match key="storage.removable" bool="true">
      <remove key="info.addons" type="strlist">hald-addon-storage</remove>
    </match>
  </device>
</deviceinfo>
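
If you want to sanity-check the file before restarting hald, something
along these lines might help. It is only a sketch: the function name
removes_storage_addon and the SAMPLE string are made up for illustration,
and it only checks that the XML parses and that the remove entry is
present, not that hald will actually honour it.

```python
import xml.etree.ElementTree as ET


def removes_storage_addon(fdi_text):
    """Return True if fdi_text is well-formed XML containing a
    <remove key="info.addons"> entry that names hald-addon-storage.

    Raises xml.etree.ElementTree.ParseError if the text is malformed.
    """
    root = ET.fromstring(fdi_text)
    for remove in root.iter("remove"):
        names_addon = (remove.text or "").strip() == "hald-addon-storage"
        if remove.get("key") == "info.addons" and names_addon:
            return True
    return False


# Same content as the policy file shown above.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<deviceinfo version="0.2">
<device>
<match key="storage.removable" bool="true">
<remove key="info.addons" type="strlist">hald-addon-storage</remove>
</match>
</device>
</deviceinfo>
"""

# To check the real file instead of the sample (path as given above):
#   with open("/etc/hal/fdi/policy/99-custom.fdi") as f:
#       print(removes_storage_addon(f.read()))
```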

After doing so, restart hald or reboot. hald should then no longer poll
the drive.
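
To see afterwards whether the "blocked for more than 120 seconds" reports
have really stopped, you could count them per task in /var/log/messages.
The following is a rough sketch, assuming each report sits on a single
line in the real log file (the wrapping in your mail is just from the mail
client). The names HUNG_TASK and count_hung_tasks are my own. Note that
the kernel reports the task name truncated, so hald-addon-storage shows up
as "hald-addon-stor".

```python
import re
from collections import Counter

# Matches the kernel's hung-task report, e.g.
# "INFO: task hald-addon-stor:2179 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (.+?):(\d+) blocked for more than (\d+) seconds"
)


def count_hung_tasks(lines):
    """Count hung-task reports per task name in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Usage against the real log (needs read permission on the file):
#   with open("/var/log/messages", errors="replace") as f:
#       for task, n in count_hung_tasks(f).items():
#           print(task, n)
```

If hald-addon-stor no longer appears for dates after the change, the
polling was most likely the trigger.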

Regards,
  Dennis