-----Original Message-----
From: Jason Pyeron
Sent: Sunday, February 08, 2015 0:00

-----Original Message-----
From: Jason Pyeron
Sent: Saturday, February 07, 2015 22:54
NOTE: this is happening on CentOS 6 x86_64 (kernel 2.6.32-504.3.3.el6.x86_64), not CentOS 5.
Dell PowerEdge 2970, Seagate SATA drive, non-RAID.
I have this server which has been dying randomly, with no logs.
Here is a console picture.
Thanks to netconsole, I have the panic to post:
Feb 16 06:06:56 BUG: soft lockup - CPU#0 stuck for 67s! [ksmd:88]
Feb 16 06:06:56 Modules linked in: nf_nat mpt3sas mpt2sas raid_class mptctl ipmi_si ipmi_devintf netconsole configfs ebtable_nat ebtables nfs lockd fscache auth_rpcgss nfs_acl sunrpc bridge stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_snapshot dm_bufio dm_zero vhost_net macvtap macvlan tun kvm_amd kvm ipmi_msghandler dcdbas serio_raw bnx2 k10temp amd64_edac_mod edac_core edac_mce_amd sg i2c_piix4 shpchp ext4 jbd2 mbcache sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas ata_generic pata_acpi sata_svw radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dell_rbu]
Feb 16 06:06:56 192.168.13.230
Feb 16 06:06:56 CPU 0
Feb 16 06:06:56 192.168.13.230
Feb 16 06:06:56 Modules linked in: nf_nat mpt3sas mpt2sas raid_class mptctl ipmi_si ipmi_devintf netconsole configfs ebtable_nat ebtables nfs lockd fscache auth_rpcgss nfs_acl sunrpc bridge stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_snapshot dm_bufio dm_zero vhost_net macvtap macvlan tun kvm_amd kvm ipmi_msghandler dcdbas serio_raw bnx2 k10temp amd64_edac_mod edac_core edac_mce_amd sg i2c_piix4 shpchp ext4 jbd2 mbcache sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas ata_generic pata_acpi sata_svw radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dell_rbu]
Feb 16 06:06:56 192.168.13.230
Feb 16 06:06:56 192.168.13.230
Feb 16 06:06:56 Pid: 88, comm: ksmd Not tainted 2.6.32-504.8.1.el6.centos.plus.x86_64 #1 Dell Inc. PowerEdge 2970/0JKN8W
Feb 16 06:06:56 192.168.13.230
Feb 16 06:06:56 RIP: 0010:[<ffffffff812a1411>] [<ffffffff812a1411>] __bitmap_empty+0x41/0x90
Feb 16 06:06:56 RSP: 0018:ffff88021831dcb0 EFLAGS: 00000202
Feb 16 06:06:56 RAX: 0000000000000000 RBX: ffff88021831dcb0 RCX: 0000000000000010
Feb 16 06:06:56 RDX: 0000000000000000 RSI: 0000000000000010 RDI: ffffffff81e2f198
Feb 16 06:06:56 RBP: ffffffff8100bb8e R08: 0000000000000000 R09: 0000000000000000
Feb 16 06:06:56 R10: ffffea0006679c20 R11: 0000000000000000 R12: 0000000000000000
Feb 16 06:06:56 R13: ffff8801c1b8f650 R14: 0000000198152467 R15: ffffffffa03af44a
Feb 16 06:06:56 FS: 00007fc4756b09a0(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
Feb 16 06:06:56 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Feb 16 06:06:56 CR2: 000000c641faeff0 CR3: 0000000001a85000 CR4: 00000000000007f0
Feb 16 06:06:56 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 16 06:06:56 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 16 06:06:56 Process ksmd (pid: 88, threadinfo ffff88021831c000, task ffff880218310040)
Feb 16 06:06:56 Stack:
Feb 16 06:06:56 ffff88021831dd00 ffffffff81052268 00007f30249b8000 ffffffff81e2f180
Feb 16 06:06:56 192.168.13.230
Feb 16 06:06:56 <d> 8000000198152025 ffff880219ade700 00007f30249b8000 ffff880219ade9c8
Feb 16 06:06:56 192.168.13.230
Feb 16 06:06:56 <d> ffffea0006679c20 ffff880219e57ed0 ffff88021831dd30 ffffffff810522e6
Feb 16 06:06:56 192.168.13.230
Feb 16 06:06:56 Call Trace:
Feb 16 06:06:56 [<ffffffff81052268>] ? flush_tlb_others_ipi+0x128/0x130
Feb 16 06:06:56 [<ffffffff810522e6>] ? native_flush_tlb_others+0x76/0x90
Feb 16 06:06:56 [<ffffffff8105240e>] ? flush_tlb_page+0x5e/0xb0
Feb 16 06:06:56 [<ffffffff811721c2>] ? try_to_merge_with_ksm_page+0x532/0x660
Feb 16 06:06:56 [<ffffffff811731a4>] ? ksm_scan_thread+0xeb4/0x1120
Feb 16 06:06:56 [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
Feb 16 06:06:56 [<ffffffff811722f0>] ? ksm_scan_thread+0x0/0x1120
Feb 16 06:06:56 [<ffffffff8109e66e>] ? kthread+0x9e/0xc0
Feb 16 06:06:56 [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Feb 16 06:06:56 [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
Feb 16 06:06:56 [<ffffffff8100c200>] ? child_rip+0x0/0x20
Feb 16 06:06:56 Code: c0 7e 24 48 83 3f 00 48 89 f8 74 13 eb 5c 0f 1f 40 00 48 8b 48 08 48 83 c0 08 48 85 c9 75 4b 83 c2 01 41 39 d0 7f eb 40 f6 c6 3f <b8> 01 00 00 00 75 08 c9 c3 66 0f 1f 44 00 00 89 f0 48 63 d2 c1
Feb 16 06:06:56 192.168.13.230
Feb 16 06:06:56 Call Trace:
Feb 16 06:06:56 [<ffffffff81052268>] ? flush_tlb_others_ipi+0x128/0x130
Feb 16 06:06:56 [<ffffffff810522e6>] ? native_flush_tlb_others+0x76/0x90
Feb 16 06:06:56 [<ffffffff8105240e>] ? flush_tlb_page+0x5e/0xb0
Feb 16 06:06:56 [<ffffffff811721c2>] ? try_to_merge_with_ksm_page+0x532/0x660
Feb 16 06:06:56 [<ffffffff811731a4>] ? ksm_scan_thread+0xeb4/0x1120
Feb 16 06:06:56 [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
Feb 16 06:06:56 [<ffffffff811722f0>] ? ksm_scan_thread+0x0/0x1120
Feb 16 06:06:56 [<ffffffff8109e66e>] ? kthread+0x9e/0xc0
Feb 16 06:06:56 [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Feb 16 06:06:56 [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
Feb 16 06:06:56 [<ffffffff8100c200>] ? child_rip+0x0/0x20
Feb 16 06:07:01 Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
Feb 16 06:07:01 Pid: 1950, comm: qemu-kvm Not tainted 2.6.32-504.8.1.el6.centos.plus.x86_64 #1
Feb 16 06:07:01 Call Trace:
Feb 16 06:07:01 <NMI>
Feb 16 06:07:01 [<ffffffff81530bdc>] ? panic+0xa7/0x16f
Feb 16 06:07:01 [<ffffffff81014959>] ? sched_clock+0x9/0x10
Feb 16 06:07:01 [<ffffffff810ea65d>] ? watchdog_overflow_callback+0xcd/0xd0
Feb 16 06:07:01 [<ffffffff81120e07>] ? __perf_event_overflow+0xa7/0x240
Feb 16 06:07:01 [<ffffffff81119e14>] ? perf_event_update_userpage+0x24/0x110
Feb 16 06:07:01 [<ffffffff81121454>] ? perf_event_overflow+0x14/0x20
Feb 16 06:07:01 [<ffffffff8101e3fb>] ? x86_pmu_handle_irq+0x1eb/0x250
Feb 16 06:07:01 [<ffffffff81535ed9>] ? perf_event_nmi_handler+0x39/0xb0
Feb 16 06:07:01 [<ffffffff81537995>] ? notifier_call_chain+0x55/0x80
Feb 16 06:07:01 [<ffffffff815379fa>] ? atomic_notifier_call_chain+0x1a/0x20
Feb 16 06:07:01 [<ffffffff810a4ede>] ? notify_die+0x2e/0x30
Feb 16 06:07:01 [<ffffffff8153565b>] ? do_nmi+0x1bb/0x340
Feb 16 06:07:01 [<ffffffff81534f20>] ? nmi+0x20/0x30
Feb 16 06:07:01 [<ffffffff8153478e>] ? _spin_lock+0x1e/0x30
Feb 16 06:07:01 <<EOE>>
Feb 16 06:07:01 [<ffffffff8114fdd3>] ? handle_pte_fault+0x833/0xb00
Feb 16 06:07:01 [<ffffffffa03987da>] ? kvm_ioapic_update_eoi+0x8a/0xf0 [kvm]
Feb 16 06:07:01 [<ffffffff811502ca>] ? handle_mm_fault+0x22a/0x300
Feb 16 06:07:01 [<ffffffff8104d0d8>] ? __do_page_fault+0x138/0x480
Feb 16 06:07:01 [<ffffffff8105d7d1>] ? update_curr+0xe1/0x1f0
Feb 16 06:07:01 [<ffffffff81063bf3>] ? perf_event_task_sched_out+0x33/0x70
Feb 16 06:07:01 [<ffffffff8100bc0e>] ? invalidate_interrupt0+0xe/0x20
Feb 16 06:07:01 [<ffffffff81060c0c>] ? finish_task_switch+0x4c/0xf0
Feb 16 06:07:01 [<ffffffff815378de>] ? do_page_fault+0x3e/0xa0
Feb 16 06:07:01 [<ffffffff81534c95>] ? page_fault+0x25/0x30
Feb 16 06:07:01 [<ffffffff8129e862>] ? copy_user_generic_string+0x32/0x40
Feb 16 06:07:01 [<ffffffffa03926ab>] ? kvm_write_guest_cached+0x7b/0xa0 [kvm]
Feb 16 06:07:01 [<ffffffffa03bf61f>] ? kvm_lapic_sync_to_vapic+0xcf/0x220 [kvm]
Feb 16 06:07:01 [<ffffffffa03bdfb8>] ? kvm_apic_has_interrupt+0x48/0xd0 [kvm]
Feb 16 06:07:01 [<ffffffffa03ac24d>] ? kvm_arch_vcpu_ioctl_run+0x93d/0x1010 [kvm]
Feb 16 06:07:01 [<ffffffff810b2b73>] ? futex_wake+0x93/0x150
Feb 16 06:07:01 [<ffffffffa0392b04>] ? kvm_vcpu_ioctl+0x434/0x580 [kvm]
Feb 16 06:07:01 [<ffffffff81063bf3>] ? perf_event_task_sched_out+0x33/0x70
Feb 16 06:07:01 [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
Feb 16 06:07:01 [<ffffffff811a3e92>] ? vfs_ioctl+0x22/0xa0
Feb 16 06:07:01 [<ffffffff811a435a>] ? do_vfs_ioctl+0x3aa/0x580
Feb 16 06:07:01 [<ffffffff811a45b1>] ? sys_ioctl+0x81/0xa0
Feb 16 06:07:01 [<ffffffff810e5afe>] ? __audit_syscall_exit+0x25e/0x290
Feb 16 06:07:01 [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Feb 16 06:07:01 drm_kms_helper: panic occurred, switching back to text console
Feb 16 06:07:01 BUG: scheduling while atomic: qemu-kvm/1950/0x14010000
Feb 16 06:07:01 Modules linked in: nf_nat mpt3sas mpt2sas raid_class mptctl ipmi_si ipmi_devintf netconsole configfs ebtable_nat ebtables nfs lockd fscache auth_rpcgss nfs_acl sunrpc bridge stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_snapshot dm_bufio dm_zero vhost_net macvtap macvlan tun kvm_amd kvm ipmi_msghandler dcdbas serio_raw bnx2 k10temp amd64_edac_mod edac_core edac_mce_amd sg i2c_piix4 shpchp ext4 jbd2 mbcache sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas ata_generic pata_acpi sata_svw radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dell_rbu]
Feb 16 06:07:01 192.168.13.230
Feb 16 06:07:01 Pid: 1950, comm: qemu-kvm Not tainted 2.6.32-504.8.1.el6.centos.plus.x86_64 #1
Feb 16 06:07:01 Call Trace:
Feb 16 06:07:01 <NMI>
Feb 16 06:07:01 [<ffffffff81060bb6>] ? __schedule_bug+0x66/0x70
Feb 16 06:07:01 [<ffffffff8153193c>] ? thread_return+0x6ac/0x7d0
Feb 16 06:07:01 [<ffffffffa002e35d>] ? write_msg+0xfd/0x110 [netconsole]
Feb 16 06:07:01 [<ffffffffa00b2d0e>] ? drm_crtc_helper_set_config+0x1be/0xa60 [drm_kms_helper]
Feb 16 06:07:01 [<ffffffff8106c85a>] ? __cond_resched+0x2a/0x40
Feb 16 06:07:01 [<ffffffff81531d30>] ? _cond_resched+0x30/0x40
Feb 16 06:07:01 [<ffffffff81174e18>] ? __kmalloc+0x138/0x230
Feb 16 06:07:01 [<ffffffff810ba332>] ? __module_text_address+0x12/0x60
Feb 16 06:07:01 [<ffffffffa00b2d0e>] ? drm_crtc_helper_set_config+0x1be/0xa60 [drm_kms_helper]
Feb 16 06:07:01 [<ffffffffa013df27>] ? r100_mm_wreg+0x67/0x90 [radeon]
Feb 16 06:07:01 [<ffffffffa01332d2>] ? radeon_crtc_cursor_set+0x92/0x6e0 [radeon]
Feb 16 06:07:01 [<ffffffffa005e40c>] ? drm_mode_set_config_internal+0x5c/0xe0 [drm]
Feb 16 06:07:01 [<ffffffffa00b0653>] ? drm_fb_helper_restore_fbdev_mode+0xb3/0xe0 [drm_kms_helper]
Feb 16 06:07:01 [<ffffffffa00b0788>] ? drm_fb_helper_panic+0x78/0xa0 [drm_kms_helper]
Feb 16 06:07:01 [<ffffffff81537995>] ? notifier_call_chain+0x55/0x80
Feb 16 06:07:01 [<ffffffff815379fa>] ? atomic_notifier_call_chain+0x1a/0x20
Feb 16 06:07:01 [<ffffffff81530c07>] ? panic+0xd2/0x16f
Feb 16 06:07:01 [<ffffffff81014959>] ? sched_clock+0x9/0x10
Feb 16 06:07:01 [<ffffffff810ea65d>] ? watchdog_overflow_callback+0xcd/0xd0
Feb 16 06:07:01 [<ffffffff81120e07>] ? __perf_event_overflow+0xa7/0x240
Feb 16 06:07:01 [<ffffffff81119e14>] ? perf_event_update_userpage+0x24/0x110
Feb 16 06:07:01 [<ffffffff81121454>] ? perf_event_overflow+0x14/0x20
Feb 16 06:07:01 [<ffffffff8101e3fb>] ? x86_pmu_handle_irq+0x1eb/0x250
Feb 16 06:07:01 [<ffffffff81535ed9>] ? perf_event_nmi_handler+0x39/0xb0
Feb 16 06:07:01 [<ffffffff81537995>] ? notifier_call_chain+0x55/0x80
Feb 16 06:07:01 [<ffffffff815379fa>] ? atomic_notifier_call_chain+0x1a/0x20
Feb 16 06:07:01 [<ffffffff810a4ede>] ? notify_die+0x2e/0x30
Feb 16 06:07:01 [<ffffffff8153565b>] ? do_nmi+0x1bb/0x340
Feb 16 06:07:01 [<ffffffff81534f20>] ? nmi+0x20/0x30
Feb 16 06:07:01 [<ffffffff8153478e>] ? _spin_lock+0x1e/0x30
Feb 16 06:07:01 <<EOE>>
Feb 16 06:07:01 [<ffffffff8114fdd3>] ? handle_pte_fault+0x833/0xb00
Feb 16 06:07:01 [<ffffffffa03987da>] ? kvm_ioapic_update_eoi+0x8a/0xf0 [kvm]
Feb 16 06:07:01 [<ffffffff811502ca>] ? handle_mm_fault+0x22a/0x300
Feb 16 06:07:01 [<ffffffff8104d0d8>] ? __do_page_fault+0x138/0x480
Feb 16 06:07:01 [<ffffffff8105d7d1>] ? update_curr+0xe1/0x1f0
Feb 16 06:07:01 [<ffffffff81063bf3>] ? perf_event_task_sched_out+0x33/0x70
Feb 16 06:07:01 [<ffffffff8100bc0e>] ? invalidate_interrupt0+0xe/0x20
Feb 16 06:07:01 [<ffffffff81060c0c>] ? finish_task_switch+0x4c/0xf0
Feb 16 06:07:01 [<ffffffff815378de>] ? do_page_fault+0x3e/0xa0
Feb 16 06:07:01 [<ffffffff81534c95>] ? page_fault+0x25/0x30
Feb 16 06:07:01 [<ffffffff8129e862>] ? copy_user_generic_string+0x32/0x40
Feb 16 06:07:01 [<ffffffffa03926ab>] ? kvm_write_guest_cached+0x7b/0xa0 [kvm]
Feb 16 06:07:01 [<ffffffffa03bf61f>] ? kvm_lapic_sync_to_vapic+0xcf/0x220 [kvm]
Feb 16 06:07:01 [<ffffffffa03bdfb8>] ? kvm_apic_has_interrupt+0x48/0xd0 [kvm]
Feb 16 06:07:01 [<ffffffffa03ac24d>] ? kvm_arch_vcpu_ioctl_run+0x93d/0x1010 [kvm]
Feb 16 06:07:01 [<ffffffff810b2b73>] ? futex_wake+0x93/0x150
Feb 16 06:07:01 [<ffffffffa0392b04>] ? kvm_vcpu_ioctl+0x434/0x580 [kvm]
Feb 16 06:07:01 [<ffffffff81063bf3>] ? perf_event_task_sched_out+0x33/0x70
Feb 16 06:07:01 [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
Feb 16 06:07:01 [<ffffffff811a3e92>] ? vfs_ioctl+0x22/0xa0
Feb 16 06:07:01 [<ffffffff811a435a>] ? do_vfs_ioctl+0x3aa/0x580
Feb 16 06:07:01 [<ffffffff811a45b1>] ? sys_ioctl+0x81/0xa0
Feb 16 06:07:01 [<ffffffff810e5afe>] ? __audit_syscall_exit+0x25e/0x290
Feb 16 06:07:01 [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Feb 16 06:07:01 Clocksource tsc unstable (delta = -77309385171 ns). Enable clocksource failover by adding clocksource_failover kernel parameter.
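For anyone trying to capture a similar trace, here is a rough sketch of how the netconsole module parameter can be assembled. It only builds the string and prints the modprobe command rather than running it. The layout follows the kernel's netconsole documentation ([src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]). The function name netconsole_param, the port numbers, eth0, and the receiver address 192.168.13.1 are placeholders made up for illustration, and 192.168.13.230 is assumed to be the sending host because it is the address that appears in the capture above.

```shell
#!/bin/sh
# Build the value of the netconsole module parameter.
# Layout (per the kernel netconsole documentation):
#   [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
# Arguments: $1 source port, $2 source IP, $3 network device,
#            $4 target port, $5 target IP, $6 target MAC (optional).
netconsole_param() {
    printf '%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "${6:-}"
}

# Hypothetical values: only 192.168.13.230 comes from the capture;
# the ports, eth0, and 192.168.13.1 are assumptions.
param=$(netconsole_param 6666 192.168.13.230 eth0 514 192.168.13.1)

# Print the command instead of loading the module.
echo "modprobe netconsole netconsole=$param"
```

Leaving the target MAC empty should make netconsole fall back to the broadcast address, which is usually fine on a small LAN but worth replacing with the real MAC of the log receiver or gateway.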
I had a tail -f running over ssh for a week when this happened:
Feb 8 00:10:21 thirteen-230 kernel: mptscsih: ioc0: attempting task abort! (sc=ffff880057a0a080)
Feb 8 00:10:21 thirteen-230 kernel: sd 4:0:0:0: [sda] CDB: Write(10): 2a 00 1a 17 a1 6f 00 00 01 00
Feb 8 00:10:51 thirteen-230 kernel: mptscsih: ioc0: WARNING - Issuing Reset from mptscsih_IssueTaskMgmt!! doorbell=0x24000000
Feb 8 00:10:51 thirteen-230 kernel: mptbase: ioc0: Initiating recovery
Feb 8 00:11:13 thirteen-230 kernel: mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff880057a0a080)
Write failed: Connection reset by peer
After reading https://access.redhat.com/solutions/108273, I am increasing the SCSI logging level (shown below), but I am not confident about this wait-and-see approach.
sysctl -w dev.scsi.logging_level=98367
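As a rough aid for reading that number: dev.scsi.logging_level packs ten 3-bit fields into one integer. The sketch below splits a value into those fields. The function name decode_scsi_logging_level is made up, and the field names and their order are an assumption based on my reading of the *_SHIFT constants in the kernel's drivers/scsi/scsi_logging.h, so treat the labels as a guide only.

```shell
#!/bin/sh
# Split a dev.scsi.logging_level value into ten 3-bit fields.
# ASSUMPTION: field order follows the *_SHIFT constants in
# drivers/scsi/scsi_logging.h (error at bit 0, timeout at bit 3,
# and so on up to ioctl at bit 27).
decode_scsi_logging_level() {
    level=$1
    shift_bits=0
    for name in error timeout scan mlqueue mlcomplete \
                llqueue llcomplete hlqueue hlcomplete ioctl; do
        # Each field is 3 bits wide, so mask with 7 after shifting.
        echo "$name=$(( (level >> shift_bits) & 7 ))"
        shift_bits=$(( shift_bits + 3 ))
    done
}

decode_scsi_logging_level 98367
```

Under that assumed field order, 98367 (0x1803F) comes out as error=7, timeout=7 and llqueue=3, with every other field at 0.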
I am also going to check smartctl output once I get onsite to power cycle the system.
# smartctl -a /dev/sda
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-504.3.3.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda (SATA 3Gb/s, 4K Sectors)
Device Model:     ST1500DM003-9YN16G
Serial Number:    W24153R0
LU WWN Device Id: 5 000c50 05d03cc1d
Firmware Version: CC82
User Capacity:    1,500,301,910,016 bytes [1.50 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Sat Feb 7 23:41:00 2015 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 600) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 194) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       181943016
  3 Spin_Up_Time            0x0003   092   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       17
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       39599363
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       821
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       17
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   067   062   045    Old_age   Always       -       33 (Min/Max 30/33)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       16
193 Load_Cycle_Count        0x0032   098   098   000    Old_age   Always       -       4551
194 Temperature_Celsius     0x0022   033   040   000    Old_age   Always       -       33 (0 21 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       267112606073648
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       2764453802303
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       3442873711291

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
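To avoid eyeballing the attribute table each time, something like the following could flag attributes whose normalized VALUE or WORST has reached THRESH. This is only a sketch: smart_at_risk is a made-up helper name, it assumes the ten-column layout that smartctl -A prints above, and it skips attributes with a THRESH of 0 since those have no failure threshold. The third sample row is invented to show what a hit looks like; the first two are copied from the output above.

```shell
#!/bin/sh
# Print SMART attributes whose normalized VALUE or WORST is at or
# below THRESH. Reads "smartctl -A" style rows on stdin.
# Rows with THRESH 0 are skipped (no failure threshold defined).
smart_at_risk() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 {
        value = $4 + 0; worst = $5 + 0; thresh = $6 + 0
        if (thresh > 0 && (value <= thresh || worst <= thresh))
            print $1, $2, "value=" value, "worst=" worst, "thresh=" thresh
    }'
}

# Two rows from the output above, plus one hypothetical failing row.
smart_at_risk <<'EOF'
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       181943016
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  5 Hypothetical_Failing    0x0033   030   030   036    Pre-fail  Always       -       4000
EOF
```

On a live system the intended use would be smartctl -A /dev/sda | smart_at_risk. For the drive above it should print nothing, which matches the PASSED health assessment but does not by itself rule out a controller, cable, or firmware problem.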
Other posts I have read but cannot act on yet:
- http://unix.stackexchange.com/questions/34173/mptscsih-ioc0-task-abort-success-rv-2002-causes-30-seconds-freezing
- https://bugzilla.kernel.org/show_bug.cgi?id=18652
- https://bugzilla.redhat.com/show_bug.cgi?id=483424
- https://bugzilla.kernel.org/show_bug.cgi?id=42765
- http://sourceforge.net/p/smartmontools/mailman/message/23849184/
- http://kb.softescu.ro/category/hardware/dell/
-Jason
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-                                                               -
- Jason Pyeron                      PD Inc. http://www.pdinc.us -
- Principal Consultant              10 West 24th Street #100    -
- +1 (443) 269-1555 x333            Baltimore, Maryland 21218   -
-                                                               -
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
This message is copyright PD Inc, subject to license 20080407P00.