[CentOS] Intermittent problem, likely disk IO related - mptscsih: ioc0: attempting task abort!

Mon Feb 16 17:02:46 UTC 2015
Jason Pyeron <jpyeron at pdinc.us>

> -----Original Message-----
> From: Jason Pyeron
> Sent: Sunday, February 08, 2015 0:00
> 
> > -----Original Message-----
> > From: Jason Pyeron
> > Sent: Saturday, February 07, 2015 22:54
> > 
> > NOTE: this is happening on CentOS 6 x86_64, 
> > 2.6.32-504.3.3.el6.x86_64, not CentOS 5
> > 
> > Dell PowerEdge 2970, Seagate SATA drive, non-raid.
> > 
> > I have this server which has been dying randomly, with no logs.
> 
> Here is a console picture.
> 
> http://i.imgur.com/ZYHlB82.jpg

Thanks to netconsole, I have the panic to post:

Feb 16 06:06:56 BUG: soft lockup - CPU#0 stuck for 67s! [ksmd:88]
Feb 16 06:06:56 Modules linked in: nf_nat mpt3sas mpt2sas raid_class mptctl ipmi_si ipmi_devintf netconsole configfs ebtable_nat ebtables nfs lockd fscache auth_rpcgss nfs_acl sunrpc bridge stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_snapshot dm_bufio dm_zero vhost_net macvtap macvlan tun kvm_amd kvm ipmi_msghandler dcdbas serio_raw bnx2 k10temp amd64_edac_mod edac_core edac_mce_amd sg i2c_piix4 shpchp ext4 jbd2 mbcache sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas ata_generic pata_acpi sata_svw radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dell_rbu]
Feb 16 06:06:56 CPU 0
Feb 16 06:06:56 Modules linked in: nf_nat mpt3sas mpt2sas raid_class mptctl ipmi_si ipmi_devintf netconsole configfs ebtable_nat ebtables nfs lockd fscache auth_rpcgss nfs_acl sunrpc bridge stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_snapshot dm_bufio dm_zero vhost_net macvtap macvlan tun kvm_amd kvm ipmi_msghandler dcdbas serio_raw bnx2 k10temp amd64_edac_mod edac_core edac_mce_amd sg i2c_piix4 shpchp ext4 jbd2 mbcache sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas ata_generic pata_acpi sata_svw radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dell_rbu]
Feb 16 06:06:56 Pid: 88, comm: ksmd Not tainted 2.6.32-504.8.1.el6.centos.plus.x86_64 #1 Dell Inc. PowerEdge 2970/0JKN8W
Feb 16 06:06:56 RIP: 0010:[<ffffffff812a1411>]  [<ffffffff812a1411>] __bitmap_empty+0x41/0x90
Feb 16 06:06:56 RSP: 0018:ffff88021831dcb0  EFLAGS: 00000202
Feb 16 06:06:56 RAX: 0000000000000000 RBX: ffff88021831dcb0 RCX: 0000000000000010
Feb 16 06:06:56 RDX: 0000000000000000 RSI: 0000000000000010 RDI: ffffffff81e2f198
Feb 16 06:06:56 RBP: ffffffff8100bb8e R08: 0000000000000000 R09: 0000000000000000
Feb 16 06:06:56 R10: ffffea0006679c20 R11: 0000000000000000 R12: 0000000000000000
Feb 16 06:06:56 R13: ffff8801c1b8f650 R14: 0000000198152467 R15: ffffffffa03af44a
Feb 16 06:06:56 FS:  00007fc4756b09a0(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
Feb 16 06:06:56 CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Feb 16 06:06:56 CR2: 000000c641faeff0 CR3: 0000000001a85000 CR4: 00000000000007f0
Feb 16 06:06:56 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 16 06:06:56 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 16 06:06:56 Process ksmd (pid: 88, threadinfo ffff88021831c000, task ffff880218310040)
Feb 16 06:06:56 Stack:
Feb 16 06:06:56  ffff88021831dd00 ffffffff81052268 00007f30249b8000 ffffffff81e2f180
Feb 16 06:06:56  8000000198152025 ffff880219ade700 00007f30249b8000 ffff880219ade9c8
Feb 16 06:06:56  ffffea0006679c20 ffff880219e57ed0 ffff88021831dd30 ffffffff810522e6
Feb 16 06:06:56 Call Trace:
Feb 16 06:06:56  [<ffffffff81052268>] ? flush_tlb_others_ipi+0x128/0x130
Feb 16 06:06:56  [<ffffffff810522e6>] ? native_flush_tlb_others+0x76/0x90
Feb 16 06:06:56  [<ffffffff8105240e>] ? flush_tlb_page+0x5e/0xb0
Feb 16 06:06:56  [<ffffffff811721c2>] ? try_to_merge_with_ksm_page+0x532/0x660
Feb 16 06:06:56  [<ffffffff811731a4>] ? ksm_scan_thread+0xeb4/0x1120
Feb 16 06:06:56  [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
Feb 16 06:06:56  [<ffffffff811722f0>] ? ksm_scan_thread+0x0/0x1120
Feb 16 06:06:56  [<ffffffff8109e66e>] ? kthread+0x9e/0xc0
Feb 16 06:06:56  [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Feb 16 06:06:56  [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
Feb 16 06:06:56  [<ffffffff8100c200>] ? child_rip+0x0/0x20
Feb 16 06:06:56 Code: c0 7e 24 48 83 3f 00 48 89 f8 74 13 eb 5c 0f 1f 40 00 48 8b 48 08 48 83 c0 08 48 85 c9 75 4b 83 c2 01 41 39 d0 7f eb 40 f6 c6 3f <b8> 01 00 00 00 75 08 c9 c3 66 0f 1f 44 00 00 89 f0 48 63 d2 c1
Feb 16 06:06:56 Call Trace:
Feb 16 06:06:56  [<ffffffff81052268>] ? flush_tlb_others_ipi+0x128/0x130
Feb 16 06:06:56  [<ffffffff810522e6>] ? native_flush_tlb_others+0x76/0x90
Feb 16 06:06:56  [<ffffffff8105240e>] ? flush_tlb_page+0x5e/0xb0
Feb 16 06:06:56  [<ffffffff811721c2>] ? try_to_merge_with_ksm_page+0x532/0x660
Feb 16 06:06:56  [<ffffffff811731a4>] ? ksm_scan_thread+0xeb4/0x1120
Feb 16 06:06:56  [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
Feb 16 06:06:56  [<ffffffff811722f0>] ? ksm_scan_thread+0x0/0x1120
Feb 16 06:06:56  [<ffffffff8109e66e>] ? kthread+0x9e/0xc0
Feb 16 06:06:56  [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Feb 16 06:06:56  [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
Feb 16 06:06:56  [<ffffffff8100c200>] ? child_rip+0x0/0x20
Feb 16 06:07:01 Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
Feb 16 06:07:01 Pid: 1950, comm: qemu-kvm Not tainted 2.6.32-504.8.1.el6.centos.plus.x86_64 #1
Feb 16 06:07:01 Call Trace:
Feb 16 06:07:01  <NMI>
Feb 16 06:07:01  [<ffffffff81530bdc>] ? panic+0xa7/0x16f
Feb 16 06:07:01  [<ffffffff81014959>] ? sched_clock+0x9/0x10
Feb 16 06:07:01  [<ffffffff810ea65d>] ? watchdog_overflow_callback+0xcd/0xd0
Feb 16 06:07:01  [<ffffffff81120e07>] ? __perf_event_overflow+0xa7/0x240
Feb 16 06:07:01  [<ffffffff81119e14>] ? perf_event_update_userpage+0x24/0x110
Feb 16 06:07:01  [<ffffffff81121454>] ? perf_event_overflow+0x14/0x20
Feb 16 06:07:01  [<ffffffff8101e3fb>] ? x86_pmu_handle_irq+0x1eb/0x250
Feb 16 06:07:01  [<ffffffff81535ed9>] ? perf_event_nmi_handler+0x39/0xb0
Feb 16 06:07:01  [<ffffffff81537995>] ? notifier_call_chain+0x55/0x80
Feb 16 06:07:01  [<ffffffff815379fa>] ? atomic_notifier_call_chain+0x1a/0x20
Feb 16 06:07:01  [<ffffffff810a4ede>] ? notify_die+0x2e/0x30
Feb 16 06:07:01  [<ffffffff8153565b>] ? do_nmi+0x1bb/0x340
Feb 16 06:07:01  [<ffffffff81534f20>] ? nmi+0x20/0x30
Feb 16 06:07:01  [<ffffffff8153478e>] ? _spin_lock+0x1e/0x30
Feb 16 06:07:01  <<EOE>>
Feb 16 06:07:01  [<ffffffff8114fdd3>] ? handle_pte_fault+0x833/0xb00
Feb 16 06:07:01  [<ffffffffa03987da>] ? kvm_ioapic_update_eoi+0x8a/0xf0 [kvm]
Feb 16 06:07:01  [<ffffffff811502ca>] ? handle_mm_fault+0x22a/0x300
Feb 16 06:07:01  [<ffffffff8104d0d8>] ? __do_page_fault+0x138/0x480
Feb 16 06:07:01  [<ffffffff8105d7d1>] ? update_curr+0xe1/0x1f0
Feb 16 06:07:01  [<ffffffff81063bf3>] ? perf_event_task_sched_out+0x33/0x70
Feb 16 06:07:01  [<ffffffff8100bc0e>] ? invalidate_interrupt0+0xe/0x20
Feb 16 06:07:01  [<ffffffff81060c0c>] ? finish_task_switch+0x4c/0xf0
Feb 16 06:07:01  [<ffffffff815378de>] ? do_page_fault+0x3e/0xa0
Feb 16 06:07:01  [<ffffffff81534c95>] ? page_fault+0x25/0x30
Feb 16 06:07:01  [<ffffffff8129e862>] ? copy_user_generic_string+0x32/0x40
Feb 16 06:07:01  [<ffffffffa03926ab>] ? kvm_write_guest_cached+0x7b/0xa0 [kvm]
Feb 16 06:07:01  [<ffffffffa03bf61f>] ? kvm_lapic_sync_to_vapic+0xcf/0x220 [kvm]
Feb 16 06:07:01  [<ffffffffa03bdfb8>] ? kvm_apic_has_interrupt+0x48/0xd0 [kvm]
Feb 16 06:07:01  [<ffffffffa03ac24d>] ? kvm_arch_vcpu_ioctl_run+0x93d/0x1010 [kvm]
Feb 16 06:07:01  [<ffffffff810b2b73>] ? futex_wake+0x93/0x150
Feb 16 06:07:01  [<ffffffffa0392b04>] ? kvm_vcpu_ioctl+0x434/0x580 [kvm]
Feb 16 06:07:01  [<ffffffff81063bf3>] ? perf_event_task_sched_out+0x33/0x70
Feb 16 06:07:01  [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
Feb 16 06:07:01  [<ffffffff811a3e92>] ? vfs_ioctl+0x22/0xa0
Feb 16 06:07:01  [<ffffffff811a435a>] ? do_vfs_ioctl+0x3aa/0x580
Feb 16 06:07:01  [<ffffffff811a45b1>] ? sys_ioctl+0x81/0xa0
Feb 16 06:07:01  [<ffffffff810e5afe>] ? __audit_syscall_exit+0x25e/0x290
Feb 16 06:07:01  [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Feb 16 06:07:01 drm_kms_helper: panic occurred, switching back to text console
Feb 16 06:07:01 BUG: scheduling while atomic: qemu-kvm/1950/0x14010000
Feb 16 06:07:01 Modules linked in: nf_nat mpt3sas mpt2sas raid_class mptctl ipmi_si ipmi_devintf netconsole configfs ebtable_nat ebtables nfs lockd fscache auth_rpcgss nfs_acl sunrpc bridge stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_snapshot dm_bufio dm_zero vhost_net macvtap macvlan tun kvm_amd kvm ipmi_msghandler dcdbas serio_raw bnx2 k10temp amd64_edac_mod edac_core edac_mce_amd sg i2c_piix4 shpchp ext4 jbd2 mbcache sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas ata_generic pata_acpi sata_svw radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dell_rbu]
Feb 16 06:07:01 Pid: 1950, comm: qemu-kvm Not tainted 2.6.32-504.8.1.el6.centos.plus.x86_64 #1
Feb 16 06:07:01 Call Trace:
Feb 16 06:07:01  <NMI>
Feb 16 06:07:01  [<ffffffff81060bb6>] ? __schedule_bug+0x66/0x70
Feb 16 06:07:01  [<ffffffff8153193c>] ? thread_return+0x6ac/0x7d0
Feb 16 06:07:01  [<ffffffffa002e35d>] ? write_msg+0xfd/0x110 [netconsole]
Feb 16 06:07:01  [<ffffffffa00b2d0e>] ? drm_crtc_helper_set_config+0x1be/0xa60 [drm_kms_helper]
Feb 16 06:07:01  [<ffffffff8106c85a>] ? __cond_resched+0x2a/0x40
Feb 16 06:07:01  [<ffffffff81531d30>] ? _cond_resched+0x30/0x40
Feb 16 06:07:01  [<ffffffff81174e18>] ? __kmalloc+0x138/0x230
Feb 16 06:07:01  [<ffffffff810ba332>] ? __module_text_address+0x12/0x60
Feb 16 06:07:01  [<ffffffffa00b2d0e>] ? drm_crtc_helper_set_config+0x1be/0xa60 [drm_kms_helper]
Feb 16 06:07:01  [<ffffffffa013df27>] ? r100_mm_wreg+0x67/0x90 [radeon]
Feb 16 06:07:01  [<ffffffffa01332d2>] ? radeon_crtc_cursor_set+0x92/0x6e0 [radeon]
Feb 16 06:07:01  [<ffffffffa005e40c>] ? drm_mode_set_config_internal+0x5c/0xe0 [drm]
Feb 16 06:07:01  [<ffffffffa00b0653>] ? drm_fb_helper_restore_fbdev_mode+0xb3/0xe0 [drm_kms_helper]
Feb 16 06:07:01  [<ffffffffa00b0788>] ? drm_fb_helper_panic+0x78/0xa0 [drm_kms_helper]
Feb 16 06:07:01  [<ffffffff81537995>] ? notifier_call_chain+0x55/0x80
Feb 16 06:07:01  [<ffffffff815379fa>] ? atomic_notifier_call_chain+0x1a/0x20
Feb 16 06:07:01  [<ffffffff81530c07>] ? panic+0xd2/0x16f
Feb 16 06:07:01  [<ffffffff81014959>] ? sched_clock+0x9/0x10
Feb 16 06:07:01  [<ffffffff810ea65d>] ? watchdog_overflow_callback+0xcd/0xd0
Feb 16 06:07:01  [<ffffffff81120e07>] ? __perf_event_overflow+0xa7/0x240
Feb 16 06:07:01  [<ffffffff81119e14>] ? perf_event_update_userpage+0x24/0x110
Feb 16 06:07:01  [<ffffffff81121454>] ? perf_event_overflow+0x14/0x20
Feb 16 06:07:01  [<ffffffff8101e3fb>] ? x86_pmu_handle_irq+0x1eb/0x250
Feb 16 06:07:01  [<ffffffff81535ed9>] ? perf_event_nmi_handler+0x39/0xb0
Feb 16 06:07:01  [<ffffffff81537995>] ? notifier_call_chain+0x55/0x80
Feb 16 06:07:01  [<ffffffff815379fa>] ? atomic_notifier_call_chain+0x1a/0x20
Feb 16 06:07:01  [<ffffffff810a4ede>] ? notify_die+0x2e/0x30
Feb 16 06:07:01  [<ffffffff8153565b>] ? do_nmi+0x1bb/0x340
Feb 16 06:07:01  [<ffffffff81534f20>] ? nmi+0x20/0x30
Feb 16 06:07:01  [<ffffffff8153478e>] ? _spin_lock+0x1e/0x30
Feb 16 06:07:01  <<EOE>>
Feb 16 06:07:01  [<ffffffff8114fdd3>] ? handle_pte_fault+0x833/0xb00
Feb 16 06:07:01  [<ffffffffa03987da>] ? kvm_ioapic_update_eoi+0x8a/0xf0 [kvm]
Feb 16 06:07:01  [<ffffffff811502ca>] ? handle_mm_fault+0x22a/0x300
Feb 16 06:07:01  [<ffffffff8104d0d8>] ? __do_page_fault+0x138/0x480
Feb 16 06:07:01  [<ffffffff8105d7d1>] ? update_curr+0xe1/0x1f0
Feb 16 06:07:01  [<ffffffff81063bf3>] ? perf_event_task_sched_out+0x33/0x70
Feb 16 06:07:01  [<ffffffff8100bc0e>] ? invalidate_interrupt0+0xe/0x20
Feb 16 06:07:01  [<ffffffff81060c0c>] ? finish_task_switch+0x4c/0xf0
Feb 16 06:07:01  [<ffffffff815378de>] ? do_page_fault+0x3e/0xa0
Feb 16 06:07:01  [<ffffffff81534c95>] ? page_fault+0x25/0x30
Feb 16 06:07:01  [<ffffffff8129e862>] ? copy_user_generic_string+0x32/0x40
Feb 16 06:07:01  [<ffffffffa03926ab>] ? kvm_write_guest_cached+0x7b/0xa0 [kvm]
Feb 16 06:07:01  [<ffffffffa03bf61f>] ? kvm_lapic_sync_to_vapic+0xcf/0x220 [kvm]
Feb 16 06:07:01  [<ffffffffa03bdfb8>] ? kvm_apic_has_interrupt+0x48/0xd0 [kvm]
Feb 16 06:07:01  [<ffffffffa03ac24d>] ? kvm_arch_vcpu_ioctl_run+0x93d/0x1010 [kvm]
Feb 16 06:07:01  [<ffffffff810b2b73>] ? futex_wake+0x93/0x150
Feb 16 06:07:01  [<ffffffffa0392b04>] ? kvm_vcpu_ioctl+0x434/0x580 [kvm]
Feb 16 06:07:01  [<ffffffff81063bf3>] ? perf_event_task_sched_out+0x33/0x70
Feb 16 06:07:01  [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
Feb 16 06:07:01  [<ffffffff811a3e92>] ? vfs_ioctl+0x22/0xa0
Feb 16 06:07:01  [<ffffffff811a435a>] ? do_vfs_ioctl+0x3aa/0x580
Feb 16 06:07:01  [<ffffffff811a45b1>] ? sys_ioctl+0x81/0xa0
Feb 16 06:07:01  [<ffffffff810e5afe>] ? __audit_syscall_exit+0x25e/0x290
Feb 16 06:07:01  [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Feb 16 06:07:01 Clocksource tsc unstable (delta = -77309385171 ns).  Enable clocksource failover by adding clocksource_failover kernel parameter.

> 
> > 
> > I had a tail -f over ssh for a week, when this just happened.
> > 
> > Feb  8 00:10:21 thirteen-230 kernel: mptscsih: ioc0: attempting task abort! (sc=ffff880057a0a080)
> > Feb  8 00:10:21 thirteen-230 kernel: sd 4:0:0:0: [sda] CDB: Write(10): 2a 00 1a 17 a1 6f 00 00 01 00
> > Feb  8 00:10:51 thirteen-230 kernel: mptscsih: ioc0: WARNING - Issuing Reset from mptscsih_IssueTaskMgmt!! doorbell=0x24000000
> > Feb  8 00:10:51 thirteen-230 kernel: mptbase: ioc0: Initiating recovery
> > Feb  8 00:11:13 thirteen-230 kernel: mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff880057a0a080)
> > Write failed: Connection reset by peer
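
[Editorial sketch, not part of the original post.] The aborted command in the log above is a SCSI Write(10). Decoding its CDB shows which block the drive was asked to write when it stopped responding. The following is a minimal sketch that assumes the standard 10-byte layout (opcode 0x2a, big-endian 32-bit LBA in bytes 2-5, big-endian 16-bit transfer length in bytes 7-8). The helper name decode_write10 is hypothetical.

```python
def decode_write10(cdb_hex):
    """Decode a SCSI Write(10) CDB given as space-separated hex bytes.

    Returns (lba, blocks). Assumes the standard 10-byte layout:
    byte 0 opcode (0x2a), bytes 2-5 LBA, bytes 7-8 transfer length.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10:
        raise ValueError("Write(10) CDB must be exactly 10 bytes")
    if cdb[0] != 0x2A:
        raise ValueError("not a Write(10) opcode: 0x%02x" % cdb[0])
    lba = int.from_bytes(cdb[2:6], "big")     # logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # number of blocks to write
    return lba, blocks


if __name__ == "__main__":
    # CDB copied from the kernel log above.
    lba, blocks = decode_write10("2a 00 1a 17 a1 6f 00 00 01 00")
    print("LBA %d, %d block(s)" % (lba, blocks))  # LBA 437756271, 1 block(s)
```

So the stalled request was a single-block write. If the same LBA shows up in later aborts, that would point toward a bad region on the disk rather than the controller or cabling, though one sample cannot settle it.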
> > 
> > After reading https://access.redhat.com/solutions/108273, I 
> > am increasing the SCSI logging level (shown below), but I am 
> > not confident in this wait-and-see approach.
> > 
> > sysctl -w dev.scsi.logging_level=98367
> > 
> > I am also going to check smartctl output once I get onsite to 
> > power cycle the system.
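
[Editorial sketch, not part of the original post.] The value passed to dev.scsi.logging_level above is a packed bitfield, not a simple verbosity number. The sketch below unpacks it under the assumption that the kernel uses ten 3-bit fields in the order given by its scsi_logging.h header (ERROR in the lowest bits, then TIMEOUT, SCAN, MLQUEUE, MLCOMPLETE, LLQUEUE, LLCOMPLETE, HLQUEUE, HLCOMPLETE, IOCTL). Check that order against the source of the running kernel. The function name is hypothetical.

```python
# Assumed field order, lowest bits first, 3 bits per field.
SCSI_LOG_FIELDS = [
    "ERROR", "TIMEOUT", "SCAN", "MLQUEUE", "MLCOMPLETE",
    "LLQUEUE", "LLCOMPLETE", "HLQUEUE", "HLCOMPLETE", "IOCTL",
]


def decode_scsi_logging_level(level):
    """Split a dev.scsi.logging_level value into per-facility levels (0-7)."""
    return {
        name: (level >> (3 * index)) & 0x7
        for index, name in enumerate(SCSI_LOG_FIELDS)
    }


if __name__ == "__main__":
    levels = decode_scsi_logging_level(98367)
    # Show only the facilities that are actually enabled.
    print({name: value for name, value in levels.items() if value})
    # {'ERROR': 7, 'TIMEOUT': 7, 'LLQUEUE': 3}
```

Under that assumption, 98367 turns error and timeout logging up to the maximum and enables low-level queue logging at level 3, leaving the other facilities off.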
> 
> # smartctl -a /dev/sda
> smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-504.3.3.el6.x86_64] (local build)
> Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> === START OF INFORMATION SECTION ===
> Model Family:     Seagate Barracuda (SATA 3Gb/s, 4K Sectors)
> Device Model:     ST1500DM003-9YN16G
> Serial Number:    W24153R0
> LU WWN Device Id: 5 000c50 05d03cc1d
> Firmware Version: CC82
> User Capacity:    1,500,301,910,016 bytes [1.50 TB]
> Sector Sizes:     512 bytes logical, 4096 bytes physical
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   8
> ATA Standard is:  ATA-8-ACS revision 4
> Local Time is:    Sat Feb  7 23:41:00 2015 EST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> General SMART Values:
> Offline data collection status:  (0x00) Offline data collection activity
>                                         was never started.
>                                         Auto Offline Data Collection: Disabled.
> Self-test execution status:      (   0) The previous self-test routine completed
>                                         without error or no self-test has ever
>                                         been run.
> Total time to complete Offline
> data collection:                (  600) seconds.
> Offline data collection
> capabilities:                    (0x73) SMART execute Offline immediate.
>                                         Auto Offline data collection on/off support.
>                                         Suspend Offline collection upon new
>                                         command.
>                                         No Offline surface scan supported.
>                                         Self-test supported.
>                                         Conveyance Self-test supported.
>                                         Selective Self-test supported.
> SMART capabilities:            (0x0003) Saves SMART data before entering
>                                         power-saving mode.
>                                         Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                         General Purpose Logging supported.
> Short self-test routine
> recommended polling time:        (   1) minutes.
> Extended self-test routine
> recommended polling time:        ( 194) minutes.
> Conveyance self-test routine
> recommended polling time:        (   2) minutes.
> SCT capabilities:              (0x3085) SCT Status supported.
> 
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       181943016
>   3 Spin_Up_Time            0x0003   092   092   000    Pre-fail  Always       -       0
>   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       17
>   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       39599363
>   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       821
>  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       17
> 183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
> 184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
> 188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
> 189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
> 190 Airflow_Temperature_Cel 0x0022   067   062   045    Old_age   Always       -       33 (Min/Max 30/33)
> 191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
> 192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       16
> 193 Load_Cycle_Count        0x0032   098   098   000    Old_age   Always       -       4551
> 194 Temperature_Celsius     0x0022   033   040   000    Old_age   Always       -       33 (0 21 0 0 0)
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
> 240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       267112606073648
> 241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       2764453802303
> 242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       3442873711291
> 
> SMART Error Log Version: 1
> No Errors Logged
> 
> SMART Self-test log structure revision number 1
> No self-tests have been logged.  [To run self-tests, use: smartctl -t]
> 
> 
> SMART Selective self-test log data structure revision number 1
>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>     1        0        0  Not_testing
>     2        0        0  Not_testing
>     3        0        0  Not_testing
>     4        0        0  Not_testing
>     5        0        0  Not_testing
> Selective self-test flags (0x0):
>   After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
> 
> 
> > 
> > Other posts I have read, but cannot act on yet:
> > 
> > * http://unix.stackexchange.com/questions/34173/mptscsih-ioc0-task-abort-success-rv-2002-causes-30-seconds-freezing
> > * https://bugzilla.kernel.org/show_bug.cgi?id=18652
> > * https://bugzilla.redhat.com/show_bug.cgi?id=483424
> > * https://bugzilla.kernel.org/show_bug.cgi?id=42765
> > * http://sourceforge.net/p/smartmontools/mailman/message/23849184/
> > * http://kb.softescu.ro/category/hardware/dell/
> > 
> > -Jason

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-                                                               -
- Jason Pyeron                      PD Inc. http://www.pdinc.us -
- Principal Consultant              10 West 24th Street #100    -
- +1 (443) 269-1555 x333            Baltimore, Maryland 21218   -
-                                                               -
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
This message is copyright PD Inc, subject to license 20080407P00.