[CentOS] hardware issues? driver issues?

m.roth at 5-cent.us
Wed Mar 7 11:17:15 EST 2012


Got a bunch of servers from Penguin with Supermicro H8QG6 motherboards. We put a
3TB drive in for additional workspace for the users, and some of them won't
read at all, while others will go for weeks, then spit out DRDY errors. lshw
shows the controller as an ATI SB7x0/SB8x0/SB9x0 SATA controller.

I did notice that it shows
 *-storage
             description: SATA controller
             product: SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]
             vendor: ATI Technologies Inc
<snip>
             width: 32 bits
             ^^^^^^^^^^^^^^
             clock: 66MHz
             ^^^^^^^^^^^^
             capabilities: storage pm ahci_1.0 bus_master cap_list

From /var/log/dmesg:
pci 0000:00:0d.0: PME# supported from D0 D3hot D3cold
pci 0000:00:0d.0: PME# disabled
pci 0000:00:11.0: reg 10 io port: [0xd000-0xd007]
pci 0000:00:11.0: reg 14 io port: [0xc000-0xc003]
pci 0000:00:11.0: reg 18 io port: [0xb000-0xb007]
pci 0000:00:11.0: reg 1c io port: [0xa000-0xa003]
pci 0000:00:11.0: reg 20 io port: [0x9000-0x900f]
pci 0000:00:11.0: reg 24 32bit mmio: [0xdfefa400-0xdfefa7ff]
<...>
ahci 0000:00:11.0: version 3.0
  alloc irq_desc for 22 on node 0
  alloc kstat_irqs on node 0
ahci 0000:00:11.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
ahci 0000:00:11.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl SATA mode
ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part ccc
<...>
ata1: SATA max UDMA/133 abar m1024 at 0xdfefa400 port 0xdfefa500 irq 22
ata2: SATA max UDMA/133 abar m1024 at 0xdfefa400 port 0xdfefa580 irq 22

I've included the above because I noticed the 32bit mmio region alongside the
64bit flag, and also the controller's clock speed.
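For what it's worth, the "32bit mmio" line and the "64bit" flag may be describing different things. "reg 24" is BAR5, the AHCI register window (ABAR), and a 32-bit BAR only says where those registers sit in the address space; the "64bit" flag in the ahci banner is, as I understand the driver, the controller advertising 64-bit DMA addressing. A quick arithmetic sketch, using only values copied from the log plus the per-port layout from the AHCI specification (0x100 base offset, 0x80 bytes per port), suggests the window is consistent with the "abar m1024" and port addresses on the ata1/ata2 lines. The function names are illustrative only.

```python
# Check the AHCI register window reported in dmesg.
# Values are copied from the log lines above; the per-port layout
# (0x100 base offset, 0x80 bytes per port) follows the AHCI specification.

BAR5_START = 0xDFEFA400  # "reg 24 32bit mmio: [0xdfefa400-0xdfefa7ff]"
BAR5_END = 0xDFEFA7FF
PORTS_IMPLEMENTED = 0xF  # "0xf impl" from the ahci banner line

PORT_BASE_OFFSET = 0x100  # first per-port register block within ABAR
PORT_STRIDE = 0x80        # size of each per-port register block


def window_size(start: int, end: int) -> int:
    """Size in bytes of an inclusive [start-end] range as the kernel prints it."""
    return end - start + 1


def port_address(abar: int, port: int) -> int:
    """Address of the register block for AHCI port number `port`."""
    return abar + PORT_BASE_OFFSET + port * PORT_STRIDE


def implemented_ports(mask: int) -> list[int]:
    """Port numbers whose bit is set in the ports-implemented mask."""
    return [bit for bit in range(32) if (mask >> bit) & 1]


if __name__ == "__main__":
    print("ABAR size:", window_size(BAR5_START, BAR5_END))  # compare "abar m1024"
    for port in implemented_ports(PORTS_IMPLEMENTED):
        print(f"port {port}: {port_address(BAR5_START, port):#x}")
```

Ports 0 and 1 come out at 0xdfefa500 and 0xdfefa580, matching the ata1 and ata2 lines, so nothing here looks truncated by the 32-bit BAR.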

Now, I've been working on one of them with Penguin. One thing I noticed was
that the controller was set to native IDE. After googling, I saw that the most
recent spec, which included EIDE, should be good to petabytes... but I tried
resetting it to AHCI anyway.
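The capacity arithmetic behind "good to petabytes" is easy to check. This sketch assumes 512-byte logical sectors (the post doesn't give the drive's sector size) and compares a decimal 3 TB drive against the 28-bit and 48-bit ATA LBA limits and the 32-bit sector fields of an MBR partition entry. The names are illustrative.

```python
# Addressable capacity for the address widths relevant to a 3 TB drive.
# ASSUMPTION: 512-byte logical sectors; the drive's sector size isn't stated.

SECTOR_BYTES = 512
DRIVE_BYTES = 3 * 10**12  # a marketed "3 TB" drive (decimal terabytes)

LIMITS = {
    "28-bit LBA": 28,
    "32-bit MBR partition fields": 32,
    "48-bit LBA (ATA/ATAPI-6 onward)": 48,
}


def capacity_bytes(address_bits: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Largest capacity reachable with the given number of sector-address bits."""
    return (2 ** address_bits) * sector_bytes


def sectors_needed(drive_bytes: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Logical sectors needed to cover drive_bytes (ceiling division)."""
    return -(-drive_bytes // sector_bytes)


if __name__ == "__main__":
    print("3 TB drive needs", sectors_needed(DRIVE_BYTES), "sectors")
    for name, bits in LIMITS.items():
        limit = capacity_bytes(bits)
        verdict = "fits" if DRIVE_BYTES <= limit else "too large"
        print(f"{name}: {limit:,} bytes -> {verdict}")
```

With 512-byte sectors, 48-bit LBA tops out at 128 PiB, while 28-bit LBA (128 GiB) and 32-bit MBR fields (2 TiB) both fall short of 3 TB, which is why the drive needs a GPT to use its full capacity.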

The user ran one job fine... then another last night, and it's spitting out the
same errors.

In /var/log/messages, I see "JBD: detected IO errors while flushing file data", along with:
Mar  7 00:53:28 <server> kernel: ata2.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen
Mar  7 00:53:28 <server> kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Mar  7 00:53:28 <server> kernel: ata2.00: cmd 61/08:00:72:4a:a4/00:00:ae:00:00/40 tag 0 ncq 4096 out
Mar  7 00:53:28 <server> kernel:         res 40/00:04:20:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Mar  7 00:53:28 <server> kernel: ata2.00: status: { DRDY }
<...>
Mar  7 00:53:28 <server> kernel: ata2: hard resetting link
Mar  7 00:53:33 <server> kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar  7 00:53:33 <server> kernel: ata2.00: configured for UDMA/133
Mar  7 00:53:33 <server> kernel: ata2.00: device reported invalid CHS sector 0
Mar  7 00:53:33 <server> kernel: ata2: EH complete
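To read the error dump above, a small decoder can help. This is a sketch that assumes the field order libata uses when printing a taskfile (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and the NCQ conventions for FPDMA QUEUED commands (sector count in the feature registers, tag in bits 7:3 of the count register). It also splits SAct into outstanding NCQ tags and decodes SStatus/SControl using the SATA DET (bits 3:0), SPD (7:4) and IPM (11:8) fields. The function names are hypothetical, not part of any existing tool.

```python
import re

# Field order of a libata taskfile dump, e.g. "61/08:00:72:4a:a4/00:00:ae:00:00/40".
# In a "res" line the first field holds the status register.
FIELDS = (
    "command", "feature", "nsect", "lbal", "lbam", "lbah",
    "hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah",
    "device",
)
HEX = r"([0-9a-f]{2})"
TASKFILE = re.compile(
    HEX + "/" + ":".join([HEX] * 5) + "/" + ":".join([HEX] * 5) + "/" + HEX
)


def parse_taskfile(text: str) -> dict[str, int]:
    """Pull the 12 taskfile registers out of a libata cmd/res line."""
    match = TASKFILE.search(text)
    if match is None:
        raise ValueError(f"no taskfile in {text!r}")
    return {name: int(value, 16) for name, value in zip(FIELDS, match.groups())}


def decode_ncq(tf: dict[str, int], sector_bytes: int = 512) -> dict[str, int]:
    """Decode an FPDMA QUEUED (NCQ) command: count is in the feature
    registers, the tag in bits 7:3 of the sector count register."""
    lba = (
        tf["hob_lbah"] << 40 | tf["hob_lbam"] << 32 | tf["hob_lbal"] << 24
        | tf["lbah"] << 16 | tf["lbam"] << 8 | tf["lbal"]
    )
    sectors = tf["hob_feature"] << 8 | tf["feature"]
    return {"lba": lba, "sectors": sectors,
            "bytes": sectors * sector_bytes, "tag": tf["nsect"] >> 3}


def active_tags(sact: int) -> list[int]:
    """NCQ tags marked outstanding in the SAct value."""
    return [tag for tag in range(32) if (sact >> tag) & 1]


def decode_link(value: int) -> dict[str, int]:
    """Split an SStatus or SControl value into its DET, SPD and IPM fields."""
    return {"det": value & 0xF, "spd": (value >> 4) & 0xF, "ipm": (value >> 8) & 0xF}


if __name__ == "__main__":
    cmd = parse_taskfile("cmd 61/08:00:72:4a:a4/00:00:ae:00:00/40 tag 0 ncq 4096 out")
    res = parse_taskfile("res 40/00:04:20:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)")
    print(f"command {cmd['command']:#04x}", decode_ncq(cmd))
    print(f"result status {res['command']:#04x}")  # 0x40 = DRDY bit only
    print("outstanding tags:", active_tags(0x3))
    print("SStatus:", decode_link(0x123), "SControl:", decode_link(0x300))
```

Applied to the lines above, it gives command 0x61 (WRITE FPDMA QUEUED) for 8 sectors (4096 bytes) at LBA 0xaea44a72 (2,930,002,546) with tag 0, a result status of 0x40 (DRDY only), NCQ tags 0 and 1 outstanding, SStatus DET=3/SPD=2/IPM=1 (device present, 3.0 Gbps, active) and SControl IPM=3 (partial and slumber transitions disabled).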

Notice the "device reported invalid CHS sector 0". The drive does have a
GPT rather than an MBR.
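If it helps to confirm the label from the host side, a GPT disk should carry a protective MBR in LBA 0 (a partition entry of type 0xEE plus the 0x55AA boot signature) and a GPT header beginning with "EFI PART" in LBA 1. This sketch assumes 512-byte logical sectors and reads only the first two sectors; the script name and /dev/sdX below are placeholders, and reading a raw device needs root. It only checks the on-disk label; it doesn't explain the CHS message.

```python
import sys

SECTOR = 512                  # ASSUMPTION: 512-byte logical sectors
PARTITION_TABLE_OFFSET = 446  # four 16-byte MBR partition entries start here
TYPE_BYTE = 4                 # partition type byte within each entry
BOOT_SIGNATURE = b"\x55\xaa"  # bytes 510-511 of the MBR
GPT_PROTECTIVE_TYPE = 0xEE
GPT_SIGNATURE = b"EFI PART"   # first 8 bytes of the GPT header in LBA 1


def inspect_label(head: bytes) -> dict[str, bool]:
    """Report what the first two sectors of a disk look like."""
    if len(head) < 2 * SECTOR:
        raise ValueError("need at least two sectors of data")
    mbr = head[:SECTOR]
    types = [mbr[PARTITION_TABLE_OFFSET + 16 * i + TYPE_BYTE] for i in range(4)]
    return {
        "boot_signature": mbr[510:512] == BOOT_SIGNATURE,
        "protective_mbr": GPT_PROTECTIVE_TYPE in types,
        "gpt_header": head[SECTOR:SECTOR + len(GPT_SIGNATURE)] == GPT_SIGNATURE,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python check_label.py /dev/sdX  (hypothetical script name, placeholder device)
    with open(sys.argv[1], "rb") as disk:
        print(inspect_label(disk.read(2 * SECTOR)))
```

All three values should be True for a normally partitioned GPT disk.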

So, has anyone else seen similar problems, or does anyone have suggestions for
something I can try? Penguin's still waiting for a response from Supermicro,
and has escalated....

          mark



