On my C5 machine (a Dell XPS420) I have a 500GB disk on the internal SATA controller.
I also have a SiI3132 dual-port multi-device eSATA card. This is connected to an external SATA array of disks.
Now occasionally I see something like this in my logs:
ata7.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata7.01: irq_stat 0x00060002, device error via D2H FIS
ata7.01: cmd 25/00:08:47:1c:92/00:00:6c:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:4e:1c:92/00:00:6c:00:00/00 Emask 0x9 (media error)
ata7.01: status: { DRDY ERR }
ata7.01: error: { UNC }
ata7.01: configured for UDMA/100
ata7: EH complete
How do I tell what disk this is complaining about? Is there a way to determine what ata7.01 maps to in terms of /dev/sd# values?
/proc/scsi/scsi doesn't obviously match scsi# numbers to ata# numbers :-(
Stephen Harris wrote:
How do I tell what disk this is complaining about? Is there a way to determine what ata7.01 maps to in terms of /dev/sd# values?
/proc/scsi/scsi doesn't obviously match scsi# numbers to ata# numbers :-(
Try looking in /dev/disk/
On Thu, Jan 28, 2010 at 10:01:17AM -0500, Toby Bluhm wrote:
Stephen Harris wrote:
ata7.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata7.01: irq_stat 0x00060002, device error via D2H FIS
ata7.01: cmd 25/00:08:47:1c:92/00:00:6c:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:4e:1c:92/00:00:6c:00:00/00 Emask 0x9 (media error)
How do I tell what disk this is complaining about? Is there a way to determine what ata7.01 maps to in terms of /dev/sd# values?
Try looking in /dev/disk/
Hmm...
by-label and by-uuid clearly aren't useful here since they're based on filesystem data ;-)
by-id doesn't look too helpful; it'd be good for determining model/serial number mapping to disk, but I don't have that info. Potentially useful information in other cases, but not here :-(
by-path, unfortunately, returns the scsi controller data at a hardware address, not the ata#.# number, e.g.
pci-0000:00:1d.7-usb-0:3:1.0-scsi-0:0:0:0
pci-0000:00:1f.2-scsi-0:0:0:0
pci-0000:00:1f.2-scsi-1:0:0:0
pci-0000:00:1f.2-scsi-2:0:0:0
pci-0000:02:00.0-scsi-0:0:0:0
pci-0000:02:00.0-scsi-0:1:0:0
pci-0000:02:00.0-scsi-0:2:0:0
pci-0000:02:00.0-scsi-0:3:0:0
pci-0000:02:00.0-scsi-1:0:0:0
(internal disk, internal DVD writer, internal DVD-ROM, 5 external disks, 1 USB disk)
That's really useful for mapping position in the array to sd number, though!
Thanks for the idea, though!
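The by-path listing can be turned into a position-to-sdX table by resolving each symlink. This is a minimal sketch, assuming the usual udev layout in which every entry under /dev/disk/by-path is a symlink to the kernel device node. The function name list_by_path and its optional directory argument are made up for illustration.

```sh
#!/bin/sh
# list_by_path [DIR]: print "<by-path name> -> <resolved device>" per symlink.
# DIR defaults to /dev/disk/by-path; the argument is only there so the
# function can be tried against a scratch directory (an assumption of this
# sketch, not something from the thread).
list_by_path() {
    dir=${1:-/dev/disk/by-path}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue      # skip anything that is not a symlink
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}
```

Calling list_by_path with no argument should print one line per by-path entry, with the pci-... name on the left and the /dev/sdX node it currently points to on the right.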
On Thursday 28 January 2010, Stephen Harris wrote: ...
Now occasionally I see something like this in my logs
ata7.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
...
How do I tell what disk this is complaining about? Is there a way to determine what ata7.01 maps to in terms of /dev/sd# values?
/proc/scsi/scsi doesn't obviously match scsi# numbers to ata# numbers :-(
This seems quite hard to do. The following hack will match scsi hosts to libata-driver + number:
$ for d in $(ls -d /sys/class/scsi_host/host?); do echo "$d $(cat $d/proc_name) $(cat $d/unique_id)" ; done
/sys/class/scsi_host/host0 ahci 1
/sys/class/scsi_host/host1 ahci 2
/sys/class/scsi_host/host2 ahci 3
/sys/class/scsi_host/host3 ahci 4
/sys/class/scsi_host/host4 ahci 5
/sys/class/scsi_host/host5 ahci 6
This does not get you all the way, but unless you have several different libata drivers, ahci 1-6 above will match ata1-ata6 (read "dmesg | less"...).
Once you know which scsi host you're looking for, the /dev/sdX name can be had from many sources (like the output of "lsscsi").
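The loop above can be wrapped as a function that also copes with two-digit host numbers (the host? glob only matches host0 through host9). This is a sketch, assuming proc_name and unique_id exist under each scsi_host entry as in the output above. The name list_scsi_hosts and the optional sysfs-root argument are illustrative only.

```sh
#!/bin/sh
# list_scsi_hosts [SYSFS]: print "<host> <driver> <unique_id>" for each SCSI host.
# SYSFS defaults to /sys; the argument is an assumption of this sketch so the
# function can be pointed at a copy of the tree.
list_scsi_hosts() {
    sysfs=${1:-/sys}
    for d in "$sysfs"/class/scsi_host/host*; do
        [ -d "$d" ] || continue
        # A missing attribute is printed as "?" instead of aborting the loop.
        drv=$(cat "$d/proc_name" 2>/dev/null) || drv='?'
        id=$(cat "$d/unique_id" 2>/dev/null) || id='?'
        printf '%s %s %s\n' "${d##*/}" "$drv" "$id"
    done
}
```

The hosts come out in glob order (host10 sorts before host2), so pipe the output through sort if numeric order matters.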
/Peter
<snip>
Have you checked dmesg? For example,
<snip>
SCSI subsystem initialized
libata version 3.00 loaded.
sata_sil 0000:01:0b.0: version 2.3
ACPI: PCI Interrupt 0000:01:0b.0[A] -> GSI 17 (level, low) -> IRQ 193
scsi0 : sata_sil
scsi1 : sata_sil
scsi2 : sata_sil
scsi3 : sata_sil
ata1: SATA max UDMA/100 mmio m1024@0xfc6ffc00 tf 0xfc6ffc80 irq 193
ata2: SATA max UDMA/100 mmio m1024@0xfc6ffc00 tf 0xfc6ffcc0 irq 193
ata3: SATA max UDMA/100 mmio m1024@0xfc6ffc00 tf 0xfc6ffe80 irq 193
ata4: SATA max UDMA/100 mmio m1024@0xfc6ffc00 tf 0xfc6ffec0 irq 193
<snip>
mark
[ Sorry to merge messages; I appear to have lost Peter's post, so I'm replying to Peter and Mark in the same message ]
Peter Kjellstrom wrote:
On Thursday 28 January 2010, Stephen Harris wrote:
to determine what ata7.01 maps to in terms of /dev/sd# values?
This seems quite hard to do. The following hack will match scsi hosts to libata-driver + number:
Hmm, interesting:
% for d in /sys/class/scsi_host/host*
do
  echo "$d $(cat $d/proc_name) $(cat $d/unique_id)"
done
/sys/class/scsi_host/host0 ahci 1
/sys/class/scsi_host/host1 ahci 2
/sys/class/scsi_host/host2 ahci 3
/sys/class/scsi_host/host3 ahci 4
/sys/class/scsi_host/host4 ahci 5
/sys/class/scsi_host/host5 ahci 6
/sys/class/scsi_host/host6 sata_sil24 7
/sys/class/scsi_host/host7 sata_sil24 8
/sys/class/scsi_host/host8 usb-storage 0
So in this case ata7 appears to map to host6. Now the usb-storage entry looks odd. Do I have to magically know that ahci and sata_sil24 both map to "ata" entries?
From lsscsi I see, for host 6
[6:0:0:0] disk ATA ST31000340AS SD15 /dev/sdc
[6:1:0:0] disk ATA ST31000340AS SD15 /dev/sdd
[6:2:0:0] disk ATA ST31000340AS SD15 /dev/sde
[6:3:0:0] disk ATA ST31000340AS AD14 /dev/sdf
So I have to guess the second number in ata#.# represents the LUN of the device on that bus, so ata7.1 -> host6 -> [6:1:0:0] -> sdd
This looks like an unreliable method of detection. But it may be possible!
m.roth@5-cent.us wrote:
Have you checked dmesg? For example,
Yeah, but dmesg has two problems that I can think of:
1) it may disappear if the number of kernel messages grows sufficiently large
2) The ID number wasn't obvious
scsi6 : sata_sil24
scsi7 : sata_sil24
ata7: SATA max UDMA/100 host m128@0xfe7fbf80 port 0xfe7fc000 irq 169
ata8: SATA max UDMA/100 host m128@0xfe7fbf80 port 0xfe7fe000 irq 169
How does that tell me ata7 matches scsi6? We can't rely on ordering (see below).
Further,
ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata7.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
ata7.00: hard resetting link
ata7.00: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
ata7.01: hard resetting link
ata7.01: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata7.02: hard resetting link
ata7.02: SATA link up 1.5 Gbps (SStatus 113 SControl 0)
ata7.03: hard resetting link
floppy0: no floppy controllers found
ata7.03: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata7.04: hard resetting link
ata7.04: SATA link down (SStatus 0 SControl 320)
ata7.05: hard resetting link
ata7.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
ata7.00: ATA-8: ST31000340AS, SD15, max UDMA/133
ata7.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata7.00: configured for UDMA/100
ata7.01: ATA-8: ST31000340AS, SD15, max UDMA/133
ata7.01: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata7.01: configured for UDMA/100
ata7.02: ATA-8: ST31000340AS, SD15, max UDMA/133
ata7.02: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata7.02: configured for UDMA/100
ata7.03: ATA-8: ST31000340AS, AD14, max UDMA/133
ata7.03: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata7.03: configured for UDMA/100
ata7: EH complete
Vendor: Maxtor Model: 6Y120P0 Rev: YAR4
Type: Direct-Access ANSI SCSI revision: 02
SCSI device sdb: 240121728 512-byte hdwr sectors (122942 MB)
sdb: Write Protect is off
sdb: Mode Sense: 53 00 00 08
sdb: assuming drive cache: write through
SCSI device sdb: 240121728 512-byte hdwr sectors (122942 MB)
sdb: Write Protect is off
sdb: Mode Sense: 53 00 00 08
sdb: assuming drive cache: write through
sdb:<6>ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata8.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
ata8.00: hard resetting link
ata8.00: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
As can be seen, different parts of detection appear to be interleaved (floppy detection message, sdb detection message - disk in a USB disk enclosure!) so I can't seem to rely on ordering of messages in dmesg to accurately determine which device has been assigned what ID.
dmesg really gets messy when you have lots of disks!
Thanks for the feedback so far!
On 1/28/2010 4:05 PM, Stephen Harris wrote:
As can be seen, different parts of detection appear to be interleaved (floppy detection message, sdb detection message - disk in a USB disk enclosure!) so I can't seem to rely on ordering of messages in dmesg to accurately determine which device has been assigned what ID.
dmesg really gets messy when you have lots of disks!
Not to mention what happens if you hot-swap them and re-connect in a different order. I've sometimes made raid-1 devices on the disks even if I don't intend to add the partner just so auto-detect will always connect them up to the right place even if some are disconnected or moved.
At Thu, 28 Jan 2010 17:05:41 -0500 CentOS mailing list centos@centos.org wrote:
m.roth@5-cent.us wrote:
Have you checked dmesg? For example,
Yeah, but dmesg has two problems that I can think of
- it may disappear if the number of kernel messages grows sufficiently large
/var/log/dmesg
- The ID number wasn't obvious
scsi6 : sata_sil24
scsi7 : sata_sil24
ata7: SATA max UDMA/100 host m128@0xfe7fbf80 port 0xfe7fc000 irq 169
ata8: SATA max UDMA/100 host m128@0xfe7fbf80 port 0xfe7fe000 irq 169
How does that tell me ata7 matches scsi6? We can't rely on ordering (see below).
Ata numbers seem to start with 1 (one) and scsi hosts start with 0 (zero), so ataN => scsi<N-1>, unless you either have real SCSI controllers or PATA controllers that use SCSI-flavored drivers. The USB drivers will be loaded later, so the USB disks will have higher SCSI host numbers.
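Whether the ataN => scsi<N-1> rule holds on a particular machine can be checked rather than assumed. The sketch below compares each SCSI host number with its unique_id, which is taken here to be the ata port number. That is an observation from this thread, not a documented guarantee. The name check_ata_offset and the optional sysfs-root argument are illustrative.

```sh
#!/bin/sh
# check_ata_offset [SYSFS]: report, per SCSI host, whether unique_id == host + 1.
# Hosts whose unique_id is 0 or non-numeric (e.g. usb-storage above) are skipped.
check_ata_offset() {
    sysfs=${1:-/sys}
    for d in "$sysfs"/class/scsi_host/host*; do
        [ -r "$d/unique_id" ] || continue
        n=${d##*/host}                          # SCSI host number from the directory name
        id=$(cat "$d/unique_id")
        [ "$id" -gt 0 ] 2>/dev/null || continue
        if [ "$id" -eq $((n + 1)) ]; then
            echo "host$n ata$id ok"
        else
            echo "host$n ata$id MISMATCH"
        fi
    done
}
```

Any MISMATCH line means the simple N-1 rule cannot be relied on for that host, for instance when a real SCSI controller has taken an earlier host number.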
On Thu, Jan 28, 2010 at 07:18:36PM -0500, Robert Heller wrote:
At Thu, 28 Jan 2010 17:05:41 -0500 CentOS mailing list centos@centos.org wrote:
Yeah, but dmesg has two problems that I can think of
- it may disappear if the number of kernel messages grows sufficiently large
/var/log/dmesg
Doesn't handle hotswap disks, and still has the problem with out-of-order entries.
Ata numbers seem to start with 1 (one) and scsi hosts start with 0 (zero), so ataN => scsi<N-1>, unless you either have real SCSI controllers or PATA controllers that use SCSI-flavored drivers. The USB drivers will be loaded later, so the USB disks will have higher SCSI host numbers.
Can we guarantee that? And the order in which disks are detected is not the same as the order in which buses are detected; in my case the USB disk is on scsi host 8 but is sdb (the second disk found).
The problem I really want to solve is a scriptable solution so that I can always map ata#.# number to /dev/sdX entry. On my own personal machine I can always write it down 'cos it's not gonna change... but in general?
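One way such a script might look, pieced together from the observations in this thread. It is a sketch under two assumptions that the kernel does not promise: that a libata host's unique_id equals the ata port number, and that the .NN suffix shows up as the second field of the SCSI address behind a port multiplier (or as the third field otherwise). The name ata_to_sd and the optional sysfs-root argument are made up for illustration.

```sh
#!/bin/sh
# ata_to_sd NAME [SYSFS]: print the /dev/sdX node for a libata name like "ata7.01".
# Returns non-zero if no matching host or disk is found.
ata_to_sd() {
    name=$1
    sysfs=${2:-/sys}
    port=${name#ata}; port=${port%%.*}          # "ata7.01" -> "7"
    case $name in
        *.*) dev=${name#*.} ;;                  # "ata7.01" -> "01"
        *)   dev=0 ;;
    esac
    dev=${dev#0}; dev=${dev:-0}                 # "01" -> "1", "00" -> "0"

    # Step 1: find the SCSI host whose unique_id matches the ata port number.
    host=
    for h in "$sysfs"/class/scsi_host/host*; do
        [ -r "$h/unique_id" ] || continue
        if [ "$(cat "$h/unique_id")" = "$port" ]; then
            host=${h##*/host}
            break
        fi
    done
    [ -n "$host" ] || return 1

    # Step 2: find the disk whose SCSI address sits on that host.
    for b in "$sysfs"/block/sd*; do
        [ -e "$b/device" ] || continue
        hctl=$(basename "$(readlink -f "$b/device")")
        case $hctl in
            "$host:$dev:0:0" | "$host:0:$dev:0")
                echo "/dev/${b##*/}"
                return 0 ;;
        esac
    done
    return 1
}
```

On the machine described in this thread, ata_to_sd ata7.01 should print /dev/sdd if the assumptions hold, which agrees with the lsscsi output. A non-libata host that happens to report the same unique_id would confuse step 1, so checking proc_name as well may be worthwhile.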
On Thursday 28 January 2010, Stephen Harris wrote:
So in this case ata7 appears to map to host6. Now the usb-storage entry looks odd. Do I have to magically know that ahci and sata_sil24 both map to "ata" entries?
AFAICT, I'm afraid so :-/
So I have to guess the second number in ata#.# represents the LUN of the device on that bus, so ata7.1 -> host6 -> [6:1:0:0] -> sdd
The LUN is actually the last of the four numbers; the 2nd one is the channel (host:channel:target:lun), but yes, your observation seems correct.
This looks like an unreliable method of detection. But it may be possible!
Uhu
m.roth@5-cent.us wrote:
Have you checked dmesg? For example,
Yeah, but dmesg has two problems that I can think of
- it may disappear if the number of kernel messages grows sufficiently large
- The ID number wasn't obvious
scsi6 : sata_sil24
scsi7 : sata_sil24
ata7: SATA max UDMA/100 host m128@0xfe7fbf80 port 0xfe7fc000 irq 169
ata8: SATA max UDMA/100 host m128@0xfe7fbf80 port 0xfe7fe000 irq 169
How does that tell me ata7 matches scsi6? We can't rely on ordering (see below).
That is basically why I tried to find a better way than "interpreting" dmesg, yes you can probably figure it out but man it's an ugly way...
Further,
<snip>
As can be seen, different parts of detection appear to be interleaved (floppy detection message, sdb detection message - disk in a USB disk enclosure!) so I can't seem to rely on ordering of messages in dmesg to accurately determine which device has been assigned what ID.
Some order can be relied on, some not. Bottom line, you can by experience and testing find a correct mapping from dmesg, but really there should be a better way.
/Peter