Hello,
Having a problem with software RAID that is driving me crazy.
Here's the details:
1. CentOS 6.2 x86_64 install from the minimal iso (via pxeboot).
2. Reasonably good PC hardware (i.e. not budget, but not server grade either) with a pair of 1TB Western Digital SATA3 drives.
3. Drives are plugged into the SATA3 ports on the mainboard (both drives and cables say they can do 6Gb/s).
4. During the install I set up software RAID1 for the two drives with two RAID partitions:
   md0 - 500M for /boot
   md1 - "the rest" for a physical volume
5. Set up LVM on md1 in the standard slash, swap, home layout.
Install goes fine (actually really fast) and I reboot into CentOS 6.2. Next I run yum update, add a few minor packages and perform some basic configuration.
Now I start to get I/O errors printed on the console. I run 'mdadm -D /dev/md1' and see the array is degraded and /dev/sdb2 has been marked as faulty.
Okay, fair enough, I've got at least one bad drive. I boot the system from a live usb and run the short and long SMART tests on both drives. No problems reported, but I know that can be misleading, so I'm going to have to gather some evidence before I try to return these drives. I run badblocks in destructive mode on both drives as follows:
badblocks -w -b 4096 -c 98304 -s /dev/sda
badblocks -w -b 4096 -c 98304 -s /dev/sdb
Come back the next day and see that no errors are reported. Er, that's odd. I check the SMART data in case the badblocks activity has triggered something. Nope. Maybe I screwed up the install somehow?
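(For reference, the SMART checks above amount to something like the following - assuming smartctl from the smartmontools package on the live usb:

  smartctl -t short /dev/sda     # short self-test, takes a couple of minutes
  smartctl -t long /dev/sda      # extended self-test, several hours on a 1TB drive
  smartctl -l selftest /dev/sda  # show the self-test results once they finish
  smartctl -A /dev/sda           # dump the SMART attribute table

and the same again for /dev/sdb.)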
So I start again and repeat the install process very carefully. This time I check the raid array straight after boot.
mdadm -D /dev/md0 - all is fine.
mdadm -D /dev/md1 - the two drives are resyncing.
Okay, that is odd. The RAID1 array was created at the start of the install process, before any software was installed. Surely it should be in sync already? Googled a bit and found a post where someone else had seen the same thing happen. The advice was to just wait until the drives sync so the 'blocks match exactly', but I'm not really happy with the explanation. At this rate it's going to take a whole day to do a single minimal install, and I'm sure I would have heard others complaining about the process.
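(Side note for anyone following along: the resync progress is easy enough to keep an eye on with something like

  cat /proc/mdstat
  watch -n 60 cat /proc/mdstat

which shows a progress bar, percentage complete and an estimated finish time for each md device while it syncs.)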
Anyway, I leave the system to sync for the rest of the day. When I get back to it I see the same (similar) I/O errors on the console and mdadm shows the RAID array is degraded, with /dev/sdb2 marked as faulty. This time I notice that the I/O errors all refer to /dev/sda. Have to reboot because the fs is now read-only. When the system comes back up, it's trying to resync the drive again. Eh?
Any ideas what is going on here? If it's bad drives, I really need some confirmation independent of the software RAID failing. I thought SMART or badblocks would give me that. Perhaps it has nothing to do with the drives. Could a problem with the mainboard or the memory cause this issue? Is it a SATA3 issue? Should I try it on the 3Gb/s channels, since there's probably little speed difference with non-SSDs?
Cheers,
Kal
On 2012-02-29, Kahlil Hodgson kahlil.hodgson@dealmax.com.au wrote:
- Reasonably good PC hardware (i.e. not budget, but not server grade either)
with a pair of 1TB Western Digital SATA3 Drives.
One thing you can try is to download WD's drive tester and throw it at your drives. It seems unlikely to find anything, but you never know. The tester is available on the UBCD bootable CD image (which has lots of other handy tools).
Which model drives do you have? I've found a lot of variability between WDxxEARS and their RE drives.
Okay, that is odd. The RAID1 array was created at the start of the install process, before any software was installed. Surely it should be in sync already? Googled a bit and found a post where someone else had seen the same thing happen. The advice was to just wait until the drives sync so the 'blocks match exactly', but I'm not really happy with the explanation.
Supposedly, at least with RAID[456], the array is completely usable when it's resyncing after an initial creation. In practice, I found that writing significant amounts of data to that array killed resync performance, so I just let the resync finish before doing any heavy lifting on the array.
Anyway, I leave the system to sync for the rest of the day. When I get back to it I see the same (similar) I/O errors on the console and mdadm shows the RAID array is degraded, with /dev/sdb2 marked as faulty. This time I notice that the I/O errors all refer to /dev/sda. Have to reboot because the fs is now read-only. When the system comes back up, it's trying to resync the drive again. Eh?
This sounds a little odd. You're having IO errors on sda, but sdb2 has been kicked out of the RAID? Do you have any other errors in /var/log/messages that relate to sdb, and/or the errors right around when the md devices failed?
--keith
Hi Keith,
On Tue, 2012-02-28 at 16:43 -0800, Keith Keller wrote:
One thing you can try is to download WD's drive tester and throw it at your drives. It seems unlikely to find anything, but you never know. The tester is available on the UBCD bootable CD image (which has lots of other handy tools).
Ah cool. I'll give that a go :-)
Which model drives do you have? I've found a lot of variability between WDxxEARS vs their RE drives.
These are WD1002FAEX drives (1TB, SATA3, 7200rpm, 64MB cache).
Supposedly, at least with RAID[456], the array is completely usable when it's resyncing after an initial creation. In practice, I found that writing significant amounts of data to that array killed resync performance, so I just let the resync finish before doing any heavy lifting on the array.
Yeah. That was my understanding. Thanks for the confirmation:-)
Anyway, I leave the system to sync for the rest of the day. When I get back to it I see the same (similar) I/O errors on the console and mdadm shows the RAID array is degraded, with /dev/sdb2 marked as faulty. This time I notice that the I/O errors all refer to /dev/sda. Have to reboot because the fs is now read-only. When the system comes back up, it's trying to resync the drive again. Eh?
This sounds a little odd. You're having IO errors on sda, but sdb2 has been kicked out of the RAID? Do you have any other errors in /var/log/messages that relate to sdb, and/or the errors right around when the md devices failed?
Having a little trouble getting at the log files. When it fails the fs goes read-only and I can't run any programs (less, tail, ...) except 'cat' against the log file or dmesg output (I get I/O errors). On reboot there's nothing in the log files, presumably because they could not be written to. May have to set up remote logging to get at this (PITA).
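(If I do end up setting up remote logging, the plan is roughly the following - a sketch assuming the stock rsyslog on CentOS 6, with 'loghost' standing in for whatever box receives the logs:

  # on the failing machine, append to /etc/rsyslog.conf:
  *.*  @loghost:514

  # on the receiving machine, enable UDP reception in /etc/rsyslog.conf:
  $ModLoad imudp
  $UDPServerRun 514

  # then 'service rsyslog restart' on both ends

The single '@' means UDP forwarding, so the messages should still get off the box even when the local fs has gone read-only.)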
Thanks for the suggestions :-)
Kal
on 2/28/2012 4:27 PM Kahlil Hodgson spake the following:
Hello,
Having a problem with software RAID that is driving me crazy.
Here's the details:
1. CentOS 6.2 x86_64 install from the minimal iso (via pxeboot).
2. Reasonably good PC hardware (i.e. not budget, but not server grade either) with a pair of 1TB Western Digital SATA3 drives.
3. Drives are plugged into the SATA3 ports on the mainboard (both drives and cables say they can do 6Gb/s).
4. During the install I set up software RAID1 for the two drives with two RAID partitions:
   md0 - 500M for /boot
   md1 - "the rest" for a physical volume
5. Set up LVM on md1 in the standard slash, swap, home layout.
Install goes fine (actually really fast) and I reboot into CentOS 6.2. Next I run yum update, add a few minor packages and perform some basic configuration.
Now I start to get I/O errors printed on the console. I run 'mdadm -D /dev/md1' and see the array is degraded and /dev/sdb2 has been marked as faulty.
Okay, fair enough, I've got at least one bad drive. I boot the system from a live usb and run the short and long SMART tests on both drives. No problems reported, but I know that can be misleading, so I'm going to have to gather some evidence before I try to return these drives. I run badblocks in destructive mode on both drives as follows:
badblocks -w -b 4096 -c 98304 -s /dev/sda
badblocks -w -b 4096 -c 98304 -s /dev/sdb
Come back the next day and see that no errors are reported. Er, that's odd. I check the SMART data in case the badblocks activity has triggered something. Nope. Maybe I screwed up the install somehow?
So I start again and repeat the install process very carefully. This time I check the raid array straight after boot.
mdadm -D /dev/md0 - all is fine.
mdadm -D /dev/md1 - the two drives are resyncing.
Okay, that is odd. The RAID1 array was created at the start of the install process, before any software was installed. Surely it should be in sync already? Googled a bit and found a post where someone else had seen the same thing happen. The advice was to just wait until the drives sync so the 'blocks match exactly', but I'm not really happy with the explanation. At this rate it's going to take a whole day to do a single minimal install, and I'm sure I would have heard others complaining about the process.
Anyway, I leave the system to sync for the rest of the day. When I get back to it I see the same (similar) I/O errors on the console and mdadm shows the RAID array is degraded, with /dev/sdb2 marked as faulty. This time I notice that the I/O errors all refer to /dev/sda. Have to reboot because the fs is now read-only. When the system comes back up, it's trying to resync the drive again. Eh?
Any ideas what is going on here? If it's bad drives, I really need some confirmation independent of the software RAID failing. I thought SMART or badblocks would give me that. Perhaps it has nothing to do with the drives. Could a problem with the mainboard or the memory cause this issue? Is it a SATA3 issue? Should I try it on the 3Gb/s channels, since there's probably little speed difference with non-SSDs?
Cheers,
Kal
First thing... Are they green drives? Green drives power down randomly and can cause these types of errors... Also, maybe the 6Gb/s SATA isn't fully supported by Linux and that board... Try the 3Gb/s channels.
Hi Scott,
On Tue, 2012-02-28 at 16:48 -0800, Scott Silva wrote:
First thing... Are they green drives? Green drives power down randomly and can cause these types of errors...
These are 'Black' drives.
Also, maybe the 6Gb/s SATA isn't fully supported by Linux and that board... Try the 3Gb/s channels.
Yer, I was thinking that might be the case. I'll give that a go.
Thanks,
Kal
First off, if you are using the onboard BIOS RAID, turn it off. Secondly, Black drives from WD intentionally put themselves into deep-cycle diags every so often. This makes them impossible to use in hardware and FRAID (fake RAID) setups. I have 4 of them in RAID 10 under mdraid and I had to disable BIOS RAID for them to not go nuts. I still get the occasional error, but usually the kernel is smart enough to let them take the breather they want to take. Unfortunately, the only drives that don't do this are their "enterprise" drives. This is the primary reason I have discontinued my use of all Western Digital products.
On 2/28/2012 8:08 PM, Kahlil Hodgson wrote:
Hi Scott,
On Tue, 2012-02-28 at 16:48 -0800, Scott Silva wrote:
First thing... Are they green drives? Green drives power down randomly and can cause these types of errors...
These are 'Black' drives.
Also, maybe the 6Gb/s SATA isn't fully supported by Linux and that board... Try the 3Gb/s channels.
Yer, I was thinking that might be the case. I'll give that a go.
Thanks,
Kal
On Wed, Feb 29, 2012 at 11:27:53AM +1100, Kahlil Hodgson wrote:
Now I start to get I/O errors printed on the console. I run 'mdadm -D /dev/md1' and see the array is degraded and /dev/sdb2 has been marked as faulty.
what I/O errors?
So I start again and repeat the install process very carefully. This time I check the raid array straight after boot.
mdadm -D /dev/md0 - all is fine.
mdadm -D /dev/md1 - the two drives are resyncing.
Okay, that is odd. The RAID1 array was created at the start of the install process, before any software was installed. Surely it should be in sync already? Googled a bit and found a post where someone else had seen the same thing happen. The advice was to just wait until the drives sync so the 'blocks match exactly', but I'm not really happy with the explanation. At this rate it's going to take a whole day to do a single minimal install, and I'm sure I would have heard others complaining about the process.
Yeah, it's normal for a RAID1 to 'sync' when you first create it. The odd part is the I/O errors.
Any ideas what is going on here? If it's bad drives, I really need some confirmation independent of the software RAID failing. I thought SMART or badblocks would give me that. Perhaps it has nothing to do with the drives. Could a problem with the mainboard or the memory cause this issue? Is it a SATA3 issue? Should I try it on the 3Gb/s channels, since there's probably little speed difference with non-SSDs?
look up the drive errors.
Oh, and my experience? Both WD and Seagate won't complain if you err on the side of 'when in doubt, return the drive' - that's what I do.
But yeah, usually SMART will report something... at least a high reallocated sector count or something.
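Something like this is a quick way to eyeball the counters that usually give a failing drive away (the exact attribute names vary a little between vendors):

  smartctl -A /dev/sda | grep -iE 'realloc|pending|uncorrect'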
On Tue, 2012-02-28 at 20:30 -0500, Luke S. Crawford wrote:
On Wed, Feb 29, 2012 at 11:27:53AM +1100, Kahlil Hodgson wrote:
Now I start to get I/O errors printed on the console. I run 'mdadm -D /dev/md1' and see the array is degraded and /dev/sdb2 has been marked as faulty.
what I/O errors?
Good point :) Okay, copied manually from the console:
end_request: I/O error, dev sda, sector 8690896
Buffer I/O error on device dm-0, logical block 1081344
JBD2: I/O error detected when updating journal superblock for dm-0-8
end_request: I/O error, dev sda, sector 1026056
etc.
I gather the device mapper and journal errors are caused by the preceding low-level error.
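(For what it's worth, dm-0 maps back to one of the LVM logical volumes; something like

  ls -l /dev/mapper
  dmsetup info -c
  lvs -o +devices

shows which LV it is and which physical volume - and hence which underlying md device - it sits on.)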
Oh, and my experience? Both WD and Seagate won't complain if you err on the side of 'when in doubt, return the drive' - that's what I do.
Yeah, was hoping to avoid the delay though. It's already sucked two days of my time so I might just have to bite the bullet :-(
Cheers!
Kal
On 02/28/12 5:57 PM, Kahlil Hodgson wrote:
end_request: I/O error, dev sda, sector 8690896
Buffer I/O error on device dm-0, logical block 1081344
JBD2: I/O error detected when updating journal superblock for dm-0-8
end_request: I/O error, dev sda, sector 1026056
There's no more info on those I/O errors in dmesg or whatever?
Sounds like /dev/sda may be a bad drive. It happens.
On Tue, Feb 28, 2012 at 5:27 PM, Kahlil Hodgson kahlil.hodgson@dealmax.com.au wrote:
Now I start to get I/O errors printed on the console. I run 'mdadm -D /dev/md1' and see the array is degraded and /dev/sdb2 has been marked as faulty.
I had a problem like this once. In a heterogeneous array of 80 GB PATA drives (it was a while ago), the one WD drive kept dropping out like this. WD's diagnostic tool showed a problem, so I RMA'ed the drive... only to discover the replacement did the same thing on the system, but checked out just fine on a different system. Turned out to be a combination of a power supply with less-than-stellar regulation (go Enermax...) and the WD being particularly sensitive to it; nothing else in the system seemed to be affected. Replacing the power supply finally eliminated the issue.
--ln
Hi Ellen,
On Tue, 2012-02-28 at 18:59 -0700, Ellen Shull wrote:
On Tue, Feb 28, 2012 at 5:27 PM, Kahlil Hodgson kahlil.hodgson@dealmax.com.au wrote:
Now I start to get I/O errors printed on the console. I run 'mdadm -D /dev/md1' and see the array is degraded and /dev/sdb2 has been marked as faulty.
I had a problem like this once. In a heterogeneous array of 80 GB PATA drives (it was a while ago), the one WD drive kept dropping out like this. WD's diagnostic tool showed a problem, so I RMA'ed the drive... only to discover the replacement did the same thing on the system, but checked out just fine on a different system. Turned out to be a combination of a power supply with less-than-stellar regulation (go Enermax...) and the WD being particularly sensitive to it; nothing else in the system seemed to be affected. Replacing the power supply finally eliminated the issue.
Hmm ... that could be the problem. Power supply is the only component that is not brand new. Just swapped it out for a spare. Will see how that goes over the next few hours.
Cheers!
Kal
On Wed, 2012-02-29 at 14:21 +1100, Kahlil Hodgson wrote:
I had a problem like this once. In a heterogeneous array of 80 GB PATA drives (it was a while ago), the one WD drive kept dropping out like this. WD's diagnostic tool showed a problem, so I RMA'ed the drive... only to discover the replacement did the same thing on the system, but checked out just fine on a different system. Turned out to be a combination of a power supply with less-than-stellar regulation (go Enermax...) and the WD being particularly sensitive to it; nothing else in the system seemed to be affected. Replacing the power supply finally eliminated the issue.
Hmm ... that could be the problem. Power supply is the only component that is not brand new. Just swapped it out for a spare. Will see how that goes over the next few hours.
Sadly still getting the same errors. Will have another go at this tomorrow. Three more things to try:
1. UBCD and Western Digital diagnostics.
2. Bring the RAID array up via the live cd and see if it resyncs. At least I'll be able to inspect the log files when the I/O errors hit.
3. Try using the 3Gb/s channel.
Will let everyone know how that goes.
Thanks for all your help!
Cheers!
Kal
On Wed, 2012-02-29 at 17:43 +1100, Kahlil Hodgson wrote:
Sadly still getting the same errors. Will have another go at this tomorrow. Three more things to try:
1. UBCD and Western Digital diagnostics.
2. Bring the RAID array up via the live cd and see if it resyncs. At least I'll be able to inspect the log files when the I/O errors hit.
3. Try using the 3Gb/s channel.
Finally got a chance to get back to this (have just moved offices).
1. Tried the WD diagnostics via UBCD and, as predicted, the drives came up all clear.
2. Brought the RAID array up under Partition Magic (via live UBCD) and it resynced fine. No amount of prodding could reproduce the errors I was seeing before. Hmmm...
3. Did a fresh install and got the same problems as before.
4. Poked around in the BIOS to ensure that there were _no_ Fake RAID settings active.
5. Sigh, okay, let's try the 3Gb/s channel ... (face palm!)
On opening the box I noticed the drives were plugged into the GSATA3 ports. Under a normal viewing angle the 'G' was occluded by the plug mounts and it wasn't until I took a real close look that I noticed this. The positioning of the GSATA3 ports on this board was the same as the normal SATA3 ports on the previous board I used and, I confess, I didn't read the manual carefully enough <shame face>. Apparently the GSATA3 ports are controlled by a Marvell 88SE9172 chip rather than the Intel Z68 chip.
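For anyone else who hits this: you can confirm which controller a disk is actually hanging off without opening the box, with something like

  lspci | grep -i sata                 # lists both the Intel and the Marvell SATA controllers
  readlink -f /sys/block/sda/device    # the PCI address in the path shows which controller owns the disk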
After swapping over to the normal SATA3 ports and rebuilding, updating, installing, everything is fine :-)
Thanks to everyone for their helpful comments and advice.
Cheers!
Kal
On 02/28/2012 04:27 PM, Kahlil Hodgson wrote:
Hello,
Having a problem with software RAID that is driving me crazy.
Here's the details:
1. CentOS 6.2 x86_64 install from the minimal iso (via pxeboot).
2. Reasonably good PC hardware (i.e. not budget, but not server grade either) with a pair of 1TB Western Digital SATA3 drives.
3. Drives are plugged into the SATA3 ports on the mainboard (both drives and cables say they can do 6Gb/s).
4. During the install I set up software RAID1 for the two drives with two RAID partitions:
   md0 - 500M for /boot
   md1 - "the rest" for a physical volume
5. Set up LVM on md1 in the standard slash, swap, home layout.
Install goes fine (actually really fast) and I reboot into CentOS 6.2. Next I run yum update, add a few minor packages and perform some basic configuration.
Now I start to get I/O errors printed on the console. I run 'mdadm -D /dev/md1' and see the array is degraded and /dev/sdb2 has been marked as faulty.
Okay, fair enough, I've got at least one bad drive. I boot the system from a live usb and run the short and long SMART tests on both drives. No problems reported, but I know that can be misleading, so I'm going to have to gather some evidence before I try to return these drives. I run badblocks in destructive mode on both drives as follows:
badblocks -w -b 4096 -c 98304 -s /dev/sda
badblocks -w -b 4096 -c 98304 -s /dev/sdb
Come back the next day and see that no errors are reported. Er, that's odd. I check the SMART data in case the badblocks activity has triggered something. Nope. Maybe I screwed up the install somehow?
So I start again and repeat the install process very carefully. This time I check the raid array straight after boot.
mdadm -D /dev/md0 - all is fine.
mdadm -D /dev/md1 - the two drives are resyncing.
Okay, that is odd. The RAID1 array was created at the start of the install process, before any software was installed. Surely it should be in sync already? Googled a bit and found a post where someone else had seen the same thing happen. The advice was to just wait until the drives sync so the 'blocks match exactly', but I'm not really happy with the explanation. At this rate it's going to take a whole day to do a single minimal install, and I'm sure I would have heard others complaining about the process.
Anyway, I leave the system to sync for the rest of the day. When I get back to it I see the same (similar) I/O errors on the console and mdadm shows the RAID array is degraded, with /dev/sdb2 marked as faulty. This time I notice that the I/O errors all refer to /dev/sda. Have to reboot because the fs is now read-only. When the system comes back up, it's trying to resync the drive again. Eh?
Any ideas what is going on here? If it's bad drives, I really need some confirmation independent of the software RAID failing. I thought SMART or badblocks would give me that. Perhaps it has nothing to do with the drives. Could a problem with the mainboard or the memory cause this issue? Is it a SATA3 issue? Should I try it on the 3Gb/s channels, since there's probably little speed difference with non-SSDs?
Cheers,
Kal
I just had a very similar problem with a raid 10 array with four new 1TB drives. It turned out to be the SATA cable.
I first tried a new drive and even replaced the five disk hot plug carrier. It was always the same logical drive (/dev/sdb). I then tried using an additional SATA adapter card. That cinched it, as the only thing common to all the above was the SATA cable.
All has been well for a week now.
I should have tried replacing the cable first :-)
Emmett
Hi Emmett,
On Tue, 2012-02-28 at 18:18 -0800, Emmett Culley wrote:
I just had a very similar problem with a raid 10 array with four new 1TB drives. It turned out to be the SATA cable.
...
All has been well for a week now.
I should have tried replacing the cable first :-)
Ah yes. Good point. That was one of my first thoughts. I forgot to mention that I swapped both cables between the first and the second install:-)
Cheers,
Kal
On 02/28/2012 07:26 PM, Kahlil Hodgson wrote:
Hi Emmett,
On Tue, 2012-02-28 at 18:18 -0800, Emmett Culley wrote:
I just had a very similar problem with a raid 10 array with four new 1TB drives. It turned out to be the SATA cable.
...
All has been well for a week now.
I should have tried replacing the cable first :-)
Ah yes. Good point. That was one of my first thoughts. I forgot to mention that I swapped both cables between the first and the second install:-)
Cheers,
Kal
You can also try swapping the drives. I bet the problem won't move. I've seen problems like this, and I don't believe that your I/O errors are real errors on the disk. I think something is going offline. It kicks the second drive out of the array because it is trying to read the first drive and sync by copying it to the second drive. Since it can't resync, it kicks the drive which it thinks is not 'current' out of the array. I don't think it is a RAID problem; you're just seeing RAID's reaction to a problem somewhere between the disk controller and the drive.
Is your kernel using the AHCI driver, and if not, which driver is being used? Do you know if you have an Intel controller onboard? If you're not getting the AHCI driver and you're using an onboard controller, your BIOS may not be set up properly for native AHCI.
Do you actually have a two channel controller or is there any kind of port multiplier on there? I have seen problems like you describe with some port multipliers. In fact I'm having a problem like that right now. You could test for problems like this by installing the system on just one drive with the other disconnected and then try the second drive with the first disconnected. Obviously you won't be able to configure raid, but you should be able to do substantial IO testing.
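For the single-drive test, a full sequential read of each disk is usually enough to shake out a flaky link, for example

  dd if=/dev/sda of=/dev/null bs=1M
  badblocks -sv /dev/sda      # read-only pass, unlike the destructive -w run

and any link problems should show up as the same kind of end_request errors in the log.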
You can tell which driver is being used by looking at the log output; lsmod will also show which drivers are loaded.
Mar 5 16:06:33 myhost kernel: ahci 0000:00:1f.2: PCI INT A -> GSI 20 (level, low) -> IRQ 20
Mar 5 16:06:33 myhost kernel: ahci 0000:00:1f.2: AHCI 0001.0300 32 slots 6 ports 3 Gbps 0x3f impl SATA mode
Mar 5 16:06:33 myhost kernel: ahci 0000:00:1f.2: flags: 64bit ncq sntf pm led clo pmp pio slum part ems sxs apst
Mar 5 16:06:33 myhost kernel: scsi0 : ahci
Mar 5 16:06:33 myhost kernel: scsi1 : ahci
Mar 5 16:06:33 myhost kernel: scsi2 : ahci
Mar 5 16:06:33 myhost kernel: scsi3 : ahci
Mar 5 16:06:33 myhost kernel: scsi4 : ahci
Mar 5 16:06:33 myhost kernel: scsi5 : ahci
Here's what you might see in log output (/var/log/messages) if you have a port multiplier:
Mar 5 16:06:33 myhost kernel: ata6.15: Port Multiplier 1.1, 0x197b:0x2352 r0, 2 ports, feat 0x0/0x0
Mar 5 16:06:33 myhost kernel: ata6.15: Asynchronous notification not supported, hotplug won't
Mar 5 16:06:33 myhost kernel: work on fan-out ports. Use warm-plug instead.
Mar 5 16:06:33 myhost kernel: ata6.00: hard resetting link
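If you'd rather check from a running shell than dig through the boot log, something along these lines works (the module name depends on the controller, e.g. ahci vs ata_piix):

  lsmod | grep -Ei 'ahci|ata_piix|sata_'
  dmesg | grep -i ahci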
I am actually looking for any recommendations for port multipliers that work well with the CentOS 6 driver, because the one that I have here doesn't.
Nataraj
A few months ago I had an enormous amount of grief trying to understand why a RAID array in a new server kept getting corrupted and suddenly changing configuration. After a lot of despair and head scratching it turned out to be the SATA cables. This was a rack server from Asus with a SATA backplane. The cables, made by Foxconn, came pre-installed.
After I replaced the SATA cables with new ones, all problems were gone and the array is now rock solid.
Many SATA cables on the market are pieces of junk, either incapable of coping with the high frequencies involved in SATA 3Gb/s or 6Gb/s, or their connectors are made of bad-quality plastics unable to keep the necessary pressure on the contacts.
I had already found this problem with desktop machines; I simply wouldn't believe that such a class of hardware would exhibit it as well.
So, I would advise you to replace the SATA cables with good quality ones.
As an additional information, I quote from the Caviar Black range datasheet:
"Desktop / Consumer RAID Environments - WD Caviar Black Hard Drives are tested and recommended for use in consumer-type RAID applications (RAID-0 /RAID-1). - Business Critical RAID Environments – WD Caviar Black Hard Drives are not recommended for and are not warranted for use in RAID environments utilizing Enterprise HBAs and/or expanders and in multi-bay chassis, as they are not designed for, nor tested in, these specific types of RAID applications. For all Business Critical RAID applications, please consider WD’s Enterprise Hard Drives that are specifically designed with RAID-specific, time-limited error recovery (TLER), are tested extensively in 24x7 RAID applications, and include features like enhanced RAFF technology and thermal extended burn-in testing."
Miguel Medalha wrote:
A few months ago I had an enormous amount of grief trying to understand why a RAID array in a new server kept getting corrupted and suddenly changing configuration. After a lot of despair and head scratching it turned out to be the SATA cables. This was a rack server from Asus with a SATA backplane. The cables, made by Foxconn, came pre-installed.
After I replaced the SATA cables with new ones, all problems were gone and the array is now rock solid.
Thanks for this info, Miguel. <snip>
As an additional information, I quote from the Caviar Black range datasheet:
"Desktop / Consumer RAID Environments - WD Caviar Black Hard Drives are tested and recommended for use in consumer-type RAID applications (RAID-0 /RAID-1).
- Business Critical RAID Environments WD Caviar Black Hard Drives are
not recommended for and are not warranted for use in RAID environments utilizing Enterprise HBAs and/or expanders and in multi-bay chassis, as they are not designed for, nor tested in, these specific types of RAID applications. For all Business Critical RAID applications, please consider WDs Enterprise Hard Drives that are specifically designed with RAID-specific, time-limited error recovery (TLER), are tested extensively in 24x7 RAID applications, and include features like enhanced RAFF technology and thermal extended burn-in testing."
Wonderful... NOT. We've got a number of Caviar Green, so I looked up its datasheet... and it says the same.
That rebuild of my system at home? I think I'll look at commercial grade drives....
mark
What's funny is WD is just being idiotic. Seagate does NOT have that extended error checking. I have two Barracuda Green drives in an SBS 2k8 server on a SAS 6/iR and they work perfectly.
On 2/29/2012 3:05 PM, m.roth@5-cent.us wrote:
Miguel Medalha wrote:
A few months ago I had an enormous amount of grief trying to understand why a RAID array in a new server kept getting corrupted and suddenly changing configuration. After a lot of despair and head scratching it turned out to be the SATA cables. This was a rack server from Asus with a SATA backplane. The cables, made by Foxconn, came pre-installed.
After I replaced the SATA cables with new ones, all problems were gone and the array is now rock solid.
Thanks for this info, Miguel.
<snip>
> As an additional information, I quote from the Caviar Black range
> datasheet:
>
> "Desktop / Consumer RAID Environments - WD Caviar Black Hard Drives are
> tested and recommended for use in consumer-type RAID applications
> (RAID-0 /RAID-1).
> - Business Critical RAID Environments – WD Caviar Black Hard Drives are
> not recommended for and are not warranted for use in RAID environments
> utilizing Enterprise HBAs and/or expanders and in multi-bay chassis, as
> they are not designed for, nor tested in, these specific types of RAID
> applications. For all Business Critical RAID applications, please
> consider WD’s Enterprise Hard Drives that are specifically designed with
> RAID-specific, time-limited error recovery (TLER), are tested
> extensively in 24x7 RAID applications, and include features like
> enhanced RAFF technology and thermal extended burn-in testing."

Wonderful... NOT. We've got a number of Caviar Green, so I looked up its datasheet... and it says the same.
That rebuild of my system at home? I think I'll look at commercial grade drives....
mark