I have been reading about software raid. I configured my first software raid system about a month ago.
I have four 500 GB drives configured in a RAID 5 array, for a total of 1.5 TB of usable space.
Currently I have the complete individual drives configured as software RAID members, and created /dev/md0 from those drives.
I then created a /file_storage partition on /dev/md0.
I created my /boot, /, and swap partitions on a non-RAID drive in my system.
Is this the proper way to configure software RAID?
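For reference, the layout described above is roughly what something like the following would produce (a sketch only; /dev/sdb through /dev/sde are placeholders for the four drives, and the filesystem type is just an example):

# mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
# mkfs.ext4 /dev/md0                  # ext3 on older CentOS releases
# mkdir -p /file_storage
# mount /dev/md0 /file_storage
# mdadm --detail /dev/md0             # confirm level, size, and member devices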
On Mon, Mar 4, 2013 at 10:53 PM, Chris Weisiger cweisiger@bellsouth.net wrote:
I have been reading about software raid. I configured my first software raid system about a month ago.
I have four 500 GB drives configured in a RAID 5 array, for a total of 1.5 TB of usable space.
Currently I have the complete individual drives configured as software RAID members, and created /dev/md0 from those drives.
I've read (and would agree) that using the entire drive can hide the fact that a softraid is there. If you set up a partition on the disk and mark the partition as type "fd" (Linux raid autodetect), then no matter what you know that disk is part of a softraid array.
What you configured works. Is it wrong? No.
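As a sketch of that partition-based approach (device names are placeholders; repeat the partitioning on each member disk):

# fdisk /dev/sdb                          # one partition spanning the disk; 't' -> fd, then 'w'
# parted /dev/sdb set 1 raid on           # alternative way to flag the partition as a RAID member
# mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
# mdadm --examine /dev/sdb1               # the member superblock is now obvious on inspection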
I then created a /file_storage partition on /dev/md0.
I created my /boot, /, and swap partitions on a non-RAID drive in my system.
Is this the proper way to configure software RAID?
I generally use LVM on my systems, so my layout has /boot carved out as its own partition and then another partition for the LVM PV. If you use standard partitions instead of LVM, then create individual softraid arrays for each partition. Remember: /boot can only be on a RAID1 software array.
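A rough sketch of that layout on a two-disk mirror (all device names and sizes below are only examples):

# mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1   # small partitions for /boot
# mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2   # the rest of each disk
# pvcreate /dev/md1
# vgcreate vg0 /dev/md1
# lvcreate -n root -L 20G vg0
# lvcreate -n swap -L 2G vg0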
Veering slightly from your original question... I recently set up softraid arrays by hand before invoking the Anaconda installer (on a 6.3 install). Recent mdadm packages that ship with CentOS support metadata 1.1 and 1.2 (... actually defaulting to 1.2 I believe), but GRUB 0.97 only supports metadata 1.0 and not the metadata version that mdadm defaulted to. On my CentOS 5 installs in the past I've specifically set --metadata=0.90 to avert any catastrophes like this.
Fun problem to troubleshoot ... I knew what was wrong once that system wouldn't boot, though. Kind of odd that the installer didn't pick up on the metadata version and flag it or modify it. In the end I rescued the system by backing up /boot and recreating the /boot softraid with metadata 1.0 :)
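Roughly what that rescue looked like, assuming a two-disk RAID1 /boot on sda1/sdb1 (a sketch, not a transcript):

# cp -a /boot /root/boot-backup                # save the contents first
# umount /boot && mdadm --stop /dev/md0
# mdadm --create /dev/md0 --level=1 --raid-devices=2 --metadata=1.0 /dev/sda1 /dev/sdb1
# mkfs.ext3 /dev/md0 && mount /dev/md0 /boot && cp -a /root/boot-backup/. /boot/
# mdadm --examine /dev/sda1 | grep Version     # confirm the superblock is now 1.0
#                                              # (the boot loader also needs to be reinstalled on the members)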
On 03/05/2013 05:58 AM, SilverTip257 wrote:
Veering slightly from your original question... I recently set up softraid arrays by hand before invoking the Anaconda installer (on a 6.3 install). Recent mdadm packages that ship with CentOS support metadata 1.1 and 1.2 (... actually defaulting to 1.2 I believe), but GRUB 0.97 only supports metadata 1.0 and not the metadata version that mdadm defaulted to. On my CentOS 5 installs in the past I've specifically set --metadata=0.90 to avert any catastrophes like this.
As far as I know, GRUB 0.97 only supports metadata 0.90, as does LILO. Anaconda will create arrays with 0.90 metadata for this reason.
The kernel wiki disagrees with me: https://raid.wiki.kernel.org/index.php/RAID_superblock_formats
Debian's documentation indicates that only grub 1.98+20100720-1 or later will boot from a RAID volume with a newer metadata format: http://www.debian.org/releases/stable/i386/release-notes/ch-information.en.h...
On Wed, Mar 6, 2013 at 2:12 PM, Gordon Messmer yinyang@eburg.com wrote:
On 03/05/2013 05:58 AM, SilverTip257 wrote:
Veering slightly from your original question... I recently set up softraid arrays by hand before invoking the Anaconda installer (on a 6.3 install). Recent mdadm packages that ship with CentOS support metadata 1.1 and 1.2 (... actually defaulting to 1.2 I believe), but GRUB 0.97 only supports metadata 1.0 and not the metadata version that mdadm defaulted to. On my CentOS 5 installs in the past I've specifically set --metadata=0.90 to avert any catastrophes like this.
As far as I know, GRUB 0.97 only supports metadata 0.90, as does LILO. Anaconda will create arrays with 0.90 metadata for this reason.
I can tell you from my recent experience that whatever concoction of GRUB 0.97 ships with CentOS 6.3 does support booting off of metadata 1.0.
I often encounter metadata 0.90 ... on many [all?] of the aging CentOS 5 installs I see.
The kernel wiki disagrees with me: https://raid.wiki.kernel.org/index.php/RAID_superblock_formats
Debian's documentation indicates that only grub 1.98+20100720-1 or later will boot from a RAID volume with a newer metadata format:
Grub 1.99 or thereabouts is pretty awesome. Grub2 (as Debian packages it) supports booting off LVM which is slick. Not overly useful, but convenient if you would rather /boot be part of the rootfs ... especially with kernels getting larger.
A /boot partition that could be 100MB with CentOS 5 now needs to be around 512MB with CentOS 6 (found this out the hard way with a development system). Disk space is cheap ... but I still don't want to waste space! :)
http://www.debian.org/releases/stable/i386/release-notes/ch-information.en.h...
On 03/04/2013 10:53 PM, Chris Weisiger wrote:
I have been reading about software raid. I configured my first software raid system about a month ago.
I have four 500 GB drives configured in a RAID 5 array, for a total of 1.5 TB of usable space.
Currently I have the complete individual drives configured as software RAID members, and created /dev/md0 from those drives.
I then created a /file_storage partition on /dev/md0.
I created my /boot, /, and swap partitions on a non-RAID drive in my system.
Is this the proper way to configure software RAID?
Hey Chris,
What you have done is a totally acceptable way of building a raid array.
Software raid on Linux is amazingly flexible. It is able to build arrays on individual matching drives as you have done, drives of different physical sizes, a combination of physical drives and partitions on other drives, or a combination of partitions on different drives. It can even build a raid array on several partitions on one physical drive, not that you would ever want to do that.
In other words, if you can dream it up, software raid can probably build it. The question is why are you using raid at all? If you are trying to increase access speed or data security then raid makes sense. The appropriate configuration depends on your available resources and the nature of your intent.
On 3/5/2013 4:27 PM, Mark LaPierre wrote:
The question is why are you using raid at all?
Indeed. The primary justification for the "R" in RAID, Redundant, is high availability. Having the OS on a non-RAID volume completely defeats that purpose. RAID is most definitely NOT a substitute for backups.
Chris,
I've used software raid quite a bit, and have developed a few rules of thumb, hope these help!
- Use one raid array, generally md0, for /boot, and one for LVM, md1. This allows the individual drives to be mounted and read on another server for recovery if you're using RAID1.
This is generally how the drives in a RAID1 array would look. This is from a CentOS 5 server, so /boot is only 100MB; on CentOS 6 it would be 500MB.
# fdisk -l /dev/sda

Disk /dev/sda: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   fd  Linux raid autodetect
/dev/sda2              14       30401   244091610   fd  Linux raid autodetect

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
      244091520 blocks [2/2] [UU]
- Avoid software RAID5 or 6; only use software RAID for RAID1 or 10. Software RAID5 performance can be abysmal, because of the parity calculations and the fact that each write to the array requires that all drives be read and written. Older hardware raid controllers can be pretty cheap on eBay; I'm using an old 3Ware on my home CentOS server. Avoid hostraid adapters, as these are just software raid in the controller rather than the OS. Even with hardware raid, performance won't be nearly as good as RAID10, so I generally only use RAID5 or 6 for partitions that hold backups.
- If you are using drives over 1TB, consider partitioning the drives into smaller chunks, say around 500GB, and creating multiple arrays. That way, if you get a read error on one sector that causes one of the raid partitions to be marked as bad, only that partition needs to be rebuilt rather than the whole drive.
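A hedged sketch of that chunking approach on a mirrored pair of large disks (the 50%/100% boundaries are only illustrative):

# parted /dev/sdb "mklabel gpt"
# parted /dev/sdb -a optimal "mkpart primary 0% 50%"
# parted /dev/sdb -a optimal "mkpart primary 50% 100%"
#   ...repeat the same partitioning on /dev/sdc...
# mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
# mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sdb2 /dev/sdc2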
Mark Snyder
Highland Solutions
200 South Michigan Ave., Suite 1000
Chicago, IL 60604
http://www.highlandsolutions.com
----- Original Message -----
From: "Chris Weisiger" cweisiger@bellsouth.net
To: centos@centos.org
Sent: Monday, March 4, 2013 9:53:48 PM
Subject: [CentOS] Software RAID complete drives or individual partitions
I have been reading about software raid. I configured my first software raid system about a month ago.
I have four 500 GB drives configured in a RAID 5 array, for a total of 1.5 TB of usable space.
Currently I have the complete individual drives configured as software RAID members, and created /dev/md0 from those drives.
I then created a /file_storage partition on /dev/md0.
I created my /boot, /, and swap partitions on a non-RAID drive in my system.
Is this the proper way to configure software RAID?
On Wed, Mar 6, 2013 at 10:00 AM, Mark Snyder msnyder@highlandsolutions.com wrote:
I've used software raid quite a bit, and have developed a few rules of thumb, hope these help!
Do you (or anyone...) have any rules of thumb regarding drives over 2TB, or using raid sets that were created on CentOS 5 under CentOS 6? I once booted a CentOS 5 box with the initial CentOS 6 'live' CD and it permanently broke all of the auto-detected mirrors, so I have been a little reluctant to try it again.
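No firm rule of thumb from me either, but before mixing releases it seems worth at least recording what the existing arrays look like, e.g.:

# cat /proc/mdstat                                  # which arrays exist and their state
# mdadm --examine /dev/sda1 | grep -i version       # superblock format of each member
# mdadm --detail --scan                             # ARRAY lines to keep as a reference copy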
On 06.03.2013 at 17:25, Les Mikesell lesmikesell@gmail.com wrote:
On Wed, Mar 6, 2013 at 10:00 AM, Mark Snyder msnyder@highlandsolutions.com wrote:
I've used software raid quite a bit, and have developed a few rules of thumb, hope these help!
Do you (or anyone...) have any rules of thumb regarding drives over 2TB, or using raid sets that were created on CentOS 5 under CentOS 6?
AFAIK you need to use a GPT for partitions with >2TB ...
-- LF
On 3/6/2013 12:22 PM, Leon Fauster wrote:
AFAIK you need to use a GPT for partitions with >2TB ...
you need GPT to partition devices larger than 2TB, regardless of the partition size.
Me, I use parted....

# parted /dev/sdc "mklabel gpt"
# parted /dev/sdc -a none "mkpart primary 512s -1s"
to make a single full disk partition that starts at sector 512, which is 256K bytes, which is usually a decent boundary for SSDs, large raids, and other such devices.
On Wed, Mar 6, 2013 at 2:42 PM, John R Pierce pierce@hogranch.com wrote:
AFAIK you need to use a GPT for partitions with >2TB ...
you need GPT to partition devices larger than 2TB, regardless of the partition size.
me, I use parted....
# parted /dev/sdc "mklabel gpt"
# parted /dev/sdc -a none "mkpart primary 512s -1s"
to make a single full disk partition that starts at sector 512, which is 256K bytes, which is usually a decent boundary for SSDs, large raids, and other such devices.
Large drives often have 4k sectors - don't you want a 1M offset to make sure the partition start is aligned? Gparted seems to do that by default. Also, is it possible to make the raid auto-assemble at boot like smaller ones do? I had to put an ARRAY entry into /etc/mdadm.conf with the devices but I think someone advised doing "set <partition number> raid on" in parted.
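For what it's worth, both of those steps can be done non-interactively; a sketch (device and array names are placeholders):

# parted /dev/sdc set 1 raid on              # mark the partition as a RAID member
# mdadm --detail --scan >> /etc/mdadm.conf   # record ARRAY lines so the array is assembled by name
# dracut -f                                  # on CentOS 6, rebuild the initramfs so it picks up mdadm.conf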
On 3/6/2013 1:12 PM, Les Mikesell wrote:
Large drives often have 4k sectors - don't you want a 1M offset to make sure the partition start is aligned?
256K is a multiple of 4K. Sure, you could use 2048s instead if you wanted 1M boundaries; it's all good.
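If you do want the 1M boundary, a minimal variation of the earlier command (same placeholder device):

# parted /dev/sdc "mklabel gpt"
# parted /dev/sdc -a none "mkpart primary 2048s -1s"    # 2048 * 512-byte sectors = 1MiB offset
# parted /dev/sdc align-check optimal 1                 # newer parted can verify the alignment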
On 03/06/2013 08:00 AM, Mark Snyder wrote:
- Avoid software RAID5 or 6; only use software RAID for RAID1 or 10. Software RAID5 performance can be abysmal, because of the parity calculations and the fact that each write to the array requires that all drives be read and written.
My understanding of Linux mdadm RAID5 is that a write will read the block being written and the parity block. The calculations can be done with only those blocks, and the two are written. That's one extra read per write plus parity calculations.
I'm quite certain that I've seen some hardware RAID arrays that will read the entire stripe to do a write.
RAID5 will always write more slowly than RAID1 or RAID10, but that can sometimes be acceptable if capacity is more important than performance.
Older hardware raid controllers can be pretty cheap on eBay, I'm using an old 3Ware on my home CentOS server.
If there's anything to avoid, it'd be old 3ware hardware. Those cards are often less reliable than the disks they're attached to, and that's saying something.
Avoid hostraid adapters, these are just software raid in the controller rather than the OS.
All hardware raid is "just software raid in the controller rather than the OS". The advantages of hardware RAID are offloading parity calculations to dedicated hardware so that the CPU doesn't need to do it, and a battery backed write cache.
The write cache is critical to safely writing a RAID array in the event of a power loss, and can greatly improve performance provided that you don't write enough data to fill the cache.
The host CPU is very often faster with parity than the dedicated hardware, which is why Alan Cox has been quoted as saying that the best RAID controllers in the world are made by Intel and AMD. However, if you think you need the couple of percent of CPU cycles that would have been used by software RAID, you might prefer the hardware solution.
If you are using drives over 1TB, consider partitioning the drives into smaller chunks, say around 500GB, and creating multiple arrays. That way, if you get a read error on one sector that causes one of the raid partitions to be marked as bad, only that partition needs to be rebuilt rather than the whole drive.
If you have a disk on which a bad sector is found, it's time to replace it no matter how your partitions are set up. Drives reserve a set of sectors for re-mapping sectors that are detected as bad. If your OS sees a bad sector, it's because that reserve has been exhausted. More sectors will continue to go bad, and you will lose data. Always replace a drive as soon as your OS sees bad sectors, or before based on SMART data.
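A quick way to keep an eye on that with smartmontools (attribute names vary slightly by drive):

# smartctl -H /dev/sda                                           # overall health self-assessment
# smartctl -A /dev/sda | egrep -i 'realloc|pending|uncorrect'    # remapped and pending sector counts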
Partitioning into many smaller chunks is probably a waste of time. Like most of the other participants in this thread, I create software RAID sets of one or two partitions per disk and use LVM on top of that.
Hopefully BTRFS will simplify this even further in the near future. :)
On Wed, Mar 6, 2013 at 1:27 PM, Gordon Messmer yinyang@eburg.com wrote:
Hopefully BTRFS will simplify this even further in the near future. :)
I wouldn't hold my breath. Someone on the backuppc list reported that it had a tiny limit on hardlinks which would make it seem to me like they don't quite understand how filesystems are supposed to work.
On 03/06/2013 11:49 AM, Les Mikesell wrote:
I wouldn't hold my breath. Someone on the backuppc list reported that it had a tiny limit on hardlinks which would make it seem to me like they don't quite understand how filesystems are supposed to work.
The limitation was fixed in 3.7: https://bugzilla.kernel.org/show_bug.cgi?id=15762
I think the btrfs developers understand quite well how filesystems are supposed to work. As best I understand it, the limitation was a result of back-references, which are an important feature to keep inodes from ending up as orphans. If you're going to have an on-line fsck, that matters.
On Wed, Mar 6, 2013 at 2:16 PM, Gordon Messmer yinyang@eburg.com wrote:
On 03/06/2013 11:49 AM, Les Mikesell wrote:
I wouldn't hold my breath. Someone on the backuppc list reported that it had a tiny limit on hardlinks which would make it seem to me like they don't quite understand how filesystems are supposed to work.
The limitation was fixed in 3.7: https://bugzilla.kernel.org/show_bug.cgi?id=15762
I think the btrfs developers understand quite well how filesystems are supposed to work. As best I understand it, the limitation was a result of back-references, which are an important feature to keep inodes from ending up as orphans. If you're going to have an on-line fsck, that matters.
OK, but in my opinion it is worse if that was a design decision that is just now being changed. Bugs I can understand, but not choosing to design a new filesystem with unrealistic limitations.
On 03/06/2013 12:20 PM, Les Mikesell wrote:
OK, but in my opinion it is worse if that was a design decision that is just now being changed. Bugs I can understand, but not choosing to design a new filesystem with unrealistic limitations.
It's not being changed, per se. A reasonable means of increasing the number of back-references has been implemented. (again, AFAICT)
On 03/04/2013 07:53 PM, Chris Weisiger wrote:
Currently I have the complete individual drives configured as software RAID members, and created /dev/md0 from those drives.
If you configure an entire drive as a raid device, you'd have a device name like /dev/mdp0, which you'd then partition. I think what you've done is created only one partition per disk, and made those partitions into a RAID set. That's not wrong, but it's not the same thing.
Non-partitionable RAID sets such as the one you've created are the most common configuration for software RAID. Hardware RAID volumes are almost always the partitionable type.
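For comparison, a partitionable array is what you get by asking mdadm for an mdp-style device node; a sketch only, with illustrative names, and as noted above it is the less common layout:

# mdadm --create /dev/md_d0 --auto=mdp --level=5 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
# parted /dev/md_d0 "mklabel gpt"
# parted /dev/md_d0 "mkpart primary 1MiB 100%"          # shows up as /dev/md_d0p1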
Is this the proper way to configure software RAID?
"Proper" is relative to its fitness for a specific purpose. As you haven't indicated a specific purpose, "proper" doesn't have any real meaning.
The array you've created will work, and it will protect your data from loss due to the failure of a single disk. You need to make sure your "root" mail is delivered to someone who will read it in a timely manner, or else that protection is not useful. The array's performance will be lower than that of a single-drive or RAID10 configuration, but that may be acceptable for bulk storage. The array will not protect you from filesystem corruption or from accidental deletion. Subject to those and other limitations, your array seems more or less proper.
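One hedged sketch of wiring up that "root" mail notification (the address below is a placeholder):

# grep MAILADDR /etc/mdadm.conf              # mdadm's monitor mode mails this address on failures
MAILADDR admin@example.com
# echo "root: admin@example.com" >> /etc/aliases && newaliases
# mdadm --monitor --scan --test --oneshot    # sends a test alert for each array to verify delivery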