We add disks to an LSI raid array periodically to increase the amount of available space for business needs.
It is understood that this process starts with metal, and has many layers that must each adjust to make use of the additional space. Each of these layers claims it can do that 'online', without interruption or rebooting. But making it happen is not that easy.
When the HW RAID controller's grow completes, we echo 1 to /sys/bus/scsi/devices/#:#:#:#/rescan and the kernel notices and updates the size of the block device (sdc in this case). sdc1 is the only partition on the device, and should consume the entire device.
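For reference, the rescan step is roughly the following (using the 0:2:2:0 address from the output further down; substitute the correct host:bus:target:lun for your controller):

    # ask the SCSI layer to re-read the capacity of the grown device
    echo 1 > /sys/bus/scsi/devices/0:2:2:0/rescan
    # the kernel should now report the new size, in 512-byte sectors
    cat /sys/bus/scsi/devices/0:2:2:0/block:sdc/size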
sdc1 is a PV in a VG that holds production data and must not become unavailable at any time. After growing sdc as mentioned earlier, parted notices that the backup partition table at the end of the drive is no longer at the end of the device, fixes it, and updates its idea of the disk size to match the new size of sdc.
It all makes sense up to this point, but what happens next is where I need some advice. How do we grow sdc1, online? parted says it doesn't support 'resize' on the filesystem (LVM PV).
The usual answer to parted's lack of filesystem support, and its insistence on only resizing a partition when it can stick its nose into the filesystem and resize that too, is: parted sdc rm 1, then parted mkpart primary 0% 100% (thus making a new partition "around" the old one).
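For the record, that dance was roughly the following (a sketch; note that the new partition has to start on exactly the same sector as the old one, 34s in the print output below, so giving the start explicitly is safer than trusting 0%):

    # drop the partition table entry only; the data underneath is untouched
    parted /dev/sdc rm 1
    # recreate it at the same start sector, running to the end of the device
    # (parted may warn about alignment; the start must match, so answer Ignore)
    parted /dev/sdc mkpart primary 34s 100%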
That should work, but I can't get the kernel to 'notice' that sdc1 is now larger. hdparm -z barfs up an error that sdc is in use. I know that rebooting will likely fix it, but we cannot reboot. We also cannot keep making more partitions every time we add a disk, so that's not a solution either. We need to GROW the GPT partitions online, or use another partitioning type that supports >6TB. I've googled for hours and found no good solutions.
This same situation would affect VMs with virtual disks that grow over time to satisfy business needs, as well as servers mounting iSCSI/FC storage that grows over time. How would you grow this online?
Going without partitions at all and putting the PV directly on sdc is no good either. So we need partitions, and msdos tables don't support >2TB; the only other type in practical use that I know of is GPT, and these apparently can't expand online!
Here's where I'm at now, in case you're curious:

[root@host lib]# parted /dev/sdc unit s print free
Model: SMC SMC2108 (scsi)
Disk /dev/sdc: 8189439999s
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start  End          Size         File system  Name     Flags
 1      34s    8189439966s  8189439933s               primary

Information: Don't forget to update /etc/fstab, if necessary.

[root@host lib]# cat /sys/bus/scsi/devices/0:2:2:0/block:sdc/size
8189440000
[root@host lib]# cat /sys/bus/scsi/devices/0:2:2:0/block:sdc/sdc1/size
7019519933
# I bet the above 7 billion will be around 8 billion at the next reboot. (Each physical disk has about a billion sectors.)
Hi Billy,
> We add disks to an LSI raid array periodically to increase the amount of available space for business needs.
> sdc1 is a PV in a VG that holds production data and must not become unavailable at any time.
> How do we grow sdc1, online?
If you are using the Logical Volume Manager (LVM) on Linux, you should not have to grow the PV each time. Instead, carve Logical Units (LUNs) out of the RAID array and present them to the operating system as disk devices that can be initialized as physical volumes (PVs).
LVM can then be used to add (or remove) PVs to a Volume Group (VG) without having to reboot. Logical volumes (LVs) are carved out of the VG and presented to the operating system (Linux) as block devices on which filesystems (one filesystem per block device) can be created.
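As a rough sketch (the device, VG and LV names here are only examples), that workflow looks like:

    pvcreate /dev/sdd                   # initialize the newly presented LUN as a PV
    vgextend datavg /dev/sdd            # add the PV to the existing VG, online
    lvcreate -n applv -L 500G datavg    # carve an LV out of the VG
    mkfs -t ext4 /dev/datavg/applv      # one filesystem per block device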
While you can resize the H/W RAID "online", and add/remove PVs to a VG "online", you still need to unmount a filesystem before resizing both the LV and the corresponding filesystem that was created on the LV. Attempting to resize a filesystem that is mounted and actively being used is just asking for data corruption.
Both the LV and the filesystem can be resized "on the fly" without rebooting, but you still have to unmount the filesystem first before resizing either. Resizing the 'root' filesystem (or any filesystem on which the core operating system has files open) requires shutting down and booting into an alternate boot environment (for example, the "rescue" boot CD) -- presenting another good argument for separating operating system files from user/application files. This is why /var/log is often mounted on a separate filesystem.
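The resize sequence being described is roughly this (again a sketch with example names; resize2fs is the ext3/ext4 tool):

    umount /data
    lvextend -L +200G /dev/datavg/datalv   # grow the LV
    e2fsck -f /dev/datavg/datalv           # check the filesystem before resizing
    resize2fs /dev/datavg/datalv           # grow the filesystem to fill the LV
    mount /data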
At the filesystem level, remember that Linux allows you to mount filesystems at various mount points within the directory tree. Most systems do not have the entire directory tree contained on a single filesystem. The 'root' filesystem is typically just large enough to hold the basic operating system, and then the rest of the files (applications, user data, and application data) are stored on separate filesystems that are mounted on the directory tree at various points (for example /home, /data, /opt, /opt/dedicated/app1, etc.)
From the user (and applications') view, the files still appear to be stored on one gigantic single filesystem, even though it is actually mapped out to two or more filesystems.
By structuring/segmenting the system's directory tree this way, you gain the ability to unmount and resize portions of the tree without having to shut down and reboot. The 'fsck' pass also runs much more quickly when your filesystems are not in the terabyte range. Unless you are creating individual files that are gigabytes/terabytes in size, there is little benefit in having a massive filesystem (250 GB or larger). Remember, the larger the filesystem, the flatter your data becomes when (not if) the filesystem fails.
Cheers!
Simba Engineering
On 2/21/2014 4:50 PM, Phoenix, Merka wrote:
> Both the LV and the filesystem can be resized "on the fly" without rebooting, but you still have to unmount the filesystem first before resizing either.
This is not true for XFS; you can grow XFS online without unmounting it, with live activity. Just extend the LV, then run xfs_growfs /path/to/volume and it picks up the new size and adjusts the file system to suit, live and on the fly.
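Roughly like this (a sketch; the LV name and size are examples, and /data is wherever the XFS filesystem is mounted):

    lvextend -L +500G /dev/datavg/datalv   # grow the LV underneath the mounted filesystem
    xfs_growfs /data                       # XFS picks up the new size, live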
On Fri, Feb 21, 2014 at 6:50 PM, Phoenix, Merka merka.phoenix@hp.com wrote:
>> We add disks to an LSI raid array periodically to increase the amount of available space for business needs.
>> sdc1 is a PV in a VG that holds production data and must not become unavailable at any time.
>> How do we grow sdc1, online?
> If you are using the Logical Volume Manager (LVM) on Linux, you should not have to grow the PV each time. Instead, carve Logical Units (LUNs) out of the RAID array and present them to the operating system as disk devices that can be initialized as physical volumes (PVs).
My RAID controller doesn't permit adding LUNs to an existing array that has a single LUN; it requires the array to have originated as a multiple-LUN array. Even if it did permit this, I need the number of PVs to stay fairly consistent for logistical reasons. I cannot have a new one come into existence every time I grow the array.
> While you can resize the H/W RAID "online", and add/remove PVs to a VG "online", you still need to unmount a filesystem before resizing both the LV and the corresponding filesystem that was created on the LV. Attempting to resize a filesystem that is mounted and actively being used is just asking for data corruption.
Not true; most filesystems support online grow/expansion. The ones I'm using do, and it works fine, and is fairly quick.
I am aware of how LVM and filesystems work. I don't need help with those. I'm asking one thing: how to get the kernel to notice that a partition has grown.
----- Original Message -----
| On Fri, Feb 21, 2014 at 6:50 PM, Phoenix, Merka
| merka.phoenix@hp.com wrote:
<snip>
| I am aware of how LVM and filesystems work. I don't need help with
| those. I'm asking one thing: how to get the kernel to notice that a
| partition has grown.
Don't use partitions. Use whole disk PVs and avoid partitioning altogether. With LVM there is no need for partitions. When you grow the underlying PV and then rescan the bus to see the new sizes, you just start using the new space.
If you choose to use partitions, then you need to scan the disk to detect the new size, create a new partition (easiest) that uses the free space, configure it as LVM (type 8e), pvcreate the new partition, vgextend the VG onto the new partition, and you're off and running. Alternatively, you can make note of the current partition boundaries, delete the existing partition, recreate it on the exact same starting boundary, and make it the size of the total disk space.
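In command terms the new-partition route is roughly (a sketch; the partition number and VG name are examples):

    # after creating /dev/sdc2 in the newly added space (fdisk type 8e, or parted 'set 2 lvm on')
    pvcreate /dev/sdc2            # initialize the new partition as a PV
    vgextend datavg /dev/sdc2     # grow the VG onto it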
The choice is yours. I use whole disk PVs myself.
On Sat, Feb 22, 2014 at 12:24 AM, James A. Peltier jpeltier@sfu.ca wrote:
> The choice is yours. I use whole disk PVs myself.
Indeed I did originally use whole-disk PVs. But Anaconda doesn't support them, so during a recent rebuild we went to partitions. I'm prepared to blame anaconda for that to an extent.
But I also want to know a tangible reason why the kernel can't rescan partition sizes the way it can rescan block device sizes.
----- Original Message -----
<snip>
partprobe can rescan partitions, but it can't resize them. You may be able to use gparted or the parted text mode to resize partitions online. The statement that anaconda can't is not really true. Through kickstart you can do pretty much anything you can script. You could use a pre-script (%pre) or post-script (%post) in kickstart to set up the partition layout and whatnot.
If you have a clear separation of the OS disk from the data disks, then it's a simple post script to create the data disk as a full-disk PV.
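A minimal %post sketch along those lines (assuming the data disk is /dev/sdc; the VG name is just an example):

    %post
    # use the whole data disk as a PV, no partition table needed
    pvcreate /dev/sdc
    vgcreate datavg /dev/sdc
    %end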
On 22.02.2014 22:27, James A. Peltier wrote:
> partprobe can rescan partitions, but it can't resize them. You may be able to use gparted or the parted text mode to resize partitions online.
Sadly you can't really do this without a reboot. I'd love to be wrong, but I hit the same problem in the past and simply found no way of doing it. Even with CentOS cloud instances, this operation (resize partition) has to be done from the initramfs before the filesystems go "live".
Lucian
On Sun, Feb 23, 2014 at 8:33 AM, Nux! nux@li.nux.ro wrote:
> On 22.02.2014 22:27, James A. Peltier wrote:
>> partprobe can rescan partitions, but it can't resize them. You may be able to use gparted or the parted text mode to resize partitions online.
> Sadly you can't really do this without a reboot. I'd love to be wrong, but I hit the same problem in the past and simply found no way of doing it. Even with CentOS cloud instances, this operation (resize partition) has to be done from the initramfs before the filesystems go "live".
So my question is 'why can't partitions be grown live like disks can?' I'm tempted to call this a bug.
----- Original Message -----
<snip>
Try
blockdev --rereadpt /dev/sdX
of course substituting /dev/sdX for the correct device.
On 20 February 2014 21:50, Billy Crook bcrook@riskanalytics.com wrote:
> We add disks to an LSI raid array periodically to increase the amount of available space for business needs.
I *would* highly recommend ZFS for this kind of application. The ability to dynamically expand the zpool (the zpool is the ZFS "volume manager") is excellent for these kinds of applications. ZFS on Linux is still a bit.....hmm....young as a filesystem, so give it another year perhaps. It's being heavily developed by Lawrence Livermore National Lab at the moment for use with the Lustre parallel filesystem, and so NFS export of ZFS filesystems is still a bit flaky; however, the core is good and stable now.
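For illustration (a sketch; the pool and device names are examples), growing a pool looks like:

    zpool add tank raidz sdd sde sdf   # add another vdev to an existing pool
    zpool online -e tank sdc           # or expand onto a LUN that grew underneath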
With RHEL 7 and CentOS 7 we shall see btrfs as a native filesystem. This should make this kind of thing extremely easy.
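Presumably something along the lines of (a sketch; the mount point is an example):

    btrfs filesystem resize max /data   # grow a mounted btrfs onto newly added space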