On a CentOS 6 64bit system, I added a couple of prototype SAS SSDs on an HP P411 raid controller (I believe this is a rebranded LSI megaraid with HP firmware) and am trying to format them for best random IO performance with something like postgresql.
so, I used the raid command tool to build a raid0 with 2 SAS SSDs
# hpacucli ctrl slot=1 logicaldrive 3 show detail

Smart Array P410 in Slot 1

   array C

      Logical Drive: 3
         Size: 186.3 GB
         Fault Tolerance: RAID 0
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 47869
         Strip Size: 256 KB
         Status: OK
         Array Accelerator: Enabled
         Unique Identifier: 600508B1001C2EDB6026F9ADF9F88A09
         Disk Name: /dev/sdc
         Mount Points: /ssd 186.3 GB
         Logical Drive Label: AF36B716PACCRCN810E1R9J646A
# hpacucli ctrl slot=1 show config

Smart Array P410 in Slot 1 (sn: PACCRCN810E1R9J)
....
   array C (Solid State SAS, Unused Space: 0 MB)

      logicaldrive 3 (186.3 GB, RAID 0, OK)

      physicaldrive 1I:1:23 (port 1I:box 1:bay 23, Solid State SAS, 100 GB, OK)
      physicaldrive 1I:1:24 (port 1I:box 1:bay 24, Solid State SAS, 100 GB, OK)
# hpacucli ctrl slot=1 show ssdinfo detail

Smart Array P410 in Slot 1
   Total Solid State Drives with Wearout Status: 0
   Total Smart Array Solid State Drives: 2
   Total Solid State SAS Drives: 2
   Total Solid State Drives: 2

   array C

      physicaldrive 1I:1:23
         Port: 1I
         Box: 1
         Bay: 23
         Status: OK
         Drive Type: Data Drive
         Interface Type: Solid State SAS
         Size: 100 GB
         Firmware Revision: 1234
         Serial Number: 999999999999999999
         Model: XYZZY M2011
         Current Temperature (C): 30
         Maximum Temperature (C): 37
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 2
         PHY Transfer Rate: 6.0GBPS, Unknown

      physicaldrive 1I:1:24
         Port: 1I
         Box: 1
         Bay: 24
         Status: OK
         Drive Type: Data Drive
         Interface Type: Solid State SAS
         Size: 100 GB
         Firmware Revision: 1234
         Serial Number: 999999999999999999
         Model: XYZZY M2011
         Current Temperature (C): 29
         Maximum Temperature (C): 36
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 2
         PHY Transfer Rate: 6.0GBPS, Unknown
# tail /var/log/messages
Oct 22 22:56:24 svfis-dl180b kernel: sd 0:0:0:3: Attached scsi generic sg3 type 0
Oct 22 22:56:24 svfis-dl180b kernel: sd 0:0:0:3: [sdc] 390611040 512-byte logical blocks: (199 GB/186 GiB)
Oct 22 22:56:24 svfis-dl180b kernel: sd 0:0:0:3: [sdc] 8192-byte physical blocks
Oct 22 22:56:24 svfis-dl180b kernel: sd 0:0:0:3: [sdc] Write Protect is off
Oct 22 22:56:24 svfis-dl180b kernel: sd 0:0:0:3: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Oct 22 22:56:24 svfis-dl180b kernel: sdc: unknown partition table
Oct 22 22:56:24 svfis-dl180b kernel: sd 0:0:0:3: [sdc] Attached SCSI disk
Oct 22 22:56:36 svfis-dl180b cmaeventd[2540]: Logical drive 3 of Array Controller in slot 1, has changed from status Unconfigured to OK
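The kernel log above reports 512-byte logical and 8192-byte physical blocks for sdc. As a rough sketch, the same values can usually be read back from sysfs. The helper name show_block_sizes and its optional second argument (an alternate sysfs root, handy for dry runs) are made up for illustration, and the sysfs files are assumed to be present on this kernel.

```shell
#!/bin/sh
# usage: show_block_sizes DEVICE [SYSFS_ROOT]
# Prints the logical and physical block sizes the kernel reports for a disk.
# SYSFS_ROOT defaults to /sys/block; the override is a hypothetical convenience.
show_block_sizes() {
    dev=$1
    root=${2:-/sys/block}
    queue_dir="$root/$dev/queue"
    for f in logical_block_size physical_block_size; do
        if [ -r "$queue_dir/$f" ]; then
            echo "$f=$(cat "$queue_dir/$f")"
        else
            # Device missing or kernel too old to export the attribute
            echo "$f=unknown"
        fi
    done
}

show_block_sizes sdc
```

On the system in this thread one would expect it to report 512 and 8192, matching the log lines, but that has not been checked here.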
# mkfs.ext4 /dev/sdc
mke2fs 1.41.12 (17-May-2010)
/dev/sdc is entire device, not just one partition!
Proceed anyway? (y,n) y
Filesystem label=
OS type: Linux
Block size=8192 (log=3)
Fragment size=8192 (log=3)
Stride=1 blocks, Stripe width=0 blocks
12210528 inodes, 24413190 blocks
1220659 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4311218176
373 block groups
65528 blocks per group, 65528 fragments per group
32736 inodes per group
Superblock backups stored on blocks:
        65528, 196584, 327640, 458696, 589752, 1638200, 1769256, 3210872,
        5307768, 8191000, 15923304, 22476104

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
# mount -t ext4 /dev/sdc /ssd
mount: wrong fs type, bad option, bad superblock on /dev/sdc,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so

# tail /var/log/messages
...
Oct 22 23:54:36 svfis-dl180b kernel: EXT4-fs (sdc): bad block size 8192
ok, so let's try 4K blocks?
# mkfs.ext4 -b 4096 /dev/sdc
mke2fs 1.41.12 (17-May-2010)
/dev/sdc is entire device, not just one partition!
Proceed anyway? (y,n) y
mkfs.ext4: Invalid argument while setting blocksize; too small for device
hmmm. can't do that either?
can I configure this 64bit system for large pages or something so it will support 8K blocks?
Maybe try to partition it to see what happens.
On 10/23/2011 12:07 AM, John R Pierce wrote:
On 10/23/11 12:23 AM, Ken godee wrote:
Maybe try to partition it to see what happens.
with parted at least, I'm stuck with a vicious circle that won't let me align the data right?
# parted /dev/sdc
GNU Parted 2.1
Using /dev/sdc
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel msdos
Warning: The existing disk label on /dev/sdc will be destroyed and all data on this disk will be lost. Do you want to continue?
Yes/No? y
(parted) mkpart primary ext4 512k -1s
Warning: The resulting partition is not properly aligned for best performance.
Ignore/Cancel? i
(parted) quit
# mkfs.ext4 /dev/sdc1
mke2fs 1.41.12 (17-May-2010)
/dev/sdc1 alignment is offset by 4096 bytes.
This may result in very poor performance, (re)-partitioning suggested.
Filesystem label=
OS type: Linux
Block size=8192 (log=3)
Fragment size=8192 (log=3)
Stride=1 blocks, Stripe width=0 blocks
12210528 inodes, 24413127 blocks
1220656 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4311218176
373 block groups
65528 blocks per group, 65528 fragments per group
32736 inodes per group
Superblock backups stored on blocks:
        65528, 196584, 327640, 458696, 589752, 1638200, 1769256, 3210872,
        5307768, 8191000, 15923304, 22476104

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 22 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
# mount -t ext4 /dev/sdc1 /ssd
mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so

# tail /var/log/messages
....
Oct 23 00:27:43 svfis-dl180b kernel: EXT4-fs (sdc1): bad block size 8192
GRRR ok. um.
# parted /dev/sdc
GNU Parted 2.1
Using /dev/sdc
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) unit b
(parted) print
Model: HP LOGICAL VOLUME (scsi)
Disk /dev/sdc: 199992852480B
Sector size (logical/physical): 512B/8192B
Partition Table: msdos

Number  Start    End            Size           Type     File system  Flags
 1      512000B  199992852479B  199992340480B  primary  ext4

(parted) rm 1
(parted) mkpart primary ext4 1024s -1s
Warning: The resulting partition is not properly aligned for best performance.
Ignore/Cancel? y
parted: invalid token: y
Ignore/Cancel? ignore
(parted) print
Model: HP LOGICAL VOLUME (scsi)
Disk /dev/sdc: 199992852480B
Sector size (logical/physical): 512B/8192B
Partition Table: msdos

Number  Start    End            Size           Type     File system  Flags
 1      524288B  199992852479B  199992328192B  primary

(parted) quit
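The two partition starts from the parted sessions can be checked by hand. This is only a sketch: the function name alignment_offset is made up, and it assumes the "256 KB" strip size reported by hpacucli means 262144 bytes. It takes the start offset in bytes and prints the remainder against a boundary, so 0 means aligned.

```shell
#!/bin/sh
# usage: alignment_offset START_BYTES BOUNDARY_BYTES
# Prints how far past the previous boundary the start offset sits (0 = aligned).
alignment_offset() {
    start_bytes=$1
    boundary_bytes=$2
    echo $(( start_bytes % boundary_bytes ))
}

PHYS_BLOCK=8192               # physical block size reported for sdc
STRIP_SIZE=$(( 256 * 1024 ))  # assumption: "Strip Size: 256 KB" = 262144 bytes

# First attempt: parted's "512k" came out as a start of 512000 bytes
echo "512000 vs 8K block:    $(alignment_offset 512000 $PHYS_BLOCK)"   # 4096
# Second attempt: sector 1024 * 512 bytes = 524288 bytes
echo "524288 vs 8K block:    $(alignment_offset 524288 $PHYS_BLOCK)"   # 0
echo "524288 vs 256K strip:  $(alignment_offset 524288 $STRIP_SIZE)"   # 0
```

The 4096-byte remainder for the first start matches the "alignment is offset by 4096 bytes" warning from mkfs.ext4, and that warning does not appear in the mkfs run after the partition was recreated at sector 1024. Why parted still warned about the second start is not explained by this arithmetic.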
# mkfs.ext4 /dev/sdc1
mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=8192 (log=3)
Fragment size=8192 (log=3)
Stride=1 blocks, Stripe width=0 blocks
12210528 inodes, 24413126 blocks
1220656 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4311218176
373 block groups
65528 blocks per group, 65528 fragments per group
32736 inodes per group
Superblock backups stored on blocks:
        65528, 196584, 327640, 458696, 589752, 1638200, 1769256, 3210872,
        5307768, 8191000, 15923304, 22476104

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
*and, yup, 8K blocks still won't mount*
so...
# mkfs.ext4 -b 4096 -F /dev/sdc1
mke2fs 1.41.12 (17-May-2010)
Warning: specified blocksize 4096 is less than device physical sectorsize 8192, forced to continue
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
^^^^^^
...
fixes it. have to use -F to format this thing. now I'm seeing IO more like what I'd expect to see.
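The mke2fs output in this thread shows "Stride=1 blocks, Stripe width=0 blocks", so the RAID geometry was apparently not picked up automatically. A possible follow-up, as a sketch only, is to derive the ext4 stride and stripe-width hints from the 256 KB strip size and the two-drive RAID 0. The helper name ext4_stripe_opts is made up, 256 KB is assumed to mean 262144 bytes, and whether these hints change anything on SSDs behind this controller's cache was not tested in the thread. The script only prints the command, it does not run it.

```shell
#!/bin/sh
# usage: ext4_stripe_opts STRIP_BYTES FS_BLOCK_BYTES DATA_DISKS
# stride       = filesystem blocks per RAID strip
# stripe-width = stride * number of data-bearing disks
ext4_stripe_opts() {
    stride=$(( $1 / $2 ))
    stripe_width=$(( stride * $3 ))
    echo "stride=${stride},stripe-width=${stripe_width}"
}

# Assumed geometry: 256 KB strip, 4096-byte blocks, 2 drives in RAID 0
opts=$(ext4_stripe_opts $(( 256 * 1024 )) 4096 2)

# Print the command for review instead of formatting anything
echo "mkfs.ext4 -F -b 4096 -E $opts /dev/sdc1"
```

With those inputs the hints work out to stride=64 and stripe-width=128.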
On 10/23/2011 09:48 AM, John R Pierce wrote:
On 10/23/11 12:23 AM, Ken godee wrote:
Maybe try to partition it to see what happens.
with parted at least, I'm stuck with a vicious circle that won't let me align the data right?
Didn't parted have issues with alignment? Here are two links with info about alignment of SSDs which I found helpful in the past:
http://www.ocztechnologyforum.com/forum/showthread.php?54379-Linux-Tips-twea...
http://www.linux-mag.com/id/8397/
Hope this helps.
Regards, Patrick
On 10/23/11 4:00 AM, Patrick Lists wrote:
Didn't parted have issues with alignment? Here are two links with info about alignment of SSDs which I found helpful in the past:
parted handles alignment as well or better than fdisk, which that blog suggested using.
anyways, I have it formatted and mounted and aligned now.
this SSD raid is telling the OS it has 8K physical sectors (512 byte logical). mkfs for ext4 or xfs will create a file system with 8K blocks, but the kernel won't let me mount it because that's larger than the system's 4K page size, so I have to force mkfs to build a 4K block file system.
my database (postgres) uses 8K blocks. the storage has 8k physical blocks. it seems to me that having the file system block match the database and physical blocks would be a Very Good Thing...
... so, what's the status of large page support in linux and specifically CentOS 6?
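The mount failures earlier in the thread come down to one comparison: the filesystem block size against the kernel page size. A small sketch of that check follows; the function name check_block_size is made up, and the rule it encodes (block size must not exceed page size) is taken from the behaviour described in this thread for this kernel rather than from any wider guarantee.

```shell
#!/bin/sh
# usage: check_block_size FS_BLOCK_BYTES PAGE_BYTES
# Prints "ok" if the block size fits within a page, "too-large" otherwise.
check_block_size() {
    if [ "$1" -le "$2" ]; then
        echo "ok"
    else
        echo "too-large"
    fi
}

page_size=$(getconf PAGESIZE)
for bs in 4096 8192; do
    echo "block size $bs vs page size $page_size: $(check_block_size "$bs" "$page_size")"
done
```

Running this before mkfs would flag the 8192-byte default that mke2fs picked from the device's reported physical block size.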
From: John R Pierce pierce@hogranch.com
On a CentOS 6 64bit system, I added a couple prototype SAS SSDs on a HP P411 raid controller ... Disk Name: /dev/sdc ...
Just wondering how come the array is detected as /dev/sd* instead of the classical /dev/cciss/c0d*... Is the P411 a fake raid?
JD
On 10/25/11 2:16 AM, John Doe wrote:
Just wondering how come the array is detected as /dev/sd* instead of the classical /dev/cciss/c0d*... Is the P411 a fake raid?
no, it's a seriously fast pci-e SAS2 hardware raid. the physical devices don't show at all. it has 1GB of writeback cache that's backed by flash with a supercap, instead of the traditional battery that dies in 3 years.
I didn't have to install any special drivers to use it, C6 just saw it as-is. /dev/sda is a raid1 of disks 0,1, /dev/sdb is a raid10 of disks 2-22, and sdc is a raid0 of SSD 23,24 (disk 25 is a hot spare for sda,sdb)
it's configurable with the hpacucli command tool I got from HP's site.
# lspci -vnn -s 6:0.0
06:00.0 RAID bus controller [0104]: Hewlett-Packard Company Smart Array G6 controllers [103c:323a] (rev 01)
        Subsystem: Hewlett-Packard Company Smart Array P410 [103c:3243]
        Flags: bus master, fast devsel, latency 0, IRQ 24
        Memory at fbc00000 (64-bit, non-prefetchable) [size=2M]
        Memory at fbbff000 (64-bit, non-prefetchable) [size=4K]
        I/O ports at d800 [size=256]
        Expansion ROM at fbb00000 [disabled] [size=512K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, MSI 00
        Capabilities: [ac] MSI-X: Enable+ Count=16 Masked-
        Capabilities: [100] Advanced Error Reporting
        Kernel driver in use: hpsa
        Kernel modules: hpsa
# hpacucli ctrl slot=1 show detail

Smart Array P410 in Slot 1
   Bus Interface: PCI
   Slot: 1
   Serial Number: PACCRCN810E1R9J
   Cache Serial Number: PBCDF0CRH0Q13H
   RAID 6 (ADG) Status: Disabled
   Controller Status: OK
   Hardware Revision: Rev C
   Firmware Version: 5.12
   Rebuild Priority: Medium
   Expand Priority: Medium
   Surface Scan Delay: 15 secs
   Surface Scan Mode: Idle
   Queue Depth: Automatic
   Monitor and Performance Delay: 60 min
   Elevator Sort: Enabled
   Degraded Performance Optimization: Disabled
   Inconsistency Repair Policy: Disabled
   Wait for Cache Room: Disabled
   Surface Analysis Inconsistency Notification: Disabled
   Post Prompt Timeout: 0 secs
   Cache Board Present: True
   Cache Status: OK
   Accelerator Ratio: 25% Read / 75% Write
   Drive Write Cache: Disabled
   Total Cache Size: 1024 MB
   No-Battery Write Cache: Disabled
   Cache Backup Power Source: Capacitors
   Battery/Capacitor Count: 1
   Battery/Capacitor Status: OK
   SATA NCQ Supported: True
dunno if that helps?
since I got it sorted out with 4k blocks and XFS, I'm seeing about 12000 write IOPS via pgbench to that sdb raid10, and 16000 wr/s to the sdc SSD raid0. these are both pretty close to flat out for the disks. sustained writes from iozone hit 1.2GB/sec on sdb and 800MB/s on sdc, also pretty much the hardware bandwidth of the disks.
From: John R Pierce pierce@hogranch.com
On 10/25/11 2:16 AM, John Doe wrote:
Just wondering how come the array is detected as /dev/sd* instead of the classical /dev/cciss/c0d*... Is the P411 a fake raid?
no, it's a seriously fast pci-e SAS2 hardware raid. the physical devices don't show at all. it has 1GB of writeback cache that's backed by flash with a supercap, instead of the traditional battery that dies in 3 years.
Indeed, nice specs. How much does it cost...?
I didn't have to install any special drivers to use it, C6 just saw it as-is, /dev/sda
Guess this new ctrl does not use the cciss module anymore.
Thx, JD
On 10/25/2011 12:20 PM, John Doe wrote:
Guess this new ctrl does not use the cciss module anymore.
If it's like a P410i like what I have it uses the hpsa driver.
Mogens
On 10/25/11 4:48 AM, Mogens Kjaer wrote:
On 10/25/2011 12:20 PM, John Doe wrote:
Guess this new ctrl does not use the cciss module anymore.
If it's like a P410i like what I have it uses the hpsa driver.
yes, says as much on that lspci -v output I pasted earlier tonight in this thread.
and it's exactly like the p410i, just not integrated on the mainboard, instead it's a pci-e card. at least, I think it is. these servers came preassembled and I don't recall looking inside when I plonked them on the racks and fired them up.
On Tuesday, October 25, 2011 01:48:13 PM Mogens Kjaer wrote:
On 10/25/2011 12:20 PM, John Doe wrote:
Guess this new ctrl does not use the cciss module anymore.
If it's like a P410i like what I have it uses the hpsa driver.
HP is moving from the old cciss driver to the new hpsa driver. On C5 all controllers use cciss but on C6 the newer generation (like 411, 812, ..) use hpsa.
In the future all controllers may be moved to hpsa (afaict, but who knows).
The old driver used the odd /dev/cciss/cXdYpZ naming and was a non-scsi block device driver. hpsa is (as almost everything else) a scsi driver (finally).
/Peter
From: Peter Kjellström cap@nsc.liu.se
On Tuesday, October 25, 2011 01:48:13 PM Mogens Kjaer wrote:
On 10/25/2011 12:20 PM, John Doe wrote:
Guess this new ctrl does not use the cciss module anymore.
If it's like a P410i like what I have it uses the hpsa driver.
HP is moving from the old cciss driver to the new hpsa driver. On C5 all controllers use cciss but on C6 the newer generation (like 411, 812, ..) use hpsa. In the future all controllers may be moved to hpsa (afaict, but who knows). The old driver used the odd /dev/cciss/cXdYpZ naming and was a non-scsi block device driver. hpsa is (as almost everything else) a scsi driver (finally).
Good to know. I will have to adapt my kickstart/scripts to handle both...
Thx to all, JD
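For scripts that need to cope with both naming schemes mentioned above, one possible approach is a small helper that builds the partition name from the disk name. This is a sketch, not something posted in the thread: the function name partition_name is made up, and it relies on the usual convention that disk names ending in a digit (such as the cciss cXdY form) take a "p" before the partition number while /dev/sdX names do not.

```shell
#!/bin/sh
# usage: partition_name DISK PARTITION_NUMBER
# /dev/sda + 1          -> /dev/sda1
# /dev/cciss/c0d0 + 1   -> /dev/cciss/c0d0p1
partition_name() {
    disk=$1
    num=$2
    case "$disk" in
        *[0-9]) echo "${disk}p${num}" ;;   # name ends in a digit: insert "p"
        *)      echo "${disk}${num}" ;;    # plain sdX style: append the number
    esac
}

partition_name /dev/sda 1
partition_name /dev/cciss/c0d0 1
```

A kickstart %pre script could call this with whichever disk it detected, so the rest of the script does not need to know which driver is in use.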