Hey everyone,
My company is beginning to look at using SSDs in our CentOS-based servers. Do C5 and C6 support TRIM and other "required" functions for SSDs to operate?
Thanks,
Andrew Reis
Microsoft Certified Technology Specialist
CompTIA Network+
Networking/Systems Analyst
Webmaster
DBMS Inc.
On 15.07.2013 14:45, Andrew Reis wrote:
Hey everyone,
My company is beginning to look at using SSDs in our CentOS-based servers. Do C5 and C6 support TRIM and other "required" functions for SSDs to operate?
Hi,
As far as I know both CentOS 5 and 6 support TRIM (the "discard" option in fstab), but only for EXT4 filesystems (and probably XFS).
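For example, something like this in /etc/fstab (a sketch only; the device and mount point are made up, and check first that both kernel and drive actually support TRIM):

    # ext4 with online TRIM ("discard") enabled
    /dev/sda2   /data   ext4   defaults,discard   1 2

Then "mount -o remount /data" picks it up without a reboot.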
From: Nux! nux@li.nux.ro
As far as I know both CentOS 5 and 6 support TRIM (the "discard" option in fstab), but only for EXT4 filesystems (and probably XFS).
I do not think CentOS 5 supports TRIM (unless back-ported from 2.6.33)... http://kernelnewbies.org/Linux_2_6_33#head-b9b8a40358aaef60a61fcf12e90559007...
JD
On 07/15/2013 07:33 AM, John Doe wrote:
I do not think CentOS 5 supports TRIM (unless back-ported from 2.6.33)... http://kernelnewbies.org/Linux_2_6_33#head-b9b8a40358aaef60a61fcf12e90559007... JD
AFAIK, EL5 doesn't support TRIM. We've been using SSDs for DB servers for almost 2 years and love 'em. Even conservative performance estimates were off the charts: PostgreSQL queries showed a 95% reduction in latency, even after extended, continuous use. Haven't seen TRIM make much difference in actual performance.
Main thing is DO NOT EVEN THINK OF USING CONSUMER GRADE SSDs. SSDs are a bit like a salt shaker: they have only a certain number of shakes, and when a drive runs out of writes, well, the salt shaker is empty. Spend the money and get a decent Enterprise SSD. We've been conservatively using the (spendy) Intel drives with good results.
On 7/18/2013 6:17 PM, Lists wrote:
Main thing is DO NOT EVEN THINK OF USING CONSUMER GRADE SSDs. SSDs are a bit like a salt shaker: they have only a certain number of shakes, and when a drive runs out of writes, well, the salt shaker is empty. Spend the money and get a decent Enterprise SSD. We've been conservatively using the (spendy) Intel drives with good results.
and not all Intel drives have the key features of a supercap-backed cache and the reliable write-acknowledgement behavior you want from a server.
that 95% (20:1) only applies to an SSD compared with a single desktop-grade (7200rpm) disk.
do note, you can easily build proper SAS RAIDs that are just about as fast as a single SSD when used for write-intensive database OLTP operations, whether measured in raw disk IOPS or transactions/second, and they are many times bigger. SSDs have the biggest advantage over a single spinning disk in random read performance.
one funny thing I've noted about various SSDs: when they are new, they benchmark much faster than after they've been in production use. expect a several-fold slowdown in write performance once you've written approximately the size of the disk worth of blocks. NEVER let them get above about 75% full.
On 07/18/2013 06:55 PM, John R Pierce wrote:
and not all Intel drives have the key features of a supercap-backed cache and the reliable write-acknowledgement behavior you want from a server.
Regardless of your storage, your system should be powered by a monitored UPS. Verify that it works, and the drive's cache shouldn't be a major concern.
that 95% (20:1) only applies to an SSD compared with a single desktop-grade (7200rpm) disk.
do note, you can easily build proper SAS RAIDs that are just about as fast as a single SSD when used for write-intensive database OLTP
Yes, but an array can be built with SSDs as well. Its performance will have the same advantage over the SAS array that an SSD has over a single drive.
one funny thing I've noted about various SSDs: when they are new, they benchmark much faster than after they've been in production use. expect a several-fold slowdown in write performance once you've written approximately the size of the disk worth of blocks. NEVER let them get above about 75% full.
Again, yes, but that's what TRIM is for. The slowdown you noticed is the result of using a filesystem or array that didn't support TRIM.
My understanding is that some of the current generation of drives no longer need TRIM. The wear-leveling and block remapping features already present were combined with a percentage of reserved blocks to automatically reset blocks as they're re-written. I couldn't name those drives, though.
On 7/19/2013 12:54 AM, Gordon Messmer wrote:
Regardless of your storage, your system should be powered by a monitored UPS. Verify that it works, and the drive's cache shouldn't be a major concern.
done right, there should be two UPS's, each hooked up to alternate redundant power supplies in each chassis.
even so, things happen. a PDU gets tripped and shuts off a whole rack unexpectedly.
On 2013-07-19 3:54 AM, Gordon Messmer wrote:
Regardless of your storage, your system should be powered by a monitored UPS. Verify that it works, and the drive's cache shouldn't be a major concern.
It should also be a 'true sine wave' output when running on battery. Many UPS units output a 'stepped approximation' (typically pulse width modulation), which some computer power supplies may not like.
p.s. not really CentOS-related /per se/, but I have set centos@centos.org's entry in the address book to receive Plain Text... still, this looks like HTML, so far. What other setting might I need to check in Thunderbird 17?
On 7/19/2013 5:51 AM, Darr247 wrote:
On 2013-07-19 3:54 AM, Gordon Messmer wrote:
Regardless of your storage, your system should be powered by a monitored UPS. Verify that it works, and the drive's cache shouldn't be a major concern.
It should also be a 'true sine wave' output when running on battery. Many UPS units output a 'stepped approximation' (typically pulse width modulation), which some computer power supplies may not like.
virtually all PC and server power supplies nowadays are 'switchers', and couldn't care less what the input waveform looks like. they full-wave rectify the input voltage to DC, then chop it at 200kHz or so and run it through a toroidal transformer to generate the various DC voltages.
On 2013-07-19 1:01 PM, John R Pierce wrote:
On 7/19/2013 5:51 AM, Darr247 wrote:
On 2013-07-19 3:54 AM, Gordon Messmer wrote:
Regardless of your storage, your system should be powered by a monitored UPS. Verify that it works, and the drive's cache shouldn't be a major concern.
It should also be a 'true sine wave' output when running on battery. Many UPS units output a 'stepped approximation' (typically pulse width modulation), which some computer power supplies may not like.
virtually all PC and server power supplies nowadays are 'switchers', and couldn't care less what the input waveform looks like. they full-wave rectify the input voltage to DC, then chop it at 200kHz or so and run it through a toroidal transformer to generate the various DC voltages.
Heh... go ahead and use stepped approximation UPS's then. What do I know; I'm just a dumb electrician.
On Jul 19, 2013 10:04 PM, "Darr247" darr247@gmail.com wrote:
On 2013-07-19 1:01 PM, John R Pierce wrote:
On 7/19/2013 5:51 AM, Darr247 wrote:
On 2013-07-19 3:54 AM, Gordon Messmer wrote:
Regardless of your storage, your system should be powered by a monitored UPS. Verify that it works, and the drive's cache shouldn't be a major concern.
It should also be a 'true sine wave' output when running on battery. Many UPS units output a 'stepped approximation' (typically pulse width modulation), which some computer power supplies may not like.
virtually all PC and server power supplies nowadays are 'switchers', and couldn't care less what the input waveform looks like. they full-wave rectify the input voltage to DC, then chop it at 200kHz or so and run it through a toroidal transformer to generate the various DC voltages.
Heh... go ahead and use stepped approximation UPS's then. What do I know; I'm just a dumb electrician.
I just trust Florida Flicker n Flash - never had outages.... more than once a day!
Sorry could not resist.......
[Somewhat off-topic, but I see this misinformation so often I'll reply for the archives.....]
On 07/19/2013 01:01 PM, John R Pierce wrote:
On 7/19/2013 5:51 AM, Darr247 wrote:
It should also be a 'true sine wave' output when running on battery. Many UPS units output a 'stepped approximation' (typically pulse width modulation), which some computer power supplies may not like.
virtually all PC and server power supplies nowadays are 'switchers', and couldn't care less what the input waveform looks like. they full-wave rectify the input voltage to DC, then chop it at 200kHz or so and run it through a toroidal transformer to generate the various DC voltages.
Uh, while it's true that switching supplies are the norm these days, it's also true that a square wave or even a modified sine input waveform, even at the same RMS voltage, will have a lower peak voltage that might be out of range for the input rectifier(s) in the power supply.
Do the math: for a full square wave, V(rms) = V(peak), so a 120Vrms square wave has a peak voltage of 120V. In contrast, a sine wave with a peak voltage of 120V has a Vrms of only 84.8 volts. The input rectifiers, which work on peak voltage, see a peak voltage of 169.73 volts when powered with a 120Vrms sine wave. Even power supplies that are rated 100-240Vrms aren't rated for peak voltages less than 141.42 volts. So a 120Vrms square wave has a peak voltage well below the tolerance of a typical 100-240Vrms rated power supply. This is the reason for modified square wave UPS's, which get closer to the 169.73V peak, but are loaded with odd-order harmonics that can play havoc with power-factor correction circuits in less-expensive but newer supplies.
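(Quick way to verify the arithmetic yourself, if you have bc handy:

    echo 'scale=5; 120 * sqrt(2)' | bc    # peak of a 120Vrms sine, ~169.7V
    echo 'scale=5; 120 / sqrt(2)' | bc    # Vrms of a 120V-peak sine, ~84.85V
    echo 'scale=5; 100 * sqrt(2)' | bc    # lowest peak a 100-240Vrms supply is designed around, ~141.42V
)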
So the waveform does matter, and a sine-wave or multi-step modified sine wave UPS is more likely to work, even with less well-designed supplies (I would say cheaper, but I've seen expensive supplies balk at anything but a true sine wave).
My standard example of this is the 90A 5V supply used in pairs in 3Com's CoreBuilder/CellPlex 7000 ATM switches. These were rather expensive and beefy supplies, but trying to power them with a modified sine UPS simply did not work, while a true sine UPS worked fine (and the glacial ATM PNNI reconvergence times when one or more of the five core switches went down for a short glitch wreaked havoc on our network!).
I also have here in production an older high-end industrial PC with a redundant power supply made by Astec that would consistently drop out with an alarm under modified sine wave UPS power. It works fine with the APC SmartUPS PWM-derived sine wave. A Cisco 7609 I have here, with non-Astec supplies, also does not work well at all with anything but a true sinewave UPS.
True-sine UPS's are desirable and even necessary for maximum compatibility, even with modern power supplies.
On 07/19/2013 03:17 AM, Lists wrote:
Main thing is DO NOT EVEN THINK OF USING CONSUMER GRADE SSDs. SSDs are a bit like a salt shaker: they have only a certain number of shakes, and when a drive runs out of writes, well, the salt shaker is empty. Spend the money and get a decent Enterprise SSD. We've been conservatively using the (spendy) Intel drives with good results.
Hm. I'm not sure I'd go with that. As I understand it, I'd just buy something like a Samsung SSD 840 Pro (to avoid TLC) and over-provision about 60% of the capacity. With the 512GiB variant, I'd end up with 200GiB net. This way I have no issues with TRIM or GC (there are always enough empty cells), and wear leveling is also a non-issue (at least right now...).
It's a lot cheaper than the "Enterprise Grade SSDs", which are still basically MLC SSDs and are just doing internally what we'd be doing here. And for the price of those golden SSDs I get about 7 or 8 of the "Consumer SSDs", so I just swap them out whenever I feel like it. Or SMART tells me to do so.
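(As for the mechanics: the simplest way to over-provision is to just leave the extra space unpartitioned. Some drives also let you hide capacity outright with a Host Protected Area via hdparm -N; a sketch only, assuming a fresh or secure-erased drive at /dev/sdb and a sector count you've worked out for your target size:

    hdparm -N /dev/sdb              # show current/native max sectors
    hdparm -N p400000000 /dev/sdb   # 'p' makes it permanent; the sector count here is illustrative

Don't do that on a drive with data on it.)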
I have been following this and have some notes. Can you folks comment on them? I am considering migrating some systems to SSD but have not had time to set up a test system yet to verify it.
I found lots of references to TRIM, but it is not included with CentOS 5. However, I found that TRIM support is in newer hdparm, which could be built from source, but AFAIK is not included in the CentOS 5 RPMs. That way, one could TRIM via a cron job?
Could you folks please comment on the notes below, which I found on multiple sites online? This is what I was planning on doing for my systems. Notes include:
- use a file system supporting TRIM (e.g., EXT4 or BTRFS)
- update hdparm to get TRIM support on CentOS 5
- align on block erase boundaries for the drive, or use 1M boundaries
- use native, non-LVM partitions
- under-provision (only use 60-75% of the drive, leave unallocated space)
- set noatime in /etc/fstab (or relatime w/ newer kernels to keep atime data sane)
- move some tmp files to tmpfs (e.g., periodic status files and things that change often)
- move /tmp to RAM (per some suggestions)
- use secure erase before re-use of a drive
- make sure the drive has the latest firmware
- add "elevator=noop" to the kernel boot options or use deadline; can be changed on a drive-by-drive basis (e.g., if HD + SSD in a system)
- reduce swappiness of the kernel via /etc/sysctl.conf:
vm.swappiness=1
vm.vfs_cache_pressure=50
-- or swap to HD, not SSD
- BIOS tuning to set drives to “write back” and using hdparm:
hdparm -W1 /dev/sda
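For the scheduler and sysctl bits, this is roughly what I had in mind (device name is just an example, and all untested on my side so far):

    # switch just the SSD to the noop (or deadline) elevator at runtime
    echo noop > /sys/block/sda/queue/scheduler
    cat /sys/block/sda/queue/scheduler    # confirm; the active one is shown in brackets

    # apply the VM settings without a reboot
    sysctl -w vm.swappiness=1
    sysctl -w vm.vfs_cache_pressure=50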
Any comments?
--
Wade Hampton
On Fri, Jul 19, 2013 at 10:10 AM, Alexander Arlt centos@track5.de wrote:
On 07/19/2013 03:17 AM, Lists wrote:
Main thing is DO NOT EVEN THINK OF USING CONSUMER GRADE SSDs. SSDs are a bit like a salt shaker: they have only a certain number of shakes, and when a drive runs out of writes, well, the salt shaker is empty. Spend the money and get a decent Enterprise SSD. We've been conservatively using the (spendy) Intel drives with good results.
Hm. I'm not sure I'd go with that. As I understand it, I'd just buy something like a Samsung SSD 840 Pro (to avoid TLC) and over-provision about 60% of the capacity. With the 512GiB variant, I'd end up with 200GiB net. This way I have no issues with TRIM or GC (there are always enough empty cells), and wear leveling is also a non-issue (at least right now...).
It's a lot cheaper than the "Enterprise Grade SSDs", which are still basically MLC SSDs and are just doing internally what we'd be doing here. And for the price of those golden SSDs I get about 7 or 8 of the "Consumer SSDs", so I just swap them out whenever I feel like it. Or SMART tells me to do so.
On 7/19/2013 8:48 AM, Wade Hampton wrote:
I found lots of references to TRIM, but it is not included with CentOS 5. However, I found that TRIM support is in newer hdparm, which could be built from source, but AFAIK is not included in the CentOS 5 RPMs. That way, one could TRIM via a cron job?
trim is done at the file system level, in the kernel. essentially, it's an extra command to the disk telling it this block's contents 'don't matter' anymore, so the drive doesn't need to actually keep them.
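you can at least verify a given drive advertises it before worrying about the OS side (device name is an example, obviously):

    hdparm -I /dev/sda | grep -i trim

look for 'Data Set Management TRIM supported' in the output.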
On 7/19/2013 7:10 AM, Alexander Arlt wrote:
Hm. I'm not sure I'd go with that. As I understand it, I'd just buy something like a Samsung SSD 840 Pro (to avoid TLC) and over-provision about 60% of the capacity. With the 512GiB variant, I'd end up with 200GiB net. This way I have no issues with TRIM or GC (there are always enough empty cells), and wear leveling is also a non-issue (at least right now...).
those drives do NOT have 'supercaps' so they will lose any recently written data on power failures. This WILL result in corrupted file systems, much the same as using a RAID controller with write-back cache that doesn't have an internal RAID battery.
From what I have read, TRIM can also be done on demand for older systems or file systems that are not TRIM-aware. For CentOS 5.x, a modified hdparm could be used to send the TRIM command to the drive. Anyone have experience with this?
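I also ran across mentions of an fstrim utility in newer util-linux that batch-trims the free space of a mounted filesystem; I believe that needs CentOS 6 with a new enough util-linux, not 5. If so, something like this from cron was my thinking (paths are just examples, untested):

    #!/bin/sh
    # e.g. /etc/cron.weekly/fstrim -- trim free space on TRIM-capable mounts
    fstrim -v /
    fstrim -v /data

-- Wade Hampton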
On Fri, Jul 19, 2013 at 1:05 PM, John R Pierce pierce@hogranch.com wrote:
On 7/19/2013 8:48 AM, Wade Hampton wrote:
I found lots of references to TRIM, but it is not included with CentOS 5. However, I found that TRIM support is in newer hdparm, which could be built from source, but AFAIK is not included in the CentOS 5 RPMs. That way, one could TRIM via a cron job?
trim is done at the file system level, in the kernel. essentially, it's an extra command to the disk telling it this block's contents 'don't matter' anymore, so the drive doesn't need to actually keep them.
On 7/19/2013 7:10 AM, Alexander Arlt wrote:
Hm. I'm not sure I'd go with that. As I understand it, I'd just buy something like a Samsung SSD 840 Pro (to avoid TLC) and over-provision about 60% of the capacity. With the 512GiB variant, I'd end up with 200GiB net. This way I have no issues with TRIM or GC (there are always enough empty cells), and wear leveling is also a non-issue (at least right now...).
those drives do NOT have 'supercaps' so they will lose any recently written data on power failures. This WILL result in corrupted file systems, much the same as using a RAID controller with write-back cache that doesn't have an internal RAID battery.
-- john r pierce 37N 122W somewhere on the middle of the left coast
On 07/19/2013 08:48 AM, Wade Hampton wrote:
I found lots of references to TRIM, but it is not included with CentOS 5. However, I found that TRIM support is in newer hdparm, which could be built from source, but AFAIK is not included in the CentOS 5 RPMs. That way, one could TRIM via a cron job?
NO!
From the man page:

    --trim-sectors
           For Solid State Drives (SSDs).  EXCEPTIONALLY DANGEROUS.  DO NOT USE THIS FLAG!!
That command can be used to trim sectors if you know which sector to start at and how many to TRIM. The only thing it's likely to be useful for is deleting all of the data on a drive.
- use file system supporting TRIM (e.g., EXT4 or BTRFS).
Yes, on release 6 or newer.
- update hdparm to get TRIM support on CentOS 5
No.
- align on block erase boundaries for drive, or use 1M boundaries
- use native, non LVM partitions
LVM is fine. https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/...
- under provision (only use 60-75% of drive, leave unallocated space)
That only applies to some drives, probably not current generation hardware.
- set noatime in /etc/fstab (or relatime w/ newer to keep atime data sane)
Don't bother. The current default is relatime.
- move some tmp files to tmpfs (e.g., periodic status files and things that change often)
- move /tmp to RAM (per some suggestions)
Same thing. Most SSDs should have write capacity far in excess of a spinning disk, so the decision to do this shouldn't be driven by the use of SSD.
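If you want to try it anyway, it's a one-line fstab entry (the size cap is an example; tune it to your RAM):

    tmpfs   /tmp   tmpfs   defaults,size=2g   0 0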
- use secure erase before re-use of drive
- make sure drive has the latest firmware
Not always. Look at the changelog for your drive's firmware if you're concerned and decide whether you need to update it based on whether any of the named fixes affect your system. For instance, one of my co-workers was using a Crucial brand drive in his laptop, and it frequently wasn't seen by the system on a cold boot. This caused hibernate to always fail. Firmware upgrades made the problem worse, as I recall.
- add “elevator=noop” to the kernel boot options or use deadline, can change on a drive-by-drive basis (e.g., if HD + SSD in a system)
- reduce swappiness of kernel via /etc/sysctl.conf:
vm.swappiness=1
vm.vfs_cache_pressure=50
-- or swap to HD, not SSD
None of those should be driven by SSD use. Evaluate their performance effects on your specific workload and decide whether they help. I wouldn't use them in most cases.
- BIOS tuning to set drives to “write back” and using hdparm:
hdparm -W1 /dev/sda
That's not write-back, that's write-cache. It's probably enabled by default. When it's on, the drives will be faster and less safe (this is why John keeps advising you to look for a drive with a capacitor-backed write cache). When it's off, the drive will be slower and more safe (and you don't need a capacitor backed write cache).
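You can check and toggle it per drive (device name is an example):

    hdparm -W /dev/sda     # query the current write-cache state
    hdparm -W0 /dev/sda    # off: slower, safer on power loss
    hdparm -W1 /dev/sda    # on: faster, wants protected power or a capacitor-backed drive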
On 7/19/2013 11:07 AM, Gordon Messmer wrote:
- under provision (only use 60-75% of drive, leave unallocated space)
That only applies to some drives, probably not current generation hardware.
it applies to all SSDs. they NEED to do write block remapping; if they don't have free space, it's much, much less efficient...
On 07/19/2013 11:21 AM, John R Pierce wrote:
On 7/19/2013 11:07 AM, Gordon Messmer wrote:
- under provision (only use 60-75% of drive, leave unallocated space)
That only applies to some drives, probably not current generation hardware.
it applies to all SSDs. they NEED to do write block remapping; if they don't have free space, it's much, much less efficient...
Well, maybe.
The important factor is how much the manufacturer has over-provisioned the storage. Performance targeted drives are going to have a large chunk of storage hidden from the OS in order to support block remapping functions. Drives that are sold at a lower cost are often going to provide less reserved storage for that purpose.
So, my point is that if you're buying good drives, you probably don't need to leave unpartitioned space because there's already a big chunk of space that's not even visible to the OS.
Here are a couple of articles on the topic:
http://www.edn.com/design/systems-design/4404566/Understanding-SSD-over-prov... http://www.anandtech.com/show/6489/playing-with-op
Anand's tests indicate that there's not really a difference between cells reserved by the manufacturer and cells in unpartitioned space on the drive. If your manufacturer left less space reserved, you can probably boost performance by reserving space yourself by leaving it unpartitioned.
There are diminishing returns, so if the manufacturer did reserve sufficient space, you won't get much performance benefit from leaving additional space unallocated.
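If you do decide to reserve space yourself, it's just a matter of not allocating it when you partition. A sketch (device and split are illustrative):

    parted -s /dev/sdb mklabel gpt
    parted -s /dev/sdb mkpart primary ext4 1MiB 80%    # last 20% left unpartitioned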
Thanks for the feedback.
Sounds like all this needs to be merged into a wiki?
Couple of take-aways:
- options will depend on the drive
  -- for cheap drives, be more conservative with options, including turning write-cache off
  -- provisioning depends on how much the manufacturer reserves
- better options are available for CentOS 6
- kernel scheduler, swap, and /tmp changes might help for some use cases
  -- test and determine if they will help (e.g., if your system processes data and creates a lot of files in /tmp for processing, putting /tmp in RAM might help)
1) Determine your use case
2) Determine the type of drive you need and any items specific to the drive (reserved space, TRIM, big caps)
3) Use newer Linux systems (CentOS 6, later Ubuntu, RHEL, Fedora) if you can -- and use EXT4 with TRIM enabled (if the drive supports it)
4) Test
5) Deploy
Cheers, -- Wade Hampton
On Fri, Jul 19, 2013 at 4:07 PM, Gordon Messmer gordon.messmer@gmail.com wrote:
On 07/19/2013 11:21 AM, John R Pierce wrote:
On 7/19/2013 11:07 AM, Gordon Messmer wrote:
- under provision (only use 60-75% of drive, leave unallocated space)
That only applies to some drives, probably not current generation hardware.
it applies to all SSDs. they NEED to do write block remapping; if they don't have free space, it's much, much less efficient...
Well, maybe.
The important factor is how much the manufacturer has over-provisioned the storage. Performance targeted drives are going to have a large chunk of storage hidden from the OS in order to support block remapping functions. Drives that are sold at a lower cost are often going to provide less reserved storage for that purpose.
So, my point is that if you're buying good drives, you probably don't need to leave unpartitioned space because there's already a big chunk of space that's not even visible to the OS.
Here are a couple of articles on the topic:
http://www.edn.com/design/systems-design/4404566/Understanding-SSD-over-prov... http://www.anandtech.com/show/6489/playing-with-op
Anand's tests indicate that there's not really a difference between cells reserved by the manufacturer and cells in unpartitioned space on the drive. If your manufacturer left less space reserved, you can probably boost performance by reserving space yourself by leaving it unpartitioned.
There are diminishing returns, so if the manufacturer did reserve sufficient space, you won't get much performance benefit from leaving additional space unallocated.
On 7/19/2013 2:04 PM, Wade Hampton wrote:
-- cheap drives, be more conservative with options including turning write-cache off
you can't turn off the write cache on SSDs... if they did let you do that, they would grind to a halt, as each n-sector write operation would require read-modify-writing 1MB-or-so blocks of flash.
On Friday 19 July 2013, Wade Hampton wadehamptoniv@gmail.com wrote:
- set noatime in /etc/fstab (or relatime w/ newer to keep atime data sane)
Also set nodiratime.
Yves
On 21.07.2013 04:59, Yves Bellefeuille wrote:
On Friday 19 July 2013, Wade Hampton wadehamptoniv@gmail.com wrote:
- set noatime in /etc/fstab (or relatime w/ newer to keep atime data sane)
Also set nodiratime.
if you specify noatime it includes nodiratime already.
Yves
Alexander
From: Andrew Reis andy@dbmsinc.com
My company is beginning to look at using SSDs in our CentOS-based servers. Do C5 and C6 support TRIM and other "required" functions for SSDs to operate?
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/... Also, some new SSDs seem to have (less efficient?) autonomous "garbage collection" that does not depend on TRIM.
JD