On 02/28/2014 08:15 AM, Götz Reinicke - IT Koordinator wrote:
... Big in that context is up to a couple of 100 TB, maybe.
... From my POV I could e.g. use hardware RAID boxes, and use LVM and filesystem growth options to extend the final share, but what if one of the boxes crashes totally? The whole filesystem would be gone.
... So how do you handle big filesystems/storages/shares?
We handle it with EMC Clariion fibre channel units and LVM. Always keep your PVs relatively small, use RAID6 on the array, and mirror it, either at the SAN level with MirrorView or at the OS level, using LUNs as mdraid components for the PVs. Today that would become a set of VNX systems with SAS on the back end and iSCSI on the front end, but a SAN is a SAN. Surplus FC HBAs are cheap; the licenses won't be, nor will the array. But how valuable is that data? Or, more to the point: if you were to lose all 100 TB, what would it cost you to recreate it?
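To make the mirror-at-the-OS-level idea concrete, here's a minimal sketch of the pattern. Device names (/dev/sdb, /dev/sdc), the volume group name, and the sizes are all hypothetical; you'd substitute whatever LUNs your two arrays actually present:

```shell
# Two LUNs, one from each array, appear as /dev/sdb and /dev/sdc.
# Mirror them with mdraid so losing a whole array loses no data.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc

# The mirror becomes one (relatively small) LVM physical volume.
pvcreate /dev/md0
vgcreate bigvg /dev/md0        # or: vgextend bigvg /dev/md0 on an existing VG

# Carve out the share and put a filesystem on it.
lvcreate -L 10T -n share bigvg
mkfs.xfs /dev/bigvg/share

# Later growth: build another mirrored pair the same way, then
#   vgextend bigvg /dev/md1
#   lvextend -r -L +10T /dev/bigvg/share   # -r grows the filesystem too
```

Keeping each PV a modest, mirrored chunk is what limits the blast radius: one dead box degrades a mirror instead of taking the whole filesystem with it.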
With this much data, rolling your own should be the last resort, and only if you can't afford something properly engineered for high availability, like basically anything from NetApp, Nimble, or EMC (among others; those are the first three off the top of my head). The value-add with these vendors is the long track record of reliability and the software management tools that make life so much easier when a drive or other component inevitably fails.
An enterprise-grade SAN or NAS from a serious vendor is going to cost serious money, but you do get what you pay for, again primarily on the software side. Our four Clariions (two CX3-10c's, one CX3-80, and a CX4-480) simply don't go down, and in my experience upgrades are easy and reliable. The two CX3-10c's have been online continually since mid-2007, and while they are way out of warranty, past the normal service life even, they just run and run and run. (I even used the redundancy features in one of them to good effect while slowly and carefully moving the array from one room to another: long extension power cords and long fiber jumpers worked to my advantage, and a stable rack on wheels made it possible. The array stayed up, and no servers lost connectivity to the SAN during the move. Not that I would recommend that for normal operations, but this wasn't a normal operation.) The storage processor sends alerts when drives fault, and a drive fault is an easy hotswap, with the DAE and the drive clearly identified at the front of the array. Everything (drives, power supplies, storage processor modules, LCCs) except a whole DAE or storage processor enclosure is hotswap, and I haven't yet had a DAE fault that required pulling the whole DAE out of service.
If you do roll your own, do not use consumer-class drives. One reason NetApp and the rest charge so much for drives is the extra testing, and sometimes the custom firmware, that goes into them. In a nutshell: you do NOT want the drive doing its own error recovery; that's the array storage processor's job!
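For anyone rolling their own anyway: you can check, and on some drives cap, the drive's internal error-recovery time with smartctl's SCT ERC command. Enterprise/NAS drives usually ship with it preset; many consumer drives refuse the command outright or forget the setting on a power cycle, which is exactly why they don't belong behind a RAID controller. The device name below is just an example:

```shell
# Query the drive's current SCT Error Recovery Control timers.
smartctl -l scterc /dev/sda

# Cap both read and write recovery at 7.0 seconds (units are 100 ms),
# so the drive gives up and reports the error instead of stalling the
# array; the RAID layer then rebuilds the sector from redundancy.
smartctl -l scterc,70,70 /dev/sda
```

A drive that spends minutes silently retrying a bad sector can get kicked out of an md array as "failed" even though redundancy could have papered over the error in seconds.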
Those are my opinions and experience. YMMV.