Hi All.
I have a CentOS server:
CentOS 5.6 x86_64 2.6.18-238.12.1.el5.centos.plus e4fsprogs-1.41.12-2.el5.x86_64
which has an 11TB ext4 filesystem. I have problems running fsck on it and would like to change the filesystem, because I do not like the possibility of a long fsck run on a production machine. I have also hit problems with fsck itself (not enough RAM, problems with the scratch_files option), so if the filesystem ever needs intervention I will be in a difficult situation.
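For anyone hitting the same wall: the scratch_files option mentioned above lives in /etc/e2fsck.conf, and lets e2fsck spill some of its in-memory tables to disk instead of holding them in RAM. A minimal sketch (written to a temp path here so it is safe to try; the directory path is only an example):

```shell
# e2fsck.conf(5) scratch_files sketch: with a directory set, e2fsck
# writes some of its working tables there instead of keeping them in
# RAM. The real file is /etc/e2fsck.conf; a temp file is used here.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[scratch_files]
directory = /var/cache/e2fsck
EOF
cat "$conf"
```

The directory must exist before fsck runs, and spilling to disk makes the check slower, so it trades time for memory.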
Which other mature and stable filesystem can you recommend for such large storage?
Best regards, Rafal Radecki.
On 27.09.2012 09:10, Rafał Radecki wrote:
Hi All.
I have a CentOS server:
CentOS 5.6 x86_64 2.6.18-238.12.1.el5.centos.plus e4fsprogs-1.41.12-2.el5.x86_64
which has an 11TB ext4 filesystem. I have problems running fsck on it and would like to change the filesystem, because I do not like the possibility of a long fsck run on a production machine. I have also hit problems with fsck itself (not enough RAM, problems with the scratch_files option), so if the filesystem ever needs intervention I will be in a difficult situation.
Which other mature and stable filesystem can you recommend for such large storage?
Never had to deal with such a large filesystem, yet, but I'd try XFS on it.
Alternatively you can look at less supported filesystems such as BTRFS. Or even http://zfsonlinux.org/.
On 09/27/12 1:52 AM, Nux! wrote:
Never had to deal with such a large filesystem, yet, but I'd try XFS on it.
XFS is fairly memory intensive. 11TB file systems tend to mean millions and millions of files.
Frankly, I wouldn't run this on CentOS 5.6; I would upgrade to the latest CentOS 6 and then use XFS. Support for ext4 and XFS is rather sketchy with the old kernel in 5.x. (And why aren't you at 5.8, or whatever is current in the 5 series, anyway?)
On 27.09.2012 10:08, John R Pierce wrote:
On 09/27/12 1:52 AM, Nux! wrote:
Never had to deal with such a large filesystem, yet, but I'd try XFS on it.
XFS is fairly memory intensive. 11TB file systems tend to mean millions and millions of files.
frankly, I wouldn't run this on CentOS 5.6, I would upgrade to CentOS 6.latest and then I would use XFS.... support for EXT4 and XFS is rather sketchy with the old kernel in 5.x (and why aren't you at 5.8 or whatever is current in the 5 series anyways?!?)
Oh yeah, definitely upgrade to CentOS 6. Maybe even give the elrepo kernel-ml a go, too; they usually provide the latest kernel.
Definitely shoot for CentOS 6.3 ...
XFS with a kernel _more recent_ than 2.6.36 (currently shipped with CentOS6) has more improvements to the XFS code. Youtube video on XFS [0] - I believe the kernel version noted is 2.6.39 (watch the video!) [2].
And there's also a Youtube video on BTRFS [1] that was linked to/shared by Fernando.
[0] http://lists.centos.org/pipermail/centos/2012-August/128119.html [1] http://lists.centos.org/pipermail/centos/2012-August/128110.html [2] http://lwn.net/Articles/438671/
---~~.~~--- Mike // SilverTip257 //
On Thu, Sep 27, 2012 at 5:08 AM, John R Pierce pierce@hogranch.com wrote:
On 09/27/12 1:52 AM, Nux! wrote:
Never had to deal with such a large filesystem, yet, but I'd try XFS on it.
XFS is fairly memory intensive. 11TB file systems tend to mean millions and millions of files.
frankly, I wouldn't run this on CentOS 5.6, I would upgrade to CentOS 6.latest and then I would use XFS.... support for EXT4 and XFS is rather sketchy with the old kernel in 5.x (and why aren't you at 5.8 or whatever is current in the 5 series anyways?!?)
-- john r pierce N 37, W 122 santa cruz ca mid-left coast
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Current CentOS 6 is 2.6.32, not 2.6.36
In that XFS Youtube video, Dave Chinner says upstream 3.0 kernel or RHEL 6.2 [at 45:20 of the video].
Other sources [0] [1] agree.
[0] http://lwn.net/Articles/476616/ [1] http://jira.funtoo.org/browse/FL-38
---~~.~~--- Mike // SilverTip257 //
On Thu, Sep 27, 2012 at 8:46 AM, SilverTip257 silvertip257@gmail.com wrote:
Definitely shoot for CentOS 6.3 ...
XFS with a kernel _more recent_ than 2.6.36 (currently shipped with CentOS6) has more improvements to the XFS code. Youtube video on XFS [0] - I believe the kernel version noted is 2.6.39 (watch the video!) [2].
And there's also a Youtube video on BTRFS [1] that was linked to/shared by Fernando.
[0] http://lists.centos.org/pipermail/centos/2012-August/128119.html [1] http://lists.centos.org/pipermail/centos/2012-August/128110.html [2] http://lwn.net/Articles/438671/
---~~.~~--- Mike // SilverTip257 //
Hello,
One day our server farm rebooted unexpectedly (a power failure), and on CentOS 6.3 with an up-to-date kernel we lost a few hundred files on XFS (files which were probably open for reading, NOT writing).
The unexpected power loss led to a situation where some files ended up with zero size.
On Fri, Sep 28, 2012 at 2:34 AM, SilverTip257 silvertip257@gmail.com wrote:
Current CentOS 6 is 2.6.32, not 2.6.36
In that XFS Youtube video, Dave Chinner says upstream 3.0 kernel or RHEL 6.2 [at 45:20 of the video].
Other sources [0] [1] agree.
[0] http://lwn.net/Articles/476616/ [1] http://jira.funtoo.org/browse/FL-38
---~~.~~--- Mike // SilverTip257 //
On 09/28/12 12:09 PM, Ilyas -- wrote:
Hello,
One day our server farm rebooted unexpectedly (a power failure), and on CentOS 6.3 with an up-to-date kernel we lost a few hundred files on XFS (files which were probably open for reading, NOT writing).
The unexpected power loss led to a situation where some files ended up with zero size.
What sort of physical storage? Are you sure it is write-safe? Write-back caches without battery backup are often a cause of data loss.
The backend storage is 2 directly attached SATA disks, with no caches on the SATA controller. Both disks run in an mdraid mirror.
The zeroed files had been written many days before the power failure (some were written and closed 2 weeks earlier).
On Sat, Sep 29, 2012 at 12:19 AM, John R Pierce pierce@hogranch.com wrote:
On 09/28/12 12:09 PM, Ilyas -- wrote:
Hello,
One day our server farm rebooted unexpectedly (a power failure), and on CentOS 6.3 with an up-to-date kernel we lost a few hundred files on XFS (files which were probably open for reading, NOT writing).
The unexpected power loss led to a situation where some files ended up with zero size.
what sort of physical storage? are you sure it is write-safe ? write-back caches without battery backup are often a cause of data loss.
On 09/29/12 5:19 AM, Ilyas -- wrote:
The backend storage is 2 directly attached SATA disks, with no caches on the SATA controller. Both disks run in an mdraid mirror.
The zeroed files had been written many days before the power failure (some were written and closed 2 weeks earlier).
How do 2 sata disks in a mirror make 11TB ?!?
On Saturday, September 29, 2012 11:56:04 AM John R Pierce wrote:
On 09/29/12 5:19 AM, Ilyas -- wrote:
The backend storage is 2 directly attached SATA disks, with no caches on the SATA controller. Both disks run in an mdraid mirror.
The zeroed files had been written many days before the power failure (some were written and closed 2 weeks earlier).
How do 2 sata disks in a mirror make 11TB ?!?
They don't, John. Ilyas is not the OP.
The point was showing XFS corruption with a fairly simple setup, I think. But Ilyas is welcome to post if I'm wrong...
On 2012-09-28, Ilyas -- umask00@gmail.com wrote:
One day our server farm rebooted unexpectedly (a power failure), and on CentOS 6.3 with an up-to-date kernel we lost a few hundred files on XFS (files which were probably open for reading, NOT writing).
The unexpected power loss led to a situation where some files ended up with zero size.
No filesystem can fully protect against power failures--that's not its job. That's why higher-end RAID controllers have battery backups, and why important servers should be on a UPS. If you are really paranoid, you can probably tweak the kernel (e.g., using sysctl) to flush disk writes more frequently, but then you might drag down performance with it.
IOW, it's nice to have fsck, but it's better to take steps to avoid needing it.
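The sysctl tweak Keith mentions would presumably target the vm.dirty_* knobs, which control how long dirty pages sit in RAM before being flushed. A sketch with illustrative values (not recommendations), written to a temp file rather than /etc/sysctl.conf so it is safe to run:

```shell
# Shrink the window during which dirty pages sit in RAM before being
# written to disk. A smaller window means less data at risk on power
# failure, at the cost of write performance.
frag=$(mktemp)
cat > "$frag" <<'EOF'
# start expiring dirty pages after 5s instead of the default 30s
vm.dirty_expire_centisecs = 500
# wake the writeback threads every 1s instead of every 5s
vm.dirty_writeback_centisecs = 100
EOF
cat "$frag"
# to apply for real (as root): sysctl -p "$frag"
```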
That being said, through a series of unfortunate events, I've lost power on some of my larger XFS filesystems, and in those rare events I have not seen or heard about any files lost. So I strongly suspect other factors in your data loss; if XFS was involved, there were probably other issues at play as well.
--keith
On Friday, September 28, 2012 04:29:55 PM Keith Keller wrote:
No filesystem can fully protect against power failures--that's not its job. That's why higher-end RAID controllers have battery backups, and why important servers should be on a UPS. If you are really paranoid, you can probably tweak the kernel (e.g., using sysctl) to flush disk writes more frequently, but then you might drag down performance with it.
As far as UPS's are concerned, even those won't protect you from a BRS event.
BRS = Big Red Switch, aka EPO, or Emergency Power Off. NEC Article 645 (IIRC) mandates this for Information Technology rooms that use the relaxed rules of that article (and virtually all IT rooms do so, in my experience). The EPO is supposed to take *everything* down hard (including the DC to the UPS's, if the UPS is in the room, and shunt trip the breakers feeding the room so that the room is completely dead), and the fire suppression system is supposed to be tied in to it. And the EPO has to be a push to activate, and it has to be accessible, and people have hit the switch before.
Caching controllers are only part of the equation; in a BRS event, the battery is likely to have let go of the cache contents by the time things are back up, depending upon what caused the BRS event. This is a case where you should test with a server and see just how long the battery will actually hold the cache.
In the case of EMC Clariions, the write cache (there is only one, mirrored between the storage processors) on the storage processors is flushed to the 'vault' disks in an EPO event; there is a small UPS built in to the rack that keeps the vault disks up long enough to do this, and the SP's can then do an orderly shutdown. Takes about 90 seconds with a medium sized write cache and fast vault drives. Then, when the system boots back up, the vault contents are flushed out to the LUN's.
Now, to make this reliable, EMC has custom firmware loaded on their drives that doesn't do any write caching on the drive itself, and that is part of the design of their systems. Drive enclosures (DAE, in EMC's terminology) other than the DAE with the OS and vault disks, can go down hard and the array won't lose data, thanks to the vault and the EMC software. The EMC software periodically tests the battery backup units, and will disable the write cache (and flush it to disk) if the battery faults during the test. It is amazing how much performance is due to good (and large) write caches; modern SATA drives owe much of their performance to their write caches.
Now, if the sprinkler system is what caused the EPO, well, it may not matter how good the write cache vault is, depending on how wet things get... but that's part of the DR plan, or should be.
----- Original Message -----
| Hello,
|
| One day our server farm rebooted unexpectedly (a power failure), and
| on CentOS 6.3 with an up-to-date kernel we lost a few hundred files
| on XFS (files which were probably open for reading, NOT writing).
|
| The unexpected power loss led to a situation where some files ended
| up with zero size.
This is not uncommon with a file system like XFS, where the file system makes EXTENSIVE use of caching, memory, and internal semantics that will make your head spin. The fact of the matter is that, in spite of this "possibility" of loss, XFS is by far the best file system for large volumes at the moment, especially at initialization time. You *can* use ext4, and you can speed up its initialization if you use the -E lazy_itable_init=1 -O dir_index,extent,flex_bg,uninit_bg options.
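Those mke2fs options can be exercised on a small sparse test image instead of a real device; a sketch assuming e2fsprogs is installed (the image size is arbitrary):

```shell
# Create a throwaway image and format it with the options above.
# -F forces mke2fs to accept a regular file; -q keeps it quiet.
img=$(mktemp)
truncate -s 128M "$img"
mke2fs -q -F -t ext4 -E lazy_itable_init=1 \
    -O dir_index,extent,flex_bg,uninit_bg "$img"
# confirm the requested features ended up in the superblock
feats=$(dumpe2fs -h "$img" 2>/dev/null | grep -i 'Filesystem features')
echo "$feats"
rm -f "$img"
```

With lazy_itable_init, the inode tables are initialized in the background after mount rather than at mkfs time, which is where the initialization speedup comes from.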
XFS + a battery-backed RAID controller is not a way to fully protect your data.
A very easy way to understand this is to run a server farm with 1000+ nodes; that is a large enough number of servers to make a representative sample.
There are problems:
1. bugs in RAID controllers (problems with the BBU, cache memory, hardware, firmware, etc.) which lead to errors in data writes, or even freeze the server and require a cold reboot.
2. problems with hardware (CPU, memory, mainboard, etc.) which lead to system hangups too.
Almost every system hangup leaves XFS broken.
In my case of problems with XFS I have 2 uninvestigated issues: 1. Why did XFS zero files which had been written and closed 2 weeks earlier (this happened on a server with a multi-terabyte mdraid1)? 2. Why does xfs_check never work (even on systems with 32G of RAM) on a 40TB filesystem? Yes, I have to use xfs_repair, but xfs_check is killed by the OOM killer every time, even when run after xfs_repair (this happened on 40TB of storage with 40k files on it). This fact forced me to store some of the backups on ext4; to do that I rebuilt the latest version of e2fsprogs for RHEL 6, because the vendor version does not support such big ext4 filesystems.
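For what it's worth, the xfs_check(8) man page in later xfsprogs releases deprecates it outright in favor of "xfs_repair -n" (a read-only dry run), which needs far less memory, and xfs_repair's -m flag caps memory use. Since the device name here is hypothetical, the sketch only echoes the command:

```shell
# Hypothetical device; on a real system you would run the command
# directly (it is read-only with -n, but the filesystem must be
# unmounted first).
DEV=/dev/md0
CMD="xfs_repair -n -m 4096 $DEV"   # cap memory use at ~4096 MB
echo "$CMD"
```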
On Sat, Sep 29, 2012 at 3:30 AM, James A. Peltier jpeltier@sfu.ca wrote:
----- Original Message -----
| Hello,
|
| One day our server farm rebooted unexpectedly (a power failure), and
| on CentOS 6.3 with an up-to-date kernel we lost a few hundred files
| on XFS (files which were probably open for reading, NOT writing).
|
| The unexpected power loss led to a situation where some files ended
| up with zero size.
This is not uncommon with a file system like XFS, where the file system makes EXTENSIVE use of file system caching and memory and internal semantics that will make your head spin. Fact of the matter is, that in spite of this "possibility" of loss, XFS is by far the best file system for large volumes at the moment and especially during initialization time. You *can* use EXT4 with you can speed this up if you use the -E lazy_itable_init=1 -O dir_index,extent,flex_bg,uninit_bg options.
-- James A. Peltier Manager, IT Services - Research Computing Group Simon Fraser University - Burnaby Campus Phone : 778-782-6573 Fax : 778-782-3045 E-Mail : jpeltier@sfu.ca Website : http://www.sfu.ca/itservices http://blogs.sfu.ca/people/jpeltier
Success is to be measured not so much by the position that one has reached in life but as by the obstacles they have overcome. - Booker T. Washington
On 2012-09-27, John R Pierce pierce@hogranch.com wrote:
XFS is fairly memory intensive. 11TB file systems tend to mean millions and millions of files.
frankly, I wouldn't run this on CentOS 5.6, I would upgrade to CentOS 6.latest and then I would use XFS.... support for EXT4 and XFS is rather sketchy with the old kernel in 5.x (and why aren't you at 5.8 or whatever is current in the 5 series anyways?!?)
I have a ~20TB XFS filesystem on CentOS 5. Support for xfs in the CentOS 5 kernels is now built-in, so you don't have to rely on the old buggy XFS modules from centosplus. (I have yet to xfs_repair this filesystem; I did repair it back when it was ~12TB, and it ran fine.)
I have also run xfs_repair on a 17TB XFS filesystem on a machine with about 4GB of memory. It ran fine in less than one hour (~30m IIRC; that filesystem is on CentOS 6).
I definitely agree that CentOS 6 is a better way to go, but XFS can be done on CentOS 5 too. Just make sure you are completely up to date.
For the OP, what are the fsck times currently like for your ext4 filesystem? If they are already less than one hour, you may not see any benefit from switching.
--keith
On 09/27/12 11:15 AM, Keith Keller wrote:
I have also run xfs_repair on a 17TB XFS filesystem on a machine with about 4GB of memory. It ran fine in less than one hour (~30m IIRC; that filesystem is on CentOS 6).
With XFS at least (and probably ext4), what counts is how many files are in the file system, more than the absolute size. If you have 17,000 1GB files, it's one thing; if you have 17,000,000,000 1KB files, it's another thing entirely.
On 2012-09-27, John R Pierce pierce@hogranch.com wrote:
On 09/27/12 11:15 AM, Keith Keller wrote:
I have also run xfs_repair on a 17TB XFS filesystem on a machine with about 4GB of memory. It ran fine in less than one hour (~30m IIRC; that filesystem is on CentOS 6).
with XFS at least (and probably ext4) what counts is how many files are in the file system more than the absolute size. if you have 17000 1gb files, its one thing, if you have 17,000,000,000 1K files, its another thing entirely.
Good point. Just for a data point, I've probably got on the order of a few dozen million files of widely varying sizes on this particular filesystem. (So more than 10 million files but fewer than 100 million.)
--keith
Which other mature and stable filesystem can you recommend for such large storage?
Never had to deal with such a large filesystem, yet, but I'd try XFS on it.
Alternatively you can look at less supported filesystems such as BTRFS. Or even http://zfsonlinux.org/.
Since it's for production, I would avoid both ZFS and Btrfs. But I guess there aren't many options available at the moment; best to wait for Btrfs to be production-ready.
Wouldn't splitting the 11TB filesystem into smaller filesystems work? You won't be able to avoid the fsck or the disruption in service, but at least you can bring the critical mounts back up faster.
- jb
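In fstab terms, jb's split might look something like this (volume-group and mount-point names are entirely hypothetical); written to a temp file so the sketch is safe to run:

```shell
# One 11TB volume carved into per-purpose logical volumes, so each
# filesystem fscks (and fails) independently of the others.
tab=$(mktemp)
cat > "$tab" <<'EOF'
/dev/vg_data/lv_critical  /srv/critical  ext4  defaults  1 2
/dev/vg_data/lv_archive   /srv/archive   ext4  defaults  1 2
/dev/vg_data/lv_scratch   /srv/scratch   ext4  defaults  0 0
EOF
cat "$tab"
```

The last fstab field (fsck pass) also lets a less important filesystem (pass 0) skip boot-time checking entirely while the critical mount still gets checked.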
On Thu, Sep 27, 2012 at 5:52 AM, Nux! nux@li.nux.ro wrote:
Alternatively you can look at less supported filesystems such as BTRFS.
What do you mean by "less supported"?
https://events.linuxfoundation.org/events/linuxcon-japan/bo --- LinuxCon Japan 2012 | Presentations "On The Way to a Healthy Btrfs Towards Enterprise" by Liu Bo, Fujitsu ---
Let me quote: "Btrfs has been on full development for about 5 years and it does make lots of progress on both features and performance, but why does everybody keep tagging it with 'experimental'? And why do people still think of it as a vulnerable one for production use? As a goal of production use, we have been strengthening several features, making improvements on performance and keeping fixing bugs to make btrfs stable, for instance, 'snapshot aware defrag', 'extent buffer cache', 'rbtree lock contention', etc. This talk will cover the above" ---
From its web page: "Liu Bo has been working on Linux kernel development since late 2010 as a Fujitsu engineer. He has been working in the filesystem field and he's now focusing on btrfs development."
RHEL 7 to get Btrfs support http://www.h-online.com/open/imgs/45/8/8/4/6/5/1/43-6b4e69889ee000ca.png
"RHEL 7 will support ext4, XFS, and Btrfs (boot and data)"
Then you have SuSE: https://www.suse.com/releasenotes/x86_64/SUSE-SLES/11-SP2/
"With SUSE Linux Enterprise 11 SP2, the btrfs file system joins ext3, reiserfs, xfs and ocfs2 as *commercially supported file systems*. Each file system offers distinct advantages. While the installation default is ext3, we recommend xfs when maximizing data performance is desired, and *btrfs as a root file system when snapshotting and rollback capabilities are required. Btrfs is supported as a root file system (i.e. the file system for the operating system) across all architectures of SUSE Linux Enterprise 11 SP2*."
https://blogs.oracle.com/wim/entry/oracle_linux_6_update_3
"OL6.3 that boots up uek (2.6.39-200.24.1) as install kernel and uses btrfs as the default filesystem for installation. So latest and greatest direct access to btrfs, a modern well-tested, current kernel, freely available. "
So, again, what'dya mean by "less supported"? It's been in the mainline kernel since February, so with the adoption by RHEL 7 it'll become mainstream sooner rather than later...
Just my $0.02... FC
On 9/27/12, Fernando Cassia fcassia@gmail.com wrote:
So, again, what'dya mean by "less supported"? It's been in the mainline kernel since February, so with the adoption by RHEL 7 it'll become mainstream sooner rather than later...
Just my $0.02...
That's the whole point, isn't it? Until RHEL includes it (rather than shipping it as a technology preview), you probably shouldn't use it as a production file system, and definitely not with the 5.x CentOS the OP is using.
- jb
You should upgrade to a newer kernel; there are lots of improvements to ext4 since the RHEL 5 kernel.
RHEL/CentOS 6 is a start, but if you don't need RHEL/CentOS you could try Ubuntu 12.04 to see how the 3.2.x kernel handles it. Cheers
On 27 September 2012 10:47, joel billy jbilly2002@gmail.com wrote:
On 9/27/12, Fernando Cassia fcassia@gmail.com wrote:
So, again, what'dya mean by "less supported"? It's been in the mainline kernel since February, so with the adoption by RHEL 7 it'll become mainstream sooner rather than later...
Just my $0.02...
That's the whole point, isn't it? Until RHEL includes it (rather than shipping it as a technology preview), you probably shouldn't use it as a production file system, and definitely not with the 5.x CentOS the OP is using.
- jb
Am 27.09.2012 um 10:10 schrieb Rafał Radecki:
Hi All.
I have a CentOS server:
CentOS 5.6 x86_64 2.6.18-238.12.1.el5.centos.plus e4fsprogs-1.41.12-2.el5.x86_64
which has an 11TB ext4 filesystem. I have problems running fsck on it and would like to change the filesystem, because I do not like the possibility of a long fsck run on a production machine. I have also hit problems with fsck itself (not enough RAM, problems with the scratch_files option), so if the filesystem ever needs intervention I will be in a difficult situation.
Which other mature and stable filesystem can you recommend for such large storage?
what about:
$ man tune2fs
"maximum / mount count / time" can be changed.
and to boot "faster" just do:
$ touch /fastboot
$ reboot
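The tune2fs knobs LF is referring to can be tried safely on a throwaway image (assuming e2fsprogs is installed): -c 0 disables the mount-count check and -i 0 the time-interval check, so boots skip the periodic fsck.

```shell
# Format a small test image, then disable both periodic-check triggers.
img=$(mktemp)
truncate -s 64M "$img"
mke2fs -q -F -t ext4 "$img"
tune2fs -c 0 -i 0 "$img" >/dev/null
# -c 0 shows up as "Maximum mount count: -1", -i 0 as "Check interval: 0"
dumpe2fs -h "$img" 2>/dev/null | grep -E 'Maximum mount count|Check interval'
```

Disabling the periodic check means fsck only runs when the filesystem is marked dirty, which is exactly the situation the OP is worried about, so use with care.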
-- LF
On Thu, Sep 27, 2012 at 10:10 AM, Rafał Radecki radecki.rafal@gmail.comwrote:
Which other mature and stable filesystem can you recommend for such large storage?
I recommend XFS
BR Bent
----- Original Message -----
| Hi All.
|
| I have a CentOS server:
|
| CentOS 5.6 x86_64
| 2.6.18-238.12.1.el5.centos.plus
| e4fsprogs-1.41.12-2.el5.x86_64
|
| which has an 11TB ext4 filesystem. I have problems running fsck on it
| and would like to change the filesystem, because I do not like the
| possibility of a long fsck run on a production machine. I have also
| hit problems with fsck itself (not enough RAM, problems with the
| scratch_files option), so if the filesystem ever needs intervention I
| will be in a difficult situation.
|
| Which other mature and stable filesystem can you recommend for such
| large storage?
|
| Best regards,
| Rafal Radecki.
As someone who is working with 15-30TB volumes: use XFS, but be sure you have a lot of memory. 48GB at least, and more if you have directories with tens of thousands of files in them.