[CentOS] Is glusterfs ready?

Wed Sep 5 15:07:45 UTC 2012
Dennis Jacobfeuerborn <dennisml at conversis.de>

On 09/05/2012 07:14 AM, Bob Hepple wrote:
> David C. Miller <millerdc at ...> writes:
> 
>>
>>
>> ----- Original Message -----
>>> From: "John Doe" <jdmls at ...>
>>> To: "Cent O Smailinglist" <centos at ...>
>>> Sent: Tuesday, August 28, 2012 3:14:29 AM
>>> Subject: [CentOS] Is glusterfs ready?
>>>
>>> Hey,
>>>
>>> since RH took control of glusterfs, I've been looking to convert our
>>> old independent RAID storage servers to several non RAID glustered
>>> ones.
>>>
>>> The thing is that I, here and there, heard a few frightening stories
>>> from some users (even with latest release).
>>> Has anyone used it long enough to say whether one can blindly
>>> trust it, or whether it is almost there but not yet ready?
>>>
> 
> Heya,
> 
> Well I guess I'm one of the frightening stories, or at least a
> previous employer was. They had a mere 0.1 petabyte store over 6
> bricks yet they had incredible performance and reliability
> difficulties. I'm talking about a mission critical system being
> unavailable for weeks at a time. At least it wasn't customer
> facing (there was another set of servers for that).
> 
> The system was down more than it was up. Reading was generally
> OK (but very slow) but multiple threads writing caused mayhem -
> I'm talking lost files and file system accesses going into the
> multiple minutes.
> 
> In the end I implemented a 1-Tb store to be fuse-unioned over the top
> of the thing to take the impact of multiple threads writing to it. A
> single thread (overnight) brought the underlying glusterfs up to date.
> 
> That got us more or less running but the darned thing spent most of
> its time re-indexing and balancing rather than serving files.
> 
> To be fair, some of the problems were undoubtedly of their own making
> as 2 nodes were centos and 4 were fedora-12 - apparently the engineer
> couldn't find the installation CD for the 2 new nodes and 'made do'
> with what he had! I recall that a difference in the system 'sort'
> command gave all sorts of grief until it was discovered, never mind
> different versions of the gluster drivers.

That is the problem with most of these stories: the setups tend to be of
the "adventurous" kind. Not only was the setup very asymmetrical, but
Fedora 12 was already long outdated six months ago.
This kind of setup should be categorized as "highly experimental", not as
something you actually use in production.

> I'd endorse Johnny's comments about it not handling large numbers of
> small files well (i.e. under roughly 10 MB). I believe it was designed
> for large multi-media files such as clinical X-rays, i.e. a small number
> of large files.

That's a problem with all distributed filesystems. For a few large files
the additional time needed for round-trips is usually dwarfed by the actual
I/O requests themselves, so you don't notice it (much). With a ton of small
files you incur metadata round-trips for every few kilobytes read or
written, which slows things down a great deal.
So basically, if you want top performance for lots of small files, don't
use distributed filesystems.
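
To put rough numbers on the round-trip argument, here is a minimal Python
sketch. It is only a back-of-the-envelope model, not a measurement of
glusterfs: the function name, the 0.5 ms round-trip time, the three
round-trips per file and the 100 MB/s bandwidth are all assumptions picked
for illustration.

```python
GIB = 1024 ** 3


def transfer_time(n_files, file_size_bytes,
                  rtt_s=0.0005,               # assumed network round-trip time
                  round_trips_per_file=3,     # assumed metadata round-trips per file
                  bandwidth_bytes_per_s=100e6):  # assumed sustained throughput
    """Return (latency_seconds, payload_seconds) for moving n_files files.

    Latency grows with the number of files; payload time grows with the
    number of bytes. Both figures come from this toy model only.
    """
    latency = n_files * round_trips_per_file * rtt_s
    payload = n_files * file_size_bytes / bandwidth_bytes_per_s
    return latency, payload


if __name__ == "__main__":
    # The same 1 GiB of data, once as a single file and once as 4 KiB files.
    cases = {
        "one 1 GiB file": transfer_time(1, GIB),
        "4 KiB files": transfer_time(GIB // 4096, 4096),
    }
    for label, (latency, payload) in cases.items():
        print(f"{label}: {latency:.3f} s in round-trips, {payload:.3f} s moving data")
```

With these made-up defaults the payload time is identical in both cases,
but the round-trip time goes from negligible for the single file to many
times the payload time for the small files.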

> Another factor is that the available space is the physical space
> divided by 4 due to the replication across the nodes on top of the
> nodes being RAID'd themselves.

That really depends on your setup. I'm not sure what you mean by the nodes
being RAIDed themselves. If you run a four-node cluster and keep two copies
of each file, you would probably create two pairs of nodes, where one node
in each pair is replicated to the other, and then create a stripe over
these two pairs, which should actually improve performance. This would mean
your available space would be cut in half, not divided by 4.
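
The capacity arithmetic can be sketched in a few lines of Python. This is
an illustration only: the function name, the 10 TB per node and the RAID
efficiency factor are assumptions, and the 0.5 factor (a mirror on each
node) is just one guess at how a divide-by-4 figure could come about.

```python
def usable_capacity_tb(nodes, raw_per_node_tb, replica_count,
                       raid_efficiency=1.0):
    """Usable space in TB for a replicated cluster.

    raid_efficiency is the fraction of raw disk left after local RAID on
    each node (1.0 = no RAID; 0.5 = mirrored disks, an assumed example).
    """
    return nodes * raw_per_node_tb * raid_efficiency / replica_count


if __name__ == "__main__":
    raw_total = 4 * 10  # four nodes with a hypothetical 10 TB each
    plain = usable_capacity_tb(4, 10, replica_count=2)
    mirrored = usable_capacity_tb(4, 10, replica_count=2, raid_efficiency=0.5)
    print(f"raw: {raw_total} TB, two copies: {plain} TB, "
          f"two copies on mirrored disks: {mirrored} TB")
```

Two copies alone halve the raw space; it only drops to a quarter if the
local disks on each node give up another half to their own redundancy.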

Regards,
  Dennis