[CentOS] who uses Lustre in production with virtual machines?

Tue Aug 3 19:16:24 UTC 2010
Emmanuel Noobadmin <centos.admin at gmail.com>

On 8/4/10, Rudi Ahlers <Rudi at softdux.com> wrote:
> With lustre, from what I understand, I could use say 3 or 5 or 50
> servers to spread the load across the servers and thus have higher IO.
> We mainly host shared hosting clients, who often have hundreds &
> thousands of files in one account. So if their files were "scattered"
> across multiple servers then the access to those files would be
> quicker.

One of the problems with Lustre's style of distributed storage, which
Gluster points out, is that the metadata server which tells clients
where to find the actual data becomes the bottleneck. Gluster
supposedly scales with every client machine added because it doesn't
use a metadata server; file locations are determined using some kind
of computed hash.
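
Roughly, the advantage is that any client can work out where a file
lives from the file name alone. A toy sketch in Python (just hashing
the path and taking it modulo a made-up server list; Gluster's real
elastic hashing is more involved, this is only to show why no lookup
to a metadata server is needed):

    import hashlib

    servers = ["store1", "store2", "store3", "store4"]

    def locate(path):
        # Every client computes the same answer from the path alone,
        # so there's no metadata server to ask (and no bottleneck).
        digest = hashlib.md5(path.encode()).hexdigest()
        return servers[int(digest, 16) % len(servers)]

    print(locate("/home/client42/public_html/index.php"))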

> In terms of high availability, I'm thinking that if I use more servers
> and thus have less load on each server, then the rate of failure would
> also be less. I see they have a high availability option, but would

The drives would be spinning constantly anyway, so reducing the load
on each server probably won't change the failure rate significantly.
Better to assume things will fail and have a system that's designed
to handle that kind of situation with minimal disruption :)

> also be interested to know what was said about it. Would you care to
> point me to the specific conversation about this?

I don't have a link because it's in my inbox, but you should be able
to find it as "Question on Lustre redundancy/failure features" in the
Lustre mailing list archive, around 28 Jun 2010.

The general gist of it is that Lustre is basically network RAID 0: it
relies entirely on the underlying device (e.g. a RAID 1 storage node)
for redundancy. If a storage node fails, access to the data on it is
blocked until the node is replaced/rebuilt.
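
If it helps picture it, here's a toy sketch of RAID-0 style striping
(plain round-robin over four made-up OSTs with 1 MiB stripes; the
names and numbers are just assumptions for illustration, not Lustre's
actual layout code):

    STRIPE_SIZE = 1 << 20   # assuming 1 MiB stripes
    osts = ["ost0", "ost1", "ost2", "ost3"]

    def stripe_map(file_size):
        # Round-robin the stripes over the OSTs, RAID-0 style: every
        # OST ends up holding slices of the file, so losing any one
        # OST leaves holes in the file until that OST comes back.
        n_stripes = -(-file_size // STRIPE_SIZE)   # ceiling division
        return [(i * STRIPE_SIZE, osts[i % len(osts)])
                for i in range(n_stripes)]

    for offset, ost in stripe_map(5 * (1 << 20)):
        print(offset, ost)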

For HPC parallel workloads, that probably makes sense: each work unit
is independent, so you can just wait for the node to be replaced
before processing that data. But in our situation, that's a bad idea;
just imagine if just one data block from each of 50 VMs happens to be
on that failed node :D
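
A quick back-of-envelope to make the point (assuming stripes land on
nodes more or less at random; the 10 nodes and 100 stripes per image
are made-up numbers):

    # Chance that a VM image made of 100 stripes has *nothing* on one
    # particular node out of 10: (9/10)**100, essentially zero.  So a
    # single dead node takes a bite out of practically every VM image.
    nodes = 10
    stripes_per_image = 100
    p_untouched = ((nodes - 1) / nodes) ** stripes_per_image
    print(p_untouched)   # roughly 2.7e-05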