Hello there,
I believe that unfsd ( http://unfs3.sourceforge.net/ ) now does have multi-threaded capability and as such should be fairly well scalable. I am using it on CentOS 6.2 and it seems to become all but unusable when more than 3-4 users connect to it. Is that normal? What sort of experience have other people had?
Is there a way to parametrically tune it, by the way?
Thanks.
Boris.
On 06/01/2012 10:27 PM, Boris Epstein wrote:
Hello there,
I believe that unfsd ( http://unfs3.sourceforge.net/ )
What's wrong with the NFS server included in the distro?
On 06/01/12 2:27 PM, Boris Epstein wrote:
I believe that unfsd (http://unfs3.sourceforge.net/ ) now does have multi-threaded capability and as such should be fairly well scalable. I am using it on CentOS 6.2 and it seems to become all but unusable when more than 3-4 users connect to it. Is that normal? What sort of experience have other people had?
yeesh, wtf ?
latest version: 0.9.22 2009-01-05
WHY?!??! What problem is this supposed to solve over the built-in native Linux NFS, which supports a lot more than just NFSv3?
maybe in 2003, when Linux NFS was sketchy, this made sense.
On Fri, Jun 01, 2012 at 03:36:09PM -0700, John R Pierce wrote:
maybe in 2003, when Linux NFS was sketchy, this made sense.
Unlikely back then, either. It's a userland implementation, subject to all the same scheduling issues as any other userland app; filesystems should not be implemented in userland for efficiency reasons.
John
On Fri, Jun 1, 2012 at 6:46 PM, John R. Dennison jrd@gerdesas.com wrote:
On Fri, Jun 01, 2012 at 03:36:09PM -0700, John R Pierce wrote:
maybe in 2003, when Linux NFS was sketchy, this made sense.
Unlikely back then, either. It's a userland implementation, subject to all the same scheduling issues as any other userland app; filesystems should not be implemented in userland for efficiency reasons.
John
John,
A process implemented in userland may not be as efficient as one implemented as part of the kernel - but that doesn't mean it can't scale well, does it?
Boris.
On Sat, Jun 2, 2012 at 9:59 AM, Boris Epstein borepstein@gmail.com wrote:
Unlikely back then, either. It's a userland implementation, subject to all the same scheduling issues as any other userland app; filesystems should not be implemented in userland for efficiency reasons.
A process implemented in userland may not be as efficient as one implemented as part of the kernel - but that doesn't mean it can't scale well, does it?
Anything that needs atomic operations is difficult to scale. Throw in distributed components and an extra user/kernel layer and there are lots of ways to go wrong.
A process implemented in userland may not be as efficient as one implemented as part of the kernel - but that doesn't mean it can't scale well, does it?
Anything that needs atomic operations is difficult to scale. Throw in distributed components and an extra user/kernel layer and there are lots of ways to go wrong.
-- Les Mikesell lesmikesell@gmail.com
Les, what doesn't need atomic operations? And how does doing things in the kernel make your program more scalable? It is the algorithm that matters, not the execution space, IMO.
Boris.
On Sat, Jun 2, 2012 at 1:31 PM, Boris Epstein borepstein@gmail.com wrote:
Anything that needs atomic operations is difficult to scale. Throw in distributed components and an extra user/kernel layer and there are lots of ways to go wrong.
Les, what doesn't need atomic operations?
That's the driving force behind 'nosql' databases. Riak, for example, allows concurrent conflicting operations on a potentially partitioned cluster and figures it out after the fact. But in anything resembling a filesystem the directory operations have to be atomic. If you open a file, you need to know if that name already exists, and no matter how many processes try to create a new file or link, only one can succeed. So, you pretty much have to lock all directory operations while any of them complete. Do you happen to have a lot of files in a single directory?
And how does doing things in the kernel make your program more scalable? It is the algorithm that matters, not the execution space, IMO.
It is hard enough to do atomic operations and locks at a single layer - it becomes next to impossible when distributed, and adding layers with different scheduling concepts has to make it worse. The kernel has its own way of locking, but anything in userland is going to need a trip into the kernel and back just for that.
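A rough shell illustration of the "only one creator wins" point above (the lock file name is made up): with noclobber set, the shell creates the file with O_CREAT|O_EXCL, so if several processes race to create the same name, exactly one of them succeeds and the rest fail.

    #!/bin/sh
    # Illustrative only: atomic create-if-absent via the shell's noclobber option.
    lockfile=/tmp/mydir.lock
    if ( set -o noclobber; echo "$$" > "$lockfile" ) 2>/dev/null; then
        echo "got the lock"     # exactly one racing process ends up here
        # ... update the directory ...
        rm -f "$lockfile"
    else
        echo "lost the race"    # everyone else ends up here
    fi

The kernel provides that guarantee as a single atomic operation; a userland server has to reach it through an extra trip into the kernel and back, which is the layering cost described above.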
On Sat, Jun 02, 2012 at 10:59:13AM -0400, Boris Epstein wrote:
A process implemented in userland may not be as efficient as one implemented as part of the kernel - but that doesn't mean it can't scale well, does it?
Depends on one's definition of scale, I suppose. I consider efficiency and performance one factor of scaling. To be completely honest, I must admit that I've not spent a lot of time benchmarking any user-space implementation in a large deployment, but I wouldn't expect performance to ramp up with scale.
I've always had a strong aversion to file systems implemented in user space versus kernel space as I've (personally) never found such an implementation that had what I considered good performance.
My needs, however, are not yours. If your requirements give you leeway for higher latency and slower overall performance perhaps a userland file system will work perfectly fine for you. As with all else in the IT sector use what works best for you :)
John
On Sat, Jun 2, 2012 at 2:50 PM, John R. Dennison jrd@gerdesas.com wrote:
<snip>
John,
To be specific, I use UNFSD to export a MooseFS file system. MooseFS, by the way, is userland-process based too.
Be that as it may, I've seen situations where a comparably configured MooseFS client gets to read at, say, 40 MB/s - which is fine - but the UNFSD at the same time reads at 40 KB/s(!) Why would that be? I mean, some degradation I can dig, but 3 orders of magnitude? What is with this? Am I doing something wrong?
I can't believe it works the same way for everybody - who would use it if it did?
Boris.
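As an aside, one crude way to put numbers like 40 MB/s vs. 40 KB/s side by side is to read the same large file through each path and let dd report the rate (mount points and file name below are only placeholders):

    # read through the MooseFS client mount
    dd if=/mnt/mfs/bigfile of=/dev/null bs=1M count=1024
    # read the same file through the unfsd NFS mount
    dd if=/mnt/nfs/bigfile of=/dev/null bs=1M count=1024

Dropping the page cache between runs (echo 3 > /proc/sys/vm/drop_caches, as root) keeps the second read from being served out of memory.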
Boris Epstein wrote:
On Sat, Jun 2, 2012 at 2:50 PM, John R. Dennison jrd@gerdesas.com wrote:
On Sat, Jun 02, 2012 at 10:59:13AM -0400, Boris Epstein wrote:
<snip>
To be specific, I use UNFSD to export a MooseFS file system. MooseFS, by the way, is userland-process based too.
Be that as it may, I've seen situations where a comparably configured MooseFS client gets to read at, say, 40 MB/s - which is fine - but the UNFSD at the same time reads at 40 KB/s(!) Why would that be? I mean, some degradation I can dig, but 3 orders of magnitude? What is with this? Am I doing something wrong?
<snip> I wonder... what's the architecture on which you're getting these results? I tried opening a bug with upstream over NFS4 and 6.x, and no one ever looked at it, and they closed it.
100% repeatable: unpacking a package locally takes seconds. Unpacking it from an NFS mount onto a local drive takes about 1 minute. Unpacking it from an NFS mount onto an NFS mount, even when the target is exported FROM THE SAME MACHINE* that the process is running on, takes 6.5-7 MINUTES.
* That is:

    [server 1]                   [server 2]
    /export/thatdir  --NFS-->    /target/dir
                                 /s2/source
    /source/dir      --NFS-->    /s2/source

  then cd into [server 2]:/target/dir and unpack from /s2/source.
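For reference, the three cases boil down to something like this, assuming the package is a tarball (paths are illustrative):

    time tar xf /local/pkg.tar    -C /local/dest      # seconds
    time tar xf /mnt/nfs/pkg.tar  -C /local/dest      # about 1 minute
    time tar xf /mnt/nfs/pkg.tar  -C /mnt/nfs2/dest   # 6.5-7 minutes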
I suppose I'll try logging into upstream's bugzilla using our official licensed id; maybe then they'll assign someone to look at it....
mark
On Wed, Jun 13, 2012 at 10:11 AM, m.roth@5-cent.us wrote:
<snip>
Mark,
Thanks, my architecture is extremely similar to yours, except that in my case the "second layer", if I may say so, is MooseFS ( http://www.moosefs.org/ ), not NFS. MooseFS itself is blazing fast, by the way.
So the diagram in my case would look something like this:
    /export/thatdir  --NFS-->                        /target/dir
                                                     /s2/source
    /source/dir      --MooseFS mount (mfsmount)-->   /s2/source
The discrepancy in the resultant performance is comparable.
Thanks.
Boris.
On Thu, Jun 14, 2012 at 1:15 PM, Boris Epstein borepstein@gmail.com wrote:
<snip>
I may have discovered a fix. Still don't know why it is a fix - but for what it's worth...
OK, if you put your UNFSD daemon on a completely different physical machine - i.e., with no MooseFS component running on it - it seems to work just fine. For a single client I got a performance of about 70 MB/s over a 1 Gbit/s network. When multiple (up to 5) clients do their reads, the performance seems to degrade roughly proportionally.
And this is strange. I've got MooseFS currently confined to just one machine (8 cores, 48 GB RAM): master server, meta server, chunk server, the whole thing. And that works fine. Add UNFSD - and it still works, and the load is still low (under 1) - and yet the UNFSD's performance goes down the drain. Why? I have no idea.
By the way, the standalone UNFSD server is far from a powerful piece of hardware - it is just a P5-class 2-core machine with 2 GB of RAM. So go figure...
Boris.
On Fri, Jun 1, 2012 at 6:36 PM, John R Pierce pierce@hogranch.com wrote:
<snip>
WHY?!??! What problem is this supposed to solve over the built-in native Linux NFS, which supports a lot more than just NFSv3?
<snip>
John,
The native NFS only supports the local file system (on the local disk). What we have here is an NFS gateway to a distributed file system, in our case MooseFS ( http://www.moosefs.org/ ).
Boris.
On 06/01/2012 10:26 PM, Boris Epstein wrote:
<snip>
John,
The native NFS only supports the local file system (on the local disk). What we have here is an NFS gateway to a distributed file system, in our case MooseFS ( http://www.moosefs.org/ ).
You might take a look at GlusterFS for your distributed file system if most of your nodes are on the same 100 Mbit or 1 Gbit network. GlusterFS is the new "big thing" that Red Hat is going to support, and we use it on the CentOS infrastructure and like it quite well. It is also very easy to maintain, and you can mount it via the glusterfs client or via NFS. It does not work really well across slower links, such as between multiple datacenters, but if your machines are all on a fast network with each other, I highly recommend it.
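If it helps, mounting a Gluster volume from a client looks roughly like this (host and volume names are made up; the native client needs the glusterfs-fuse package, and Gluster's built-in NFS server speaks NFSv3 over TCP):

    # native (FUSE) client
    mount -t glusterfs gluster1:/myvol /mnt/gluster
    # or over NFSv3
    mount -t nfs -o vers=3,tcp gluster1:/myvol /mnt/gluster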
On Sat, Jun 2, 2012 at 6:16 AM, Johnny Hughes johnny@centos.org wrote:
<snip>
John,
I agree with you that GlusterFS is not bad - though neither is MooseFS, by all accounts, and MooseFS is very simple and lightweight, which is why we chose it. At any rate, at this point this is what we are using. All we need is an NFS gateway that will scale to 10-20 sessions without losing too much performance.
And yes, it could be that it is my MooseFS that is underperforming - I am studying that possibility too.
Thanks!
Boris.
On 06/02/2012 02:16 PM, Boris Epstein wrote:
<snip>
MooseFS is really only designed to host large files and to be useful if you care about throughput but not latency. GlusterFS is going to perform much better as a regular filesystem due to its consistent hashing approach, and it is just as simple and lightweight as MooseFS.
But why can't you mount MooseFS locally and then export it using the regular nfs implementation?
Regards, Dennis
PS: You might also take a look at Ceph at ceph.com and Sheepdog at www.osrg.net/sheepdog. Both are very interesting contenders. You can find some interesting benchmarks for a 1000-node Sheepdog cluster here: http://sheepdog.taobao.org/people/zituan/sheepdog1k.html
On Sat, Jun 2, 2012 at 8:50 AM, Dennis Jacobfeuerborn dennisml@conversis.de wrote:
<snip>
Dennis,
Thanks for a thoughtful reply.
I believe regular NFS does not allow you to export non-local directories. That was the case a few years ago; I didn't even check for myself this time around, as people are saying it is still so. Perhaps I should check.
When you say that MooseFS is high latency - what sort of latency should I expect when accessing a file? There's a whole community of happy MooseFS users out there; I am not sure they'd be so happy if they had to wait 30 seconds just to start reading a file. We can tolerate some latency here, by the way.
Boris.
On Sat, Jun 2, 2012 at 8:50 AM, Dennis Jacobfeuerborn dennisml@conversis.de wrote:
<snip>
But why can't you mount MooseFS locally and then export it using the regular nfs implementation?
Dennis,
I just tried exporting a MooseFS partition using regular NFS and got the following:
Jun 2 10:32:24 fs1 rpc.mountd[2500]: Cannot export /mfs/mfs1, possibly unsupported filesystem or fsid= required
Boris.
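For what it's worth, that rpc.mountd message is the kernel NFS server's usual reaction to a FUSE-backed mount (which is what mfsmount provides): it cannot derive a stable filesystem id on its own, so the export needs an explicit fsid=. A sketch of what that might look like, with illustrative paths and network, and no guarantee it is sufficient for MooseFS (the underlying FUSE filesystem also has to support lookups by file handle, and mfsmount may need to run with -o allow_other so nfsd can traverse it):

    /etc/exports:
        /mfs/mfs1  192.168.0.0/24(rw,sync,no_subtree_check,fsid=1)

    exportfs -ra      # re-read /etc/exports
    showmount -e      # confirm the export is now advertised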