Aleksandar Milivojevic <alex@milivojevic.org> wrote:
Exactly, that was what got me confused too. A SAN doesn't provide "safe" concurrent access to a device by itself,
Agreed. But it does provide the ability for multiple hosts to target the same space, and it handles _some_ of that coherency on the storage end. That doesn't replace what needs to go on at the host end, but it can work in conjunction with it.
According to those I spoke to at CORAID, you cannot have 2 systems accessing the same space. If one system tries to access space while the device believes another system is still accessing it (e.g., a failed node), it won't work.
If this has now changed, please let me know. But the last time I discussed this, they did not implement certain features that you will find in multi-targetable SCSI, iSCSI and FC solutions.
They expect 100% host-side resolution of everything. E.g., there is a reason why SCSI-2 (SCSI/SAS), TCP (iSCSI), etc. are "safer" than raw Ethernet -- there are acknowledgements. With CORAID's solution, the hosts have to do extra checking to confirm buffers have been written, etc., and it's not exactly foolproof -- unlike SCSI-2, TCP, etc.
AoE does not address many of these issues, from what I read just a few months ago. These are things that SCSI-2 and TCP do address!
you need to have a cluster-aware file system running on top of it.
Of course. I never argued otherwise. I merely stated that the more you can address at the target, the less the host has to do, and the more efficient, higher-performance and "safer" the clustering can be.
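To make the "extra checking" point concrete, here is a rough sketch of what host-side confirmation looks like when the transport itself gives you no acknowledgement. Everything here is hypothetical and simulated in memory: `LossyBlockDevice` and `verified_write` are names made up for illustration, not anything from the AoE driver or CORAID's tools.

```python
import random


class LossyBlockDevice:
    """Hypothetical target reached over an unacknowledged transport.

    A write may be silently dropped; the caller is told nothing.
    """

    def __init__(self, drop_rate, seed=0):
        self.blocks = {}
        self.drop_rate = drop_rate
        self.rng = random.Random(seed)

    def write(self, lba, data):
        # Fire-and-forget: no return value, no acknowledgement.
        if self.rng.random() >= self.drop_rate:
            self.blocks[lba] = data

    def read(self, lba):
        return self.blocks.get(lba)


def verified_write(dev, lba, data, max_retries=5):
    """Host-side check: write, read back, retry until the data matches.

    Returns the number of attempts used, or raises IOError if the block
    could not be confirmed within max_retries attempts.
    """
    for attempt in range(1, max_retries + 1):
        dev.write(lba, data)
        if dev.read(lba) == data:
            return attempt
    raise IOError(f"block {lba} not confirmed after {max_retries} attempts")
```

With an acknowledged transport, that read-back loop is work the host does not have to carry itself. Note the sketch also assumes the read-back really reflects what is on disk, which is exactly the kind of assumption that makes this approach less than foolproof.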
Given all the features AoE _lacks_, it doesn't leave me with a warm'n'fuzzy feeling. Every single rep I spoke to basically said to consider AoE little better than Oracle's FireWire hack. They recommended I _never_ have 2 systems use the same area, not even in a cluster setup, if I wanted SCSI/iSCSI/FC-like switchover.
With a SAN, one would always configure zones (on the switch) and/or LUN masking (on the storage device) to prevent clients from fighting over the storage and corrupting data.
And the CORAID does that too.
But at the same time, most multi-targetable SCSI/SAN solutions define various functions to ensure acknowledgement of buffer commits to disk, watchdog services to check if a node is no longer accessing the area (freeing up the lock so the failover system can mount read/write), etc. Others offer multiple read mounts to the same area from multiple systems, etc.
There is just a _dearth_ of features in CORAID's protocol versus SCSI-2, TCP, etc. Those drastically affect the ability to do "well designed clustering/fail-over" IMHO. If you press the CORAID people on them, they'll admit areas where they are deficient as a storage solution for a fail-over cluster.
As I said before, it's almost as bad as using Oracle's FireWire hack. It isn't anything like a typical SAN designed for fail-over as a target for multiple hosts.
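For anyone who hasn't seen the watchdog idea before, it roughly amounts to a lease: the active node has to keep renewing its claim on the area, and if it goes quiet, the target frees the lock so the standby can take over. A minimal sketch follows, with the caveat that `LeaseLock`, `ttl` and the node names are all invented for illustration and do not correspond to any particular SCSI, iSCSI or FC implementation.

```python
class LeaseLock:
    """Hypothetical target-side lock that expires if the owner stops renewing."""

    def __init__(self, ttl):
        self.ttl = ttl            # seconds a grant stays valid without renewal
        self.owner = None
        self.expires_at = 0.0

    def acquire(self, node, now):
        """Grant or renew the lock for `node` at time `now`.

        Succeeds if the lock is free, the lease has expired, or `node`
        already holds it (a renewal). Otherwise the request is refused.
        """
        if self.owner is None or now >= self.expires_at or self.owner == node:
            self.owner = node
            self.expires_at = now + self.ttl
            return True
        return False


# Node A holds the area and renews; node B is refused until A goes silent.
lock = LeaseLock(ttl=10)
lock.acquire("A", now=0)    # granted, valid until t=10
lock.acquire("B", now=5)    # refused, A's lease is still live
lock.acquire("A", now=8)    # renewed, valid until t=18
lock.acquire("B", now=18)   # granted, A stopped renewing
```

Time is passed in explicitly to keep the example deterministic. The point is that the expiry decision is made at the target, so the failover node does not have to guess whether the failed node is really gone.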
NAS offers safe concurrent access (generally; there might be some NAS devices out there that do not). A NAS device will manage the file system internally, and export it over the NFS or SMB protocols to the clients.
Such NAS devices are a combined host+storage, aka a "filer." They have many advantages over a SAN -- especially in their fail-over and/or load-balancing capabilities.
It's going to be slower and less efficient than a SAN device though (because of the upper protocol overhead),
Oh, it all depends on the design of the NAS. NetApp does a pretty damn fine job with their designs (long story).
and the set of features offered by the file system might not be what would be available if the file system was managed by the client's operating system itself.
But there are many other benefits. That is a larger discussion.
All I wanted people to know is that AoE doesn't have a lot of features you'll find in SCSI-2, TCP, etc... when it comes to using it as a fail-over storage solution. I would highly recommend you not use it as such.