Hey everyone,
Is anybody aware of a /true/ active/active multi-head and multi-target clustered iSCSI daemon?
IE:
Server 1: Hostname: host1.test.com IP Address: 10.0.0.1
Server 2: Hostname: host2.test.com IP Address: 10.0.0.2
Then they would share a CLVM volume group between them (call that VG "disk") and directly map each LUN (1, 2, 3, 4, etc.) to LVs named 1, 2, 3, 4, and so on.
That would essentially make host1 and host2 identical iSCSI targets, so I could hook up 30 servers to each LUN on each of host1 and host2 (assuming those block devices had a cluster-aware filesystem on them).
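To make that concrete, each head's /etc/tgt/targets.conf would look something like this (the IQN is made up, and I'm assuming the CLVM LVs show up as /dev/mapper/disk-1, /dev/mapper/disk-2, and so on):

    # /etc/tgt/targets.conf on both host1 and host2
    # each backing-store line becomes LUN 1, 2, 3, ... in order
    <target iqn.2012-11.com.test:disk>
        backing-store /dev/mapper/disk-1
        backing-store /dev/mapper/disk-2
        backing-store /dev/mapper/disk-3
        backing-store /dev/mapper/disk-4
    </target>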
However, what I want to accomplish is true active/active MPIO, using the dm-multipath driver to combine LUN 1 from both host1 and host2 into a single /dev/dm-0 device in active/active mode.
This would allow me to multipath across multiple fabrics, develop true high availability and increase my throughput significantly.
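On the initiator side, the multipath config I want to be able to run is roughly this, with both targets' paths in a single active path group (the WWID and alias are made up, and this only works if both heads present a LUN that is truly identical -- which is exactly the part that seems to be missing):

    # /etc/multipath.conf (fragment)
    multipaths {
        multipath {
            wwid                  360000000000000000e00000000010001
            alias                 lun1
            path_grouping_policy  multibus        # all paths in one group, active/active
            path_selector         "round-robin 0"
            failback              immediate
        }
    }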
Any idea of an iSCSI tgtd that supports this? As far as I can tell, none do. I know some proprietary vendors have this type of functionality, which may or may not be using iSCSI code (but that's a whole set of arguments for later...).
Ultimately, my goal is this.
I want to take my existing Ceph cluster and expose a handful of RBDs from it to two iSCSI heads running C6 or RHEL6 or what have you, so I can use my inexpensive Linux-based storage for my Windows 2008 machines.
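Roughly, the plumbing I picture on each head is this (pool/image/IQN names are invented, and it assumes a kernel with the rbd module, which the stock C6 kernel doesn't have as far as I know):

    # map a couple of RBDs on the head; they show up as /dev/rbd0, /dev/rbd1, ...
    rbd map rbd/lun1
    rbd map rbd/lun2

    # export them over iSCSI with tgtd
    tgtadm --lld iscsi --op new --mode target --tid 1 -T iqn.2012-11.com.test:ceph
    tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/rbd0
    tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 2 -b /dev/rbd1
    tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL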
Thoughts?
Steven Crothers steven.crothers@gmail.com
On 11/16/2012 10:02 PM, Steven Crothers wrote:
Any idea of an iSCSI tgtd that supports this? As far as I can tell, none do. I know some proprietary vendors have this type of functionality, which may or may not be using iSCSI code (but that's a whole set of arguments for later...).
There's a reason that those proprietary vendors are able to charge big $$$ for this functionality.
On Sat, Nov 17, 2012 at 12:34 AM, Ian Pilcher arequipeno@gmail.com wrote:
There's a reason that those proprietary vendors are able to charge big $$$ for this functionality.
That's the truth... I was hoping they were based on some open-source implementation of iSCSI somewhere.
I mean I could probably dedicate a single machine to run iSCSI and just schedule downtime, but that's something I wanted to avoid.
I've been looking at something like Open-E, but it's active/passive with what is essentially a DRBD link between them. Again, not ideal. Speaking of which, why do people rely so much on DRBD for LAN deployments lately? Everyone always seems to cheap out and set up DRBD/Pacemaker/Heartbeat/*insert some HA software here* instead of using proper clustered file systems. DRBD has always screamed WAN replication to me. Maybe I just don't put enough value in it, who knows.
Anyway, back to my hunt for a way to implement my Ceph cluster on Windows 2008.
On 11/17/2012 06:08 PM, Steven Crothers wrote:
On Sat, Nov 17, 2012 at 12:34 AM, Ian Pilcher arequipeno@gmail.com wrote:
There's a reason that those proprietary vendors are able to charge big $$$ for this functionality.
That's the truth... I was hoping they were based on some open-source implementation of iSCSI somewhere.
I mean I could probably dedicate a single machine to run iSCSI and just schedule downtime, but that's something I wanted to avoid.
You could take two nodes, set up DRBD to replicate the data (synchronously), manage a floating/virtual IP in Pacemaker or rgmanager, and export the DRBD storage as an iSCSI LUN using tgtd. Then you can migrate to the backup node, take down the primary node for maintenance, and restore it with minimal or no downtime. Run this over mode=1 bonding with each leg on a different switch and you get network HA as well.
I've done this to provide storage to a cluster of VMs and I could even fail the primary node and the backup would take over without losing any of my VMs.
I didn't speak up earlier because of all the other features you asked for, but this will at least give you your HA requirements.
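If it helps, the Pacemaker side of what I described is roughly this (resource names, IQN, IP, and device paths are all examples, not a tested config):

    primitive p_drbd ocf:linbit:drbd params drbd_resource=store \
            op monitor interval=29s role=Master op monitor interval=31s role=Slave
    ms ms_drbd p_drbd meta master-max=1 clone-max=2 notify=true
    primitive p_ip ocf:heartbeat:IPaddr2 params ip=10.0.0.10 cidr_netmask=24
    primitive p_target ocf:heartbeat:iSCSITarget \
            params implementation=tgt iqn=iqn.2012-11.com.test:store
    primitive p_lun1 ocf:heartbeat:iSCSILogicalUnit \
            params target_iqn=iqn.2012-11.com.test:store lun=1 path=/dev/drbd0
    group g_iscsi p_ip p_target p_lun1
    colocation col_iscsi_with_drbd inf: g_iscsi ms_drbd:Master
    order ord_drbd_first inf: ms_drbd:promote g_iscsi:start

The floating IP, the target, and the LUN move together, and always follow the DRBD master.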
I've been looking at something like Open-E, but it's active/passive with what is essentially a DRBD link between them. Again, not ideal. Speaking of which, why do people rely so much on DRBD for LAN deployments lately? Everyone always seems to cheap out and set up DRBD/Pacemaker/Heartbeat/*insert some HA software here* instead of using proper clustered file systems. DRBD has always screamed WAN replication to me. Maybe I just don't put enough value in it, who knows.
Anyway, back to my hunt for a way to implement my Ceph cluster on Windows 2008.
Clustered filesystems like GFS2 and OCFS2 come with a non-trivial performance hit, so it's usually best to avoid them when possible. Using DRBD is not "cheaping out". I prefer it to fancy SANs, as it's more highly available than a SAN.
On Sat, Nov 17, 2012 at 6:23 PM, Digimer lists@alteeve.ca wrote:
You could take two nodes, set up DRBD to replicate the data (synchronously), manage a floating/virtual IP in Pacemaker or rgmanager, and export the DRBD storage as an iSCSI LUN using tgtd. Then you can migrate to the backup node, take down the primary node for maintenance, and restore it with minimal or no downtime. Run this over mode=1 bonding with each leg on a different switch and you get network HA as well.
There is nothing active/active about DRBD, though, and it doesn't solve the problem of trying to utilize two heads.
It's just failover. Nothing more.
I'm looking for an active/active setup that still gives me failover, so I can utilize the multiple physical paths for additional throughput and bandwidth. Yes, I know I can add more NICs. More NICs don't provide failover of the physical node.
On 11/17/12 6:58 PM, Steven Crothers wrote:
On Sat, Nov 17, 2012 at 6:23 PM, Digimer lists@alteeve.ca wrote:
You could take two nodes, set up DRBD to replicate the data (synchronously), manage a floating/virtual IP in Pacemaker or rgmanager, and export the DRBD storage as an iSCSI LUN using tgtd. Then you can migrate to the backup node, take down the primary node for maintenance, and restore it with minimal or no downtime. Run this over mode=1 bonding with each leg on a different switch and you get network HA as well.
There is nothing active/active about DRBD, though, and it doesn't solve the problem of trying to utilize two heads.
It's just failover. Nothing more.
I'm looking for an active/active setup that still gives me failover, so I can utilize the multiple physical paths for additional throughput and bandwidth. Yes, I know I can add more NICs. More NICs don't provide failover of the physical node.
any sort of active-active storage system has difficult issues with concurrent operations ...
On 11/17/2012 10:40 PM, John R Pierce wrote:
On 11/17/12 6:58 PM, Steven Crothers wrote:
On Sat, Nov 17, 2012 at 6:23 PM, Digimer lists@alteeve.ca wrote:
You could take two nodes, set up DRBD to replicate the data (synchronously), manage a floating/virtual IP in Pacemaker or rgmanager, and export the DRBD storage as an iSCSI LUN using tgtd. Then you can migrate to the backup node, take down the primary node for maintenance, and restore it with minimal or no downtime. Run this over mode=1 bonding with each leg on a different switch and you get network HA as well.
There is nothing active/active about DRBD, though, and it doesn't solve the problem of trying to utilize two heads.
It's just failover. Nothing more.
I'm looking for an active/active setup that still gives me failover, so I can utilize the multiple physical paths for additional throughput and bandwidth. Yes, I know I can add more NICs. More NICs don't provide failover of the physical node.
any sort of active-active storage system has difficult issues with concurrent operations ...
Exactly what is discussed here, as linked in my other reply:
http://fghaas.wordpress.com/2011/11/29/dual-primary-drbd-iscsi-and-multipath...
DRBD is not active/active. I cannot utilize both servers as active sessions; DRBD replication latency will, in fact, break my storage.
I do not want active/passive or "hot-standby" failover... DRBD is off topic from my original post, as it is not the correct solution.
Steven Crothers steven.crothers@gmail.com
On 11/17/2012 09:46 PM, Steven Crothers wrote:
DRBD is off topic from my original post, as it is not the correct solution.
That may be, but the link you were given was not entirely off topic:
"iSCSI is a stateful protocol, there is more to it than just reads and writes. To run multipath (or multiple connections per session) against distinct targets on separate nodes you'd need to have cluster-aware iSCSI targets which coordinate with each other in some fashion. To my knowledge, this does not exist (not for Linux, anyways)."
It sounds like that answer is relevant to your question. What you want does not currently exist.
On 11/17/2012 09:58 PM, Steven Crothers wrote:
On Sat, Nov 17, 2012 at 6:23 PM, Digimer lists@alteeve.ca wrote:
You could take two nodes, set up DRBD to replicate the data (synchronously), manage a floating/virtual IP in Pacemaker or rgmanager, and export the DRBD storage as an iSCSI LUN using tgtd. Then you can migrate to the backup node, take down the primary node for maintenance, and restore it with minimal or no downtime. Run this over mode=1 bonding with each leg on a different switch and you get network HA as well.
There is nothing active/active about DRBD, though, and it doesn't solve the problem of trying to utilize two heads.
It's just failover. Nothing more.
I'm looking for an active/active setup that still gives me failover, so I can utilize the multiple physical paths for additional throughput and bandwidth. Yes, I know I can add more NICs. More NICs don't provide failover of the physical node.
First, you can run DRBD in dual-primary (aka, Active/Active) just fine. It will faithfully replicate in real time and in both directions. Of course, then you need something to synchronize the data at the logical level (DRBD is just a block device), and that is where GFS2 or OCFS2 comes in, though the performance hit will go counter to your goals.
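For reference, dual-primary is just a couple of options in the DRBD (8.3) resource config, something like this, where the hostnames, IPs, and backing disk are only examples (and you really want fencing configured before enabling it):

    resource store {
        protocol C;                        # synchronous replication, required for dual-primary
        startup { become-primary-on both; }
        net {
            allow-two-primaries;
            after-sb-0pri discard-zero-changes;
            after-sb-1pri discard-secondary;
            after-sb-2pri disconnect;
        }
        device    /dev/drbd0;
        disk      /dev/sdb1;
        meta-disk internal;
        on host1.test.com { address 10.0.0.1:7788; }
        on host2.test.com { address 10.0.0.2:7788; }
    }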
You could do multi-path to both nodes, technically, but it's not wise because the cache on the storage can cause problems[1].
Also, you will note that I suggested mode=1, which is active/passive bonding and provides no aggregated bandwidth. This was on purpose; I've tested all the bonding modes, and *only* mode=1 handled failure and recovery reliably without interruption.
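On EL6 the mode=1 bond is just the standard ifcfg files, along these lines (the IP is an example; eth1 gets the same slave config as eth0):

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    BOOTPROTO=none
    ONBOOT=yes
    IPADDR=10.0.0.1
    NETMASK=255.255.255.0
    BONDING_OPTS="mode=1 miimon=100"

    # /etc/sysconfig/network-scripts/ifcfg-eth0
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none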
As for failover: if you run DRBD in dual-primary but keep access through only one node at a time, then after the node holding the IP fails, all that is needed to migrate is to fence the failed node, take over the IP, and start tgtd. This can happen quickly and, in my tests, iSCSI on the clients recovered fine. In my case, I had the LUNs acting as PVs in a clustered LVM, with each LV backing a VM. None of the VMs failed or needed to be rebooted.
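Done by hand, that recovery is essentially the following (node name, IP, and interface are examples; in practice the resource manager does this for you):

    fence_node host1                      # make sure the failed node is really gone
    ip addr add 10.0.0.10/24 dev bond0    # take over the floating iSCSI IP
    service tgtd start                    # bring the target up; initiators log back in and retry I/O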
So from what I can gather of your needs, you can get everything you want from open source. The only caveat is that if you need more speed, you need to beef up your network rather than aggregate links (for reasons not related to HA). If this is not good enough, then there are plenty of commercial products ready to lighten your wallet by a good measure.
Digimer
1. http://fghaas.wordpress.com/2011/11/29/dual-primary-drbd-iscsi-and-multipath...
On 11/17/12, Steven Crothers steven.crothers@gmail.com wrote:
Hey everyone,
Is anybody aware of a /true/ active/active multi-head and multi-target clustered iSCSI daemon?
Hi Steven,
If I'm correct, you are looking for a shared-storage clustering setup with HA. Does this help: http://www.quadstor.com/tech-library/135-high-availability-with-shared-stora...
Please note that I have only used their software in a standalone installation and cannot give feedback on the clustering feature. Also, I'm not sure if it works with LVM; check with them.
jb