[CentOS] Keeping data on 2 servers in sync !

Bryan J. Smith thebs413 at earthlink.net
Mon Dec 5 23:13:45 UTC 2005


Denis Croombs <denis at croombs.org> wrote:
> I want to build 2 servers (both running samba) to provide
> file storage to 2 offices (approx 100 miles apart, linked
> via DSL) but all data writen to 1 server must also be saved
> to the other server. 

What do you want to synchronize?
Directory services?
User authentication?
Or files?

Directory and user authentication are very doable.
File services typically are not, or only in limited ways.

> Both servers would also allow users to access the data via
> a VPN thus allowing 1 office with a failed server to access
> the other server via the vpn and still see the data from
> both offices.

You're going to _saturate_ your link between the two offices
if you start synchronizing files in real-time.  Unless you
have something like 4+ T-1s or a 6+Mbps SDSL line (i.e.,
symmetric DSL -- 6Mbps upload as well as download; a T-1 is
roughly 1.5Mbps, so four of them gives about the same
bandwidth), you're going to find you're not going to be able
to do this in "real-time."

At best, you can use rsync to synchronize files at several
points in the day -- maybe lunch-time and the middle of the
night.  If you try to do it in real-time, you're going to
find the load on your network is self-defeating.
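Just as a rough sketch (the paths and hostname below are
made up, not anything from your setup), a periodic one-way
push from the primary server to the other office could look
like:

  # Push the Samba share tree to the remote server over SSH.
  # -a preserves permissions/ownership/times, -z compresses
  # over the slow link, --delete removes files that were
  # deleted on the primary.
  rsync -az --delete -e ssh /srv/samba/shares/ \
      remote-office:/srv/samba/shares/

Keep in mind that blindly rsync'ing in both directions
between two servers that are both being written to can
clobber changes -- you really want one side to be the master
for any given share.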

> I currently have 1 server working but we want to add the
> second office to the system. (Currently 1 office has 10
> users and the second office has 1 user connected via VPN)
> but the second office will have 20 within 12 months and
> the first will have 35 soon.)
> Has anyone done anything like this ?

Yes and no.

Yes, I've done it.  I've done it at high speed in the same
closet (or a nearby closet with a SAN), sharing the same
storage: NFS/Samba with failover c/o Mission Critical Linux
(now part of RHEL).  You need multi-targetable hardware (not
cheap).

But no, I haven't done it (nor would I do it) over a VPN
link.  You'll saturate it quickly.  File-server clustering is
designed for high-speed connections between servers and their
storage.

> I am currently reading & unpicking http://www.linux-ha.org/
> to see what that gives me.

"High-availability" (HA) doesn't necessarily mean "failover."
 Most HA implementations are for read-only web services,
destination NAT (DNAT), etc...

When you start talking failover of network file servers, then
you start talking mega-$$$, lots of bandwidth/synchronization
requirements, hardware, etc... to do it in real-time.  GFS
reduces the $$$ and the same-closet requirement, but it also
exponentially increases bandwidth and other costs.

At best, you'll have to do it non-real-time, using something
like rsync.  It won't be fail-over at all.

> Any clues/comments very welcome, even if you think I am mad
> !

I don't think you're mad.  But I don't think you're aware of
everything involved in real-time failover of network file
services.  And it's really going to be near impossible over
low-throughput Internet connections.

I'd look to non-real-time rsync instead, running off-hours. 
That's the best you can do unless you have a lot of bandwidth
and a lot of money.  The majority of the HA stuff will _not_
apply.  ;->
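As a sketch of the off-hours approach (the times, user, and
paths are just placeholders), the same rsync push shown
above could be driven from cron on the primary server:

  # /etc/crontab entries: push at 12:30 (lunch) and 02:30
  # (overnight)
  30 12 * * * root rsync -az --delete -e ssh /srv/samba/shares/ remote-office:/srv/samba/shares/
  30 2  * * * root rsync -az --delete -e ssh /srv/samba/shares/ remote-office:/srv/samba/shares/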



-- 
Bryan J. Smith                | Sent from Yahoo Mail
mailto:b.j.smith at ieee.org     |  (please excuse any
http://thebs413.blogspot.com/ |   missing headers)


