cloudstack-dev mailing list archives

From Simon Weller <swel...@ena.com>
Subject Re: Multi-Datacenter Deployment
Date Wed, 07 Jan 2015 20:57:37 GMT
Regions are designed to be completely separate from one another, so no, as far as I'm aware
there is no way to sync secondary storage data between them. I don't think you'd want to do
that anyway, as it defeats the purpose of keeping each cloud region isolated from the others.

- Si


________________________________________
From: Logan Barfield <lbarfield@tqhosting.com>
Sent: Wednesday, January 07, 2015 2:00 PM
To: dev@cloudstack.apache.org
Cc: users@cloudstack.apache.org
Subject: Re: Multi-Datacenter Deployment

A followup here:  You can't have secondary storage that spans regions (e.g.,
templates/snapshots in sync), even with S3/Swift, correct?  If not, that's
another downside to regions on top of the account sync.

It seems like the best solution to prevent weird split-brain/HA issues
would be to have at least 3 databases set up as master/master/master with
quorum.  That way if two sites lose contact and re-establish there's a 2/1
majority saying the hosts are all reachable.  Would hopefully prevent the
ones that lost contact from kicking off HA immediately.  I don't even know
how feasible that would be; maybe with Galera?

Even then it would have to be on a table level since there would be a
conflict, for instance:
- Given sites 1, 2, and 3, where site 1 loses contact with site 2 and comes
back up
- Site 1: Thinks site 1 is up and site 2 is down
- Site 2: Thinks site 2 is up and site 1 is down.
- Site 3: Thinks all sites are up.

In the above case the least harmful thing would be to push site 3 to the
other two, but since all three sites have different data it may just hang
instead.
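
The three-site case above can be sketched as a simple majority vote. This is
only an illustration of the quorum idea, not how CloudStack or Galera actually
decide anything: the function name majority_view and the report layout are
hypothetical, and it assumes sites 1 and 2 both still see site 3 as up.

```python
from collections import Counter


def majority_view(reports):
    """Return the status each site is given by a strict majority of observers.

    reports maps observer -> {site: True if that observer sees the site as up}.
    A site without a strict majority is reported as None (undecided).
    """
    observers = len(reports)
    sites = sorted({site for view in reports.values() for site in view})
    result = {}
    for site in sites:
        votes = Counter(view[site] for view in reports.values() if site in view)
        status, count = votes.most_common(1)[0]
        # Require more than half of all observers, not just the largest group.
        result[site] = status if count * 2 > observers else None
    return result


# Hypothetical snapshot of the scenario: site 1 and site 2 lost contact with
# each other, while site 3 can still reach everyone.
reports = {
    "site1": {"site1": True, "site2": False, "site3": True},
    "site2": {"site1": False, "site2": True, "site3": True},
    "site3": {"site1": True, "site2": True, "site3": True},
}

print(majority_view(reports))
```

With these inputs every site gets a 2/1 vote for "up", so nothing would be
handed to HA. With only two observers that disagree, the result is None, which
is why an odd number of voters matters.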

This is going to drive me nuts. :D


Thank You,

Logan Barfield
Tranquil Hosting

On Wed, Jan 7, 2015 at 12:57 PM, Simon Weller <sweller@ena.com> wrote:

> See inline.
> ________________________________________
> From: Logan Barfield <lbarfield@tqhosting.com>
> Sent: Wednesday, January 07, 2015 11:43 AM
> To: dev@cloudstack.apache.org
> Cc: users@cloudstack.apache.org
> Subject: Re: Multi-Datacenter Deployment
>
> I appreciate the explanation.  That seems to confirm what I was thinking,
> that until regions are working 100% we'll just have to make sure the
> DC-to-DC links are as stable/redundant as possible to prevent HA issues.
> If we increase the HA delay it shouldn't be a major issue, and it will
> still be better than nothing.
>
> For us it probably also makes sense to not worry about having management
> servers in each DC for now.  If we have a big enough outage in our primary
> DC to affect access to the management server we probably have bigger
> problems to worry about.
>
> > Yeah, I agree. Even with Mgmt down, it's not going to stop any existing
> > services from running or functioning as long as the clusters are healthy.
>
> - Si
>
> Much appreciated!
>
>
> Thank You,
>
> Logan Barfield
> Tranquil Hosting
>
> On Wed, Jan 7, 2015 at 12:15 PM, Simon Weller <sweller@ena.com> wrote:
>
> > Logan,
> >
> > We currently run CS in multiple geographically separate DCs, and may be
> > able to give you a little insight into things.
> >
> > We run KVM in advanced networking mode, with CLVM clusters backed onto
> > Dell Compellent SANs. We currently have different DCs running different
> > zones per DC, in a single region. We've been running CS in production now
> > since 4.0 prior to regions, so that functionality (along with its
> > limitations) hasn't been something we've adopted yet. We run our
> > Management (with multiple clustered nodes) out of 1 DC, and have a backup
> > set of Management nodes in another DC should we need to invoke BCDR in the
> > event the primary Management nodes become unavailable.
> >
> > Your concerns regarding HA problems are well founded. We run our own
> > nationwide MPLS backbone, and therefore have multiple high capacity
> > bandwidth paths between our different DCs, and even with that capacity and
> > fault tolerant design, we've seen issues where Management has attempted to
> > invoke HA due to brief loss of connectivity (typically due to maintenance
> > or grooming activity), and this can be quite problematic. VPN tunnels are
> > going to be very challenging for you, and you really need to look at VPLS
> > or some other technology that can layer on top of a resilient
> > infrastructure with multiple paths and fast failover (e.g. MPLS Fast
> > Reroute).
> >
> > Ideally, regions should solve this with dedicated local management nodes,
> > but until the syncing is sorted out, and those newer releases are stable,
> > there isn't much option other than using a single region right now, short
> > of setting up completely separate CS instances per DC.
> >
> > Hope this helps a little.
> >
> > - Si
> >
> > ________________________________________
> > From: Logan Barfield <lbarfield@tqhosting.com>
> > Sent: Tuesday, January 06, 2015 1:45 PM
> > To: dev@cloudstack.apache.org; users@cloudstack.apache.org
> > Subject: Multi-Datacenter Deployment
> >
> > We are currently running a single location CloudStack deployment:
> > - 1 Hardware firewall
> > - 1 Management/Database Server
> > - 1 NFS staging store (for S3 secondary storage)
> > - Ceph RBD for primary storage
> > - 4 Hypervisors
> > - 1 Zone/Pod/Cluster
> >
> > We are looking to expand our deployment to other datacenters, and I'm
> > trying to determine the best way to go about it.  The documentation is a
> > bit lacking for multi-site deployments.
> >
> > Our goal for the multi-site deployment is to have a zone for each site
> > (E.G. US East, US West, Europe) that our customers can use to deploy
> > instances in their preferred geographic area.
> >
> > Since we don't want to have different accounts for every datacenter, I
> > don't think using Regions makes sense for us (and I'm not sure what
> > they're actually good for without keeping accounts/users/domains in sync).
> >
> > Right now I'm thinking our setup will be as follows:
> > - Firewall, Management Server, NFS staging server, primary storage, and
> > Hypervisors in each datacenter.
> > - All Management servers will be on the same management network.
> > - Management servers will be connected via site-to-site VPN links over
> > WAN.
> > - MySQL replication (Percona?) will be set up on the management servers,
> > with an odd number of servers to protect against split brain, and
> > redundant database backups.
> > - One region (default)
> > - One zone for each datacenter
> > - Geo-enabled DNS to direct customers to the nearest Management server
> > - Object storage for secondary storage across cloud.
> >
> > My primary concerns with this setup are:
> > - I haven't really seen multi-site deployment details anywhere.
> > - Potential for split-brain.
> > - How will HA be handled (e.g., if a VPN link goes down and one of the
> > remote management servers can't contact a host, will it try to initiate
> > HA?) - This sort of goes along with the split brain problem.
> >
> > Are my assumptions here sound, or is there a standard recommended way of
> > doing multi-site deployments?
> >
> > Any suggestions are much appreciated.
> >
>
