incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Joining distinct clusters with the same schema together
Date Fri, 21 Jun 2013 07:57:37 GMT
> > Question 2: is this a sane strategy?
> 
> On its face my answer is "not... really"? 
I'd go with a solid no. 

Just because the three independent clusters have a schema that looks the same does not
make them the same. The schema is a versioned document; you will not be able to merge the
clusters later simply by joining the DCs together, at least not without downtime.

It will be easier to go with a multi-DC setup from the start.
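
For example (a minimal sketch, assuming CQL3 on Cassandra 1.2; the keyspace and DC names are
placeholders for whatever your snitch reports), a keyspace defined with
NetworkTopologyStrategy places replicas in every DC from day one:

    -- illustrative only: replace names and replication factors with your own
    CREATE KEYSPACE my_ks
      WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'DC1': 3,
        'DC2': 3,
        'DC3': 3
      };

Adding another DC later is then an ALTER KEYSPACE plus rebuild/repair, rather than a merge of
three independently versioned schemas.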

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/06/2013, at 6:36 AM, Eric Stevens <mightye@gmail.com> wrote:

> On its face my answer is "not... really"? What do you view yourself as
> getting with this technique versus using built in replication? As an
> example, you lose the ability to do LOCAL_QUORUM vs EACH_QUORUM
> consistency level operations?
> 
> Doing replication manually sounds like a recipe for the DCs eventually getting subtly
> out of sync with each other.  If a connection goes down between DCs, and you are taking
> writes in both, how will you catch them up with each other?  C* already offers that
> resolution for you, and you'd have to work pretty hard to reproduce it for no obvious
> benefit that I can see.
> 
> For minimum effort, definitely rely on Cassandra's well-tested codebase for this.
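> 
> As a concrete illustration (sketch only; the keyspace and table names are placeholders, and
> EACH_QUORUM applies to writes), built-in replication lets you pick the guarantee per
> operation from cqlsh or a driver:
> 
>     -- wait for a quorum of replicas in the coordinator's local DC only
>     CONSISTENCY LOCAL_QUORUM;
>     SELECT * FROM my_ks.users WHERE id = 42;
> 
>     -- wait for a quorum of replicas in every DC before acknowledging the write
>     CONSISTENCY EACH_QUORUM;
>     INSERT INTO my_ks.users (id, name) VALUES (42, 'example');
> 
> A hand-rolled relay between independent clusters gives you nothing equivalent to tune.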
> 
> 
> 
> 
> On Wed, Jun 19, 2013 at 2:27 PM, Robert Coli <rcoli@eventbrite.com> wrote:
> On Wed, Jun 19, 2013 at 10:50 AM, Faraaz Sareshwala
> <fsareshwala@quantcast.com> wrote:
> > Each datacenter will have a cassandra cluster with a separate set of seeds
> > specific to that datacenter. However, the cluster name will be the same.
> >
> > Question 1: is this enough to guarantee that the three datacenters will have
> > distinct cassandra clusters as well? Or will one node in datacenter A still
> > somehow be able to join datacenter B's ring?
> 
> If they have network connectivity and the same cluster name, they are
> the same logical cluster. However, if your nodes share tokens and you
> have auto_bootstrap enabled (the implicit default), the second node you
> attempt to start will refuse to start, because you would be trying to
> bootstrap it into the range of a live node.
> 
> > For now, we are planning on using our own relay mechanism to transfer
> > data changes from one datacenter to another.
> 
> Are you planning to use the streaming commitlog functionality for
> this? Not sure how you would capture all changes otherwise, except by
> having your app just write the same thing to multiple places. Unless
> data timestamps are identical between clusters, otherwise-identical
> data will not merge properly, as Cassandra uses data timestamps to
> merge.
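> 
> To illustrate the timestamp point (hypothetical table, not from your schema): the relay
> would have to replay each change with the originally captured write timestamp, e.g.
> 
>     -- re-issuing a captured mutation in another cluster with its original timestamp
>     INSERT INTO my_ks.events (id, payload)
>     VALUES (42, 'something')
>     USING TIMESTAMP 1371715200000000;
> 
> If the relay simply re-writes the data and lets the coordinator assign a new timestamp, the
> two copies are different versions as far as Cassandra's last-write-wins reconciliation is
> concerned.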
> 
> > Question 2: is this a sane strategy?
> 
> On its face my answer is "not... really"? What do you view yourself as
> getting with this technique versus using built in replication? As an
> example, you lose the ability to do LOCAL_QUORUM vs EACH_QUORUM
> consistency level operations?
> 
> > Question 3: eventually, we want to turn all these cassandra clusters into one
> > large multi-datacenter cluster. What's the best practice to do this? Should I
> > just add nodes from all datacenters to the list of seeds and let cassandra
> > resolve differences? Is there another way I don't know about?
> 
> If you are using NetworkTopologyStrategy and have the same cluster
> name for your isolated clusters, all you need to do is:
> 
> 1) configure NTS to store replicas on a per-datacenter basis (see the
> ALTER KEYSPACE sketch after this list)
> 2) ensure that your nodes are in different logical data centers (by
> default, all nodes are in DC1/rack1)
> 3) ensure that clusters are able to reach each other
> 4) ensure that tokens do not overlap between clusters (with manual
> token assignment, the common technique is to offset each additional
> cluster's tokens by one)
> 5) ensure that all nodes' seed lists contain (recommended) 3 seeds from each DC
> 6) rolling restart (so the new seed list is picked up)
> 7) repair ("should" only be required if writes have not replicated via
> your out of band mechanism)
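> 
> For step 1, the keyspace change itself is one statement per keyspace (sketch only; the DC
> names must match what your snitch reports, and the replication factors are placeholders):
> 
>     ALTER KEYSPACE my_ks
>       WITH replication = {
>         'class': 'NetworkTopologyStrategy',
>         'DC_A': 3,
>         'DC_B': 3,
>         'DC_C': 3
>       };
> 
> followed by the repair in step 7 so existing data actually reaches its new replicas.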
> 
> Vnodes change the picture slightly because the chance of your clusters
> having conflicting tokens increases with the number of token ranges
> you have.
> 
> =Rob
> 

