cassandra-user mailing list archives

From Jeff Jirsa <jeff.ji...@crowdstrike.com>
Subject Re: Replicating Data Between Separate Data Centres
Date Mon, 14 Dec 2015 23:18:52 GMT

There is research into causal consistency and Cassandra (http://da-data.blogspot.com/2013/02/caring-about-causality-now-in-cassandra.html,
for example), though you’ll note that it uses a fork (https://github.com/wlloyd/eiger)
which is unlikely to be something you’d ever want to consider in production. Let’s pretend
it doesn’t exist, and won’t in the near future.

The typical approach here is to have multiple active datacenters and EACH_QUORUM writes, which
gives you the ability to have a full DC failure without impact. This also solves your fail-back
problem, because when the primary DC is restored, you simply run a repair. What part of EACH_QUORUM
is insufficient for your needs? Is it the failure scenario where the WAN link breaks and
local writes are impacted?
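
To make the trade-off between the two consistency levels concrete, here is a minimal Python sketch of the quorum arithmetic only (it is not driver code). It assumes a replication factor of 3 in each of two hypothetical datacenters named dc1 and dc2; the function names and the dictionaries are illustrative, not part of any Cassandra API.

```python
def quorum(replication_factor):
    """Replicas needed for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def write_succeeds(level, local_dc, rf_by_dc, live_by_dc):
    """Return True if a write at `level` could be acknowledged.

    rf_by_dc:   replication factor per datacenter (assumed values)
    live_by_dc: replicas per datacenter reachable from the coordinator
    """
    if level == "LOCAL_QUORUM":
        # Only the coordinator's own datacenter has to reach a quorum.
        return live_by_dc[local_dc] >= quorum(rf_by_dc[local_dc])
    if level == "EACH_QUORUM":
        # Every datacenter has to reach its own quorum.
        return all(live_by_dc[dc] >= quorum(rf) for dc, rf in rf_by_dc.items())
    raise ValueError("unsupported consistency level: " + level)


# Hypothetical topology: RF 3 in each of two datacenters.
rf = {"dc1": 3, "dc2": 3}
healthy = {"dc1": 3, "dc2": 3}
wan_down = {"dc1": 3, "dc2": 0}  # dc2 unreachable from a coordinator in dc1

for level in ("LOCAL_QUORUM", "EACH_QUORUM"):
    print(level, write_succeeds(level, "dc1", rf, healthy),
          write_succeeds(level, "dc1", rf, wan_down))
```

Under these assumptions, a broken WAN link leaves LOCAL_QUORUM writes in dc1 unaffected, while EACH_QUORUM writes cannot be acknowledged until dc2 is reachable again.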

Short of that, your ‘occasional snapshots and restore in case of emergency’ approach is going
to be your next best thing.


From:  Philip Persad
Reply-To:  "user@cassandra.apache.org"
Date:  Monday, December 14, 2015 at 3:11 PM
To:  Cassandra Users
Subject:  Re: Replicating Data Between Separate Data Centres

Hi Jim, 

Thanks for taking the time to answer.  By Causal Consistency, what I mean is that I need strict
ordering of all related events which might have a causal relationship.  For example (albeit
slightly contrived), if we are looking at recording an event stream, it is very important
that the event creating a user be visible before the event which assigns permissions to
that user.  However, I don't care at all about the ordering of the creation of two different
users.  This is what I mean by Causal Consistency.

The reason why LOCAL_QUORUM replication does not work for me is that, while I can get
guarantees about the order in which writes will become visible in the Primary DC,
I cannot get those guarantees about the Secondary DC.  As a result (to use another slightly
contrived example), if a user is created and then takes an action shortly before the failure
of the Primary DC, there are four possible situations with respect to what will be visible
in the Secondary DC:

1) Both events are visible in the Secondary DC
2) Neither event is visible in the Secondary DC
3) The creation event is visible in the Secondary DC, but the action event is not
4) The action event is visible in the Secondary DC, but the creation event is not

States 1, 2, and 3 are all acceptable.  State 4 is not.  However, if I understand Cassandra's
asynchronous DC replication correctly, I do not believe I get any guarantee that State
4 will not happen.  Eventual Consistency promises to "eventually" settle into State 1.  However
"eventually" does me very little good if Godzilla steps on my Primary DC.  I'm willing to
accept loss of data which was created near to a disaster (States 2 and 3), but I cannot accept
the inconsistent history of events in State 4.
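
The four states above can be sketched as a small check. This is only an illustration of the requirement, assuming each event has a name and a known set of events it causally depends on; the event names `create_user` and `user_action` and both function names are hypothetical.

```python
def classify(visible, creation="create_user", action="user_action"):
    """Map the set of events visible in the Secondary DC to States 1-4."""
    has_creation = creation in visible
    has_action = action in visible
    if has_creation and has_action:
        return 1  # both events visible
    if not has_creation and not has_action:
        return 2  # neither event visible
    if has_creation:
        return 3  # creation visible, action lost
    return 4      # action visible without its cause


def is_causally_consistent(visible, depends_on):
    """True if every visible event's causal dependencies are also visible."""
    return all(dep in visible
               for event in visible
               for dep in depends_on.get(event, ()))


# The action depends on the creation; the creation depends on nothing.
deps = {"user_action": {"create_user"}}

for visible in ({"create_user", "user_action"}, set(),
                {"create_user"}, {"user_action"}):
    print(classify(visible), is_causally_consistent(visible, deps))
```

Only State 4 fails the check: losing a suffix of the history (States 2 and 3) keeps every visible event's causes visible, whereas State 4 shows an effect without its cause.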

I have a mechanism outside of normal Cassandra replication which can give me the consistency
I need.  My problem is effectively with setting up a new recovery DC after the failure of
the primary.  How do I go about getting all of my data into a new cluster?

Thanks,

-Phil

On Mon, Dec 14, 2015 at 1:06 PM, Jim Ancona <jim@anconafamily.com> wrote:
Could you define what you mean by Causal Consistency and explain why you think you won't have
that when using LOCAL_QUORUM? I ask because LOCAL_QUORUM and multiple data centers are the
way many of us handle DR, so I'd like to understand why it doesn't work for you.

I'm afraid I don't understand your scenario. Are you planning on building out a new recovery
DC *after* the primary has failed, or keeping two DCs in sync so that you can switch over
after a failure?

Jim

On Mon, Dec 14, 2015 at 2:59 PM, Philip Persad <philip.persad@gmail.com> wrote:
Hi, 

I'm currently looking at Cassandra in the context of Disaster Recovery.  I have 2 Data Centres,
one is the Primary and the other acts as a Standby.  There is a Cassandra cluster in each
Data Centre.  For the time being I'm running Cassandra 2.0.9.  Unfortunately, due to the nature
of my data, the consistency levels that I would get out of LOCAL_QUORUM writes followed by
asynchronous replication to the secondary data centre are insufficient.  In the event of a
failure, it is acceptable to lose some data, but I need Causal Consistency to be maintained.
Since I don't have the luxury of performing nodetool repairs after Godzilla steps on my primary
data centre, I use more strictly ordered means of transporting events between the Data Centres
(Kafka for anyone who cares about that detail).

What I'm not sure about is how to go about copying all the data in one Cassandra cluster
to a new cluster, either to bring up a new Standby Data Centre or as part of failing back
to the Primary after I pick up the pieces.  I'm thinking that I should either:

1. Do a snapshot (https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_backup_takes_snapshot_t.html),
and then restore that snapshot on my new cluster (https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_snapshot_restore_new_cluster.html)

2. Join the new data centre to the existing cluster (https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html).
 Then separate the two data centres into two individual clusters by doing . . . something???

Does anyone have any advice about how to tackle this problem?

Many thanks,

-Phil



