incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Backup strategies in a multi DC cluster
Date Tue, 26 Mar 2013 17:51:15 GMT
> Assume you have four nodes and a snapshot is taken.  The following day if a node goes
down and data is corrupt through user error then how do you use the previous night's snapshots?

> 
I'm not sure what is corrupt here: the snapshot/backup itself, or data that is incorrect
because of an application error.

> Would you replace the faulty node first and then restore last night's snapshot?  What
happens if you don't have a replacement node? You won't be able to restore last night's snapshot.
> 

You would need to stop the entire cluster, and restore the snapshots on all nodes. 

If you restored the snapshot on just one node, new or old HW, it would have some data with
an older timestamp than the other nodes. Cassandra would see this as an inconsistency, as if
the restored node had missed some writes, and would resolve it by using the most recent values.
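
A minimal sketch of that reconciliation, assuming a simple "highest write timestamp wins" rule per cell. The `Cell` type, the `reconcile` function and the sample values are illustrative only and are not Cassandra's internal API.

```python
from typing import Tuple

# A cell is modelled as (value, write_timestamp_micros); purely illustrative.
Cell = Tuple[str, int]

def reconcile(*replicas: Cell) -> Cell:
    """Return the cell with the highest write timestamp (last write wins)."""
    return max(replicas, key=lambda cell: cell[1])

# The restored node holds last night's value; the other replicas hold today's.
restored = ("balance=100", 1_364_169_600_000_000)
live_a = ("balance=250", 1_364_256_000_000_000)
live_b = ("balance=250", 1_364_256_000_000_000)

winner = reconcile(restored, live_a, live_b)
print(winner[0])  # balance=250
```

The restored value loses because its timestamp is older, which is why restoring a single node does not roll the data back.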


> However if a virtual datacenter consisting of a backup node is used then the backup node
could be used regardless of the number of nodes in the datacentre.
> 

It depends on the failure scenario and what you are trying to protect against. 

If you have 4 nodes and one node fails, the best thing to do is start a new node and let Cassandra
stream the data from the other nodes. The new node could have the same token as the previous
failed node. So long as the /var/lib/cassandra/data/system dir is empty (and the node is not
a seed) it will join the cluster and ask the others for data. 
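
Those two preconditions could be checked before starting the replacement node with something like the sketch below. The function name, the addresses and the seed list are hypothetical; in practice the seed list would come from your own configuration.

```python
import tempfile
from pathlib import Path
from typing import Iterable

def can_bootstrap(system_dir: Path, node_address: str, seeds: Iterable[str]) -> bool:
    """True when the system directory holds no files and the node is not a seed."""
    is_seed = node_address in set(seeds)
    has_state = system_dir.exists() and any(system_dir.iterdir())
    return not is_seed and not has_state

# Demo against a temporary directory standing in for /var/lib/cassandra/data/system.
with tempfile.TemporaryDirectory() as tmp:
    system_dir = Path(tmp) / "system"
    system_dir.mkdir()
    print(can_bootstrap(system_dir, "10.0.0.4", ["10.0.0.1"]))  # True
    (system_dir / "leftover.db").write_text("old state")
    print(can_bootstrap(system_dir, "10.0.0.4", ["10.0.0.1"]))  # False
```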

If you want to ensure availability then consider bigger clusters, e.g. 6 nodes with RF 3 allows
you to lose up to 2 nodes and stay up. Or a higher RF. (see http://thelastpickle.com/2011/06/13/Down-For-Me/)
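
To get a feel for the numbers, here is a rough sketch that enumerates node failures on a ring. It assumes each range is replicated on the owning node and the next RF-1 nodes around the ring, and that a range is "up" while it still has `required` live replicas (1 for CL ONE, a quorum for CL QUORUM). All function names are made up for illustration.

```python
from itertools import combinations

def replicas_for_range(start: int, nodes: int, rf: int) -> set:
    """Nodes holding the range owned by `start`: it and the next rf-1 nodes on the ring."""
    return {(start + i) % nodes for i in range(rf)}

def survives(down: set, nodes: int, rf: int, required: int) -> bool:
    """True if every range still has at least `required` live replicas."""
    return all(
        len(replicas_for_range(start, nodes, rf) - down) >= required
        for start in range(nodes)
    )

def max_safe_failures(nodes: int, rf: int, required: int) -> int:
    """Largest k such that ANY k nodes can fail and every range stays available."""
    k = 0
    while k < nodes and all(
        survives(set(down), nodes, rf, required)
        for down in combinations(range(nodes), k + 1)
    ):
        k += 1
    return k

quorum = 3 // 2 + 1  # 2 replicas for RF 3
print(max_safe_failures(6, 3, required=1))       # 2
print(max_safe_failures(6, 3, required=quorum))  # 1
print(survives({0, 3}, 6, 3, required=quorum))   # True
```

Under these assumptions, any 2 of the 6 nodes can be lost at CL ONE. At QUORUM only 1 arbitrary node can be lost, although 2 nodes that do not share a range (such as nodes 0 and 3) can still be down together.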

It's tricky to protect against application errors creating bad data using just backups. You
may need to look at how you can replay events in your system and consider which parts of your
data model should be directly mutated and which should be indirectly mutated by recording
changes in another part of the model. 
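
One way to picture the "indirectly mutated" idea is an append-only event log from which the derived state is rebuilt. The sketch below is only an illustration under that assumption; the `Event` fields, the `replay` function and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable

@dataclass(frozen=True)
class Event:
    event_id: int
    account: str
    delta: int

def replay(
    events: Iterable[Event],
    keep: Callable[[Event], bool] = lambda e: True,
) -> Dict[str, int]:
    """Rebuild the derived balances from the recorded events, skipping rejected ones."""
    balances: Dict[str, int] = {}
    for event in sorted(events, key=lambda e: e.event_id):
        if keep(event):
            balances[event.account] = balances.get(event.account, 0) + event.delta
    return balances

log = [
    Event(1, "alice", 100),
    Event(2, "alice", -500),  # written by a buggy application release
    Event(3, "alice", 25),
]

print(replay(log))                                  # {'alice': -375}
print(replay(log, keep=lambda e: e.event_id != 2))  # {'alice': 125}
```

Because the balances are never written directly, the bad write can be excluded and the state recomputed without going back to a snapshot.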

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 25/03/2013, at 8:19 AM, Jabbar Azam <ajazam@gmail.com> wrote:

> Thanks Aaron. I have a hypothetical question.
> 
> Assume you have four nodes and a snapshot is taken.  The following day if a node goes
down and data is corrupt through user error then how do you use the previous night's snapshots?

> 
> Would you replace the faulty node first and then restore last night's snapshot?  What
happens if you don't have a replacement node? You won't be able to restore last night's snapshot.
> 
> However if a virtual datacenter consisting of a backup node is used then the backup node
could be used regardless of the number of nodes in the datacentre. Would there be any disadvantages
to this approach?  Sorry for the questions; I want to understand all the options.
> 
> On 24 Mar 2013 17:45, "aaron morton" <aaron@thelastpickle.com> wrote:
>> There are advantages and disadvantages in both approaches. What are people doing
in their production systems?
> Generally a mix of snapshots+rsync or https://github.com/synack/tablesnap to get things
off node. 
> 
> Cheers
> 
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 23/03/2013, at 4:37 AM, Jabbar Azam <ajazam@gmail.com> wrote:
> 
>> Hello,
>> 
>> I've been experimenting with cassandra for quite a while now.
>> 
>> It's time for me to look at backups but I'm not sure what the best practice is. I
want to be able to recover the data to a point in time before any user or software errors.
>> 
>> We will have two datacentres with 4 servers and RF=3.
>> 
>> Each datacentre will have at most 1.6 TB (includes replication, LZ4 compression, using
test data) of data. That is ten years of data, after which we will start purging. This amounts
to about 400 MB of data generation per day.
>> 
>> I've read about users doing snapshots of individual nodes to S3 (Netflix) and I've
read about creating virtual datacentres (http://www.datastax.com/dev/blog/multi-datacenter-replication)
where each virtual datacentre contains a backup node.
>> 
>> There are advantages and disadvantages in both approaches. What are people doing
in their production systems?
>> 
>> 
>> 
>> 
>> -- 
>> Thanks
>> 
>> Jabbar Azam
> 

