incubator-cassandra-user mailing list archives

From Jabbar Azam <aja...@gmail.com>
Subject Re: Backup strategies in a multi DC cluster
Date Tue, 26 Mar 2013 18:23:00 GMT
Thank you for your feedback.  I'll speak to the dev guys and come up with
something appropriate.
On 26 Mar 2013 17:51, "aaron morton" <aaron@thelastpickle.com> wrote:

>> Assume you have four nodes and a snapshot is taken. The following day, if
>> a node goes down and data is corrupt through user error, how do you use
>> the previous night's snapshots?
>
> Not sure what is corrupt: the snapshot/backup, or the data itself, made
> incorrect through application error.
>
>> Would you replace the faulty node first and then restore last night's
>> snapshot? What happens if you don't have a replacement node? You won't be
>> able to restore last night's snapshot.
>
> You would need to stop the entire cluster, and restore the snapshots on
> all nodes.
> If you restored the snapshot on just one node, new or old HW, it would
> have some data with an older timestamp than the other nodes. Cassandra
> would see this as an inconsistency, that the restored node missed some
> writes, and resolve the inconsistency by using the most recent values.
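Aaron's point about restoring a single node can be illustrated with a simplified model of Cassandra's last-write-wins reconciliation: each column value carries a write timestamp, and when replicas disagree the value with the newest timestamp is kept. This is a minimal sketch, not Cassandra's code; the `Cell` type, the `reconcile` function, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One column value plus its client write timestamp (microseconds)."""
    value: str
    timestamp: int


def reconcile(replicas: list[Cell]) -> Cell:
    """Last write wins: keep the cell with the highest timestamp.

    Simplification: Cassandra also has a tie-break rule for equal
    timestamps, which this sketch ignores.
    """
    return max(replicas, key=lambda c: c.timestamp)


# Hypothetical values: two live replicas hold today's write, while the third
# node was restored from last night's snapshot and holds an older value.
live_a = Cell("balance=120", timestamp=1_364_300_000_000_000)
live_b = Cell("balance=120", timestamp=1_364_300_000_000_000)
restored = Cell("balance=100", timestamp=1_364_220_000_000_000)

winner = reconcile([live_a, live_b, restored])
print(winner.value)  # balance=120: the restored (older) value loses
```

Because the restored node's data is older, it never wins a comparison, which is why restoring one node alone cannot roll the cluster back.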
>
>> However, if a virtual datacenter consisting of a backup node is used, then
>> the backup node could be used regardless of the number of nodes in the
>> datacentre.
>
>
> It depends on the failure scenario and what you are trying to protect
> against.
>
> If you have 4 nodes and one node fails, the best thing to do is start a new
> node and let Cassandra stream the data from the other nodes. The new node
> could have the same token as the failed node. So long as the
> /var/lib/cassandra/data/system dir is empty (and the node is not a seed), it
> will join the cluster and ask the others for data.
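The two preconditions above for a replacement node can be turned into a small pre-flight check to run before starting Cassandra on the new machine. A minimal sketch, assuming the default data directory layout mentioned above; the function names are hypothetical, and the check only inspects the local disk, it does not talk to the cluster.

```python
import tempfile
from pathlib import Path


def system_dir_is_empty(data_dir: str) -> bool:
    """True if <data_dir>/system does not exist or has no entries in it."""
    system = Path(data_dir) / "system"
    return not system.exists() or not any(system.iterdir())


def can_join_as_replacement(data_dir: str, is_seed: bool) -> bool:
    """Pre-flight check: a non-seed node with an empty system dir will
    bootstrap and stream its data from the other replicas."""
    return system_dir_is_empty(data_dir) and not is_seed


if __name__ == "__main__":
    # Demo against a throwaway directory rather than a real node's
    # /var/lib/cassandra/data.
    with tempfile.TemporaryDirectory() as data_dir:
        print(can_join_as_replacement(data_dir, is_seed=False))  # True
```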
>
> If you want to ensure availability, consider bigger clusters, e.g. 6
> nodes with RF 3, which allows you to lose up to 2 nodes and stay up, or a
> higher RF (see http://thelastpickle.com/2011/06/13/Down-For-Me/).
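The availability argument can be checked with a small model of the ring. A minimal sketch, assuming one token per node, evenly spaced tokens, SimpleStrategy-style placement (each range is stored on its owner and the next RF-1 nodes clockwise), and QUORUM reads and writes; all function names are hypothetical.

```python
from itertools import combinations


def replicas_for_range(i: int, n_nodes: int, rf: int) -> set[int]:
    """Replicas of the range owned by node i: node i and the next rf-1 nodes."""
    return {(i + k) % n_nodes for k in range(rf)}


def quorum(rf: int) -> int:
    """Number of replicas that must answer a QUORUM read or write."""
    return rf // 2 + 1


def survives(n_nodes: int, rf: int, down: set[int]) -> bool:
    """True if every token range still has a QUORUM of live replicas."""
    need = quorum(rf)
    return all(
        len(replicas_for_range(i, n_nodes, rf) - down) >= need
        for i in range(n_nodes)
    )


def survivable_pairs(n_nodes: int, rf: int) -> list[tuple[int, int]]:
    """Every pair of simultaneous node failures that keeps QUORUM everywhere."""
    return [
        pair
        for pair in combinations(range(n_nodes), 2)
        if survives(n_nodes, rf, set(pair))
    ]


print(survivable_pairs(4, 3))  # []
print(survivable_pairs(6, 3))  # [(0, 3), (1, 4), (2, 5)]
```

In this model, 4 nodes with RF 3 tolerate only one failure at QUORUM, while 6 nodes tolerate two as long as the failed nodes do not share a replica set, which here means they sit opposite each other on the ring.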
>
> It's tricky to protect against application errors that create bad data
> using just backups. You may need to look at how you can replay events in
> your system, and consider which parts of your data model should be directly
> mutated and which should be indirectly mutated by recording changes in
> another part of the model.
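One way to read the suggestion about indirect mutation is an append-only log of change events from which the current state is derived. If a bug writes bad events, the state can be rebuilt as of a point in time before the bug. A minimal in-memory sketch, assuming a hypothetical account-balance model; in Cassandra the log would live in its own table, and the names here are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: int       # event time, e.g. seconds since the epoch
    account: str
    delta: int    # change to apply to the account balance


def rebuild(events: list[Event], until_ts: int) -> dict[str, int]:
    """Replay events up to and including until_ts to derive balances."""
    balances: dict[str, int] = {}
    for e in sorted(events, key=lambda e: e.ts):
        if e.ts > until_ts:
            break
        balances[e.account] = balances.get(e.account, 0) + e.delta
    return balances


log = [
    Event(100, "alice", 50),
    Event(200, "bob", 30),
    Event(300, "alice", -20),
    Event(400, "alice", -999),  # hypothetical bad write from an application bug
]

print(rebuild(log, until_ts=10**9))  # {'alice': -969, 'bob': 30}
print(rebuild(log, until_ts=399))    # {'alice': 30, 'bob': 30}
```

The derived balances can always be recomputed, so recovering from the bug means replaying the log up to the last good event rather than restoring a snapshot.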
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 25/03/2013, at 8:19 AM, Jabbar Azam <ajazam@gmail.com> wrote:
>
> Thanks Aaron. I have a hypothetical question.
>
> Assume you have four nodes and a snapshot is taken. The following day, if
> a node goes down and data is corrupt through user error, how do you use
> the previous night's snapshots?
>
> Would you replace the faulty node first and then restore last night's
> snapshot? What happens if you don't have a replacement node? You won't be
> able to restore last night's snapshot.
>
> However, if a virtual datacenter consisting of a backup node is used, then
> the backup node could be used regardless of the number of nodes in the
> datacentre. Would there be any disadvantages to this approach? Sorry for
> all the questions; I want to understand all the options.
> On 24 Mar 2013 17:45, "aaron morton" <aaron@thelastpickle.com> wrote:
>
>>> There are advantages and disadvantages in both approaches. What are
>>> people doing in their production systems?
>>
>> Generally a mix of snapshots + rsync, or https://github.com/synack/tablesnap,
>> to get things off the node.
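Both approaches rely on SSTable files being immutable once written, so getting data off the node is mostly a matter of copying files that have not been copied before. A minimal sketch of that bookkeeping, assuming a local snapshot directory and a set of file names already stored remotely; the function is hypothetical and does not reproduce how rsync or tablesnap work internally.

```python
import tempfile
from pathlib import Path


def files_to_ship(snapshot_dir: str, already_uploaded: set[str]) -> list[str]:
    """Return files under snapshot_dir (relative paths) not yet copied off-node.

    SSTable files are never modified after they are written, so a file
    that has been uploaded once never needs to be sent again.
    """
    root = Path(snapshot_dir)
    local = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    return sorted(local - already_uploaded)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as snap:
        # Made-up file names; real SSTable naming differs between versions.
        (Path(snap) / "ks-cf-1-Data.db").write_bytes(b"old")
        (Path(snap) / "ks-cf-2-Data.db").write_bytes(b"new")
        print(files_to_ship(snap, already_uploaded={"ks-cf-1-Data.db"}))
        # ['ks-cf-2-Data.db']
```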
>>
>> Cheers
>>
>>
>>
>> On 23/03/2013, at 4:37 AM, Jabbar Azam <ajazam@gmail.com> wrote:
>>
>> Hello,
>>
>> I've been experimenting with Cassandra for quite a while now.
>>
>> It's time for me to look at backups, but I'm not sure what the best
>> practice is. I want to be able to recover the data to a point in time
>> before any user or software errors.
>>
>> We will have two datacentres, each with 4 servers and RF=3.
>>
>> Each datacentre will have at most 1.6 TB of data (including replication,
>> with LZ4 compression, estimated from test data). That is ten years of data,
>> after which we will start purging. This amounts to about 400MB of data
>> generated per day.
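A quick check of these figures, assuming decimal units, 365-day years and an even spread of data across the four nodes: 1.6 TB over ten years is roughly 438 MB per day including replication, about 146 MB per day of distinct data at RF=3, and about 400 GB per node by the end of the period.

```python
TB = 10**12  # decimal units assumed
GB = 10**9
MB = 10**6

total_per_dc = 1.6 * TB  # includes RF=3 replication, after LZ4 compression
days = 10 * 365          # ten years, ignoring leap days
rf = 3
nodes_per_dc = 4

per_day = total_per_dc / days
unique_per_day = per_day / rf
per_node = total_per_dc / nodes_per_dc  # assumes data is evenly balanced

print(f"{per_day / MB:.0f} MB/day stored per DC")               # 438 MB/day stored per DC
print(f"{unique_per_day / MB:.0f} MB/day before replication")   # 146 MB/day before replication
print(f"{per_node / GB:.0f} GB per node after ten years")       # 400 GB per node after ten years
```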
>>
>> I've read about users taking snapshots of individual nodes to S3 (Netflix),
>> and I've read about creating virtual datacentres (
>> http://www.datastax.com/dev/blog/multi-datacenter-replication) where
>> each virtual datacentre contains a backup node.
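In the virtual-datacentre approach the backup node is simply another datacentre in the keyspace's replication settings, so with RF=1 it ends up holding one complete copy of the data. A minimal sizing sketch, assuming NetworkTopologyStrategy, the hypothetical datacentre names below, and the 1.6 TB figure above. Note that such a node receives every write, including bad ones from user or software errors, so on its own it does not provide a point-in-time copy; its data would still need to be snapshotted.

```python
# Hypothetical NetworkTopologyStrategy replication settings:
# the two real DCs at RF=3 plus a one-node "BACKUP" DC at RF=1.
replication = {"DC1": 3, "DC2": 3, "BACKUP": 1}
nodes = {"DC1": 4, "DC2": 4, "BACKUP": 1}

# 1.6 TB per DC at RF=3 implies roughly 533 GB of distinct data.
unique_gb = 1600 / 3


def per_node_gb(unique: float, rf: int, n_nodes: int) -> float:
    """Average data per node in a DC, assuming an even spread of ranges."""
    return unique * rf / n_nodes


for dc, rf in replication.items():
    print(f"{dc}: RF={rf}, ~{per_node_gb(unique_gb, rf, nodes[dc]):.0f} GB per node")

print("copies of each row across the cluster:", sum(replication.values()))
```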
>>
>> There are advantages and disadvantages in both approaches. What are
>> people doing in their production systems?
>>
>>
>>
>>
>> --
>> Thanks
>>
>> Jabbar Azam
>>
>>
>>
>
