Subject: Re: Backup strategies in a multi DC cluster
From: Jabbar Azam
To: user@cassandra.apache.org
Date: Sun, 24 Mar 2013 19:19:27 +0000

Thanks Aaron. I have a hypothetical question.

Assume you have four nodes and a snapshot is taken. The following day, if a node goes down and data is corrupted through user error, how do you use the previous night's snapshots?

Would you replace the faulty node first and then restore last night's snapshot? What happens if you don't have a replacement node? You won't be able to restore last night's snapshot.

However, if a virtual datacentre consisting of a backup node is used, then the backup node could be used regardless of the number of nodes in the datacentre. Would there be any disadvantages to this approach? Sorry for the questions; I want to understand all the options.

On 24 Mar 2013 17:45, "aaron morton" <aaron@thelastpickle.com> wrote:
There are advantages and disadvantages in both approaches. What are people doing in their production systems?
Generally a mix of snapshots + rsync or https://github.com/synack/tablesnap to get things off node.
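
A minimal sketch of the snapshots + rsync route, not taken from this thread. It assumes `nodetool` and `rsync` are on the PATH and that snapshots land under `<data_dir>/<keyspace>/<table>/snapshots/<tag>`; the function names, the data directory and the destination are hypothetical and would need adjusting per install. tablesnap is a separate tool and is not used here.

```python
import glob
import os
import subprocess


def snapshot_dirs(data_dir, keyspace, tag):
    """Return the snapshot directories for one tag, one per table.

    Assumed layout: <data_dir>/<keyspace>/<table>/snapshots/<tag>
    """
    pattern = os.path.join(data_dir, keyspace, "*", "snapshots", tag)
    return sorted(glob.glob(pattern))


def rsync_command(src_dir, dest):
    """Build (but do not run) an rsync command for one snapshot directory.

    -a preserves permissions and timestamps; -R recreates the full source
    path under the destination, so tables do not overwrite each other.
    """
    return ["rsync", "-aR", src_dir.rstrip("/") + "/", dest]


def backup(keyspace, tag, data_dir, dest):
    """Take a named snapshot on this node, then copy it off the node."""
    subprocess.run(["nodetool", "snapshot", "-t", tag, keyspace], check=True)
    for src in snapshot_dirs(data_dir, keyspace, tag):
        subprocess.run(rsync_command(src, dest), check=True)


# Example call (hypothetical keyspace, tag, path and host):
# backup("my_keyspace", "nightly-2013-03-24",
#        "/var/lib/cassandra/data", "backuphost:/srv/cassandra-backups/")
```

Each node would run this for itself, since a snapshot only covers the SSTables held on the node where it is taken.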

Cheers


-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com


On 23/03/2013, at 4:37 AM, Jabbar Azam <ajazam@gmail.com> wrote:
Hello,

I've been experimenting with Cassandra for quite a while now.

It's time for me to look at backups, but I'm not sure what the best practice is. I want to be able to recover the data to a point in time before any user or software errors.

We will have two datacentres with 4 servers and RF=3.

Each datacentre will have at most 1.6 TB (includes replication, LZ4 compression, using test data) of data. That is ten years of data, after which we will start purging. This amounts to about 400 MB of data generation per day.
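
A quick arithmetic check of the figures above, as a sketch assuming decimal units (1 TB = 10^12 bytes) and 365-day years; the helper name is illustrative only:

```python
def daily_growth_mb(total_tb, years):
    """Average daily growth in MB implied by a total size reached over N years."""
    total_bytes = total_tb * 10**12   # decimal terabytes (assumption)
    days = years * 365                # leap days ignored (assumption)
    return total_bytes / days / 10**6


print(round(daily_growth_mb(1.6, 10)))  # prints 438
```

That is roughly 438 MB per day per datacentre including replication, in line with the "about 400 MB" estimate.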

I've read about users doing snapshots of individual nodes to S3 (Netflix) and I've read about creating virtual datacentres (http://www.datastax.com/dev/blog/multi-datacenter-replication) where each virtual datacentre contains a backup node.

There are advantages and disadvantages in both approaches. What are people doing in their production systems?




--
Thanks

Jabbar Azam
