Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: domain of ajazam@gmail.com designates
 209.85.215.44 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <3CF90685-47E5-46B8-83C5-C68E849D8463@thelastpickle.com>
References: 
 <CAPqEvGH_tNvhipRFMmEzhpS6Q1v9=w1+gZiUC68Z4cYm-A=vjQ@mail.gmail.com>
	<3CF90685-47E5-46B8-83C5-C68E849D8463@thelastpickle.com>
Date: Sun, 24 Mar 2013 19:19:27 +0000
Message-ID: 
 <CAPqEvGHW4PWtbMpUHc4L0TrAcZ0J-KSSQX2jiuDt+7oUuyEkPQ@mail.gmail.com>
Subject: Re: Backup strategies in a multi DC cluster
From: Jabbar Azam <ajazam@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=f46d04016b492aa34a04d8b091eb

--f46d04016b492aa34a04d8b091eb
Content-Type: text/plain; charset=ISO-8859-1

Thanks Aaron. I have a hypothetical question.

Assume you have four nodes and a snapshot is taken. If a node goes down the
following day and data is corrupted through user error, how do you use the
previous night's snapshots?

Would you replace the faulty node first and then restore last night's
snapshot? What happens if you don't have a replacement node? You won't be
able to restore last night's snapshot.

However, if a virtual datacentre consisting of a backup node is used, then
the backup node could be used regardless of the number of nodes in the
datacentre. Would there be any disadvantages to this approach? Sorry for the
questions; I want to understand all the options.
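
As a rough illustration of the snapshots+rsync approach mentioned in the
quoted reply below, the nightly step might be sketched as follows. This is
only a sketch under assumptions: the helper names, the on-disk layout
<data_root>/<keyspace>/<table>/snapshots/<tag>, and the rsync destination
are hypothetical and are not taken from this thread.

```python
import glob
import os
import subprocess


def snapshot_command(tag):
    # `nodetool snapshot -t <tag>` asks the local node for a named snapshot.
    return ["nodetool", "snapshot", "-t", tag]


def find_snapshot_dirs(data_root, tag):
    # Assumed layout: <data_root>/<keyspace>/<table>/snapshots/<tag>
    pattern = os.path.join(data_root, "*", "*", "snapshots", tag)
    return sorted(d for d in glob.glob(pattern) if os.path.isdir(d))


def rsync_command(snapshot_dir, data_root, dest):
    # -a preserves file attributes; -R (--relative) recreates the path that
    # follows the "/./" marker underneath the destination.
    rel = os.path.relpath(snapshot_dir, data_root)
    src = os.path.join(data_root, ".", rel) + "/"
    return ["rsync", "-aR", src, dest]


def backup(tag, data_root, dest, run=subprocess.check_call):
    # Take the snapshot first, then copy each snapshot directory off the node.
    run(snapshot_command(tag))
    for snapshot_dir in find_snapshot_dirs(data_root, tag):
        run(rsync_command(snapshot_dir, data_root, dest))
```

Passing a different `run` callable (for example, a list's `append`) shows
which commands would be executed without touching a real node. Restoring
onto a replacement node is a separate step that this sketch does not cover.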
On 24 Mar 2013 17:45, "aaron morton" <aaron@thelastpickle.com> wrote:

> There are advantages and disadvantages in both approaches. What are people
> doing in their production systems?
>
> Generally a mix of snapshots+rsync or https://github.com/synack/tablesnap to
> get things off node.
>
> Cheers
>
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 23/03/2013, at 4:37 AM, Jabbar Azam <ajazam@gmail.com> wrote:
>
> Hello,
>
> I've been experimenting with Cassandra for quite a while now.
>
> It's time for me to look at backups but I'm not sure what the best
> practice is. I want to be able to recover the data to a point in time
> before any user or software errors.
>
> We will have two datacentres with 4 servers and RF=3.
>
> Each datacentre will have at most 1.6 TB of data (includes replication and
> LZ4 compression, measured using test data). That is ten years of data, after
> which we will start purging. This amounts to about 400 MB of data generation
> per day.
>
> I've read about users doing snapshots of individual nodes to S3 (Netflix)
> and I've read about creating virtual datacentres (
> http://www.datastax.com/dev/blog/multi-datacenter-replication) where each
> virtual datacentre contains a backup node.
>
> There are advantages and disadvantages in both approaches. What are people
> doing in their production systems?
>
>
>
>
> --
> Thanks
>
> Jabbar Azam
>
>
>

--f46d04016b492aa34a04d8b091eb--