Subject: Re: help on backup multinode cluster
From: Andre Sprenger
To: user@cassandra.apache.org
Date: Sat, 7 Dec 2013 11:23:35 +0100

If you lose RF + 1 nodes, the data that is replicated only to those nodes
is gone, so it is a good idea to have a recent backup then. Another
situation is when you deploy a bug in the software and start writing crap
data to Cassandra. Replication does not help there, and depending on the
situation you will need to restore from the backup.

2013/12/7 Jason Wee <peichieh@gmail.com>

> Hmm... Cassandra's fundamental features include fault tolerance,
> durability, and replication. Just out of curiosity, why would you want
> to do backups?
>
> /Jason
>
>
> On Sat, Dec 7, 2013 at 3:31 AM, Robert Coli <rcoli@eventbrite.com> wrote:
>
>> On Fri, Dec 6, 2013 at 6:41 AM, Amalrik Maia <amalrik@s1mbi0se.com.br> wrote:
>>
>>> Hey guys, I'm trying to take backups of a multi-node Cassandra cluster
>>> and save them on S3. My idea is simply to ssh to each server, use
>>> nodetool to create the snapshots, and then push them to S3.
>>
>> https://github.com/synack/tablesnap
>>
>>> So is this approach recommended? My concern is about the inconsistencies
>>> this approach can lead to, since the snapshots are taken one by one and
>>> not in parallel. Should I worry about that, or does Cassandra have a way
>>> to deal with inconsistencies when doing a restore?
>>
>> The backup is as consistent as your cluster is at any given moment,
>> which is "not necessarily". Manual repair brings you closer to
>> consistency, but only on data present when the repair started.
>>
>> =Rob
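For what it's worth, here is a minimal sketch of the snapshot-and-upload
flow Amalrik describes, meant to be run on each node in turn (over ssh,
from cron, etc.). The bucket name, snapshot tag, and S3 key layout are
made up for illustration and are not from this thread; for something
production-ready, see the tablesnap link above.

    #!/usr/bin/env python
    # Hypothetical per-node backup sketch: take a nodetool snapshot and
    # push the snapshot files to S3. Bucket/tag names are illustrative.
    import os
    import socket
    import subprocess

    import boto3  # assumes AWS credentials are configured on the host

    BUCKET = "my-cassandra-backups"       # hypothetical bucket name
    DATA_DIR = "/var/lib/cassandra/data"  # default Cassandra data directory
    TAG = "nightly"                       # snapshot tag

    # 1. Snapshot this node. Snapshots are hard links, so this is cheap.
    subprocess.check_call(["nodetool", "snapshot", "-t", TAG])

    # 2. Walk the data directory and upload every file that lives under
    #    a snapshots/<TAG> directory, keyed by hostname so nodes don't
    #    overwrite each other.
    s3 = boto3.client("s3")
    host = socket.gethostname()
    for root, _dirs, files in os.walk(DATA_DIR):
        if os.path.join("snapshots", TAG) not in root:
            continue
        for name in files:
            path = os.path.join(root, name)
            key = "%s/%s" % (host, os.path.relpath(path, DATA_DIR))
            s3.upload_file(path, BUCKET, key)

    # 3. Drop the snapshot so the hard links don't pin old SSTables on disk.
    subprocess.check_call(["nodetool", "clearsnapshot", "-t", TAG])

Note that Rob's caveat still applies: each node's snapshot is only
consistent with itself at the moment it is taken, so the cluster-wide
backup is no more consistent than the cluster was. Running nodetool
repair before snapshotting narrows that gap but does not close it.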
