From: Chris Were
Reply-To: chris@chriswere.com
Date: Thu, 29 Oct 2009 20:50:10 +1030
Subject: Re: backing up data from cassandra
To: cassandra-user@incubator.apache.org

Is it possible to only back up selected column families?

On Wed, Oct 7, 2009 at 8:15 AM, Jonathan Ellis wrote:

> I don't really see "nodeprobe snapshot" and "mv snapshotdir/* livedir"
> as all that much harder, but maybe that's just me.
>
> For a cluster, just add dsh.
>
> -Jonathan
>
> On Tue, Oct 6, 2009 at 3:42 PM, Joe Van Dyk wrote:
> > Sure, not as easy as a "pg_dump db > dump.sql" and "psql db < dump.sql",
> > though. Oh well.
> >
> > On Tue, Oct 6, 2009 at 11:28 AM, Edmond Lau wrote:
> >> Thanks for the replies, guys. It sounds like restoration via snapshots
> >> + some application-side logic to sanity-check/repair any data around
> >> the snapshot time is the way to go.
> >>
> >> Edmond
> >>
> >> On Mon, Oct 5, 2009 at 10:15 AM, Jonathan Ellis wrote:
> >>> On Mon, Oct 5, 2009 at 11:23 AM, Thorsten von Eicken <tve@rightscale.com> wrote:
> >>>> Isn't the question about how you back up a Cassandra cluster, not a
> >>>> single node?
> >>>
> >>> Sure, but the generalization is straightforward. :)
> >>>
> >>>> Can you snapshot the various nodes at different times, or do they
> >>>> need to be synchronized?
> >>>
> >>> The closer the synchronization, the more consistent they will be.
> >>> (Since Cassandra is designed around eventual consistency, there's
> >>> some flexibility here. Conversely, there's no way to tell the system
> >>> "don't accept any more writes until the snapshot is done.")
> >>>
> >>>> Is there a minimal set of nodes that is sufficient to back up?
> >>>
> >>> Assuming your replication is 100% up to date, backing up every Nth
> >>> node, where N is the replication factor, could be adequate in theory,
> >>> but I wouldn't recommend trying to be clever like that: if you
> >>> "restored" from such a backup, your system would be in a degraded
> >>> state and vulnerable to any of the restored nodes failing.
> >>>
> >>> -Jonathan
> >>
> >
> >
> > --
> > Joe Van Dyk
> > http://fixieconsulting.com
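
(For anyone reading this in the archives: spelled out as commands, the
snapshot workflow Jonathan describes looks roughly like the sketch below.
Treat it as a sketch under assumptions, not a recipe -- the data-directory
path, the snapshot layout, the dsh group name, and the exact nodeprobe
flags depend on your release and on storage-conf.xml, so verify them on
your own nodes.)

    # Take a snapshot on one node. Snapshots hard-link the current
    # SSTables, so this is fast and cheap on disk.
    nodeprobe -host localhost snapshot

    # Copy the snapshot off the box. The data directory and snapshots/
    # layout here are assumptions; check DataFileDirectory in
    # storage-conf.xml for the real location.
    rsync -a /var/lib/cassandra/data/snapshots/ backuphost:/backups/$(hostname)/

    # To restore: stop Cassandra on the node, move the snapshotted
    # SSTables back into the live data directory ("mv snapshotdir/*
    # livedir"), and restart. $SNAPSHOT is the name of the snapshot
    # directory you want to restore from.
    mv /backups/$(hostname)/"$SNAPSHOT"/* /var/lib/cassandra/data/

    # For a cluster, trigger the snapshot on every node at roughly the
    # same time. "cassandra" is a hypothetical dsh group listing your
    # nodes; -c runs the command on all of them concurrently.
    dsh -g cassandra -c -- 'nodeprobe -host localhost snapshot'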