From: William Oberman <oberman@civicscience.com>
To: user@cassandra.apache.org
Date: Sat, 30 Apr 2011 10:44:23 -0400
Subject: Re: best way to backup

Thanks, I think I'm getting some of the file layout/data structures now, so
that helps with the backup strategy. I might still start simple, as it's
usually harder to screw up simple, but at least I'll know where I can go with
something more clever.

will

On Sat, Apr 30, 2011 at 9:15 AM, Jeremiah Jordan
<JEREMIAH.JORDAN@morningstar.com> wrote:

> The files inside the keyspace folders are the SSTables.
>
> ------------------------------
> From: aaron morton [mailto:aaron@thelastpickle.com]
> Sent: Friday, April 29, 2011 4:49 PM
> To: user@cassandra.apache.org
> Subject: Re: best way to backup
>
> William,
> Some info on the SSTables from me:
> http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/
>
> If you want to know more, check out the BigTable and original Facebook
> papers, linked from the wiki.
>
> Aaron
>
> On 29 Apr 2011, at 23:43, William Oberman wrote:
>
> Dumb question, but referenced twice now: which files are the SSTables, and
> why is backing them up incrementally a win?
>
> Or should I not bother to understand internals, and instead just roll with
> the "back up my keyspace(s) and system in a compressed tar" strategy? While
> it may be excessive, it's guaranteed to work and to work easily (which I
> like, a great deal).
>
> will
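
For reference, a minimal sketch of that "snapshot, then compressed tar"
approach, assuming the default data directory and leaving the S3 upload as a
placeholder. The paths, names, and cleanup step here are illustrative
assumptions, not details from this thread:

    #!/usr/bin/env python
    # Sketch only: snapshot the node, then tar the data dir for upload.
    # DATA_DIR and ARCHIVE_DIR are assumed locations; adjust for your install.
    import os
    import subprocess
    import time

    DATA_DIR = "/var/lib/cassandra/data"   # default Cassandra data dir
    ARCHIVE_DIR = "/mnt/backups"           # local scratch space for the tarball

    def snapshot_and_tar():
        # Ask Cassandra to hard-link the live SSTables into snapshot
        # directories under each keyspace, so the files we archive are stable.
        subprocess.check_call(["nodetool", "-h", "localhost", "snapshot"])

        # One compressed archive of the whole data dir (snapshots included).
        # Simple and hard to get wrong, at the cost of local free space and
        # no "delta" savings between backups.
        stamp = time.strftime("%Y%m%d%H%M%S")
        archive = os.path.join(ARCHIVE_DIR, "cassandra-%s.tar.gz" % stamp)
        subprocess.check_call(["tar", "czf", archive, DATA_DIR])

        # Ship `archive` to S3 with whatever client you use (s3cmd, boto, ...),
        # then reclaim space with: nodetool -h localhost clearsnapshot
        return archive

    if __name__ == "__main__":
        print(snapshot_and_tar())
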
>
> On Fri, Apr 29, 2011 at 4:58 AM, Daniel Doubleday
> <daniel.doubleday@gmx.net> wrote:
>
>> What we are about to set up is a time-machine-like backup. This is more
>> like an add-on to the S3 backup.
>>
>> Our boxes have an additional, larger drive for local backups. We create a
>> new backup snapshot every x hours which hard-links the files in the
>> previous snapshot (a bit like Cassandra's incremental_backups feature),
>> and then we sync that snapshot dir with the Cassandra data dir. We can do
>> archiving/backup to an external system from there without impacting the
>> main data RAID.
>>
>> But the main reason to do this is to have an "omg we screwed up big time
>> and deleted/corrupted data" recovery.
>>
>> On Apr 28, 2011, at 9:53 PM, William Oberman wrote:
>>
>> Even with N nodes for redundancy, I still want to have backups. I'm an
>> Amazon person, so naturally I'm thinking S3. Reading over the docs, and
>> messing with nodetool, it looks like each new snapshot contains the
>> previous snapshot as a subset (and I've read how Cassandra uses hard
>> links to avoid excessive disk use). When does that pattern break down?
>>
>> I'm basically debating whether I can do an rsync-like backup, or whether
>> I should do a compressed tar backup. And I obviously want multiple points
>> in time. S3 does allow file versioning, if a file or file name is
>> changed/reused over time (which only matters in the rsync case). My only
>> concerns with compressed tars are that I'll need free space to create the
>> archive and that I get no "delta" space savings on the backup (the former
>> is solved by not letting disk space get so low and/or adding more nodes
>> to bring down the space; the latter is solved by S3 being really cheap
>> anyway).
>>
>> --
>> Will Oberman
>> Civic Science, Inc.
>> 3030 Penn Avenue, First Floor
>> Pittsburgh, PA 15201
>> (M) 412-480-7835
>> (E) oberman@civicscience.com
>
> --
> Will Oberman
> Civic Science, Inc.
> 3030 Penn Avenue, First Floor
> Pittsburgh, PA 15201
> (M) 412-480-7835
> (E) oberman@civicscience.com

--
Will Oberman
Civic Science, Inc.
3030 Penn Avenue, First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) oberman@civicscience.com
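
And a rough sketch of the hard-link "time machine" idea Daniel describes,
assuming a spare local drive mounted for backups. The paths and the
name-plus-size identity check are assumptions for illustration, not details
from his actual setup:

    #!/usr/bin/env python
    # Sketch only: each run makes a new snapshot directory, hard-linking any
    # file that is unchanged since the previous run (cheap, because SSTables
    # are immutable once written) and copying only new files.
    import os
    import shutil
    import time

    DATA_DIR = "/var/lib/cassandra/data"   # Cassandra data dir (adjust)
    BACKUP_ROOT = "/backup/cassandra"      # the extra local backup drive

    def take_backup():
        runs = sorted(os.listdir(BACKUP_ROOT)) if os.path.isdir(BACKUP_ROOT) else []
        previous = os.path.join(BACKUP_ROOT, runs[-1]) if runs else None
        current = os.path.join(BACKUP_ROOT, time.strftime("%Y%m%d-%H%M%S"))

        for dirpath, _, filenames in os.walk(DATA_DIR):
            rel = os.path.relpath(dirpath, DATA_DIR)
            target_dir = os.path.normpath(os.path.join(current, rel))
            if not os.path.isdir(target_dir):
                os.makedirs(target_dir)
            for name in filenames:
                src = os.path.join(dirpath, name)
                dst = os.path.join(target_dir, name)
                old = os.path.join(previous, rel, name) if previous else None
                # An SSTable never changes after it is written, so a file with
                # the same name and size as last run can just be hard-linked.
                if old and os.path.exists(old) and \
                        os.path.getsize(old) == os.path.getsize(src):
                    os.link(old, dst)
                else:
                    shutil.copy2(src, dst)
        return current

    if __name__ == "__main__":
        print(take_backup())

In practice you would probably run this against a nodetool snapshot (or at
least flush first) so the file set is stable while it copies; rsync's
--link-dest option does the same hard-link trick in a single command.
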