From: William Oberman <oberman@civicscience.com>
To: user@cassandra.apache.org
Date: Sat, 30 Apr 2011 10:44:23 -0400
Subject: Re: best way to backup

Thanks, I think I'm getting some of the file layout/data structures now, so
that helps with the backup strategy. I might still start simple, as it's
usually harder to screw up simple, but at least I'll know where I can go with
something more clever.

will

On Sat, Apr 30, 2011 at 9:15 AM, Jeremiah Jordan
<JEREMIAH.JORDAN@morningstar.com> wrote:

> The files inside the keyspace folders are the SSTables.
>
> ------------------------------
> From: aaron morton [mailto:aaron@thelastpickle.com]
> Sent: Friday, April 29, 2011 4:49 PM
> To: user@cassandra.apache.org
> Subject: Re: best way to backup
>
> William,
> Some info on the SSTables from me:
> http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/
>
> If you want to know more, check out the BigTable and original Facebook
> papers, linked from the wiki.
>
> Aaron
>
> On 29 Apr 2011, at 23:43, William Oberman wrote:
>
> Dumb question, but referenced twice now: which files are the SSTables, and
> why is backing them up incrementally a win?
>
> Or should I not bother to understand internals, and instead just roll with
> the "back up my keyspace(s) and system in a compressed tar" strategy? While
> it may be excessive, it's guaranteed to work and to work easily (which I
> like, a great deal).
>
> will
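
For reference, a minimal sketch of that "snapshot, then compressed tar"
approach, assuming the default data directory and leaving the S3 upload as a
placeholder. The paths, names, and cleanup step here are illustrative
assumptions, not details from this thread:

    #!/usr/bin/env python
    # Sketch only: snapshot the node, then tar the data dir for upload.
    # DATA_DIR and ARCHIVE_DIR are assumed locations; adjust for your install.
    import os
    import subprocess
    import time

    DATA_DIR = "/var/lib/cassandra/data"   # default Cassandra data dir
    ARCHIVE_DIR = "/mnt/backups"           # local scratch space for the tarball

    def snapshot_and_tar():
        # Ask Cassandra to hard-link the live SSTables into snapshot
        # directories under each keyspace, so the files we archive are stable.
        subprocess.check_call(["nodetool", "-h", "localhost", "snapshot"])

        # One compressed archive of the whole data dir (snapshots included).
        # Simple and hard to get wrong, at the cost of local free space and
        # no "delta" savings between backups.
        stamp = time.strftime("%Y%m%d%H%M%S")
        archive = os.path.join(ARCHIVE_DIR, "cassandra-%s.tar.gz" % stamp)
        subprocess.check_call(["tar", "czf", archive, DATA_DIR])

        # Ship `archive` to S3 with whatever client you use (s3cmd, boto, ...),
        # then reclaim space with: nodetool -h localhost clearsnapshot
        return archive

    if __name__ == "__main__":
        print(snapshot_and_tar())
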
>
> On Fri, Apr 29, 2011 at 4:58 AM, Daniel Doubleday
> <daniel.doubleday@gmx.net> wrote:
>
>> What we are about to set up is a time-machine-like backup. This is more
>> like an add-on to the S3 backup.
>>
>> Our boxes have an additional, larger drive for local backups. We create a
>> new backup snapshot every x hours which hard-links the files in the
>> previous snapshot (a bit like Cassandra's incremental_backups feature),
>> and then we sync that snapshot dir with the Cassandra data dir. We can do
>> archiving/backup to an external system from there without impacting the
>> main data RAID.
>>
>> But the main reason to do this is to have an "omg we screwed up big time
>> and deleted/corrupted data" recovery.
>>
>> On Apr 28, 2011, at 9:53 PM, William Oberman wrote:
>>
>> Even with N nodes for redundancy, I still want to have backups. I'm an
>> Amazon person, so naturally I'm thinking S3. Reading over the docs, and
>> messing with nodetool, it looks like each new snapshot contains the
>> previous snapshot as a subset (and I've read how Cassandra uses hard
>> links to avoid excessive disk use). When does that pattern break down?
>>
>> I'm basically debating whether I can do an rsync-like backup, or whether
>> I should do a compressed tar backup. And I obviously want multiple points
>> in time. S3 does allow file versioning, if a file or file name is
>> changed/reused over time (which only matters in the rsync case). My only
>> concerns with compressed tars are that I'll need free space to create the
>> archive and that I get no "delta" space savings on the backup (the former
>> is solved by not letting disk space get so low and/or adding more nodes
>> to bring down the space; the latter is solved by S3 being really cheap
>> anyway).
>>
>> --
>> Will Oberman
>> Civic Science, Inc.
>> 3030 Penn Avenue, First Floor
>> Pittsburgh, PA 15201
>> (M) 412-480-7835
>> (E) oberman@civicscience.com
>
> --
> Will Oberman
> Civic Science, Inc.
> 3030 Penn Avenue, First Floor
> Pittsburgh, PA 15201
> (M) 412-480-7835
> (E) oberman@civicscience.com

--
Will Oberman
Civic Science, Inc.
3030 Penn Avenue, First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) oberman@civicscience.com
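
And a rough sketch of the hard-link "time machine" idea Daniel describes,
assuming a spare local drive mounted for backups. The paths and the
name-plus-size identity check are assumptions for illustration, not details
from his actual setup:

    #!/usr/bin/env python
    # Sketch only: each run makes a new snapshot directory, hard-linking any
    # file that is unchanged since the previous run (cheap, because SSTables
    # are immutable once written) and copying only new files.
    import os
    import shutil
    import time

    DATA_DIR = "/var/lib/cassandra/data"   # Cassandra data dir (adjust)
    BACKUP_ROOT = "/backup/cassandra"      # the extra local backup drive

    def take_backup():
        runs = sorted(os.listdir(BACKUP_ROOT)) if os.path.isdir(BACKUP_ROOT) else []
        previous = os.path.join(BACKUP_ROOT, runs[-1]) if runs else None
        current = os.path.join(BACKUP_ROOT, time.strftime("%Y%m%d-%H%M%S"))

        for dirpath, _, filenames in os.walk(DATA_DIR):
            rel = os.path.relpath(dirpath, DATA_DIR)
            target_dir = os.path.normpath(os.path.join(current, rel))
            if not os.path.isdir(target_dir):
                os.makedirs(target_dir)
            for name in filenames:
                src = os.path.join(dirpath, name)
                dst = os.path.join(target_dir, name)
                old = os.path.join(previous, rel, name) if previous else None
                # An SSTable never changes after it is written, so a file with
                # the same name and size as last run can just be hard-linked.
                if old and os.path.exists(old) and \
                        os.path.getsize(old) == os.path.getsize(src):
                    os.link(old, dst)
                else:
                    shutil.copy2(src, dst)
        return current

    if __name__ == "__main__":
        print(take_backup())

In practice you would probably run this against a nodetool snapshot (or at
least flush first) so the file set is stable while it copies; rsync's
--link-dest option does the same hard-link trick in a single command.
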