cassandra-user mailing list archives

From William Oberman <ober...@civicscience.com>
Subject Re: best way to backup
Date Sat, 30 Apr 2011 14:44:23 GMT
Thanks, I think I'm getting some of the file layout/data structures now, so
that helps with the backup strategy.  I might still start simple, as it's
usually harder to screw up simple, but at least I'll know where I can go
with something more clever.

will
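
For reference, a minimal sketch of the "start simple" approach (one compressed tar of the keyspace folders plus system per backup run), written in Python. The function name, the archive naming scheme, and the directory arguments are hypothetical and not part of Cassandra. It also assumes you point it at a stable copy of the data, such as a snapshot directory, rather than at files that are still being written.

```python
import os
import tarfile
import time


def archive_keyspaces(data_dir, keyspaces, dest_dir):
    """Write one timestamped .tar.gz holding the given keyspace folders.

    data_dir  -- directory that contains one folder per keyspace (assumed layout)
    keyspaces -- folder names to include, e.g. ["system", "MyKeyspace"]
    dest_dir  -- where the archive is written; needs enough free space
    """
    os.makedirs(dest_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    archive = os.path.join(dest_dir, "cassandra-%s.tar.gz" % stamp)
    with tarfile.open(archive, "w:gz") as tar:
        for ks in keyspaces:
            # arcname keeps paths inside the archive relative to data_dir
            tar.add(os.path.join(data_dir, ks), arcname=ks)
    return archive
```

Each run produces a full, self-contained archive, so there are no "delta" savings, but restoring a point in time only needs one file.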

On Sat, Apr 30, 2011 at 9:15 AM, Jeremiah Jordan <
JEREMIAH.JORDAN@morningstar.com> wrote:

>  The files inside the keyspace folders are the SSTables.
>
>  ------------------------------
> *From:* aaron morton [mailto:aaron@thelastpickle.com]
> *Sent:* Friday, April 29, 2011 4:49 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: best way to backup
>
> William,
> Some info on the sstables from me
> http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/
>
> If you want to know more check out the BigTable and original Facebook
> papers, linked from the wiki
> http://wiki.apache.org/cassandra/ArchitectureOverview
>
> Aaron
>
>  On 29 Apr 2011, at 23:43, William Oberman wrote:
>
> Dumb question, but referenced twice now: which files are the SSTables and
> why is backing them up incrementally a win?
>
> Or should I not bother to understand internals, and instead just roll with
> the "backup my keyspace(s) and system in a compressed tar" strategy, as
> while it may be excessive, it's guaranteed to work and work easily (which I
> like, a great deal).
>
> will
>
> On Fri, Apr 29, 2011 at 4:58 AM, Daniel Doubleday <
> daniel.doubleday@gmx.net> wrote:
>
>> What we are about to set up is a time machine like backup. This is more
>> like an add on to the s3 backup.
>>
>> Our boxes have an additional larger drive for local backup. We create a
>> new backup snapshot every x hours which hardlinks the files in the previous
>> snapshot (a bit like Cassandra's incremental_backups thing) and then we sync
>> that snapshot dir with the Cassandra data dir. We can do archiving / backup
>> to external system from there without impacting the main data raid.
>>
>> But the main reason to do this is to have an 'omg we screwed up big time
>> and deleted / corrupted data' recovery.
>>
>>  On Apr 28, 2011, at 9:53 PM, William Oberman wrote:
>>
>>   Even with N-nodes for redundancy, I still want to have backups.  I'm an
>> amazon person, so naturally I'm thinking S3.  Reading over the docs, and
>> messing with nodetool, it looks like each new snapshot contains the previous
>> snapshot as a subset (and I've read how cassandra uses hard links to avoid
>> excessive disk use).  When does that pattern break down?
>>
>> I'm basically debating if I can do a "rsync" like backup, or if I should
>> do a compressed tar backup.  And I obviously want multiple points in time.
>> S3 does allow file versioning, if a file or file name is changed/reused
>> over time (only matters in the rsync case).  My only concerns with
>> compressed tars are that I'll have to have free space to create the archive and I
>> get no "delta" space savings on the backup (the former is solved by not
>> allowing the disk space to get so low and/or adding more nodes to bring down
>> the space, the latter is solved by S3 being really cheap anyways).
>>
>> --
>> Will Oberman
>> Civic Science, Inc.
>> 3030 Penn Avenue., First Floor
>> Pittsburgh, PA 15201
>> (M) 412-480-7835
>> (E) oberman@civicscience.com
>>



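The "time machine like" scheme Daniel describes in the thread (each new snapshot hard-links whatever is unchanged from the previous snapshot, then is synced against the data dir) can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not his actual setup: the function name and directory layout are hypothetical, snapshot names are assumed to sort chronologically, and a file is treated as unchanged when its relative path and size match. That is only reasonable because SSTable files are not modified once written. `rsync --link-dest` gives similar behaviour without custom code.

```python
import os
import shutil
from pathlib import Path


def take_snapshot(data_dir, backup_root, name):
    """Create backup_root/name as a full-looking copy of data_dir.

    Files already present in the most recent earlier snapshot are
    hard-linked (no extra disk space); new files are copied from data_dir.
    Files that have disappeared from data_dir are simply not carried over.
    """
    data_dir, backup_root = Path(data_dir), Path(backup_root)
    earlier = []
    if backup_root.exists():
        earlier = sorted(p for p in backup_root.iterdir() if p.is_dir())
    prev = earlier[-1] if earlier else None

    snap = backup_root / name
    snap.mkdir(parents=True)
    for src in data_dir.rglob("*"):
        rel = src.relative_to(data_dir)
        dst = snap / rel
        if src.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        old = prev / rel if prev is not None else None
        # Assumption: same relative path + same size means same content.
        if old is not None and old.is_file() and old.stat().st_size == src.stat().st_size:
            os.link(old, dst)        # share the blocks with the previous snapshot
        else:
            shutil.copy2(src, dst)   # new file: copy onto the backup drive
    return snap
```

Because every snapshot directory looks complete on its own, old ones can be deleted in any order; the data blocks are freed only when the last hard link to them goes away.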