cassandra-user mailing list archives

From Jeremy Hanna <jeremy.hanna1...@gmail.com>
Subject Re: best way to backup
Date Thu, 28 Apr 2011 20:35:42 GMT
One thing we're looking at doing is watching the Cassandra data directory and backing up the
sstables to S3 as they are created.  Some guys at SimpleGeo started tablesnap, which does
this:
https://github.com/simplegeo/tablesnap

For every sstable it pushes to S3, it also copies a JSON file listing the files currently
in the directory, so you know what to restore if you ever need to (as far as I
understand).
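
To make the idea concrete, here is a rough Python sketch of that pattern. It is not tablesnap's actual code: the function names, the "-listdir.json" key suffix, and the `upload(key, data)` callback (which would wrap whatever S3 client you use) are all made up for illustration, and it assumes you call it periodically rather than reacting to filesystem events.

```python
import json
import os


def list_data_files(data_dir):
    """Return the sorted names of the regular files currently in data_dir."""
    return sorted(
        name for name in os.listdir(data_dir)
        if os.path.isfile(os.path.join(data_dir, name))
    )


def backup_new_files(data_dir, seen, upload):
    """Upload every file not yet in `seen`, plus a JSON listing of the directory.

    `upload(key, data)` is a hypothetical callback, e.g. a thin wrapper around
    an S3 client. `seen` is a set of file names that were already backed up.
    """
    current = list_data_files(data_dir)
    # The listing records which files were live when each new file was pushed,
    # so a restore knows which set of files belongs together.
    listing = json.dumps(current).encode("utf-8")
    for name in current:
        if name in seen:
            continue
        # Reads the whole file into memory; a real tool would stream it.
        with open(os.path.join(data_dir, name), "rb") as fh:
            upload(name, fh.read())
        upload(name + "-listdir.json", listing)  # hypothetical key naming
        seen.add(name)
    return current
```

Since sstables are immutable once written, uploading each file once by name should be enough; the per-file listing is what tells you which files to pull back for a given point in time.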

On Apr 28, 2011, at 2:53 PM, William Oberman wrote:

> Even with N nodes for redundancy, I still want to have backups.  I'm an Amazon person,
> so naturally I'm thinking S3.  Reading over the docs, and messing with nodetool, it looks
> like each new snapshot contains the previous snapshot as a subset (and I've read how Cassandra
> uses hard links to avoid excessive disk use).  When does that pattern break down?
> 
> I'm basically debating whether I can do an "rsync"-like backup or should do a compressed
> tar backup.  And I obviously want multiple points in time.  S3 does allow file versioning,
> if a file or file name is changed/reused over time (this only matters in the rsync case).  My
> only concerns with compressed tars are that I'll need free space to create the archive and
> I get no "delta" space savings on the backup (the former is solved by not letting the disk
> space get so low and/or adding more nodes to bring down the space used per node, the latter
> is solved by S3 being really cheap anyway).
> 
> -- 
> Will Oberman
> Civic Science, Inc.
> 3030 Penn Avenue., First Floor
> Pittsburgh, PA 15201
> (M) 412-480-7835
> (E) oberman@civicscience.com

