cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: best way to backup
Date Fri, 29 Apr 2011 21:49:05 GMT
William, 
	Some info on the sstables from me: http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/

	If you want to know more, check out the BigTable and original Facebook papers, linked from the wiki.
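
	The short version: SSTables are the immutable *-Data.db / *-Index.db / *-Filter.db /
	*-Statistics.db files under the data directory, so a file that has been backed up once
	never needs to be copied again. A rough sketch (not official tooling, and the component
	names are from the 0.7-era layout) of spotting what a new snapshot adds:

    # rough sketch, not Cassandra tooling: group SSTable component files in a
    # snapshot dir by table name + generation so two snapshots can be diffed
    import os
    from collections import defaultdict

    COMPONENTS = ("Data.db", "Index.db", "Filter.db", "Statistics.db")

    def sstables_in(snapshot_dir):
        tables = defaultdict(set)
        for name in os.listdir(snapshot_dir):
            for comp in COMPONENTS:
                if name.endswith("-" + comp):
                    tables[name[:-(len(comp) + 1)]].add(comp)
        return tables

    # the "win": SSTables are immutable, so everything already present in the
    # previous snapshot is unchanged and only the difference needs uploading
    # new_tables = set(sstables_in(new_snap)) - set(sstables_in(old_snap))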

Aaron

On 29 Apr 2011, at 23:43, William Oberman wrote:

> Dumb question, but it's been referenced twice now: which files are the SSTables, and why
> is backing them up incrementally a win?
> 
> Or should I not bother to understand the internals, and instead just roll with the "back up
> my keyspace(s) and system in a compressed tar" strategy? While it may be excessive, it's
> guaranteed to work, and to work easily (which I like a great deal).
> 
> will
> 
> On Fri, Apr 29, 2011 at 4:58 AM, Daniel Doubleday <daniel.doubleday@gmx.net> wrote:
> What we are about to set up is a Time Machine-like backup. This is more of an add-on to
> the S3 backup.
> 
> Our boxes have an additional, larger drive for local backups. Every x hours we create a
> new backup snapshot that hardlinks the files in the previous snapshot (a bit like
> Cassandra's incremental_backups feature), and then we sync that snapshot dir with the
> Cassandra data dir. We can do archiving / backup to an external system from there without
> impacting the main data RAID.
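> 
> Roughly, the rotation step looks something like this (just a sketch; the paths are
> placeholders, and the cp -al / rsync combo is one way to do the hardlink-then-sync step):
> 
>     # sketch of the time-machine rotation: hardlink the newest snapshot into a
>     # new dir, then rsync the live data dir over it; unchanged (immutable)
>     # SSTables stay as hardlinks, new ones get copied, compacted ones drop out
>     import os, subprocess, time
> 
>     DATA_DIR = "/var/lib/cassandra/data"   # placeholder paths
>     BACKUP_ROOT = "/backup/cassandra"
> 
>     def rotate():
>         snaps = sorted(os.listdir(BACKUP_ROOT))
>         new_snap = os.path.join(BACKUP_ROOT, time.strftime("%Y%m%d-%H%M"))
>         if snaps:
>             subprocess.check_call(["cp", "-al",
>                                    os.path.join(BACKUP_ROOT, snaps[-1]), new_snap])
>         else:
>             os.makedirs(new_snap)
>         subprocess.check_call(["rsync", "-a", "--delete",
>                                DATA_DIR + "/", new_snap + "/"])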
> 
> But the main reason to do this is to have an 'omg we screwed up big time and deleted /
> corrupted data' recovery option.
> 
> On Apr 28, 2011, at 9:53 PM, William Oberman wrote:
> 
>> Even with N nodes for redundancy, I still want to have backups.  I'm an Amazon person,
>> so naturally I'm thinking S3.  Reading over the docs, and messing with nodetool, it looks
>> like each new snapshot contains the previous snapshot as a subset (and I've read how
>> Cassandra uses hard links to avoid excessive disk use).  When does that pattern break down?
>> 
>> I'm basically debating whether I can do an rsync-like backup, or whether I should do a
>> compressed tar backup.  And I obviously want multiple points in time.  S3 does allow file
>> versioning if a file or file name is changed/reused over time (this only matters in the
>> rsync case).  My only concern with compressed tars is that I'll have to have free space to
>> create the archive, and I get no "delta" space savings on the backup (the former is solved
>> by not letting the disk space get so low and/or by adding more nodes to bring the space per
>> node down; the latter is solved by S3 being really cheap anyway).
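>> 
>> The rsync-like version I have in mind is roughly this (just a sketch; the bucket and
>> prefix are placeholders, and I'm assuming the boto S3 library): only upload the snapshot
>> files the bucket doesn't already have, since SSTables never change once written.
>> 
>>     # sketch: "rsync-like" push of a snapshot dir to S3, skipping files that
>>     # an earlier snapshot already uploaded (SSTables are immutable)
>>     import os
>>     import boto                  # assumes the boto S3 library
>>     from boto.s3.key import Key
>> 
>>     def upload_snapshot(snapshot_dir, bucket_name, prefix):
>>         bucket = boto.connect_s3().get_bucket(bucket_name)
>>         existing = set(k.name for k in bucket.list(prefix))
>>         for name in os.listdir(snapshot_dir):
>>             key_name = "%s/%s" % (prefix, name)
>>             if key_name in existing:
>>                 continue         # already backed up
>>             key = Key(bucket)
>>             key.key = key_name
>>             key.set_contents_from_filename(os.path.join(snapshot_dir, name))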
>> 
>> -- 
>> Will Oberman
>> Civic Science, Inc.
>> 3030 Penn Avenue, First Floor
>> Pittsburgh, PA 15201
>> (M) 412-480-7835
>> (E) oberman@civicscience.com
> 
> 
> 
> 
> -- 
> Will Oberman
> Civic Science, Inc.
> 3030 Penn Avenue, First Floor
> Pittsburgh, PA 15201
> (M) 412-480-7835
> (E) oberman@civicscience.com

