cassandra-user mailing list archives

From: William Oberman
Subject: Re: best way to backup
Date: Fri, 29 Apr 2011 11:43:03 GMT

Dumb question, but referenced twice now: which files are the SSTables and
why is backing them up incrementally a win?

Or should I not bother to understand internals, and instead just roll with
the "backup my keyspace(s) and system in a compressed tar" strategy? While
it may be excessive, it's guaranteed to work and work easily (which I like
a great deal).
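
For reference, the "compressed tar" strategy could look roughly like the
sketch below. It is only an illustration: the function name, the archive
prefix, and the example paths are hypothetical and not taken from any
Cassandra tooling. It also assumes the directories passed in are not being
modified while the archive is written (for example, because they are
snapshot directories rather than the live data directory).

```python
import tarfile
import time
from pathlib import Path


def archive_dirs(source_dirs, dest_dir, prefix="cassandra-backup"):
    """Write one gzip-compressed tar containing every directory in source_dirs.

    Returns the path of the archive that was created.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # A UTC timestamp in the file name gives one archive per point in time.
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    archive = dest / f"{prefix}-{stamp}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        for src in source_dirs:
            src = Path(src)
            # Store each directory under its own name, not its absolute path.
            tar.add(src, arcname=src.name)
    return archive


# Hypothetical usage; adjust the paths to your own layout:
# archive_dirs(["/var/lib/cassandra/data/MyKeyspace",
#               "/var/lib/cassandra/data/system"], "/mnt/backup")
```

Each run produces a full, self-contained archive, which is what makes this
approach simple to restore from, at the cost of the temporary free space and
the lack of "delta" savings mentioned in the quoted message below.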


On Fri, Apr 29, 2011 at 4:58 AM, Daniel Doubleday wrote:

> What we are about to set up is a Time Machine-like backup. This is more
> like an add-on to the S3 backup.
> Our boxes have an additional, larger drive for local backup. We create a new
> backup snapshot every x hours which hardlinks the files in the previous
> snapshot (a bit like Cassandra's incremental_backups thing) and then we sync
> that snapshot dir with the Cassandra data dir. We can do archiving / backup
> to an external system from there without impacting the main data RAID.
> But the main reason to do this is to have an 'omg we screwed up big time
> and deleted / corrupted data' recovery.
> On Apr 28, 2011, at 9:53 PM, William Oberman wrote:
> Even with N-nodes for redundancy, I still want to have backups.  I'm an
> amazon person, so naturally I'm thinking S3.  Reading over the docs, and
> messing with nodetool, it looks like each new snapshot contains the previous
> snapshot as a subset (and I've read how cassandra uses hard links to avoid
> excessive disk use).  When does that pattern break down?
> I'm basically debating if I can do a "rsync" like backup, or if I should do
> a compressed tar backup.  And I obviously want multiple points in time.  S3
> does allow file versioning, if a file or file name is changed/reused over
> time (only matters in the rsync case).  My only concerns with compressed
> tars are that I'll have to have free space to create the archive and I get no
> "delta" space savings on the backup (the former is solved by not allowing
> the disk space to get so low and/or adding more nodes to bring down the
> space, the latter is solved by S3 being really cheap anyways).
> --
> Will Oberman
> Civic Science, Inc.
> 3030 Penn Avenue., First Floor
> Pittsburgh, PA 15201
> (M) 412-480-7835
> (E)
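
The hard-link snapshot scheme Daniel describes in the quoted message can be
sketched as follows. This is a minimal illustration, not his actual setup:
the function names and the directory layout are hypothetical, snapshot names
are assumed to sort chronologically, and a file is treated as unchanged when
its size and modification time match the copy in the previous snapshot. It
also does the work in a single pass (similar in spirit to
`rsync --link-dest`) rather than linking everything first and syncing
afterwards.

```python
import os
import shutil
from pathlib import Path


def _unchanged(a, b):
    """Treat two files as identical if size and whole-second mtime match."""
    sa, sb = a.stat(), b.stat()
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def make_snapshot(data_dir, backup_root, name):
    """Create backup_root/name as a point-in-time copy of data_dir.

    Files that are unchanged since the most recent earlier snapshot are
    hard-linked to it (no extra disk space); new or changed files are copied.
    """
    data_dir = Path(data_dir)
    backup_root = Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)

    # Assumption: snapshot directory names sort in chronological order.
    earlier = sorted(p for p in backup_root.iterdir() if p.is_dir())
    prev = earlier[-1] if earlier else None

    snap = backup_root / name
    snap.mkdir()  # fails if a snapshot with this name already exists

    for src in data_dir.rglob("*"):
        rel = src.relative_to(data_dir)
        dst = snap / rel
        if src.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        old = prev / rel if prev is not None else None
        if old is not None and old.is_file() and _unchanged(src, old):
            os.link(old, dst)        # share the bytes with the old snapshot
        else:
            shutil.copy2(src, dst)   # copy contents and preserve the mtime
    return snap
```

Because files removed from the data directory are simply not linked into the
new snapshot, older snapshots keep them until those snapshots are deleted,
which is what allows recovery from an accidental delete.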

Will Oberman
Civic Science, Inc.
3030 Penn Avenue., First Floor
Pittsburgh, PA 15201
(M) 412-480-7835