incubator-cassandra-user mailing list archives

From William Oberman <>
Subject Re: best way to backup
Date Sat, 30 Apr 2011 14:44:23 GMT
Thanks, I think I'm getting some of the file layout/data structures now, so
that helps with the backup strategy.  I might still start simple, as it's
usually harder to screw up simple, but at least I'll know where I can go
with something more clever.


On Sat, Apr 30, 2011 at 9:15 AM, Jeremiah Jordan <> wrote:

>  The files inside the keyspace folders are the SSTables.
>  ------------------------------
> *From:* aaron morton []
> *Sent:* Friday, April 29, 2011 4:49 PM
> *To:*
> *Subject:* Re: best way to backup
> William,
> Some info on the sstables from me.
> If you want to know more, check out the BigTable and original Facebook papers,
> linked from the wiki.
> Aaron
>  On 29 Apr 2011, at 23:43, William Oberman wrote:
> Dumb question, but referenced twice now: which files are the SSTables and
> why is backing them up incrementally a win?
> Or should I not bother to understand internals, and instead just roll with
> the "backup my keyspace(s) and system in a compressed tar" strategy? While
> it may be excessive, it's guaranteed to work and work easily (which I
> like a great deal).
> will
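
A minimal sketch of the "compressed tar" strategy mentioned above, using only Python's standard `tarfile` module. The function name `tar_backup`, the archive naming scheme, and the idea of passing in a list of directories are assumptions made for illustration; nothing here is a Cassandra API. Archiving a snapshot directory rather than the live data files is probably safer, but that choice is left to the caller.

```python
import tarfile
import time
from pathlib import Path


def tar_backup(dirs, dest_dir, stamp=None):
    """Write one gzip-compressed tar containing every directory in `dirs`.

    `dirs` would be the keyspace and system directories (hypothetical
    paths supplied by the caller). Returns the path of the new archive.
    """
    stamp = stamp or time.strftime("%Y%m%dT%H%M%S")
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / f"cassandra-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        for d in dirs:
            d = Path(d)
            # Store each directory under its own name, not its full path.
            tar.add(str(d), arcname=d.name)
    return archive
```

Each run produces a full, self-contained point-in-time archive, which is what makes this approach easy to reason about and also why it gets no "delta" savings.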
> On Fri, Apr 29, 2011 at 4:58 AM, Daniel Doubleday wrote:
>> What we are about to set up is a Time Machine-like backup. This is more
>> like an add-on to the S3 backup.
>> Our boxes have an additional, larger drive for local backup. We create a
>> new backup snapshot every x hours which hardlinks the files in the previous
>> snapshot (a bit like Cassandra's incremental_backups thing) and then we sync
>> that snapshot dir with the Cassandra data dir. We can do archiving / backup
>> to an external system from there without impacting the main data RAID.
>> But the main reason to do this is to have an 'omg we screwed up big time
>> and deleted / corrupted data' recovery.
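
The hardlink-then-sync idea described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the poster's actual script: `make_snapshot` and `_same` are hypothetical names, snapshot directory names are assumed to sort chronologically, and "unchanged" is approximated by equal size and modification time (reasonable for SSTables, which are not modified once written).

```python
import os
import shutil
from pathlib import Path


def _same(a: Path, b: Path) -> bool:
    """Cheap 'unchanged' check: same size and same whole-second mtime."""
    sa, sb = a.stat(), b.stat()
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def make_snapshot(data_dir: Path, backup_root: Path, name: str) -> Path:
    """Create backup_root/name mirroring data_dir.

    Files unchanged since the newest existing snapshot are hardlinked to
    it; new or changed files are copied. Files no longer in data_dir are
    simply absent from the new snapshot.
    """
    backup_root.mkdir(parents=True, exist_ok=True)
    # Assumption: snapshot names sort chronologically (e.g. timestamps).
    existing = sorted(p for p in backup_root.iterdir() if p.is_dir())
    prev = existing[-1] if existing else None
    target = backup_root / name
    target.mkdir()

    for src in data_dir.rglob("*"):
        rel = src.relative_to(data_dir)
        dst = target / rel
        if src.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        old = prev / rel if prev is not None else None
        if old is not None and old.is_file() and _same(src, old):
            os.link(old, dst)       # unchanged: share storage with the previous snapshot
        else:
            shutil.copy2(src, dst)  # new or changed: real copy, mtime preserved
    return target
```

Because older snapshots keep their own links, deleting a file from the data directory (or from a later snapshot) does not remove it from earlier ones, which is the property that makes the 'we screwed up' recovery possible.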
>>  On Apr 28, 2011, at 9:53 PM, William Oberman wrote:
>>   Even with N nodes for redundancy, I still want to have backups.  I'm an
>> Amazon person, so naturally I'm thinking S3.  Reading over the docs, and
>> messing with nodetool, it looks like each new snapshot contains the previous
>> snapshot as a subset (and I've read how Cassandra uses hard links to avoid
>> excessive disk use).  When does that pattern break down?
>> I'm basically debating if I can do an "rsync"-like backup, or if I should
>> do a compressed tar backup.  And I obviously want multiple points in time.
>> S3 does allow file versioning, if a file or file name is changed/reused
>> over time (only matters in the rsync case).  My only concerns with
>> compressed tars are that I'll have to have free space to create the archive and I
>> get no "delta" space savings on the backup (the former is solved by not
>> allowing the disk space to get so low and/or adding more nodes to bring down
>> the space, the latter is solved by S3 being really cheap anyways).
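
To make the "rsync"-like option concrete, here is a small sketch of the planning step only: deciding which files in a snapshot directory still need to be sent. The function `plan_sync` and the `remote_sizes` index are hypothetical; in practice the index might be built from a bucket listing, and no S3 calls are shown. Comparing by relative path and size alone is an assumption, and a checksum comparison would be stricter.

```python
from pathlib import Path


def plan_sync(snapshot_dir, remote_sizes):
    """Return (to_upload, to_remove) for an rsync-style backup.

    remote_sizes maps relative path -> size in bytes for files the
    backup target already holds (hypothetical index).
    """
    snapshot_dir = Path(snapshot_dir)
    local = {
        p.relative_to(snapshot_dir).as_posix(): p.stat().st_size
        for p in snapshot_dir.rglob("*")
        if p.is_file()
    }
    # Upload anything missing remotely or whose size differs.
    to_upload = sorted(k for k, size in local.items() if remote_sizes.get(k) != size)
    # Files that exist remotely but are gone locally (e.g. after compaction).
    to_remove = sorted(k for k in remote_sizes if k not in local)
    return to_upload, to_remove
```

If most files are unchanged between snapshots, `to_upload` stays small, which is where the "delta" savings over a full tar would come from. Whether to act on `to_remove` depends on how many points in time should be kept.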
>> --
>> Will Oberman
>> Civic Science, Inc.
>> 3030 Penn Avenue., First Floor
>> Pittsburgh, PA 15201
>> (M) 412-480-7835
>> (E)

Will Oberman
Civic Science, Inc.
3030 Penn Avenue., First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
