incubator-cassandra-user mailing list archives

From Jeremy Hanna <>
Subject Re: best way to backup
Date Fri, 29 Apr 2011 15:32:58 GMT
Good point - we plan to do regular testing to restore the cluster.  Also we might spin up a
snapshot of the cluster for testing as well.

Also I wonder how much time compression will save when it comes to restores.  I'll have to
run some tests on that.  Thanks for posting.
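
For the restore testing mentioned above, here is a minimal sketch of one possible file-level check. It assumes the backup was taken from a static snapshot directory and restored into a second directory on disk; the function names `digest_tree` and `compare_trees` are made up for illustration. It only shows that the files round-tripped intact, not that the cluster serves the data you expect, so a query-level check would still be needed.

```python
import hashlib
from pathlib import Path


def digest_tree(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large files are not loaded at once.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def compare_trees(original, restored):
    """Return (missing, extra, changed) lists of relative paths."""
    a, b = digest_tree(original), digest_tree(restored)
    missing = sorted(set(a) - set(b))   # in original, absent after restore
    extra = sorted(set(b) - set(a))     # present only in the restored tree
    changed = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return missing, extra, changed
```

A restore would count as clean at the file level when all three lists come back empty.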


On Apr 28, 2011, at 4:15 PM, Adrian Cockcroft wrote:

> Netflix has also gone down this path; we run a regular full backup to
> S3 of a compressed tar, and we have scripts that restore everything
> into the right place on a different cluster (it needs the same node
> count). We also pick up the SSTables as they are created, and drop
> them in S3.
> Whatever you do, make sure you have a regular process to restore the
> data and verify that it contains what you think it should...
> Adrian
> On Thu, Apr 28, 2011 at 1:35 PM, Jeremy Hanna
> <> wrote:
>> One thing we're looking at doing is watching the Cassandra data directory and backing
>> up the SSTables to S3 when they are created.  Some guys at SimpleGeo started tablesnap, which
>> does this:
>> For every SSTable that is pushed to S3, it also copies a JSON file
>> listing the current files in the directory, so you know which files to restore (as
>> far as I understand).
>> On Apr 28, 2011, at 2:53 PM, William Oberman wrote:
>>> Even with N nodes for redundancy, I still want to have backups.  I'm an Amazon
>>> person, so naturally I'm thinking S3.  Reading over the docs, and messing with nodetool, it
>>> looks like each new snapshot contains the previous snapshot as a subset (and I've read how
>>> Cassandra uses hard links to avoid excessive disk use).  When does that pattern break down?
>>> I'm basically debating if I can do an "rsync"-like backup, or if I should do a
>>> compressed tar backup.  And I obviously want multiple points in time.  S3 does allow file
>>> versioning, if a file or file name is changed/reused over time (only matters in the rsync
>>> case).  My only concerns with compressed tars are that I'll need free space to create the
>>> archive and I get no "delta" space savings on the backup (the former is solved by not allowing
>>> the disk space to get so low and/or adding more nodes to bring down the space, the latter
>>> is solved by S3 being really cheap anyway).
>>> --
>>> Will Oberman
>>> Civic Science, Inc.
>>> 3030 Penn Avenue., First Floor
>>> Pittsburgh, PA 15201
>>> (M) 412-480-7835
>>> (E)
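
The approach described in the thread (push each new SSTable, plus a JSON file listing the directory's current files) could be sketched roughly as below. This is not tablesnap's actual code. The names `list_files` and `backup_new_files`, the `-listing.json` suffix, and the `upload(key, data)` callback are all hypothetical; the callback stands in for whatever S3 client wrapper you would use. The sketch polls once per call and reads each file fully into memory, which a real tool handling large SSTables or files still being written would need to avoid.

```python
import json
import os


def list_files(data_dir):
    """Return the sorted names of regular files directly inside data_dir."""
    return sorted(
        name for name in os.listdir(data_dir)
        if os.path.isfile(os.path.join(data_dir, name))
    )


def backup_new_files(data_dir, already_seen, upload):
    """Upload files not seen before, each with a JSON listing of the directory.

    `upload(key, data)` is a hypothetical callback supplied by the caller;
    it is not a real S3 API. Returns the updated set of seen file names.
    """
    current = list_files(data_dir)
    manifest = json.dumps(current).encode("utf-8")
    for name in current:
        if name in already_seen:
            continue
        with open(os.path.join(data_dir, name), "rb") as f:
            upload(name, f.read())
        # The listing records which files made up the directory at this
        # point, so a restore knows which set of files belongs together.
        upload(name + "-listing.json", manifest)
    return already_seen | set(current)
```

Calling `backup_new_files` repeatedly, passing the returned set back in as `already_seen`, uploads only the files that appeared since the previous call.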
