cassandra-user mailing list archives

From Jonathan Haddad <>
Subject Re: cassandra backup
Date Fri, 06 Dec 2013 16:05:48 GMT
I believe SSTables are written to a temporary file then moved.  If I
remember correctly, tools like tablesnap listen for the inotify event
IN_MOVED_TO.  This should handle the "try to back up sstable while in
mid-write" issue.
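
A minimal sketch of why the write-then-move pattern helps, using only the Python standard library. The ".tmp" suffix and the function names are made up for illustration; they are not Cassandra's or tablesnap's actual naming. The idea is that a finished file only ever appears under its final name via a rename, so anything watching for the move (or simply skipping temporary names) never picks up a half-written file.

```python
import os

TMP_SUFFIX = ".tmp"  # hypothetical marker for files still being written


def write_then_move(directory, name, data):
    """Write data to a temporary file, then rename it into place."""
    final_path = os.path.join(directory, name)
    tmp_path = final_path + TMP_SUFFIX
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes are on disk first
    # A rename within one filesystem is atomic on POSIX. An inotify
    # watcher on this directory would see IN_MOVED_TO for final_path.
    os.rename(tmp_path, final_path)
    return final_path


def backup_candidates(directory):
    """List finished files only, skipping anything still in progress."""
    return sorted(
        n for n in os.listdir(directory) if not n.endswith(TMP_SUFFIX)
    )
```

A backup tool built this way only needs to react to the final name showing up; it never has to guess whether a write is still under way.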

On Fri, Dec 6, 2013 at 5:39 AM, Michael Theroux <> wrote:

> Hi Marcelo,
> Cassandra provides an eventually consistent model for backups.  You can
> do staggered backups of data, with the idea that if you restore a node, and
> then do a repair, your data will be once again consistent.  Cassandra will
> not automatically copy the data to other nodes (other than via hinted
> handoff).  You should manually run repair after restoring a node.
> You should take snapshots when doing a backup, as it keeps the data you
> are backing up relevant to a single point in time, otherwise compaction
> could add/delete files on you mid-backup, or worse, I imagine, attempt to
> access an SSTable mid-write.  Snapshots work by using hard links, and don't
> take additional storage to perform.  In our process we create the snapshot,
> perform the backup, and then clear the snapshot.
> One thing to keep in mind in your S3 cost analysis is that, even though
> storage is cheap, reads/writes to S3 are not (especially writes).  If you
> are using LeveledCompaction, or otherwise have a ton of SSTables, some
> people have encountered increased costs moving the data to S3.
> Ourselves, we maintain backup EBS volumes that we regularly snapshot/rsync
> data to.  Thus far this has worked very well for us.
> -Mike
> On Friday, December 6, 2013 8:14 AM, Marcelo Elias Del Valle <> wrote:
>  Hello everyone,
>     I am trying to create backups of my data on AWS. My goal is to store
> the backups on S3 or glacier, as it's cheap to store this kind of data. So,
> if I have a cluster with N nodes, I would like to copy data from all N
> nodes to S3 and be able to restore later. I know Priam does that (we were
> using it), but I am using the latest Cassandra version and we plan to use
> DSE at some point, so I am not sure Priam fits this case.
>     I took a look at the docs:
>     And I am trying to understand whether I really need to take a snapshot
> to create my backup. Suppose I do a flush and copy the SSTables from each
> node, one by one, to S3. Not all at the same time, but one by one.
>     When I try to restore my backup, data from node 1 will be older than
> data from node 2. Will this cause problems? AFAIK, if I am using a
> replication factor of 2, for instance, and Cassandra sees data from node X
> only, it will automatically copy it to other nodes, right? Is there any
> chance of Cassandra nodes becoming corrupt somehow if I do my backups this
> way?
> Best regards,
> Marcelo Valle.
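
The snapshot, back up, clear sequence Mike describes could be scripted roughly as below. This is a sketch under assumptions: it shells out to `nodetool snapshot -t <tag>` and `nodetool clearsnapshot -t <tag>`, and the `copy_snapshot` callback (whatever uploads or rsyncs the snapshot directories) is hypothetical and left to the caller. The `run` parameter exists only so the command runner can be swapped out, e.g. for a dry run.

```python
import subprocess


def backup_with_snapshot(tag, copy_snapshot, run=subprocess.check_call):
    """Create a named snapshot, back it up, then clear it.

    copy_snapshot(tag) is a caller-supplied (hypothetical) function that
    copies the files under each table's snapshots/<tag>/ directory to
    wherever backups live (S3, an EBS volume, ...).
    """
    # The snapshot hard-links the current SSTables, so the set of files
    # being copied stays fixed even if compaction runs meanwhile.
    run(["nodetool", "snapshot", "-t", tag])
    try:
        copy_snapshot(tag)
    finally:
        # Always clear the snapshot, even if the copy failed, so old
        # SSTables are not pinned on disk indefinitely.
        run(["nodetool", "clearsnapshot", "-t", tag])
```

Running this on each node in turn gives the staggered backup described above; after restoring a node from such a backup, a repair is still needed to bring it back in line with the rest of the cluster.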

Jon Haddad
skype: rustyrazorblade
