incubator-cassandra-user mailing list archives

From Rahul Menon <ra...@apigee.com>
Subject Re: cassandra backup
Date Fri, 06 Dec 2013 15:38:17 GMT
You should look at https://github.com/amorton/cassback. I don't believe
it's set up to work with 1.2.10 and above, but I believe it needs just small
tweaks to get it running.

Thanks
Rahul


On Fri, Dec 6, 2013 at 7:09 PM, Michael Theroux <mtheroux2@yahoo.com> wrote:

> Hi Marcelo,
>
> Cassandra provides an eventually consistent model for backups.  You can
> do staggered backups of data, with the idea that if you restore a node, and
> then do a repair, your data will be once again consistent.  Cassandra will
> not automatically copy the data to other nodes (other than via hinted
> handoff).  You should manually run repair after restoring a node.
>
> You should take snapshots when doing a backup, as it keeps the data you
> are backing up relevant to a single point in time; otherwise compaction
> could add/delete files on you mid-backup, or worse, I imagine, you could
> attempt to access an SSTable mid-write.  Snapshots work by using hard
> links, and don't take additional storage to create.  In our process we
> create the snapshot, perform the backup, and then clear the snapshot.
>
> One thing to keep in mind in your S3 cost analysis is that, even though
> storage is cheap, reads/writes to S3 are not (especially writes).  If you
> are using LeveledCompaction, or otherwise have a ton of SSTables, some
> people have encountered increased costs moving the data to S3.
>
> Ourselves, we maintain backup EBS volumes that we regularly snapshot/rsync
> data to.  Thus far this has worked very well for us.
>
> -Mike
>
>
>   On Friday, December 6, 2013 8:14 AM, Marcelo Elias Del Valle <
> marcelo@s1mbi0se.com.br> wrote:
>   Hello everyone,
>
>     I am trying to create backups of my data on AWS. My goal is to store
> the backups on S3 or Glacier, as it's cheap to store this kind of data. So,
> if I have a cluster with N nodes, I would like to copy data from all N
> nodes to S3 and be able to restore later. I know Priam does that (we were
> using it), but I am using the latest Cassandra version and we plan to use
> DSE at some point, so I am not sure Priam fits this case.
>     I took a look at the docs:
> http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/operations/../../cassandra/operations/ops_backup_takes_snapshot_t.html
>
>     And I am trying to understand if it's really necessary to take a snapshot
> to create my backup. Suppose I do a flush and copy the SSTables from each
> node, one by one, to S3. Not all at the same time, but one by one.
>     When I try to restore my backup, data from node 1 will be older than
> data from node 2. Will this cause problems? AFAIK, if I am using a
> replication factor of 2, for instance, and Cassandra sees data from node X
> only, it will automatically copy it to other nodes, right? Is there any
> chance of Cassandra nodes becoming corrupt somehow if I do my backups this
> way?
>
> Best regards,
> Marcelo Valle.
>
>
>
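The create-snapshot, back-up, clear-snapshot sequence Mike describes could be scripted roughly as below. This is only a sketch under assumptions: `backup_node`, `snapshot_dirs`, and the injectable `run` parameter are hypothetical names, the copy target is a plain directory (for example a mounted backup volume) rather than S3, and the `<data_dir>/<keyspace>/<table>/snapshots/<tag>` layout should be checked against your Cassandra version.

```python
import shutil
import subprocess
from pathlib import Path


def snapshot_dirs(data_dir, tag):
    """Return every <keyspace>/<table>/snapshots/<tag> directory under data_dir."""
    pattern = f"*/*/snapshots/{tag}"
    return sorted(p for p in Path(data_dir).glob(pattern) if p.is_dir())


def backup_node(data_dir, backup_root, tag, run=subprocess.run):
    """Snapshot, copy the snapshot files out, then clear the snapshot.

    `run` defaults to subprocess.run; it is a parameter only so the
    nodetool calls can be replaced when trying the function out.
    """
    data_dir = Path(data_dir)
    # 1. Create the snapshot so the file set is fixed at one point in time.
    run(["nodetool", "snapshot", "-t", tag], check=True)
    copied = []
    try:
        # 2. Copy each snapshot directory, keeping keyspace/table in the path.
        for src in snapshot_dirs(data_dir, tag):
            table_dir = src.parent.parent  # <data_dir>/<keyspace>/<table>
            dest = Path(backup_root) / tag / table_dir.relative_to(data_dir)
            shutil.copytree(src, dest)
            copied.append(dest)
    finally:
        # 3. Clear the snapshot even if the copy failed.
        run(["nodetool", "clearsnapshot", "-t", tag], check=True)
    return copied
```

On a node this might be called as `backup_node("/var/lib/cassandra/data", "/mnt/backup", "nightly")`, where both paths and the tag are placeholders. After restoring a node from such a copy, a repair would still need to be run manually, as noted in the thread.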

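Mike's point about S3 request charges can be made concrete with a rough count: every file uploaded needs at least one PUT request, so a node with many small SSTables pays more in requests than one with a few large ones. The helper names below are hypothetical, and the price is deliberately a parameter rather than a real figure; look up the current S3 pricing before relying on any number.

```python
from pathlib import Path


def count_upload_requests(snapshot_dir):
    """Count the files under snapshot_dir; each needs at least one PUT."""
    return sum(1 for p in Path(snapshot_dir).rglob("*") if p.is_file())


def estimate_put_cost(n_files, price_per_1000_puts):
    """Lower bound on request charges for uploading n_files objects.

    price_per_1000_puts is supplied by the caller; multipart uploads
    issue several requests per file, so the real charge can be higher.
    """
    return n_files / 1000 * price_per_1000_puts
```

Running `count_upload_requests` on a snapshot directory before each backup gives a quick sense of whether request charges, rather than storage, will dominate the bill.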