cassandra-user mailing list archives

From Rene Kochen <>
Subject Re: Backup solution
Date Mon, 18 Mar 2013 14:33:11 GMT
Hi Aaron,

Thank you for your answer!

My idea was to take the snapshots in the backup DC only. That way the backup
procedure will not affect the live DC. However, I'm afraid that a
point-in-time recovery via the snapshots in the second DC (first restore the
backup on the backup DC, then repair the live DC) will take too long. I expect
the data to grow significantly.

It makes more sense to use the second cluster as a hot standby (and make
snapshots on both clusters).
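
As a rough check on the restore-time concern, here is a back-of-the-envelope
sketch in Python. It only estimates the raw copy time over the dedicated
gigabit link for the figures quoted below (six nodes, 50 to 100 GBytes per
node). The function name and the 70% link efficiency are my own assumptions,
not measurements, and the time to load the sstables and run repair comes on
top of this.

```python
def transfer_hours(data_gbytes, link_gbit_per_s=1.0, efficiency=0.7):
    """Estimate the hours needed to copy data_gbytes over a network link.

    efficiency is an assumed fraction of the nominal link speed that is
    actually usable (hypothetical value, measure your own link).
    """
    if data_gbytes < 0:
        raise ValueError("data_gbytes must not be negative")
    if link_gbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be > 0 and efficiency in (0, 1]")
    gbits = data_gbytes * 8  # GBytes -> Gbit
    seconds = gbits / (link_gbit_per_s * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    nodes = 6
    for per_node_gbytes in (50, 100):
        total = nodes * per_node_gbytes
        print(f"{total} GBytes: about {transfer_hours(total):.1f} h copy time")
```

With today's data size the copy alone fits well inside the 4 hour window, but
the estimate scales linearly with the data, so significant growth eats into
that margin quickly.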


2013/3/16 Aaron Turner <>

> On Fri, Mar 15, 2013 at 10:35 AM, Rene Kochen
> <> wrote:
> > Hi Aaron,
> >
> > We have many deployments, but typically:
> >
> > - Live cluster of six nodes, replication factor = 3.
> > - A node processes more reads than writes (approximately 100 get_slices
> > per second, narrow rows).
> > - Data per node is about 50 to 100 GBytes.
> > - We should recover within 4 hours.
> >
> > The idea is to put the backup cluster close to the live cluster with a
> > gigabit connection only for Cassandra.
>
> 100 reads/sec/node doesn't sound like a lot to me... And 100G/node is
> far below the recommended limit.  Sounds to me like you've possibly
> over-spec'd your cluster (not a bad thing, just an observation).  Of
> course, if your data set is growing, then...
>
> That said, I wouldn't consider a single node in a 2nd DC receiving
> updates via Cassandra a "backup".  That's because a bug in Cassandra
> which corrupts your data, or a user accidentally doing the wrong thing
> (like issuing deletes they shouldn't), gets replicated to all your
> nodes, including the one in the other DC.
>
> A real backup would be to take snapshots on the nodes and then copy
> them off the cluster.
>
> I'd say replication is good if you want a hot-standby for a disaster
> recovery site so you can quickly recover from a hardware fault.
> Especially if you have a 4hr SLA, how are you going to get your
> primary DC back up after a fire, earthquake, etc. in 4 hours?  Heck, a
> switch failure might knock you out for 4 hours depending on how
> quickly you can swap another one in and how recent your config backups
> are.
>
> Better to have a DR site with a smaller set of nodes with the data
> ready to go.  Maybe they won't be as fast, but hopefully you can make
> sure the most important queries are handled.  But for that, I would
> probably go with something more than just a single node in the DR DC.
>
> One thing to remember is that compactions will limit the feasible
> single node size to something smaller than the disk space you can
> potentially allocate.  I.e., just because you can build a 4TB disk
> array doesn't mean you can have a single Cassandra node with 4TB of
> data.  Typically, people around here seem to recommend ~400GB, but
> that depends on hardware.
>
> Honestly, for the price of a single computer you could test this
> pretty easily.  That's what I'd do.
> --
> Aaron Turner
>         Twitter: @synfinatic
> - Pcap editing and replay tools for Unix &
> Windows
> Those who would give up essential Liberty, to purchase a little temporary
> Safety, deserve neither Liberty nor Safety.
>     -- Benjamin Franklin
> "carpe diem quam minimum credula postero"
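
Aaron's suggestion above (take snapshots on the nodes, then copy them off the
cluster) could be sketched per node roughly as follows. This is only a sketch
under assumptions: it relies on `nodetool snapshot -t <tag> <keyspace>` and on
snapshots landing in `<data_dir>/<keyspace>/<table>/snapshots/<tag>`, which
depends on the Cassandra version, so check the layout on your nodes. The
function names and all paths are hypothetical, and `dest_root` stands in for
whatever off-cluster storage is mounted on the node.

```python
import shutil
import subprocess
from pathlib import Path


def snapshot_command(keyspace, tag):
    """Build the nodetool call that snapshots one keyspace under a tag."""
    return ["nodetool", "snapshot", "-t", tag, keyspace]


def find_snapshot_dirs(data_dir, keyspace, tag):
    """Return the snapshot directories for tag, one per table.

    Assumes the layout <data_dir>/<keyspace>/<table>/snapshots/<tag>.
    """
    return sorted(Path(data_dir, keyspace).glob(f"*/snapshots/{tag}"))


def copy_snapshots(data_dir, keyspace, tag, dest_root):
    """Copy every snapshot directory for tag to dest_root, off the node."""
    copied = []
    for src in find_snapshot_dirs(data_dir, keyspace, tag):
        table = src.parent.parent.name  # <table>/snapshots/<tag>
        dest = Path(dest_root, keyspace, table, tag)
        shutil.copytree(src, dest)  # also creates missing parent directories
        copied.append(dest)
    return copied


def backup(data_dir, keyspace, tag, dest_root):
    """Take a snapshot on this node, then copy it to dest_root."""
    subprocess.run(snapshot_command(keyspace, tag), check=True)
    return copy_snapshots(data_dir, keyspace, tag, dest_root)
```

A call such as `backup("/var/lib/cassandra/data", "my_keyspace", "2013-03-18",
"/mnt/backup")` (all values hypothetical) would run on each node in turn.
Once the copy is verified, the local snapshot can be removed with
`nodetool clearsnapshot`.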
