incubator-cassandra-user mailing list archives

From Jeremy Dunck <>
Subject Re: Digg's data model
Date Sat, 20 Mar 2010 16:10:05 GMT
On Sat, Mar 20, 2010 at 10:40 AM, Chris Goffinet <> wrote:
>> 5. Backups: If there is a 4 or 5 TB Cassandra cluster, what backup scenarios would you recommend?
> For the worst case (total failure), we opted to do global snapshots every 24 hours. This
> creates hard links to SSTables on each node. We copy those SSTables to HDFS on a daily basis.
> We also wrote a patch so that every event going into the commit log is also written to Scribe,
> giving us a rolling commit log in HDFS. So in the event that the entire cluster corrupts,
> we can take the last 24-hour snapshot plus the commit log from right after that snapshot and get
> the cluster back into the last known good state.

Doesn't this leave you open to corruption you don't discover within 24 hours?
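As a rough illustration of the restore path Chris describes (start from the newest snapshot, then replay the logged writes that came after it), and of why snapshot retention matters for the question above, here is a minimal Python sketch. Everything in it is hypothetical: `BackupStore`, its methods, the integer hour timestamps, and the in-memory dicts standing in for hard-linked SSTable snapshots and the Scribe/HDFS commit log are assumptions made for illustration, not Digg's implementation or any Cassandra API.

```python
from dataclasses import dataclass, field


@dataclass
class BackupStore:
    """Hypothetical in-memory model of the backup area (not Digg's code, not a Cassandra API)."""
    # snapshot time (hours) -> frozen copy of the full key/value state at that time
    snapshots: dict[int, dict[str, str]] = field(default_factory=dict)
    # rolling commit log: (time, key, value) mutations, appended in time order
    log: list[tuple[int, str, str]] = field(default_factory=list)

    def take_snapshot(self, now: int, state: dict[str, str]) -> None:
        # Stand-in for a global snapshot (hard links to SSTables) copied off-cluster.
        self.snapshots[now] = dict(state)

    def append_log(self, now: int, key: str, value: str) -> None:
        # Stand-in for each commit-log event also being shipped to the rolling log.
        self.log.append((now, key, value))

    def prune(self, keep: int) -> None:
        # Fixed retention policy: keep only the `keep` newest snapshots.
        if keep < 1:
            raise ValueError("keep must be at least 1")
        for t in sorted(self.snapshots)[:-keep]:
            del self.snapshots[t]

    def restore(self, target: int) -> dict[str, str]:
        """Rebuild state as of `target`: newest snapshot at or before it, then log replay."""
        candidates = [t for t in self.snapshots if t <= target]
        if not candidates:
            raise LookupError(f"no retained snapshot at or before t={target}")
        base = max(candidates)
        state = dict(self.snapshots[base])
        # Replay only the writes that happened after the snapshot and up to the target.
        for t, key, value in self.log:
            if base < t <= target:
                state[key] = value
        return state


if __name__ == "__main__":
    store = BackupStore()
    store.append_log(1, "user:1", "alice")
    store.take_snapshot(24, {"user:1": "alice"})
    store.append_log(30, "user:1", "alice2")
    store.append_log(40, "user:1", "CORRUPT")       # bad write lands at hour 40
    print(store.restore(39))                         # {'user:1': 'alice2'}
    store.take_snapshot(48, {"user:1": "CORRUPT"})  # corruption not yet noticed
    store.prune(keep=1)                              # only the newest snapshot survives
    try:
        store.restore(39)
    except LookupError as err:
        print(err)                                   # no retained snapshot at or before t=39
```

In this model, once the only retained snapshot already contains the corrupted value, there is no clean base left to replay from; how far back you can recover is bounded by the oldest snapshot you keep (together with the log written after it).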
