cassandra-user mailing list archives

From Chris Goffinet <>
Subject Re: Digg's data model
Date Sat, 20 Mar 2010 16:18:20 GMT
On Mar 20, 2010, at 9:10 AM, Jeremy Dunck wrote:

> On Sat, Mar 20, 2010 at 10:40 AM, Chris Goffinet <> wrote:
>>> 5. Backups: If there is a 4 or 5 TB Cassandra cluster, what would you recommend
>>> as the backup scenario?
>> For the worst-case scenario (total failure), we opted to do global snapshots every
>> 24 hours. This creates hard links to SSTables on each node. We copy those SSTables
>> to HDFS on a daily basis. We also wrote a patch to log all events going into the
>> commit log to Scribe, so we can have a rolling commit log in HDFS. So in the event
>> that the entire cluster is corrupted, we can take the last 24-hour snapshot plus the
>> commit log from right after that snapshot and get the cluster into the last known
>> good state.
> Doesn't this leave you open to corruption you don't discover within 24 hours?

No. We aren't storing the actual commit log structure; we have our own.
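
To make the snapshot-plus-log recovery idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not Cassandra's snapshot code or Digg's tooling: the function names, the flat directory layout, and the (timestamp, payload) event shape are all hypothetical. It shows why a hard-link snapshot is cheap (no data is copied) and how one might pick the logged events to replay on top of the last snapshot.

```python
import os
from pathlib import Path


def snapshot_by_hard_links(data_dir, snapshot_dir):
    """Hard-link every regular file in data_dir into a new snapshot_dir.

    Hypothetical helper: the snapshot shares disk blocks with the originals,
    so it is fast and takes almost no extra space. Both directories must be
    on the same filesystem for hard links to work.
    """
    src = Path(data_dir)
    dst = Path(snapshot_dir)
    dst.mkdir(parents=True, exist_ok=False)  # refuse to reuse an old snapshot
    linked = []
    for entry in sorted(src.iterdir()):
        if entry.is_file():
            target = dst / entry.name
            os.link(entry, target)  # new name, same underlying file
            linked.append(target)
    return linked


def events_to_replay(logged_events, snapshot_time):
    """Return (timestamp, payload) events logged strictly after snapshot_time,
    oldest first. The event shape is an assumption made for this sketch."""
    newer = (event for event in logged_events if event[0] > snapshot_time)
    return sorted(newer, key=lambda event: event[0])
```

Restoring would then mean loading the files from the most recent snapshot and applying the events returned by events_to_replay in order.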

