incubator-cassandra-user mailing list archives

From Chris Goffinet <goffi...@digg.com>
Subject Re: Digg's data model
Date Sat, 20 Mar 2010 15:40:40 GMT
> 5. Backups: If there is a 4 or 5 TB Cassandra cluster, what backup
> scenarios would you recommend?

For the worst-case scenario (total failure), we opted to take global snapshots every 24 hours.
A snapshot creates hard links to the SSTables on each node, and we copy those SSTables to HDFS
on a daily basis. We also wrote a patch that sends every event going into the commit log to
Scribe, so we have a rolling commit log in HDFS. If the entire cluster gets corrupted, we can
take the last 24-hour snapshot plus the commit log entries written after that snapshot and
bring the cluster back to the last known good state.
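
A minimal sketch of the two ideas above, for illustration only: a snapshot built from hard links, and selecting the commit log entries to replay after a snapshot. This is not Cassandra's snapshot code or the patch described here. The function names, the directory layout, and the (timestamp, payload) entry format are all hypothetical assumptions.

```python
from __future__ import annotations

import os
from pathlib import Path


def snapshot_sstables(data_dir: Path, snapshot_dir: Path) -> list[Path]:
    """Hard-link every regular file in data_dir into a new snapshot_dir.

    Hard links take no extra data space and keep pointing at the same
    on-disk contents even if the original name is later deleted.
    Both directories must be on the same filesystem.
    """
    snapshot_dir.mkdir(parents=True, exist_ok=False)
    linked = []
    for src in sorted(data_dir.iterdir()):
        if src.is_file():
            dst = snapshot_dir / src.name
            os.link(src, dst)  # create a hard link, not a copy
            linked.append(dst)
    return linked


def entries_after_snapshot(entries, snapshot_time):
    """Return the logged entries written strictly after the snapshot.

    entries is an iterable of (timestamp, payload) tuples (hypothetical
    format); the result is ordered by timestamp, ready for replay.
    """
    newer = [e for e in entries if e[0] > snapshot_time]
    return sorted(newer, key=lambda e: e[0])
```

Restoring would then mean loading the files from the most recent snapshot and replaying only what entries_after_snapshot returns for that snapshot's timestamp.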

-Chris