incubator-cassandra-user mailing list archives

From Ned Wolpert <ned.wolp...@imemories.com>
Subject Re: Digg's data model
Date Tue, 23 Mar 2010 19:14:53 GMT
I'm curious why you are storing the backups (SSTables and commit logs) in
HDFS instead of something like Lustre. Do your backups use Hadoop's
map/reduce somehow? Or is it for convenience?

On Sat, Mar 20, 2010 at 8:40 AM, Chris Goffinet <goffinet@digg.com> wrote:

> > 5. Backups: For a 4 or 5 TB Cassandra cluster, what backup scenarios would
> you recommend?
>
> For the worst-case scenario (total failure) we opted to do global snapshots
> every 24 hours. This creates hard links to the SSTables on each node. We copy
> those SSTables to HDFS on a daily basis. We also wrote a patch that logs all
> events going into the commit log to Scribe, giving us a rolling commit log in
> HDFS. So in the event that the entire cluster corrupts, we can take the last
> 24-hour snapshot plus the commit log written since that snapshot, and restore
> the cluster to the last known good state.
>
> -Chris
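To make the snapshot step above concrete: `nodetool snapshot` works by hard-linking the immutable SSTable files, so no data is copied and the snapshot survives even if compaction later deletes the originals. Below is a minimal sketch of that mechanism using plain `ln`; the file names, directory layout, and the commented `hadoop fs -put` shipping step are illustrative assumptions, not Digg's actual setup.

```shell
#!/bin/sh
# Demonstrates the hard-link snapshot trick behind `nodetool snapshot`.
# All paths and names below are hypothetical.
set -e
DATA=$(mktemp -d)                       # stands in for a Cassandra data dir
SNAP="$DATA/snapshots/$(date +%Y%m%d)"

echo "row data" > "$DATA/Users-1-Data.db"   # a fake SSTable

mkdir -p "$SNAP"
ln "$DATA/Users-1-Data.db" "$SNAP/Users-1-Data.db"  # hard link: no bytes copied

# Even after compaction removes the original file, the snapshot still
# holds the data, because both names pointed at the same inode:
rm "$DATA/Users-1-Data.db"
RESULT=$(cat "$SNAP/Users-1-Data.db")
echo "$RESULT"

# In the scheme described above, the snapshot directory would then be
# shipped off-node, e.g. (assumed command, requires a Hadoop client):
#   hadoop fs -put "$SNAP" /backups/cassandra/$(hostname)/$(date +%Y%m%d)
```

Because hard links are per-filesystem and near-free to create, the 24-hour snapshot itself costs almost nothing; the real I/O happens only when the linked SSTables are copied to HDFS.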




-- 
Virtually, Ned Wolpert

"Settle thy studies, Faustus, and begin..."   --Marlowe
