Date: Tue, 9 Nov 2010 12:04:22 -0500
Subject: Re: Backup Strategy
From: Edward Capriolo <edlinuxguru@gmail.com>
To: user@cassandra.apache.org

On Tue, Nov 9, 2010 at 8:15 AM, Wayne wrote:
> I got some very good advice on manual compaction, so I thought I would
> throw out another question on RAID/backup strategies for production
> clusters.
>
> We are debating going with RAID 0 vs. RAID 10 on our nodes for data
> storage. Currently all the storage we use is RAID 10, as drives always
> fail and RAID 10 basically makes a drive failure a non-event. With
> Cassandra and a replication factor of 3, we are starting to think that
> RAID 0 may be good enough. Also, since we are buying a lot more
> inexpensive servers, RAID 0 hits that price point much better.
>
> The problem then becomes: how do we deal with the drives that WILL fail
> in a RAID 0 node? We are trying to use snapshots etc. to back up the
> data, but it is slow (hours) and slows down the entire node. We assume
> this will work if we back up at least every 2 days, in that hinted
> handoff and reads could help bring the data back into sync. If we
> cannot back up every 1-2 days, then we are stuck with nodetool repair,
> decommission, etc., relying on some of Cassandra's built-in
> capabilities, but there things become more out of our control and we
> are "afraid" to trust it. Like many in recent posts, we have been less
> than successful in testing this out on the 0.6.x branch.
>
> Can anyone share their decisions on the same and how they managed to
> deal with these issues? Coming from the relational world, RAID 10 has
> been an "assumption" for years, and we are not sure whether this
> assumption should be dropped or held on to. Our nodes in dev are
> currently around 500 GB, so for us the question is how we can restore a
> node with this amount of data, and how long it will take. Drives can
> and will fail; how can we make recovery a non-event? What is our total
> recovery time window? We want it to be in hours after drive replacement
> (which will be done in minutes).
>
> Thanks.
>
> Wayne

Wayne,

We were more worried about a DR scenario. Since SSTables are write-once,
they make good candidates for incremental and/or differential backups.
One option is to run Cassandra snapshots and do incremental backups on
the snapshot directory.

We are doing something somewhat cool that I wanted to share. I hacked
together an application that is something like cassandra/hadoop/rsync.
Essentially, we take the SSTables from each node that are not already in
Hadoop and copy them there, then write an index file of which SSTables
lived on that node at the time of the snapshot. This gives us a couple
of days of retention as well: snapshots X times daily, and off-cluster
once a day. It makes me feel safer about our RAID 0.

I have seen you mention in two threads that you are looking to do
500 GB/node. You have brought up the point yourself: "How long will it
take to recover a 500 GB node?" Good question. Neighbour nodes need to
anti-compact and stream data to the new node. (This is being optimized
in 0.7, but it still involves some heavy lifting.) You may want to look
at more nodes with less storage per node if you are worried about how
long recovering a RAID 0 node will take. These operations can take time
(depending on hardware and load) and pretty much need to restart from
zero if they do not complete.