Date: Tue, 9 Nov 2010 12:04:22 -0500
Subject: Re: Backup Strategy
From: Edward Capriolo <edlinuxguru@gmail.com>
To: user@cassandra.apache.org

On Tue, Nov 9, 2010 at 8:15 AM, Wayne wrote:
> I got some very good advice on manual compaction, so I thought I would
> throw out another question on RAID/backup strategies for production
> clusters.
>
> We are debating going with RAID 0 vs. RAID 10 on our nodes for data
> storage. Currently all the storage we use is RAID 10, as drives always
> fail and RAID 10 basically makes a drive failure a non-event. With
> Cassandra and a replication factor of 3, we are starting to think that
> RAID 0 may be good enough. Also, since we are buying a lot more
> inexpensive servers, RAID 0 hits that price point much better.
>
> The problem then becomes: how do we deal with the drives that WILL fail
> in a RAID 0 node? We are trying to use snapshots etc. to back up the
> data, but it is slow (hours) and slows down the entire node. We assume
> this will work if we back up at least every 2 days, in that hinted
> handoff and reads could help bring the data back into sync. If we
> cannot back up every 1-2 days, then we are stuck with nodetool repair,
> decommission, etc., relying on some of Cassandra's built-in
> capabilities, but there things become more out of our control and we
> are "afraid" to trust it. Like many in recent posts, we have been less
> than successful in testing this out on the 0.6.x branch.
>
> Can anyone share their decisions on the same and how they managed to
> deal with these issues? Coming from the relational world, RAID 10 has
> been an "assumption" for years, and we are not sure whether this
> assumption should be dropped or held on to. Our nodes in dev are
> currently around 500 GB, so for us the question is how we can restore a
> node with this amount of data, and how long it will take. Drives can
> and will fail; how can we make recovery a non-event? What is our total
> recovery time window? We want it to be in hours after drive replacement
> (which will be done in minutes).
>
> Thanks.
>
> Wayne

Wayne,

We were more worried about a DR scenario. Since SSTables are write-once,
they make good candidates for incremental and/or differential backups.
One option is to run Cassandra snapshots and do incremental backups on
the snapshot directory.

We are doing something somewhat cool that I wanted to share. I hacked
together an application that is something like cassandra/hadoop/rsync.
Essentially, we take the SSTables from each node that are not already in
Hadoop and copy them there, then write an index file of which SSTables
lived on that node at the time of the snapshot. This gives us a couple
of days of retention as well: snapshots X times daily, and off-cluster
once a day. It makes me feel safer about our RAID 0.

I have seen you mention in two threads that you are looking to do
500 GB/node. You have brought up the point yourself: "How long will it
take to recover a 500 GB node?" Good question. Neighbour nodes need to
anti-compact and stream data to the new node. (This is being optimized
in 0.7, but it still involves some heavy lifting.) You may want to look
at more nodes with less storage per node if you are worried about how
long recovering a RAID 0 node will take. These operations can take time
(depending on hardware and load) and pretty much need to restart from
zero if they do not complete.