From: Sylvain Lebresne <sylvain@datastax.com>
To: user@cassandra.apache.org
Date: Mon, 23 May 2011 19:48:26 +0200
Subject: Re: repair question

On Mon, May 23, 2011 at 7:17 PM, Daniel
Doubleday wrote:
> Hi all
>
> I'm a bit lost: I tried a repair yesterday with only one CF and that didn't really work the way I expected, but I thought that was a bug which only affects that special case.
>
> So I tried again for all CFs.
>
> I started with a nicely compacted machine with around 320GB of load. Total disc space on this node was 1.1TB.
>
> After it ran out of disc space (meaning I received around 700GB of data) I had a very brief look at the repair code again, and it seems to me that the repairing node will get all data for its range from all its neighbors.

The repaired node is only supposed to get data from its neighbors for rows it is not in sync with. How much is transferred depends on how far the node is out of sync compared to the other nodes. Now, there are a number of things that can make repair transfer more than what you would hope. For instance:

1) Even if only one column differs for a row, the full row is repaired. If you have a small number of huge rows, that can amount to quite some data uselessly transferred.

2) The merkle tree (which allows us to say whether 2 rows are in sync) doesn't necessarily have one hash per row, so in theory one out-of-sync column may imply the repair of more than one row.

3) https://issues.apache.org/jira/browse/CASSANDRA-2324 (which is fixed in 0.8)

Fortunately, the chance of getting hit by 1) is inversely proportional to the chance of getting hit by 2), and vice versa.

Anyway, the kind of excess data you're seeing is not something I would expect unless the node is really completely out of sync with all the other nodes. So in light of this, do you have more info on your own case? (Do you have lots of small rows, or a few large ones? Did you expect the node to be widely out of sync with the other nodes? Etc.)

--
Sylvain

> Is that true, and if so, is it the intended behavior?
> If so, one would rather need 5-6 times the disc space, given that the compactions that need to run after the sstable rebuild also need temp disc space.
>
> Cheers,
> Daniel
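[The granularity effect in point 2) above can be illustrated with a toy sketch. This is hypothetical illustration code, not Cassandra's actual Merkle tree implementation: when one leaf of the tree covers several rows, a single out-of-sync row forces every row under that leaf to be streamed, even the ones that already match.]

```python
import hashlib

def leaf_hash(rows):
    """Hash all rows covered by one Merkle leaf together."""
    h = hashlib.sha256()
    for key, value in sorted(rows.items()):
        h.update(key.encode())
        h.update(value.encode())
    return h.hexdigest()

def rows_to_repair(local, remote, leaf_size):
    """Compare leaf hashes; a mismatched leaf repairs ALL rows under it."""
    keys = sorted(set(local) | set(remote))
    to_repair = []
    for i in range(0, len(keys), leaf_size):
        chunk = keys[i:i + leaf_size]
        lh = leaf_hash({k: local.get(k, "") for k in chunk})
        rh = leaf_hash({k: remote.get(k, "") for k in chunk})
        if lh != rh:
            to_repair.extend(chunk)  # the whole leaf range gets streamed
    return to_repair

local = {f"row{i:02d}": "v" for i in range(8)}
remote = dict(local)
remote["row03"] = "different"  # only one row is actually out of sync

# With 4 rows per leaf, the single differing row drags 3 in-sync
# rows along with it; with 1 row per leaf, only row03 is repaired.
print(rows_to_repair(local, remote, leaf_size=4))
print(rows_to_repair(local, remote, leaf_size=1))
```

[This is also why 1) and 2) trade off against each other: lots of small rows means many rows share a leaf (granularity loss), while a few huge rows get fine-grained hashes but repair a lot of data per mismatched row.]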