Date: Wed, 14 Sep 2011 00:57:16 +0200
Subject: Re: what's the difference between repair CF separately and repair the entire node?
From: Peter Schuller
To: user@cassandra.apache.org
Cc: cassandra-user@incubator.apache.org

> I think it is a serious problem since I can not "repair".....  I am
> using cassandra on production servers. is there some way to fix it
> without upgrade?  I heard of that 0.8.x is still not quite ready in
> production environment.

It is a serious issue if you really need to repair one CF at a time.
However, looking at your original post, it seems this is not
necessarily your issue. Do you need to, or was your concern rather the
overall time repair took?

There are other issues affecting 0.7 that are improved in 0.8. In
particular, (1) in 0.7 compaction, including the validating compactions
that are part of repair, is non-concurrent, so if your repair starts
while a long-running compaction is in progress it will have to wait,
and (2) semi-related, the merkle tree calculation that is part of
repair/anti-entropy may happen "out of synch" if one of the
participating nodes happens to be busy with compaction. This in turn
causes additional data to be sent as part of repair.

That might be why your immediately following repair took a long time,
but it's difficult to tell.

If you're having issues with repair and large data sets, I would
generally say that upgrading to 0.8 is recommended. However, if you're
on 0.7.4, beware of
https://issues.apache.org/jira/browse/CASSANDRA-3166

-- 
/ Peter Schuller (@scode on twitter)