Sender: scode@scode.org
Date: Wed, 14 Sep 2011 00:57:16 +0200
Subject: Re: what's the difference between repair CF separately and repair the
 entire node?
From: Peter Schuller <peter.schuller@infidyne.com>
To: user@cassandra.apache.org
Cc: cassandra-user@incubator.apache.org

> I think it is a serious problem since I can not "repair"..... I am
> using cassandra on production servers. is there some way to fix it
> without upgrade? I heard of that 0.8.x is still not quite ready in
> production environment.

It is a serious issue if you really need to repair one CF at a time.
However, looking at your original post it seems this is not
necessarily your issue. Do you need to, or was your concern rather the
overall time repair took?

There are other things that are improved in 0.8 that affect 0.7. In
particular, (1) in 0.7 compaction, including validating compactions
that are part of repair, is non-concurrent so if your repair starts
while there is a long-running compaction going it will have to wait,
and (2) semi-related is that the merkle tree calculation that is part
of repair/anti-entropy may happen "out of sync" if one of the nodes
participating happens to be busy with compaction. This in turn causes
additional data to be sent as part of repair.
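
To make point (2) concrete, here is a toy sketch in Python. It is not
Cassandra's actual merkle tree code; the token space, the range count,
the row values and all function names are made up for illustration. It
only shows why trees built from snapshots taken at different times flag
more ranges as differing than trees built at the same moment.

```python
import hashlib

def leaf_hashes(rows, num_ranges=8, token_space=1024):
    """Hash the rows that fall into each of num_ranges equal token ranges."""
    width = token_space // num_ranges
    buckets = [hashlib.sha256() for _ in range(num_ranges)]
    for token in sorted(rows):
        buckets[token // width].update(f"{token}={rows[token]};".encode())
    return [b.hexdigest() for b in buckets]

def build_tree(leaves):
    """Return the tree as a list of levels: leaves first, root last.
    Assumes the number of leaves is a power of two."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([
            hashlib.sha256((prev[i] + prev[i + 1]).encode()).hexdigest()
            for i in range(0, len(prev), 2)
        ])
    return levels

def differing_ranges(tree_a, tree_b):
    """Walk down from the root, descending only into mismatching subtrees."""
    stack = [(len(tree_a) - 1, 0)]
    out = []
    while stack:
        level, idx = stack.pop()
        if tree_a[level][idx] == tree_b[level][idx]:
            continue  # identical subtree, nothing to stream
        if level == 0:
            out.append(idx)
        else:
            stack.append((level - 1, 2 * idx))
            stack.append((level - 1, 2 * idx + 1))
    return sorted(out)

# Hypothetical data: two replicas that really disagree on one row only.
replica_a = {t: "v0" for t in range(0, 1024, 16)}
replica_b = dict(replica_a)
replica_b[48] = "stale"  # the one genuine inconsistency (range 0)

# Both trees built at the same moment: only range 0 differs.
in_sync = differing_ranges(build_tree(leaf_hashes(replica_a)),
                           build_tree(leaf_hashes(replica_b)))

# Replica B builds its tree later, after new writes have arrived.
# Replica A will have these writes too, but its tree was already built.
replica_b_later = dict(replica_b)
replica_b_later.update({300: "new", 700: "new"})
skewed = differing_ranges(build_tree(leaf_hashes(replica_a)),
                          build_tree(leaf_hashes(replica_b_later)))

print("ranges to stream, trees in sync:    ", in_sync)
print("ranges to stream, trees out of sync:", skewed)
```

In this toy model the skewed comparison flags the ranges holding the
new writes in addition to the one that is genuinely inconsistent, so
more data gets streamed than is actually needed.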

That might be why your immediately following repair took a long time,
but it's difficult to tell.
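
As a rough illustration of point (1), the waiting can be modelled in
Python as a single FIFO worker. The task names and durations below are
invented, and the "concurrent" case is simply a contrast in which tasks
never wait for each other; it is not a description of any particular
Cassandra executor.

```python
def start_times(tasks, concurrent):
    """tasks: list of (name, submit_time, duration) in submission order.
    Returns {name: start_time}, with all times in seconds."""
    starts = {}
    free_at = 0  # when the single worker becomes free (serial case only)
    for name, submitted, duration in tasks:
        start = submitted if concurrent else max(submitted, free_at)
        starts[name] = start
        free_at = start + duration
    return starts

# Hypothetical workload: a one hour compaction is already running when
# repair asks for a ten minute validating compaction.
tasks = [("long compaction", 0, 3600), ("repair validation", 60, 600)]

serial = start_times(tasks, concurrent=False)
parallel = start_times(tasks, concurrent=True)

print("validation starts (non-concurrent):", serial["repair validation"])
print("validation starts (concurrent):    ", parallel["repair validation"])
```

With these made-up numbers the validation has to wait until the long
compaction finishes in the non-concurrent case, while it starts as soon
as it is submitted otherwise.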

If you're having issues with repair and large data sets, I would
generally say that upgrading to 0.8 is recommended. However, if you're
on 0.7.4, beware of
https://issues.apache.org/jira/browse/CASSANDRA-3166

-- 
/ Peter Schuller (@scode on twitter)