Subject: Re: Repair failure under 0.8.6
From: Edward Capriolo <edlinuxguru@gmail.com>
To: user@cassandra.apache.org
Date: Sun, 4 Dec 2011 19:52:03 -0500

You can set the min compaction threshold to 2 and the max compaction
threshold to 3. If you have enough disk space for a few minor
compactions, this should free up some disk space.
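For concreteness, a minimal sketch of the nodetool invocation (host,
keyspace, and column family names are placeholders; exact arguments
can vary between versions, so check nodetool help):

    # lower the minor compaction thresholds so a compaction can
    # trigger with as few as 2 SSTables
    nodetool -h some-node setcompactionthreshold MyKeyspace MyCF 2 3

    # watch the resulting compactions
    nodetool -h some-node compactionstats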

On Sun, Dec 4, 2011 at 7:17 PM, Peter Schuller <peter.schuller@infidyne.com> wrote:
> As a side ef= fect of the failed repair (so it seems) the disk usage on the
> affected node prevents compaction from working. It still works on
> the remaining nodes (we have 3 total).
> Is there a way to scrub the extraneous data?

This is one of the reasons why killing an in-process repair is a bad thing :(

If you do not have enough disk space for any kind of compaction to
work, then no, unfortunately there is no easy way to get rid of the
data.

You can go to extra trouble such as moving the entire node to some
other machine (e.g. firewalled from the cluster) with more disk and
run compaction there and then "move it back", but that is kind of=
painful to do. Another option is to decommission the node and replace
it. However, be aware that (1) that leaves the ring with less capacity
for a while, and (2) when you decommission, the data you stream from
that node to others would be artificially inflated due to the repair
so there is some risk of "infecting" the other nodes with a large= data
set.
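For what it's worth, a rough sketch of those two routes (host names
are placeholders):

    # on the firewalled copy that has more disk: force a major
    # compaction, then move the compacted data back
    nodetool -h moved-copy compact

    # or, on the node being replaced: stream its ranges to the
    # remaining replicas and leave the ring
    nodetool -h overfull-node decommission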

I should mention that if you have no traffic running against the
cluster, one way is to just remove all the data and then run repair
afterwards. But that implies that you're trusting that (1) no reads
are going to the cluster (else you might serve reads based on missing
data) and (2) that you are comfortable with loss of the data on the
node. (2) might be okay if you're e.g. writing at QUORUM at all times
and have RF >= 3 (basically, this is as if the node would have been
lost due to hardware breakage).
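A rough sketch of that sequence (the data directory path and names
are placeholders; the actual location comes from your cassandra.yaml):

    # with the node stopped: remove application keyspace data,
    # leaving the system keyspace in place
    rm -rf /var/lib/cassandra/data/MyKeyspace

    # start the node back up, then re-replicate from the other nodes
    nodetool -h affected-node repair MyKeyspace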

A faster way to reconstruct the node would be to delete the data from
your keyspaces (except the system keyspace), start the node (now
missing data), and run 'nodetool rebuild' after
https://issues.apache.org/jira/browse/CASSANDRA-3483 is done. The
patch attached to that ticket should work for 0.8.6 I suspect (but no
guarantees). This also assumes you have no reads running against the
cluster.
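Assuming that ticket lands as described, the steps would look roughly
like this (again placeholder paths and host names; stock 0.8.6 does
not ship nodetool rebuild, so treat this purely as a sketch):

    # with the node stopped: clear application keyspaces, keep system
    rm -rf /var/lib/cassandra/data/MyKeyspace

    # start the node, then stream its ranges back from the replicas
    nodetool -h affected-node rebuild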

--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
