Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: domain of raj.cassandra@gmail.com
 designates 209.85.220.172 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <A12E2B34-2E15-43A5-9509-CCF5AF48E839@gmail.com>
References: 
 <CAF+j8akXYAvYcUMLbPTxmXOim9L-tsJne6YbZhcv6_w3wVZE0Q@mail.gmail.com>
	<A12E2B34-2E15-43A5-9509-CCF5AF48E839@gmail.com>
Date: Sun, 29 Apr 2012 06:45:55 -0400
Message-ID: 
 <CAF+j8anpnQQ5PDVUQWqKHYzzrqjCOTSqvMvrS-FLqmFVN2u+fw@mail.gmail.com>
Subject: Re: nodetool repair cassandra 0.8.4 HELP!!!
From: Raj N <raj.cassandra@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=0023544477cdcfb67f04becf0a25

--0023544477cdcfb67f04becf0a25
Content-Type: text/plain; charset=ISO-8859-1

I tried it on 1 column family. I believe there is a bug in 0.8* where
repair ignores the column family argument. I tried this multiple times on
different nodes. Every time, disk utilization went up to 80% on a 500 GB
disk, and I would eventually kill the repair. I only have 60 GB worth of
data. I found this JIRA:

https://issues.apache.org/jira/browse/CASSANDRA-2324

But that says it was fixed in 0.8 beta. Is this still broken in 0.8.4?

I also don't understand why the data was inconsistent in the first place. I
read and write at LOCAL_QUORUM.

Thanks
-Raj

On Sun, Apr 29, 2012 at 2:06 AM, Watanabe Maki <watanabe.maki@gmail.com> wrote:

> You should run repair. If disk space is the problem, try running cleanup
> and a major compaction before repair.
> You can limit the amount of streamed data by running repair for each
> column family separately.
>
> maki
>
> On 2012/04/28, at 23:47, Raj N <raj.cassandra@gmail.com> wrote:
>
> > I have a 6-node Cassandra cluster (DC1=3, DC2=3) with 60 GB of data on
> > each node. I was bulk loading data over the weekend, but we forgot to
> > turn off the weekly nodetool repair job. As a result, repair was
> > interfering while we were bulk loading data. I canceled repair by
> > restarting the nodes. Unfortunately, after the restart it looks like I
> > don't have any data on those nodes when I use list in cassandra-cli. I
> > ran repair on one of the affected nodes, but repair seems to be taking
> > forever. Disk usage has almost tripled. I stopped the repair again for
> > fear of running out of disk space. After the restart, disk usage is at
> > 50%, whereas the good nodes are at 25%. How should I proceed from here?
> > When I run list in cassandra-cli I do see data on the affected node, but
> > how can I be sure I have all the data? Should I run repair again? Should
> > I clean up the disk by clearing snapshots? Or should I just drop the
> > column families and bulk load the data again?
> >
> > Thanks
> > -Raj
>
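
For reference, the sequence Maki suggests above (cleanup, then a major
compaction, then repair, one column family at a time) could be scripted
roughly as follows. This is only a sketch under stated assumptions: the
host, keyspace, and column family names are hypothetical placeholders, and
the argument order (nodetool -h <host> <command> <keyspace> <cf>) should be
checked against the nodetool help output of the installed version. It only
builds and prints the command lines; it does not run them, and it does not
settle whether repair on 0.8.4 actually honors the column family argument.

```python
from typing import List


def build_maintenance_commands(
    host: str, keyspace: str, column_families: List[str]
) -> List[List[str]]:
    """Build nodetool command lines, one column family at a time.

    For each column family the order is: cleanup, compact (major
    compaction), then repair, so that disk space is reclaimed before
    repair starts streaming data.
    """
    commands = []
    for cf in column_families:
        for action in ("cleanup", "compact", "repair"):
            # Assumed layout: nodetool -h <host> <command> <keyspace> <cf>
            commands.append(["nodetool", "-h", host, action, keyspace, cf])
    return commands


if __name__ == "__main__":
    # Hypothetical host, keyspace, and column family names.
    for argv in build_maintenance_commands(
        "localhost", "MyKeyspace", ["Users", "Events"]
    ):
        print(" ".join(argv))
```

Printing the commands first makes it possible to review them, and to watch
disk usage between steps, before running anything against a live node.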

--0023544477cdcfb67f04becf0a25--