Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of springrider@gmail.com
 designates 209.85.212.44 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <E36CB0F4-39F1-447E-8874-AFFAA1CFAB24@thelastpickle.com>
References: 
 <CAOA66tFRfeGZp7gOipgeaN3RNCuEt-MJVO9gXQtjEhNqYmdk_g@mail.gmail.com>
 <CAOA66tEMbNNX3O8YgvgyPDbJoneh=VfoWsKSGHnr19b9prDVHw@mail.gmail.com>
 <E36CB0F4-39F1-447E-8874-AFFAA1CFAB24@thelastpickle.com>
From: Yan Chunlu <springrider@gmail.com>
Date: Thu, 21 Jul 2011 10:11:51 +0800
Message-ID: 
 <CAOA66tHw2Jnt5v0a+OqjBnGFt=e7LLmMchqNh3+gtMc4FZErJw@mail.gmail.com>
Subject: Re: node repair eat up all disk io and slow down entire cluster(3
 nodes)
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=bcaec5014dd780b02d04a88ae094

--bcaec5014dd780b02d04a88ae094
Content-Type: text/plain; charset=GB2312
Content-Transfer-Encoding: quoted-printable

Thank you very much for the help. I will try adjusting minor compaction and
also repairing a single CF at a time.
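
Roughly, that plan could be sketched as the dry-run script below. It only
prints the nodetool commands, it does not run them. The keyspace, column
family names and sizes are made up. The argument order
(setcompactionthreshold <keyspace> <cf> <min> <max>, repair <keyspace> <cf>)
is my assumption for 0.7 and should be checked against nodetool's own help
output, and 4/32 are assumed to be the default thresholds to restore
afterwards.

```python
from typing import Dict, List

# Assumed default thresholds to restore after the repair; check your schema.
DEFAULT_MIN_THRESHOLD = 4
DEFAULT_MAX_THRESHOLD = 32


def plan_repair(host: str, keyspace: str, cf_sizes: Dict[str, int]) -> List[List[str]]:
    """Build nodetool command lines: one CF at a time, smallest first."""
    commands: List[List[str]] = []
    base = ["nodetool", "-h", host]
    # Smallest column families first, since they should stream the least data.
    for cf in sorted(cf_sizes, key=cf_sizes.get):
        # Disable minor compaction for this CF (min and max thresholds set to 0).
        commands.append(base + ["setcompactionthreshold", keyspace, cf, "0", "0"])
        # Repair only this column family.
        commands.append(base + ["repair", keyspace, cf])
        # Put the (assumed default) thresholds back afterwards.
        commands.append(base + ["setcompactionthreshold", keyspace, cf,
                                str(DEFAULT_MIN_THRESHOLD), str(DEFAULT_MAX_THRESHOLD)])
    return commands


if __name__ == "__main__":
    # Hypothetical keyspace, CF names and on-disk sizes in MB.
    sizes = {"Comments": 12000, "Users": 300, "Votes": 4500}
    for cmd in plan_repair("localhost", "MyKeyspace", sizes):
        print(" ".join(cmd))
```

Running the printed commands by hand, one CF at a time, leaves room to stop
as soon as the load on the node gets too high.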

On Thu, Jul 21, 2011 at 7:56 AM, Aaron Morton <aaron@thelastpickle.com> wrote:

> If you have never run repair, also check the section on repair on this
> page, http://wiki.apache.org/cassandra/Operations, about how frequently it
> should be run.
>
> There is an issue where repair can stream too much data, and this can
> lead to excessive disk use.
>
> My non-scientific approach to the never-run-repair-before problem is to
> repair a single CF at a time, starting with the small ones that are less
> likely to have differences, as they will stream the smallest amount of
> data.
>
> If you really want to conserve disk IO during the repair, consider
> disabling minor compaction by setting the min and max thresholds to 0 via
> nodetool.
>
> hope that helps.
>
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 20/07/2011, at 11:46 PM, Yan Chunlu <springrider@gmail.com> wrote:
>
> just found this:
> https://issues.apache.org/jira/browse/CASSANDRA-2156
>
> but it seems to be available only in 0.8, and someone submitted a patch
> for 0.6. I am using 0.7.4, so do I need to dig into the code and make my
> own patch?
>
> Would adding compaction throttling solve the IO problem? Thanks!
>
> On Wed, Jul 20, 2011 at 4:44 PM, Yan Chunlu <springrider@gmail.com> wrote:
>
>> When I started using Cassandra, I had no idea that I should run
>> "node repair" regularly. So I have 3 nodes with RF=3 and have not run
>> node repair for months. The data size is 20G.
>>
>> The problem is that when I start running node repair now, it eats up all
>> the disk IO, and the server load climbs to 20+ and keeps increasing.
>> Worst of all, the entire cluster slows down and cannot handle requests,
>> so I have to stop it immediately because it makes my web service
>> unavailable.
>>
>> The server has an Intel Xeon-Lynnfield 3470-Quadcore [2.93GHz] and 8G of
>> memory, with a Western Digital WD RE3 WD1002FBYS SATA disk.
>>
>> I really have no idea what to do now, as I have already found some data
>> loss. Any suggestions would be appreciated.
>>
>
>
>
> --
> 闫春路
>
>


-- 
闫春路

--bcaec5014dd780b02d04a88ae094--