From: Yan Chunlu <springrider@gmail.com>
Date: Mon, 1 Aug 2011 00:51:41 +0800
Subject: Re: how to solve one node is in heavy load in unbalanced cluster
To: user@cassandra.apache.org

any help? thanks!

On Fri, Jul 29, 2011 at 12:05 PM, Yan Chunlu <springrider@gmail.com> wrote:
And by the way, my RF=3 and the other two nodes have much more capacity; why are requests always routed to node3?

Could I do a rebalance now, before node repair?


On Fri, Jul 29, 2011 at 12:01 PM, Yan Chunlu <springrider@gmail.com> wrote:
Doesn't adding new nodes put even more pressure on the cluster? And what is your data size?


On Fri, Jul 29, 2011 at 4:16 AM, Frank Duan <frank@aimatch.com> wrote:
"Dropped read=A0message" migh= t be an indicator of capacity issue. We experienced the similar issue with = 0.7.6.

We ended up adding two extra nodes and physically rebooting the offending node(s).

The entire cluster then calmed down.
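
(For what it's worth, a gentler variant of that restart is to drain the node first so memtables are flushed before the process goes down; a rough sketch, where the host name and init command are placeholders for your setup:

# flush memtables and stop the node accepting new requests
nodetool -h node3 drain
# then restart the Cassandra process however your install manages it
sudo /etc/init.d/cassandra restart
)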

On Thu, Jul 28, 2011 at 2:24 PM, Yan Chunlu <springrider@gmail.com> wrote:
I have three nodes and RF=3. Here is the current ring:


Address   Status  State    Load       Owns     Token
                                               84944475733633104818662955375549269696
node1     Up      Normal   15.32 GB   81.09%   52773518586096316348543097376923124102
node2     Up      Normal   22.51 GB   10.48%   70597222385644499881390884416714081360
node3     Up      Normal   56.1 GB     8.43%   84944475733633104818662955375549269696
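
(For reference, if this ring uses the default RandomPartitioner, evenly spaced tokens for a three-node cluster are i * 2**127 / 3; a quick sketch to compute them:

# balanced tokens for 3 nodes on RandomPartitioner (Python 2 one-liner)
python -c 'for i in range(3): print i * 2**127 / 3'
# 0
# 56713727820156410577229101238628035242
# 113427455640312821154458202477256070485
)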


It is very unbalanced and I would like to re-balance it using "nodetool move" asap. Unfortunately I haven't run node repair for a long time.

Aaron suggested it's better to run node repair on every node and then re-balance.
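
A rough sketch of that sequence with nodetool, assuming the balanced tokens computed above and placeholder host names (repair is heavy, so run it one node at a time):

# 1. repair each node in turn, letting each finish before starting the next
nodetool -h node1 repair
nodetool -h node2 repair
nodetool -h node3 repair
# 2. move each node to its balanced token
nodetool -h node1 move 0
nodetool -h node2 move 56713727820156410577229101238628035242
nodetool -h node3 move 113427455640312821154458202477256070485
# 3. drop the data each node no longer owns
nodetool -h node1 cleanup
nodetool -h node2 cleanup
nodetool -h node3 cleanup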


The problem is that node3 is currently under heavy load, and the entire cluster slows down if I start a node repair. I had to disablegossip and disablethrift to stop the repair.
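
(Those switches and their counterparts, as a sketch with a placeholder host name, assuming your nodetool version has the enable commands:

# stop gossiping to the ring and stop serving Thrift clients
nodetool -h node3 disablegossip
nodetool -h node3 disablethrift
# re-enable once the node has settled down
nodetool -h node3 enablegossip
nodetool -h node3 enablethrift
)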

Only Cassandra is running on that server and I have no idea what it is doing. The CPU load is about 20+ currently. compactionstats and netstats show it is not doing anything.

I have changed the client to not connect to node3, but it still seems to be under heavy load and IO utilization is 100%.
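
A few commands that can help show where the time is going (sketch; iostat comes from the sysstat package):

# thread pool backlog and dropped message counts
nodetool -h node3 tpstats
# compactions or streaming sessions in flight
nodetool -h node3 compactionstats
nodetool -h node3 netstats
# per-disk utilization, sampled every 5 seconds
iostat -x 5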


The log seems normal (although I am not sure about the "Dropped read message" thing):

 INFO 13:21:38,191 GC for ParNew: 345 ms, 627003992 reclaimed leaving 2563726360 used; max is 4248829952
 WARN 13:21:38,560 Dropped 826 READ messages in the last 5000ms
 INFO 13:21:38,560 Pool Name                    Active   Pending
 INFO 13:21:38,560 ReadStage                         8      7555
 INFO 13:21:38,561 RequestResponseStage              0         0
 INFO 13:21:38,561 ReadRepairStage                   0         0
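
Given the ParNew line, heap pressure may be worth watching as well; a quick check (sketch, placeholder host name):

# heap used / max as the node sees it, plus load and uptime
nodetool -h node3 info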



Is there any way to tell what node3 is doing? Or at least, is there any way to keep it from slowing down the whole cluster?



--
Frank Duan
aiMatch
frank@aimatch.com
c: 703.869.9951
www.aiMatch.com