cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yan Chunlu <springri...@gmail.com>
Subject Re: how to solve one node is in heavy load in unbalanced cluster
Date Fri, 29 Jul 2011 04:01:17 GMT
add new nodes seems added more pressure  to the cluster?  how about your
data size?

On Fri, Jul 29, 2011 at 4:16 AM, Frank Duan <frank@aimatch.com> wrote:

> "Dropped read message" might be an indicator of capacity issue. We
> experienced the similar issue with 0.7.6.
>
> We ended up adding two extra nodes and physically rebooted the offending
> node(s).
>
> The entire cluster then calmed down.
>
> On Thu, Jul 28, 2011 at 2:24 PM, Yan Chunlu <springrider@gmail.com> wrote:
>
>> I have three nodes and RF=3.here is the current ring:
>>
>>
>> Address Status State Load Owns Token
>>
>> 84944475733633104818662955375549269696
>> node1 Up Normal 15.32 GB 81.09% 52773518586096316348543097376923124102
>> node2 Up Normal 22.51 GB 10.48% 70597222385644499881390884416714081360
>> node3 Up Normal 56.1 GB 8.43% 84944475733633104818662955375549269696
>>
>>
>> it is very un-balanced and I would like to re-balance it using
>> "nodetool move" asap. unfortunately I haven't been run node repair for
>> a long time.
>>
>> aaron suggested it's better to run node repair on every node then
>> re-balance it.
>>
>>
>> problem is the node3 is in heavy-load currently, and the entire
>> cluster slow down if I start doing node repair. I have to
>> disablegossip and disablethrift to stop the repair.
>>
>> only cassandra running on that server and I have no idea what it was
>> doing. the cpu load is about 20+ currently. compcationstats and
>> netstats shows it was not doing anything.
>>
>> I have change client to not to connect to node3, but still, it seems
>> in heavy load and io utils is 100%.
>>
>>
>> the log seems normal(although not sure what about the "Dropped read
>> message" thing):
>>
>>  INFO 13:21:38,191 GC for ParNew: 345 ms, 627003992 reclaimed leaving
>> 2563726360 used; max is 4248829952
>>  WARN 13:21:38,560 Dropped 826 READ messages in the last 5000ms
>>  INFO 13:21:38,560 Pool Name                    Active   Pending
>>  INFO 13:21:38,560 ReadStage                         8      7555
>>  INFO 13:21:38,561 RequestResponseStage              0         0
>>  INFO 13:21:38,561 ReadRepairStage                   0         0
>>
>>
>>
>> is there anyway to tell what node3 was doing? or at least is there any
>> way to make it not slowdown the whole cluster?
>>
>
>
>
> --
> Frank Duan
> aiMatch
> frank@aimatch.com
> c: 703.869.9951
> www.aiMatch.com
>
>

Mime
View raw message