incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yan Chunlu <springri...@gmail.com>
Subject Re: how to solve one node is in heavy load in unbalanced cluster
Date Sun, 31 Jul 2011 16:51:41 GMT
any help? thanks!

On Fri, Jul 29, 2011 at 12:05 PM, Yan Chunlu <springrider@gmail.com> wrote:

> and by the way, my RF=3 and the other two nodes have much more capacity,
> why does they always routed the request to node3?
>
> coud I do a rebalance now? before node repair?
>
>
> On Fri, Jul 29, 2011 at 12:01 PM, Yan Chunlu <springrider@gmail.com>wrote:
>
>> add new nodes seems added more pressure  to the cluster?  how about your
>> data size?
>>
>>
>> On Fri, Jul 29, 2011 at 4:16 AM, Frank Duan <frank@aimatch.com> wrote:
>>
>>> "Dropped read message" might be an indicator of capacity issue. We
>>> experienced the similar issue with 0.7.6.
>>>
>>> We ended up adding two extra nodes and physically rebooted the offending
>>> node(s).
>>>
>>> The entire cluster then calmed down.
>>>
>>> On Thu, Jul 28, 2011 at 2:24 PM, Yan Chunlu <springrider@gmail.com>wrote:
>>>
>>>> I have three nodes and RF=3.here is the current ring:
>>>>
>>>>
>>>> Address Status State Load Owns Token
>>>>
>>>> 84944475733633104818662955375549269696
>>>> node1 Up Normal 15.32 GB 81.09% 52773518586096316348543097376923124102
>>>> node2 Up Normal 22.51 GB 10.48% 70597222385644499881390884416714081360
>>>> node3 Up Normal 56.1 GB 8.43% 84944475733633104818662955375549269696
>>>>
>>>>
>>>> it is very un-balanced and I would like to re-balance it using
>>>> "nodetool move" asap. unfortunately I haven't been run node repair for
>>>> a long time.
>>>>
>>>> aaron suggested it's better to run node repair on every node then
>>>> re-balance it.
>>>>
>>>>
>>>> problem is the node3 is in heavy-load currently, and the entire
>>>> cluster slow down if I start doing node repair. I have to
>>>> disablegossip and disablethrift to stop the repair.
>>>>
>>>> only cassandra running on that server and I have no idea what it was
>>>> doing. the cpu load is about 20+ currently. compcationstats and
>>>> netstats shows it was not doing anything.
>>>>
>>>> I have change client to not to connect to node3, but still, it seems
>>>> in heavy load and io utils is 100%.
>>>>
>>>>
>>>> the log seems normal(although not sure what about the "Dropped read
>>>> message" thing):
>>>>
>>>>  INFO 13:21:38,191 GC for ParNew: 345 ms, 627003992 reclaimed leaving
>>>> 2563726360 used; max is 4248829952
>>>>  WARN 13:21:38,560 Dropped 826 READ messages in the last 5000ms
>>>>  INFO 13:21:38,560 Pool Name                    Active   Pending
>>>>  INFO 13:21:38,560 ReadStage                         8      7555
>>>>  INFO 13:21:38,561 RequestResponseStage              0         0
>>>>  INFO 13:21:38,561 ReadRepairStage                   0         0
>>>>
>>>>
>>>>
>>>> is there anyway to tell what node3 was doing? or at least is there any
>>>> way to make it not slowdown the whole cluster?
>>>>
>>>
>>>
>>>
>>> --
>>> Frank Duan
>>> aiMatch
>>> frank@aimatch.com
>>> c: 703.869.9951
>>> www.aiMatch.com
>>>
>>>
>>
>

Mime
View raw message