incubator-cassandra-user mailing list archives

From Frank Duan <fr...@aimatch.com>
Subject Re: how to solve one node is in heavy load in unbalanced cluster
Date Thu, 28 Jul 2011 20:16:01 GMT
"Dropped read message" may indicate a capacity issue. We
experienced a similar issue with 0.7.6.

We ended up adding two extra nodes and physically rebooting the offending
node(s).

The entire cluster then calmed down.
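
To put a number on how far behind a node is, the two kinds of log lines in the excerpt quoted below can be parsed directly. This is only a rough sketch: the regular expressions are assumptions based on the line formats shown in this thread (0.7.x), the function names are made up for illustration, and other versions may log differently.

```python
import re

# Patterns inferred from the log excerpt in this thread; treat them as assumptions.
DROPPED_RE = re.compile(r"Dropped (\d+) (\w+) messages in the last (\d+)ms")
STAGE_RE = re.compile(r"(\w+Stage)\s+(\d+)\s+(\d+)\s*$")


def dropped_per_second(line):
    """Return (message_type, drops_per_second), or None if the line does not match."""
    match = DROPPED_RE.search(line)
    if match is None:
        return None
    count, window_ms = int(match.group(1)), int(match.group(3))
    return match.group(2), count * 1000.0 / window_ms


def stage_backlog(line):
    """Return (stage_name, active, pending), or None if the line does not match."""
    match = STAGE_RE.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), int(match.group(3))


if __name__ == "__main__":
    sample = [
        " WARN 13:21:38,560 Dropped 826 READ messages in the last 5000ms",
        " INFO 13:21:38,560 ReadStage                         8      7555",
    ]
    print(dropped_per_second(sample[0]))
    print(stage_backlog(sample[1]))
```

A pending count that is much larger than the active count, together with a steady drop rate, is consistent with the node receiving reads faster than it can serve them.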

On Thu, Jul 28, 2011 at 2:24 PM, Yan Chunlu <springrider@gmail.com> wrote:

> I have three nodes and RF=3. Here is the current ring:
>
>
> Address  Status  State   Load      Owns    Token
>                                            84944475733633104818662955375549269696
> node1    Up      Normal  15.32 GB  81.09%  52773518586096316348543097376923124102
> node2    Up      Normal  22.51 GB  10.48%  70597222385644499881390884416714081360
> node3    Up      Normal  56.1 GB   8.43%   84944475733633104818662955375549269696
>
>
> It is very unbalanced and I would like to rebalance it using
> "nodetool move" ASAP. Unfortunately, I haven't run node repair for
> a long time.
>
> Aaron suggested that it's better to run node repair on every node and
> then rebalance.
>
>
> The problem is that node3 is currently under heavy load, and the entire
> cluster slows down if I start a node repair. I have to run
> disablegossip and disablethrift to stop the repair.
>
> Only Cassandra is running on that server and I have no idea what it is
> doing. The CPU load is currently about 20+. compactionstats and
> netstats show it is not doing anything.
>
> I have changed the client so it does not connect to node3, but it still
> seems to be under heavy load, and I/O utilization is 100%.
>
>
> The log seems normal (although I am not sure about the "Dropped read
> message" thing):
>
>  INFO 13:21:38,191 GC for ParNew: 345 ms, 627003992 reclaimed leaving 2563726360 used; max is 4248829952
>  WARN 13:21:38,560 Dropped 826 READ messages in the last 5000ms
>  INFO 13:21:38,560 Pool Name                    Active   Pending
>  INFO 13:21:38,560 ReadStage                         8      7555
>  INFO 13:21:38,561 RequestResponseStage              0         0
>  INFO 13:21:38,561 ReadRepairStage                   0         0
>
>
>
> Is there any way to tell what node3 is doing? Or at least is there any
> way to keep it from slowing down the whole cluster?
>
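
For the rebalancing step mentioned in the quoted message, evenly spaced target tokens can be worked out ahead of time. The sketch below assumes the cluster uses RandomPartitioner, whose token space is taken here to be 0 to 2**127; that assumption is not stated in the thread, and the function names are illustrative only. It also recomputes each node's share of the ring from the tokens in the ring listing, which gives a way to check the "Owns" column.

```python
RING_SIZE = 2 ** 127  # assumed RandomPartitioner token space


def balanced_tokens(node_count):
    """Evenly spaced tokens for node_count nodes, starting at 0."""
    return [(i * RING_SIZE) // node_count for i in range(node_count)]


def ownership(tokens):
    """Map each token to the fraction of the ring it owns.

    A node owns the range from the previous token (exclusive) up to
    its own token (inclusive), wrapping around the end of the ring.
    """
    ordered = sorted(tokens)
    if len(ordered) == 1:
        return {ordered[0]: 1.0}
    shares = {}
    for index, token in enumerate(ordered):
        previous = ordered[index - 1]  # index 0 wraps to the last token
        shares[token] = ((token - previous) % RING_SIZE) / RING_SIZE
    return shares


if __name__ == "__main__":
    current = {
        "node1": 52773518586096316348543097376923124102,
        "node2": 70597222385644499881390884416714081360,
        "node3": 84944475733633104818662955375549269696,
    }
    shares = ownership(current.values())
    for name, token in current.items():
        print(name, "{:.2%}".format(shares[token]))
    for token in balanced_tokens(len(current)):
        print("target token:", token)
```

Each target token would then be assigned to one node with "nodetool move", one node at a time.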



-- 
Frank Duan
aiMatch
frank@aimatch.com
c: 703.869.9951
www.aiMatch.com
