incubator-cassandra-user mailing list archives

From DuyHai Doan <doanduy...@gmail.com>
Subject Re: Load balancing issue with virtual nodes
Date Tue, 29 Apr 2014 07:47:10 GMT
Thank you, Ben, for the links.




On Tue, Apr 29, 2014 at 3:40 AM, Ben Bromhead <ben@instaclustr.com> wrote:

> Some imbalance is expected and considered normal:
>
> See http://wiki.apache.org/cassandra/VirtualNodes/Balance
>
> As well as
>
> https://issues.apache.org/jira/browse/CASSANDRA-7032
>
> Ben Bromhead
> Instaclustr | www.instaclustr.com | @instaclustr <http://twitter.com/instaclustr>
> +61 415 936 359
>
> On 29 Apr 2014, at 7:30 am, DuyHai Doan <doanduyhai@gmail.com> wrote:
>
> Hello all
>
>  Here is an update on the issue.
>
>  After completely wiping all sstable/commitlog/saved_caches folders and
> restarting the cluster from scratch, we still see odd figures. After
> the restart, nodetool status does not show an exact 50% share of the data
> for each node:
>
>
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address  Load      Tokens  Owns (effective)  Host ID                               Rack
> UN  host1    48.57 KB  256     *51.6%*           d00de0d1-836f-4658-af64-3a12c00f47d6  rack1
> UN  host2    48.57 KB  256     *48.4%*           e9d2505b-7ba7-414c-8b17-af3bbe79ed9c  rack1
>
>
> As you can see, the percentages are very close to 50% but not exactly 50%.
>
>  What can explain that? Could it be a network connection issue during the
> initial token shuffle phase?
>
> P.S: both host1 and host2 are supposed to have exactly the same hardware
>
> Regards
>
>  Duy Hai DOAN
>
>
> On Thu, Apr 24, 2014 at 11:20 PM, Batranut Bogdan <batranub@yahoo.com> wrote:
>
>> I don't know about Hector, but the DataStax Java driver needs just one IP
>> from the cluster and it will discover the rest of the nodes. Then, by
>> default, it will round-robin when sending requests. So if Hector does
>> the same, the pattern will appear again.
>> Did you look at the size of the dirs?
>> That documentation is for C* 0.8. It's old. But depending on your boxes,
>> you might reach a CPU bottleneck. You might want to google the write path
>> in Cassandra. According to that, there is not much to do when writes come
>> in...
>>   On Friday, April 25, 2014 12:00 AM, DuyHai Doan <doanduyhai@gmail.com>
>> wrote:
>>  I did some experiments.
>>
>>  Let's say we have node1 and node2
>>
>> First, I configured Hector with node1 & node2 as hosts, and I saw that
>> only node1 had a high CPU load.
>>
>> To rule out a "client connection" issue, I re-tested with only node2
>> provided as host for Hector. Same pattern: CPU load is above 50% on node1
>> and below 10% on node2.
>>
>> It means that node2 is acting as coordinator and forwarding many
>> write/read requests to node1.
>>
>>  Why did I look at CPU load and not iostat et al.?
>>
>>  Because I have a very write-intensive workload with a read-only-once
>> pattern. I've read here (
>> http://www.datastax.com/docs/0.8/cluster_architecture/cluster_planning)
>> that heavy writing in C* is more CPU bound, but that info may be outdated
>> and no longer true.
>>
>>  Regards
>>
>>  Duy Hai DOAN
>>
>>
>> On Thu, Apr 24, 2014 at 10:00 PM, Michael Shuler <michael@pbandjelly.org> wrote:
>>
>> On 04/24/2014 10:29 AM, DuyHai Doan wrote:
>>
>>   Client used = Hector 1.1-4
>>   Default Load Balancing connection policy
>>   Both nodes' addresses are provided to Hector, so according to its
>> connection policy, the client should alternate between both
>> nodes
>>
>>
>> OK, so is only one connection being established to one node for one bulk
>> write operation? Or are multiple connections being made to both nodes and
>> writes performed on both?
>>
>> --
>> Michael
>>
>>
>>
>>
>>
>
>

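On the 51.6% / 48.4% figures quoted above: with virtual nodes, each node picks its tokens at random, so an exact 50/50 split would be the exception rather than the rule. The following is only a rough sketch of that effect, not how Cassandra itself computes "Owns (effective)". It assumes the Murmur3Partitioner token range, 256 tokens per node, and a replication factor of 1 (the thread does not state the replication factor); the function name `ownership` and its parameters are made up for illustration.

```python
import random

# Assumption: Murmur3Partitioner token space, [-2**63, 2**63 - 1].
RING_SIZE = 2**64
MIN_TOKEN = -2**63


def ownership(num_nodes=2, vnodes=256, seed=None):
    """Return the fraction of the ring owned by each node (assumes RF = 1)."""
    rng = random.Random(seed)
    # Every node draws `vnodes` random tokens from the ring.
    ring = sorted(
        (rng.randrange(MIN_TOKEN, MIN_TOKEN + RING_SIZE), node)
        for node in range(num_nodes)
        for _ in range(vnodes)
    )
    owned = [0] * num_nodes
    # A token owns the range from the previous token up to itself;
    # the first token's range wraps around from the last token.
    prev_token = ring[-1][0] - RING_SIZE
    for token, node in ring:
        owned[node] += token - prev_token
        prev_token = token
    return [o / RING_SIZE for o in owned]


if __name__ == "__main__":
    for seed in range(5):
        shares = ownership(seed=seed)
        print(seed, ["%.1f%%" % (100 * s) for s in shares])
```

Running it with a few different seeds prints a different split each time, which is the kind of spread the wiki page and CASSANDRA-7032 discuss.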