incubator-cassandra-user mailing list archives

From Ben Bromhead <...@instaclustr.com>
Subject Re: Load balancing issue with virtual nodes
Date Tue, 29 Apr 2014 01:40:46 GMT
Some imbalance is expected and considered normal:

See http://wiki.apache.org/cassandra/VirtualNodes/Balance

As well as

https://issues.apache.org/jira/browse/CASSANDRA-7032
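For intuition, the kind of imbalance those links describe can be reproduced with a quick simulation: assign each of two nodes 256 random tokens (as `num_tokens: 256` does) and measure how much of the ring each node owns. This is an illustrative sketch, not Cassandra's actual token allocator.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable
RING = 2**64
NODES, VNODES = 2, 256

# each node gets 256 random tokens on the ring
tokens = sorted((random.randrange(RING), n)
                for n in range(NODES) for _ in range(VNODES))

# each token owns the span of the ring back to the previous token
owned = [0] * NODES
prev = tokens[-1][0] - RING  # wrap around from the last token
for t, n in tokens:
    owned[n] += t - prev
    prev = t

for n in range(NODES):
    print(f"node{n + 1}: {owned[n] / RING:.1%}")
```

The shares come out close to, but not exactly, 50% each, which matches the `Owns (effective)` figures below.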

Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359

On 29 Apr 2014, at 7:30 am, DuyHai Doan <doanduyhai@gmail.com> wrote:

> Hello all
> 
>  Some update about the issue.
> 
>  After completely wiping all sstable/commitlog/saved_caches folders and restarting the cluster from scratch, we still see odd figures. After the restart, nodetool status does not show an exact 50/50 data balance between the nodes:
> 
> 
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> -- Address  Load Tokens Owns (effective) Host ID Rack
> UN host1 48.57 KB 256 51.6%  d00de0d1-836f-4658-af64-3a12c00f47d6 rack1
> UN host2 48.57 KB 256 48.4%  e9d2505b-7ba7-414c-8b17-af3bbe79ed9c rack1
> 
> 
> As you can see, the % is very close to 50% but not exactly 50%
> 
>  What can explain this? Could it be a network connection issue during the initial token shuffle phase?
> 
> P.S.: both host1 and host2 are supposed to have exactly the same hardware
> 
> Regards
> 
>  Duy Hai DOAN
> 
> 
> On Thu, Apr 24, 2014 at 11:20 PM, Batranut Bogdan <batranub@yahoo.com> wrote:
> I don't know about Hector, but the DataStax Java driver needs just one IP from the cluster and will discover the rest of the nodes. By default it then round-robins requests across them, so if Hector does the same, the pattern will appear again.
> Did you look at the size of the dirs?
> That documentation is for C* 0.8, so it's old, but depending on your boxes you might still hit a CPU bottleneck. You might want to look up the write path in Cassandra; according to that, there is not much disk work to do when writes come in.
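The discovery-plus-round-robin behaviour described above can be sketched in a few lines. The host names are hypothetical, and the peer list stands in for what a driver would learn from the cluster; the point is only that once peers are discovered, requests alternate across all nodes regardless of which contact point was supplied.

```python
from itertools import cycle

contact_point = "node2"          # only one host given, as in the test below
discovered = ["node1", "node2"]  # peers the driver discovers from the cluster
coordinators = cycle(discovered)

# round-robin: each request goes to the next node in turn
sent = [next(coordinators) for _ in range(10)]
print(sent)  # requests alternate across both nodes, not just the contact point
```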
> On Friday, April 25, 2014 12:00 AM, DuyHai Doan <doanduyhai@gmail.com> wrote:
> I did some experiments.
> 
>  Let's say we have node1 and node2
> 
> First, I configured Hector with node1 & node2 as hosts and saw that only node1 had a high CPU load.
> 
> To rule out a client-connection issue, I re-tested with only node2 provided as the host for Hector. Same pattern: CPU load above 50% on node1 and below 10% on node2.
> 
> It means that node2 is acting as coordinator and forwarding many write/read requests to node1.
> 
>  Why did I look at CPU load and not iostat et al.?
> 
>  Because I have a very write-intensive workload with a read-only-once pattern. I've read here (http://www.datastax.com/docs/0.8/cluster_architecture/cluster_planning) that heavy writes in C* are more CPU-bound, but that info may be outdated and no longer true.
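The coordinator-forwarding observation fits how replica placement works: whichever node coordinates a write, the work lands on the node that owns the key's token. A toy simulation under stated assumptions (two nodes with random vnode tokens, replication factor 1, an md5-based stand-in for Cassandra's real Murmur3 partitioner) shows writes following token ownership, not coordinator choice.

```python
import bisect
import hashlib
import random

random.seed(7)
RING = 2**64
# two nodes with 256 random vnode tokens each (illustrative, not real cluster state)
tokens = sorted((random.randrange(RING), node)
                for node in ("node1", "node2") for _ in range(256))
token_points = [t for t, _ in tokens]

def owner(key: bytes) -> str:
    # hash the key onto the ring, then walk clockwise to the next token (RF=1)
    h = int.from_bytes(hashlib.md5(key).digest()[:8], "big")
    i = bisect.bisect_right(token_points, h) % len(tokens)
    return tokens[i][1]

writes = {"node1": 0, "node2": 0}
for k in range(100_000):
    writes[owner(str(k).encode())] += 1
print(writes)  # each node does roughly its ownership share of the write work
```

So even if node2 coordinates every request, node1 still performs its share of the actual writes, which is consistent with the CPU figures above.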
> 
>  Regards
> 
>  Duy Hai DOAN
> 
> 
> On Thu, Apr 24, 2014 at 10:00 PM, Michael Shuler <michael@pbandjelly.org> wrote:
> On 04/24/2014 10:29 AM, DuyHai Doan wrote:
>   Client used = Hector 1.1-4
>   Default Load Balancing connection policy
>   Both nodes addresses are provided to Hector so according to its
> connection policy, the client should switch alternatively between both nodes
> 
> OK, so is only one connection being established to one node for one bulk write operation? Or are multiple connections being made to both nodes and writes performed on both?
> 
> -- 
> Michael
> 

