cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Peculiar imbalance affecting 2 machines in a 6 node cluster
Date Wed, 10 Aug 2011 09:12:41 GMT
WRT the load imbalance, checking the basics: have you run cleanup after any token moves? Is repair
running? Also, nodes sometimes get a bit bloated from repair and will settle down with
compaction.

Your slightly odd tokens in the MTL DC make it a little tricky to understand what's going
on, but I'm trying to check whether you've followed the multi-DC token selection here: http://wiki.apache.org/cassandra/Operations#Token_selection
. Background on what can happen in a multi-DC deployment if the tokens are not right: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Replica-data-distributing-between-racks-td6324819.html

This is what you currently have…

DC: LA
IPLA1           Up     Normal  34.57 GB        11.11%  0
IPLA2           Up     Normal  17.55 GB        11.11%  56713727820156410577229101238628035242
IPLA3           Up     Normal  51.37 GB        11.11%  113427455640312821154458202477256070485

DC: MTL
IPMTL1          Up     Normal  34.43 GB        22.22%  37809151880104273718152734159085356828
IPMTL2          Up     Normal  34.56 GB        22.22%  94522879700260684295381835397713392071
IPMTL3          Up     Normal  34.71 GB        22.22%  151236607520417094872610936636341427313

Using the bump approach you would have:

IPLA1    0
IPLA2    56713727820156410577229101238628035242
IPLA3    113427455640312821154458202477256070484

IPMTL1   1
IPMTL2   56713727820156410577229101238628035243
IPMTL3   113427455640312821154458202477256070485

Using the interleaving approach you would have:

IPLA1 	0
IPMTL1 	28356863910078205288614550619314017621
IPLA2 	56713727820156410577229101238628035242
IPMTL2 	85070591730234615865843651857942052863
IPLA3 	113427455640312821154458202477256070484
IPMTL3 	141784319550391026443072753096570088105
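Equivalently, the interleaved layout treats all 6 nodes as one evenly spaced ring and alternates DCs around it; a sketch of the same calculation:

```python
# Interleaved token assignment: one evenly spaced 6-node ring, with LA
# and MTL nodes alternating around it.
RING = 2 ** 127  # RandomPartitioner token space
total_nodes = 6
tokens = [i * (RING // total_nodes) for i in range(total_nodes)]

la_tokens = tokens[0::2]   # IPLA1, IPLA2, IPLA3
mtl_tokens = tokens[1::2]  # IPMTL1, IPMTL2, IPMTL3
print(la_tokens)
print(mtl_tokens)
```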

The current setup gives each node in LA 33% of the LA-local ring, which should be right;
just checking.

If cleanup / repair / compaction is all good and you are confident the tokens are right, try
poking around with nodetool getendpoints to see which nodes keys are sent to. Like you, I
cannot see anything obvious in NTS that would cause the load to be imbalanced if the nodes
are all in the same rack.
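For example, `nodetool -h IPLA1 getendpoints <keyspace> <cf> <key>` shows where a given key lands. As an offline cross-check, a key's token and its primary replica can be estimated from the ring; a rough sketch (node names are the redacted ones from the ring above, "somekey" is a placeholder, and it assumes RandomPartitioner's abs-of-MD5 token — it finds only the primary replica, not the full NTS replica set):

```python
import hashlib
from bisect import bisect_left

# Tokens from the current 6-node ring, sorted by token.
ring = sorted({
    "IPLA1": 0,
    "IPMTL1": 37809151880104273718152734159085356828,
    "IPLA2": 56713727820156410577229101238628035242,
    "IPMTL2": 94522879700260684295381835397713392071,
    "IPLA3": 113427455640312821154458202477256070485,
    "IPMTL3": 151236607520417094872610936636341427313,
}.items(), key=lambda kv: kv[1])

def key_token(key):
    # RandomPartitioner: abs() of the MD5 digest read as a signed BigInteger
    digest = hashlib.md5(key.encode()).digest()
    return abs(int.from_bytes(digest, "big", signed=True))

def primary_replica(token):
    # First node whose token is >= the key's token, wrapping to the start.
    tokens = [t for _, t in ring]
    i = bisect_left(tokens, token)
    return ring[i % len(ring)][0]

print(primary_replica(key_token("somekey")))
```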

Cheers


-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 10 Aug 2011, at 11:24, Mina Naguib wrote:

> Hi everyone
> 
> I'm observing a very peculiar type of imbalance and I'd appreciate any help or ideas to try.  This is on cassandra 0.7.8.
> 
> The original cluster was 3 machines in the DCMTL, equally balanced at 33.33% each and each holding roughly 34G.
> 
> Then, I added to it 3 machines in the LA data center.  The ring is currently as follows (IP addresses redacted for clarity):
> 
> Address         Status State   Load            Owns    Token
>                                                        151236607520417094872610936636341427313
> IPLA1           Up     Normal  34.57 GB        11.11%  0
> IPMTL1          Up     Normal  34.43 GB        22.22%  37809151880104273718152734159085356828
> IPLA2           Up     Normal  17.55 GB        11.11%  56713727820156410577229101238628035242
> IPMTL2          Up     Normal  34.56 GB        22.22%  94522879700260684295381835397713392071
> IPLA3           Up     Normal  51.37 GB        11.11%  113427455640312821154458202477256070485
> IPMTL3          Up     Normal  34.71 GB        22.22%  151236607520417094872610936636341427313
> 
> The bump in the 3 MTL nodes (22.22%) is in anticipation of 3 more machines in yet another data center, but they're not ready yet to join the cluster.  Once that third DC joins, all nodes will be at 11.11%.  However, I don't think this is related.
> 
> The problem I'm currently observing is visible in the LA machines, specifically IPLA2 and IPLA3.  IPLA2 has 50% of the expected volume, and IPLA3 has 150% of the expected volume.
> 
> Putting their load side by side shows the peculiar ratio of 2:1:3 between the 3 LA nodes:
> 34.57 17.55 51.37
> (the same 2:1:3 ratio is reflected in our internal tools trending reads/second and writes/second)
> 
> I've tried several iterations of compactions/cleanups to no avail.  In terms of config, this is the main keyspace:
>  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>    Options: [DCMTL:2, DCLA:2]
> And this is the cassandra-topology.properties file (IPs again redacted for clarity):
>  IPMTL1:DCMTL:RAC1
>  IPMTL2:DCMTL:RAC1
>  IPMTL3:DCMTL:RAC1
>  IPLA1:DCLA:RAC1
>  IPLA2:DCLA:RAC1
>  IPLA3:DCLA::RAC1
>  IPLON1:DCLON:RAC1
>  IPLON2:DCLON:RAC1
>  IPLON3:DCLON:RAC1
>  # default for unknown nodes
>  default=DCBAD:RACBAD
> 
> 
> One thing that did occur to me while reading the source code for NetworkTopologyStrategy's calculateNaturalEndpoints is that it prefers placing data on different racks.  Since all my machines are defined as being in the same rack, I believe the 2-pass approach should still yield balanced placement.
> 
> However, just to test, I modified the topology file live to specify that IPLA1, IPLA2 and IPLA3 are in 3 different racks, and sure enough I saw immediately that the reads/second and writes/second equalized to the expected fair volume (I quickly reverted that change).
> 
> So, it seems somehow related to rack awareness, but I've been racking my brain and I can't figure out how or why, or why the three MTL machines are not affected the same way.
> 
> If the solution is to specify them in different racks and run repair on everything, I'm okay with that - but I hate doing that without first understanding *why* the current behavior is the way it is.
> 
> Any ideas would be hugely appreciated.
> 
> Thank you.
> 

