cassandra-user mailing list archives

From Mina Naguib <mina.nag...@bloomdigital.com>
Subject Peculiar imbalance affecting 2 machines in a 6 node cluster
Date Tue, 09 Aug 2011 23:24:16 GMT
Hi everyone

I'm observing a very peculiar type of imbalance and I'd appreciate any help or ideas to try.
 This is on cassandra 0.7.8.

The original cluster was 3 machines in the DCMTL data center, equally balanced at 33.33% each,
with each node holding roughly 34 GB.

Then, I added to it 3 machines in the LA data center.  The ring is currently as follows (IP
addresses redacted for clarity):

Address  Status  State   Load      Owns    Token
                                           151236607520417094872610936636341427313
IPLA1    Up      Normal  34.57 GB  11.11%  0
IPMTL1   Up      Normal  34.43 GB  22.22%  37809151880104273718152734159085356828
IPLA2    Up      Normal  17.55 GB  11.11%  56713727820156410577229101238628035242
IPMTL2   Up      Normal  34.56 GB  22.22%  94522879700260684295381835397713392071
IPLA3    Up      Normal  51.37 GB  11.11%  113427455640312821154458202477256070485
IPMTL3   Up      Normal  34.71 GB  22.22%  151236607520417094872610936636341427313

The bump in the 3 MTL nodes (22.22%) is in anticipation of 3 more machines in yet another
data center, but they're not ready yet to join the cluster.  Once that third DC joins all
nodes will be at 11.11%. However, I don't think this is related.
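
The Owns column can be cross-checked from the tokens alone. This is a minimal sketch, assuming RandomPartitioner (token space 0 to 2**127; the partitioner is not stated in this message) and assuming each node owns the range from the previous token on the ring up to its own token. The function name `ownership` is made up for illustration.

```python
# Assumption: RandomPartitioner, whose token space is [0, 2**127).
RING_SIZE = 2**127

# Tokens copied from the ring output above.
tokens = {
    "IPLA1": 0,
    "IPMTL1": 37809151880104273718152734159085356828,
    "IPLA2": 56713727820156410577229101238628035242,
    "IPMTL2": 94522879700260684295381835397713392071,
    "IPLA3": 113427455640312821154458202477256070485,
    "IPMTL3": 151236607520417094872610936636341427313,
}

def ownership(tokens):
    """Return {node: fraction of ring}, each node owning (previous token, own token]."""
    ring = sorted(tokens.items(), key=lambda kv: kv[1])
    owns = {}
    for i, (node, token) in enumerate(ring):
        prev_token = ring[i - 1][1]  # for i == 0 this wraps to the last node
        owns[node] = ((token - prev_token) % RING_SIZE) / RING_SIZE
    return owns

owns = ownership(tokens)
for node, frac in owns.items():
    print(f"{node:7s} {frac:.2%}")
```

Under these assumptions the LA nodes each come out at roughly one ninth of the ring and the MTL nodes at roughly two ninths, in line with the 11.11% and 22.22% figures reported by the ring output.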

The problem I'm currently observing is visible in the LA machines, specifically IPLA2 and
IPLA3.  IPLA2 has 50% of the expected volume, and IPLA3 has 150% of the expected volume.

Putting their load side by side shows the peculiar ratio of 2:1:3 between the 3 LA nodes:
34.57 17.55 51.37
(the same 2:1:3 ratio is reflected in our internal tools trending reads/second and writes/second)
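
The arithmetic behind the 50% / 150% figures and the 2:1:3 ratio can be reproduced as follows. This is a small sketch that takes "expected volume" to mean the average of the three LA loads, which is an assumption on my part.

```python
# Loads in GB for the three LA nodes, copied from the ring output.
la_loads = {"IPLA1": 34.57, "IPLA2": 17.55, "IPLA3": 51.37}

# Assumption: "expected" load is the mean of the three LA nodes.
expected = sum(la_loads.values()) / len(la_loads)
relative = {node: load / expected for node, load in la_loads.items()}

# Ratio relative to the least-loaded node.
smallest = min(la_loads.values())
ratio = {node: load / smallest for node, load in la_loads.items()}

for node in la_loads:
    print(f"{node}: {relative[node]:.0%} of expected, {ratio[node]:.2f}x the smallest")
```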

I've tried several iterations of compactions/cleanups to no avail.  In terms of config this
is the main keyspace:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
    Options: [DCMTL:2, DCLA:2]
And this is the cassandra-topology.properties file (IPs again redacted for clarity):
  IPMTL1:DCMTL:RAC1
  IPMTL2:DCMTL:RAC1
  IPMTL3:DCMTL:RAC1
  IPLA1:DCLA:RAC1
  IPLA2:DCLA:RAC1
  IPLA3:DCLA:RAC1
  IPLON1:DCLON:RAC1
  IPLON2:DCLON:RAC1
  IPLON3:DCLON:RAC1
  # default for unknown nodes
  default=DCBAD:RACBAD


One thing that did occur to me while reading the source code for the NetworkTopologyStrategy's
calculateNaturalEndpoints is that it prefers placing data on different racks.  Since all my
machines are defined as in the same rack, I believe that the 2-pass approach would still yield
balanced placement.
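
To make that expectation concrete, here is a simplified model of the two-pass placement as I understand it. It is not the Cassandra source, and `natural_endpoints` and `replica_share` are names invented for this sketch. It assumes RandomPartitioner (token space 0 to 2**127), walks each data center's nodes clockwise from the search token, picks at most one replica per rack in the first pass, and fills any remaining slots in the second pass.

```python
from fractions import Fraction

RING_SIZE = 2**127  # assumption: RandomPartitioner token space

# (node, datacenter, rack, token), from the ring output and topology file above.
NODES = [
    ("IPLA1", "DCLA", "RAC1", 0),
    ("IPMTL1", "DCMTL", "RAC1", 37809151880104273718152734159085356828),
    ("IPLA2", "DCLA", "RAC1", 56713727820156410577229101238628035242),
    ("IPMTL2", "DCMTL", "RAC1", 94522879700260684295381835397713392071),
    ("IPLA3", "DCLA", "RAC1", 113427455640312821154458202477256070485),
    ("IPMTL3", "DCMTL", "RAC1", 151236607520417094872610936636341427313),
]
REPLICAS = {"DCMTL": 2, "DCLA": 2}

def natural_endpoints(search_token, nodes, replicas):
    """Two-pass placement for one token (simplified model, not Cassandra's code)."""
    result = []
    for dc, rf in replicas.items():
        # Nodes of this DC in clockwise order, starting at the search token.
        dc_nodes = sorted((n for n in nodes if n[1] == dc), key=lambda n: n[3])
        start = next((i for i, n in enumerate(dc_nodes) if n[3] >= search_token), 0)
        walk = dc_nodes[start:] + dc_nodes[:start]
        chosen, racks = [], set()
        # Pass 1: at most one replica per rack.
        for name, _, rack, _ in walk:
            if len(chosen) < rf and rack not in racks:
                chosen.append(name)
                racks.add(rack)
        # Pass 2: fill the remaining slots regardless of rack.
        for name, _, _, _ in walk:
            if len(chosen) < rf and name not in chosen:
                chosen.append(name)
        result.extend(chosen)
    return result

def replica_share(nodes, replicas):
    """Fraction of the ring for which each node holds a replica."""
    ring = sorted(nodes, key=lambda n: n[3])
    share = {n[0]: Fraction(0) for n in nodes}
    for i, node in enumerate(ring):
        # Width of the range (previous token, this token]; i == 0 wraps around.
        width = Fraction((node[3] - ring[i - 1][3]) % RING_SIZE, RING_SIZE)
        for name in natural_endpoints(node[3], nodes, replicas):
            share[name] += width
    return share

share = replica_share(NODES, REPLICAS)
for name, frac in share.items():
    print(f"{name:7s} {float(frac):.4f}")
```

In this model every node ends up holding replicas for about two thirds of the ring, so it agrees with the balanced placement I expected and does not reproduce the 2:1:3 skew. Whatever causes the skew is presumably something this simplification leaves out.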

However, just to test, I modified the topology file live to specify that IPLA1, IPLA2 and
IPLA3 are in 3 different racks, and sure enough I immediately saw the reads/second and
writes/second equalize to the expected fair volume (I quickly reverted that change).

So, it seems somehow related to rack awareness, but I've been racking my brain and I can't figure
out how/why, or why the three MTL machines are not affected the same way.

If the solution is to specify them in different racks and run repair on everything, I'm okay
with that - but I hate doing that without first understanding *why* the current behavior is
the way it is.

Any ideas would be hugely appreciated.

Thank you.

