incubator-cassandra-user mailing list archives

From <Stephen.M.Thomp...@wellsfargo.com>
Subject RE: unbalanced ring
Date Thu, 07 Feb 2013 21:36:46 GMT
I found when I tried to do queries after sending this that although it shows a ton of data,
it would no longer return ANYTHING for any query ... always 0 rows.  So something was severely
hosed.  I blew away the data and reloaded from database ... the data set is a little smaller
than before.  It shows up somewhat more balanced, although I'm still curious why the third
node is so much smaller than the first two.



[root@Config3482VM1 apache-cassandra-1.2.1]# bin/nodetool status
Datacenter: 28
==============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.28.205.125  994.89 MB  255     33.7%             3daab184-61f0-49a0-b076-863f10bc8c6c  205
UN  10.28.205.126  966.17 MB  256     99.9%             55bbd4b1-8036-4e32-b975-c073a7f0f47f  205
UN  10.28.205.127  699.79 MB  257     66.4%             d240c91f-4901-40ad-bd66-d374a0ccf0b9  205
[root@Config3482VM1 apache-cassandra-1.2.1]#



And yes, that is the entire content of the output from the status call, unedited.  I have
attached the output from nodetool ring.  To answer a couple of Eric's questions from
below:



* One data center (28)?  One rack (205)? Three nodes?

                Yes, that's right.  We're just doing a proof of concept at the moment, so this
is three VMware servers.



* How many keyspaces, and what are the replication strategies?

                There is one keyspace, and it has only one CF at this point.



[default@KEYSPACE_NAME] describe;
Keyspace: KEYSPACE_NAME:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
    Options: [28:2]
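
A quick arithmetic check on the "Owns (effective)" column, sketched in Python. It assumes that effective ownership counts replicas, so the column should total roughly 100% times the replication factor (2 here, from Options: [28:2]). The variable names are illustrative only.

```python
# "Owns (effective)" values copied from the nodetool status output above.
owns_effective = {
    "10.28.205.125": 33.7,
    "10.28.205.126": 99.9,
    "10.28.205.127": 66.4,
}
replication_factor = 2  # from "Options: [28:2]" in the keyspace description

total = sum(owns_effective.values())
expected_total = 100.0 * replication_factor
# If the ring were balanced, each of the three nodes would hold this share.
even_share = expected_total / len(owns_effective)

print(f"total effective ownership: {total:.1f}%")
print(f"expected for RF={replication_factor}: {expected_total:.1f}%")
print(f"balanced share per node: {even_share:.1f}%")
```

The column adds up as expected for two replicas, so the percentages are internally consistent; what stands out is how far the individual nodes sit from an even share.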



* TL;DR  What Aaron Said(tm)  In the absence of rack/dc aware replication, your allocation
is suspicious.



I'm not sure what you mean by this.



Steve



-----Original Message-----
From: Eric Evans [mailto:eevans@acunu.com]
Sent: Thursday, February 07, 2013 9:56 AM
To: user@cassandra.apache.org
Subject: Re: unbalanced ring



On Wed, Feb 6, 2013 at 2:02 PM, <Stephen.M.Thompson@wellsfargo.com> wrote:

> Thanks Aaron.  I ran the cassandra-shuffle job and did a rebuild and
> compact on each of the nodes.
>
> [root@Config3482VM1 apache-cassandra-1.2.1]# bin/nodetool status
> Datacenter: 28
> ==============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
> UN  10.28.205.125  1.7 GB     255     33.7%             3daab184-61f0-49a0-b076-863f10bc8c6c  205
> UN  10.28.205.126  591.44 MB  256     99.9%             55bbd4b1-8036-4e32-b975-c073a7f0f47f  205
> UN  10.28.205.127  112.28 MB  257     66.4%             d240c91f-4901-40ad-bd66-d374a0ccf0b9  205



Sorry, I have to ask, Is this the complete output?  Have you perhaps sanitized it in some
way?



It seems like there is some piece of missing context here.  Can you tell us:



* Is this a cluster that was upgraded to virtual nodes (that would include a 1.2.x cluster
initialized with one token per node, and num_tokens set after the fact)?  If so, what did
the initial token map look like?

* Was initial_token used at any point along the way (either to supply a single token, or a CSV
list of them), on any or all of the nodes in this cluster, at any time?

* One data center (28)?  One rack (205)? Three nodes?

* How many keyspaces, and what are the replication strategies?

* What does the full output of `nodetool ring' look like now?  Can you attach it?



> So this is a little better.  At last node 3 has some content, but they
> are still far from balanced.  If I understand this correctly, this
> is the distribution I would expect if the tokens were set at 15/5/1
> rather than equal.  As configured, I would expect roughly equal
> amounts of data on each node. Is that right?  Do you have any
> suggestions for what I can look at to get there?



Shuffle should only be required if you started out with 1-token-per-node.  In that case, your
existing ranges are evenly divided num_tokens ways, and so should be exceptionally consistent
with one another (assuming of course that the existing ranges were evenly sized).  The shuffle
op merely "shuffles" the ranges you have to (random) other nodes in the cluster.
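
One way to look for the "evenly divided" pattern is to compare the gaps between consecutive tokens on each node. The Python sketch below uses only the tokens visible in the quoted `nodetool ring' excerpt; the "(etc)" rows are elided there, so this is three tokens per node and can only hint at the pattern, not confirm it. The helper name `gaps` is illustrative.

```python
# Tokens copied from the quoted `nodetool ring` excerpt. Rows hidden behind
# "(etc)" in the original are not available, so each list has three entries.
ring_excerpt = {
    "10.28.205.125": [-3026347817059713363, -3026276684526453414, -3026205551993193465],
    "10.28.205.126": [-9187343239835811840, -9151314442816847872, -9115285645797883904],
    "10.28.205.127": [-9223372036854775808, 36028797018963967, 72057594037927935],
}


def gaps(tokens):
    """Return the differences between consecutive tokens in sorted order."""
    ordered = sorted(tokens)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


spacing = {node: gaps(tokens) for node, tokens in ring_excerpt.items()}
for node, node_gaps in spacing.items():
    label = "even" if len(set(node_gaps)) == 1 else "uneven"
    print(node, node_gaps, label)
```

In this excerpt the gaps on .125 are identical to each other, and so are the gaps on .126 (each exactly 2**55). The .127 rows start at the minimum token and then jump, so they are evidently not adjacent entries. Identical gaps are what splitting an existing range would produce; randomly generated tokens would be very unlikely to line up this way.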



If this cluster were started from scratch with num_tokens = 256, then a total of 768 tokens
would have been randomly generated from within the murmur3 hash-space.  Random assignment
isn't perfect, but with 768 tokens (256 per node), it should work out to be reasonably close
on average.
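
To get a feel for "reasonably close", here is a small Python simulation. It is a sketch under stated assumptions, not how Cassandra itself assigns tokens: tokens are drawn uniformly from the signed 64-bit range used by Murmur3Partitioner, each token owns the range back to the previous token on the ring, and replication is ignored (primary ownership only). The function name and seed are hypothetical.

```python
import random

MIN_TOKEN = -2**63       # lowest Murmur3Partitioner token
MAX_TOKEN = 2**63 - 1    # highest Murmur3Partitioner token
RING_SIZE = 2**64


def simulate_ownership(nodes=3, num_tokens=256, seed=0):
    """Return each node's fraction of the ring for randomly drawn tokens."""
    rng = random.Random(seed)
    ring = sorted(
        (rng.randint(MIN_TOKEN, MAX_TOKEN), node)
        for node in range(nodes)
        for _ in range(num_tokens)
    )
    owned = [0] * nodes
    # The first token's range wraps around from the last token on the ring.
    previous = ring[-1][0] - RING_SIZE
    for token, node in ring:
        owned[node] += token - previous
        previous = token
    return [amount / RING_SIZE for amount in owned]


shares = simulate_ownership()
for node, share in enumerate(shares):
    print(f"node {node}: {share:.1%} of the ring")
```

Running it with a few different seeds gives a sense of how much spread random assignment produces with 256 tokens per node, which can then be compared against the ownership figures reported above.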



TL;DR  What Aaron Said(tm)  In the absence of rack/dc aware replication, your allocation is
suspicious.



> I have about 11M rows of data in this keyspace and none of them are
> exceptionally long ... it's data pulled from Oracle and didn't include
> any BLOB, etc.



[ ... ]



> From: aaron morton [mailto:aaron@thelastpickle.com]
> Sent: Tuesday, February 05, 2013 3:41 PM
> To: user@cassandra.apache.org
> Subject: Re: unbalanced ring
>
> Use nodetool status with vnodes
> http://www.datastax.com/dev/blog/upgrading-an-existing-cluster-to-vnodes
>
> The different load can be caused by rack affinity, are all the nodes
> in the same rack? Another simple check is have you created some very big rows?



> On 6/02/2013, at 8:40 AM, Stephen.M.Thompson@wellsfargo.com wrote:
>
> So I have three nodes in a ring in one data center.  My configuration
> has num_tokens: 256 set and initial_token commented out.  When I look at
> the ring, it shows me all of the token ranges of course, and basically
> identical data for each range on each node.  Here is the Cliff's Notes
> version of what I see:
>

> [root@Config3482VM2 apache-cassandra-1.2.0]# bin/nodetool ring
>
> Datacenter: 28
> ==========
> Replicas: 1
>
> Address         Rack        Status State   Load            Owns                Token
>                                                                                9187343239835811839
> 10.28.205.125   205         Up     Normal  2.85 GB         33.69%              -3026347817059713363
> 10.28.205.125   205         Up     Normal  2.85 GB         33.69%              -3026276684526453414
> 10.28.205.125   205         Up     Normal  2.85 GB         33.69%              -3026205551993193465
>   (etc)
> 10.28.205.126   205         Up     Normal  1.15 GB         100.00%             -9187343239835811840
> 10.28.205.126   205         Up     Normal  1.15 GB         100.00%             -9151314442816847872
> 10.28.205.126   205         Up     Normal  1.15 GB         100.00%             -9115285645797883904
>   (etc)
> 10.28.205.127   205         Up     Normal  69.13 KB        66.30%              -9223372036854775808
> 10.28.205.127   205         Up     Normal  69.13 KB        66.30%              36028797018963967
> 10.28.205.127   205         Up     Normal  69.13 KB        66.30%              72057594037927935
>   (etc)
>

> So at this point I have a number of questions.   The biggest question is of
> Load.  Why does the .125 node have 2.85 GB, .126 has 1.15 GB, and .127
> has only 0.000069 GB?  These boxes are all comparable and all
> configured identically.
>
> partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>
> I'm sorry to ask so many questions - I'm having a hard time finding
> documentation that explains this stuff.





--

Eric Evans

Acunu | http://www.acunu.com | @acunu
