cassandra-user mailing list archives

From Alain RODRIGUEZ <>
Subject Re: Little question
Date Thu, 06 Oct 2016 12:59:54 GMT
Hi Rubén,

Good luck with your adopted Cassandra cluster :-).

Some thoughts:

You are struggling with balancing your cluster; there is a lot of
documentation about how the Cassandra architecture works.

Some insight to help you understand what your setup is while reading it:

- You are *not* using vnodes (1 token range per node)
- You are using the RandomPartitioner (both settings show up in
cassandra.yaml, see the sketch below).
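
If you want to double check, this is roughly what the relevant part of
cassandra.yaml looks like (the values below are only an illustration, not
taken from your cluster):

partitioner: org.apache.cassandra.dht.RandomPartitioner
initial_token: 0        # a single token per node => no vnodes
# num_tokens is absent (or set to 1); with vnodes enabled it would be e.g. 256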

About your current issue, I'll use an OODA loop (Observe / Orient / Decide /
Act) as the content is quite long; it helps me structure my ideas, improves
completeness, and will hopefully make it easier to read on your side. It's
up to you whether to follow the Decide / Act parts of this plan.


When using 5 nodes, *no* vnodes and the random partitioner, the tokens
(initial_token in cassandra.yaml) in use should be:

- 0
- 34028236692093846346337460743176821145
- 68056473384187692692674921486353642290
- 102084710076281539039012382229530463435
- 136112946768375385385349842972707284580

For 6 nodes:

- 0
- 28356863910078205288614550619314017621
- *56713727820156410577229101238628035242*
- 85070591730234615865843651857942052863
- 113427455640312821154458202477256070484
- 141784319550391026443072753096570088105

I used a token calculator to get these, but there are many tools like this
around, just pick one :-).
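
For the record, with the random partitioner these are just evenly spaced
points in the token range, so you can also compute them yourself; a minimal
sketch, using python from the command line:

$ python -c '
N = 6                      # number of nodes
step = (2**127) // N       # RandomPartitioner tokens live in 0..2**127
for i in range(N):
    print(i * step)
'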


So it looks like your setup was built to run with 6 nodes and as of now only
has 5.

Your cluster is then *not* well balanced. This is a cluster where a node
(60.206) was added recently and another node, probably sitting on token
'56713727820156410577229101238628035242', was removed recently. When that
node was removed the balance was broken:

> datacenter1 rack1 Up Normal 262.12 GB 33.33%
> 85070591730234615865843651857942052863

This node is holding a percentage of the data twice as big as initially
planned. With a replication factor of 3, 2 other nodes are holding more data
than expected as well. This cluster needs to be balanced to avoid having 3 of
those nodes more loaded (size and throughput) than the other 2. Cassandra can
live with it, not sure about your servers. If using the same hardware
everywhere, it makes sense to try balancing it.

Other notes:

- Using 'nodetool status <keyspace>' instead of 'nodetool ring' or
'nodetool status' is a good habit you might want to pick up, as RF and
ownership are defined at the keyspace level. I believe this has no impact
on the current issue.

- As you added a new node recently, some old nodes might still be holding
data they no longer own. To address this you need to run a
'nodetool cleanup' on *all* the nodes *except* the last one you added.

This latter point, combined with the imbalance, would explain why the new
node holds such a different amount of data.


So I tend to agree with you that some data should be cleaned:

> Is that difference data that is not cleaned up, such as TTL-expired cells
> or tombstoned data?

But I would say that nodes are holding data outside of their primary and
replica ranges. We need to get rid of this extra data that is no longer
used anyway.

We also want to balance the cluster if using the same hardware on all the
nodes.

Here is what I would probably do:

First, fix the balance. This means you need to either:

- "move" (nodetool move) nodes around in order to have a well balanced 5
node cluster (you might want to read more about it if going this way), or

- add a 6th node with
'initial_token: 56713727820156410577229101238628035242' (see the sketch
after this list).
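
To make the second option concrete, here is a sketch of the relevant line of
the new node's cassandra.yaml (everything else, seeds, addresses and so on,
is of course specific to your environment), plus the 'nodetool move' syntax
in case you end up preferring the first option; the token given to 'move' is
simply one of the 5-node values listed above:

# cassandra.yaml on the new, 6th node
initial_token: 56713727820156410577229101238628035242
# auto_bootstrap defaults to true, so the node streams its data while joining

# Option 1 instead, run against an existing node to relocate it:
$ nodetool move 34028236692093846346337460743176821145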

As 'nodetool move' is quite a heavy operation involving a few nodes and a
lot of streaming, to be performed on each node except one, I would
recommend adding a node, even more so if you are still learning about
Cassandra.

Once you're happy with the cluster balance, run "nodetool cleanup" on all
the nodes. It's a local operation that can be run simultaneously on many /
all the nodes, as long as there are resources available, as it is a bit I/O
and CPU intensive.
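
For example (the keyspace argument is optional):

$ nodetool cleanup                # clean everything on that node
$ nodetool cleanup <keyspace>     # or keyspace by keyspace to spread the load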

Then check the balance again (nodetool status <keyspace>). Due to the
compaction state you can have discrepancies, but it should be far better.
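
For example (with a made-up keyspace name):

$ nodetool status my_keyspace

With RF=3 and 6 well balanced nodes, each node should then report an
effective ownership of around 50%.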


Alain Rodriguez - @arodream -

The Last Pickle - Apache Cassandra Consulting

2016-10-04 15:57 GMT+02:00 Ruben Cardenal <>:

> Hi,
> We've inherited quite a big amazon infrastructure from a company we've
> purchased. It has an ancient and obsolete implementation of services,
> the worst (and most expensive) of them all being a 5-node cluster of
> Cassandra (RF=3). I'm new to Cassandra, and yes, I'm working my way
> through the docs.
> I was told that Amazon asked them a few months ago to reboot one of their
> servers (it had been turned on for so long that Amazon had to make some
> changes and needed it rebooted), so they had to add a new node to the
> cluster. If you query nodetool as of now, it shows:
> $ nodetool ring
> Note: Ownership information does not include topology, please specify a
> keyspace.
> Address DC Rack Status State Load Owns Token
> 141784319550391026443072753096570088105
> datacenter1 rack1 Up Normal 263.06 GB 16.67% 0
> datacenter1 rack1 Up Normal 253.31 GB 16.67%
> 28356863910078205288614550619314017621
> datacenter1 rack1 Up Normal 262.12 GB 33.33%
> 85070591730234615865843651857942052863
> datacenter1 rack1 Up Normal 264.28 GB 16.67%
> 113427455640312821154458202477256070484
> datacenter1 rack1 Up Normal 65.15 GB 16.67%
> 141784319550391026443072753096570088105
> What puzzles me is the last line. It belongs to the last added node, the
> new one I talked about. While it's holding the same amount of data (16.67%)
> as 3 other nodes, the Load is about 4 times lower. What does this mean?
> Is that difference data that is not cleaned up, such as TTL-expired cells
> or tombstoned data?
> Thanks and excuse me if I'm asking something stupid.
> Rubén.
