Hello Simon,

Sorry if the question has already been answered.

It probably has been answered here (multiple times, I'm sure), but I don't mind taking a moment to repeat it :).

About the why:

This difference is expected. It can be due to multiple factors such as:
- Different compaction states on distinct nodes
- Ongoing compaction and temporary SSTables
- Different number of tombstones evicted (somewhat related to the first point)
- Imbalances in schema/workload (not applicable here, since all nodes own 100% of the data)
- A low number of vnodes (which is good for many other reasons) has a negative impact on distribution (not applicable here: with 256 vnodes per node, data should be almost perfectly distributed)
- Any snapshots?
- ... (others that don't come to mind right now...)

Anyway, to answer your question more precisely:

Is it OK to have differences between nodes ?

Yes, with these proportions it is perfectly OK. The nodes hold a similar dataset, and I imagine queries are well distributed. The situation seems normal; at least, I see nothing wrong in this `nodetool status` output.
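To put a number on "these proportions", here is a minimal Python sketch that computes how far each node's load is from the cluster mean, using the three Load values from the `nodetool status` output in the question. The node labels (`node1`, `node2`, `node3`) and the helper name `load_spread` are placeholders of mine, not anything Cassandra provides.

```python
# Load values (in GB) copied from the `nodetool status` output in the question.
# The node labels are placeholders; the original output does not show addresses.
loads_gb = {"node1": 9.14, "node2": 8.54, "node3": 8.92}


def load_spread(loads):
    """Return the mean load, each node's % deviation from the mean,
    and the (max - min) spread expressed as a % of the mean."""
    mean = sum(loads.values()) / len(loads)
    deviations = {name: (value - mean) / mean * 100 for name, value in loads.items()}
    spread = (max(loads.values()) - min(loads.values())) / mean * 100
    return mean, deviations, spread


mean, deviations, spread = load_spread(loads_gb)
print(f"mean load: {mean:.2f} GB")
for name, pct in deviations.items():
    print(f"{name}: {pct:+.1f}% from mean")
print(f"max-min spread: {spread:.1f}% of mean")
```

With these values, every node is within about 4% of the mean, and the largest and smallest nodes differ by roughly 7% of the mean, which is the kind of gap that differing compaction states alone can plausibly explain.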

Alain Rodriguez - alain@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting

On Wed, 29 May 2019 at 08:09, Simon ELBAZ <selbaz@linagora.com> wrote:


Sorry if the question has already been answered.

When `nodetool status` is run on a 3-node cluster (replication factor: 3), the load is not equal across the nodes.

# nodetool status opush
Datacenter: datacenter1
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns (effective)  Host ID                               Rack
UN  9,14 GB    256     100,0%            989589e8-9fcf-4c2f-85e9-c0599ac872e5  rack1
UN  8,54 GB    256     100,0%            42223dd0-1adf-433c-810d-8bc87f0d3af4  rack1
UN  8,92 GB    256     100,0%            1cecacc3-c301-4ae9-a71e-1a1a944d5731  rack1

Is it OK to have differences between nodes ?

Thanks for your answer.