Hello,

 

I have a question about counters, replication, and the ReplicateOnWriteStage.

 

I’ve recently turned on a new CF which uses a counter column.

 

We have a three-DC setup running Cassandra 1.2.4 with vnodes, on machines with hex-core processors and 32 GB of memory.

DC 1 - 9 nodes with RF 3

DC 2 - 3 nodes with RF 2

DC 3 - 3 nodes with RF 2

 

DC 1 receives most of the updates to this counter column, roughly 3k per second.

 

I’ve disabled all client reads while I sort out this issue.

Disk utilization is very low.

Memory is plentiful (while not reading).

Schema:

CREATE TABLE cf1 (
  uid uuid,
  id1 int,
  id2 int,
  id3 int,
  ct counter,
  PRIMARY KEY (uid, id1, id2, id3)
) WITH …

 

Three of the machines in DC 1 are reporting very high CPU load.

Looking at nodetool tpstats, there is a large number of pending ReplicateOnWriteStage tasks, but only on those machines.

 

Why would only three of the machines be reporting this?

Assuming the data is distributed by the uid (partition key) value, the load should be even across the cluster, yeah?

Am I missing something about how distributed counters work?
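To make the even-load assumption concrete, here is a simplified Python model of a hypothetical 9-node ring like DC 1, with RF 3 and an assumed 256 random vnode tokens per node. It is a sketch under stated assumptions, not Cassandra's real placement code: MD5 stands in for the partitioner hash, and replicas are chosen by walking clockwise to the next three distinct nodes (SimpleStrategy-style, ignoring racks and NetworkTopologyStrategy). It compares uniformly random `uid` values with a hypothetical case where every update targets one `uid`.

```python
import bisect
import hashlib
import random
import uuid
from collections import Counter

# Hypothetical ring modelled on DC 1: 9 nodes, RF 3, 256 vnode tokens per node (assumed).
NODES = [f"node{i}" for i in range(1, 10)]
VNODES_PER_NODE = 256
RF = 3


def token(key: bytes) -> int:
    # Stand-in partitioner hash: MD5 read as a 128-bit integer (not Cassandra's exact algorithm).
    return int.from_bytes(hashlib.md5(key).digest(), "big")


def build_ring(rng: random.Random) -> list[tuple[int, str]]:
    # Every node gets VNODES_PER_NODE random tokens; the ring is sorted by token.
    ring = [(rng.getrandbits(128), node) for node in NODES for _ in range(VNODES_PER_NODE)]
    ring.sort()
    return ring


def replicas(ring, tokens, key: bytes, rf: int = RF) -> list[str]:
    # The first token >= the key's token owns it; keep walking clockwise until rf distinct nodes are found.
    i = bisect.bisect_left(tokens, token(key)) % len(ring)
    chosen: list[str] = []
    while len(chosen) < rf:
        node = ring[i][1]
        if node not in chosen:
            chosen.append(node)
        i = (i + 1) % len(ring)
    return chosen


def replica_load(ring, uids) -> Counter:
    # Count how many replica writes each node receives for a stream of counter updates.
    tokens = [t for t, _ in ring]
    load: Counter = Counter()
    for uid in uids:
        for node in replicas(ring, tokens, uid.bytes):
            load[node] += 1
    return load


rng = random.Random(42)
ring = build_ring(rng)
uniform = [uuid.UUID(int=rng.getrandbits(128)) for _ in range(30_000)]
hot = [uniform[0]] * 30_000  # hypothetical: every update targets the same uid

print(sorted(replica_load(ring, uniform).values()))  # 9 counts summing to 90,000 (3 replicas x 30,000)
print(replica_load(ring, hot))  # exactly 3 nodes, 30,000 each
```

In this model the load is only spread across all nodes when updates are spread across many `uid` values; every update for a single partition key lands on the same RF replicas.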

 

Is changing the consistency level (CL) to ONE fine if I’m not too worried about 100% consistency?
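As a rough guide to what the CL changes, the sketch below computes how many replica acknowledgements a write waits for at each level, using the replication factors listed above (DC 1: 3, DC 2: 2, DC 3: 2). It assumes NetworkTopologyStrategy semantics, where QUORUM is a majority of the sum of all DCs' RFs and LOCAL_QUORUM a majority of the local DC's RF. It models only the acknowledgement arithmetic; the update still reaches every replica whatever the CL.

```python
# Replication factors per data centre, from the setup described above.
RF_BY_DC = {"DC1": 3, "DC2": 2, "DC3": 2}


def quorum(rf: int) -> int:
    # A quorum is a strict majority of the replicas.
    return rf // 2 + 1


def required_acks(cl: str, local_dc: str = "DC1") -> int:
    # Number of replica acknowledgements a write at this CL waits for.
    total_rf = sum(RF_BY_DC.values())
    if cl == "ONE":
        return 1
    if cl == "LOCAL_QUORUM":
        return quorum(RF_BY_DC[local_dc])
    if cl == "QUORUM":
        return quorum(total_rf)
    if cl == "EACH_QUORUM":
        # A quorum is required in every DC; the total is returned for comparison.
        return sum(quorum(rf) for rf in RF_BY_DC.values())
    if cl == "ALL":
        return total_rf
    raise ValueError(f"consistency level not modelled: {cl}")


for cl in ("ONE", "LOCAL_QUORUM", "QUORUM", "EACH_QUORUM", "ALL"):
    print(cl, required_acks(cl))
# ONE 1, LOCAL_QUORUM 2, QUORUM 4, EACH_QUORUM 6, ALL 7
```

With these RFs, QUORUM (4 of 7) cannot be satisfied by DC 1's three replicas alone, so each QUORUM write waits on at least one remote DC, whereas LOCAL_QUORUM (2) and ONE (1) can complete within DC 1.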

 

 

Thanks,

Chris