When Cassandra reads, the entire CF is always read together; the pruning happens only at the hand-over to the client.
I'm curious... digging through the source, it looks like replicate on write triggers a read of the entire row, and not just the columns/supercolumns that are affected by the counter update. Is this the case? It would certainly explain why my inserts/sec decay over time and why the average insert latency increases over time. The strange thing is that I'm not seeing disk read IO increase over that same period, but that might be due to the OS buffer cache...
On another note, on a 5-node cluster, I'm only seeing 3 nodes with ReplicateOnWrite Completed tasks in nodetool tpstats output. Is that normal? I'm using RandomPartitioner...
Address     DC           Rack   Status  State   Load       Owns    Token
10.0.0.57   datacenter1  rack1  Up      Normal  2.26 GB    20.00%  0
10.0.0.56   datacenter1  rack1  Up      Normal  2.47 GB    20.00%  34028236692093846346337460743176821145
10.0.0.55   datacenter1  rack1  Up      Normal  2.52 GB    20.00%  68056473384187692692674921486353642290
10.0.0.54   datacenter1  rack1  Up      Normal  950.97 MB  20.00%  102084710076281539039012382229530463435
10.0.0.72   datacenter1  rack1  Up      Normal  383.25 MB  20.00%  136112946768375385385349842972707284580
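For what it's worth, the tokens themselves look evenly spaced, so I don't think the ring is unbalanced. Here is a quick sketch of how I checked that, assuming RandomPartitioner's token space runs from 0 to 2**127; the function name `evenly_spaced_tokens` is just something I made up for this check, not anything from Cassandra:

```python
# Assumption: RandomPartitioner tokens fall in the range 0 .. 2**127.
TOKEN_SPACE = 2 ** 127


def evenly_spaced_tokens(node_count):
    """Return node_count tokens, each floor(TOKEN_SPACE / node_count) apart."""
    step = TOKEN_SPACE // node_count
    return [i * step for i in range(node_count)]


# Tokens copied from the ring output above, in ring order.
ring_tokens = [
    0,
    34028236692093846346337460743176821145,
    68056473384187692692674921486353642290,
    102084710076281539039012382229530463435,
    136112946768375385385349842972707284580,
]

# Compare the ring's tokens against an even 5-way split of the token space.
print(evenly_spaced_tokens(5) == ring_tokens)
```

This only says the token assignment is even; it doesn't explain the difference in Load or why only three nodes show ReplicateOnWrite tasks.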
The nodes with ReplicateOnWrites are the 3 in the middle. The first node and last node both have a count of 0. This is a clean cluster, and I've been doing 3k ... 2.5k (decaying performance) inserts/sec for the last 12 hours. The last time this test ran, it went all the way down to 500 inserts/sec before I killed it.