When Cassandra reads, the entire CF is always read together; the pruning happens only at the hand-over to the client.

I'm curious... digging through the source, it looks like replicate on write triggers a read of the entire row, and not just the columns/supercolumns that are affected by the counter update.  Is this the case?  It would certainly explain why my inserts/sec decay over time and why the average insert latency increases over time.  The strange thing is that I'm not seeing disk read IO increase over that same period, but that might be due to the OS buffer cache...

On another note, on a 5-node cluster, I'm only seeing 3 nodes with ReplicateOnWrite Completed tasks in nodetool tpstats output.  Is that normal?  I'm using RandomPartitioner...

Address         DC          Rack        Status State   Load            Owns    Token
                                                                               136112946768375385385349842972707284580
                datacenter1 rack1       Up     Normal  2.26 GB         20.00%  0
                datacenter1 rack1       Up     Normal  2.47 GB         20.00%  34028236692093846346337460743176821145
                datacenter1 rack1       Up     Normal  2.52 GB         20.00%  68056473384187692692674921486353642290
                datacenter1 rack1       Up     Normal  950.97 MB       20.00%  102084710076281539039012382229530463435
                datacenter1 rack1       Up     Normal  383.25 MB       20.00%  136112946768375385385349842972707284580

The nodes with ReplicateOnWrite tasks are the 3 in the middle.  The first and last nodes both have a count of 0.  This is a clean cluster, and I've been doing between 3k and 2.5k inserts/sec (decaying over time) for the last 12 hours.  The last time this test ran, it went all the way down to 500 inserts/sec before I killed it.
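
For what it's worth, the tokens in the ring output look evenly spaced, so token imbalance is probably not the reason only 3 nodes show ReplicateOnWrite tasks. A minimal Python sketch of that check, assuming the tokens were generated as i * (2**127 // 5) over RandomPartitioner's 0 .. 2**127 token range (the balanced_tokens helper is hypothetical, not part of Cassandra or nodetool):

```python
# Hypothetical helper: evenly spaced initial tokens for RandomPartitioner.
# Assumption: the token space runs from 0 to 2**127 and each node is given
# i * (2**127 // node_count), which is what the ring output appears to use.
def balanced_tokens(node_count):
    step = 2**127 // node_count
    return [i * step for i in range(node_count)]


# Print the tokens for a 5-node cluster so they can be compared
# against the Token column of the nodetool ring output.
for token in balanced_tokens(5):
    print(token)
```

The five values printed match the Token column above, which is consistent with each node owning 20.00% of the ring.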