Thank you for the information.
I have increased the RF, and I think the increase we have seen in CPU load etc. is due to the counter CFs, which are almost write-only (they are read a few times a day). The load increase is noticeable, but not a problem.
Repair went fine. But I noticed that after I increased the RF for a counter CF, took one node down (for completely unrelated reasons) and then ran repair, I would get multiple lines like this in system.log:
"invalid counter shard detected; (X, Y, Z) and (X, Y, Z2) differ only in count; will pick highest to self-heal; this indicates a bug or corruption generated a bad counter shard"
I guess this is because the counters got out of sync while the node was down, and repair simply has to pick the highest? In my case this will be (more or less) correct, since the sync problem happened because of a downed node, which means _all_ increments happened on the other node, so that node will have the correct number? I am just curious, as some minor errors in the counters would be no problem for us.
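To make my reading of that log line concrete, here is a minimal Python sketch of the rule as I understand it. It is not Cassandra's actual code: the Shard type and the reconcile function are hypothetical names, and I am assuming that (X, Y, Z) in the message stands for (node id, clock, count).

```python
from typing import NamedTuple


class Shard(NamedTuple):
    """Hypothetical model of one counter shard: (X, Y, Z) in the log line."""
    node_id: str  # assumed to be X
    clock: int    # assumed to be Y
    count: int    # assumed to be Z / Z2


def reconcile(a: Shard, b: Shard) -> Shard:
    """Pick the surviving shard when two replicas hold a shard for the same node id."""
    if a.node_id != b.node_id:
        raise ValueError("shards from different node ids are not reconciled here")
    if a.clock != b.clock:
        # Ordinary case: the shard with the higher clock is the newer one.
        return a if a.clock > b.clock else b
    # Same id and same clock but a different count: the case the log line
    # complains about. Keep the higher count, as the message says.
    return a if a.count >= b.count else b


# Two replicas disagree only in count, so the higher count survives.
winner = reconcile(Shard("node-a", 5, 10), Shard("node-a", 5, 12))
print(winner.count)
```

If this reading is right, picking the highest can only over-count or stay correct relative to the lower shard, which would fit my case where all increments went to the node that stayed up.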
----- Original Message -----
To:<email@example.com>, "Vegard Berget" <firstname.lastname@example.org>
Sent:Fri, 14 Jun 2013 17:20:26 -0700
Subject:Re: Changing replication factor
On Mon, Jun 10, 2013 at 6:04 AM, Vegard Berget <email@example.com> wrote:
> If one increases the replication factor of a keyspace and then does a repair,
> how will this affect the performance of the affected nodes? Could we risk
> the nodes being (more or less) unresponsive while repair is going on?
Repair is a relatively heavyweight activity (the heaviest a Cassandra
node can do!) which requires significant headroom in terms of CPU,
heap memory and disk space. It is possible that nodes could become
unavailable transiently during the repair, but unless they are already
very busy they should not become completely unresponsive. For one
thing, both compaction and streaming respect throttles which are
designed to minimize the impact of the streaming/compaction workload
resulting from repair.
> The nodes I am speaking of contain ~100 GB of data.
This is a relatively small amount of data per node, which makes the
impact of Repair less severe.
> Also, some of the keyspaces for which I am considering increasing the
> replication factor contain Counter Column Families (currently RF=1). I think
> I have read that adding replication to counter CFs will affect performance
> negatively, is this correct?
Per Sylvain (one of the primary authors of the Counters codebase):
"For counters, it's a little bit different. At RF=3, for each insert,
one node is doing a write *and* a read, while the two other nodes are
only doing a write. So given that the time a read takes is
non-negligible, you should see a simple improvement at RF=3 compared to
RF=1, because each node gets 1/3 of the reads (involved in the counter
write) it would get if it was the only replica. Now if the writes were
negligible compared to the read time, then yes, you would see roughly a
3x increase. But while writes are still faster than reads in Cassandra,
reads are now fairly fast too (but all this depends on other factors
like how much the caches help, etc.), so it will likely be less than a
3x increase. Should be noticeable though."
I interpret the above to mean that RF=3 is actually slightly *faster*
for Counters than RF=1.
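As a rough back-of-the-envelope check of that arithmetic, here is a small Python sketch. It is my own simplification, not anything from Sylvain or the Cassandra codebase: it assumes every node is a replica for every counter (node count equals RF), that each node performs every write but only 1/RF of the reads, and the cost units and function names are made up for illustration.

```python
def per_node_cost(write_cost: float, read_cost: float, rf: int) -> float:
    """Average work one node does per counter insert in this simplified model.

    Every replica performs the write, but only one replica per insert
    performs the read, so each node sees 1/rf of the reads.
    """
    if rf < 1:
        raise ValueError("rf must be at least 1")
    return write_cost + read_cost / rf


def improvement(write_cost: float, read_cost: float, rf: int) -> float:
    """Per-node cost at RF=1 divided by per-node cost at the given RF."""
    return per_node_cost(write_cost, read_cost, 1) / per_node_cost(write_cost, read_cost, rf)


# Hypothetical cost units: a negligible write cost approaches the 3x limit,
# while comparable write and read costs give a smaller gain.
print(improvement(write_cost=0.0, read_cost=1.0, rf=3))
print(improvement(write_cost=1.0, read_cost=1.0, rf=3))
```

In this model the gain is bounded above by the RF and shrinks as the write cost becomes comparable to the read cost, which matches the "less than a 3x increase" in the quote. It ignores caches, network overhead and clusters with more nodes than the RF.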