Sweet, I 100% understand this now from these last few emails. It has always been a bit confusing.
Thanks,
Dean
From: Sylvain Lebresne
Reply-To: "user@cassandra.apache.org"
Date: Friday, March 1, 2013 4:36 AM
To: "user@cassandra.apache.org"
Subject: Re: -pr vs. no -pr
On Thu, Feb 28, 2013 at 11:39 PM, Hiller, Dean wrote:
Isn't it true that if I have 6 nodes (RF=3), I could run nodetool repair on just 2 nodes instead of using nodetool repair -pr?
Yes, it is true.
To be more precise, in your case you have 2 options:
1) doing a repair *without* -pr on 2 nodes (assuming you pick the correct 2 nodes; it's *not* just any 2 nodes)
2) doing a repair *with* -pr on the 6 nodes
Both of those cases would 1) repair the full ring and 2) do the same amount of work.
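To make the two options concrete, here is a minimal sketch in Python. It assumes a simplified ring: one token per node (no vnodes), a single datacenter, and replicas placed on the next RF-1 nodes in ring order. Nodes are numbered 0..N-1 in ring order and "range i" means node i's primary range. The function names are hypothetical and are not part of nodetool.

```python
def ranges_repaired(node, n_nodes, rf, primary_only):
    """Ranges touched by one repair run on `node` (range i = node i's primary range)."""
    if primary_only:
        # Models repair with -pr: only the node's primary range.
        return [node]
    # Models repair without -pr: every range the node holds a replica of,
    # i.e. its own range plus the ranges of its rf-1 predecessors on the ring.
    return [(node - k) % n_nodes for k in range(rf)]


def repair_plan(nodes, n_nodes, rf, primary_only):
    """Return (full ring repaired?, number of range repairs performed)."""
    work = [r for node in nodes
            for r in ranges_repaired(node, n_nodes, rf, primary_only)]
    full_ring = set(work) == set(range(n_nodes))
    return full_ring, len(work)


# Option 1: without -pr on 2 well-chosen nodes of a 6-node, RF=3 ring.
print(repair_plan([2, 5], 6, 3, primary_only=False))    # (True, 6)
# Option 2: with -pr on all 6 nodes.
print(repair_plan(range(6), 6, 3, primary_only=True))   # (True, 6)
```

Under these assumptions both plans cover the full ring with 6 range repairs each.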
What is the advantage of -pr then?
As it happens, your case is a special case: you have a number of nodes that is a multiple of your replication factor. If that weren't the case (say 5, 7 or 8 nodes with RF=3), then there is *no way* to repair the whole cluster *without* -pr without doing *more* work than a repair *with* -pr on all nodes.
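The claim about 5, 7 or 8 nodes can be checked by brute force. The sketch below makes the same simplifying assumptions as any toy ring model (one token per node, single datacenter, replicas on the next RF-1 nodes in ring order) and counts work in "range repairs". The helper names are hypothetical.

```python
from itertools import combinations


def held_ranges(node, n_nodes, rf):
    """Ranges a node replicates: its own primary range plus its rf-1 predecessors'."""
    return {(node - k) % n_nodes for k in range(rf)}


def min_work_without_pr(n_nodes, rf):
    """Fewest range repairs needed to cover the ring using repair without -pr."""
    for count in range(1, n_nodes + 1):
        for nodes in combinations(range(n_nodes), count):
            covered = set().union(*(held_ranges(n, n_nodes, rf) for n in nodes))
            if len(covered) == n_nodes:
                # Each repaired node does rf range repairs.
                return count * rf


for n in (5, 6, 7, 8):
    # Repair with -pr on every node always costs exactly n range repairs.
    print(f"N={n}: without -pr >= {min_work_without_pr(n, 3)}, with -pr = {n}")
```

In this model the best schedule without -pr costs 6, 6, 9 and 9 range repairs for 5, 6, 7 and 8 nodes respectively, so it only matches -pr when N is a multiple of RF.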
So the advantages of -pr (which, btw, should be used for repairing the whole cluster, not when you want to rebuild a specific node) are:
1) it always does the minimum amount of work, while repair without -pr is wasteful if the number of nodes is not a multiple of the replication factor (no matter how smart you are at scheduling the repairs).
2) even if your number of nodes is a multiple of the replication factor, you still have to make sure you pick the right N/RF nodes to repair if you don't use -pr. If you don't pick the correct ones, you will not repair the full ring. With -pr it is much harder to shoot yourself in the foot: you have to run it on every node, period.
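As an illustration of point 2, here is a rough sketch of what "the right N/RF nodes" means, assuming a ring with one token per node, a single datacenter, and replicas on the next RF-1 nodes in ring order. Nodes are numbered in ring order; `pick_nodes` and `covers_ring` are hypothetical helpers, not nodetool features.

```python
def held_ranges(node, n_nodes, rf):
    """Ranges a node replicates: its own primary range plus its rf-1 predecessors'."""
    return {(node - k) % n_nodes for k in range(rf)}


def covers_ring(nodes, n_nodes, rf):
    """True if repairing `nodes` without -pr touches every range on the ring."""
    covered = set()
    for node in nodes:
        covered |= held_ranges(node, n_nodes, rf)
    return covered == set(range(n_nodes))


def pick_nodes(n_nodes, rf):
    """Every rf-th node in ring order; assumes n_nodes is a multiple of rf."""
    return list(range(rf - 1, n_nodes, rf))


print(pick_nodes(6, 3))            # [2, 5]
print(covers_ring([2, 5], 6, 3))   # True: nodes spaced RF apart
print(covers_ring([0, 1], 6, 3))   # False: adjacent nodes overlap and leave gaps
```

The valid picks are the ones spaced exactly RF apart in ring order, which is why the ring layout has to be known before skipping -pr.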
--
Sylvain