OK, so we just lost the data on that node. We are rebuilding the RAID on it, but once it is up, what is the best way to bring it back into the cluster?
Like I said, I have 500G per node and am on 0.7.4, with a 3-node cluster and RF=3.

On Thu, Aug 18, 2011 at 9:42 PM, aaron morton <aaron@thelastpickle.com> wrote:
You should get on 0.7.8 while you are doing this; this is a pretty good reason: https://github.com/apache/cassandra/blob/cassandra-0.7.8/CHANGES.txt#L58

 Never done a read repair on this cluster before, is that a problem?
Repair will ensure that your data is distributed, and that deletes don't mysteriously come back to life: http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds
Personally I would get a repair to complete before I started this process.

You may want to make sure everything is compacted as best it can be beforehand; see some of the other threads about repair using a lot of space.

* use nodetool to change the compaction threshold down to 2 for the CF's
* trigger a minor compaction using nodetool flush
* wait and monitor using nodetool compactionstats
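
A rough sketch of those three steps as a shell script, in case it helps. The host, keyspace and CF names are placeholders, and the argument order for setcompactionthreshold (keyspace, CF, min, max) is what I'd expect on 0.7, so check nodetool's help output on your version first. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Placeholders (not from this thread): adjust to your cluster.
HOST="${HOST:-localhost}"
KEYSPACE="${KEYSPACE:-MyKeyspace}"
CFS="${CFS:-SmallCF LargeCF}"
MAX_THRESHOLD="${MAX_THRESHOLD:-32}"

# Print the command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

prep_compaction() {
  for cf in $CFS; do
    # Lower the minimum threshold to 2 so minor compactions trigger sooner.
    # Assumed argument order: <keyspace> <cf> <min> <max>.
    run nodetool -h "$HOST" setcompactionthreshold "$KEYSPACE" "$cf" 2 "$MAX_THRESHOLD"
    # Flushing writes a new SSTable, which can kick off a minor compaction.
    run nodetool -h "$HOST" flush "$KEYSPACE" "$cf"
  done
  # Re-run this until no compactions are pending.
  run nodetool -h "$HOST" compactionstats
}

prep_compaction
```

Remember to set the thresholds back to their previous values once the maintenance is done.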

Then do a repair, repairing one CF at a time, starting with the smallest CF. Monitor disk space and
nodetool compactionstats 
nodetool netstats
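
Something along these lines would walk the CFs one at a time. Treat it as a sketch: the host, keyspace, CF names and data directory are placeholders, and you need to list the CFs smallest first yourself. It prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Placeholders (not from this thread): adjust to your cluster.
HOST="${HOST:-localhost}"
KEYSPACE="${KEYSPACE:-MyKeyspace}"
CFS="${CFS:-SmallCF MediumCF LargeCF}"   # ordered smallest to largest
DATA_DIR="${DATA_DIR:-/var/lib/cassandra/data}"

# Print the command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

repair_all() {
  for cf in $CFS; do
    # Repair a single CF; stop if it fails so disk usage can be checked.
    run nodetool -h "$HOST" repair "$KEYSPACE" "$cf" || return 1
    # Check what the repair left behind before moving to the next CF.
    run df -h "$DATA_DIR"
    run nodetool -h "$HOST" compactionstats
    run nodetool -h "$HOST" netstats
  done
}

repair_all
```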

If you have the space on the network, I would just move the files off and then put them back:

* drain
* copy the /var/lib/cassandra/data and saved_caches dirs
* copy the yaml 
* blast away
* put things back in  place
* start up and run repair
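
As a sketch, the drain and copy steps could look like the script below. The backup location and the yaml path are assumptions (they depend on how Cassandra was installed), and the process should be stopped after the drain and before the copy. It prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Placeholders (not from this thread): adjust to your install.
HOST="${HOST:-localhost}"
CASSANDRA_DIR="${CASSANDRA_DIR:-/var/lib/cassandra}"
YAML="${YAML:-/etc/cassandra/cassandra.yaml}"   # assumed location
BACKUP_DIR="${BACKUP_DIR:-/mnt/backup/cassandra}"  # hypothetical network mount

# Print the command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

backup_node() {
  # Flush memtables and stop accepting writes; stop the process afterwards.
  run nodetool -h "$HOST" drain || return 1
  run mkdir -p "$BACKUP_DIR"
  # Keep the data and saved_caches dirs, preserving ownership and timestamps.
  run cp -a "$CASSANDRA_DIR/data" "$CASSANDRA_DIR/saved_caches" "$BACKUP_DIR/"
  run cp -a "$YAML" "$BACKUP_DIR/"
}

backup_node
```

Putting things back is the same copy in the other direction, followed by a start and nodetool repair.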

I know you have RF 3 and 3 nodes. I'm being cautious. If you don't have the space, the current approach is fine.

You may want to disable Hinted Handoff while you are doing this, as you are going to run repair anyway when the node comes back.


Aaron Morton
Freelance Cassandra Developer

On 19/08/2011, at 11:57 AM, Anand Somani wrote:


version - 0.7.4
cluster size = 3
RF = 3.
data size on a node ~500G

I want to do some disk maintenance on a Cassandra node, so the process that I came up with is:
  • drain this node
  • back up the system data space
  • rebuild the disk partition
  • copy data from another node
  • copy data from the backed up system data
  • restart node
  • run nodetool repair
Is this process sane? I have never done a read repair on this cluster before; is that a problem? Should I run it per CF? Would it help if I did this before bringing the node down?

Any pointers, or things to worry about?