OK, so we just lost the data on that node. We are rebuilding the RAID on it, but once it is up, what is the best way to bring it back into the cluster?
You should get on 0.7.8 while you are doing this; this is a pretty good reason: https://github.com/apache/cassandra/blob/cassandra-0.7.8/CHANGES.txt#L58

Potentially.

> Never done a read repair on this cluster before, is that a problem?

Repair will ensure that your data is distributed, and that deletes don't mysteriously come back to life: http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds

Personally I would get a repair to complete before I started this process.

You may want to make sure everything is compacted as best it can be beforehand; see some of the other threads about repair using a lot of space.

* use nodetool to change the compaction threshold down to 2 for the CFs
* trigger a minor compaction using nodetool flush
* wait and monitor using nodetool compactionstats

Then do a repair, one CF at a time, starting with the smallest CF. Monitor disk space and nodetool compactionstats, then nodetool netstats.

If you have the network space I would just move the files and then put them back:

* drain
* copy the /var/lib/cassandra/data and saved_caches dirs
* copy the yaml
* blast away
* put things back in place
* start up and run repair

I know you have RF 3 and 3 nodes. I'm being cautious. If you don't have space, the current approach is fine.

You may want to disable Hinted Handoff while you are doing this, as you are going to run repair anyway when the node comes back.

Cheers

On 19/08/2011, at 11:57 AM, Anand Somani wrote:

Hi,
version - 0.7.4
cluster size = 3
RF = 3.
data size on a node ~500G
I want to do some disk maintenance on a Cassandra node, and the process I came up with is listed below.
Is this process sane? I have never done a read repair on this cluster before; is that a problem? Should I run it per CF? Would it help if I did this before bringing the node down?
- drain this node
- back up the system data space
- rebuild the disk partition
- copy data from another node
- copy data from the backed up system data
- restart node
- run nodetool repair
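
For the backup step, one possible sketch in Python is shown here. It checks that the destination has enough free space before copying anything, which matters with roughly 500G per node. The function names `dir_size` and `backup_dirs` are made up for illustration, and the directories to pass in (for example the data and saved_caches directories) are assumptions that depend on the local install.

```python
import os
import shutil


def dir_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def backup_dirs(sources, dest_root):
    """Copy each source directory into dest_root after a free-space check.

    Hypothetical helper: raises RuntimeError if dest_root does not have
    room for everything, so a partial copy is never started.
    """
    needed = sum(dir_size(src) for src in sources)
    free = shutil.disk_usage(dest_root).free
    if needed > free:
        raise RuntimeError(
            "need %d bytes but only %d free in %s" % (needed, free, dest_root)
        )
    copied = []
    for src in sources:
        # Keep the last path component, e.g. .../data -> dest_root/data
        target = os.path.join(dest_root, os.path.basename(os.path.normpath(src)))
        shutil.copytree(src, target)
        copied.append(target)
    return copied
```

This should only be run after the node has been drained and stopped, so the files are not changing underneath the copy.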
Any pointers or things to worry about?
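
The pre-repair routine suggested in the reply (lower the compaction threshold to 2, flush, watch compactionstats, then repair one CF at a time starting with the smallest) could be scripted roughly as follows. This is only a sketch: `build_repair_plan` and `run_plan` are hypothetical names, the host, keyspace and CF sizes are placeholders, and the argument order for `setcompactionthreshold` (keyspace, CF, min, max) and the max value of 32 are assumptions that should be checked against `nodetool` help output for the installed version.

```python
import subprocess


def build_repair_plan(host, keyspace, cf_sizes, max_threshold=32):
    """Return a list of nodetool command lines (argv lists).

    cf_sizes maps column family name -> approximate on-disk size in bytes.
    """
    base = ["nodetool", "-h", host]
    # Smallest CF first, as suggested in the reply.
    cfs = sorted(cf_sizes, key=cf_sizes.get)
    plan = []
    for cf in cfs:
        # Assumed argument order: keyspace, CF, min threshold, max threshold.
        plan.append(base + ["setcompactionthreshold", keyspace, cf,
                            "2", str(max_threshold)])
    # The reply uses a flush to trigger minor compactions.
    plan.append(base + ["flush", keyspace])
    plan.append(base + ["compactionstats"])
    for cf in cfs:
        plan.append(base + ["repair", keyspace, cf])
        plan.append(base + ["compactionstats"])
        plan.append(base + ["netstats"])
    return plan


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

The plan does not wait for compactions to finish on its own. In practice an operator would run it step by step, checking compactionstats output and free disk space before starting each repair.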