incubator-cassandra-user mailing list archives

From Anand Somani <meatfor...@gmail.com>
Subject Urgent:!! Re: Need to maintenance on a cassandra node, are there problems with this process
Date Fri, 19 Aug 2011 18:09:23 GMT
OK, so we just lost the data on that node. We are building the RAID on it, but
once it is up, what is the best way to bring it back into the cluster?

   - just let it come up and run nodetool repair
   - copy data from another node and then run nodetool repair
      -  do I still need to run repair immediately if I copy the data? I would
      like to schedule repair for later, during non-peak hours.

Like I said, I have 500G and am on 0.7.4, with a 3-node cluster and RF=3.


On Thu, Aug 18, 2011 at 9:42 PM, aaron morton <aaron@thelastpickle.com> wrote:

> You should get on 0.7.8 while you are doing this; this is a pretty good
> reason:
> https://github.com/apache/cassandra/blob/cassandra-0.7.8/CHANGES.txt#L58
>
>  Never done a read repair on this cluster before, is that a problem?
>
> Potentially.
> Repair will ensure that your data is distributed, and that deletes don't
> mysteriously come back to life:
> http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds
>
> Personally I would get a repair to complete before I started this process.
>
> You may want to make sure everything is compacted as best it can be
> beforehand; see some of the other threads about repair using a lot of space.
>
> * use nodetool to change the compaction threshold down to 2 for the CFs
> * trigger a minor compaction using nodetool flush
> * wait and monitor using nodetool compactionstats
>
> Then do a repair, repairing one CF at a time and starting with the smallest CF.
> Monitor disk space and
> nodetool compactionstats
> then
> nodetool netstats
>
>
> If you have the network space I would just move the files and then put them
> back...
>
> * drain
> * copy the /var/lib/cassandra/data and saved_caches dirs
> * copy the yaml
> * blast away
> * put things back in  place
> * start up and run repair
>
> I know you have RF 3 and 3 nodes. I'm being cautious. If you don't have
> space the current approach is fine.
>
> You may want to disable Hinted Handoff while you are doing this as you are
> going to run repair anyway when the node comes back.
>
> Cheers
>
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 19/08/2011, at 11:57 AM, Anand Somani wrote:
>
> Hi,
>
> version - 0.7.4
> cluster size = 3
> RF = 3.
> data size on a node ~500G
>
> I want to do some disk maintenance on a cassandra node, so the process that
> I came up with is
>
>    - drain this node
>    - back up the system data space
>    - rebuild the disk partition
>    - copy data from another node
>    - copy data from the backed up system data
>    - restart node
>    - run nodetool repair
>
> Is this process sane? Never done a read repair on this cluster before, is
> that a problem? Should I run it per CF? Would it help if I did this before
> bringing the node down?
>
> Any pointers or things to worry about?
>
> Thanks
> Anand
>
>
>
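
The advice in the quoted reply to repair one CF at a time, starting with the
smallest, can be sketched roughly as below. This is only an illustration: the
keyspace name, CF names and sizes are hypothetical, and the argument order
"nodetool -h <host> repair <keyspace> <cf>" is an assumption that should be
checked against the nodetool help output for the version in use. The function
only builds the command lists; it does not run anything.

```python
# Sketch: order column families smallest-first and build one repair command per CF.
# All names and sizes here are hypothetical; the nodetool argument order is assumed.

def repair_plan(host, keyspace, cf_sizes):
    """Return one nodetool argv list per CF, smallest CF first.

    cf_sizes maps a column family name to its on-disk size in bytes.
    """
    ordered = sorted(cf_sizes.items(), key=lambda item: item[1])
    return [["nodetool", "-h", host, "repair", keyspace, cf] for cf, _ in ordered]


if __name__ == "__main__":
    # Hypothetical sizes for a node holding roughly 500G in total.
    sizes = {"Events": 400 * 2**30, "Users": 20 * 2**30, "Index": 80 * 2**30}
    for argv in repair_plan("localhost", "MyKeyspace", sizes):
        # Run each command by hand, checking disk space, compactionstats and
        # netstats before moving on to the next CF.
        print(" ".join(argv))
```

Printing the commands rather than executing them leaves room to watch disk
space and nodetool compactionstats / netstats between CFs, as suggested above.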
