cassandra-user mailing list archives

Subject Re: Re: Urgent:!! Re: Need to maintenance on a cassandra node, are there problems with this process
Date Fri, 19 Aug 2011 18:39:10 GMT
Hi -

From what I understand, Peter's recommendation should work for you. Both
approaches below have worked for me. There is no need to copy anything by
hand onto the new node; bootstrap/repair does that for you. From the Wiki:

If a node goes down entirely, then you have two options:

(Recommended approach) Bring up the replacement node with a new IP address,
set its initial token to (failed node's token) - 1, and set AutoBootstrap to
true in cassandra.yaml (storage-conf.xml for 0.6 or earlier). This will
place the replacement node in front of the failed node. Then the bootstrap
process begins. While this process runs, the node will not receive reads
until it has finished. Once bootstrap is finished on the replacement node,
run nodetool removetoken once, supplying the token of the dead node, and
then run nodetool cleanup on each node. You can obtain the dead node's token
by running nodetool ring on any live node, unless there was some kind of
outage and the other nodes came back up but the dead one did not; in that
case, you can retrieve the token from the live nodes' system tables.

(Alternative approach) Bring up a replacement node with the same IP and
token as the old one, and run nodetool repair. Until the repair process is
complete, clients reading only from this node may get no data back. Using a
higher ConsistencyLevel on reads will avoid this.
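
The recommended approach above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not a tested procedure: it assumes RandomPartitioner with integer tokens on a ring of size 2**127 (hence the wrap-around), and the function names, host addresses, and example token are made up for illustration. It only builds the nodetool command lines the Wiki text describes; it does not run them.

```python
from typing import List

# Assumption: RandomPartitioner, where tokens are integers on a ring of size 2**127.
RING_SIZE = 2 ** 127


def replacement_token(dead_token: int) -> int:
    """Initial token for the replacement node: the dead node's token minus one."""
    return (dead_token - 1) % RING_SIZE


def post_bootstrap_commands(dead_token: int, hosts: List[str]) -> List[List[str]]:
    """Commands to run after the replacement node has finished bootstrapping.

    `hosts` is a hypothetical list of all live nodes, including the replacement.
    """
    # removetoken is run once, against any single live node.
    commands = [["nodetool", "-h", hosts[0], "removetoken", str(dead_token)]]
    # cleanup is run on every node.
    commands += [["nodetool", "-h", host, "cleanup"] for host in hosts]
    return commands


if __name__ == "__main__":
    dead = 2 ** 126  # hypothetical token of the dead node
    print("initial token for replacement:", replacement_token(dead))
    for cmd in post_bootstrap_commands(dead, ["10.0.0.1", "10.0.0.2", "10.0.0.3"]):
        print(" ".join(cmd))
```

The dead node's token itself would come from nodetool ring on a live node, as described above.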

Anand Somani wrote:
> To be specific about the lost data: we lost one replica; the other 2 nodes
> still have replicas.
>
> I am running reads/writes at quorum. At this point I have stopped my
> clients from talking to this node. So if that is the case I can
> potentially just run nodetool repair (without changing IP). But would it be
> better if I copied over the data/mykeyspace from another replica and then
> ran repair?
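
As a rough check on why quorum reads stay safe while one replica is empty, here is a minimal Python sketch of the quorum arithmetic. It assumes a replication factor of 3, matching the three replicas described above, and the function names are made up for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must answer a QUORUM read or write."""
    return replication_factor // 2 + 1


def read_always_sees_data(replication_factor: int, empty_replicas: int,
                          read_replicas: int) -> bool:
    """True if every possible read set includes a replica that still has data."""
    if not 0 <= empty_replicas <= replication_factor:
        raise ValueError("empty_replicas must be between 0 and replication_factor")
    if not 1 <= read_replicas <= replication_factor:
        raise ValueError("read_replicas must be between 1 and replication_factor")
    # Worst case: the read contacts every empty replica first.
    return read_replicas > empty_replicas


if __name__ == "__main__":
    rf = 3  # assumption: three replicas, one of which lost its data
    print(quorum(rf))                                    # 2
    print(read_always_sees_data(rf, 1, quorum(rf)))      # True: quorum read
    print(read_always_sees_data(rf, 1, 1))               # False: single-replica read
```

With one empty replica out of three, a read that contacts two replicas always reaches at least one that still holds the data, while a read served by a single replica may hit only the empty one.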

> On Fri, Aug 19, 2011 at 11:20 AM, Peter Schuller wrote:

> > ok, so we just lost the data on that node. are building the raid on it, but
> > once it is up what is the best way to bring it back in the cluster

> You're saying the raid failed and data is gone?

> > just let it come up and run nodetool repair
> > copy data from another node and then run nodetool repair,
> >
> > do I still need to run repair immediately if I copy the data? Want to
> > schedule repair for later during non peak hours?

> If data is gone, the safe way is to have it re-join the cluster:


> But note that in your case, since you've lost data (if I understand
> you), it's effectively a completely new node. That means you either
> want to switch its IP address and go for the "recommended" approach,
> or do the other option but that WILL mean the node is serving reads
> with incorrect data, violating the consistency. Depending on your
> application, this may or may not be the case.

> Unless it's a major problem for you, I suggest bringing it back in
> with a new IP address and treating it like a completely fresh
> replacement node. That probably decreases the risk of mistakes happening.

> As for the other stuff about repair in the e-mail you pasted: periodic
> repairs are part of regular cluster maintenance. See:


> --
> / Peter Schuller (@scode on twitter)
