cassandra-user mailing list archives

From aaron morton <>
Subject Re: problem with bootstrap
Date Wed, 09 Mar 2011 20:18:18 GMT
The definition of "down" is important here. 

Down refers to a node that has joined the ring, so the other nodes know of its existence
and the range it is storing, but which is not responding to gossip messages. While it is down
it is still considered an endpoint. The error you and Patrik saw refers to the number of endpoints
in the ring, not the number of Up nodes. When doing dev I have a 2-node cluster on my laptop
with rf=2, and it's fine to bring the nodes in the cluster up one at a time. 

The issue I think you and Patrik are seeing occurs when you *remove* nodes from the ring.
The ring does not know if they are up or down. E.g. you have a ring of 3 nodes, and add a
keyspace with RF 3. Then for whatever reason 2 nodes are removed from the ring. When bootstrapping
a node into this ring it will fail because it detects the cluster does not have enough *endpoints*
(different to up nodes) to support the keyspace. 
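
To make the endpoints vs. up nodes distinction concrete, here is a toy sketch in Python. It is not Cassandra's actual code; the Node class and can_support_keyspace function are made-up names for illustration, and it assumes the check only counts nodes already in the ring (whether the bootstrapping node counts itself is the open question mentioned in this message).

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    up: bool  # True if the node is responding to gossip


def can_support_keyspace(ring, replication_factor):
    """Return True when the ring has at least RF endpoints.

    Every member of the ring counts as an endpoint, whether it is
    up or down. (Hypothetical helper, not a Cassandra API.)
    """
    return len(ring) >= replication_factor


# Three nodes in the ring; two are down but are still endpoints.
ring_with_down_nodes = [Node("n1", True), Node("n2", False), Node("n3", False)]

# The same cluster after two nodes were *removed* from the ring.
ring_after_removal = [Node("n1", True)]

print(can_support_keyspace(ring_with_down_nodes, 3))  # True
print(can_support_keyspace(ring_after_removal, 3))    # False
```

In this model, down nodes never cause the check to fail; only removing nodes from the ring does.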

One thing I want to double check is that the node doing the bootstrap considers itself
when calculating the number of endpoints. Some of the things you and Patrik said about bootstrapping
node 3 into a ring of 3 with rf=3 made me want to check. 

IMHO bootstrapping is the process of pulling data the *new* node is responsible for from other
nodes in the ring. This is different to joining the ring. 

Hope that helps.

On 9/03/2011, at 10:54 AM, mcasandra wrote:

> I think this is not the right functionality and it is really odd that you can't
> successfully bring it online without turning off bootstrap BUT you can bring
> it online by turning auto_bootstrap off and then running nodetool repair
> afterwards.
> Also, if that's the case then when one node goes down, say out of 3 one node
> goes down then should cassandra eject other nodes as well?? Why should
> cassandra exit on startup? That node could at least serve other keyspaces
> and alleviate load while returning errors to the client for those keyspaces
> where RF cannot be met. 
> As noted in my other post regarding a similar issue that I reported, I have
> also seen weird behaviour where I had 2 nodes down out of 3 and I was able
> to bring up one of the nodes but not the remaining one. You would think that
> no nodes will come up but I really think there is a problem here.
