cassandra-user mailing list archives

From Aaron Morton <aa...@thelastpickle.com>
Subject Re: node failure, and automatic decommission (or removetoken)
Date Mon, 28 Feb 2011 19:50:30 GMT
I thought there was more to it.

The steps for moving or removing nodes are outlined on the Operations wiki page, as you
probably know.

What approach are you considering for rebalancing the token distribution when removing a node?
For example, if you have 5 nodes and remove 1, the best long-term solution is to spread that
token range across the remaining 4. This will result in additional data streaming.
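
As a rough sketch of that rebalancing step (this assumes RandomPartitioner, whose token
space is [0, 2**127); it is not a turnkey script):

    # Evenly spaced tokens for a ring that shrank from 5 nodes to 4.
    # Assumes RandomPartitioner, whose tokens range over [0, 2**127).
    RING_SIZE = 2 ** 127

    def balanced_tokens(node_count):
        """One evenly spaced token per remaining node."""
        return [i * RING_SIZE // node_count for i in range(node_count)]

    for token in balanced_tokens(4):
        print(token)

Each new token would then be applied with nodetool move on the matching node, one node at
a time; each move streams its share of the departed range, which is the additional data
streaming mentioned above.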

My understanding is that Cassandra is designed for a relatively stable number of nodes in
the cluster, with the assumption that failures are generally transitory. The features for
handling permanent moves and removals are somewhat heavyweight and not designed to be used
frequently.

Hope that helps
Aaron
On 1/03/2011, at 2:22 AM, Mimi Aluminium <mimi.aluminium@gmail.com> wrote:

> Aaron,
> Thanks a lot,
> Actually I meant a larger number of nodes than 3, with a replication factor of 3.
> We are looking at a system that may shrink due to permanent failures, and that automatically
detects a failure and streams the failed node's range to other nodes in the cluster, so that
there are again 3 replicas.
> I understand there is no such script.
> Thanks
> Miriam
> 
> On Mon, Feb 28, 2011 at 11:51 AM, aaron morton <aaron@thelastpickle.com> wrote:
> AFAIK the general assumption is that you will want to repair the node manually, within
the GCGraceSeconds period. If this cannot be done, then nodetool decommission and removetoken
are the recommended approach. 
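> 
> As a minimal sketch of that flow, driving nodetool from Python (the host and token below
> are placeholders; the nodetool binary is assumed to be on the PATH):
> 
>     # Remove a dead node's token so the survivors take over its range;
>     # the streaming that follows restores the replication factor.
>     import subprocess
> 
>     def removetoken(host, token):
>         # Equivalent to: nodetool -h <host> removetoken <token>
>         subprocess.check_call(
>             ["nodetool", "-h", host, "removetoken", str(token)])
> 
>     # Hypothetical token of the failed node, as read from "nodetool ring".
>     removetoken("localhost", 28356863910078205288614550619314017621)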
> 
> In your example though, with 3 nodes and an RF of 3, your cluster can sustain a single
node failure and continue to operate at CL QUORUM for reads and writes. So there is no immediate
need to move data. 
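> 
> The quorum arithmetic behind that, as a quick sanity check:
> 
>     # CL QUORUM needs floor(RF / 2) + 1 replicas to respond.
>     rf = 3
>     quorum = rf // 2 + 1       # 2 replicas required
>     live_replicas = rf - 1     # one of the 3 nodes is down
>     assert live_replicas >= quorum  # reads and writes still succeed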
> 
> Does that help? 
> 
> Aaron
> 
> On 28 Feb 2011, at 07:41, Mimi Aluminium wrote:
> 
>> Hi,
>> I have a question about a tool or a wrapper that performs an automatic data move upon
node failure.
>> Assuming I have 3 nodes with a replication factor of 3: in case of one node failure,
does the third replica (the one that was previously located on the failed node) reappear
on one of the live nodes?
>> I am looking for something that is similar to Hinted Handoff, but with a viable replica
that can be read.
>> I know we can stream the data manually (using nodetool move or decommission), but
is there something automatic?
>> I also found an open ticket, 957, but was not sure it is what I am looking for.
>> Thanks
>> Miriam
