incubator-cassandra-user mailing list archives

From Marco Matarazzo <marco.matara...@hexkeep.com>
Subject Re: Recovering from a faulty cassandra node
Date Tue, 19 Mar 2013 16:04:08 GMT
I'm still missing something, please excuse me.

Let's say, for example, that I have a 4-node cluster with a replication factor of 2. One node
goes down and I have to reinstall it. In the meantime the cluster still works, and data is
read and written.

After a while the node is reinstalled, the same IP is used, and the Cassandra configuration is
restored (but the data is not). Wouldn't it be enough to just start Cassandra, and maybe run a repair?

On which node, and at which point in this scenario, should I use decommission and/or removenode?
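
To make the scenario concrete, here is a minimal Python sketch of a toy ring. It is not Cassandra's placement code: it assumes four single-token nodes (the names n1..n4 are hypothetical) and a simple rule where each range lives on its owner plus the next node(s) clockwise. Under those assumptions, with a replication factor of 2 every range still has one live replica while a single node is down, which is the data a repair would later copy back.

```python
def replicas_for(owner_index, nodes, rf):
    """Return the rf nodes holding the range owned by nodes[owner_index]."""
    return [nodes[(owner_index + i) % len(nodes)] for i in range(rf)]


def live_replicas(nodes, rf, down):
    """Map each range owner to the replicas of its range that are still up."""
    return {
        nodes[i]: [n for n in replicas_for(i, nodes, rf) if n not in down]
        for i in range(len(nodes))
    }


nodes = ["n1", "n2", "n3", "n4"]  # hypothetical node names
survivors = live_replicas(nodes, rf=2, down={"n2"})
for owner, alive in survivors.items():
    print(owner, "->", alive)
```

In this toy model, calling `live_replicas(nodes, rf=1, down={"n2"})` leaves the range owned by n2 with no live replica at all, so nothing would remain to repair from.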

On 19 Mar 2013, at 16:56, Alain RODRIGUEZ <arodrime@gmail.com> wrote:

> Decommission doesn't need an RF > 1, since it is run from the node being removed from
> the cluster. Before leaving, that node gives its data to the next node in the ring, which
> will be responsible for it.
> Removenode (at least if it works like the old removetoken) uses the replicas to dispatch
> the data to its new nodes. So yes, this one needs an RF > 1, but it has the advantage that
> it can be used when a node is totally unreachable.
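
The rule of thumb quoted above can be written down as a small decision function. This is only a sketch of the reasoning in this thread, not of what nodetool itself checks; the function name `removal_command` is hypothetical, and the returned strings are just the nodetool subcommand names mentioned here.

```python
def removal_command(node_reachable, replication_factor):
    """Pick a nodetool subcommand for taking a node out of the ring."""
    if node_reachable:
        # The leaving node streams its own data, so RF = 1 is enough.
        return "decommission"
    if replication_factor > 1:
        # The surviving replicas supply the data for the dead node's ranges.
        return "removenode"
    # Unreachable node and no other replica: no copy left to stream from.
    return None


print(removal_command(node_reachable=False, replication_factor=2))
```

A return value of `None` marks the case where the node is unreachable and RF = 1, so no replica holds the data that would have to be redistributed.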
> 
> But anyway, having an RF = 1 is pretty bad, since you have a SPOF (Single Point Of Failure)
> which can be avoided by C* with a higher RF.
> 
> Alain
> 
> 
> 2013/3/19 Marco Matarazzo <marco.matarazzo@hexkeep.com>
> Is nodetool removenode / decommission actually needed when the RF is > 1? What does
> it do, exactly?
> 
> On 19 Mar 2013, at 16:45, Alain RODRIGUEZ <arodrime@gmail.com> wrote:
> 
> > In 1.2, you may want to use nodetool removenode if your server is broken or unreachable;
> > otherwise I guess nodetool decommission remains the right way to remove a node.
> > (http://www.datastax.com/docs/1.2/references/nodetool)
> >
> > Once this node is out, run rm -rf /yourpath/cassandra/* on this server, change the
> > configuration if needed (not sure about the auto_bootstrap param), and start Cassandra on
> > that node again. It should join the ring as a new node.
> >
> > Good luck.
> >
> >
> > 2013/3/19 Hiller, Dean <Dean.Hiller@nrel.gov>
> > Since you "cleared" out that node, it IS the replacement node.
> >
> > Dean
> >
> > From: Jabbar Azam <ajazam@gmail.com>
> > Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> > Date: Tuesday, March 19, 2013 9:29 AM
> > To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> > Subject: Re: Recovering from a faulty cassandra node
> >
> > Hello Dean.
> >
> > I'm using vnodes, so I can't specify a token. In addition, I can't follow the replace
> > node docs because I don't have a replacement node.
> >
> >
> > On 19 March 2013 15:25, Hiller, Dean <Dean.Hiller@nrel.gov> wrote:
> > I have not done this yet, but from all that I have read, your best option is
> > to follow the replace node documentation, which I believe says you need to:
> >
> >
> >  1.  Have the token be the same BUT add 1 to it so it doesn't think it's the same
> >      computer
> >  2.  Have the bootstrap option set or something so streaming takes effect.
> >
> > I would however test all of that in QA to make sure it works. If you have QUORUM
> > reads/writes, a good part of that test would be to take node X down after your node Y is
> > back in the cluster, to make sure reads/writes are working on the node you fixed. You just
> > need to make sure node X shares one of the token ranges of node Y AND your writes/reads
> > are in that token range.
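
When planning the test described above, it helps to know how many replicas a QUORUM request needs. A minimal sketch of that arithmetic, using the usual definition quorum = floor(RF / 2) + 1; the helper names are hypothetical:

```python
def quorum(rf):
    """Replicas that must respond for a QUORUM read or write."""
    return rf // 2 + 1


def quorum_survives(rf, replicas_down):
    """True if enough replicas of a range remain up to satisfy QUORUM."""
    return rf - replicas_down >= quorum(rf)


for rf in (2, 3):
    print(rf, quorum(rf), quorum_survives(rf, replicas_down=1))
```

With RF = 2 the quorum is 2, so taking node X down would make QUORUM requests fail for any range it shares with node Y. With RF = 3 the quorum is also 2, and one replica can be down while QUORUM requests still succeed.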
> >
> > Dean
> >
> > From: Jabbar Azam <ajazam@gmail.com>
> > Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> > Date: Tuesday, March 19, 2013 8:51 AM
> > To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> > Subject: Recovering from a faulty cassandra node
> >
> > Hello,
> >
> > I am using Cassandra 1.2.2 on a 4-node test cluster with vnodes. I spent over
> > a week inserting lots of data into the cluster. Towards the end of the process one of the
> > nodes had a hardware fault.
> >
> > I have fixed the hardware fault, but the file system on that node is corrupt, so
> > I'll have to reinstall the OS and Cassandra.
> >
> > I can think of two ways of reintegrating the host into the cluster
> >
> > 1) shrink the cluster to three nodes and add the node into the cluster
> >
> > 2) Add the node into the cluster without shrinking
> >
> > I'm not sure of the best approach to take and I'm not sure how to achieve each step.
> >
> > Can anybody help?
> >
> >
> > --
> > Thanks
> >
> >  Jabbar Azam
> >
> >
> >
> 
> --
> Marco Matarazzo
> == Hex Keep ==
> 
> W: http://www.hexkeep.com
> M: +39 347 8798528
> E: marco.matarazzo@hexkeep.com
> 
> "You can learn more about a man
>   in one hour of play
>   than in one year of conversation." - Plato
> 
> 
> 
> 
> 

--
Marco Matarazzo
== Hex Keep ==

W: http://www.hexkeep.com
M: +39 347 8798528
E: marco.matarazzo@hexkeep.com

"You can learn more about a man
  in one hour of play
  than in one year of conversation." - Plato




