cassandra-user mailing list archives

From Alain RODRIGUEZ <arodr...@gmail.com>
Subject Re: Can I cancel a decommissioning procedure??
Date Wed, 05 Jun 2019 08:50:37 GMT
Sure, you're welcome, glad to hear it worked! =)

Thanks for letting us know/reporting this back here, it might matter for
other people as well.

C*heers!
Alain


On Wed, 5 Jun 2019 at 07:45, William R <triole@protonmail.com> wrote:

> Eventually after the reboot the decommission was cancelled. Thanks a lot
> for the info!
>
> Cheers
>
>
> Sent with ProtonMail <https://protonmail.com> Secure Email.
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Tuesday, June 4, 2019 10:59 PM, Alain RODRIGUEZ <arodrime@gmail.com>
> wrote:
>
> > the issue is that the rest of the nodes in the cluster marked it as DL
> > (DOWN/LEAVING), that's why I am kinda stressed.. Let's see once it is up!
>
> The last information other nodes had is that this node is leaving, and
> down, that's expected in this situation. When the node comes back online,
> it should come back UN and 'quickly' other nodes should ACK it.
>
> During decommission, the node itself is responsible for streaming its data
> over. Streams were stopped as the node went down, and Cassandra won't remove
> the node unless data was streamed properly (or if you force the node out).
> I don't think that there is a decommission 'resume', and even less that it
> is enabled by default.
> Thus when the node comes back, the only possible option I see is a
> 'regular' start for that node and for the others to acknowledge that the
> node is up and not leaving anymore.
>
> The only consequence I expect (other than the node missing the latest
> data) is that other nodes might have some extra data due to the
> decommission attempts. If needed (streaming ran for a long time, or the
> data has no TTL), you can consider using 'nodetool cleanup -j 2' on all
> nodes other than the one that went down, to remove the extra data (and
> free space).
>
> > I did restart, still waiting to come up (normally takes ~ 30 minutes)
>
> 30 minutes to start the nodes sounds like a long time to me, but well,
> that's another topic.
>
> C*heers
> -----------------------
> Alain Rodriguez - alain@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On Tue, 4 Jun 2019 at 18:31, William R <triole@protonmail.com> wrote:
>
>> Hi Alain,
>>
>> Thank you for your comforting reply :) I did restart, still waiting for it
>> to come up (normally takes ~ 30 minutes). The issue is that the rest of the
>> nodes in the cluster marked it as DL (DOWN/LEAVING), that's why I am kinda
>> stressed.. Let's see once it is up!
>>
>>
>> Sent with ProtonMail <https://protonmail.com> Secure Email.
>>
>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>> On Tuesday, June 4, 2019 7:25 PM, Alain RODRIGUEZ <arodrime@gmail.com>
>> wrote:
>>
>> Hello William,
>>
>> > At the moment we keep the node down before we figure out a way to cancel
>> > that.
>>
>> Off the top of my head, a restart of the node is the way to go to cancel
>> a decommission.
>> I think you did the right thing and your safety measure is also the fix
>> here :).
>>
>> Did you try to bring it up again?
>>
>> If it's really critical, you can probably test that quickly with ccm (
>> https://github.com/riptano/ccm), tlp-cluster (
>> https://github.com/thelastpickle/tlp-cluster) or simply with any
>> existing dev/test environment if you have any available with some data.
>>
>> Good luck with that, PEBKAC issues are the worst. You can do a lot of
>> damage, you could always have avoided it, and it makes you feel terrible.
>> It doesn't sound that bad in your case though, I've seen (and done)
>> worse ¯\_(ツ)_/¯. It's hard to fight PEBKACs, we, operators, are
>> unpredictable :).
>> Nonetheless, and to go back to something more serious, there are ways to
>> limit the amount and possible scope of those, such as good practices,
>> testing and automation.
>>
>> C*heers,
>> -----------------------
>> Alain Rodriguez - alain@thelastpickle.com
>> France / Spain
>>
>> The Last Pickle - Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>>
>>
>> On Tue, 4 Jun 2019 at 17:55, William R <triole@protonmail.com.invalid>
>> wrote:
>>
>>> Hi,
>>>
>>> There was an accidental decommissioning of a node and we really need to
>>> cancel it.. is there any way? At the moment we keep the node down before
>>> we figure out a way to cancel that.
>>>
>>> Thanks
>>>
>>
>>
>
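
The check-then-cleanup sequence discussed in this thread (restart the node,
confirm the others see it as UN rather than DL, then run 'nodetool cleanup
-j 2' on every other node) could be sketched roughly as below. This is only
an illustration, not something posted in the thread: the addresses, the
SAMPLE text (hand-written, not real cluster output) and the helper names are
all hypothetical, and it assumes remote 'nodetool -h <host>' access is
enabled in your environment.

```python
import re

# Letters from the legend printed by `nodetool status`:
# first letter = status (Up/Down), second = state (Normal/Leaving/...).
STATUS = {"U": "Up", "D": "Down"}
STATE = {"N": "Normal", "L": "Leaving", "J": "Joining", "M": "Moving"}

# A node row starts with a two-letter code followed by the address.
_ROW = re.compile(r"^([UD][NLJM])\s+(\S+)")


def parse_status(text):
    """Map each node address to its two-letter code, e.g. 'UN' or 'DL'."""
    nodes = {}
    for line in text.splitlines():
        match = _ROW.match(line.strip())
        if match:
            code, address = match.groups()
            nodes[address] = code
    return nodes


def describe(code):
    """Spell out a two-letter code, e.g. 'DL' -> 'Down/Leaving'."""
    return f"{STATUS[code[0]]}/{STATE[code[1]]}"


def cleanup_commands(nodes, restarted):
    """Build one cleanup command per node other than the restarted one."""
    return [
        ["nodetool", "-h", address, "cleanup", "-j", "2"]
        for address in sorted(nodes)
        if address != restarted
    ]


# Hand-written sample in the general shape of `nodetool status` output.
# Addresses, loads and host IDs are placeholders (assumption).
SAMPLE = """\
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns  Host ID  Rack
UN  10.0.0.1   1.1 GiB    256     ?     aaaa     rack1
DL  10.0.0.2   1.0 GiB    256     ?     bbbb     rack1
UN  10.0.0.3   1.2 GiB    256     ?     cccc     rack1
"""

if __name__ == "__main__":
    nodes = parse_status(SAMPLE)
    for address, code in nodes.items():
        print(address, describe(code))
    for command in cleanup_commands(nodes, restarted="10.0.0.2"):
        print(" ".join(command))
```

The commands are only built, not executed. In practice you would probably
wait until the restarted node shows as UN before handing them to something
like subprocess.run, one node at a time.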
