cassandra-user mailing list archives

From Alex ...@aca-o.com>
Subject Re: Assassinate fails
Date Fri, 16 Aug 2019 07:16:31 GMT
Hello Alain, 

It has been a long time: I had to wait for a quiet week to try this. I finally
did, and I thought I'd give you some feedback.

Short reminder: one of the nodes of my 3.9 cluster died and I replaced
it. But it still appeared in nodetool status, on one node with a "null"
host_id and on another with the same host_id as its replacement.
nodetool assassinate failed and I could not decommission or remove any
other node in the cluster.

Basically, after taking a backup and preparing another cluster in case
anything went wrong, I ran:

DELETE FROM system.peers WHERE peer = '192.168.1.18'; 

and restarted Cassandra on the two nodes that still saw the zombie node.

After the first restart, the cassandra system.log was filled with: 

java.lang.NullPointerException: null
WARN  [MutationStage-2] 2019-08-15 15:31:44,735
AbstractLocalAwareExecutorService.java:169 - Uncaught exception on
thread Thread[MutationStage-2,5,main]: 

So... I restarted again. The error disappeared. I ran a full repair and
everything seems to be back in order. I could decommission a node
without problem. 
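
As a minimal sketch of the check behind this diagnosis, the snippet below flags peers whose host_id is missing or is shared with another peer. It assumes the rows were copied by hand from the output of SELECT peer, host_id FROM system.peers; on one node. The function name find_suspect_peers and the third example row are hypothetical, and nothing here connects to Cassandra or deletes anything.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# (peer, host_id) pairs as seen in system.peers on one node.
# The first two rows come from this thread; the third is a made-up
# healthy node added only for illustration.
PEERS_ROWS = [
    ("192.168.1.18", "09d24557-4e98-44c3-8c9d-53c4c31066e1"),
    ("192.168.1.22", "09d24557-4e98-44c3-8c9d-53c4c31066e1"),
    ("192.168.1.20", "00000000-0000-0000-0000-000000000001"),  # hypothetical
]


def find_suspect_peers(
    rows: Iterable[Tuple[str, Optional[str]]]
) -> Dict[str, List[str]]:
    """Return peers with a null host_id and peers sharing a host_id."""
    by_host_id = defaultdict(list)
    null_peers = []
    for peer, host_id in rows:
        if host_id is None:
            null_peers.append(peer)
        else:
            by_host_id[host_id].append(peer)
    # A host_id should identify exactly one node; keep only the groups
    # where two or more peers claim the same one.
    shared = sorted(
        peer
        for peers in by_host_id.values()
        if len(peers) > 1
        for peer in peers
    )
    return {"null_host_id": sorted(null_peers), "shared_host_id": shared}


if __name__ == "__main__":
    print(find_suspect_peers(PEERS_ROWS))
```

The sketch only reports candidates. Deciding which of the flagged addresses is the dead node and which is its live replacement still has to be done by hand before any DELETE.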

Thanks for your help!

Alex 

On 05.04.2019 10:55, Alain RODRIGUEZ wrote:

> Alex, 
> 
>> Well, I tried : rolling restart did not work its magic.
> 
> Sorry to hear that, and sorry for misleading you. My faith in the magical power of the
> rolling restart went down a bit, but I still think it was worth a try :D.
> 
>> @ Alain : In system.peers I see both the dead node and its replacement with the same ID:
>>
>>  peer         | host_id
>> --------------+--------------------------------------
>>  192.168.1.18 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
>>  192.168.1.22 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
>> 
>> Is this expected?
>>
>> If I cannot fix this, I think I will add new nodes and remove, one by one, the nodes
>> that show the dead node in nodetool status.
> 
> Well, no. This is clearly not good or expected I would say. 
> 
> TL;DR - SUGGESTED FIX: 
> What I would try in order to fix this is removing this row. It *should* be safe, but
> that's only my opinion, and only on the condition that you remove *only* the 'ghost/dead'
> nodes. Any mistake here would probably be costly. Again, be aware you're in a sensitive
> area when messing with system tables. Think twice, check twice, take a copy of the
> SSTables/a snapshot. Then I would go for it and observe changes on one node first. If no
> harm is done, continue to the next node.
> 
> Considering the old node is '192.168.1.18', I would run this on all nodes (maybe after
> testing on one node) to keep it simple, or just run it on the nodes that show the ghost
> node(s):
> 
> "DELETE FROM SYSTEM.PEERS WHERE PEER = '192.168.1.18';" 
> 
> You may need to restart, though I think you won't even need to. I have good hope that
> this will finally fix your issue with no harm.
> 
> MORE CONTEXT - IDEA OF THE PROBLEM: 
> The above is clearly an issue, I would say, and most probably the source of your troubles
> here. The problem is that I don't understand how it happened. From where I stand, this kind
> of bug should not happen anymore in Cassandra (I have not seen anything similar for a while).
> 
> I would blame: 
> - A corner case scenario (unlikely, system tables have been rather solid for a while). Or
>   maybe you are using an old C* version. It *might* be related to this (or similar):
>   https://issues.apache.org/jira/browse/CASSANDRA-7122
> - A really weird operation (a succession of actions might have put you in this state,
>   but it is hard for me to say which)
> - KairosDB? I don't know it or what it does. Might it be less reliable than Cassandra
>   and have led to this issue? Maybe, I have no clue once again.
> 
> RISK OF THIS OPERATION AND CURRENT SITUATION: 
> Also, I *think* the current situation is relatively 'stable' (maybe just some hints being
> stored for nothing, and possibly not being able to add more nodes or change the schema?).
> This is the kind of situation where 'rushing' a solution without understanding the impacts
> and risks can make things go terribly wrong. Take the time to analyse my suggested fix,
> maybe read the ticket above, etc. When you're ready, back up the data, prepare the DELETE
> command carefully and observe how one node reacts to the fix first.
> 
> As you can see, I think it's the 'good' fix, but I'm not comfortable with this operation.
> And you should not be either :).
> To put arbitrary numbers on my feeling about this operation, I would say there is a 95%
> chance that it does no harm and a 90% chance that it fixes the issue, but if something goes
> wrong, if we are in the 5% where it does not go well, there is a non-negligible probability
> that you will destroy your cluster in a very bad way. I guess what I'm trying to say is: be
> careful, watch your step, make sure you remove the right row, and ensure it works on one
> node with no harm.
> I shared my feeling and I would try this fix. But it's ultimately your responsibility,
> and I won't be behind the machine when you fix it. None of us will.
> 
> Good luck ! :) 
> 
> C*heers, 
> 
> ----------------------- 
> Alain Rodriguez - alain@thelastpickle.com 
> France / Spain 
> 
> The Last Pickle - Apache Cassandra Consulting 
> http://www.thelastpickle.com 
> 
> On Thu, 4 Apr 2019 at 19:29, Kenneth Brotman <kenbrotman@yahoo.com.invalid> wrote:
> 
>> Alex,
>> 
>> According to this TLP article (http://thelastpickle.com/blog/2018/09/18/assassinate.html):
>> 
>> Note that the LEFT status should stick around for 72 hours to ensure all nodes come
>> to the consensus that the node has been removed. So please don't rush things if that's
>> the case. Again, it's only cosmetic.
>> 
>> If a gossip state will not forget a node that was removed from the cluster more than
>> a week ago:
>>
>> - Login to each node within the Cassandra cluster.
>> - Download jmxterm on each node, if nodetool assassinate is not an option.
>> - Run nodetool assassinate, or the unsafeAssassinateEndpoint command, multiple times
>>   in quick succession.
>>   - I typically recommend running the command 3-5 times within 2 seconds.
>>   - I understand that sometimes the command takes time to return, so the "2 seconds"
>>     suggestion is less of a requirement than it is a mindset.
>>   - Also, sometimes 3-5 times isn't enough. In such cases, shoot for the moon and try
>>     20 assassination attempts in quick succession.
>> 
>> What we are trying to do is to create a flood of messages requesting all nodes completely
>> forget there used to be an entry within the gossip state for the given IP address. If each
>> node can prune its own gossip state and broadcast that to the rest of the nodes, we should
>> eliminate any race conditions that may exist where at least one node still remembers the
>> given IP address.
>>
>> As soon as all nodes come to agreement that they don't remember the deprecated node,
>> the cosmetic issue will no longer be a concern in any system.logs, nodetool describecluster
>> commands, nor nodetool gossipinfo output.
>> 
>> -----Original Message-----
>> From: Kenneth Brotman [mailto:kenbrotman@yahoo.com.INVALID] 
>> Sent: Thursday, April 04, 2019 10:40 AM
>> To: user@cassandra.apache.org
>> Subject: RE: Assassinate fails
>> 
>> Alex,
>> 
>> Did you remove the option JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=address_of_dead_node"
>> after the node started, and then restart the node again?
>> 
>> Are you sure there isn't a typo in the file?
>> 
>> Ken
>> 
>> -----Original Message-----
>> From: Kenneth Brotman [mailto:kenbrotman@yahoo.com.INVALID] 
>> Sent: Thursday, April 04, 2019 10:31 AM
>> To: user@cassandra.apache.org
>> Subject: RE: Assassinate fails
>> 
>> I see; system_auth is a separate keyspace.    
>> 
>> -----Original Message-----
>> From: Jon Haddad [mailto:jon@jonhaddad.com] 
>> Sent: Thursday, April 04, 2019 10:17 AM
>> To: user@cassandra.apache.org
>> Subject: Re: Assassinate fails
>> 
>> No, it can't.  As Alain (and I) have said, since the system keyspace
>> is local strategy, it's not replicated, and thus can't be repaired.
>> 
>> On Thu, Apr 4, 2019 at 9:54 AM Kenneth Brotman
>> <kenbrotman@yahoo.com.invalid> wrote:
>>> 
>>> Right, could be similar issue, same type of fix though.
>>> 
>>> -----Original Message-----
>>> From: Jon Haddad [mailto:jon@jonhaddad.com]
>>> Sent: Thursday, April 04, 2019 9:52 AM
>>> To: user@cassandra.apache.org
>>> Subject: Re: Assassinate fails
>>> 
>>> System != system_auth.
>>> 
>>> On Thu, Apr 4, 2019 at 9:43 AM Kenneth Brotman
>>> <kenbrotman@yahoo.com.invalid> wrote:
>>>> 
>>>> From Mastering Cassandra:
>>>> 
>>>> 
>>>> Forcing read repairs at consistency - ALL
>>>> 
>>>> The type of repair isn't really part of the Apache Cassandra repair paradigm
>>>> at all. When it was discovered that a read repair will trigger 100% of the time when a
>>>> query is run at ALL consistency, this method of repair started to gain popularity in the
>>>> community. In some cases, this method of forcing data consistency provided better results
>>>> than normal, scheduled repairs.
>>>>
>>>> Let's assume, for a second, that an application team is having a hard time
>>>> logging into a node in a new data center. You try to cqlsh out to these nodes, and notice
>>>> that you are also experiencing intermittent failures, leading you to suspect that the
>>>> system_auth tables might be missing a replica or two. On one node you do manage to connect
>>>> successfully using cqlsh. One quick way to fix consistency on the system_auth tables is to
>>>> set consistency to ALL, and run an unbound SELECT on every table, tickling each record:
>>>> 
>>>> use system_auth ;
>>>> consistency ALL;
>>>> consistency level set to ALL.
>>>> 
>>>> SELECT COUNT(*) FROM resource_role_permissons_index ;
>>>> SELECT COUNT(*) FROM role_permissions ;
>>>> SELECT COUNT(*) FROM role_members ;
>>>> SELECT COUNT(*) FROM roles;
>>>> 
>>>> This problem is often seen when logging in with the default cassandra user.
>>>> Within cqlsh, there is code that forces the default cassandra user to connect by querying
>>>> system_auth at QUORUM consistency. This can be problematic in larger clusters, and is
>>>> another reason why you should never use the default cassandra user.
>>>> 
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: Jon Haddad [mailto:jon@jonhaddad.com]
>>>> Sent: Thursday, April 04, 2019 9:21 AM
>>>> To: user@cassandra.apache.org
>>>> Subject: Re: Assassinate fails
>>>> 
>>>> Ken,
>>>> 
>>>> Alain is right about the system tables.  What you're describing only
>>>> works on non-local tables.  Changing the CL doesn't help with
>>>> keyspaces that use LocalStrategy.  Here's the definition of the system
>>>> keyspace:
>>>> 
>>>> CREATE KEYSPACE system WITH replication = {'class': 'LocalStrategy'}
>>>> AND durable_writes = true;
>>>> 
>>>> Jon
>>>> 
>>>> On Thu, Apr 4, 2019 at 9:03 AM Kenneth Brotman
>>>> <kenbrotman@yahoo.com.invalid> wrote:
>>>>>
>>>>> The trick below I got from the book Mastering Cassandra.  You have to
>>>>> set the consistency to ALL for it to work. I thought you guys knew that one.
>>>>>
>>>>>
>>>>>
>>>>> From: Alain RODRIGUEZ [mailto:arodrime@gmail.com]
>>>>> Sent: Thursday, April 04, 2019 8:46 AM
>>>>> To: user@cassandra.apache.org [1]
>>>>> Subject: Re: Assassinate fails
>>>>>
>>>>>
>>>>>
>>>>> Hi Alex,
>>>>>
>>>>>
>>>>>
>>>>> About the previous advice:
>>>>>
>>>>>
>>>>>
>>>>> You might have inconsistent data in your system tables.  Try setting
>>>>> the consistency level to ALL, then do a read query of system tables to force repair.
>>>>>
>>>>>
>>>>>
>>>>> System tables use the 'LocalStrategy', thus I don't think any repair
>>>>> would happen for the system.* tables, regardless of the consistency level you use. It
>>>>> should do no harm, but I really think it won't help.
>>>>>
>>>>>
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
>>>> For additional commands, e-mail: user-help@cassandra.apache.org

 

Links:
------
[1] http://cassandra.apache.org