cassandra-user mailing list archives

From Alain RODRIGUEZ <arodr...@gmail.com>
Subject Re: Assassinate fails
Date Thu, 29 Aug 2019 16:01:33 GMT
Hello Alex,

> long time  - I had to wait for a quiet week to try this. I finally did, I
> thought I'd give you some feedback.


Thanks for taking the time to share this; I imagine it will be useful to
other people around to know the end of the story ;-).

Glad this worked for you,

C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

On Fri, Aug 16, 2019 at 08:16, Alex <ml@aca-o.com> wrote:

> Hello Alain,
>
> long time  - I had to wait for a quiet week to try this. I finally did, I
> thought I'd give you some feedback.
>
> Short reminder: one of the nodes of my 3.9 cluster died and I replaced it.
> But it still appeared in nodetool status, on one node with a "null" host_id
> and on another with the same host_id of its replacement. nodetool
> assassinate failed and I could not decommission or remove any other node on
> the cluster.
>
> Basically, after backing up and preparing another cluster in case anything
> went wrong, I did:
>
> DELETE FROM system.peers WHERE peer = '192.168.1.18';
>
> and restarted cassandra on the two nodes still seeing the zombie node.
>
> After the first restart, the cassandra system.log was filled with:
>
> WARN  [MutationStage-2] 2019-08-15 15:31:44,735
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread
> Thread[MutationStage-2,5,main]:
> java.lang.NullPointerException: null
>
> So... I restarted again. The error disappeared. I ran a full repair and
> everything seems to be back in order. I could decommission a node without
> problem.
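>
> For anyone landing here later, the sequence above can be sketched as
> follows. This is only a sketch, not a verified runbook: the IP is this
> thread's dead node, and the service name in the restart command is an
> assumption that may differ on your system.
>
> ```shell
> # On each node that still lists the zombie node in 'nodetool status':
>
> # 1. Safety first: snapshot the system keyspace before touching it.
> nodetool snapshot system
>
> # 2. Remove the stale peers row for the dead node (192.168.1.18 here).
> cqlsh -e "DELETE FROM system.peers WHERE peer = '192.168.1.18';"
>
> # 3. Restart Cassandra (twice, in this report, to clear the NPEs).
> #    Service name is an assumption; adjust to your init system.
> sudo systemctl restart cassandra
>
> # 4. Once the cluster view is clean, run a full repair.
> nodetool repair --full
> ```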
>
> Thanks for your help !
>
> Alex
>
>
>
>
> On 05.04.2019 at 10:55, Alain RODRIGUEZ wrote:
>
> Alex,
>
>
>> Well, I tried : rolling restart did not work its magic.
>
>
> Sorry to hear that, and sorry for misleading you. My faith in the rolling
> restart's magical powers went down a bit, but I still think it was worth a
> try :D.
>
>> @ Alain : In system.peers I see both the dead node and its replacement
>> with the same ID :
>>    peer         | host_id
>>   --------------+--------------------------------------
>>    192.168.1.18 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
>>    192.168.1.22 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
>>
>> Is it expected ?
>>
>> If I cannot fix this, I think I will add new nodes and remove, one by
>> one, the nodes that show the dead node in nodetool status.
>>
> Well, no. This is clearly not good or expected, I would say.
>
> *tl;dr - Suggested fix:*
> What I would try to fix this is removing this row. It *should* be safe,
> but that's only my opinion, and on the condition that you remove *only*
> the 'ghost/dead' nodes. Any mistake here would probably be costly. Again,
> be aware that you're touching a sensitive part when messing with system
> tables. Think twice, check twice, and take a copy of the SSTables/a
> snapshot. Then I would go for it and observe the changes on one node
> first. If no harm is done, continue to the next node.
>
> Considering the old node is '192.168.1.18', I would run this on all nodes
> (maybe after testing on a node) to make it simple or just run it on nodes
> that show the ghost node(s):
>
> *"DELETE FROM system.peers WHERE peer = '192.168.1.18';"*
>
> You may need to restart, though I think you won't even need that. I have
> good hope that this will finally fix your issue with no harm.
>
> *More context - Idea of the problem:*
> The row above is clearly an issue, I would say, and most probably the
> source of your troubles here. The problem is that I lack understanding:
> from where I stand, this kind of bug should not happen anymore in
> Cassandra (I have not seen anything similar for a while).
>
> I would blame:
> - A corner case scenario (unlikely, system tables have been rather solid
> for a while). Or maybe you are using an old C* version; it *might* be
> related to this (or something similar):
> https://issues.apache.org/jira/browse/CASSANDRA-7122
> - A really weird operation (a succession of actions might have put you in
> this state, but it is hard for me to say what)
> - KairosDB? I don't know it or what it does. Might it be less reliable
> than Cassandra and have led to this issue? Maybe, but I have no clue once
> again.
>
> *Risk of this operation and current situation:*
> Also, I *think* the current situation is relatively 'stable' (maybe just
> some hints being stored for nothing, and possibly not being able to add
> more nodes or change the schema?). This is the kind of situation where
> 'rushing' a solution without understanding the impacts and risks can make
> things go terribly wrong. Take the time to analyse my suggested fix,
> maybe read the ticket above, etc. When you're ready, back up the data,
> prepare the DELETE command carefully, and observe how one node reacts to
> the fix first.
>
> As you can see, I think it's the 'right' fix, but I'm not comfortable with
> this operation. And you should not be either :).
> To share my feeling about this operation in arbitrary numbers: there is a
> 95% chance this does no harm and a 90% chance it fixes the issue, but if
> we are in the 5% where it does not go well, there is a non-negligible
> probability that you will break your cluster in a very bad way. I guess
> what I am trying to say is: be careful, watch your step, make sure you
> remove the right row, and ensure it works on one node with no harm.
> I have shared my feeling and I would try this fix. But it's ultimately
> your responsibility, and I won't be behind the machine when you fix it.
> None of us will.
>
> Good luck ! :)
>
> C*heers,
> -----------------------
> Alain Rodriguez - alain@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
>
> On Thu, Apr 4, 2019 at 19:29, Kenneth Brotman <kenbrotman@yahoo.com.invalid>
> wrote:
>
>> Alex,
>>
>> According to this TLP article
>> http://thelastpickle.com/blog/2018/09/18/assassinate.html :
>>
>> Note that the LEFT status should stick around for 72 hours to ensure all
>> nodes come to the consensus that the node has been removed. So please don't
>> rush things if that's the case. Again, it's only cosmetic.
>>
>> If a gossip state will not forget a node that was removed from the
>> cluster more than a week ago:
>>
>>     Login to each node within the Cassandra cluster.
>>     Download jmxterm on each node, if nodetool assassinate is not an
>> option.
>>     Run nodetool assassinate, or the unsafeAssassinateEndpoint command,
>> multiple times in quick succession.
>>         I typically recommend running the command 3-5 times within 2
>> seconds.
>>         I understand that sometimes the command takes time to return, so
>> the "2 seconds" suggestion is less of a requirement than it is a mindset.
>>         Also, sometimes 3-5 times isn't enough. In such cases, shoot for
>> the moon and try 20 assassination attempts in quick succession.
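>>
>> The "quick succession" advice above can be scripted rather than typed by
>> hand. A minimal sketch, assuming the dead node's IP from this thread and
>> default local JMX access for nodetool:
>>
>> ```shell
>> # Fire 5 assassinate attempts back to back at the zombie node's IP,
>> # to flood gossip with "forget this endpoint" messages.
>> for i in $(seq 1 5); do
>>   nodetool assassinate 192.168.1.18
>> done
>> ```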
>>
>> What we are trying to do is to create a flood of messages requesting all
>> nodes completely forget there used to be an entry within the gossip state
>> for the given IP address. If each node can prune its own gossip state and
>> broadcast that to the rest of the nodes, we should eliminate any race
>> conditions that may exist where at least one node still remembers the given
>> IP address.
>>
>> As soon as all nodes come to agreement that they don't remember the
>> deprecated node, the cosmetic issue will no longer be a concern in any
>> system.logs, nodetool describecluster commands, nor nodetool gossipinfo
>> output.
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Kenneth Brotman [mailto:kenbrotman@yahoo.com.INVALID]
>> Sent: Thursday, April 04, 2019 10:40 AM
>> To: user@cassandra.apache.org
>> Subject: RE: Assassinate fails
>>
>> Alex,
>>
>> Did you remove the option JVM_OPTS="$JVM_OPTS
>> -Dcassandra.replace_address=address_of_dead_node" after the node started,
>> and then restart the node again?
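>>
>> For reference, a sketch of what that line typically looks like in
>> cassandra-env.sh (the address here is a placeholder, not a value from
>> your cluster); it must be removed or commented out once the replacement
>> node has finished bootstrapping, before the next restart:
>>
>> ```shell
>> # cassandra-env.sh -- remove/comment after the replacement has bootstrapped:
>> JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=192.168.1.18"
>> ```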
>>
>> Are you sure there isn't a typo in the file?
>>
>> Ken
>>
>>
>> -----Original Message-----
>> From: Kenneth Brotman [mailto:kenbrotman@yahoo.com.INVALID]
>> Sent: Thursday, April 04, 2019 10:31 AM
>> To: user@cassandra.apache.org
>> Subject: RE: Assassinate fails
>>
>> I see; system_auth is a separate keyspace.
>>
>> -----Original Message-----
>> From: Jon Haddad [mailto:jon@jonhaddad.com]
>> Sent: Thursday, April 04, 2019 10:17 AM
>> To: user@cassandra.apache.org
>> Subject: Re: Assassinate fails
>>
>> No, it can't.  As Alain (and I) have said, since the system keyspace
>> is local strategy, it's not replicated, and thus can't be repaired.
>>
>> On Thu, Apr 4, 2019 at 9:54 AM Kenneth Brotman
>> <kenbrotman@yahoo.com.invalid> wrote:
>> >
>> > Right, could be similar issue, same type of fix though.
>> >
>> > -----Original Message-----
>> > From: Jon Haddad [mailto:jon@jonhaddad.com]
>> > Sent: Thursday, April 04, 2019 9:52 AM
>> > To: user@cassandra.apache.org
>> > Subject: Re: Assassinate fails
>> >
>> > System != system_auth.
>> >
>> > On Thu, Apr 4, 2019 at 9:43 AM Kenneth Brotman
>> > <kenbrotman@yahoo.com.invalid> wrote:
>> > >
>> > > From Mastering Cassandra:
>> > >
>> > >
>> > > Forcing read repairs at consistency – ALL
>> > >
>> > > The type of repair isn't really part of the Apache Cassandra repair
>> paradigm at all. When it was discovered that a read repair will trigger
>> 100% of the time when a query is run at ALL consistency, this method of
>> repair started to gain popularity in the community. In some cases, this
>> method of forcing data consistency provided better results than normal,
>> scheduled repairs.
>> > >
>> > > Let's assume, for a second, that an application team is having a hard
>> time logging into a node in a new data center. You try to cqlsh out to
>> these nodes, and notice that you are also experiencing intermittent
>> failures, leading you to suspect that the system_auth tables might be
>> missing a replica or two. On one node you do manage to connect successfully
>> using cqlsh. One quick way to fix consistency on the system_auth tables is
>> to set consistency to ALL, and run an unbound SELECT on every table,
>> tickling each record:
>> > >
>> > > use system_auth ;
>> > > consistency ALL;
>> > > consistency level set to ALL.
>> > >
>> > > SELECT COUNT(*) FROM resource_role_permissons_index ;
>> > > SELECT COUNT(*) FROM role_permissions ;
>> > > SELECT COUNT(*) FROM role_members ;
>> > > SELECT COUNT(*) FROM roles;
>> > >
>> > > This problem is often seen when logging in with the default cassandra
>> user. Within cqlsh, there is code that forces the default cassandra user to
>> connect by querying system_auth at QUORUM consistency. This can be
>> problematic in larger clusters, and is another reason why you should never
>> use the default cassandra user.
>> > >
>> > >
>> > >
>> > > -----Original Message-----
>> > > From: Jon Haddad [mailto:jon@jonhaddad.com]
>> > > Sent: Thursday, April 04, 2019 9:21 AM
>> > > To: user@cassandra.apache.org
>> > > Subject: Re: Assassinate fails
>> > >
>> > > Ken,
>> > >
>> > > Alain is right about the system tables.  What you're describing only
>> > > works on non-local tables.  Changing the CL doesn't help with
>> > > keyspaces that use LocalStrategy.  Here's the definition of the system
>> > > keyspace:
>> > >
>> > > CREATE KEYSPACE system WITH replication = {'class': 'LocalStrategy'}
>> > > AND durable_writes = true;
>> > >
>> > > Jon
>> > >
>> > > On Thu, Apr 4, 2019 at 9:03 AM Kenneth Brotman
>> > > <kenbrotman@yahoo.com.invalid> wrote:
>> > > >
>> > > > The trick below I got from the book Mastering Cassandra.  You have
>> to set the consistency to ALL for it to work. I thought you guys knew that
>> one.
>> > > >
>> > > >
>> > > >
>> > > > From: Alain RODRIGUEZ [mailto:arodrime@gmail.com]
>> > > > Sent: Thursday, April 04, 2019 8:46 AM
>> > > > To: user@cassandra.apache.org
>> > > > Subject: Re: Assassinate fails
>> > > >
>> > > >
>> > > >
>> > > > Hi Alex,
>> > > >
>> > > >
>> > > >
>> > > > About the previous advice:
>> > > >
>> > > >
>> > > >
>> > > > You might have inconsistent data in your system tables.  Try
>> setting the consistency level to ALL, then do read query of system tables
>> to force repair.
>> > > >
>> > > >
>> > > >
>> > > > System tables use 'LocalStrategy', thus I don't think any repair
>> would happen for the system.* tables, regardless of the consistency level
>> you use. It should not harm, but I really think it won't help.
>> > > >
>> > > >
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
>> > > For additional commands, e-mail: user-help@cassandra.apache.org
>>
>
>
