cassandra-user mailing list archives

From "Kenneth Brotman" <kenbrot...@yahoo.com.INVALID>
Subject RE: Assassinate fails
Date Thu, 04 Apr 2019 18:29:14 GMT
Alex,

According to this TLP article http://thelastpickle.com/blog/2018/09/18/assassinate.html :

Note that the LEFT status should stick around for 72 hours to ensure all nodes come to the
consensus that the node has been removed. So please don’t rush things if that’s the case.
Again, it’s only cosmetic.

If a gossip state will not forget a node that was removed from the cluster more than a week
ago:

    Login to each node within the Cassandra cluster.
    Download jmxterm on each node, if nodetool assassinate is not an option.
    Run nodetool assassinate, or the unsafeAssassinateEndpoint command, multiple times in
quick succession.
        I typically recommend running the command 3-5 times within 2 seconds.
        I understand that sometimes the command takes time to return, so the “2 seconds”
suggestion is less of a requirement than it is a mindset.
        Also, sometimes 3-5 times isn’t enough. In such cases, shoot for the moon and try
20 assassination attempts in quick succession.

What we are trying to do is to create a flood of messages requesting all nodes completely
forget there used to be an entry within the gossip state for the given IP address. If each
node can prune its own gossip state and broadcast that to the rest of the nodes, we should
eliminate any race conditions that may exist where at least one node still remembers the given
IP address.

As soon as all nodes agree that they no longer remember the deprecated node, the
cosmetic issue will no longer appear in system.log, nodetool describecluster output,
or nodetool gossipinfo output.
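The repeated invocations described above can be sketched as a small shell loop. (Sketch only: the IP 10.0.0.5 is a hypothetical placeholder, and the nodetool stub below exists just so the sketch runs anywhere; on a real node, drop the stub and use the actual nodetool binary.)

```shell
# Stand-in for the real nodetool binary so this sketch is self-contained;
# delete this function on an actual Cassandra node.
nodetool() { echo "nodetool $*"; }

DEAD_NODE=10.0.0.5   # hypothetical IP of the node stuck in gossip

# Fire several assassinate calls in quick succession (3-5 within ~2 seconds)
# to flood the cluster with "forget this endpoint" messages.
for i in 1 2 3 4 5; do
  nodetool assassinate "$DEAD_NODE"
done
```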





-----Original Message-----
From: Kenneth Brotman [mailto:kenbrotman@yahoo.com.INVALID] 
Sent: Thursday, April 04, 2019 10:40 AM
To: user@cassandra.apache.org
Subject: RE: Assassinate fails

Alex,

Did you remove the option JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=address_of_dead_node"
after the node started, and then restart the node again?

Are you sure there isn't a typo in the file?
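For reference, the line in question normally lives in conf/cassandra-env.sh and looks like the sketch below (the IP is a hypothetical placeholder; the line must be removed once the replacement node has finished bootstrapping):

```shell
# conf/cassandra-env.sh -- remove this line after the replacement node has bootstrapped
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.5"   # hypothetical IP of the dead node
```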

Ken


-----Original Message-----
From: Kenneth Brotman [mailto:kenbrotman@yahoo.com.INVALID] 
Sent: Thursday, April 04, 2019 10:31 AM
To: user@cassandra.apache.org
Subject: RE: Assassinate fails

I see; system_auth is a separate keyspace.    

-----Original Message-----
From: Jon Haddad [mailto:jon@jonhaddad.com] 
Sent: Thursday, April 04, 2019 10:17 AM
To: user@cassandra.apache.org
Subject: Re: Assassinate fails

No, it can't.  As Alain (and I) have said, since the system keyspace
is local strategy, it's not replicated, and thus can't be repaired.
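You can confirm this from cqlsh: any keyspace whose replication class is LocalStrategy is node-local and outside the scope of repair. A quick check (sketch, assumes a reachable node running Cassandra 3.0+ where system_schema exists):

```sql
SELECT keyspace_name, replication
FROM system_schema.keyspaces
WHERE keyspace_name IN ('system', 'system_auth');
-- 'system' reports LocalStrategy; 'system_auth' reports a replicated
-- strategy (SimpleStrategy with RF=1 by default, unless you changed it)
```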

On Thu, Apr 4, 2019 at 9:54 AM Kenneth Brotman
<kenbrotman@yahoo.com.invalid> wrote:
>
> Right, could be similar issue, same type of fix though.
>
> -----Original Message-----
> From: Jon Haddad [mailto:jon@jonhaddad.com]
> Sent: Thursday, April 04, 2019 9:52 AM
> To: user@cassandra.apache.org
> Subject: Re: Assassinate fails
>
> System != system_auth.
>
> On Thu, Apr 4, 2019 at 9:43 AM Kenneth Brotman
> <kenbrotman@yahoo.com.invalid> wrote:
> >
> > From Mastering Cassandra:
> >
> >
> > Forcing read repairs at consistency – ALL
> >
> > The type of repair isn't really part of the Apache Cassandra repair paradigm at
all. When it was discovered that a read repair will trigger 100% of the time when a query
is run at ALL consistency, this method of repair started to gain popularity in the community.
In some cases, this method of forcing data consistency provided better results than normal,
scheduled repairs.
> >
> > Let's assume, for a second, that an application team is having a hard time logging
into a node in a new data center. You try to cqlsh out to these nodes, and notice that you
are also experiencing intermittent failures, leading you to suspect that the system_auth tables
might be missing a replica or two. On one node you do manage to connect successfully using
cqlsh. One quick way to fix consistency on the system_auth tables is to set consistency to
ALL, and run an unbound SELECT on every table, tickling each record:
> >
> > use system_auth ;
> > consistency ALL;
> > consistency level set to ALL.
> >
> > SELECT COUNT(*) FROM resource_role_permissons_index ;
> > SELECT COUNT(*) FROM role_permissions ;
> > SELECT COUNT(*) FROM role_members ;
> > SELECT COUNT(*) FROM roles;
> >
> > This problem is often seen when logging in with the default cassandra user. Within
cqlsh, there is code that forces the default cassandra user to connect by querying system_auth
at QUORUM consistency. This can be problematic in larger clusters, and is another reason why
you should never use the default cassandra user.
> >
> >
> >
> > -----Original Message-----
> > From: Jon Haddad [mailto:jon@jonhaddad.com]
> > Sent: Thursday, April 04, 2019 9:21 AM
> > To: user@cassandra.apache.org
> > Subject: Re: Assassinate fails
> >
> > Ken,
> >
> > Alain is right about the system tables.  What you're describing only
> > works on non-local tables.  Changing the CL doesn't help with
> > keyspaces that use LocalStrategy.  Here's the definition of the system
> > keyspace:
> >
> > CREATE KEYSPACE system WITH replication = {'class': 'LocalStrategy'}
> > AND durable_writes = true;
> >
> > Jon
> >
> > On Thu, Apr 4, 2019 at 9:03 AM Kenneth Brotman
> > <kenbrotman@yahoo.com.invalid> wrote:
> > >
> > > The trick below I got from the book Mastering Cassandra.  You have to set the
consistency to ALL for it to work. I thought you guys knew that one.
> > >
> > >
> > >
> > > From: Alain RODRIGUEZ [mailto:arodrime@gmail.com]
> > > Sent: Thursday, April 04, 2019 8:46 AM
> > > To: user@cassandra.apache.org
> > > Subject: Re: Assassinate fails
> > >
> > >
> > >
> > > Hi Alex,
> > >
> > >
> > >
> > > About the previous advice:
> > >
> > >
> > >
> > > You might have inconsistent data in your system tables.  Try setting the consistency
level to ALL, then do a read query of the system tables to force a repair.
> > >
> > >
> > >
> > > System tables use the 'LocalStrategy', so I don't think any repair would
happen for the system.* tables, regardless of the consistency level you use. It should not
harm, but I really think it won't help.
> > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: user-help@cassandra.apache.org