cassandra-user mailing list archives

From John Sumsion <Sumsio...@familysearch.org>
Subject Re: gossipinfo contains two nodes dead for more than two years
Date Wed, 28 Aug 2019 21:39:37 GMT
I've seen something similar if there is a node still referring to that IP as a seed node in
cassandra.yaml.  You might want to check that.
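
A rough way to do that seed check, as a sketch only: it scans the text of a cassandra.yaml for the `seeds:` line and reports any dead IPs still listed there. The function names, the sample YAML, and the address 10.5.1.20 are made up for illustration; the two dead IPs are the ones from the thread. It assumes the usual single-line, comma-separated `seeds:` value and uses a regex rather than a real YAML parser.

```python
import re


def seeds_in_yaml(yaml_text):
    """Return the seed addresses found on `seeds:` lines of a cassandra.yaml text."""
    seeds = []
    for line in yaml_text.splitlines():
        # Ignore commented-out lines.
        if line.lstrip().startswith("#"):
            continue
        # Matches e.g.:   - seeds: "10.0.0.1,10.0.0.2"
        match = re.search(r"seeds:\s*[\"']?([^\"'#]+)", line)
        if match:
            seeds.extend(s.strip() for s in match.group(1).split(",") if s.strip())
    return seeds


def stale_seeds(yaml_text, dead_ips):
    """Return the dead IPs that are still configured as seeds, sorted."""
    return sorted(set(seeds_in_yaml(yaml_text)) & set(dead_ips))


# Hypothetical config fragment; 10.5.1.20 is an invented live node.
SAMPLE_YAML = """
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.15.53.27,10.5.1.20"
"""

print(stale_seeds(SAMPLE_YAML, ["10.15.53.27", "10.5.1.16"]))
```

In practice you would read the real file on each node instead of `SAMPLE_YAML` and run the check cluster-wide.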
________________________________
From: Vincent Rischmann <vincent@rischmann.fr>
Sent: Wednesday, August 28, 2019 10:10 AM
To: user@cassandra.apache.org <user@cassandra.apache.org>
Subject: Re: gossipinfo contains two nodes dead for more than two years

Yep, they're not visible in either ring or status.

On Wed, Aug 28, 2019, at 17:08, Jeff Jirsa wrote:
Based on what you've posted, I assume the instances are not visible in `nodetool ring` or
`nodetool status`, and the only reason you know they're still in gossipinfo is you see them
in the logs? If that's the case, then yes, I would do `nodetool assassinate`.



On Wed, Aug 28, 2019 at 7:33 AM Vincent Rischmann <vincent@rischmann.fr> wrote:

Hi,

While replacing a node in a cluster, I saw this log line:

    2019-08-27 16:35:31,439 Gossiper.java:995 - InetAddress /10.15.53.27 is now DOWN

It caught my attention because that IP address doesn't exist in the cluster anymore and
hasn't for a long time.

After some reading I ran `nodetool gossipinfo` and I saw these entries which are nodes that
don't exist anymore:

    /10.15.53.27
      generation:1503480618
      heartbeat:26970
      STATUS:2:hibernate,true
      LOAD:26810:6.17363354147E11
      SCHEMA:101:d21b1e47-f226-3417-8de7-5802518ae824
      DC:10:DC1
      RACK:12:RAC1
      RELEASE_VERSION:6:2.1.18
      INTERNAL_IP:8:10.15.53.27
      RPC_ADDRESS:5:10.15.53.27
      SEVERITY:26972:0.0
      NET_VERSION:3:8
      HOST_ID:4:2488fccc-108a-4a9d-ad43-5e8b8b6ee17b
      TOKENS:1:<hidden>
    /10.5.1.16
      generation:1503636779
      heartbeat:324
      STATUS:2:hibernate,true
      LOAD:204:2.601990697532E12
      SCHEMA:14:d21b1e47-f226-3417-8de7-5802518ae824
      DC:10:DC1
      RACK:12:RAC1
      RELEASE_VERSION:6:2.1.18
      INTERNAL_IP:8:10.5.1.16
      RPC_ADDRESS:5:10.5.1.16
      SEVERITY:326:0.0
      NET_VERSION:3:8
      HOST_ID:4:2488fccc-108a-4a9d-ad43-5e8b8b6ee17b
      TOKENS:1:<hidden>

The generations are:

- Wed, 23 Aug 2017 09:30:18 GMT
- Fri, 25 Aug 2017 04:52:59 GMT
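
For reference, the generation values are Unix epoch seconds, so the dates above can be reproduced with a few lines of Python. This is only a sketch: `parse_gossipinfo` and `generation_to_utc` are names invented here, the sample is trimmed from the output above, and the parser assumes the `/ip` header plus `KEY:version:value` layout shown in this thread (other Cassandra versions may format it differently).

```python
from datetime import datetime, timezone


def parse_gossipinfo(text):
    """Parse `nodetool gossipinfo`-style text into {endpoint: {key: value}}."""
    nodes, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            # A line such as "/10.15.53.27" starts a new endpoint section.
            current = nodes.setdefault(line.lstrip("/"), {})
        elif current is not None and ":" in line:
            key, _, rest = line.partition(":")
            if key in ("generation", "heartbeat"):
                current[key] = int(rest)
            else:
                # Application states look like KEY:version:value; keep the value.
                _version, _, value = rest.partition(":")
                current[key] = value
    return nodes


def generation_to_utc(generation):
    """Interpret a gossip generation as Unix epoch seconds in UTC."""
    return datetime.fromtimestamp(generation, tz=timezone.utc)


# Trimmed copy of the two entries quoted above.
SAMPLE = """
/10.15.53.27
  generation:1503480618
  heartbeat:26970
  STATUS:2:hibernate,true
/10.5.1.16
  generation:1503636779
  heartbeat:324
  STATUS:2:hibernate,true
"""

for endpoint, state in parse_gossipinfo(SAMPLE).items():
    started = generation_to_utc(state["generation"])
    print(endpoint, state["STATUS"], started.isoformat())
```

Feeding it the full `nodetool gossipinfo` output makes it easy to spot endpoints whose generation is far older than the rest of the cluster.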

I don't remember what we did at that time, but it looks like we botched something while
joining a node.

After reading https://thelastpickle.com/blog/2018/09/18/assassinate.html
I'm thinking of doing the following:

* nodetool removenode 10.15.53.27
* if it doesn't work for some reason: nodetool assassinate 10.15.53.27

Since those nodes have been dead for a long time and don't appear in system.peers, I don't
anticipate any problems, but I'd like some confirmation that this can't break my cluster.
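
The two-step plan above could be scripted roughly as follows. Treat it as a sketch under a few assumptions: as far as I know `nodetool removenode` expects the node's host ID rather than its IP, so the helper takes both; `removal_commands` and `remove_dead_node` are hypothetical names; and it defaults to a dry run so nothing is executed unless you ask for it. Note that both stale entries above report the same HOST_ID, which is worth double-checking before running removenode for real.

```python
import subprocess


def removal_commands(ip, host_id):
    """Commands to try in order; each later entry is a fallback for the one before."""
    return [
        ["nodetool", "removenode", host_id],  # assumption: takes the host ID
        ["nodetool", "assassinate", ip],      # last resort, takes the IP address
    ]


def remove_dead_node(ip, host_id, dry_run=True, run=subprocess.run):
    """Try each command until one exits with status 0; return it, or None."""
    for cmd in removal_commands(ip, host_id):
        if dry_run:
            # Only show what would be executed.
            print("would run:", " ".join(cmd))
            continue
        result = run(cmd)
        if result.returncode == 0:
            return cmd
    return None


# Dry run with the IP and HOST_ID shown in the gossipinfo output above.
remove_dead_node("10.15.53.27", "2488fccc-108a-4a9d-ad43-5e8b8b6ee17b")
```

Calling it with `dry_run=False` on a node where `nodetool` is on the PATH would execute the commands; check `nodetool gossipinfo` afterwards to confirm the entry is gone.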

Thanks!
