From: Alain RODRIGUEZ
Date: Thu, 20 Jun 2019 14:58:44 +0200
Subject: Re: Decommissioned nodes are in UNREACHABLE state
To: user@cassandra.apache.org

On Thu, 13 Jun 2019 at 00:55, Jai Bheemsen Rao Dhanwada <jaibheemsen@gmail.com> wrote:

> Hello,
>
> I have a Cassandra cluster running version 2.1.16, where I have
> decommissioned a few nodes using "nodetool decommission", but I see the
> node IPs in UNREACHABLE state in the "nodetool describecluster" output.
> I believe they appear only for 72 hours, but in my case I see those nodes
> in UNREACHABLE for ever (more than 60 days). A rolling restart of the
> nodes didn't remove them. Any idea what could be causing this?
>
> Note: I don't see them in the nodetool status output.

Hello,

Assuming your nodes have been out for a while and you don't need their data after 60 days (or cannot get it anyway), the way to fix this is to force the nodes out. I would try, in this order:

- nodetool removenode HOSTID
- nodetool removenode force

These two might well not work at this stage, but if they do, this is the clean way to do it.
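
If the HOSTID of a ghost node is not at hand, one possible way to look it up is from the gossip state of a live node. The following is only a sketch under assumptions: the helper name `host_id_for_ip` is made up for illustration, and it assumes `nodetool gossipinfo` prints one block per endpoint that starts with a `/IP` line and contains a `HOST_ID:` line (not verified here against 2.1.16).

```shell
#!/usr/bin/env bash
# Sketch only: print the HOST_ID that gossip reports for one endpoint.
# Reads `nodetool gossipinfo` output on stdin.
# Assumption: each endpoint block starts with a "/<ip>" line and has a
# "HOST_ID:<uuid>" line; the last colon-separated field is taken as the UUID.
host_id_for_ip() {
  local ip="$1"
  awk -v ip="$ip" '
    # Unindented line containing "/": start of a new endpoint block
    /^[^ \t]*\// { current = $0; sub(/^.*\//, "", current) }
    # HOST_ID line inside the block of the wanted endpoint
    current == ip && /HOST_ID:/ {
      n = split($0, parts, ":")
      print parts[n]
      exit
    }
  '
}

# Example use on a live node (IP_TO_REMOVE is a placeholder):
#   HOST_ID=$(nodetool gossipinfo | host_id_for_ip "$IP_TO_REMOVE")
#   nodetool removenode "$HOST_ID"
#   nodetool removenode status   # check progress
#   nodetool removenode force    # only if the removal stays stuck
```

If the function prints nothing, gossip on that node has no HOST_ID for the address, and the removenode route is probably not going to help.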
Now, to really push the ghost nodes to the exit door, it often takes:

- nodetool assassinate

I think Cassandra 2.1 doesn't have it, so you might have to use JMX (more details here: https://thelastpickle.com/blog/2018/09/18/assassinate.html):

echo "run -b org.apache.cassandra.net:type=Gossiper unsafeAssassinateEndpoint $IP_TO_ASSASSINATE" | java -jar jmxterm-1.0.0-uber.jar -l $IP_OF_LIVE_NODE:7199

This should really remove the traces of the node, without any safety: no streaming, no checks, it just gets rid of it. So use it with a lot of care and understanding. In your situation I guess this is what will work.

As a last attempt, you could try removing traces of the dead node(s) from the 'system.peers' table of all the live nodes. This table is local to each node, so the DELETE command has to be sent to every node that still has a trace of an old node.

- cqlsh -e "DELETE FROM system.peers WHERE peer = '$IP_TO_REMOVE';"
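
Since the DELETE has to be repeated on each node, a small loop can help. This is a minimal sketch, assuming cqlsh can reach every live node from where it is run; `peers_cleanup_commands` and the addresses in the usage line are hypothetical placeholders. It only prints the commands, so they can be reviewed before anything is executed.

```shell
#!/usr/bin/env bash
# Sketch only: print (do not run) one cqlsh command per live node that
# deletes the dead node row from the local system.peers table of that node.
# Usage: peers_cleanup_commands <ip-to-remove> <live-node>...
peers_cleanup_commands() {
  local ip_to_remove="$1"; shift
  local node
  for node in "$@"; do
    # system.peers is keyed by the "peer" column (the address of the peer)
    printf 'cqlsh %s -e "DELETE FROM system.peers WHERE peer = '\''%s'\'';"\n' \
      "$node" "$ip_to_remove"
  done
}

# Example use (placeholder addresses):
#   peers_cleanup_commands 10.0.0.3 10.0.0.1 10.0.0.2
```

Printing first and running later is deliberate here: a typo in the address would silently delete the wrong row on every node.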

> but I see the node IPs in UNREACHABLE state in "nodetool describecluster" output. I believe they appear only for 72 hours, but in my case I see those nodes in UNREACHABLE for ever (more than 60 days)

To be more accurate, I believe you should never see a leaving node as unreachable (not even for 72 hours). The 72 hours is the time Gossip should continue referencing the old nodes. Typically, when you remove the ghost nodes, they should no longer appear in 'nodetool describecluster' at all, I would say immediately, but they should still appear in 'nodetool gossipinfo' with a 'left' or 'removed' status.
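
To check the result afterwards, something along these lines could be used. Sketch only: `unreachable_ips` is a hypothetical helper, and it assumes `nodetool describecluster` prints the unreachable endpoints on a line shaped like `UNREACHABLE: [ip1, ip2]`.

```shell
#!/usr/bin/env bash
# Sketch only: list the addresses on the UNREACHABLE line of
# `nodetool describecluster` output read from stdin, one per line.
# Prints nothing when there is no UNREACHABLE line.
unreachable_ips() {
  sed -n 's/^[[:space:]]*UNREACHABLE: *\[\(.*\)].*$/\1/p' | tr -d ' ' | tr ',' '\n'
}

# Example use on a live node (IP_TO_REMOVE is a placeholder):
#   if nodetool describecluster | unreachable_ips | grep -qxF "$IP_TO_REMOVE"; then
#     echo "ghost node is still listed as UNREACHABLE"
#   fi
```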

I hope that helps and that one of the above will do the trick (I'd bet on the assassinate :)). Also, sorry it took us a while to answer this relatively common question :).
C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com
