From: aaron morton
To: user@cassandra.apache.org
Subject: Re: Completely removing a node from the cluster
Date: Mon, 22 Aug 2011 10:15:35 +1200
Message-Id: <504F4C34-7C5C-43D5-8821-18758D389F16@thelastpickle.com>
In-Reply-To: <376CEC01195C894CB9F8A3C274029A96AF256B8B@FISH-EX2K10-01.azaleos.net>

I see the mistake I made about ring: it gets the endpoint list from the same place, but uses the tokens to drive the whole process.

I'm guessing here, as I don't have time to check all the code, but there is a 3 day timeout in the gossip system. Not sure if it applies in this case.

Anyone know?

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22/08/2011, at 6:23 AM, Bryce Godfrey wrote:

> Both .2 and .3 report the same thing from the MBean: Unreachable is an empty collection, and Live nodes still lists all 3 nodes:
> 192.168.20.2
> 192.168.20.3
> 192.168.20.1
>
> The removetoken was done a few days ago, and I believe the remove was done from .2
>
> Here is what the ring output looks like; not sure why I get that token on the empty first line either:
> Address         DC           Rack   Status  State   Load      Owns    Token
>                                                                       85070591730234615865843651857942052864
> 192.168.20.2    datacenter1  rack1  Up      Normal  79.53 GB  50.00%  0
> 192.168.20.3    datacenter1  rack1  Up      Normal  42.63 GB  50.00%  85070591730234615865843651857942052864
>
> Yes, both nodes show the same thing when doing a describe cluster: .1 is unreachable.
>
>
> -----Original Message-----
> From: aaron morton [mailto:aaron@thelastpickle.com]
> Sent: Sunday, August 21, 2011 4:23 AM
> To: user@cassandra.apache.org
> Subject: Re: Completely removing a node from the cluster
>
> Unreachable nodes either did not respond to the message, or were known to be down and were not sent a message.
> The node lists for the ring command and for describe cluster are obtained the same way, so it's a bit odd.
>
> Can you connect to JMX and have a look at the o.a.c.db.StorageService MBean? What do the LiveNodes and UnreachableNodes attributes say?
>
> Also, how long ago did you remove the token, and on which machine? Do both 20.2 and 20.3 think 20.1 is still around?
>
> Cheers
>
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 20/08/2011, at 9:48 AM, Bryce Godfrey wrote:
>
>> I'm on 0.8.4
>>
>> I have removed a dead node from the cluster using the nodetool removetoken command, and moved one of the remaining nodes to rebalance the tokens. Everything looks fine when I run nodetool ring now, as it only lists the remaining 2 nodes and they both look fine, owning 50% of the tokens.
>>
>> However, I can still see it being considered part of the cluster from the cassandra-cli (192.168.20.1 being the removed node), and I'm worried that the cluster is still queuing up hints for the node, or that it may cause other issues:
>>
>> Cluster Information:
>>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>    Schema versions:
>>         dcc8f680-caa4-11e0-0000-553d4dced3ff: [192.168.20.2, 192.168.20.3]
>>         UNREACHABLE: [192.168.20.1]
>>
>>
>> Do I need to do something else to completely remove this node?
>>
>> Thanks,
>> Bryce
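
For reference, the two tokens in the quoted ring output (0 and 85070591730234615865843651857942052864) are what an even split of the RandomPartitioner token space gives for two nodes. Below is a minimal sketch of that arithmetic, assuming the usual i * 2^127 / N balancing rule. The function names are illustrative only and are not part of nodetool or Cassandra.

```python
# Sketch only: balanced initial tokens for RandomPartitioner, whose
# token space is assumed here to run from 0 to 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return evenly spaced tokens for node_count nodes (hypothetical helper)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * RING_SIZE // node_count for i in range(node_count)]


def ownership(tokens):
    """Map each token to the fraction of the ring it owns.

    A node owns the range from the previous token (exclusive) up to its
    own token (inclusive), wrapping around the end of the ring.
    """
    ordered = sorted(tokens)
    shares = {}
    for idx, token in enumerate(ordered):
        previous = ordered[idx - 1]  # idx - 1 == -1 wraps to the last token
        width = (token - previous) % RING_SIZE or RING_SIZE
        shares[token] = width / RING_SIZE
    return shares


if __name__ == "__main__":
    tokens = balanced_tokens(2)
    for token, share in ownership(tokens).items():
        print(f"{token}: {share:.2%}")
```

With two nodes this yields the same two tokens and the 50.00% ownership shown in the ring output, so the rebalance itself looks correct; the remaining question in the thread is only the stale 192.168.20.1 entry.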