From: aaron morton
To: user@cassandra.apache.org
Date: Tue, 23 Aug 2011 19:26:58 +1200
Subject: Re: Completely removing a node from the cluster

I'm running low on ideas for this one. Anyone else?

If the phantom node is not listed in the ring, other nodes should not be storing hints for it.
You can see which nodes they are storing hints for via JConsole.

You can try a rolling restart, passing the JVM option -Dcassandra.load_ring_state=false. However, if the phantom node has been passed around in the gossip state it will probably just come back again.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23/08/2011, at 3:49 PM, Bryce Godfrey wrote:

> Could this ghost node be causing my hints column family to grow to this size? I also crash after about 24 hours due to commit log growth taking up all the drive space. A manual nodetool flush keeps it under control though.
>
>
>     Column Family: HintsColumnFamily
>     SSTable count: 6
>     Space used (live): 666480352
>     Space used (total): 666480352
>     Number of Keys (estimate): 768
>     Memtable Columns Count: 1043
>     Memtable Data Size: 461773
>     Memtable Switch Count: 3
>     Read Count: 38
>     Read Latency: 131.289 ms.
>     Write Count: 582108
>     Write Latency: 0.019 ms.
>     Pending Tasks: 0
>     Key cache capacity: 7
>     Key cache size: 6
>     Key cache hit rate: 0.8333333333333334
>     Row cache: disabled
>     Compacted row minimum size: 2816160
>     Compacted row maximum size: 386857368
>     Compacted row mean size: 120432714
>
> Is there a way for me to manually remove this dead node?
>
> -----Original Message-----
> From: Bryce Godfrey [mailto:Bryce.Godfrey@azaleos.com]
> Sent: Sunday, August 21, 2011 9:09 PM
> To: user@cassandra.apache.org
> Subject: RE: Completely removing a node from the cluster
>
> It's been at least 4 days now.
>
> -----Original Message-----
> From: aaron morton [mailto:aaron@thelastpickle.com]
> Sent: Sunday, August 21, 2011 3:16 PM
> To: user@cassandra.apache.org
> Subject: Re: Completely removing a node from the cluster
>
> I see the mistake I made about ring: it gets the endpoint list from the same place but uses the tokens to drive the whole process.
>
> I'm guessing here, don't have time to check all the code.
> But there is a 3 day timeout in the gossip system. Not sure if it applies in this case.
>
> Anyone know?
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 22/08/2011, at 6:23 AM, Bryce Godfrey wrote:
>
>> Both .2 and .3 report the same thing from the MBean: Unreachable is an empty collection, and the Live node list still shows all 3 nodes:
>> 192.168.20.2
>> 192.168.20.3
>> 192.168.20.1
>>
>> The removetoken was done a few days ago, and I believe the remove was done from .2
>>
>> Here is what the ring output looks like, not sure why I get that token on the empty first line either:
>> Address         DC          Rack   Status State   Load      Owns    Token
>>                                                                     85070591730234615865843651857942052864
>> 192.168.20.2    datacenter1 rack1  Up     Normal  79.53 GB  50.00%  0
>> 192.168.20.3    datacenter1 rack1  Up     Normal  42.63 GB  50.00%  85070591730234615865843651857942052864
>>
>> Yes, both nodes show the same thing when doing a describe cluster, that .1 is unreachable.
>>
>>
>> -----Original Message-----
>> From: aaron morton [mailto:aaron@thelastpickle.com]
>> Sent: Sunday, August 21, 2011 4:23 AM
>> To: user@cassandra.apache.org
>> Subject: Re: Completely removing a node from the cluster
>>
>> Unreachable nodes either did not respond to the message or were known to be down and were not sent a message.
>> The way the node lists are obtained for the ring command and describe cluster are the same. So it's a bit odd.
>>
>> Can you connect to JMX and have a look at the o.a.c.db.StorageService MBean? What do the LiveNodes and UnreachableNodes attributes say?
>>
>> Also how long ago did you remove the token and on which machine?
>> Do both 20.2 and 20.3 think 20.1 is still around?
>>
>> Cheers
>>
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 20/08/2011, at 9:48 AM, Bryce Godfrey wrote:
>>
>>> I'm on 0.8.4
>>>
>>> I have removed a dead node from the cluster using the nodetool removetoken command, and moved one of the remaining nodes to rebalance the tokens. Everything looks fine when I run nodetool ring now, as it only lists the remaining 2 nodes and they both look fine, owning 50% of the tokens.
>>>
>>> However, I can still see it being considered as part of the cluster from the cassandra-cli (192.168.20.1 being the removed node) and I'm worried that the cluster is still queuing up hints for the node, or any other issues it may cause:
>>>
>>> Cluster Information:
>>>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>>>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>>    Schema versions:
>>>         dcc8f680-caa4-11e0-0000-553d4dced3ff: [192.168.20.2, 192.168.20.3]
>>>         UNREACHABLE: [192.168.20.1]
>>>
>>>
>>> Do I need to do something else to completely remove this node?
>>>
>>> Thanks,
>>> Bryce
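
For reference, the two tokens in the ring output quoted in this thread are what an evenly balanced two-node ring looks like under RandomPartitioner. The following is a minimal sketch, assuming the RandomPartitioner token space runs from 0 to 2**127; the function name balanced_tokens is hypothetical and is not part of nodetool or Cassandra.

```python
# Assumption: RandomPartitioner tokens fall in the range 0 .. 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return evenly spaced tokens for a ring of node_count nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer arithmetic keeps the result exact; tokens are too large for floats.
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    # For two nodes this prints 0 and 85070591730234615865843651857942052864,
    # the tokens held by 192.168.20.2 and 192.168.20.3 in the thread.
    for token in balanced_tokens(2):
        print(token)
```

Each value would be the target of a nodetool move on the corresponding node. This only checks that the remaining nodes are balanced; it says nothing about why the removed node still appears as unreachable.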