From: Alexis Lê-Quôc
Date: Tue, 22 Mar 2011 18:23:39 -0400
Subject: Ghost node showing up in the ring
To: user@cassandra.apache.org

Hi,

I've
seen a strange occurrence of a deleted node reappearing all of a sudden in the ring, which leads to my question: where is the ring structure maintained (in memory, with local copies?) and what prompts it to change?

I'd appreciate any thoughts on the events below.

I'm running 0.7.4 on 4 EC2 large machines with a replication factor of 3. On Sunday I dropped a node that was misbehaving (drained, then decommissioned). Everything was fine until a few minutes ago.

On 1.2.3.47 (never mind the temporary key imbalance):

ubuntu@YYY:~$ nodetool -h localhost ring
1.2.3.47   Up    Normal   17.89 GB  12.48%  0
1.2.3.36   Up    Normal   27.72 GB  25.00%  42535295865117307932921825928971026432
1.2.3.193  Up    Normal   42.14 GB  50.00%  127605887595351923798765477786913079296
1.2.3.252  Up    Normal   36.71 GB  12.52%  148904621249875869977532879268261763219

Then all of a sudden the node that used to sit in the middle shows up (as "Down"). The machine itself was decommissioned over the weekend, and it is confirmed that it is not in play.

ubuntu@YYY:~$ nodetool -h localhost ring
1.2.3.47   Up    Normal   17.93 GB  12.48%  0
1.2.3.36   Up    Normal   27.76 GB  25.00%  42535295865117307932921825928971026432
2.3.4.193  Down  Normal   12.35 GB  25.00%  85070591730234615865843651857942052864
1.2.3.193  Up    Normal   42.24 GB  25.00%  127605887595351923798765477786913079296
1.2.3.252  Up    Normal   36.66 GB  12.52%  148904621249875869977532879268261763219

From the logs on each node:

2011-03-22T21:30:17.040407+00:00 Node /2.3.4.193 is now part of the cluster
2011-03-22T21:30:16.956335+00:00 Node /2.3.4.193 is now part of the cluster
2011-03-22T21:30:18.887269+00:00 Node /2.3.4.193 is now part of the cluster
2011-03-22T21:30:18.978861+00:00 Node /2.3.4.193 is now part of the cluster

(a node coming back from the dead)

On 1.2.3.193, trying to remove the ghost token...

ubuntu@XXX:~$ nodetool -h localhost ring 148904621249875869977532879268261763219
1.2.3.47   Up    Normal   17.93 GB  12.48%  0
1.2.3.36   Up    Normal   27.76 GB  25.00%  42535295865117307932921825928971026432
2.3.4.193  Down  Leaving  12.35 GB  25.00%  85070591730234615865843651857942052864
1.2.3.193  Up    Normal   52.06 GB  25.00%  127605887595351923798765477786913079296
1.2.3.252  Up    Normal   43.11 GB  12.52%  148904621249875869977532879268261763219

ubuntu@XXX:~$ nodetool -h localhost removetoken status
RemovalStatus: Removing token (85070591730234615865843651857942052864). Waiting for replication confirmation from [/1.2.3.193].

(wait wait wait)

ubuntu@XXX:~$ nodetool -h localhost removetoken force
RemovalStatus: Removing token (85070591730234615865843651857942052864). Waiting for replication confirmation from [/1.2.3.193].

(fixed)

ubuntu@XXX:~$ nodetool -h localhost ring
1.2.3.47   Up    Normal   17.93 GB  12.48%  0
1.2.3.36   Up    Normal   27.76 GB  25.00%  42535295865117307932921825928971026432
1.2.3.193  Up    Normal   53.73 GB  50.00%  127605887595351923798765477786913079296
1.2.3.252  Up    Normal   43.11 GB  12.52%  148904621249875869977532879268261763219

-- 
Alexis Lê-Quôc
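
A note on reading the "Owns" column in the listings above: the percentages are consistent with each node owning the range from the previous token (exclusive) up to its own token (inclusive), wrapping around a ring of size 2**127. The sketch below reproduces those figures from the tokens alone. It assumes RandomPartitioner, which the message does not state; `RING_SIZE` and `ownership` are hypothetical names for illustration and not part of Cassandra or nodetool.

```python
from fractions import Fraction

# Assumption: RandomPartitioner, whose tokens fall in [0, 2**127).
RING_SIZE = 2 ** 127


def ownership(tokens_by_node):
    """Return {node: fraction of the ring owned}.

    Each node owns the range (previous token, own token], where the
    node with the smallest token wraps around to the largest token.
    """
    ordered = sorted(tokens_by_node.items(), key=lambda item: item[1])
    owns = {}
    for i, (node, token) in enumerate(ordered):
        previous = ordered[i - 1][1]  # i == 0 wraps to the last entry
        # A width of 0 only happens with a single node, which owns everything.
        width = (token - previous) % RING_SIZE or RING_SIZE
        owns[node] = Fraction(width, RING_SIZE)
    return owns


# Addresses and tokens copied from the ring listings in this message.
without_ghost = {
    "1.2.3.47": 0,
    "1.2.3.36": 42535295865117307932921825928971026432,
    "1.2.3.193": 127605887595351923798765477786913079296,
    "1.2.3.252": 148904621249875869977532879268261763219,
}
with_ghost = dict(without_ghost)
with_ghost["2.3.4.193"] = 85070591730234615865843651857942052864

for label, ring in (("without ghost", without_ghost), ("with ghost", with_ghost)):
    print(label)
    for node, share in ownership(ring).items():
        print(f"  {node:<10} {float(share) * 100:.2f}%")
```

Under that assumption, the ghost token sits exactly halfway between the tokens of 1.2.3.36 and 1.2.3.193, which is why 1.2.3.193 drops from 50.00% to 25.00% when the ghost appears and returns to 50.00% once the token is removed.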