From: Fredrik
Date: Tue, 11 Sep 2012 09:58:54 +0200
To: user@cassandra.apache.org
Subject: Removed node, jumps back into the cluster

I've tested a scenario where I wanted to reuse a removed node in a new cluster with the same IP. It may not be a very common case, but I found some strange behaviour in Gossiper. Here is what I think/see happening:

- Cassandra 1.1. Three-node cluster: A, B and C.
- Shut down node C and remove the token for node C.
- Everything looks OK in the logs, which report that node C is removed, etc.
- Nodes A and B still send Gossip digests about the removed node, but I guess that's OK since they know about it (Gossiper.endpointStateMap).
- Node C has status removed when checking in the JMX console.
- Checked in LocationInfo that the ring only contains token/IP for nodes A and B.
- Removed the system/data tables for C.
- Changed the seed on C to point to itself.
- Start up node C. Node C only gossips with itself, and nodes A and B don't recognize that node C is running, which is correct.
- Restart e.g. node A. Now node A will lose all gossip information (Gossiper.endpointStateMap) about node C. Node A will request information from LocationInfo and ask node B about endpoint states.
Node A will receive information from node B about node C. This will trigger Gossiper.handleMajorStateChange, and node C will first be marked as unreachable because it's in a dead state (removed). Node A will then try to gossip (unreachable endpoints) to node C, which will reply that it's up, and node C becomes incorporated into the "old" cluster again.

Is this a bug, or is it a requirement that if you take a node out of the cluster you must change the IP of the removed node if you want to use it in another cluster? Please enlighten me.

Regards
/Fredrik
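
To make the sequence above easier to follow, here is a minimal toy model of it in Python. It is only a sketch of the behaviour as described in this message, not Cassandra's Gossiper code: the Node class, its methods, the "NORMAL"/"REMOVED" strings and the simulate() helper are all hypothetical names, and the rule that "any unreachable endpoint that answers is marked live again" is an assumption taken from the observation above.

```python
# Toy model of the reported sequence. All names here are hypothetical;
# this does not reproduce Cassandra's actual Gossiper implementation.

class Node:
    def __init__(self, name):
        self.name = name
        self.endpoint_states = {}  # peer name -> "NORMAL" or "REMOVED"
        self.live = set()
        self.unreachable = set()

    def restart(self):
        # In-memory gossip state is lost when the process restarts.
        self.endpoint_states.clear()
        self.live.clear()
        self.unreachable.clear()

    def learn_from(self, peer):
        # The peer that answers is itself alive.
        self.endpoint_states[peer.name] = "NORMAL"
        self.live.add(peer.name)
        # Every previously unknown endpoint counts as a major state change.
        for endpoint, status in peer.endpoint_states.items():
            if endpoint == self.name or endpoint in self.endpoint_states:
                continue
            self.endpoint_states[endpoint] = status
            if status == "REMOVED":
                self.unreachable.add(endpoint)  # dead state: unreachable first
            else:
                self.live.add(endpoint)

    def gossip_to_unreachable(self, running):
        # Assumed rule: an unreachable endpoint that replies is marked live.
        for endpoint in sorted(self.unreachable):
            if endpoint in running:
                self.unreachable.discard(endpoint)
                self.live.add(endpoint)
                self.endpoint_states[endpoint] = "NORMAL"


def simulate():
    a, b, c = Node("A"), Node("B"), Node("C")
    # B still remembers C as removed after the token removal.
    b.endpoint_states = {"A": "NORMAL", "C": "REMOVED"}
    # C was wiped and restarted on the same IP, seeding only itself.
    running = {"A": a, "B": b, "C": c}

    a.restart()          # A forgets everything it knew about C
    a.learn_from(b)      # B tells A about the removed node C
    unreachable_before = sorted(a.unreachable)
    a.gossip_to_unreachable(running)  # C answers on the old IP
    live_after = sorted(a.live)
    return unreachable_before, live_after


if __name__ == "__main__":
    print(simulate())
```

In this model, node C sits in A's unreachable set right after A learns about it from B, and ends up in A's live set once A probes the old IP and gets a reply, which matches the rejoin described above.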