From: aaron morton
To: user@cassandra.apache.org
Subject: Re: strange gossip messages after node reboot with different ip
Date: Tue, 1 May 2012 13:16:29 +1200

Gossip information about a node can stay in the cluster for up to 3 days. How long has this been going on for?

I'm unsure if this is expected behaviour, but it sounds like Gossip is kicking out the phantom node correctly.

Can you use nodetool gossipinfo on the nodes to capture some artefacts while it is still running?

> How come the old IP 10.63.14.214 still pops up as UP and then gets declared DEAD again, and so on and on?

I think this is gossip bouncing information about the node around. Once it has been observed as dead for 3 days it should be purged.

> Another question: if a node is recognised as new (due to an IP change) but has the same token, will other nodes stream the hinted handoffs to it?
Hints are stored against the token, not the endpoint address. When a node comes up the process is reversed and the endpoint is mapped to its (new) token.

> And is there a way to tell Cassandra to also use names, so that if the IP changes but the node name stays the same and resolves to the new IP, the cluster treats it as the old node?
>

Not that I am aware of. It's designed to handle IP addresses changing. AFAIK the log messages are not indicative of a fault. Instead they indicate something odd happening with Gossip that is being correctly handled.

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 1/05/2012, at 3:09 AM, Piavlo wrote:

> Hi,
>
> We have a Cassandra cluster in EC2.
> If I stop a node and start it, the node's IP changes as a result. The node is recognised as a NEW node and is declared as replacing the previous node with the same token (but this is the same node, of course).
>
> In this specific case the node IP before the stop/start was 10.63.14.214 and the new IP is 10.54.81.14.
> And even though the cluster and node seem to be working fine for more than a day after the stop/start of this node, I see the following loop of messages about once every minute:
>
> INFO [GossipStage:1] 2012-04-30 14:18:57,089 Gossiper.java (line 838) Node /10.63.14.214 is now part of the cluster
> INFO [GossipStage:1] 2012-04-30 14:18:57,089 Gossiper.java (line 804) InetAddress /10.63.14.214 is now UP
> INFO [GossipStage:1] 2012-04-30 14:18:57,090 StorageService.java (line 1017) Nodes /10.63.14.214 and cassa1a.internal/10.54.81.14 have the same token 0. Ignoring /10.63.14.214
> INFO [GossipTasks:1] 2012-04-30 14:19:11,834 Gossiper.java (line 818) InetAddress /10.63.14.214 is now dead.
> INFO [GossipTasks:1] 2012-04-30 14:19:27,896 Gossiper.java (line 632) FatClient /10.63.14.214 has been silent for 30000ms, removing from gossip
> INFO [GossipStage:1] 2012-04-30 14:20:30,803 Gossiper.java (line 838) Node /10.63.14.214 is now part of the cluster
> ...
>
> How come the old IP 10.63.14.214 still pops up as UP and then gets declared DEAD again, and so on and on?
> I know that since this is EC2 another node with the same IP can come UP, but I've verified that there is no such node, and it certainly does not run Cassandra :)
> I stopped/started another node and observed similar behaviour.
> This is version 1.0.8.
>
> Another question: if a node is recognised as new (due to an IP change) but has the same token, will other nodes stream the hinted handoffs to it?
> And is there a way to tell Cassandra to also use names, so that if the IP changes but the node name stays the same and resolves to the new IP, the cluster treats it as the old node?
>
> Thanks
> Alex
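One rough way to answer "how long has this been going on for?" is to pull the phantom address's events out of a node's log. This is a minimal sketch of a hypothetical helper, not part of Cassandra or nodetool. It assumes log lines in the 1.0.x format quoted above; the event labels (JOINED, UP, TOKEN_CONFLICT, DEAD, REMOVED) are its own, and GOSSIP_RETENTION simply hard-codes the "up to 3 days" figure from the reply as an assumed threshold.

```python
import re
from datetime import datetime, timedelta

# Matches Cassandra 1.0.x INFO log lines like the ones quoted in this thread.
LOG_LINE = re.compile(
    r"^\s*INFO \[(?P<thread>[^\]]+)\] "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<source>\S+) \(line \d+\) (?P<msg>.*)$"
)

# Message fragments from the quoted log, mapped to hypothetical event labels.
EVENTS = [
    ("is now part of the cluster", "JOINED"),
    ("is now UP", "UP"),
    ("have the same token", "TOKEN_CONFLICT"),
    ("is now dead", "DEAD"),
    ("removing from gossip", "REMOVED"),
]

# Assumption: the "up to 3 days" figure from the reply, used as a fixed threshold.
GOSSIP_RETENTION = timedelta(days=3)

# The log excerpt from the original message.
SAMPLE_LOG = [
    "INFO [GossipStage:1] 2012-04-30 14:18:57,089 Gossiper.java (line 838) Node /10.63.14.214 is now part of the cluster",
    "INFO [GossipStage:1] 2012-04-30 14:18:57,089 Gossiper.java (line 804) InetAddress /10.63.14.214 is now UP",
    "INFO [GossipStage:1] 2012-04-30 14:18:57,090 StorageService.java (line 1017) Nodes /10.63.14.214 and cassa1a.internal/10.54.81.14 have the same token 0. Ignoring /10.63.14.214",
    "INFO [GossipTasks:1] 2012-04-30 14:19:11,834 Gossiper.java (line 818) InetAddress /10.63.14.214 is now dead.",
    "INFO [GossipTasks:1] 2012-04-30 14:19:27,896 Gossiper.java (line 632) FatClient /10.63.14.214 has been silent for 30000ms, removing from gossip",
    "INFO [GossipStage:1] 2012-04-30 14:20:30,803 Gossiper.java (line 838) Node /10.63.14.214 is now part of the cluster",
]


def gossip_events(lines, address):
    """Return (timestamp, label) pairs for log lines that mention /address."""
    # \b stops 10.63.14.21 from matching inside 10.63.14.214.
    pattern = re.compile("/" + re.escape(address) + r"\b")
    events = []
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or not pattern.search(m.group("msg")):
            continue
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
        for fragment, label in EVENTS:
            if fragment in m.group("msg"):
                events.append((ts, label))
                break
    return events


def summarize(events):
    """Count rejoin cycles and the time span between the first and last event."""
    if not events:
        return {"cycles": 0, "span": timedelta(0), "past_retention": False}
    cycles = sum(1 for _, label in events if label == "JOINED")
    span = events[-1][0] - events[0][0]
    return {"cycles": cycles, "span": span, "past_retention": span > GOSSIP_RETENTION}


if __name__ == "__main__":
    evts = gossip_events(SAMPLE_LOG, "10.63.14.214")
    for ts, label in evts:
        print(ts.isoformat(sep=" "), label)
    print(summarize(evts))
```

Fed a full day or more of a node's log, a span that keeps growing past GOSSIP_RETENTION while JOINED cycles still appear would hint that the phantom entry is not being purged as expected. That reading is a heuristic layered on the log, not something Cassandra itself reports.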
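Aaron's point that hints are stored against the token rather than the endpoint address can be illustrated with a toy model. Everything here (ToyHintStore, store_hint, node_up, the "write-N" payloads) is hypothetical and only mirrors the idea described in the reply; it is not Cassandra's actual hinted-handoff implementation.

```python
from collections import defaultdict


class ToyHintStore:
    """Toy model: pending hints are keyed by the target's token, not its IP.

    Hypothetical illustration only; not Cassandra's hinted-handoff code.
    """

    def __init__(self):
        self.hints = defaultdict(list)  # token -> pending mutations
        self.endpoint_to_token = {}     # current endpoint address -> token

    def store_hint(self, token, mutation):
        # A write for an unreachable replica is saved under that replica's token.
        self.hints[token].append(mutation)

    def node_up(self, endpoint, token):
        """Map the (possibly new) endpoint to its token and drain its hints."""
        # Forget any stale address that previously claimed the same token.
        for old, t in list(self.endpoint_to_token.items()):
            if t == token and old != endpoint:
                del self.endpoint_to_token[old]
        self.endpoint_to_token[endpoint] = token
        # Hints follow the token, so they go to whatever address now owns it.
        return [(endpoint, m) for m in self.hints.pop(token, [])]


if __name__ == "__main__":
    store = ToyHintStore()
    store.node_up("10.63.14.214", 0)  # node joins with its original IP
    store.store_hint(0, "write-1")    # node is stopped; writes are hinted
    store.store_hint(0, "write-2")
    print(store.node_up("10.54.81.14", 0))  # restarted with a new IP, same token
```

Because the lookup key is the token, the restarted node receives both hinted writes even though its address changed, which is the behaviour the reply describes.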