cassandra-user mailing list archives

From Piavlo <lolitus...@gmail.com>
Subject Re: strange gossip messages after node reboot with different ip
Date Tue, 01 May 2012 06:42:04 GMT
  Hi Aaron,

Below is the requested gossipinfo output from a fresh 6-node cluster on
which I stopped/started all the nodes one by one ~12 hours ago.

As you can see, gossipinfo reports 11 nodes, but what bothers me is
why it reports STATUS:NORMAL for all of them, and why it decides a
non-existent node is UP just to announce it dead a few seconds later.
Also, the number of nodes reported by each gossipinfo invocation can
differ, probably according to which nodes are being detected UP & DOWN
at the time.
Other than that the cluster seems to be working properly, so I understand
I can ignore these UPs & DOWNs, but it feels wrong, and I'm interested
to understand what exactly makes the non-existent nodes wrongly appear
as UP again.

The nodes which are real have
SCHEMA:adbf19a0-934e-11e1-0000-8b8140c3b9f5; the others are non-existent
nodes.
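
For what it's worth, my mental model of the loop, from reading the 1.0
Gossiper code, is sketched in the toy simulation below. Everything in it
(class name, timings, structure) is my own simplification, not Cassandra's
actual code: each node purges the token-less phantom endpoint locally once
it has been silent for 30s, refuses to re-learn it for roughly a minute,
and can then pick it up again from any peer whose gossip still carries the
state, which also resets its local silence timer. The state only has to
survive on one node per quarantine window to keep bouncing around:

import java.util.Random;

// Toy model (NOT Cassandra code) of how locally purged gossip state about
// a phantom endpoint keeps getting resurrected by peers that still hold it.
public class GossipResurrection {
    static final int NODES = 6;
    static final int FAT_CLIENT_TIMEOUT = 30; // seconds of silence before a local purge
    static final int QUARANTINE = 60;         // seconds a node refuses to re-learn a just-removed endpoint

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int[] silentFor = new int[NODES];  // seconds since node i last heard gossip about the phantom; -1 = purged
        int[] quarantine = new int[NODES]; // seconds of quarantine left on node i after a purge

        // stagger the clocks: nodes noticed the phantom going quiet at different times
        for (int i = 0; i < NODES; i++)
            silentFor[i] = rnd.nextInt(FAT_CLIENT_TIMEOUT);

        for (int t = 0; t < 300; t++) {
            // each node ages its entry and purges it after the fat-client timeout
            for (int i = 0; i < NODES; i++) {
                if (quarantine[i] > 0) quarantine[i]--;
                if (silentFor[i] >= 0 && ++silentFor[i] > FAT_CLIENT_TIMEOUT) {
                    silentFor[i] = -1;
                    quarantine[i] = QUARANTINE;
                    System.out.printf("t=%03ds node%d: FatClient silent for %ds, removing from gossip%n",
                                      t, i, FAT_CLIENT_TIMEOUT);
                }
            }
            // one gossip exchange per node per second with a random peer: a node
            // that purged the phantom re-learns it from any peer still holding it
            // once its quarantine expires, and the fresh gossip resets its timer
            for (int i = 0; i < NODES; i++) {
                int peer = rnd.nextInt(NODES);
                if (peer != i && silentFor[i] < 0 && quarantine[i] == 0 && silentFor[peer] >= 0) {
                    silentFor[i] = 0;
                    System.out.printf("t=%03ds node%d: phantom is now UP again (re-learned from node%d)%n",
                                      t, i, peer);
                }
            }
        }
    }
}

If that model is roughly right, the loop only dies out once every node has
purged the state and no peer re-advertises it within the quarantine window,
which would fit the removal/resurrection cadence of about a minute in the
log below and the ~3 day upper bound Aaron mentions.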

# nodetool -h localhost gossipinfo
/10.240.243.92
   STATUS:NORMAL,28356863910078205288614550619314017621
   DC:eu-west
   RACK:1b
   LOAD:7.8498918E7
   RPC_ADDRESS:0.0.0.0
   SCHEMA:adbf19a0-934e-11e1-0000-8b8140c3b9f5
   RELEASE_VERSION:1.0.9
/10.55.93.28
   STATUS:NORMAL,85070591730234615865843651857942052864
   RACK:1a
   DC:eu-west
   LOAD:7.7833298E7
   RPC_ADDRESS:0.0.0.0
   SCHEMA:adbf19a0-934e-11e1-0000-8b8140c3b9f5
   RELEASE_VERSION:1.0.9
/10.56.46.131
   STATUS:NORMAL,56713727820156410577229101238628035242
   DC:eu-west
   RACK:1c
   LOAD:8.2077425E7
   RPC_ADDRESS:0.0.0.0
   SCHEMA:adbf19a0-934e-11e1-0000-8b8140c3b9f5
   RELEASE_VERSION:1.0.9
/10.239.95.94
   STATUS:NORMAL,113427455640312821154458202477256070485
   RACK:1b
   LOAD:6.5744801E7
   DC:eu-west
   RPC_ADDRESS:0.0.0.0
   SCHEMA:d429bd90-91bc-11e1-0000-6faa44d3dcff
   RELEASE_VERSION:1.0.9
/10.49.37.125
   STATUS:NORMAL,85070591730234615865843651857942052864
   RACK:1a
   LOAD:6.6832903E7
   DC:eu-west
   RPC_ADDRESS:0.0.0.0
   SCHEMA:d429bd90-91bc-11e1-0000-6faa44d3dcff
   RELEASE_VERSION:1.0.9
/10.239.95.182
   STATUS:NORMAL,28356863910078205288614550619314017621
   RACK:1b
   LOAD:6.775791E7
   DC:eu-west
   RPC_ADDRESS:0.0.0.0
   SCHEMA:d429bd90-91bc-11e1-0000-6faa44d3dcff
   RELEASE_VERSION:1.0.9
dsc1a.internal/10.226.74.97
   STATUS:NORMAL,0
   DC:eu-west
   RACK:1a
   LOAD:7.8533797E7
   RPC_ADDRESS:0.0.0.0
   SCHEMA:adbf19a0-934e-11e1-0000-8b8140c3b9f5
   RELEASE_VERSION:1.0.9
/10.248.81.46
   STATUS:NORMAL,56713727820156410577229101238628035242
   RACK:1c
   LOAD:6.8754218E7
   DC:eu-west
   RPC_ADDRESS:0.0.0.0
   SCHEMA:d429bd90-91bc-11e1-0000-6faa44d3dcff
   RELEASE_VERSION:1.0.9
/10.228.37.155
   STATUS:NORMAL,113427455640312821154458202477256070485
   DC:eu-west
   RACK:1b
   LOAD:7.8066429E7
   RPC_ADDRESS:0.0.0.0
   SCHEMA:adbf19a0-934e-11e1-0000-8b8140c3b9f5
   RELEASE_VERSION:1.0.9
/10.248.83.29
   STATUS:NORMAL,141784319550391026443072753096570088106
   RACK:1c
   LOAD:6.5235089E7
   DC:eu-west
   RPC_ADDRESS:0.0.0.0
   SCHEMA:d429bd90-91bc-11e1-0000-6faa44d3dcff
   RELEASE_VERSION:1.0.9
/10.250.217.83
   STATUS:NORMAL,141784319550391026443072753096570088106
   LOAD:7.598275E7
   DC:eu-west
   RACK:1c
   RPC_ADDRESS:0.0.0.0
   SCHEMA:adbf19a0-934e-11e1-0000-8b8140c3b9f5
   RELEASE_VERSION:1.0.9
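
In case it helps with capturing these artefacts on a schedule while the
bouncing is still going on, the same string that nodetool gossipinfo prints
can also be pulled over JMX. A minimal sketch, assuming the 1.0-era MBean
name org.apache.cassandra.net:type=FailureDetector and its
AllEndpointStates attribute (both worth verifying against your build):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Dumps the gossip endpoint state table from a Cassandra node over JMX,
// the same data that nodetool gossipinfo shows.
public class GossipInfoDump {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        // 7199 is the default Cassandra JMX port
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            // assumed MBean/attribute names -- check them in jconsole first
            ObjectName fd = new ObjectName("org.apache.cassandra.net:type=FailureDetector");
            System.out.println(mbs.getAttribute(fd, "AllEndpointStates"));
        } finally {
            jmxc.close();
        }
    }
}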

Thanks
Alex

On 05/01/2012 04:16 AM, aaron morton wrote:
> Gossip information about a node can stay in the cluster for up to 3 
> days. How long has this been going on?
>
> I'm unsure if this is expected behaviour. But it sounds like Gossip is 
> kicking out the phantom node correctly.
>
> Can you use nodetool gossipinfo on the nodes to capture some artefacts 
> while it is still running?
>
>> How come the old IP 10.63.14.214 still pops up as UP and is then 
>> declared DEAD again, and so on and on?
> I think this is gossip bouncing information about the node around. 
> Once it has been observed as dead for 3 days it should be purged.
>> Another question: if a node is recognised as new (due to an IP change) 
>> but with the same token, will other nodes stream the hinted handoffs to it?
> Hints are stored against the token, not the endpoint address. When a 
> node comes up the process is reversed and the endpoint is mapped to 
> its (new) token.
>
>> And is there a way to tell Cassandra to also use names, so that if the 
>> IP changes but the node name is the same and resolves to the new IP, 
>> the cluster treats it as the old node?
>>
> Not that I am aware of. It's designed to handle IP addresses changing. 
> AFAIK the log messages are not indicative of a fault. Instead they 
> indicate something odd happening with Gossip that is being correctly 
> handled.
>
> Hope that helps.
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 1/05/2012, at 3:09 AM, Piavlo wrote:
>
>>
>> Hi,
>>
>> We have a Cassandra cluster in EC2.
>> If I stop a node and start it, the node's IP changes as a result. The 
>> node is recognised as a NEW node and is declared as replacing the 
>> previous node with the same token. (But this is the same node, of course.)
>>
>> In this specific case the node's IP before the stop/start was 
>> 10.63.14.214 and the new IP is 10.54.81.14.
>> And even though the cluster and node seem to be working fine for more 
>> than a day after the stop/start of this node, I see the following 
>> loop of messages ~once every minute.
>>
>> INFO [GossipStage:1] 2012-04-30 14:18:57,089 Gossiper.java (line 838) 
>> Node /10.63.14.214 is now part of the cluster
>> INFO [GossipStage:1] 2012-04-30 14:18:57,089 Gossiper.java (line 804) 
>> InetAddress /10.63.14.214 is now UP
>> INFO [GossipStage:1] 2012-04-30 14:18:57,090 StorageService.java 
>> (line 1017) Nodes /10.63.14.214 and cassa1a.internal/10.54.81.14 have 
>> the same token 0.  Ignoring /10.63.14.214
>> INFO [GossipTasks:1] 2012-04-30 14:19:11,834 Gossiper.java (line 818) 
>> InetAddress /10.63.14.214 is now dead.
>> INFO [GossipTasks:1] 2012-04-30 14:19:27,896 Gossiper.java (line 632) 
>> FatClient /10.63.14.214 has been silent for 30000ms, removing from gossip
>> INFO [GossipStage:1] 2012-04-30 14:20:30,803 Gossiper.java (line 838) 
>> Node /10.63.14.214 is now part of the cluster
>> ...
>>
>> How come the old IP 10.63.14.214 still pops up as UP and is then 
>> declared DEAD again, and so on and on?
>> I know that since this is EC2 another node with the same IP can come 
>> up, but I've verified there is no such node, and it certainly does not 
>> run Cassandra :)
>> I stopped/started another node and observed similar behaviour.
>> This is version 1.0.8.
>>
>> Another question: if a node is recognised as new (due to an IP change) 
>> but with the same token, will other nodes stream the hinted handoffs to it?
>> And is there a way to tell Cassandra to also use names, so that if the 
>> IP changes but the node name is the same and resolves to the new IP, 
>> the cluster treats it as the old node?
>>
>> Thanks
>> Alex
>

