incubator-cassandra-user mailing list archives

From Radim Kolar <...@sendmail.cz>
Subject Re: frequent node UP/Down?
Date Sun, 25 Sep 2011 13:57:57 GMT
On 25.9.2011 14:31, Radim Kolar wrote:
> On 25.9.2011 9:29, Philippe wrote:
>> I have this happening on 0.8.x. It looks to me as if this happens when
>> the node is under heavy load, such as unthrottled compactions or a
>> huge GC.
> I have this problem too. Node-down detection must be improved:
> increase the timeouts a bit, or make more attempts before deciding. If a
> node is under load (especially if there is swap activity), it is often
> marked unavailable.
There also needs to be an algorithm like the one the BGP routing
protocol uses to prevent route flapping. It should guard against cases
like this:

  INFO [GossipTasks:1] 2011-09-25 14:56:36,544 Gossiper.java (line 695) InetAddress /216.17.99.40 is now dead.
  INFO [GossipStage:1] 2011-09-25 14:56:36,641 Gossiper.java (line 681) InetAddress /216.17.99.40 is now UP
  INFO [GossipTasks:1] 2011-09-25 14:56:37,823 Gossiper.java (line 695) InetAddress /216.17.99.40 is now dead.
  INFO [GossipStage:1] 2011-09-25 14:56:37,971 Gossiper.java (line 681) InetAddress /216.17.99.40 is now UP

Route flap protection works like this: announce the first state change
to the peer immediately. If the state changes again less than 30 seconds
later, announce that next change only after, for example, 30 seconds. If
the route keeps flapping up and down, increase the reporting delay to 60
seconds, and so on.
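
The scheme described above could be sketched roughly as follows. This is
only an illustration of the idea in this message, not Cassandra's
Gossiper code and not the penalty-based damping that real BGP
implementations use. The names FlapDamper, update, base_hold and
max_hold are hypothetical, and doubling the delay up to a cap is an
assumption (the message only says 30 seconds, then 60 seconds, "etc.").

```python
class FlapDamper:
    """Damps UP/DOWN announcements for a single node (hypothetical helper)."""

    def __init__(self, base_hold=30.0, max_hold=480.0):
        self.base_hold = base_hold   # delay applied after an announcement
        self.max_hold = max_hold     # assumed upper bound for the delay
        self.hold = base_hold        # current delay, in seconds
        self.reported = None         # last state announced to peers
        self.last_report = None      # time of the last announcement
        self.suppressed = False      # a change was held back in this window

    def update(self, state, now):
        """Feed the currently detected state; return it if it should be
        announced now, otherwise return None."""
        if self.reported is None:
            # Very first observation: announce it right away.
            return self._announce(state, now, self.base_hold)

        window_open = (now - self.last_report) < self.hold

        if state == self.reported:
            if not window_open:
                # A full quiet window has passed: forget earlier flaps.
                self.suppressed = False
                self.hold = self.base_hold
            return None

        if window_open:
            # The state changed too soon after the last announcement.
            self.suppressed = True
            return None

        # The window has expired and the state differs from what peers know.
        # If changes were held back, the node is flapping: back off further.
        if self.suppressed:
            next_hold = min(self.hold * 2, self.max_hold)
        else:
            next_hold = self.base_hold
        return self._announce(state, now, next_hold)

    def _announce(self, state, now, next_hold):
        self.reported = state
        self.last_report = now
        self.hold = next_hold
        self.suppressed = False
        return state
```

With this sketch, a dead/UP/dead/UP burst like the one in the log would
produce one immediate announcement for the first change and at most one
more after the delay has expired, instead of four announcements within
two seconds.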
