cassandra-user mailing list archives

From daemeon reiydelle <daeme...@gmail.com>
Subject Re: electricity outage problem
Date Fri, 15 Jan 2016 16:54:38 GMT
A node needs a delay of about 60-90 seconds before it can start accepting
connections as a seed node. A seed node also needs time to accept a node that
is starting up and to sync with the other nodes (on 10 Gbit the maximum number
of new nodes is only 1 or 2; on 1 Gbit it can handle at least 3-4 new nodes
connecting). In a large cluster (500 nodes) I see a weird condition where
nodetool status shows overlapping subsets of nodes, and the problem does not
go away even after an hour on a 10 Gbit network.
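The staggered restart described in this thread (start the first seed, wait about 90 seconds before the next node, then about 60 seconds between further nodes, with only a few nodes joining through a seed at a time) can be sketched as a small schedule calculation. This is a hypothetical helper, not part of Cassandra or nodetool: the function and node names are made up, the 90 s and 60 s gaps are the rough figures quoted here, and `max_concurrent` stands in for the 1-2 or 3-4 concurrent joins mentioned above. It only computes start offsets; actually starting the nodes is left to whatever tooling you use.

```python
from dataclasses import dataclass


@dataclass
class StartStep:
    offset_s: int      # seconds after the first seed is started
    nodes: list[str]   # nodes started together at this offset


def plan_startup(seeds: list[str], others: list[str],
                 first_gap_s: int = 90, gap_s: int = 60,
                 max_concurrent: int = 1) -> list[StartStep]:
    """Hypothetical staggered-start plan: seeds first, one at a time,
    then non-seed nodes in waves of at most max_concurrent nodes."""
    if not seeds:
        raise ValueError("need at least one seed node")
    plan = [StartStep(0, [seeds[0]])]
    # ~60 s for the first seed to come up, plus ~30 s of buffer
    offset = first_gap_s
    for seed in seeds[1:]:
        plan.append(StartStep(offset, [seed]))
        offset += gap_s
    # Non-seeds join in small waves so each seed is not flooded with joins
    for i in range(0, len(others), max_concurrent):
        plan.append(StartStep(offset, others[i:i + max_concurrent]))
        offset += gap_s
    return plan
```

For example, `plan_startup(["s1", "s2"], ["n1", "n2", "n3"], max_concurrent=2)` starts `s1` at 0 s, `s2` at 90 s, `n1` and `n2` at 150 s, and `n3` at 210 s. A strictly sequential schedule is the simplest reading of the advice; the parallel tree mentioned for very large clusters would need a different plan.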

*.......*

*“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!” - Hunter Thompson
Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872*

On Fri, Jan 15, 2016 at 9:17 AM, Adil <adil.chabaq@gmail.com> wrote:

> Hi,
> we did a full restart of the cluster but nodetool status is still giving
> incoherent info from different nodes: some nodes appear UP from one node but
> appear DOWN from another, and, as I said, the log still shows the
> message "received an invalid gossip generation for peer /x.x.x.x".
> The Cassandra version is 2.1.2. We want to execute the purge operation as
> explained here
> https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_gossip_purge.html
> but we didn't find the peers folder. Should we do it via CQL by deleting the
> peers content? Should we do it on all nodes?
>
> thanks
>
>
> 2016-01-12 17:42 GMT+01:00 Jack Krupansky <jack.krupansky@gmail.com>:
>
>> Sometimes you may have to clear out the saved Gossip state:
>>
>> https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_gossip_purge.html
>>
>> Note the instruction about bringing up the seed nodes first. Normally
>> seed nodes are only relevant when initially joining a node to a cluster
>> (and then the Gossip state will be persisted locally), but if you clear the
>> persisted Gossip state the seed nodes will again be needed to find the rest
>> of the cluster.
>>
>> I'm not sure whether a power outage is the same as stopping and
>> restarting an instance (AWS) in terms of whether the restarted instance
>> retains its current public IP address.
>>
>>
>>
>> -- Jack Krupansky
>>
>> On Tue, Jan 12, 2016 at 10:02 AM, daemeon reiydelle <daemeonr@gmail.com>
>> wrote:
>>
>>> This happens when there is insufficient time for nodes coming up to join
>>> a network. It takes a few seconds for a node to come up, e.g. your seed
>>> node. If you tell a node to join a cluster you can get this scenario
>>> because of high network utilization as well. I wait 90 seconds after the
>>> first (i.e. my first seed) node comes up to start the next one. Any nodes
>>> that are seeds need about 60 seconds, so the additional 30 seconds is a
>>> buffer. Each additional node waits 60 seconds before joining (although for
>>> large clusters this becomes a parallel tree).
>>>
>>> On Tue, Jan 12, 2016 at 6:56 AM, Adil <adil.chabaq@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> we have two DCs with 5 nodes in each; yesterday there was an
>>>> electricity outage that took all nodes down. We restarted the clusters, but when
>>>> we run nodetool status on DC1 some nodes show as DN. The
>>>> strange thing is that running the command from a different node in DC1 doesn't
>>>> report the same nodes as down. We have noticed this message in the log:
>>>> "received an invalid gossip generation for peer". Does anyone know how to
>>>> resolve this problem? Should we purge the gossip state?
>>>>
>>>> thanks
>>>>
>>>> Adil
>>>>
>>>
>>>
>>
>
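The symptom reported in this thread, where the same peer shows as UP from one node and DOWN from another, can be checked by capturing `nodetool status` output on every node and comparing the views. A minimal sketch, assuming the usual row layout in which each node line begins with a two-letter Status/State code (`UN`, `DN`, ...) followed by the address; `parse_status` and `disagreements` are hypothetical names, and the helper only works on text you have already collected rather than contacting the cluster.

```python
import re
from collections import defaultdict

# Assumed row layout: Status (U/D) + State (N/L/J/M), then the address,
# e.g. "UN  10.0.0.1  ...". Header lines do not match this pattern.
ROW = re.compile(r"^([UD])([NLJM])\s+(\S+)")


def parse_status(text: str) -> dict[str, str]:
    """Map address -> 'U' or 'D' from one node's `nodetool status` output."""
    view = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            view[m.group(3)] = m.group(1)
    return view


def disagreements(outputs: dict[str, str]) -> dict[str, dict[str, str]]:
    """outputs: observer name -> raw `nodetool status` text captured on it.
    Returns address -> {observer: status} for peers the observers disagree on."""
    seen: dict[str, dict[str, str]] = defaultdict(dict)
    for observer, text in outputs.items():
        for addr, status in parse_status(text).items():
            seen[addr][observer] = status
    # Keep only peers that are not reported with a single, consistent status
    return {addr: views for addr, views in seen.items()
            if len(set(views.values())) > 1}
```

Peers that are missing entirely from one node's output are not reported by this sketch; they could be flagged separately by comparing the sets of addresses in each view.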
