cassandra-commits mailing list archives

From "Erik Forsberg (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-9183) Failure detector should detect and ignore local pauses
Date Tue, 21 Apr 2015 13:13:59 GMT


Erik Forsberg commented on CASSANDRA-9183:

As CASSANDRA-9218 was closed as a duplicate of this, I would like to add that I'm seeing
behaviour where the node that had a pause never recovers. You need to restart parts of your
cluster to make it recover, because gossip is waiting for an echo reply that never comes back:
the network packets carrying it were dropped during the pause.

> Failure detector should detect and ignore local pauses
> ------------------------------------------------------
>                 Key: CASSANDRA-9183
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 3.0
>         Attachments: 9183-v2.txt, 9183.txt
> A local node can be paused for many reasons, such as GC. If the pause is long enough, then
> when the node recovers it will think all the other nodes are dead until it gossips, causing
> UAE (UnavailableException) to be thrown to clients trying to use it as a coordinator.
> Instead, the FD can track the current time, and if the gap between checks becomes too
> large, skip marking the nodes down (perhaps resetting the FD data).
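
The idea in the description can be illustrated with a small sketch. This is not the attached patch and not Cassandra's FailureDetector; the class name, the method names and the two thresholds (timeout_s, max_pause_s) are all hypothetical, and a fixed timeout stands in for whatever suspicion calculation the real detector uses. Times are passed in explicitly so the behaviour is easy to follow; a real caller would more likely read a monotonic clock.

```python
class PauseAwareDetector:
    """Toy failure detector that discards evidence gathered across a local pause.

    All names and thresholds here are illustrative assumptions.
    """

    def __init__(self, timeout_s, max_pause_s, now_s):
        self.timeout_s = timeout_s      # silence after which a peer is considered down
        self.max_pause_s = max_pause_s  # largest gap between checks still treated as normal
        self.last_check_s = now_s       # when check() last ran
        self.last_heard_s = {}          # peer -> time of last heartbeat

    def report(self, peer, now_s):
        """Record a heartbeat from a peer."""
        self.last_heard_s[peer] = now_s

    def check(self, now_s):
        """Return the set of peers considered down at now_s."""
        gap = now_s - self.last_check_s
        self.last_check_s = now_s
        if gap > self.max_pause_s:
            # The local node itself was stalled (for example by GC), so the
            # silence says nothing about the peers. Reset their timestamps
            # and mark nobody down on this round.
            for peer in self.last_heard_s:
                self.last_heard_s[peer] = now_s
            return set()
        return {p for p, t in self.last_heard_s.items()
                if now_s - t > self.timeout_s}


fd = PauseAwareDetector(timeout_s=10.0, max_pause_s=5.0, now_s=0.0)
fd.report("node-a", 0.0)
after_pause = fd.check(31.0)   # 31 s since the last check: treated as a local pause
for t in (33.0, 36.0, 39.0):
    fd.check(t)                # regular checks, no heartbeats arrive
later = fd.check(42.0)         # node-a has been silent for 11 s of normal operation
```

In this sketch the check after the long gap marks nothing down, while a peer that stays silent across regularly spaced checks is still reported once the timeout elapses. Note that it only covers the detection side; it does nothing about the lost echo reply described in the comment above.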

This message was sent by Atlassian JIRA
