cassandra-commits mailing list archives

From "sankalp kohli (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-9183) Failure detector should detect and ignore local pauses
Date Wed, 20 May 2015 20:58:02 GMT


sankalp kohli commented on CASSANDRA-9183:

[~brandon.williams] In the interpret method, I can see that you would not mark 2 endpoints
as down due to the way you are using the "wasPaused" variable. Why is that? 
The third endpoint will be marked as down after the pause.  

> Failure detector should detect and ignore local pauses
> ------------------------------------------------------
>                 Key: CASSANDRA-9183
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 2.2.0 beta 1, 2.1.6
>         Attachments: 9183-v2.txt, 9183.txt
> A local node can be paused for many reasons, such as GC. If the pause is long enough,
> when it recovers it will think all the other nodes are dead until it gossips, causing UAE
> to be thrown to clients trying to use it as a coordinator. Instead, the FD can track the
> current time and, if the gap there becomes too large, skip marking the nodes down (reset
> the FD data, perhaps).
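
The idea in the description above can be sketched roughly as follows. This is a minimal illustration, not the attached patch and not Cassandra's FailureDetector: the class name PauseAwareDetector, both thresholds, and the report/interpret signatures are hypothetical, and timestamps are passed in explicitly (as System.nanoTime() values would be) so the behaviour is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a failure detector that ignores local pauses. */
final class PauseAwareDetector {
    // Assumed threshold: a gap between two interpret() calls larger than
    // this is treated as a local pause (GC or similar), not a remote failure.
    private final long maxLocalPauseNanos;
    // Assumed threshold: an endpoint silent for longer than this is down.
    private final long endpointTimeoutNanos;
    private final Map<String, Long> lastHeartbeatNanos = new HashMap<>();
    private long lastInterpretNanos;

    PauseAwareDetector(long maxLocalPauseNanos, long endpointTimeoutNanos, long nowNanos) {
        this.maxLocalPauseNanos = maxLocalPauseNanos;
        this.endpointTimeoutNanos = endpointTimeoutNanos;
        this.lastInterpretNanos = nowNanos;
    }

    /** Records a heartbeat received from an endpoint. */
    void report(String endpoint, long nowNanos) {
        lastHeartbeatNanos.put(endpoint, nowNanos);
    }

    /** Returns true if the endpoint should be marked down. */
    boolean interpret(String endpoint, long nowNanos) {
        long gap = nowNanos - lastInterpretNanos;
        lastInterpretNanos = nowNanos;
        if (gap > maxLocalPauseNanos) {
            // We were the ones who stalled: reset the data for every known
            // endpoint so none of them is judged on stale timestamps.
            lastHeartbeatNanos.replaceAll((ep, t) -> nowNanos);
            return false;
        }
        Long last = lastHeartbeatNanos.get(endpoint);
        return last != null && nowNanos - last > endpointTimeoutNanos;
    }
}
```

In this sketch the pause is handled by resetting the timestamp of every known endpoint rather than by a one-shot flag, so all endpoints get a fresh grace period after the pause, not only the first one interpreted. Whether that matches the behaviour wanted for the ticket is a separate question.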

This message was sent by Atlassian JIRA
