helix-commits mailing list archives

From "Swaroop Jagadish (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HELIX-26) Better support for handling network partition and process freeze
Date Fri, 25 Jan 2013 02:17:13 GMT

    [ https://issues.apache.org/jira/browse/HELIX-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562282#comment-13562282
] 

Swaroop Jagadish commented on HELIX-26:
---------------------------------------

There are two distinct cases to consider
1) Process freeze
2) Network partition

Borrowing heavily from https://ramcloud.stanford.edu/wiki/display/ramcloud/Distributed+Leases,
we can use the following strategy:

1) Process freeze
Objective: Stop being active as soon as possible once the coordinator thinks you are dead

a) Once the Helix client thread processes the zk disconnection event after unfreezing, it
must assume that the global view has diverged from the local view and send a "reset", so
that the participant puts itself in a "limbo" state until it hears from the coordinator/zk;
if a timeout expires first, it enters the "zombie" state.

b) Before the disconnection event is processed, we need to guard against being active
while the local view has diverged from the global view. The participant can periodically
(every 10ms?) ping either a random peer participant or the coordinator. Pinging a random peer
scales better than having every participant ping the coordinator every 10ms. If the
participant learns from a ping response that it is a "zombie", it puts itself in a "limbo"
state until it hears from the coordinator/zk; if a timeout expires first, it enters the
"zombie" state.
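
The active/limbo/zombie transitions in 1a and 1b can be sketched as a small state machine. This is only an illustration, not Helix code: the class name, method names and the `limbo_timeout_s` parameter are hypothetical, and treating "zombie" as terminal until an explicit rejoin is an assumption the comment does not spell out.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    ACTIVE = "active"
    LIMBO = "limbo"
    ZOMBIE = "zombie"


class Participant:
    """Tracks whether this node may keep serving (hypothetical sketch)."""

    def __init__(self, limbo_timeout_s: float):
        self.limbo_timeout_s = limbo_timeout_s  # hypothetical tuning knob
        self.state = State.ACTIVE
        self._limbo_since: Optional[float] = None

    def suspect(self, now: float) -> None:
        """Local view may have diverged: a disconnect was processed (1a),
        or a ping response said we are a zombie (1b)."""
        if self.state is State.ACTIVE:
            self.state = State.LIMBO
            self._limbo_since = now

    def heard_from_coordinator(self, still_member: bool) -> None:
        """The coordinator/zk answered. Assumption: a zombie stays a zombie."""
        if not still_member:
            self.state = State.ZOMBIE
        elif self.state is State.LIMBO:
            self.state = State.ACTIVE
        self._limbo_since = None

    def tick(self, now: float) -> None:
        """Turn limbo into zombie once the timeout expires with no answer."""
        if (self.state is State.LIMBO
                and now - self._limbo_since >= self.limbo_timeout_s):
            self.state = State.ZOMBIE
```

A caller would invoke `suspect()` from the disconnect handler or the ping loop, and `tick()` from a timer. Passing the clock in as `now` keeps the sketch easy to test.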

2) Network partition
Assume the coordinator and zk are in the same partition
 Objective: All nodes that cannot reach the coordinator/zk need to enter the zombie state as
quickly as possible

 a) If a ping request to a peer participant yields no response, the participant puts itself
in a "limbo" state until it hears from the coordinator/zk; if a timeout expires first, it
enters the "zombie" state (same as 1b)

 b) If the response to a ping request is "I'm in limbo" or "I'm a zombie", the participant puts
itself in a "limbo" state until it hears from the coordinator/zk; if a timeout expires first,
it enters the "zombie" state. To avoid an overly aggressive spread of the "limbo" infection,
the participant can optionally wait for a response from the coordinator before entering
"limbo". For this to work, the coordinator's response time must be much shorter than the ping
interval, so keeping the coordinator load light is important.
This scheme ensures that the disconnected partition quickly converges to a "zombie" state.
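
As a rough illustration of rules 2a and 2b, the sketch below applies them to a single ping result and then simulates how "limbo" spreads inside a cut-off partition. Everything here is hypothetical: the function names, the round-based model, and the assumption that every node pings only peers inside its own partition (pings that cross the partition would simply time out and push nodes into limbo sooner). The limbo-to-zombie timeout and the optional coordinator check are left out.

```python
import random
from enum import Enum
from typing import Optional


class State(Enum):
    ACTIVE = "active"
    LIMBO = "limbo"
    ZOMBIE = "zombie"


def next_state(current: State, reply: Optional[State]) -> State:
    """Apply rules 2a/2b to one ping result; reply is None on no response."""
    if current is not State.ACTIVE:
        return current          # already fenced, nothing to do
    if reply is None:
        return State.LIMBO      # 2a: the peer did not answer
    if reply in (State.LIMBO, State.ZOMBIE):
        return State.LIMBO      # 2b: the peer is itself cut off
    return State.ACTIVE


def rounds_until_no_active(n: int, seed: int = 0) -> int:
    """Count ping rounds until no node in the partition is still active.

    Assumes one node has already noticed the partition and that each
    node pings one random peer per round.
    """
    rng = random.Random(seed)
    states = [State.LIMBO] + [State.ACTIVE] * (n - 1)
    rounds = 0
    while State.ACTIVE in states:
        snapshot = list(states)  # all pings in a round see the same view
        for i in range(n):
            peer = rng.choice([j for j in range(n) if j != i])
            states[i] = next_state(snapshot[i], snapshot[peer])
        rounds += 1
    return rounds
```

Running `rounds_until_no_active` for a few partition sizes gives a feel for how the number of ping intervals grows before the whole partition has stopped being active.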
 


                
> Better support for handling network partition and process freeze
> ----------------------------------------------------------------
>
>                 Key: HELIX-26
>                 URL: https://issues.apache.org/jira/browse/HELIX-26
>             Project: Apache Helix
>          Issue Type: Improvement
>            Reporter: kishore gopalakrishna
>
> Handling network partitions is tricky in distributed systems. ZooKeeper lets us solve
this up to a point through heartbeats, but that is not sufficient in large-scale
systems with many nodes. One of the problems is that once the client detects a disconnect, which
happens on the client side, the options are:
> 1. Put yourself in a paused state until you reconnect.
> 2. Continue whatever you are doing until notified of session expiry.
> Unfortunately, 1 is too aggressive and 2 is too passive. Since Helix comes with a centralized
controller, it's possible to have a middle-ground solution where, once the participant
receives a disconnect event, it asks the co-ordinator(s)/peers whether it can continue
operating.
> The challenge here is for the node to detect whether it belongs to the same partition as the
co-ordinator. So its goal is to reach the controller; if it cannot reach the controller,
it has to disable/fence itself.
> As of now, Helix simply exposes the disconnected state, and the user
can choose either 1) or 2).
> This JIRA aims to investigate better ways to enhance network partition detection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
