curator-dev mailing list archives

From "Simon Wang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CURATOR-330) Need a way to handle connection lost while entering double barrier
Date Thu, 26 May 2016 18:51:13 GMT

     [ https://issues.apache.org/jira/browse/CURATOR-330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Wang updated CURATOR-330:
-------------------------------
    Description: 
Here is the problem I’m running into:

Assume a 3-node ensemble. My application has 3 clients (Client 1, 2 and 3), each one co-located
with one of the ZK nodes. They use a double barrier for coordination.

Client 1 is entering the barrier and waiting for the other 2. The other 2 nodes then go down,
the ensemble loses quorum, and Client 1 gets a ConnectionLossException from enter().
That’s expected.

After a while the other 2 nodes come back, and all clients need to retry the operation and re-enter
the same barrier (creating a new barrier might make things more complex). Here is the problem:

If the session for Client 1 is still alive, Client 1 calling enter() will get a NodeExistsException
because the ephemeral node corresponding to that session has not been deleted yet.

What should I do on the application side in this case? Alternatively, could we add a mechanism
to re-enter the barrier that skips creating the child node for this client if it already exists?
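
The "skip creation if the node already exists" idea can be sketched as follows. This is only
an illustration under stated assumptions, not Curator's API: FakeStore, registerOrReuse and the
local NodeExistsException are hypothetical stand-ins. They model a single ZooKeeper behaviour,
namely that creating a node which already exists fails.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins only; none of these types are part of Curator or ZooKeeper.
class BarrierReentrySketch {

    /** Stand-in for ZooKeeper's KeeperException.NodeExistsException. */
    static class NodeExistsException extends Exception {
        NodeExistsException(String path) {
            super("node exists: " + path);
        }
    }

    /** In-memory stand-in for the child nodes under the barrier path. */
    static class FakeStore {
        private final Set<String> nodes = ConcurrentHashMap.newKeySet();

        void createEphemeral(String path) throws NodeExistsException {
            // Set.add returns false when the path is already present.
            if (!nodes.add(path)) {
                throw new NodeExistsException(path);
            }
        }

        int count() {
            return nodes.size();
        }
    }

    /**
     * Registers this client's node for the barrier. Returns true if this call
     * created the node, false if the node was left over from an earlier attempt.
     */
    static boolean registerOrReuse(FakeStore store, String ourPath) {
        try {
            store.createEphemeral(ourPath);
            return true;
        } catch (NodeExistsException e) {
            // Assumption: the existing node is our own, kept alive because the
            // session did not expire. Treat it as already registered.
            return false;
        }
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        String ourPath = "/barrier/client-1"; // hypothetical path
        System.out.println(registerOrReuse(store, ourPath)); // first enter attempt
        System.out.println(registerOrReuse(store, ourPath)); // retry after reconnect
    }
}
```

Against a real ensemble, the reuse branch would presumably also need to confirm that the existing
node belongs to the current session (for example by comparing the node's ephemeral owner with
the client's session id) before treating it as already registered.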

Thanks,
Simon


  was:
Here is the problem I’m running into:

Assume a 3-node ensemble. My application has 3 clients (Client 1, 2 and 3), each one co-located
with one of the ZK nodes. They use a double barrier for coordination.

Client 1 is entering the barrier and waiting for the other 2. The other 2 nodes then go down,
the ensemble loses quorum, and Client 1 gets a ConnectionLossException from enter().
That’s expected.

After a while the other 2 nodes come back, and all clients need to retry the operation and re-enter
the same barrier (creating a new barrier might make things more complex). Here is the problem:

If the session for Client 1 is still alive, Client 1 calling enter() will get a NodeExistsException
because the ephemeral node corresponding to that session has not been deleted yet.

What should I do on the application side in this case? Alternatively, could we add a mechanism
to re-enter the barrier that skips creating the child node for this client if it already exists?

I would like to open a JIRA issue for this if required.

Thanks,
Simon



> Need a way to handle connection lost while entering double barrier
> ------------------------------------------------------------------
>
>                 Key: CURATOR-330
>                 URL: https://issues.apache.org/jira/browse/CURATOR-330
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Recipes
>    Affects Versions: 2.10.0
>            Reporter: Simon Wang



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
