curator-user mailing list archives

From "Wang, Simon" <>
Subject RE: Connection lost handling when entering double barrier
Date Tue, 24 May 2016 20:41:20 GMT
I have also opened a JIRA issue for this: CURATOR-330.

From: Wang, Simon
Sent: Monday, May 23, 2016 5:14 PM
Subject: Connection lost handling when entering double barrier

Here is the problem I'm running into:

Assume a 3-node ensemble. My application has 3 clients (Client 1, 2 and 3), each running on
the same host as one of the ZooKeeper nodes. They use a double barrier for coordination.

Client 1 has entered the barrier and is waiting for the other two. Then the other two nodes go
down, the ensemble loses quorum, and Client 1 gets a ConnectionLossException from enter().
That is expected.

After a while the other two nodes come back, and all clients need to retry the operation and
re-enter the same barrier (it could get more complicated if a new barrier were created instead).
Here is the problem:

If Client 1's session is still alive, calling enter() again throws NodeExistsException, because
the ephemeral node created under that session has not been deleted yet.

What should I do on the application side in this case? Alternatively, could we add a mechanism
to re-enter the barrier that skips creating this client's child node if it already exists?
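A minimal sketch of the proposed "create the child node only if absent" behavior follows. It does not use Curator or ZooKeeper: a hypothetical in-memory map stands in for the barrier's child nodes, and the class name, method names, paths and session ids are all made up for illustration. The key point is the ownership check: an existing node is reused only when it belongs to the caller's own session, so a node left behind by a different (possibly dead) session still fails.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of a barrier's child nodes; NOT the Curator API.
// Each child path maps to the id of the session that "owns" the ephemeral node.
public class ReentrantBarrierSketch {

    /** Mimics ZooKeeper's NodeExists error for this in-memory model. */
    static class NodeExistsException extends RuntimeException {
        NodeExistsException(String path) { super("Node exists: " + path); }
    }

    private final Map<String, Long> children = new ConcurrentHashMap<>();
    private final String barrierPath;

    ReentrantBarrierSketch(String barrierPath) { this.barrierPath = barrierPath; }

    private String childPath(String clientId) { return barrierPath + "/" + clientId; }

    /** Current behavior: always create, so a node surviving a connection loss makes re-entry fail. */
    void createChild(String clientId, long sessionId) {
        if (children.putIfAbsent(childPath(clientId), sessionId) != null) {
            throw new NodeExistsException(childPath(clientId));
        }
    }

    /** Proposed behavior: reuse the node if this same session already owns it. */
    void createChildIfAbsent(String clientId, long sessionId) {
        Long owner = children.putIfAbsent(childPath(clientId), sessionId);
        if (owner != null && owner != sessionId) {
            // Owned by another session: not ours to reuse.
            throw new NodeExistsException(childPath(clientId));
        }
    }

    int memberCount() { return children.size(); }

    public static void main(String[] args) {
        ReentrantBarrierSketch barrier = new ReentrantBarrierSketch("/app/barrier");
        long session1 = 1L; // hypothetical session id for Client 1

        barrier.createChild("client-1", session1); // first enter() succeeds
        // Connection lost, quorum restored, session still alive: the node is still there.
        try {
            barrier.createChild("client-1", session1);
        } catch (NodeExistsException e) {
            System.out.println("current behavior: " + e.getMessage());
        }
        barrier.createChildIfAbsent("client-1", session1); // proposed: node is reused
        System.out.println("members: " + barrier.memberCount());
    }
}
```

Against a real ensemble, the same ownership test could presumably compare the existing node's `Stat.getEphemeralOwner()` with the client's current `ZooKeeper.getSessionId()`; whether Curator should do this inside enter() is the open question here.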

I would like to open a Jira for this if required.

