zookeeper-user mailing list archives

From Justin Bailey <jbtec...@gmail.com>
Subject Barrier Tutorial Possible Deadlock
Date Sun, 08 May 2011 20:14:35 GMT
I'm just learning ZK and want to make sure I understand everything
correctly.  In the Barrier Tutorial, there seems to be a race condition
that can cause a deadlock when the code executed inside the barrier is
short and one client has higher latency than the other.

For example, say the number of process nodes required to start 
computation is 2.

1) Process 1 creates its node and sets a children watch.
2) Process 2 creates its node, and the node creation fires a watch
notification to process 1.
3) Process 2 retrieves the children, gets a list of size 2, executes its
code, and deletes its node.
4) Process 1 receives the watch notification for the creation of
process 2's node and requests the children, whose size is now 1.
5) Process 1 waits indefinitely for a second node to appear (process 2's
node has already come and gone), while process 2 waits indefinitely for
process 1's node to be deleted.
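
To make the interleaving concrete, here is a minimal Python sketch that
replays steps 1-5 against a toy in-memory model. It does not use the
ZooKeeper client API. The class OneShotWatchParent, its methods, and the
replay function are hypothetical names made up for illustration, and the
model assumes only what the steps above assume: a child watch fires
once, carries no data, and must be re-set by calling get_children again.

```python
class OneShotWatchParent:
    """Toy stand-in for the barrier's parent znode (not the ZooKeeper API)."""

    def __init__(self):
        self.children = set()
        self.armed = set()       # clients holding an unfired one-shot child watch
        self.undelivered = []    # notifications fired but not yet seen by the client

    def get_children(self, watcher):
        # Reading the children also (re-)arms the caller's one-shot watch.
        self.armed.add(watcher)
        return sorted(self.children)

    def _fire(self):
        # Every armed watch fires once, without data, and is then cleared.
        self.undelivered.extend(sorted(self.armed))
        self.armed.clear()

    def create(self, name):
        self.children.add(name)
        self._fire()

    def delete(self, name):
        self.children.discard(name)
        self._fire()

    def deliver(self, client):
        """Hand one pending notification to `client`; False if none is pending."""
        if client in self.undelivered:
            self.undelivered.remove(client)
            return True
        return False


def replay(required=2):
    parent = OneShotWatchParent()

    # 1) Process 1 creates its node and sets a children watch.
    parent.create("p1")
    p1_first = parent.get_children("P1")     # ['p1'] -> too few, P1 waits

    # 2) Process 2 creates its node; P1's watch fires but is still in transit.
    parent.create("p2")

    # 3) Process 2 lists the children, sees both, runs, and deletes its node.
    p2_enter = parent.get_children("P2")     # ['p1', 'p2'] -> P2 enters
    parent.delete("p2")
    parent.deliver("P2")                     # P2 sees the event for its own delete
    p2_leave = parent.get_children("P2")     # ['p1'] -> P2 waits for p1 to go away

    # 4) Process 1 finally receives the notification from step 2 and re-reads.
    parent.deliver("P1")
    p1_second = parent.get_children("P1")    # ['p1'] -> still too few

    # 5) Nothing is in transit, and neither process will change the children.
    stuck = (not parent.undelivered
             and len(p1_second) < required
             and len(p2_leave) > 0)
    return p1_first, p2_enter, p2_leave, p1_second, stuck


if __name__ == "__main__":
    print(replay())
```

Under these assumptions, process 1 never observes a child list of size
2, and the replay ends with both processes waiting and no notification
pending. Whether a real ensemble can produce this ordering is exactly
the question asked here; the sketch only shows that the steps are
internally consistent.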

Are my assumptions about ZK's behavior correct?  If so, I can't think of
any solution that is both efficient and correct.  The only correct
solutions I can think of require either watches on all children, or
resending the child nodes and their data to the processes every time a
data watch on the parent fires.

To any developers out there, how difficult would it be to customize the
ZK code both to send data along with notifications and to support
permanent watchers?  This would guarantee a notification for every
change, at the cost of latency.  Having both options would be analogous
to having both TCP and UDP available, to be chosen according to the
particular requirements of the application.
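
As a rough illustration of what such a feature would change, the Python
sketch below replays the same interleaving against a toy model in which
a watch is permanent and each notification carries a snapshot of the
child list. This is a hypothetical model of the proposal, not an
existing ZooKeeper API; PersistentWatchParent, subscribe, and
replay_with_persistent_watch are names invented for the example.

```python
from collections import deque


class PersistentWatchParent:
    """Toy parent node whose watches never expire and whose events carry data."""

    def __init__(self):
        self.children = set()
        self.subscribers = {}    # client name -> queue of child-list snapshots

    def subscribe(self, client):
        # Register a permanent watch and return the current children.
        self.subscribers[client] = deque()
        return sorted(self.children)

    def _notify(self):
        # Every change is queued for every subscriber, with the state attached.
        snapshot = sorted(self.children)
        for queue in self.subscribers.values():
            queue.append(snapshot)

    def create(self, name):
        self.children.add(name)
        self._notify()

    def delete(self, name):
        self.children.discard(name)
        self._notify()


def replay_with_persistent_watch(required=2):
    parent = PersistentWatchParent()

    # Process 1 creates its node and registers a permanent watch.
    parent.create("p1")
    parent.subscribe("P1")

    # Process 2 creates its node, runs its short task, and deletes the node
    # before process 1 has handled any notification.
    parent.create("p2")
    parent.delete("p2")

    # Process 1 now drains its queue in order; no intermediate state is lost.
    events = list(parent.subscribers["P1"])
    p1_enters = any(len(snapshot) >= required for snapshot in events)
    return events, p1_enters


if __name__ == "__main__":
    print(replay_with_persistent_watch())
```

In this model process 1 still sees the size-2 child list even though
process 2's node is already gone by the time the event is handled, so it
can enter the barrier. The cost is that every change is queued and
shipped with its data to every subscriber, which is the latency
trade-off mentioned above.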

