Hi,
I'm just learning ZK and want to make sure I understand everything
correctly. In the Barrier Tutorial, there seems to be a race condition
that could cause a deadlock when the code executed within the barrier
is short and one client has higher latency than another.
For example, say the number of process nodes required to start
computation is 2.
1) Process 1 creates its node and sets a children watch.
2) Process 2 creates its node, and the node creation fires a watch
notification to process 1.
3) Process 2 retrieves the children, sees a list of size 2, executes its
code, and deletes its node.
4) Process 1 receives the watch notification from the creation of node 2
and requests the children, whose list size is now 1.
5) Process 1 waits indefinitely for process 2's node to be created,
while process 2 waits indefinitely for process 1's node to be deleted.
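To make the interleaving concrete, here is a toy, single-threaded Java
model of the steps above. It is only a sketch: it does not use the
ZooKeeper client API, the class and method names are made up for
illustration, and it bakes in my assumptions that watches are one-shot
and that a notification can arrive after later updates have happened.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the interleaving; hypothetical names, not ZooKeeper code. */
class BarrierRaceSketch {
    static final int REQUIRED = 2; // processes needed to start computation

    /** Replays steps 1-4 and returns the child count process 1 sees in step 4. */
    static int childCountSeenByProcess1() {
        Set<String> children = new TreeSet<>();          // children of the barrier node
        Deque<String> inFlightToP1 = new ArrayDeque<>(); // notifications not yet delivered
        boolean p1WatchArmed;

        // Step 1: process 1 creates its node and sets a one-shot children watch.
        children.add("p1");
        p1WatchArmed = true;

        // Step 2: process 2 creates its node; the watch fires, but the
        // notification to process 1 is still in flight (assumed slow link).
        children.add("p2");
        if (p1WatchArmed) {
            inFlightToP1.add("NodeChildrenChanged");
            p1WatchArmed = false; // one-shot: no longer armed
        }

        // Step 3: process 2 lists the children, sees 2, runs its short
        // computation, and deletes its node. No watch is armed for
        // process 1, so the deletion produces no further notification.
        if (children.size() >= REQUIRED) {
            children.remove("p2");
        }

        // Step 4: the delayed notification finally reaches process 1,
        // which re-reads the children.
        inFlightToP1.poll();
        return children.size();
    }

    /** Step 5: process 1 proceeds only if it ever observes enough children. */
    static boolean process1EntersBarrier() {
        return childCountSeenByProcess1() >= REQUIRED;
    }
}
```

In this model process 1 observes a single child in step 4, so it never
enters the barrier, while process 2 has already passed through it.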
Are my assumptions about ZK's behavior correct? If so, I can't think of
any solutions that are both efficient and correct. The only correct
solutions I can think of either require watches on all children, or
require sending child nodes and their data to processes multiple times
based on a parent data watch event.
To any developers out there, how difficult would it be to customize the
ZK code to both send data along with notifications and to have permanent
watchers? This would guarantee notifications for all changes, at the
cost of higher latency. Having both options would be analogous to having
both TCP and UDP available, to be chosen depending on the particular
requirements of the application.
Thanks,
Justin