Some time ago I wrote a proof-of-concept implementation of a highly
available distributed message queue based on ZooKeeper. You can find the
code on GitHub:

https://github.com/andreisavu/zookeeper-mq

I also ran some fault-injection tests, and the code handles node
failures well.

I hope you find this useful. It should be easy to write something
similar in Java if you want.

--
Andrei Savu / andreisavu.ro

On Mon, Mar 7, 2011 at 12:41 PM, Sabyasachi Ruj wrote:
> Hi,
>
> I am planning to write an application which will have Worker processes
> distributed across multiple machines. One of them will be the Leader,
> which will assign tasks to the other processes. Designing the Leader
> election process is quite simple: each process tries to create an
> ephemeral node at the same path. Whoever succeeds becomes the leader.
> I got this technique from Mahadev Konar's talk here:
> http://developer.yahoo.com/blogs/ydn/posts/2009/08/hadoop_summit_zookeeper/
>
> However, I could not find any discussion about task/job distribution
> using ZooKeeper.
>
> I'll elaborate a little on the environment setup:
>
> Suppose there are 10 worker machines, each running one process, and one
> of them becomes the Leader. Tasks are submitted to a queue (which may be
> managed in MySQL); the Leader takes them and assigns each one to a
> worker. A worker process gets notified whenever the Leader submits a
> task to it.
>
> I think these jobs can be coordinated as child znodes under each worker
> node, like:
>
> /server/worker1/job1
> /server/worker1/job2
> /server/worker1/job3
> /server/worker2/job1
> /server/worker2/job2
>
> To get an alert whenever a job is submitted, each worker can set a watch
> on its corresponding znode. But I have a doubt here. Is there a chance
> in this case that some jobs might get lost or delayed?
>
> Step 1: Worker is watching its znode for jobs.
> Step 2: Server submits a job X.
> Step 3: Worker gets notified.
> Step 4: Before the worker sets the watch again, server submits another
>         job Y.
> Step 5: Now the worker sets the watch.
>
> So, my questions are:
>
> 1. How to design the process of distributing the tasks evenly?
> 2. Was ZooKeeper designed for this use case?
> 3. In the example above, is there a chance that the worker may miss
>    the notification for job Y?
>
> --
> Sabyasachi
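
A note on question 3 in the quoted message: ZooKeeper watches are
one-time triggers, so the worker may indeed get no separate notification
for job Y. The job itself should not be lost, though, because
getChildren returns the current child list and sets the new watch in the
same call, so Y shows up in the list the worker reads in step 5. Below
is a minimal sketch of steps 1 to 5 in Python. It uses an in-memory
stand-in rather than a real ZooKeeper client: FakeZnode, Worker and
their methods are hypothetical names made up for this illustration.

```python
class FakeZnode:
    """In-memory stand-in for a parent znode with one-shot child watches.

    Hypothetical helper for illustration; not a real ZooKeeper client.
    """

    def __init__(self):
        self.children = []
        self._watch = None  # at most one pending one-shot watch

    def get_children(self, watch=None):
        # Like ZooKeeper's getChildren: return the current children and
        # register the watch in the same call.
        self._watch = watch
        return list(self.children)

    def create_child(self, name):
        self.children.append(name)
        # One-shot semantics: clear the watch before firing it.
        watch, self._watch = self._watch, None
        if watch is not None:
            watch()


class Worker:
    def __init__(self, znode):
        self.znode = znode
        self.seen = []          # jobs picked up so far, in order
        self.notifications = 0  # how many times the watch fired

    def on_change(self):
        # Only record that something changed; the re-read happens in poll().
        self.notifications += 1

    def poll(self):
        # Re-arm the watch and pick up every job not processed yet.
        for job in self.znode.get_children(watch=self.on_change):
            if job not in self.seen:
                self.seen.append(job)


znode = FakeZnode()
worker = Worker(znode)
worker.poll()               # Step 1: worker sets the watch
znode.create_child("jobX")  # Step 2: leader submits job X
                            # Step 3: worker is notified (on_change fires)
znode.create_child("jobY")  # Step 4: job Y arrives while no watch is set
worker.poll()               # Step 5: worker re-reads and sets the watch again
print(worker.notifications, worker.seen)  # prints: 1 ['jobX', 'jobY']
```

Only one notification fires, yet both jobs are picked up, because the
worker compares the child list against what it has already processed
instead of counting events. With a real client the same pattern would
apply: treat the watch as a hint to re-read the children, not as a
per-job message.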