From: "Fournier, Camille F. [Tech]" <Camille.Fournier@gs.com>
To: "'user@zookeeper.apache.org'" <user@zookeeper.apache.org>
Date: Mon, 7 Mar 2011 10:22:23 -0500
Subject: RE: Task/Job distribution using ZooKeeper

Have you checked out the distributed queue recipe? It is what I have used
to implement a solution to a similar problem.
http://hadoop.apache.org/zookeeper/docs/r3.3.2/recipes.html

Are the jobs worker-specific, or can all workers handle all jobs? The
distributed queue protocol is very nice and simple. If you have a list of
tasks and the workers are all able to handle all tasks, they can just pick
tasks off as they become available and you don't have to worry too much
about load balancing. Otherwise you can use the same recipe to do a queue
per worker. Either way I think it will answer some of your questions about
how to watch and not miss tasks.
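
As a rough illustration of the shared-queue idea, here is a minimal Python
sketch. It does not use a real ZooKeeper client: FakeQueueStore, NoNode and
take are hypothetical names, and the store is an in-memory stand-in for one
parent znode. It assumes tasks are created as sequential children and that a
worker claims a task by being the one whose delete succeeds.

```python
import itertools


class NoNode(Exception):
    """Stand-in for ZooKeeper's NoNodeException (hypothetical name)."""


class FakeQueueStore:
    """In-memory stand-in for one parent znode; NOT the real client API."""

    def __init__(self):
        self._seq = itertools.count()
        self._nodes = {}

    def create_sequential(self, prefix, data):
        # Mimics a SEQUENTIAL create: a zero-padded 10-digit counter is
        # appended to the name, so sorting by name gives creation order.
        name = f"{prefix}{next(self._seq):010d}"
        self._nodes[name] = data
        return name

    def get_children(self):
        return list(self._nodes)

    def get_data(self, name):
        if name not in self._nodes:
            raise NoNode(name)
        return self._nodes[name]

    def delete(self, name):
        if name not in self._nodes:
            raise NoNode(name)
        del self._nodes[name]


def take(store):
    """Claim the oldest task, or return None if the queue is empty."""
    for name in sorted(store.get_children()):
        try:
            data = store.get_data(name)
            store.delete(name)  # only one worker's delete can succeed
            return data
        except NoNode:
            continue  # another worker claimed it first; try the next one
    return None


store = FakeQueueStore()
for task in ("resize-image", "send-mail", "build-index"):
    store.create_sequential("task-", task)
claimed = [take(store) for _ in range(4)]  # the fourth call finds nothing
```

Because any idle worker can call take, load balancing falls out of the
design: whoever is free claims the next task. For worker-specific jobs the
same sketch applies with one store per worker.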

C

-----Original Message-----
From: Sabyasachi Ruj [mailto:ruj.sabya@gmail.com]
Sent: Monday, March 07, 2011 5:42 AM
To: user@zookeeper.apache.org
Subject: Task/Job distribution using ZooKeeper

Hi,

I am planning to write an application which will have Worker processes
distributed across multiple machines. One of them will be the Leader, which
will assign tasks to the other processes. Designing the Leader election
process is quite simple: each process tries to create an ephemeral node
at the same path. Whoever succeeds becomes the leader. I got
this technique from Mahadev Konar's talk here:
http://developer.yahoo.com/blogs/ydn/posts/2009/08/hadoop_summit_zookeeper/
But I could not find any discussion about task/job distribution using
ZooKeeper.
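
The election scheme described above can be sketched in Python as follows.
This is an in-memory simulation, not real ZooKeeper client code: FakeEnsemble,
NodeExists, try_to_lead and the path /server/leader are all hypothetical
names chosen for illustration.

```python
class NodeExists(Exception):
    """Stand-in for ZooKeeper's NodeExistsException (hypothetical name)."""


class FakeEnsemble:
    """In-memory stand-in for the ZooKeeper service; NOT the real client."""

    def __init__(self):
        self._ephemerals = {}  # path -> id of the session that owns it

    def create_ephemeral(self, path, session_id):
        if path in self._ephemerals:
            raise NodeExists(path)
        self._ephemerals[path] = session_id

    def close_session(self, session_id):
        # Ephemeral nodes vanish when the session that created them ends.
        self._ephemerals = {
            p: s for p, s in self._ephemerals.items() if s != session_id
        }


def try_to_lead(zk, session_id, path="/server/leader"):
    """Return True if this process created the node and is now leader."""
    try:
        zk.create_ephemeral(path, session_id)
        return True
    except NodeExists:
        return False  # somebody else already holds leadership


zk = FakeEnsemble()
first_round = [try_to_lead(zk, s) for s in ("p1", "p2", "p3")]
zk.close_session("p1")  # the leader dies; its ephemeral node goes away
p2_takes_over = try_to_lead(zk, "p2")
```

In this sketch only the first process wins the first round, and a second
process can take over once the leader's session ends. A real deployment
would presumably have the losing processes watch the node and retry when it
is deleted.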

I'll elaborate a little on the environment setup:

Suppose there are 10 worker machines, each running one process, and one of
them becomes the Leader. Tasks are submitted to a queue (maybe
managed in MySQL); the Leader takes them and assigns each to a worker. The
worker processes get notified whenever a task is submitted by the
leader.

I think these jobs can be coordinated as child znodes for each worker node,
like:

/server/worker1/job1
/server/worker1/job2
/server/worker1/job3
/server/worker2/job1
/server/worker2/job2

To get an alert whenever a job is submitted, each worker can set a watch on
its corresponding znode. But I have a doubt here: is there a chance,
in this case, that some jobs might get lost or delayed?

Step 1: Worker is watching its znode for jobs.
Step 2: Server submits a job X.
Step 3: Worker gets notified.
Step 4: Before setting the watch again, server submits another job Y.
Step 5: Now the worker sets the watch.
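
The five steps above can be simulated with a small Python sketch. It is an
in-memory model, not the real client: FakeWorkerNode and run_scenario are
hypothetical names. It assumes two documented ZooKeeper behaviours: a watch
fires only once, and a child listing that sets a watch returns the current
children in the same call.

```python
class FakeWorkerNode:
    """In-memory stand-in for a znode such as /server/worker1."""

    def __init__(self):
        self.children = []
        self._watcher = None

    def get_children(self, watcher=None):
        # Returns the current child list and registers a one-shot watch
        # in the same call, like getChildren with a watch.
        self._watcher = watcher
        return list(self.children)

    def create_child(self, name):
        self.children.append(name)
        # One-shot semantics: the watch is cleared before it fires.
        watcher, self._watcher = self._watcher, None
        if watcher is not None:
            watcher()


def run_scenario():
    node = FakeWorkerNode()
    seen = set()
    notifications = []

    def on_change():
        notifications.append("children changed")

    def poll():
        # Re-read the whole child list, re-arm the watch, and return
        # only the jobs this worker has not processed yet.
        current = node.get_children(watcher=on_change)
        new_jobs = [job for job in current if job not in seen]
        seen.update(new_jobs)
        return new_jobs

    poll()                     # Step 1: worker sets the initial watch
    node.create_child("jobX")  # Steps 2-3: job X arrives, watch fires
    node.create_child("jobY")  # Step 4: job Y arrives, no watch is set
    found = poll()             # Step 5: worker re-lists and re-arms
    return notifications, found


notifications, found = run_scenario()
```

In this model the worker receives a single notification, yet the re-read in
step 5 returns both jobX and jobY. Under the stated assumptions, job Y would
not be lost as long as the worker compares the full child list against what
it has already handled, rather than expecting one notification per job.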

So, my questions are:

1. How should the process of distributing the tasks evenly be designed?
2. Was ZooKeeper designed for this use case?
3. In the example above, is there a chance that the worker may miss
the notification for job Y?

--
Sabyasachi