Return-Path: Delivered-To: apmail-zookeeper-user-archive@www.apache.org Received: (qmail 703 invoked from network); 7 Mar 2011 15:22:55 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 7 Mar 2011 15:22:55 -0000 Received: (qmail 38267 invoked by uid 500); 7 Mar 2011 15:22:54 -0000 Delivered-To: apmail-zookeeper-user-archive@zookeeper.apache.org Received: (qmail 38231 invoked by uid 500); 7 Mar 2011 15:22:54 -0000 Mailing-List: contact user-help@zookeeper.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@zookeeper.apache.org Delivered-To: mailing list user@zookeeper.apache.org Received: (qmail 38222 invoked by uid 99); 7 Mar 2011 15:22:54 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 07 Mar 2011 15:22:54 +0000 X-ASF-Spam-Status: No, hits=-5.0 required=5.0 tests=RCVD_IN_DNSWL_HI,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of Camille.Fournier@gs.com designates 204.4.187.100 as permitted sender) Received: from [204.4.187.100] (HELO mxecd05.gs.com) (204.4.187.100) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 07 Mar 2011 15:22:45 +0000 X-IronPort-AV: E=Sophos;i="4.62,277,1297054800"; d="scan'208";a="276948662" Received: from unknown (HELO mxpbd02-public.ny.fw.gs.com) ([148.86.115.129]) by mxecd05.idz.gs.com with ESMTP; 07 Mar 2011 10:22:24 -0500 From: "Fournier, Camille F. [Tech]" X-sendergroup: RELAYLIST Received: from gshccdp14ex.firmwide.corp.gs.com ([139.172.139.86]) by mxpbd02.ny.fw.gs.com with ESMTP; 07 Mar 2011 10:22:24 -0500 Received: from GSCMAMP02EX.firmwide.corp.gs.com ([139.172.184.156]) by gshccdp14ex.firmwide.corp.gs.com ([139.172.139.86]) with mapi; Mon, 7 Mar 2011 10:22:24 -0500 To: "'user@zookeeper.apache.org'" Date: Mon, 7 Mar 2011 10:22:23 -0500 Subject: RE: Task/Job distribution using ZooKeeper Thread-Topic: Task/Job distribution using ZooKeeper Thread-Index: AcvctGR/nLkPeDonT2SYA1b3p1f+awAJgwZg Message-ID: <69D3016305F9084FBD2C4A0DF189BD5C16B9AAAB5B@GSCMAMP02EX.firmwide.corp.gs.com> References: In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US x-retentionstamp: Firmwide Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Have you checked out the distributed queue recipe? It is what I have used t= o implement a solution to a similar problem. http://hadoop.apache.org/zookeeper/docs/r3.3.2/recipes.html Are the jobs worker-specific, or can all workers handle all jobs? The distr= ibuted queue protocol is very nice and simple. If you have a list of tasks = and the workers are all able to handle all tasks, they can just pick tasks = off as they become available and you don't have to worry too much about loa= d balancing. Otherwise you can use the same recipe to do a queue per worker= . Either way I think it will answer some of your questions about how to wat= ch and not miss tasks. C -----Original Message----- From: Sabyasachi Ruj [mailto:ruj.sabya@gmail.com]=20 Sent: Monday, March 07, 2011 5:42 AM To: user@zookeeper.apache.org Subject: Task/Job distribution using ZooKeeper Hi, I am planning to write an application which will have Worker processes distributed across multiple=A0machines. One of them will be Leader which will assign tasks to other processes. Designing the Leader elelection process is quite simple: each process tries to create a ephemeral node in the same path. Whoever is successful, becomes the leader. I got this technique from Mahadev Konar's talk here: http://developer.yahoo.com/blogs/ydn/posts/2009/08/hadoop_summit_zookeeper/ . But could not find any discussion about task/job distribution using ZooKeeper. I'll elaborate a little on the environment setup: Suppose there are 10 worker maschines, each one runs a process, one of them becomes the Leader. Tasks are submitted in the queue (may be managed in MySQL), the Leader takes them and assigns to a worker. The worker processes gets notified whenever a tasks is submitted by the leader. I think these jobs can be coordinated as child znodes for each worker node = like: /server/worker1/job1 /server/worker1/job2 /server/worker1/job3 /server/worker2/job1 /server/worker2/job2 To get an alert whenever a job is submitted, the workers can watch on its corresponding znode. But again I've a doubt here. Is there a chance in this case, that some jobs might get lost/delayed? Step 1: Worker is watching on its zonde for jobs. Step 2: Server submits a job X. Step 3: Worker gets notified. Step 4: Before setting the watch again, server submits another job Y. Step 5: Now the worker sets the watch. So, my questions are: 1. How to design the process of distributing the tasks evenly? 2. Was ZooKeeper designed for this use case? 3. In the example above, is there a chance that the worker may miss notification for job Y? -- Sabyasachi