From: Sabyasachi Ruj
Date: Mon, 7 Mar 2011 16:11:55 +0530
Subject: Task/Job distribution using ZooKeeper
To: user@zookeeper.apache.org

Hi,

I am planning to write an application whose worker processes are distributed across multiple machines. One of them will be the leader, which assigns tasks to the other processes.

Designing the leader election process is quite simple: each process tries to create an ephemeral node at the same path, and whoever succeeds becomes the leader. I got this technique from Mahadev Konar's talk here: http://developer.yahoo.com/blogs/ydn/posts/2009/08/hadoop_summit_zookeeper/ . However, I could not find any discussion of task/job distribution using ZooKeeper.

I'll elaborate a little on the environment setup: suppose there are 10 worker machines, each running one process, and one of those processes becomes the leader. Tasks are submitted to a queue (which may be managed in MySQL); the leader takes them and assigns each one to a worker. A worker process gets notified whenever the leader submits a task to it.

I think these jobs can be coordinated as child znodes under each worker's znode, like:

/server/worker1/job1
/server/worker1/job2
/server/worker1/job3
/server/worker2/job1
/server/worker2/job2

To get an alert whenever a job is submitted, each worker can set a watch on its corresponding znode. But I have a doubt here: is there a chance that some jobs might get lost or delayed in this case?

Step 1: The worker is watching its znode for jobs.
Step 2: The server (leader) submits job X.
Step 3: The worker gets notified.
Step 4: Before the worker sets the watch again, the server submits another job Y.
Step 5: Now the worker sets the watch.

So, my questions are:

1. How should I design the process of distributing the tasks evenly?
2. Was ZooKeeper designed for this use case?
3. In the example above, is there a chance that the worker may miss the notification for job Y?

--
Sabyasachi
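The ephemeral-node leader election described in the message can be modelled in a few lines. This is a minimal sketch, not the ZooKeeper client API: `FakeZooKeeper`, `NodeExistsError`, `try_become_leader` and the path `/election/leader` are hypothetical names, and an in-memory dict stands in for the ensemble. The only properties it relies on are that creating an already-existing path fails atomically and that ephemeral nodes disappear when the session that created them ends.

```python
class NodeExistsError(Exception):
    """Raised when creating a znode whose path is already taken (hypothetical)."""


class FakeZooKeeper:
    """Hypothetical in-memory stand-in for a ZooKeeper ensemble (not a real client)."""

    def __init__(self):
        self.nodes = {}  # path -> owning session id (every node here is ephemeral)

    def create_ephemeral(self, path, session):
        # ZooKeeper's create() is atomic: exactly one caller can create a given path.
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = session

    def close_session(self, session):
        # Ephemeral znodes vanish when the session that created them ends.
        self.nodes = {p: s for p, s in self.nodes.items() if s != session}


def try_become_leader(zk, session, path="/election/leader"):
    """Return True if this session won the race to create the leader znode."""
    try:
        zk.create_ephemeral(path, session)
        return True
    except NodeExistsError:
        return False


zk = FakeZooKeeper()
results = {f"worker{i}": try_become_leader(zk, f"worker{i}") for i in range(1, 11)}
leaders = [w for w, won in results.items() if won]
print(leaders)  # ['worker1']

# If the leader's session dies, its ephemeral znode disappears and another worker can win.
zk.close_session("worker1")
print(try_become_leader(zk, "worker2"))  # True
```

In a real deployment the losing processes would typically set a watch with `exists()` on the leader znode, so they are notified when it is deleted instead of polling.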
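Regarding the missed-notification scenario in Steps 1 to 5, the relevant ZooKeeper properties are that watches are one-shot triggers and that `getChildren(path, watcher)` in the Java client returns the current child list and registers the watch in the same call. If the worker re-reads the full child list every time it re-arms its watch, job Y is still picked up in Step 5 even though no separate notification fires for it. The sketch below models this in memory; `FakeZooKeeper`, `Worker` and `rearm_and_scan` are hypothetical names, and tracking processed jobs in a `seen` set (rather than, say, deleting each job znode once it is done) is a simplifying assumption.

```python
from collections import defaultdict


class FakeZooKeeper:
    """Hypothetical in-memory model of ZooKeeper child watches (not a real client)."""

    def __init__(self):
        self.children = defaultdict(set)   # parent path -> child names
        self.watchers = defaultdict(list)  # parent path -> pending one-shot watchers

    def get_children(self, path, watcher=None):
        # Like ZooKeeper's getChildren(path, watcher): return the current children
        # and register the watch in the same call.
        if watcher is not None:
            self.watchers[path].append(watcher)
        return sorted(self.children[path])

    def create(self, parent, name):
        self.children[parent].add(name)
        # Watches are one-shot: fire each registered watcher once, then forget it.
        fired, self.watchers[parent] = self.watchers[parent], []
        for watcher in fired:
            watcher(parent)


class Worker:
    def __init__(self, zk, path):
        self.zk, self.path = zk, path
        self.seen = set()
        self.pending_events = []  # notifications received but not yet handled
        self.processed = []

    def on_event(self, path):
        # Defer handling to model the gap between Step 3 and Step 5.
        self.pending_events.append(path)

    def rearm_and_scan(self):
        # Re-register the watch and read the children in one call, then process
        # every job not seen before, not just the one that triggered the event.
        for job in self.zk.get_children(self.path, self.on_event):
            if job not in self.seen:
                self.seen.add(job)
                self.processed.append(job)


zk = FakeZooKeeper()
worker = Worker(zk, "/server/worker1")
worker.rearm_and_scan()                # Step 1: the watch is set
zk.create("/server/worker1", "jobX")   # Steps 2-3: the watch fires once
zk.create("/server/worker1", "jobY")   # Step 4: no watch is armed, nothing fires
while worker.pending_events:           # Step 5: the worker handles the event
    worker.pending_events.pop()
    worker.rearm_and_scan()
print(worker.processed)  # ['jobX', 'jobY']
```

The notification for Y is indeed never delivered, but Y is not lost: the state read that re-arms the watch already contains it.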
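For distributing tasks evenly, one simple policy (an assumption, not something ZooKeeper itself provides) is for the leader to give each new job to the worker with the fewest outstanding jobs. The sketch below keeps a heap keyed on load; `assign_jobs` and the `outstanding` map, which the leader might hypothetically fill from the child count of each `/server/<worker>` znode, are illustrative names only.

```python
import heapq


def assign_jobs(jobs, workers, outstanding=None):
    """Assign each job to the worker with the fewest outstanding jobs.

    `outstanding` maps worker name -> number of jobs already queued for it
    (hypothetically, the child count of /server/<worker>). Ties go to the
    lexicographically smallest worker name.
    """
    outstanding = outstanding or {}
    heap = [(outstanding.get(w, 0), w) for w in workers]
    heapq.heapify(heap)
    assignment = {}
    for job in jobs:
        load, worker = heapq.heappop(heap)  # least-loaded worker
        assignment[job] = worker  # the leader would create /server/<worker>/<job> here
        heapq.heappush(heap, (load + 1, worker))
    return assignment


plan = assign_jobs(["job1", "job2", "job3", "job4", "job5"],
                   ["worker1", "worker2", "worker3"],
                   outstanding={"worker1": 2})
print(plan)
# {'job1': 'worker2', 'job2': 'worker3', 'job3': 'worker2', 'job4': 'worker3', 'job5': 'worker1'}
```

Round-robin would also spread jobs evenly by count; least-loaded additionally compensates for a worker that already has a backlog, as `worker1` does here.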