From: Andrei Savu
Date: Tue, 8 Mar 2011 08:54:40 +0200
Subject: Re: Task/Job distribution using ZooKeeper
To: user@zookeeper.apache.org

Some time ago I wrote a proof-of-concept implementation of a highly available distributed message queue based on ZooKeeper. You can find the code on GitHub:

https://github.com/andreisavu/zookeeper-mq

I've also performed some fault injection testing, and the code does a good job of handling node failures.

I hope you will find this useful. It should be easy to write something similar in Java if you want.

--
Andrei Savu / andreisavu.ro

On Mon, Mar 7, 2011 at 12:41 PM, Sabyasachi Ruj wrote:
> Hi,
>
> I am planning to write an application which will have Worker processes
> distributed across multiple machines. One of them will be the Leader, which
> will assign tasks to the other processes. Designing the Leader election
> process is quite simple: each process tries to create an ephemeral node
> at the same path. Whoever succeeds becomes the leader. I got
> this technique from Mahadev Konar's talk here:
> http://developer.yahoo.com/blogs/ydn/posts/2009/08/hadoop_summit_zookeeper/
> . But I could not find any discussion about task/job distribution using
> ZooKeeper.
>
> I'll elaborate a little on the environment setup:
>
> Suppose there are 10 worker machines, each running one process; one of
> them becomes the Leader. Tasks are submitted to a queue (which may be
> managed in MySQL), and the Leader takes them and assigns each one to a
> worker. The worker processes get notified whenever a task is submitted
> by the leader.
>
> I think these jobs can be coordinated as child znodes of each worker node, like:
>
> /server/worker1/job1
> /server/worker1/job2
> /server/worker1/job3
> /server/worker2/job1
> /server/worker2/job2
>
> To get an alert whenever a job is submitted, each worker can watch
> its corresponding znode. But I have a doubt here: is there a
> chance in this case that some jobs might get lost or delayed?
>
> Step 1: Worker is watching its znode for jobs.
> Step 2: Server submits a job X.
> Step 3: Worker gets notified.
> Step 4: Before the worker sets the watch again, the server submits another job Y.
> Step 5: Now the worker sets the watch.
>
> So, my questions are:
>
> 1. How should I design the process of distributing the tasks evenly?
> 2. Was ZooKeeper designed for this use case?
> 3. In the example above, is there a chance that the worker may miss
> the notification for job Y?
>
> --
> Sabyasachi
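On question 1, here is a minimal sketch assuming the `/server/workerN/jobK` layout proposed above and a least-loaded policy, which is an illustrative choice and not something stated in this thread. The leader counts each worker's pending job znodes and places the new job under the worker with the fewest. `InMemoryTree`, `NodeExistsError`, `try_become_leader` and `assign` are hypothetical names. The class is not a real client; it only mimics the two ZooKeeper rules the scheme relies on: creating an existing path fails, and an ephemeral node disappears when its session ends. In the Java client, the election step would be a `create` with `CreateMode.EPHEMERAL` that catches `KeeperException.NodeExistsException`.

```python
class NodeExistsError(Exception):
    """Raised when a path already exists (mirrors ZooKeeper's NodeExists error)."""


class InMemoryTree:
    """Hypothetical in-memory stand-in for a ZooKeeper ensemble, not a real client.

    Models two rules: creating an existing path fails, and an ephemeral
    node disappears when the session that owns it ends.
    """

    def __init__(self):
        self.nodes = {}  # path -> owning session id (ephemeral) or None (persistent)

    def create(self, path, ephemeral_owner=None):
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = ephemeral_owner

    def get_children(self, path):
        # Direct children only: names right below `path`, with no further "/".
        prefix = path + "/"
        return sorted(p[len(prefix):] for p in self.nodes
                      if p.startswith(prefix) and "/" not in p[len(prefix):])

    def expire_session(self, session):
        # Drop every ephemeral node owned by the ended session.
        self.nodes = {p: o for p, o in self.nodes.items() if o != session}


def try_become_leader(tree, session):
    # Every process races to create the same ephemeral node; exactly one wins.
    try:
        tree.create("/leader", ephemeral_owner=session)
        return True
    except NodeExistsError:
        return False


def assign(tree, workers, job):
    # Least-loaded policy (an assumption): the worker with the fewest job
    # znodes gets the job; min() breaks ties in favour of the earliest worker.
    target = min(workers, key=lambda w: len(tree.get_children(f"/server/{w}")))
    tree.create(f"/server/{target}/{job}")
    return target


tree = InMemoryTree()
workers = ["worker1", "worker2", "worker3"]
tree.create("/server")
for w in workers:
    tree.create(f"/server/{w}")

leaders = [try_become_leader(tree, s) for s in ("s1", "s2", "s3")]
placed = [assign(tree, workers, f"job{i}") for i in range(1, 6)]
tree.expire_session("s1")                 # the leader's session ends
failover = try_become_leader(tree, "s2")  # the ephemeral /leader is gone, so s2 wins
print(leaders)   # prints [True, False, False]
print(placed)    # prints ['worker1', 'worker2', 'worker3', 'worker1', 'worker2']
print(failover)  # prints True
```

In a real deployment, the losing candidates would typically call `exists` on `/leader` with a watch and retry when it is deleted. Each worker would also delete its job znode once the job is finished, so the child count the leader reads reflects pending work only.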
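On question 3: ZooKeeper watches fire only once. However, reading a node's children and setting the next watch happen in the same call (in the Java client, `ZooKeeper.getChildren(String path, Watcher watcher)`). Suppose the worker re-reads the full child list every time it is notified. Then job Y shows up on that re-read even though its creation produced no notification of its own. It can be delayed by the notification round trip, but it is not lost. The sketch below replays Steps 1 to 5 against a hypothetical in-memory class, `InMemoryWatchTree`. The class is not a real client; it models only one-shot child watches and queued (asynchronous) event delivery. `Worker` and its methods are likewise illustrative names.

```python
from collections import deque


class InMemoryWatchTree:
    """Hypothetical stand-in for ZooKeeper child watches, not a real client.

    A child watch fires once and is then cleared. Reading the children can
    arm the next watch in the same call. Events are queued rather than
    delivered immediately, mimicking asynchronous notification.
    """

    def __init__(self):
        self.children = {}    # parent path -> set of child names
        self.watches = {}     # parent path -> list of armed one-shot watchers
        self.events = deque() # undelivered (watcher, path) notifications

    def create(self, path):
        parent, _, name = path.rpartition("/")
        self.children.setdefault(parent, set()).add(name)
        # Fire and clear every watch on the parent: each watch triggers once.
        for watcher in self.watches.pop(parent, []):
            self.events.append((watcher, parent))

    def get_children(self, path, watcher=None):
        # Return the current children and, optionally, arm a new watch.
        if watcher is not None:
            self.watches.setdefault(path, []).append(watcher)
        return sorted(self.children.get(path, set()))

    def deliver_events(self):
        while self.events:
            watcher, path = self.events.popleft()
            watcher(path)


class Worker:
    def __init__(self, tree, path):
        self.tree, self.path, self.done = tree, path, []

    def start(self):
        self._sync()

    def _on_children_changed(self, path):
        # A notification is only a hint that something changed.
        self._sync()

    def _sync(self):
        # Re-read the full child list while re-arming the watch, then handle
        # every job not seen yet. Jobs created while no watch was armed show up here.
        for job in self.tree.get_children(self.path, watcher=self._on_children_changed):
            if job not in self.done:
                self.done.append(job)


tree = InMemoryWatchTree()
worker = Worker(tree, "/server/worker1")
worker.start()                      # Step 1: read children, arm the watch
tree.create("/server/worker1/job1") # Step 2: job X fires the one-shot watch
tree.create("/server/worker1/job2") # Step 4: job Y arrives while no watch is armed
tree.deliver_events()               # Steps 3 and 5: notified once, re-read, re-arm
print(worker.done)                  # prints ['job1', 'job2']
```

The design choice is to diff against the full child list on every notification, rather than to expect one event per job. Several creations may collapse into a single notification, and the worker still sees all of them.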