From: Jordan Zimmerman
To: "user@zookeeper.apache.org"
Subject: Re: Use cases for ZooKeeper
Date: Fri, 13 Jan 2012 04:01:54 +0000

Sure - give me what you have and I'll port it to Curator.

On 1/12/12 6:18 PM, "Ted Dunning" wrote:

>I think I have a bit of it written already.
>
>It doesn't use Curator and I think you could simplify it substantially if
>you were to use it. Would that help?
>
>On Thu, Jan 12, 2012 at 12:52 PM, Jordan Zimmerman wrote:
>
>> Ted - are you interested in writing this on top of Curator? If not, I'll
>> give it a whack.
>>
>> -JZ
>>
>> On 1/5/12 12:50 AM, "Ted Dunning" wrote:
>>
>> >Jordan, I don't think that leader election does what Josh wants.
>> >
>> >I don't think that consistent hashing is particularly good for that
>> >either, because the loss of one node causes the sequential state for
>> >lots of entities to move even among nodes that did not fail.
>> >
>> >What I would recommend is a variant of micro-sharding. The key space is
>> >divided into many micro-shards. Then nodes that are alive claim the
>> >micro-shards using ephemerals and proceed as Josh described. On loss of
>> >a node, the shards that node was handling should be claimed by the
>> >remaining nodes. When a new node appears or new work appears, it is
>> >helpful to direct nodes to effect a hand-off of traffic.
>> >
>> >In my experience, the best way to implement shard balancing is with an
>> >external master instance, much in the style of hbase or katta. This
>> >external master can be exceedingly simple and only needs to wake up on
>> >various events like loss of a node or change in the set of live shards.
>> >It can also wake up at intervals if desired to backstop the normal
>> >notifications or to allow small changes for certain kinds of balancing.
>> >Typically, this only requires a few hundred lines of code.
>> >
>> >This external master can, of course, be run on multiple nodes, and which
>> >master is in current control can be adjudicated with yet another leader
>> >election.
>> >
>> >You can view this as a package of many leader elections, or as
>> >discretized consistent hashing. The distinctions are a bit subtle but
>> >are very important. These include:
>> >
>> >- there is a clean division of control between the master, which
>> >determines who serves what, and the nodes that do the serving
>> >
>> >- there is no herd effect because the master drives the assignments
>> >
>> >- node loss causes the minimum amount of change of assignments, since no
>> >assignments to surviving nodes are disturbed. This is a major win.
>> >
>> >- balancing is pretty good because there are many shards compared to the
>> >number of nodes.
>> >
>> >- the balancing strategy is highly pluggable.
>> >
>> >This pattern would make a nice addition to Curator, actually. It comes
>> >up repeatedly in different contexts.
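A rough sketch of the claim-with-ephemerals step Ted describes above, assuming
the Netflix-era Curator client (com.netflix.curator packages); the shard count,
base path, and class name are illustrative, and the balancing master that
directs hand-offs is left out. Each live node greedily claims unowned shards by
creating ephemeral znodes, so a dead node's claims disappear with its session:

import java.util.ArrayList;
import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;

import com.netflix.curator.framework.CuratorFramework;
import com.netflix.curator.framework.CuratorFrameworkFactory;
import com.netflix.curator.retry.ExponentialBackoffRetry;

public class ShardClaimer
{
    private static final int NUM_SHARDS = 256;            // many shards vs. few nodes
    private static final String SHARDS_PATH = "/shards";  // hypothetical base path

    private final CuratorFramework client;
    private final String nodeId;

    public ShardClaimer(String connectString, String nodeId) throws Exception
    {
        this.client = CuratorFrameworkFactory.newClient(connectString,
            new ExponentialBackoffRetry(1000, 3));
        this.nodeId = nodeId;
        client.start();
        try
        {
            client.create().forPath(SHARDS_PATH);   // make sure the parent exists
        }
        catch ( KeeperException.NodeExistsException ignore )
        {
            // already created by another node
        }
    }

    // Try to claim every unowned shard by creating an ephemeral znode per shard.
    // If this node's session dies, its claims vanish and the shards become
    // available to the surviving nodes (or to a balancing master).
    public List<Integer> claimShards() throws Exception
    {
        List<Integer> owned = new ArrayList<Integer>();
        for ( int shard = 0; shard < NUM_SHARDS; shard++ )
        {
            try
            {
                client.create()
                      .withMode(CreateMode.EPHEMERAL)
                      .forPath(SHARDS_PATH + "/" + shard, nodeId.getBytes());
                owned.add(shard);
            }
            catch ( KeeperException.NodeExistsException e )
            {
                // another node already owns this shard
            }
        }
        return owned;
    }
}

In the master-driven variant Ted recommends, the master would watch the
children of /shards and tell specific nodes which shards to claim or hand off,
rather than letting every node grab whatever is free.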
>> >
>> >On Thu, Jan 5, 2012 at 12:11 AM, Jordan Zimmerman wrote:
>> >
>> >> OK - so this is two options for doing the same thing. You use a Leader
>> >> Election algorithm to make sure that only one node in the cluster is
>> >> operating on a work unit. Curator has an implementation (it's really
>> >> just a distributed lock with a slightly different API).
>> >>
>> >> -JZ
>> >>
>> >> On 1/5/12 12:04 AM, "Josh Stone" wrote:
>> >>
>> >> >Thanks for the response. Comments below:
>> >> >
>> >> >On Wed, Jan 4, 2012 at 10:46 PM, Jordan Zimmerman wrote:
>> >> >
>> >> >> Hi Josh,
>> >> >>
>> >> >> >Second use case: Distributed locking
>> >> >> This is one of the most common uses of ZooKeeper. There are many
>> >> >> implementations - one included with the ZK distro. Also, there is
>> >> >> Curator: https://github.com/Netflix/curator
>> >> >>
>> >> >> >First use case: Distributing work to a cluster of nodes
>> >> >> This sounds feasible. If you give more details, I and others on this
>> >> >> list can help more.
>> >> >>
>> >> >
>> >> >Sure. I basically want to handle race conditions where two commands
>> >> >that operate on the same data are received by my cluster of nodes,
>> >> >concurrently. One approach is to lock on the data that is affected by
>> >> >the command (distributed lock). Another approach is to make sure that
>> >> >all of the commands that operate on any set of data are routed to the
>> >> >same node, where they can be processed serially using local
>> >> >synchronization. Consistent hashing is an algorithm that can be used
>> >> >to select a node to handle a message (where the inputs are the key to
>> >> >hash and the number of nodes in the cluster).
>> >> >
>> >> >There are various implementations for this floating around. I'm just
>> >> >interested to know how this is working for anyone else.
>> >> >
>> >> >Josh
>> >> >
>> >> >> -JZ
>> >> >>
>> >> >> ________________________________________
>> >> >> From: Josh Stone [pacesysjosh@gmail.com]
>> >> >> Sent: Wednesday, January 04, 2012 8:09 PM
>> >> >> To: user@zookeeper.apache.org
>> >> >> Subject: Use cases for ZooKeeper
>> >> >>
>> >> >> I have a few use cases that I'm wondering if ZooKeeper would be
>> >> >> suitable for and would appreciate some feedback.
>> >> >>
>> >> >> First use case: Distributing work to a cluster of nodes using
>> >> >> consistent hashing to ensure that messages of some type are
>> >> >> consistently handled by the same node. I haven't been able to find
>> >> >> any info about ZooKeeper + consistent hashing. Is anyone using it
>> >> >> for this? A concern here would be how to redistribute work as nodes
>> >> >> come and go from the cluster.
>> >> >>
>> >> >> Second use case: Distributed locking. I noticed that there's a
>> >> >> recipe for this on the ZooKeeper wiki. Is anyone doing this? Any
>> >> >> issues? One concern would be how to handle orphaned locks if a node
>> >> >> that obtained a lock goes down.
>> >> >>
>> >> >> Third use case: Fault tolerance. If we utilized ZooKeeper to
>> >> >> distribute messages to workers, can it be made to handle a node
>> >> >> going down by re-distributing the work to another node (perhaps
>> >> >> messages that are not ack'ed within a timeout are resent)?
>> >> >>
>> >> >> Cheers,
>> >> >> Josh
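For the distributed locking use case, a minimal sketch against Curator's lock
recipe (InterProcessMutex), again assuming the Netflix-era package names; the
connect string and lock path are placeholders. Because the lock is backed by an
ephemeral znode, a holder that crashes or loses its ZooKeeper session releases
the lock automatically, which covers the orphaned-lock concern:

import java.util.concurrent.TimeUnit;

import com.netflix.curator.framework.CuratorFramework;
import com.netflix.curator.framework.CuratorFrameworkFactory;
import com.netflix.curator.framework.recipes.locks.InterProcessMutex;
import com.netflix.curator.retry.ExponentialBackoffRetry;

public class LockExample
{
    public static void main(String[] args) throws Exception
    {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
            "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // One lock path per piece of data being protected (placeholder path)
        InterProcessMutex lock = new InterProcessMutex(client, "/locks/my-data-key");

        // Bounded wait so a caller is not blocked forever if the lock is busy
        if ( lock.acquire(10, TimeUnit.SECONDS) )
        {
            try
            {
                // operate on the shared data while holding the lock
            }
            finally
            {
                lock.release();
            }
        }
        client.close();
    }
}

An untimed acquire() is also available if blocking indefinitely is acceptable.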