zookeeper-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Use cases for ZooKeeper
Date Fri, 13 Jan 2012 02:18:45 GMT
I think I have a bit of it written already.

It doesn't use Curator and I think you could simplify it substantially if
you were to use it.  Would that help?

On Thu, Jan 12, 2012 at 12:52 PM, Jordan Zimmerman
<jzimmerman@netflix.com> wrote:

> Ted - are you interested in writing this on top of Curator? If not, I'll
> give it a whack.
>
> -JZ
>
> On 1/5/12 12:50 AM, "Ted Dunning" <ted.dunning@gmail.com> wrote:
>
> >Jordan, I don't think that leader election does what Josh wants.
> >
> >I don't think that consistent hashing is particularly good for that either
> >because the loss of one node causes the sequential state for lots of
> >entities to move even among nodes that did not fail.
> >
> >What I would recommend is a variant of micro-sharding.  The key space is
> >divided into many micro-shards.  Then nodes that are alive claim the
> >micro-shards using ephemerals and proceed as Josh described.  On loss of a
> >node, the shards that node was handling should be claimed by the remaining
> >nodes.  When a new node appears or new work appears, it is helpful to
> >direct nodes to effect a hand-off of traffic.
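> >
> >For what it's worth, the claim step can be as small as one ephemeral create
> >per micro-shard.  A rough, untested sketch with the plain ZooKeeper client
> >(the /shards layout and the node-id are just illustrative):
> >
> >import org.apache.zookeeper.CreateMode;
> >import org.apache.zookeeper.KeeperException;
> >import org.apache.zookeeper.ZooDefs;
> >import org.apache.zookeeper.ZooKeeper;
> >
> >public class ShardClaimer {
> >    private final ZooKeeper zk;
> >    private final String nodeId;   // identity of this server, e.g. host:port
> >
> >    public ShardClaimer(ZooKeeper zk, String nodeId) {
> >        this.zk = zk;
> >        this.nodeId = nodeId;
> >    }
> >
> >    // Try to claim one micro-shard.  The ephemeral owner znode disappears
> >    // automatically if this node's session dies, which frees the shard for
> >    // the survivors to pick up.
> >    public boolean tryClaim(int shardId)
> >            throws KeeperException, InterruptedException {
> >        String path = "/shards/" + shardId + "/owner";  // parent assumed to exist
> >        try {
> >            zk.create(path, nodeId.getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE,
> >                      CreateMode.EPHEMERAL);
> >            return true;                                // we own it now
> >        } catch (KeeperException.NodeExistsException e) {
> >            return false;                               // someone else got it first
> >        }
> >    }
> >}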
> >
> >In my experience, the best way to implement shard balancing is with an
> >external master instance, much in the style of HBase or Katta.  This
> >external master can be exceedingly simple and only needs to wake up on
> >various events like loss of a node or change in the set of live shards.
> >It can also wake up at intervals if desired to backstop the normal
> >notifications or to allow small changes for certain kinds of balancing.
> >Typically, this only requires a few hundred lines of code.
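> >
> >The "wake up on events" part can be little more than a child watch on the
> >live-nodes path.  Another untested sketch (the /nodes path is made up, and
> >rebalance() stands in for whatever strategy gets plugged in):
> >
> >import java.util.List;
> >import org.apache.zookeeper.KeeperException;
> >import org.apache.zookeeper.Watcher;
> >import org.apache.zookeeper.ZooKeeper;
> >
> >public class ShardMaster {
> >    private final ZooKeeper zk;
> >
> >    public ShardMaster(ZooKeeper zk) {
> >        this.zk = zk;
> >    }
> >
> >    // Read the live node list and leave a watch so we get called back on any
> >    // membership change; each callback re-registers the watch and rebalances.
> >    public void watchLiveNodes() throws KeeperException, InterruptedException {
> >        List<String> live = zk.getChildren("/nodes", event -> {
> >            if (event.getType() == Watcher.Event.EventType.NodeChildrenChanged) {
> >                try {
> >                    watchLiveNodes();
> >                } catch (Exception e) {
> >                    // log; the periodic backstop pass will retry
> >                }
> >            }
> >        });
> >        rebalance(live);
> >    }
> >
> >    private void rebalance(List<String> liveNodes) {
> >        // pluggable strategy: compare desired vs. actual shard ownership and
> >        // create/delete assignment znodes accordingly
> >    }
> >}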
> >
> >This external master can, of course, be run on multiple nodes; which
> >master is currently in control can be adjudicated with yet another
> >leader election.
> >
> >You can view this as a package of many leader elections, or as
> >discretized consistent hashing.  The distinctions are a bit subtle but
> >are very important.  These include:
> >
> >- there is a clean division of control between the master, which
> >determines who serves what, and the nodes that do the serving
> >
> >- there is no herd effect because the master drives the assignments
> >
> >- node loss causes the minimum amount of change of assignments since no
> >assignments to surviving nodes are disturbed.  This is a major win.
> >
> >- balancing is pretty good because there are many shards compared to the
> >number of nodes.
> >
> >- the balancing strategy is highly pluggable.
> >
> >This pattern would make a nice addition to Curator, actually.  It comes up
> >repeatedly in different contexts.
> >
> >On Thu, Jan 5, 2012 at 12:11 AM, Jordan Zimmerman
> ><jzimmerman@netflix.com> wrote:
> >
> >> OK - so this is two options for doing the same thing. You use a Leader
> >> Election algorithm to make sure that only one node in the cluster is
> >> operating on a work unit. Curator has an implementation (it's really
> >> just a distributed lock with a slightly different API).
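> >>
> >> In Curator that's only a few lines. A minimal sketch (package names as in
> >> the Apache releases of Curator; the connection string and election path
> >> are made up):
> >>
> >> import org.apache.curator.framework.CuratorFramework;
> >> import org.apache.curator.framework.CuratorFrameworkFactory;
> >> import org.apache.curator.framework.recipes.leader.LeaderSelector;
> >> import org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter;
> >> import org.apache.curator.retry.ExponentialBackoffRetry;
> >>
> >> public class WorkUnitOwner {
> >>     public static void main(String[] args) throws Exception {
> >>         CuratorFramework client = CuratorFrameworkFactory.newClient(
> >>                 "localhost:2181", new ExponentialBackoffRetry(1000, 3));
> >>         client.start();
> >>
> >>         // One election path per work unit; whoever holds leadership is the
> >>         // only node operating on that work unit.
> >>         LeaderSelector selector = new LeaderSelector(client,
> >>                 "/election/work-unit-1",
> >>                 new LeaderSelectorListenerAdapter() {
> >>                     @Override
> >>                     public void takeLeadership(CuratorFramework c) throws Exception {
> >>                         doWork();   // leadership is held until this returns
> >>                     }
> >>                 });
> >>         selector.autoRequeue();     // rejoin the election after each turn
> >>         selector.start();
> >>
> >>         Thread.sleep(Long.MAX_VALUE);   // keep the process alive for the demo
> >>     }
> >>
> >>     private static void doWork() throws InterruptedException {
> >>         Thread.sleep(10000);            // placeholder for real work
> >>     }
> >> }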
> >>
> >> -JZ
> >>
> >> On 1/5/12 12:04 AM, "Josh Stone" <pacesysjosh@gmail.com> wrote:
> >>
> >> >Thanks for the response. Comments below:
> >> >
> >> >On Wed, Jan 4, 2012 at 10:46 PM, Jordan Zimmerman
> >> ><jzimmerman@netflix.com> wrote:
> >> >
> >> >> Hi Josh,
> >> >>
> >> >> >Second use case: Distributed locking
> >> >> This is one of the most common uses of ZooKeeper. There are many
> >> >> implementations - one included with the ZK distro. Also, there is
> >> >> Curator: https://github.com/Netflix/curator
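> >> >>
> >> >> A minimal use of the Curator lock recipe looks roughly like this
> >> >> (sketch only; package names as in the Apache releases, connection
> >> >> string and lock path made up):
> >> >>
> >> >> import java.util.concurrent.TimeUnit;
> >> >> import org.apache.curator.framework.CuratorFramework;
> >> >> import org.apache.curator.framework.CuratorFrameworkFactory;
> >> >> import org.apache.curator.framework.recipes.locks.InterProcessMutex;
> >> >> import org.apache.curator.retry.ExponentialBackoffRetry;
> >> >>
> >> >> public class LockExample {
> >> >>     public static void main(String[] args) throws Exception {
> >> >>         CuratorFramework client = CuratorFrameworkFactory.newClient(
> >> >>                 "localhost:2181", new ExponentialBackoffRetry(1000, 3));
> >> >>         client.start();
> >> >>
> >> >>         InterProcessMutex lock = new InterProcessMutex(client, "/locks/my-resource");
> >> >>         if (lock.acquire(10, TimeUnit.SECONDS)) {   // bounded wait
> >> >>             try {
> >> >>                 // critical section: at most one process cluster-wide is here
> >> >>             } finally {
> >> >>                 lock.release();
> >> >>             }
> >> >>         }
> >> >>         // The recipe is built on ephemeral znodes, so a holder that
> >> >>         // crashes releases the lock automatically when its session
> >> >>         // expires (no orphaned locks).
> >> >>         client.close();
> >> >>     }
> >> >> }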
> >> >>
> >> >> >First use case: Distributing work to a cluster of nodes
> >> >> This sounds feasible. If you give more details I and others on this
> >> >> list can help more.
> >> >>
> >> >
> >> >Sure. I basically want to handle race conditions where two commands that
> >> >operate on the same data are received by my cluster of nodes
> >> >concurrently. One approach is to lock on the data that is affected by the
> >> >command (distributed lock). Another approach is to make sure that all of
> >> >the commands that operate on any set of data are routed to the same node,
> >> >where they can be processed serially using local synchronization.
> >> >Consistent hashing is an algorithm that can be used to select a node to
> >> >handle a message (where the inputs are the key to hash and the number of
> >> >nodes in the cluster).
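> >> >
> >> >In case it helps the discussion, the kind of ring I have in mind is just
> >> >a sorted map of hash positions to nodes, with virtual nodes to smooth the
> >> >distribution. A generic, untested sketch (not tied to ZooKeeper; node and
> >> >key names would come from wherever cluster membership is tracked):
> >> >
> >> >import java.nio.charset.StandardCharsets;
> >> >import java.security.MessageDigest;
> >> >import java.security.NoSuchAlgorithmException;
> >> >import java.util.Map;
> >> >import java.util.TreeMap;
> >> >
> >> >public class HashRing {
> >> >    private static final int VNODES = 100;   // replicas per node on the ring
> >> >    private final TreeMap<Long, String> ring = new TreeMap<>();
> >> >
> >> >    public void addNode(String node) {
> >> >        for (int i = 0; i < VNODES; i++) {
> >> >            ring.put(hash(node + "#" + i), node);
> >> >        }
> >> >    }
> >> >
> >> >    public void removeNode(String node) {
> >> >        for (int i = 0; i < VNODES; i++) {
> >> >            ring.remove(hash(node + "#" + i));
> >> >        }
> >> >    }
> >> >
> >> >    // First node clockwise from the key's position; wraps around the ring.
> >> >    // Assumes at least one node has been added.
> >> >    public String nodeFor(String key) {
> >> >        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
> >> >        return e != null ? e.getValue() : ring.firstEntry().getValue();
> >> >    }
> >> >
> >> >    private static long hash(String s) {
> >> >        try {
> >> >            byte[] d = MessageDigest.getInstance("MD5")
> >> >                    .digest(s.getBytes(StandardCharsets.UTF_8));
> >> >            // pack the first 8 digest bytes into a long ring position
> >> >            long h = 0;
> >> >            for (int i = 0; i < 8; i++) {
> >> >                h = (h << 8) | (d[i] & 0xff);
> >> >            }
> >> >            return h;
> >> >        } catch (NoSuchAlgorithmException ex) {
> >> >            throw new AssertionError(ex);
> >> >        }
> >> >    }
> >> >}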
> >> >
> >> >There are various implementations for this floating around. I'm just
> >> >interested to know how this is working for anyone else.
> >> >
> >> >Josh
> >> >
> >> >
> >> >>
> >> >> -JZ
> >> >>
> >> >> ________________________________________
> >> >> From: Josh Stone [pacesysjosh@gmail.com]
> >> >> Sent: Wednesday, January 04, 2012 8:09 PM
> >> >> To: user@zookeeper.apache.org
> >> >> Subject: Use cases for ZooKeeper
> >> >>
> >> >> I have a few use cases that I'm wondering if ZooKeeper would be
> >> >> suitable for and would appreciate some feedback.
> >> >>
> >> >> First use case: Distributing work to a cluster of nodes using
> >> >> consistent hashing to ensure that messages of some type are
> >> >> consistently handled by the same node. I haven't been able to find any
> >> >> info about ZooKeeper + consistent hashing. Is anyone using it for this?
> >> >> A concern here would be how to redistribute work as nodes come and go
> >> >> from the cluster.
> >> >>
> >> >> Second use case: Distributed locking. I noticed that there's a recipe
> >> >> for this on the ZooKeeper wiki. Is anyone doing this? Any issues? One
> >> >> concern would be how to handle orphaned locks if a node that obtained a
> >> >> lock goes down.
> >> >>
> >> >> Third use case: Fault tolerance. If we utilize ZooKeeper to distribute
> >> >> messages to workers, can it be made to handle a node going down by
> >> >> re-distributing the work to another node (perhaps messages that are not
> >> >> ack'ed within a timeout are resent)?
> >> >>
> >> >> Cheers,
> >> >> Josh
> >> >>
> >>
> >>
>
>
