zookeeper-user mailing list archives

From Josh Stone <pacesysj...@gmail.com>
Subject Re: Use cases for ZooKeeper
Date Thu, 05 Jan 2012 17:38:51 GMT
We're thinking along the same lines. Specifically, I was thinking of using
a hash ring to minimize disruption to the key space when nodes come and
go. Either that, or micro-sharding would be nice, and I'm curious how this
has gone for anyone else using ZooKeeper. I should mention that this is
basically an alternative to distributed locks. Both achieve the same thing
- protecting against race conditions.
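
For concreteness, this is roughly the kind of hash ring I mean - a minimal
sketch in Java, where the virtual-node count and the MD5-based hash are
arbitrary illustrative choices:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

/** Consistent hash ring with virtual nodes; a key maps to the next node clockwise. */
public class HashRing {
    private final SortedMap<Long, String> ring = new TreeMap<Long, String>();
    private static final int VIRTUAL_NODES = 100;  // replicas per node, purely illustrative

    public synchronized void addNode(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    public synchronized void removeNode(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    /** Only keys owned by a removed node move; everything else keeps its owner. */
    public synchronized String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in the ring");
        }
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            // use the first four bytes as an unsigned 32-bit position on the ring
            return ((long) (d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16)
                 | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e);
        }
    }
}

With on the order of a hundred virtual nodes per server, load also stays
reasonably even as membership changes.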

Josh

On Thu, Jan 5, 2012 at 12:50 AM, Ted Dunning <ted.dunning@gmail.com> wrote:

> Jordan, I don't think that leader election does what Josh wants.
>
> I don't think that consistent hashing is particularly good for that either
> because the loss of one node causes the sequential state for lots of
> entities to move even among nodes that did not fail.
>
> What I would recommend is a variant of micro-sharding.  The key space is
> divided into many micro-shards.  Then nodes that are alive claim the
> micro-shards using ephemerals and proceed as Josh described.  On loss of a
> node, the shards that node was handling should be claimed by the remaining
> nodes.  When a new node appears or new work appears, it is helpful to
> direct nodes to effect a hand-off of traffic.
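>
> For concreteness, the claim step might look something like this - a rough
> sketch against the raw ZooKeeper API, where the /shards znode layout is
> just an illustrative convention:
>
> import java.util.List;
> import org.apache.zookeeper.CreateMode;
> import org.apache.zookeeper.KeeperException;
> import org.apache.zookeeper.ZooDefs;
> import org.apache.zookeeper.ZooKeeper;
>
> public class ShardClaimer {
>     private final ZooKeeper zk;
>     private final String myId;
>
>     public ShardClaimer(ZooKeeper zk, String myId) {
>         this.zk = zk;
>         this.myId = myId;
>     }
>
>     /** Try to claim every unowned shard; the ephemeral owner znode vanishes if we die. */
>     public void claimUnownedShards() throws KeeperException, InterruptedException {
>         List<String> shards = zk.getChildren("/shards", false);
>         for (String shard : shards) {
>             try {
>                 zk.create("/shards/" + shard + "/owner", myId.getBytes(),
>                           ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
>                 // we own this shard now - start serving its keys
>             } catch (KeeperException.NodeExistsException e) {
>                 // someone else already owns it - skip
>             }
>         }
>     }
> }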
>
> In my experience, the best way to implement shard balancing is with an
> external master instance, much in the style of HBase or Katta.  This
> external master can be exceedingly simple and only needs to wake up on
> various events like loss of a node or change in the set of live shards.  It
> can also wake up at intervals if desired to backstop the normal
> notifications or to allow small changes for certain kinds of balancing.
>  Typically, this only requires a few hundred lines of code.
>
> This external master can, of course, be run on multiple nodes and which
> master is in current control can be adjudicated with yet another leader
> election.
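>
> As a sketch, that master election piece with Curator could look like the
> following (the org.apache.curator package names are from the Apache
> Curator releases, and rebalance() is just a placeholder for whatever
> assignment logic the master runs):
>
> import org.apache.curator.framework.CuratorFramework;
> import org.apache.curator.framework.recipes.leader.LeaderSelector;
> import org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter;
>
> public class MasterElection {
>     public static LeaderSelector start(CuratorFramework client) {
>         LeaderSelector selector = new LeaderSelector(client, "/master-election",
>             new LeaderSelectorListenerAdapter() {
>                 @Override
>                 public void takeLeadership(CuratorFramework c) throws Exception {
>                     // We are the active master until this method returns.
>                     // Watch the live-node and shard znodes here and reassign on changes.
>                     while (!Thread.currentThread().isInterrupted()) {
>                         // rebalance();  // placeholder for the assignment pass
>                         Thread.sleep(60000);  // periodic backstop, as described above
>                     }
>                 }
>             });
>         selector.autoRequeue();  // rejoin the election if we ever lose leadership
>         selector.start();
>         return selector;
>     }
> }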
>
> You can view this as a package of many leader elections, or as discretized
> consistent hashing.  The distinctions are a bit subtle but very important.
> They include:
>
> - there is a clean division of control between the master which determines
> who serves what and the nodes that do the serving
>
> - there is no herd effect because the master drives the assignments
>
> - node loss causes the minimum amount of change of assignments since no
> assignments to surviving nodes are disturbed.  This is a major win.
>
> - balancing is pretty good because there are many shards compared to the
> number of nodes.
>
> - the balancing strategy is highly pluggable.
>
> This pattern would make a nice addition to Curator, actually.  It comes up
> repeatedly in different contexts.
>
> On Thu, Jan 5, 2012 at 12:11 AM, Jordan Zimmerman
> <jzimmerman@netflix.com> wrote:
>
> > OK - so these are two options for doing the same thing. You can use a
> > Leader Election algorithm to make sure that only one node in the cluster
> > is operating on a work unit. Curator has an implementation (it's really
> > just a distributed lock with a slightly different API).
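> >
> > For what it's worth, the plain-lock flavor of this with Curator looks
> > roughly like the sketch below (the per-work-unit lock path is just an
> > illustrative convention, and the timeout is arbitrary):
> >
> > import java.util.concurrent.TimeUnit;
> > import org.apache.curator.framework.CuratorFramework;
> > import org.apache.curator.framework.recipes.locks.InterProcessMutex;
> >
> > public class WorkUnitGuard {
> >     public static void process(CuratorFramework client, String workUnitId,
> >                                Runnable work) throws Exception {
> >         InterProcessMutex lock = new InterProcessMutex(client, "/locks/" + workUnitId);
> >         // bounded wait so a stuck peer can't block us forever
> >         if (lock.acquire(30, TimeUnit.SECONDS)) {
> >             try {
> >                 work.run();  // only one node in the cluster runs this at a time
> >             } finally {
> >                 lock.release();
> >             }
> >         }
> >     }
> > }
> >
> > The lock is backed by an ephemeral znode, so if the holder dies its
> > session expiry releases the lock - which also covers the orphaned-lock
> > concern from the original mail.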
> >
> > -JZ
> >
> > On 1/5/12 12:04 AM, "Josh Stone" <pacesysjosh@gmail.com> wrote:
> >
> > >Thanks for the response. Comments below:
> > >
> > >On Wed, Jan 4, 2012 at 10:46 PM, Jordan Zimmerman
> > ><jzimmerman@netflix.com> wrote:
> > >
> > >> Hi Josh,
> > >>
> > >> >Second use case: Distributed locking
> > >> This is one of the most common uses of ZooKeeper. There are many
> > >> implementations - one included with the ZK distro. Also, there is
> > >> Curator: https://github.com/Netflix/curator
> > >>
> > >> >First use case: Distributing work to a cluster of nodes
> > >> This sounds feasible. If you give more details, I and others on this
> > >> list can help more.
> > >>
> > >
> > >Sure. I basically want to handle race conditions where two commands that
> > >operate on the same data are received by my cluster of nodes concurrently.
> > >One approach is to lock on the data that is affected by the command
> > >(distributed lock). Another approach is to make sure that all of the
> > >commands that operate on any set of data are routed to the same node,
> > >where they can be processed serially using local synchronization.
> > >Consistent hashing is an algorithm that can be used to select a node to
> > >handle a message (where the inputs are the key to hash and the number of
> > >nodes in the cluster).
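> > >
> > >For the second approach, once commands for a key all land on one node,
> > >the local serialization part can be as simple as hashing the key onto a
> > >small pool of single-threaded executors - a rough sketch (the stripe
> > >count is arbitrary):
> > >
> > >import java.util.concurrent.ExecutorService;
> > >import java.util.concurrent.Executors;
> > >
> > >public class SerialByKey {
> > >    private final ExecutorService[] stripes;
> > >
> > >    public SerialByKey(int stripeCount) {
> > >        stripes = new ExecutorService[stripeCount];
> > >        for (int i = 0; i < stripeCount; i++) {
> > >            stripes[i] = Executors.newSingleThreadExecutor();
> > >        }
> > >    }
> > >
> > >    /** Commands for the same key always run on the same single-threaded stripe. */
> > >    public void submit(String key, Runnable command) {
> > >        int i = (key.hashCode() & 0x7fffffff) % stripes.length;
> > >        stripes[i].execute(command);
> > >    }
> > >}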
> > >
> > >There are various implementations for this floating around. I'm just
> > >interested to know how this is working for anyone else.
> > >
> > >Josh
> > >
> > >
> > >>
> > >> -JZ
> > >>
> > >> ________________________________________
> > >> From: Josh Stone [pacesysjosh@gmail.com]
> > >> Sent: Wednesday, January 04, 2012 8:09 PM
> > >> To: user@zookeeper.apache.org
> > >> Subject: Use cases for ZooKeeper
> > >>
> > >> I have a few use cases that I'm wondering if ZooKeeper would be
> > >> suitable for, and would appreciate some feedback.
> > >>
> > >> First use case: Distributing work to a cluster of nodes using
> > >> consistent hashing to ensure that messages of some type are
> > >> consistently handled by the same node. I haven't been able to find any
> > >> info about ZooKeeper + consistent hashing. Is anyone using it for this?
> > >> A concern here would be how to redistribute work as nodes come and go
> > >> from the cluster.
> > >>
> > >> Second use case: Distributed locking. I noticed that there's a recipe
> > >> for this on the ZooKeeper wiki. Is anyone doing this? Any issues? One
> > >> concern would be how to handle orphaned locks if a node that obtained
> > >> a lock goes down.
> > >>
> > >> Third use case: Fault tolerance. If we utilized ZooKeeper to
> > >> distribute messages to workers, can it be made to handle a node going
> > >> down by re-distributing the work to another node (perhaps messages
> > >> that are not ack'ed within a timeout are resent)?
> > >>
> > >> Cheers,
> > >> Josh
> > >>
> >
> >
>
