hadoop-zookeeper-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Parent nodes & multi-step transactions
Date Tue, 24 Aug 2010 04:17:21 GMT
My own opinion is that many of these structural problems are solved
by putting the structure into a single znode.  Atomic creation and update
come for free at that point, and we can even make the node ephemeral, which
we can't do if it has children.

It is tempting to use children and grand-children in ZK when this is needed,
but it is surprisingly useful to avoid this.

Take Katta as an example.  This is a sharded query system.  The master
knows about shards that need to be handled by nodes.  Nodes come on-line and
advertise their existence.  The master assigns shards to nodes.  The nodes
download the shards and advertise that they are handling the shards.  The
master has to handle node failures and recoveries.

The natural representation is to have the nodes signal that they are
handling a particular shard by creating an ephemeral file under a per-shard
directory.  This is nice because node failures cause automagical update of
the data.  The dual is also natural ... we can create shard files under node
directories.  That dual is a serious mistake, however, and it is much better
to put all the dual information in a single node file that the node itself
creates.  This allows ephemerality to maintain a correct view for us.
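
As a rough sketch of the single-node-file idea, here is a toy in-memory
model in Python.  ToyZk, shard_view, the /nodes path and the JSON layout are
all made up for illustration and are not the ZooKeeper or Katta API; the only
behavior borrowed from ZK is that ephemeral znodes vanish when the owning
session ends.

```python
import json


class ToyZk:
    """Toy in-memory stand-in for a ZooKeeper namespace (hypothetical, not a real client)."""

    def __init__(self):
        # path -> (data bytes, owning session id, or None for a persistent znode)
        self.znodes = {}

    def create(self, path, data, session=None):
        if path in self.znodes:
            raise KeyError(f"node exists: {path}")
        self.znodes[path] = (data, session)

    def get(self, path):
        return self.znodes[path][0]

    def children(self, parent):
        prefix = parent.rstrip("/") + "/"
        names = [p[len(prefix):] for p in self.znodes if p.startswith(prefix)]
        return sorted(n for n in names if "/" not in n)

    def expire(self, session):
        # Losing a session removes every ephemeral znode that session owns.
        self.znodes = {p: v for p, v in self.znodes.items() if v[1] != session}


def shard_view(zk):
    """Derive the shard -> nodes view by reading each node's single znode."""
    view = {}
    for node in zk.children("/nodes"):
        for shard in json.loads(zk.get(f"/nodes/{node}"))["shards"]:
            view.setdefault(shard, []).append(node)
    return view


zk = ToyZk()
# Each node writes ONE ephemeral znode holding everything it serves.
zk.create("/nodes/node-a", json.dumps({"shards": ["s1", "s2"]}).encode(), session="sess-a")
zk.create("/nodes/node-b", json.dumps({"shards": ["s2"]}).encode(), session="sess-b")

view_before = shard_view(zk)
zk.expire("sess-a")  # node-a dies; its whole state disappears in one step
view_after = shard_view(zk)
```

Because all of a node's state lives in one ephemeral znode, there is never a
half-removed set of shard entries for the master to clean up after a failure.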

There are other places where this idea works well.  One such thing is a
queue of tasks.  The queue itself can be represented as several files that
contain lots of tasks instead of keeping each task in a separate file.
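
A minimal sketch of the queue case, again as a toy Python model rather than
real client code.  VersionedZnode and take_task are hypothetical names; the
version check models the conditional update that ZK offers when you pass an
expected version to a set-data call, and the JSON list of tasks is just an
assumed encoding.

```python
import json


class VersionedZnode:
    """Toy znode with a data version, modeling a conditional (compare-and-set) update."""

    def __init__(self, data):
        self.data = data
        self.version = 0

    def get(self):
        return self.data, self.version

    def set(self, data, expected_version):
        # The write only succeeds if nobody else changed the znode in between.
        if expected_version != self.version:
            return False
        self.data = data
        self.version += 1
        return True


def take_task(bucket):
    """Remove and return the first task in a bucket, retrying on a version conflict."""
    while True:
        raw, version = bucket.get()
        tasks = json.loads(raw)
        if not tasks:
            return None
        head, rest = tasks[0], tasks[1:]
        if bucket.set(json.dumps(rest), version):
            return head


# The queue is a handful of znodes, each holding many tasks.
buckets = [
    VersionedZnode(json.dumps(["t1", "t2", "t3"])),
    VersionedZnode(json.dumps(["t4"])),
]

first = take_task(buckets[0])
second = take_task(buckets[0])
# A writer still holding the original version is rejected instead of clobbering data.
stale_write_ok = buckets[0].set(json.dumps([]), expected_version=0)
remaining = json.loads(buckets[0].get()[0])
```

Spreading tasks over several such znodes keeps each one small and reduces
contention, while every individual take or add stays a single atomic update.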

This doesn't eliminate all desire for transactions, but it gets rid of LOTS
of them.


On Tue, Aug 24, 2010 at 12:31 AM, Dave Wright <wrightd@gmail.com> wrote:

> For my $0.02, I really think it would be nice if ZK supported
> "lightweight transactions". By that, I simply mean that a batch of
> create/update/delete requests could be submitted in a single request,
> and be processed atomically (if any of the requests would fail, none
> are applied).
> I know transactions have been discussed before and discarded as adding
> too much complexity, but I think a simple version of transactions
> would be extremely helpful. A significant portion of our code is
> cleanup/workarounds for the inability to make several updates
> atomically. Should the time allow for me to work on any single
> feature, that's probably the one I would pick, although I'm concerned
> that there would be resistance to accepting upstream.
>
> -Dave Wright
>
> On Mon, Aug 23, 2010 at 6:51 PM, Gustavo Niemeyer <gustavo@niemeyer.net>
> wrote:
> > Hi Mahadev,
> >
> >>  Usually the paradigm I like to suggest is to have something like
> >>
> >> /A/init
> >>
> >> Every client watches for the existence of this node and this node is
> >> only created after /A has been initialized with the creation of /A/C or
> >> other stuff.
> >>
> >> Would that work for you?
> >
> > Yeah, this is what I referred to as "liveness nodes" in my prior
> > ramblings, but I'm a bit sad about the amount of boilerplate work that
> > will have to be done to put something like this to use.  It feels like as
> > the size of the problem increases, it might become a bit hard to keep
> > the whole picture in mind.
> >
> > Here is a slightly more realistic example (still significantly
> > reduced), to give you an idea of the problem size:
> >
> > /services/wordpress/settings
> > /services/wordpress/units/wordpress-0/agent-connected
> > /services/wordpress/units/wordpress-1
> > /machines/machine-0/agent-connected
> > /machines/machine-0/units/wordpress-1
> > /machines/machine-1/units/wordpress-0
> >
> > There are quite a few dynamic nodes here which are created and
> > initialized on demand.  If we use these liveness nodes, we'll have to
> > not only set watches in several places, but also have some kind of
> > recovering daemon to heal a half-created state, and also filter
> > user-oriented feedback to avoid showing nodes which may be dead.  All
> > of that would be avoided if there was a way to have multi-step atomic
> > actions.  I'm almost pondering about a journal-like system on top of
> > the basic API, to avoid having to deal with this manually.
> >
> > --
> > Gustavo Niemeyer
> > http://niemeyer.net
> > http://niemeyer.net/blog
> > http://niemeyer.net/twitter
> >
>
