hadoop-zookeeper-user mailing list archives

From Mahadev Konar <maha...@yahoo-inc.com>
Subject Re: getting created child on NodeChildrenChanged event
Date Sat, 04 Sep 2010 06:18:39 GMT
Hi Todd, 
  We have always tried to lean toward keeping things lightweight and
the API simple. The only way you would be able to do this is with sequential
creates.

1. Create nodes like /queueelement-$i, where i is a monotonically increasing
number. You can use ZooKeeper's sequential flag to do this.

2. When deleting a node, remove the node and also create a "deleted" marker
node at

/deletedqueueelements/queueelement-$i

2.1 On notification, you would go to /deletedqueueelements/ and find out
which ones were deleted.

The above only works if you are OK with queue elements having unique,
monotonically numbered names.

3. This method lets clients see the deltas through /deletedqueueelements,
whose entries can be garbage-collected by a cleanup process (you can be
smarter about this as well).
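
A minimal in-memory sketch of the recipe above, assuming a toy `FakeTree` class that stands in for ZooKeeper (it is hypothetical, not the ZooKeeper or any client library API), a hypothetical `/queue` parent for the elements, and a single watcher. Sequential names mimic ZooKeeper's counter suffix of 10 zero-padded digits, so the watcher finds new elements by comparing one remembered integer rather than keeping the previous child set, and it learns about deletions from the tombstone directory.

```python
# Hypothetical in-memory sketch of the sequential-node + tombstone recipe.
# FakeTree is NOT a ZooKeeper client; it only mimics the bits the recipe uses.

QUEUE = "/queue"                    # hypothetical parent for queue elements
DELETED = "/deletedqueueelements"   # tombstone directory from step 2
PREFIX = "queueelement-"


class FakeTree:
    def __init__(self):
        self.children = {}   # parent path -> set of child names
        self.counters = {}   # parent path -> next sequence number

    def create(self, parent, name, sequential=False):
        if sequential:
            # ZooKeeper appends a monotonically increasing counter to
            # sequential nodes, formatted as 10 zero-padded digits.
            seq = self.counters.get(parent, 0)
            self.counters[parent] = seq + 1
            name = f"{name}{seq:010d}"
        self.children.setdefault(parent, set()).add(name)
        return name

    def delete(self, parent, name):
        self.children.get(parent, set()).discard(name)

    def get_children(self, parent):
        return sorted(self.children.get(parent, set()))


def seq_of(name):
    # The sequence number is the trailing 10 digits of the node name.
    return int(name[-10:])


def add_element(tree):
    # Step 1: create /queue/queueelement-<counter> with the sequential flag.
    return tree.create(QUEUE, PREFIX, sequential=True)


def remove_element(tree, name):
    # Step 2: delete the element and leave a same-named tombstone.
    tree.delete(QUEUE, name)
    tree.create(DELETED, name)


class Watcher:
    def __init__(self):
        self.last_seen = -1   # highest sequence number already reported

    def on_children_changed(self, tree):
        # New elements are those numbered above last_seen, so the watcher
        # keeps one integer instead of the previous child list.
        added = [n for n in tree.get_children(QUEUE) if seq_of(n) > self.last_seen]
        if added:
            self.last_seen = max(seq_of(n) for n in added)
        # Step 2.1: deletions are read from the tombstone directory.
        deleted = tree.get_children(DELETED)
        return added, deleted


def collect_garbage(tree):
    # Step 3: a cleanup process removes tombstones once they have been read
    # (this sketch assumes a single watcher).
    for name in tree.get_children(DELETED):
        tree.delete(DELETED, name)


tree = FakeTree()
watcher = Watcher()
first = add_element(tree)
add_element(tree)
print(watcher.on_children_changed(tree))
# (['queueelement-0000000000', 'queueelement-0000000001'], [])
remove_element(tree, first)
add_element(tree)
print(watcher.on_children_changed(tree))
# (['queueelement-0000000002'], ['queueelement-0000000000'])
collect_garbage(tree)
```

With several watchers, the cleanup in step 3 would have to wait until every watcher has read a tombstone, which is where the "be smarter" part comes in.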

Would something like this work?


Thanks
mahadev
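
Dave's explanation in the quoted thread below (a watch fires once, so one notification can stand for several changes) can be made concrete with a toy model. All names here (`OneShotParent`, `Client`) are hypothetical and are not the ZooKeeper client API. Three quick additions produce a single notification, and the client only learns which children changed by re-reading the list, re-arming the watch, and diffing against its previous snapshot. That stored snapshot is the per-watcher set Todd would rather not keep.

```python
# Toy model of one-shot child watches (hypothetical, not the ZooKeeper API).


class OneShotParent:
    def __init__(self):
        self.kids = set()
        self.watchers = []   # callbacks armed by get_children(watch=...)

    def get_children(self, watch=None):
        if watch is not None:
            self.watchers.append(watch)   # arm (or re-arm) the watch
        return set(self.kids)

    def _fire(self):
        # A watch triggers once and is then removed; later changes are not
        # reported until the client re-arms it by calling get_children.
        pending, self.watchers = self.watchers, []
        for callback in pending:
            callback()   # the event says only "children changed", not which

    def add(self, name):
        self.kids.add(name)
        self._fire()

    def remove(self, name):
        self.kids.discard(name)
        self._fire()


class Client:
    def __init__(self, parent):
        self.parent = parent
        self.notifications = 0
        self.known = parent.get_children(watch=self.on_event)

    def on_event(self):
        self.notifications += 1

    def process(self):
        # Re-read and re-arm, then diff against the last snapshot.
        current = self.parent.get_children(watch=self.on_event)
        added, removed = current - self.known, self.known - current
        self.known = current
        return sorted(added), sorted(removed)


parent = OneShotParent()
client = Client(parent)
for name in ["a", "b", "c"]:
    parent.add(name)
print(client.notifications)   # 1
print(client.process())       # (['a', 'b', 'c'], [])
parent.remove("a")
print(client.notifications)   # 2
print(client.process())       # ([], ['a'])
```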


On 8/31/10 3:55 PM, "Todd Nine" <todd@spidertracks.co.nz> wrote:

> Hi Dave,
>   Thanks for the response.  I understand your point about missed events
> during the watch reset period.  I may be off, but here is the functionality
> I was thinking of.  I'm not sure whether ZK's internal versioning could
> support something like this.
> 
> 1. A watch is placed on the children.
> 2. The event is fired to the client.  The client receives the Stat
> object as part of the event, reflecting the state of the node when the
> event was created.  We'll call this Stat A, with version 1.
> 3. The client performs processing.  Meanwhile, several of the node's
> children change, and the version is incremented to 2 and then 3.
> 4. The client resets the watch.
> 5. A node is added.
> 6. The event is fired to the client.  The client receives Stat B, with
> version 4.
> 7. The client calls deltaChildren(Stat A, Stat B).
> 8. ZooKeeper returns the nodes added and the nodes deleted between the
> two stats.
> 
> This would handle the missed-event problem, since the client would have
> the two states it needs to compare.  It also allows clients dealing with
> large data sets to deal only with the delta over time (like a git
> replay).  Our number of queues could get quite large, and I'm concerned
> that keeping my previous event's children in a set to perform the delta
> may become quite memory- and processor-intensive.  Would a feature like
> this be possible without overcomplicating the ZooKeeper core?
> 
> 
> Thanks,
> Todd
> 
> On Tue, 2010-08-31 at 09:23 -0400, Dave Wright wrote:
> 
>> Hi Todd -
>> The general explanation for why ZooKeeper doesn't pass the event information
>> with the event notification is that a watch is only triggered once, and so a
>> single notification may stand for multiple events. For example, if you do a
>> getChildren and set a watch, and then multiple children are added at about
>> the same time, the first one triggers a notification, but the second (and
>> later) ones do not. When you do another getChildren() request to get the list
>> and reset the watch, you'll see all the changed nodes; however, if you had
>> only been told about the first change in the notification, you would have
>> missed the others.
>> To do what you want, you would really need "persistent" watches that send a
>> notification every time a change occurs and don't need to be reset, so you
>> can't miss events. That isn't the design that was chosen for ZooKeeper, and I
>> don't think it's likely to be implemented.
>> 
>> -Dave Wright
>> 
>> On Tue, Aug 31, 2010 at 3:49 AM, Todd Nine <todd@spidertracks.co.nz> wrote:
>> 
>>> Hi all,
>>>  I'm writing a distributed queue monitoring class for our leader node in
>>> the cluster.  We're queueing messages per input hardware device; each queue
>>> is then assigned to the node in our cluster with the least load.  To do
>>> this, I maintain two persistent znode trees with the following format.
>>> 
>>> data queue
>>> 
>>> /dataqueue/devices/<unit id>/<data packet>
>>> 
>>> processing follower
>>> 
>>> /dataqueue/nodes/<node name>/<unit id>
>>> 
>>> The queue monitor watches for changes on the path /dataqueue/devices.
>>> When the first packet from a unit is received, the queue writer creates
>>> the queue with the unit id.  This triggers the watch event on the
>>> monitoring class, which in turn creates the znode for the unit under the
>>> least loaded node's path.  That path is watched for child creation, and
>>> the node creates a queue consumer to consume messages from the new queue.
>>> 
>>> 
>>> Our list of queues can become quite large, and I would prefer not to
>>> maintain a list of the queues I have assigned and then perform a delta
>>> when the event fires to determine which queues are new and caused the
>>> watch event.  I can't really use sequenced nodes and keep track of my last
>>> read position, because I don't want to iterate over the list of queues to
>>> determine which sequenced node belongs to the current unit id (it would
>>> require full iteration, which really doesn't save me any reads).  Is it
>>> possible to create a watch that returns the path and Stat of the child
>>> node that caused the event to fire?
>>> 
>>> Thanks,
>>> Todd
>>> 
> 
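
ZooKeeper has no deltaChildren call, and its WatchedEvent carries only the event type, connection state and path, not a Stat. The sketch below is a hypothetical illustration of what Todd's proposal would need: a per-parent log of child changes keyed by a child version (ZooKeeper's Stat does expose such a counter, `cversion`). `VersionedParent` and `delta_children` are invented names for illustration only. The sketch also shows the cost: the log must be kept back to the oldest version any client might still ask about, which cuts against the "lightweight" goal Mahadev mentions.

```python
# Hypothetical sketch of a versioned child log that could answer
# deltaChildren(Stat A, Stat B); this is NOT a ZooKeeper API.


class VersionedParent:
    def __init__(self):
        self.kids = set()
        self.cversion = 0
        self.log = []   # (cversion after the change, "add" or "del", child name)

    def add(self, name):
        if name in self.kids:
            raise ValueError(f"node exists: {name}")
        self.kids.add(name)
        self.cversion += 1
        self.log.append((self.cversion, "add", name))

    def remove(self, name):
        self.kids.remove(name)   # raises KeyError if the child is absent
        self.cversion += 1
        self.log.append((self.cversion, "del", name))

    def delta_children(self, from_version, to_version):
        """Net children added and removed after from_version, up to to_version."""
        added, removed = set(), set()
        for version, op, name in self.log:
            if not from_version < version <= to_version:
                continue
            if op == "add":
                if name in removed:
                    removed.discard(name)   # deleted then re-created: no net change
                else:
                    added.add(name)
            elif name in added:
                added.discard(name)         # created then deleted inside the window
            else:
                removed.add(name)
        return sorted(added), sorted(removed)


# Todd's steps: Stat A at version 1, two changes while processing, Stat B at 4.
parent = VersionedParent()
parent.add("q1")
stat_a = parent.cversion          # 1
parent.add("q2")
parent.remove("q1")
parent.add("q3")
stat_b = parent.cversion          # 4
print(parent.delta_children(stat_a, stat_b))   # (['q2', 'q3'], ['q1'])
```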

