zookeeper-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-1416) Persistent Recursive Watch
Date Thu, 10 Aug 2017 22:54:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122481#comment-16122481
] 

ASF GitHub Bot commented on ZOOKEEPER-1416:
-------------------------------------------

Github user afine commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/136#discussion_r132588619
  
    --- Diff: src/java/main/org/apache/zookeeper/ZooDefs.java ---
    @@ -74,12 +74,16 @@
     
             public final int createTTL = 21;
     
    +        public final int addPersistentWatch = 22;
    +
             public final int auth = 100;
     
             public final int setWatches = 101;
     
             public final int sasl = 102;
     
    +        public final int setWatches2 = 103;
    --- End diff --
    
    @Randgalt I'm guessing you meant to put "The problem is that it makes 3.5.4 clients incompatible
with 3.5.3 servers. That might make sense for 3.6.0 but it's unreasonable for .x.N release."
here. I agree totally. This code should stay the same in the 3.5 line, but there is no reason
it needs to be identical to the 3.6 line, which is why I think it should be changed "here".



> Persistent Recursive Watch
> --------------------------
>
>                 Key: ZOOKEEPER-1416
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1416
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: c client, documentation, java client, server
>            Reporter: Phillip Liu
>            Assignee: Jordan Zimmerman
>         Attachments: ZOOKEEPER-1416.patch, ZOOKEEPER-1416.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> h4. The Problem
> A ZooKeeper Watch can be placed on a single znode, and when the znode changes a Watch
event is sent to the client. If there are thousands of znodes being watched, then each time a client
(re)connects it has to send thousands of watch requests. At Facebook, we have this
problem storing information for thousands of db shards. Consequently, a naming service that
consumes the db shard definition issues thousands of watch requests each time the service
starts and changes client watcher.
> h4. Proposed Solution
> We add the notion of a Persistent Recursive Watch in ZooKeeper. Persistent means no Watch
reset is necessary after a watch-fire. Recursive means the Watch applies to the node and descendant
nodes. A Persistent Recursive Watch behaves as follows:
> # Recursive Watch supports all Watch semantics: CHILDREN, DATA, and EXISTS.
> # CHILDREN and DATA Recursive Watches can be placed on any znode.
> # EXISTS Recursive Watches can be placed on any path.
> # A Recursive Watch behaves like an auto-watch registrar on the server side. Setting a
 Recursive Watch means setting watches on all descendant znodes.
> # When a watch on a descendant fires, no subsequent event is fired until a corresponding
getData(..) on the znode is called; the Recursive Watch then automatically re-applies the watch on the
znode. This maintains the existing Watch semantic on an individual znode.
> # A Recursive Watch overrides any watches placed on a descendant znode. Practically this
means the Recursive Watch Watcher callback is the one receiving the event, and the event is delivered
exactly once.
> A goal here is to reduce the number of semantic changes. The guarantee of no intermediate
watch event until data is read will be maintained. The only difference is that we will automatically
re-add the watch after a read. At the same time we add the convenience of reducing the need to
add multiple watches for sibling znodes, which in turn reduces the number of watch messages sent
from the client to the server.
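
The semantics listed above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: `RecursiveWatchModel`, `onChange` and `onRead` are hypothetical names and not part of any ZooKeeper API, a change is reduced to a path string, and a read stands in for getData(..). It models two of the listed behaviors: the watch covers the node and its descendants, and a path that has fired stays silent until it is read.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the proposed semantics; hypothetical names, not ZooKeeper code. */
final class RecursiveWatchModel {
    private final String root;                               // path the recursive watch is set on
    private final Set<String> awaitingRead = new HashSet<>(); // paths that fired and were not read yet

    RecursiveWatchModel(String root) {
        this.root = root;
    }

    /** True if path is the watched node itself or one of its descendants. */
    private boolean covers(String path) {
        return path.equals(root) || root.equals("/") || path.startsWith(root + "/");
    }

    /** Returns true if a change to path delivers an event to the watcher. */
    boolean onChange(String path) {
        if (!covers(path)) {
            return false;
        }
        // Set.add returns false when the path already fired and is still unread.
        return awaitingRead.add(path);
    }

    /** Stands in for getData(..): reading a path re-arms the watch for it. */
    void onRead(String path) {
        awaitingRead.remove(path);
    }

    public static void main(String[] args) {
        RecursiveWatchModel watch = new RecursiveWatchModel("/shards");
        System.out.println(watch.onChange("/shards/a")); // first change is delivered
        System.out.println(watch.onChange("/shards/a")); // suppressed until the path is read
        watch.onRead("/shards/a");
        System.out.println(watch.onChange("/shards/a")); // delivered again after the read
    }
}
```

The set of unread paths here grows with the number of outstanding reads, not with the number of znodes under the watched path.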
> There are some implementation details that need to be hashed out. Initial thinking is
to have the Recursive Watch create per-node watches. This will cause a lot of watches to be
created on the server side. Currently, each watch is stored as a single bit in a bit set relative
to a session - up to 3 bits per client per znode. If there are 100m znodes with 100k clients,
each watching all nodes, then this strategy will consume approximately 3.75TB of RAM distributed
across all Observers. Seems expensive.
> Alternatively, a blacklist of paths to not send Watches regardless of Watch setting can
be set each time a watch event from a Recursive Watch is fired. The memory utilization is
relative to the number of outstanding reads and at worst case it's 1/3 * 3.75TB using the
parameters given above.
> Otherwise, a relaxation of the no-intermediate-watch-event-until-read guarantee is required.
If the server can send watch events regardless of whether one has already been fired without a corresponding
read, then the server can simply fire watch events without tracking.
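
The memory figures in the description can be reproduced with a short calculation. The sketch below assumes decimal terabytes (10^12 bytes) and reads the "1/3" blacklist case as one bit per client per znode instead of three; both are interpretations rather than something the description states. The class and method names are hypothetical.

```java
/** Back-of-the-envelope check of the watch memory estimate; hypothetical helper. */
final class WatchMemoryEstimate {
    private WatchMemoryEstimate() {
    }

    /** Bytes needed when every client holds bitsPerWatch bits for every znode. */
    static long watchBytes(long znodes, long clients, long bitsPerWatch) {
        // multiplyExact throws on overflow instead of silently wrapping.
        long bits = Math.multiplyExact(Math.multiplyExact(znodes, clients), bitsPerWatch);
        return bits / 8;
    }

    public static void main(String[] args) {
        long znodes = 100_000_000L; // 100m znodes
        long clients = 100_000L;    // 100k clients

        long perNode = watchBytes(znodes, clients, 3);   // up to 3 bits per client per znode
        long blacklist = watchBytes(znodes, clients, 1); // assumed: 1 bit per client per znode

        System.out.println("per-node watches: " + perNode / 1e12 + " TB");
        System.out.println("blacklist worst case: " + blacklist / 1e12 + " TB");
    }
}
```

With these inputs the per-node strategy comes to 3.75 TB, matching the description, and the blacklist worst case to 1.25 TB.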



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
