zookeeper-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-1416) Persistent Recursive Watch
Date Mon, 02 Jan 2017 15:51:59 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15793074#comment-15793074 ]

ASF GitHub Bot commented on ZOOKEEPER-1416:
-------------------------------------------

GitHub user Randgalt opened a pull request:

    https://github.com/apache/curator/pull/181

    FOR DISCUSSION ONLY - Persistent watch and Cache recipe replacements

    I've pushed an implementation of persistent recursive watches as a ZooKeeper PR for
https://issues.apache.org/jira/browse/ZOOKEEPER-1416 - if it's accepted, Curator should
support it. This PR has implementations for:
    
    - PersistentWatcher
    - CuratorCache
    
    PersistentWatcher is a wrapper around the new persistent/recursive watch
    
    CuratorCache is a replacement for PathChildrenCache, TreeCache and NodeCache. With persistent
recursive watches the implementation is orders of magnitude simpler and uses far fewer resources
(i.e. one watch for the entire tree).
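
    For illustration, here's roughly what client code looks like with a single
    persistent recursive watch. The method and enum names below (addWatch,
    AddWatchMode.PERSISTENT_RECURSIVE) are assumptions based on the proposed
    ZooKeeper client API and may not match the final patch; the connect string
    and path are made up:

        import org.apache.zookeeper.AddWatchMode;
        import org.apache.zookeeper.ZooKeeper;

        public class OneWatchForTheTree {
            public static void main(String[] args) throws Exception {
                ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> { });
                // One registration covers /base and every descendant, and it
                // survives each fire - no re-registration after an event.
                zk.addWatch("/base",
                        event -> System.out.println(event.getType() + " " + event.getPath()),
                        AddWatchMode.PERSISTENT_RECURSIVE);
                Thread.sleep(Long.MAX_VALUE); // keep the session alive for the demo
            }
        }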

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/curator persistent-watch

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/curator/pull/181.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #181
    
----
commit 32a2fb7594510be2ee6d28c3c3a7db3b4ee9ab99
Author: randgalt <randgalt@apache.org>
Date:   2016-12-28T19:50:05Z

    wip

commit 94a0205d4c3d34b1e1384ab5af1b997f74d2a912
Author: randgalt <randgalt@apache.org>
Date:   2016-12-29T04:10:15Z

    Finished addPersistentWatcher DSL, re-wrote new version of cache code to handle all cases
and deprecated other versions

commit 0d9acb6dd4ec4143cf08ae1cf4ab77a0865370e2
Author: randgalt <randgalt@apache.org>
Date:   2016-12-29T15:02:15Z

    wip

commit 01652cef64e3cf3cc1e311b7a85f3c613f06ab0a
Author: randgalt <randgalt@apache.org>
Date:   2016-12-30T15:26:18Z

    wip, refactoring, testing

commit bf73f0d3999bfc21b1799ce0c9d3e06214479206
Author: randgalt <randgalt@apache.org>
Date:   2016-12-30T17:03:41Z

    continued work on porting old PathChildrenCache tests

commit 076583d14506e3e761ca061cd51a358a97c08eb6
Author: randgalt <randgalt@apache.org>
Date:   2016-12-30T18:23:14Z

    CacheListener needs to get the affected node. Also, PATH_ONLY still needs to store the
stat

commit 5b0a9f56e7d050eedfea0618f90c58d718441d3f
Author: randgalt <randgalt@apache.org>
Date:   2016-12-30T19:09:53Z

    refactoring

commit 313fd7d46ccede6bbc9ac1feb0b5a2099fce7a6d
Author: randgalt <randgalt@apache.org>
Date:   2016-12-30T20:12:04Z

    Added a composite cache

commit 1f0bdf9265e6f5bfb34520761649240209c17d72
Author: randgalt <randgalt@apache.org>
Date:   2016-12-30T21:26:10Z

    renamed rebuildTestExchanger

commit f8f5cafa956da97c5fa177ac64ee003e955887da
Author: randgalt <randgalt@apache.org>
Date:   2016-12-30T21:26:23Z

    finished porting tests

commit 38c766310432bd1d6b3f64d2778b3605df434e64
Author: randgalt <randgalt@apache.org>
Date:   2016-12-31T21:41:10Z

    More test porting, refinements

commit 6cfd38c25391865503ba4cf35530f1794c777b91
Author: randgalt <randgalt@apache.org>
Date:   2016-12-31T22:57:18Z

    More testing and refactoring. Wasn't checking for deleted children after a refresh. Also,
allow for different methods of comparing nodes for change.

commit 40a985243d2959a2fff397eeebb9ff844f6a154c
Author: randgalt <randgalt@apache.org>
Date:   2017-01-01T00:48:10Z

    finished porting TreeCache tests

commit add0d10bbb58b0dd6eeffde8c6a2bd2df99a7eae
Author: randgalt <randgalt@apache.org>
Date:   2017-01-01T15:55:26Z

    Finished porting TestTreeCacheRandomTree. However, it exposed a design issue with separate
CacheFilters and RefreshFilters. To do maxDepth properly you need both to be in sync. Need
to rethink this.

commit 2cf7c412caf81cb7846a7da5aac3adbc62502d3e
Author: randgalt <randgalt@apache.org>
Date:   2017-01-01T18:33:13Z

    Reworked filters. Went back to the CacheSelector multi-method filter used in TreeCache.

commit 72fe88c3b99e43e905cce40c178a08cb2c409b78
Author: randgalt <randgalt@apache.org>
Date:   2017-01-01T19:24:04Z

    Ported/finished NodeCache and tests

commit 8565de6d75d34b2dd597878167b51cd921a0a00e
Author: randgalt <randgalt@apache.org>
Date:   2017-01-01T22:15:14Z

    Removed composite stuff. Interesting, but gilding the lily

commit 2148b6d1a829c2efd0309e2914b755ce9ebff003
Author: randgalt <randgalt@apache.org>
Date:   2017-01-02T02:22:32Z

    Add docs, more refactoring, final testing, etc.

----


> Persistent Recursive Watch
> --------------------------
>
>                 Key: ZOOKEEPER-1416
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1416
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: c client, documentation, java client, server
>            Reporter: Phillip Liu
>            Assignee: Jordan Zimmerman
>         Attachments: ZOOKEEPER-1416.patch, ZOOKEEPER-1416.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> h4. The Problem
> A ZooKeeper Watch can be placed on a single znode, and when the znode changes a Watch
event is sent to the client. If there are thousands of znodes being watched, then each time a
client (re)connects it has to send thousands of watch requests. At Facebook, we have this
problem storing information for thousands of db shards. Consequently, a naming service that
consumes the db shard definitions issues thousands of watch requests each time the service
starts or changes its client watcher.
> h4. Proposed Solution
> We add the notion of a Persistent Recursive Watch in ZooKeeper. Persistent means no Watch
reset is necessary after a watch fires. Recursive means the Watch applies to the znode and its
descendant nodes. A Persistent Recursive Watch behaves as follows:
> # Recursive Watch supports all Watch semantics: CHILDREN, DATA, and EXISTS.
> # CHILDREN and DATA Recursive Watches can be placed on any znode.
> # EXISTS Recursive Watches can be placed on any path.
> # A Recursive Watch behaves like an auto-watch registrar on the server side. Setting a
Recursive Watch means setting watches on all descendant znodes.
> # When a watch on a descendant fires, no subsequent event is fired until a corresponding
getData(..) on the znode is called; the Recursive Watch then automatically re-applies the watch
to the znode. This maintains the existing Watch semantics on an individual znode (see the sketch
after this list).
> # A Recursive Watch overrides any watches placed on a descendant znode. Practically, this
means the Recursive Watch's Watcher callback is the one receiving the event, and the event is
delivered exactly once.
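> To make item 5 concrete, here is a small sketch of the intended client flow. The class and
handler names are illustrative only, not part of the patch; the point is that the read itself is
what re-arms event delivery for the fired znode:
> {code:java}
> import org.apache.zookeeper.WatchedEvent;
> import org.apache.zookeeper.Watcher;
> import org.apache.zookeeper.ZooKeeper;
> 
> public class RecursiveWatchFlow {
>     private final ZooKeeper zk;
> 
>     RecursiveWatchFlow(ZooKeeper zk) {
>         this.zk = zk;
>     }
> 
>     // Called when the Recursive Watch fires for a descendant znode.
>     void onFire(WatchedEvent event) throws Exception {
>         if (event.getType() == Watcher.Event.EventType.NodeDataChanged) {
>             // Under the proposed semantics, this read both returns the new
>             // data and re-arms event delivery for event.getPath(); no
>             // explicit watch re-registration is needed.
>             byte[] data = zk.getData(event.getPath(), false, null);
>             handle(event.getPath(), data);
>         }
>     }
> 
>     // Application-specific processing (placeholder).
>     void handle(String path, byte[] data) {
>     }
> }
> {code}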
> A goal here is to minimize the number of semantic changes. The guarantee of no intermediate
watch event until the data is read will be maintained. The only difference is that we will
automatically re-add the watch after a read. At the same time, we add the convenience of not
needing to add multiple watches for sibling znodes, which in turn reduces the number of watch
messages sent from the client to the server.
> There are some implementation details that need to be hashed out. The initial thinking is
to have the Recursive Watch create per-node watches. This will cause a lot of watches to be
created on the server side. Currently, each watch is stored as a single bit in a bit set relative
to a session - up to 3 bits per client per znode. If there are 100m znodes with 100k clients,
each watching all nodes, then this strategy will consume approximately 3.75TB of RAM distributed
across all Observers. That seems expensive.
> Alternatively, a blacklist of paths to which Watches are not sent, regardless of Watch settings,
can be updated each time a watch event from a Recursive Watch is fired. The memory utilization is
then relative to the number of outstanding reads; in the worst case it is 1/3 * 3.75TB = 1.25TB
using the parameters given above.
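> As a quick worked check of both estimates above (plain arithmetic in decimal units, matching
the figures in the text):
> {code:java}
> public class WatchMemoryEstimate {
>     public static void main(String[] args) {
>         long znodes  = 100_000_000L;           // 100m znodes
>         long clients = 100_000L;               // 100k clients, each watching all nodes
>         long bits    = znodes * clients * 3L;  // up to 3 watch bits per client per znode
>         double perNodeTB   = bits / 8.0 / 1e12; // = 3.75 TB for per-node watches
>         double blacklistTB = perNodeTB / 3.0;   // 1 bit per outstanding read; worst case = 1.25 TB
>         System.out.printf("per-node: %.2f TB, blacklist worst case: %.2f TB%n",
>                 perNodeTB, blacklistTB);
>     }
> }
> {code}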
> Otherwise, a relaxation of the no-intermediate-watch-event-until-read guarantee is required.
If the server may send watch events regardless of whether one has already fired without a
corresponding read, then the server can simply fire watch events without any tracking.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
