Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id F129F200D06 for ; Mon, 25 Sep 2017 18:27:08 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id EF64B1609BB; Mon, 25 Sep 2017 16:27:08 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 1A62A1609E9 for ; Mon, 25 Sep 2017 18:27:07 +0200 (CEST) Received: (qmail 39603 invoked by uid 500); 25 Sep 2017 16:27:07 -0000 Mailing-List: contact dev-help@zookeeper.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@zookeeper.apache.org Delivered-To: mailing list dev@zookeeper.apache.org Received: (qmail 39592 invoked by uid 99); 25 Sep 2017 16:27:07 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 25 Sep 2017 16:27:07 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id AB3AB1A0FD9 for ; Mon, 25 Sep 2017 16:27:06 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -100.002 X-Spam-Level: X-Spam-Status: No, score=-100.002 tagged_above=-999 required=6.31 tests=[RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id qF8sferWJJbs for ; Mon, 25 Sep 2017 16:27:05 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id 957086128E for ; Mon, 25 Sep 2017 16:27:04 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 4329AE0F1B for ; Mon, 25 Sep 2017 16:27:03 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id E0AA4242B4 for ; Mon, 25 Sep 2017 16:27:00 +0000 (UTC) Date: Mon, 25 Sep 2017 16:27:00 +0000 (UTC) From: "Hadoop QA (JIRA)" To: dev@zookeeper.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (ZOOKEEPER-1416) Persistent Recursive Watch MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Mon, 25 Sep 2017 16:27:09 -0000 [ https://issues.apache.org/jira/browse/ZOOKEEPER-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16179266#comment-16179266 ] Hadoop QA commented on ZOOKEEPER-1416: -------------------------------------- -1 overall. GitHub Pull Request Build +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 8 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1040//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1040//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1040//console This message is automatically generated. > Persistent Recursive Watch > -------------------------- > > Key: ZOOKEEPER-1416 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1416 > Project: ZooKeeper > Issue Type: Improvement > Components: c client, documentation, java client, server > Reporter: Phillip Liu > Assignee: Jordan Zimmerman > Attachments: ZOOKEEPER-1416.patch, ZOOKEEPER-1416.patch > > Original Estimate: 504h > Remaining Estimate: 504h > > h4. The Problem > A ZooKeeper Watch can be placed on a single znode and when the znode changes a Watch event is sent to the client. If there are thousands of znodes being watched, when a client (re)connect, it would have to send thousands of watch requests. At Facebook, we have this problem storing information for thousands of db shards. Consequently a naming service that consumes the db shard definition issues thousands of watch requests each time the service starts and changes client watcher. > h4. Proposed Solution > We add the notion of a Persistent Recursive Watch in ZooKeeper. Persistent means no Watch reset is necessary after a watch-fire. Recursive means the Watch applies to the node and descendant nodes. A Persistent Recursive Watch behaves as follows: > # Recursive Watch supports all Watch semantics: CHILDREN, DATA, and EXISTS. > # CHILDREN and DATA Recursive Watches can be placed on any znode. > # EXISTS Recursive Watches can be placed on any path. > # A Recursive Watch behaves like a auto-watch registrar on the server side. Setting a Recursive Watch means to set watches on all descendant znodes. > # When a watch on a descendant fires, no subsequent event is fired until a corresponding getData(..) on the znode is called, then Recursive Watch automically apply the watch on the znode. This maintains the existing Watch semantic on an individual znode. > # A Recursive Watch overrides any watches placed on a descendant znode. Practically this means the Recursive Watch Watcher callback is the one receiving the event and event is delivered exactly once. > A goal here is to reduce the number of semantic changes. The guarantee of no intermediate watch event until data is read will be maintained. The only difference is we will automatically re-add the watch after read. At the same time we add the convience of reducing the need to add multiple watches for sibling znodes and in turn reduce the number of watch messages sent from the client to the server. > There are some implementation details that needs to be hashed out. Initial thinking is to have the Recursive Watch create per-node watches. This will cause a lot of watches to be created on the server side. Currently, each watch is stored as a single bit in a bit set relative to a session - up to 3 bits per client per znode. If there are 100m znodes with 100k clients, each watching all nodes, then this strategy will consume approximately 3.75TB of ram distributed across all Observers. Seems expensive. > Alternatively, a blacklist of paths to not send Watches regardless of Watch setting can be set each time a watch event from a Recursive Watch is fired. The memory utilization is relative to the number of outstanding reads and at worst case it's 1/3 * 3.75TB using the parameters given above. > Otherwise, a relaxation of no intermediate watch event until read guarantee is required. If the server can send watch events regardless of one has already been fired without corresponding read, then the server can simply fire watch events without tracking. -- This message was sent by Atlassian JIRA (v6.4.14#64029)