From dev-return-73347-archive-asf-public=cust-asf.ponee.io@zookeeper.apache.org Wed Sep 19 19:16:11 2018 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id C2991180621 for ; Wed, 19 Sep 2018 19:16:10 +0200 (CEST) Received: (qmail 55697 invoked by uid 500); 19 Sep 2018 17:16:04 -0000 Mailing-List: contact dev-help@zookeeper.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@zookeeper.apache.org Delivered-To: mailing list dev@zookeeper.apache.org Received: (qmail 55685 invoked by uid 99); 19 Sep 2018 17:16:04 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 19 Sep 2018 17:16:04 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 69D151A165E for ; Wed, 19 Sep 2018 17:16:04 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -109.501 X-Spam-Level: X-Spam-Status: No, score=-109.501 tagged_above=-999 required=6.31 tests=[ENV_AND_HDR_SPF_MATCH=-0.5, KAM_ASCII_DIVIDERS=0.8, RCVD_IN_DNSWL_MED=-2.3, SPF_PASS=-0.001, USER_IN_DEF_SPF_WL=-7.5, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id wg5vrkYD2XVz for ; Wed, 19 Sep 2018 17:16:03 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id 185175F438 for ; Wed, 19 Sep 2018 17:16:03 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id A2730E1328 for ; Wed, 19 Sep 2018 17:16:02 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 6456823FA1 for ; Wed, 19 Sep 2018 17:16:02 +0000 (UTC) Date: Wed, 19 Sep 2018 17:16:02 +0000 (UTC) From: "Hadoop QA (JIRA)" To: dev@zookeeper.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (ZOOKEEPER-1177) Enabling a large number of watches for a large number of clients MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/ZOOKEEPER-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620889#comment-16620889 ] Hadoop QA commented on ZOOKEEPER-1177: -------------------------------------- -1 overall. GitHub Pull Request Build +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 35 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings. -1 release audit. The applied patch generated 1 release audit warnings (more than the trunk's current 0 warnings). +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2199//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2199//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2199//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2199//console This message is automatically generated. > Enabling a large number of watches for a large number of clients > ---------------------------------------------------------------- > > Key: ZOOKEEPER-1177 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1177 > Project: ZooKeeper > Issue Type: Improvement > Components: server > Affects Versions: 3.3.3 > Reporter: Vishal Kathuria > Assignee: Fangmin Lv > Priority: Major > Labels: pull-request-available > Fix For: 3.6.0, 3.5.5 > > Attachments: ZOOKEEPER-1177.patch, ZOOKEEPER-1177.patch, ZooKeeper-with-fix-for-findbugs-warning.patch, ZooKeeper.patch, Zookeeper-after-resolving-merge-conflicts.patch > > Time Spent: 5.5h > Remaining Estimate: 0h > > In my ZooKeeper, I see watch manager consuming several GB of memory and I dug a bit deeper. > In the scenario I am testing, I have 10K clients connected to an observer. There are about 20K znodes in ZooKeeper, each is about 1K - so about 20M data in total. > Each client fetches and puts watches on all the znodes. That is 200 million watches. > It seems a single watch takes about 100 bytes. I am currently at 14528037 watches and according to the yourkit profiler, WatchManager has 1.2 G already. This is not going to work as it might end up needing 20G of RAM just for the watches. > So we need a more compact way of storing watches. Here are the possible solutions. > 1. Use a bitmap instead of the current hashmap. In this approach, each znode would get a unique id when its gets created. For every session, we can keep track of a bitmap that indicates the set of znodes this session is watching. A bitmap, assuming a 100K znodes, would be 12K. For 10K sessions, we can keep track of watches using 120M instead of 20G. > 2. This second idea is based on the observation that clients watch znodes in sets (for example all znodes under a folder). Multiple clients watch the same set and the total number of sets is a couple of orders of magnitude smaller than the total number of znodes. In my scenario, there are about 100 sets. So instead of keeping track of watches at the znode level, keep track of it at the set level. It may mean that get may also need to be implemented at the set level. With this, we can save the watches in 100M. > Are there any other suggestions of solutions? > Thanks > -- This message was sent by Atlassian JIRA (v7.6.3#76005)