From: Karol Dudzinski
Date: Thu, 26 Feb 2015 13:13:36 +0000
Subject: Re: What goes in the snapshot?
To: user@zookeeper.apache.org
Cc: adam@milne-smith.co.uk

Hi Flavio,

We've done some more analysis using the snapshot formatter and a heap dump and have found the source of the snapshot bloat.

What is taking the majority of the space is the longKeyMap from DataTree. In the heap dump, aclKeyMap has as many entries (which is to be expected given how the maps are used) and is also taking an equally large amount of space, though at least aclKeyMap isn't serialised to the snapshot.

We use a custom authentication provider but, because the AuthenticationProvider.matches method does not provide the path being operated on, we end up sticking the path in the ACL id. Some of our apps end up generating a lot of paths for one-time use and consequently we end up with lots of unique ACLs.

The two ACL maps in DataTree seem to be an optimisation so that repeated usage of ACLs does not result in the full list being stored multiple times. However, entries are never removed from these two maps, so if an ACL is unique these maps (and the snapshot) grow forever.

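For readers following along, the interning scheme described above can be sketched roughly as follows. This is a simplified illustration, not the ZooKeeper source: the class name AclInterner, its method names, and the use of plain strings in place of ACL objects are all assumptions made for the example. Only the map names longKeyMap and aclKeyMap come from the message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the two ACL maps described above.
// ACL entries are plain strings here purely for illustration.
class AclInterner {
    // index -> ACL list (the role longKeyMap plays; this side reaches the snapshot)
    private final Map<Long, List<String>> longKeyMap = new HashMap<>();
    // ACL list -> index (the role aclKeyMap plays; the reverse lookup)
    private final Map<List<String>, Long> aclKeyMap = new HashMap<>();
    private long nextIndex = 0;

    // Return the index for an ACL list, allocating a new one the first time it is seen.
    long intern(List<String> acls) {
        Long existing = aclKeyMap.get(acls);
        if (existing != null) {
            return existing; // repeated ACL lists share one stored copy
        }
        long index = ++nextIndex;
        List<String> copy = new ArrayList<>(acls); // defensive copy used as map key
        longKeyMap.put(index, copy);
        aclKeyMap.put(copy, index);
        return index;
    }

    // Resolve an index back to its ACL list (null if the index is unknown).
    List<String> lookup(long index) {
        return longKeyMap.get(index);
    }

    // Number of distinct ACL lists retained. Nothing in this class ever removes one.
    int size() {
        return longKeyMap.size();
    }

    public static void main(String[] args) {
        AclInterner interner = new AclInterner();
        // A shared ACL is stored once however often it is used.
        for (int i = 0; i < 1000; i++) {
            interner.intern(Arrays.asList("world:anyone:r"));
        }
        System.out.println("after shared ACLs: " + interner.size());
        // An ACL id that embeds a one-time path is unique, so each call adds an entry.
        for (int i = 0; i < 1000; i++) {
            interner.intern(Arrays.asList("custom:/app/tmp/" + i + ":rw"));
        }
        System.out.println("after unique ACLs: " + interner.size());
    }
}
```

Because the sketch has no removal path, deleting the node that used a unique ACL would leave its entry in both maps, which matches the growth pattern reported in this thread.
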
We're quite keen on fixing this as it's causing us lots of issues and we're happy to provide a patch, but we will need your opinion on the various options:

- create a third map which would be a reference count for the ACLs, updated as needed when creating, deleting or setting an ACL. When the reference count is 0, remove the entry from all the maps
- use weak references in some shape or form, though this is made harder by the fact that the ACL optimisation essentially needs a bidirectional index (hence the two maps). We've given this one lots of thought but it would really require something like a ConcurrentWeakBiHashMap, which just sounds wrong and over-engineered :)

The other fix that could be made is to pass the path being operated on to the AuthenticationProvider. However, doing that in a backwards-compatible fashion is not trivial and, even though it would fix my problem (by allowing me to remove the path from the ACL id), it wouldn't fix the general problem with this optimisation.

Looking forward to hearing your thoughts on this.

Thanks,
Karol

> On 22 Feb 2015, at 14:55, Flavio Junqueira wrote:
>
> Hi Karol,
>
> It's odd that you have such large snapshots and little data in the data tree. Are you creating lots of sessions? Right now I can't think of a good reason; I suggest you use the snapshot formatter to inspect the snapshot.
>
> -Flavio
>
>> On 22 Feb 2015, at 14:23, Karol Dudzinski wrote:
>>
>> Hi Flavio,
>>
>> Yes, one of our clients had a bug which caused it to go into a tight create/delete loop with zero net effect (i.e. it was deleting what it had just created). After stopping the client, the snapshot never reduced in size, so are the deletes in there permanently?
>>
>> Thanks,
>> Karol
>>
>>> On 22 Feb 2015, at 14:05, Flavio Junqueira wrote:
>>>
>>> Hi there,
>>>
>>> Perhaps a lot of data has been deleted? In any case, you may want to use the SnapshotFormatter to check what is in the large snapshot.
>>>
>>> -Flavio
>>>
>>>> On 22 Feb 2015, at 10:44, Karol Dudzinski wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I was under the impression that the snapshot contained essentially an on-disk copy of all the data. However, one of our clusters has a snapshot which is over 1GB while the mntr four-letter word reports an approximate data size in the hundreds of KB and a node count in the low thousands. So what else goes into the snapshot and how can I slim it down?
>>>>
>>>> Thanks,
>>>> Karol