zookeeper-user mailing list archives

From Karol Dudzinski <karoldudzin...@gmail.com>
Subject Re: What goes in the snapshot?
Date Thu, 26 Feb 2015 13:13:36 GMT
Hi Flavio,

We've done some more analysis using the snapshot formatter and a heap dump and have found
the source of the snapshot bloat.

What is taking the majority of the space is the longKeyMap from DataTree. In the heap dump,
aclKeyMap has just as many entries (which is to be expected given how the two maps are used) and
takes up an equally large amount of space, though at least aclKeyMap isn't serialised to
the snapshot.

We use a custom authentication provider, but because the AuthenticationProvider.matches method
does not provide the path being operated on, we end up sticking the path in the ACL id. Some
of our apps generate a lot of paths for one-time use, and consequently we end up with
lots of unique ACLs.

The two ACL maps in DataTree seem to be an optimisation so that repeated use of the same ACL
list does not result in the full list being stored multiple times. However, entries are never
removed from either map, so if ACLs are unique, these maps (and the snapshot) grow forever.
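To make the growth concrete, here is a minimal, simplified model of the two-map interning described above. It is a sketch, not DataTree's actual code: ACLs are modelled as plain strings, and the class names, the `intern` method, and the `user@path` id format are hypothetical. Only the map names mirror DataTree.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the DataTree ACL interning: each distinct
// ACL list gets a long key, stored in both directions. ACLs are modelled as
// plain strings (assumption); the real maps hold org.apache.zookeeper.data.ACL.
class AclInterner {
    final Map<List<String>, Long> aclKeyMap = new HashMap<>();
    final Map<Long, List<String>> longKeyMap = new HashMap<>();
    private long nextKey = 0;

    // Returns the existing key for an equal ACL list, or assigns a new one.
    synchronized long intern(List<String> acls) {
        Long key = aclKeyMap.get(acls);
        if (key != null) {
            return key; // identical list already stored: reuse its key
        }
        long newKey = nextKey++;
        List<String> copy = new ArrayList<>(acls);
        aclKeyMap.put(copy, newKey);
        longKeyMap.put(newKey, copy);
        return newKey;
    }

    // There is deliberately no remove method: entries live forever.
    synchronized int size() {
        return longKeyMap.size();
    }
}

class AclGrowthDemo {
    // Creates and immediately deletes n znodes whose ACL id embeds the path.
    // Returns {live znodes, interned ACL lists}.
    static int[] churn(int n) {
        AclInterner interner = new AclInterner();
        Map<String, Long> tree = new HashMap<>(); // path -> ACL key
        for (int i = 0; i < n; i++) {
            String path = "/jobs/" + i;
            // Hypothetical id format "user@path": every ACL list is unique.
            List<String> acl = Collections.singletonList("31:custom:alice@" + path);
            tree.put(path, interner.intern(acl));
            tree.remove(path); // delete the znode straight away
        }
        return new int[] {tree.size(), interner.size()};
    }

    public static void main(String[] args) {
        int[] counts = churn(1000);
        // prints "0 znodes, 1000 interned ACL lists"
        System.out.println(counts[0] + " znodes, " + counts[1] + " interned ACL lists");
    }
}
```

A create/delete loop like the one described in the earlier messages leaves no znodes behind, yet one interned entry per path survives in both maps.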

We're quite keen on fixing this, as it's causing us lots of issues, and we're happy to provide
a patch, but we'd like your opinion on the options:
- create a third map holding a reference count for each ACL, updated as needed when creating,
deleting or setting an ACL. When the reference count drops to 0, remove the
entry from all the maps
- use weak references in some shape or form, though this is made harder by the fact that the ACL
optimisation essentially needs a bidirectional index (hence the two maps). We've given this
one lots of thought, but it would really require something like a ConcurrentWeakBiHashMap, which
just sounds wrong and over-engineered :)
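The first option could look roughly like the following. This is a minimal sketch under the same simplifying assumptions (ACLs as strings); the class and its `acquire`/`release`/`replace` methods are hypothetical names, not an existing ZooKeeper API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reference-counted ACL cache: a third map counts how many
// znodes use each key, and an entry is dropped from all three maps at 0.
class RefCountedAclCache {
    private final Map<List<String>, Long> aclKeyMap = new HashMap<>();
    private final Map<Long, List<String>> longKeyMap = new HashMap<>();
    private final Map<Long, Integer> refCounts = new HashMap<>();
    private long nextKey = 0;

    // On create: intern the ACL list (if new) and add one reference.
    synchronized long acquire(List<String> acls) {
        Long key = aclKeyMap.get(acls);
        if (key == null) {
            key = nextKey++;
            List<String> copy = new ArrayList<>(acls);
            aclKeyMap.put(copy, key);
            longKeyMap.put(key, copy);
            refCounts.put(key, 0);
        }
        refCounts.put(key, refCounts.get(key) + 1);
        return key;
    }

    // On delete: drop one reference and evict the entry when none remain.
    synchronized void release(long key) {
        Integer count = refCounts.get(key);
        if (count == null) {
            return; // unknown key: nothing to release
        }
        if (count <= 1) {
            refCounts.remove(key);
            List<String> acls = longKeyMap.remove(key);
            aclKeyMap.remove(acls);
        } else {
            refCounts.put(key, count - 1);
        }
    }

    // On setACL: acquire the new list before releasing the old one, so that
    // setting the same ACL again never evicts it, even momentarily.
    synchronized long replace(long oldKey, List<String> newAcls) {
        long newKey = acquire(newAcls);
        release(oldKey);
        return newKey;
    }

    synchronized List<String> lookup(long key) {
        return longKeyMap.get(key);
    }

    synchronized int size() {
        return longKeyMap.size();
    }
}
```

One design point a real patch would have to handle: since the counts are derived from the tree, they would need to be rebuilt (or serialised) when a snapshot is loaded, otherwise entries read back from disk would start with no references.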

The other fix would be to pass the path being operated on to the AuthenticationProvider.
However, doing that in a backwards-compatible fashion is not trivial, and even though it would
fix my problem (by allowing me to remove the path from the ACL id), it wouldn't fix the general
problem with this optimisation.
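One possible backwards-compatible shape for passing the path is an opt-in sub-interface that the server checks for, leaving existing providers untouched. This is a hypothetical sketch: `Matcher` only stands in for the `matches(String id, String aclExpr)` part of AuthenticationProvider, and `PathAwareMatcher`, `AclCheck` and `JobOwnerMatcher` are invented names.

```java
// Stand-in for the matches() part of ZooKeeper's AuthenticationProvider.
interface Matcher {
    boolean matches(String id, String aclExpr);
}

// Hypothetical opt-in extension: providers that need the path implement it.
interface PathAwareMatcher extends Matcher {
    boolean matches(String id, String aclExpr, String path);
}

final class AclCheck {
    private AclCheck() {}

    // Hypothetical server-side dispatch: existing providers are called
    // exactly as before; path-aware ones also receive the znode path.
    static boolean check(Matcher provider, String id, String aclExpr, String path) {
        if (provider instanceof PathAwareMatcher) {
            return ((PathAwareMatcher) provider).matches(id, aclExpr, path);
        }
        return provider.matches(id, aclExpr);
    }
}

// Example provider: the ACL id no longer has to carry the path.
class JobOwnerMatcher implements PathAwareMatcher {
    @Override
    public boolean matches(String id, String aclExpr) {
        return id.equals(aclExpr); // legacy behaviour, path unknown
    }

    @Override
    public boolean matches(String id, String aclExpr, String path) {
        // Illustrative rule only: grant access under /jobs/ and nowhere else.
        return id.equals(aclExpr) && path.startsWith("/jobs/");
    }
}
```

With the path available at check time, the ACL id can stay constant across paths, so the interned ACL lists would be shared again instead of being unique per path.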

Looking forward to hearing your thoughts on this.


> On 22 Feb 2015, at 14:55, Flavio Junqueira <fpjunqueira@yahoo.com.INVALID> wrote:
> Hi Karol,
> It's odd that you have such large snapshots and so little data in the data tree. Are you
> creating lots of sessions? Right now I can't think of a good reason; I'd really suggest
> using the snapshot formatter to inspect the snapshot.
> -Flavio
>> On 22 Feb 2015, at 14:23, Karol Dudzinski <karoldudzinski@gmail.com> wrote:
>> Hi Flavio,
>> Yes, one of our clients had a bug which caused it to go into a tight create/delete
>> loop with zero net effect (i.e. it was deleting what it had just created). After we stopped
>> the client, the snapshot never shrank, so are the deletes in there permanently?
>> Thanks,
>> Karol
>>> On 22 Feb 2015, at 14:05, Flavio Junqueira <fpjunqueira@yahoo.com.INVALID> wrote:
>>> Hi there,
>>> Perhaps a lot of data has been deleted? In any case, you may want to use the
>>> SnapshotFormatter to check what is in the large snapshot.
>>> -Flavio
>>>> On 22 Feb 2015, at 10:44, Karol Dudzinski <karoldudzinski@gmail.com> wrote:
>>>> Hi all,
>>>> I was under the impression that the snapshot contained essentially an on-disk
>>>> copy of all the data. However, one of our clusters has a snapshot which is over 1GB, while
>>>> the mntr four-letter word reports an approximate data size in the hundreds of KB and a node
>>>> count in the low thousands. So what else goes into the snapshot, and how can I slim it down?
>>>> Thanks,
>>>> Karol
