zookeeper-user mailing list archives

From CP Mishra <mishr...@gmail.com>
Subject Re: OutOfMemory Error
Date Fri, 24 Apr 2015 21:28:59 GMT
Karol, that's interesting. Can you send the Jira ticket, please?

In our case, a rogue program added 300k entries through a service that
persists data in ZK and is meant to hold only a handful of entries. We are
now working on deleting these entries, which take up more than 3 GB.
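
A minimal sketch of one way to do that cleanup in batches, so the ensemble
is not hit with 300k deletes back to back. It assumes the rogue entries are
direct children of a single parent znode and that the child list can be
fetched in one call, which may not hold for a very large parent. FakeClient
and delete_children_in_batches are hypothetical names made up for this
illustration; FakeClient only stands in for a real client so the sketch runs
without an ensemble.

```python
from itertools import islice


class FakeClient:
    """Hypothetical in-memory stand-in for a ZooKeeper client (illustration only)."""

    def __init__(self, children):
        # Maps a parent path to the set of its child names.
        self.children = {parent: set(names) for parent, names in children.items()}

    def get_children(self, path):
        return sorted(self.children.get(path, ()))

    def delete(self, path):
        parent, _, name = path.rpartition("/")
        self.children[parent].remove(name)


def delete_children_in_batches(client, parent, batch_size=1000, on_batch=None):
    """Delete every direct child of `parent`, `batch_size` znodes at a time.

    Returns the number of znodes deleted.
    """
    names = iter(client.get_children(parent))
    deleted = 0
    while True:
        batch = list(islice(names, batch_size))
        if not batch:
            return deleted
        for name in batch:
            client.delete(f"{parent}/{name}")
        deleted += len(batch)
        if on_batch is not None:
            # Hook for logging or sleeping between batches.
            on_batch(deleted)
```

Any client object with get_children(path) and delete(path) methods would
fit here; kazoo's KazooClient has methods of that shape. Passing a function
that sleeps as on_batch is one way to throttle the deletes.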

Thanks,
CP

On Fri, Apr 24, 2015 at 1:09 PM, Karol Dudzinski <karoldudzinski@gmail.com>
wrote:

> Hi,
>
> Do you know if any of the services that use your ZK create ACLs that are
> potentially unique and one-time-ish?  I recently hit a similar problem and
> discovered that the DataTree has an ACL cache that never gets anything
> removed from it.  That was by far and away the largest memory consumer I
> found when analysing the heap dump.  If this is the case then you should
> see lots of ACL objects on the heap.
>
> I filed a JIRA for this and keep meaning to submit a patch but sadly
> haven't got round to it.  As an interim solution, I wrote a tool which uses
> the DataTree class and the serialisation utils to purge this cache of
> unused entries.  In my case it shrank the snapshot from 500MB to 12MB!  The
> time to write the snapshot went from 40 seconds to less than 1 second as a
> result.
>
> Thanks,
> Karol
>
>
> > On 24 Apr 2015, at 18:45, CP Mishra <mishracp@gmail.com> wrote:
> >
> > Hi,
> >
> > I am running a 3 node ZK ensemble on 3 VMs (2 CPU, 32GB RAM) in the test
> > environment. Lately, I have been getting OutOfMemoryError on all three ZK
> > nodes. ZK has been configured with 6GB heap size. The same ZK ensemble is
> > shared between Kafka, HDFS HA and another custom service.
> >
> > I analyzed the heap dump and 5.8+ GB is being used by DataTree.  I don't
> > have a purge policy in place and the size of the ZK data directory
> > stands at ~14 GB now.  There is enough space on the disk holding ZK data
> > (20% used).
> >
> > As soon as I restart a ZK node, it grows to use all 6GB and starts a Full
> > GC every 1-2 sec. In 3-5 minutes, it throws OOM: GC overhead limit
> > exceeded.
> >
> > I would appreciate any help in diagnosing the issue.
> >
> > Thanks,
> > CP Mishra
>
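
To illustrate the ACL cache growth Karol describes in the quoted message,
here is a toy model. It is not ZooKeeper's code: the class names, the ACL
tuples and the reference-counting variant are all assumptions made up for
this sketch, and reference counting is only one possible fix, not
necessarily what the JIRA proposes.

```python
class GrowOnlyAclCache:
    """Toy model: interns ACL lists under a numeric id and never forgets them."""

    def __init__(self):
        self.acl_to_id = {}
        self.id_to_acl = {}
        self._next_id = 1

    def intern(self, acl):
        key = tuple(acl)
        acl_id = self.acl_to_id.get(key)
        if acl_id is None:
            acl_id = self._next_id
            self._next_id += 1
            self.acl_to_id[key] = acl_id
            self.id_to_acl[acl_id] = key
        return acl_id

    def release(self, acl_id):
        # Nothing is ever removed, so deleted znodes leave their ACLs behind.
        pass

    def __len__(self):
        return len(self.id_to_acl)


class RefCountedAclCache(GrowOnlyAclCache):
    """Same interning, but an entry is dropped once no znode references it."""

    def __init__(self):
        super().__init__()
        self.ref_counts = {}

    def intern(self, acl):
        acl_id = super().intern(acl)
        self.ref_counts[acl_id] = self.ref_counts.get(acl_id, 0) + 1
        return acl_id

    def release(self, acl_id):
        self.ref_counts[acl_id] -= 1
        if self.ref_counts[acl_id] == 0:
            del self.ref_counts[acl_id]
            key = self.id_to_acl.pop(acl_id)
            del self.acl_to_id[key]


def churn(cache, n):
    """Create and delete n znodes, each with its own one-off ACL."""
    for i in range(n):
        # Hypothetical ACL entry: (scheme, id, permissions).
        acl_id = cache.intern([("digest", f"user{i}:hash", "rw")])
        cache.release(acl_id)  # the znode is deleted again
    return len(cache)
```

With unique, one-time ACLs the grow-only cache keeps one entry per znode
ever created, while the reference-counted one returns to empty once the
znodes are gone.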
