zookeeper-user mailing list archives

From CP Mishra <mishr...@gmail.com>
Subject Re: OutOfMemory Error
Date Mon, 27 Apr 2015 02:02:57 GMT
Thanks Karol.

We ended up doing something similar to what you mentioned.
We restarted ZK on a different port with twice the heap size and much larger
initLimit and syncLimit values, then deleted the unnecessary znodes. The
snapshot size is back to 250 MB now.
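
For anyone reading this later, the settings involved look roughly like the fragment below. This is a sketch only: the thread does not give the actual values, so every number here is an illustrative assumption. The autopurge lines relate to the missing purge policy mentioned in the original post; they limit how many old snapshots and logs stay on disk and do not reduce heap usage.

```properties
# conf/zoo.cfg (illustrative values, not the ones used in this thread)
tickTime=2000

# Time, in ticks, that followers get to connect and sync to the leader.
# A large snapshot needs a generous value: 60 ticks * 2 s = 120 s.
initLimit=60

# Time, in ticks, that a follower may fall behind before it is dropped.
syncLimit=30

# Keep the 3 most recent snapshots and run the purge task every 24 hours.
autopurge.snapshotRetainCount=3
autopurge.purgeInterval=24
```

The heap size is a JVM flag and is not set in zoo.cfg. One common way to set it, assuming the stock zkServer.sh scripts are used, is a line such as `export JVMFLAGS="-Xmx12g"` in conf/java.env (12g being twice the 6GB mentioned in the original post).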

Any recommendations for ZK data browser tools?

CP

On Sat, Apr 25, 2015 at 6:12 AM, Karol Dudzinski <karoldudzinski@gmail.com>
wrote:

> Hi CP,
>
> The JIRA is
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/ZOOKEEPER-2141
> .
>
> Doesn't sound like the same thing as what you're facing.  However, we also
> had OOM errors, which is what prompted us to start digging through the
> snapshots in detail.  As far as I can tell, in your case the only option is
> to bump up max heap size sufficiently to allow the server to come up and
> then delete the rogue entries.  One of the ZK devs may have some other
> ideas.
>
> Karol
>
> > On 24 Apr 2015, at 22:28, CP Mishra <mishracp@gmail.com> wrote:
> >
> > Karol, that's interesting. Can you send the Jira ticket, please?
> >
> > In our case, a rogue program added 300k entries via a service that
> > persists data in ZK and is meant for only a handful of entries. Now, we are
> > dealing with deleting these entries taking up > 3 GB.
> >
> > Thanks,
> > CP
> >
> > On Fri, Apr 24, 2015 at 1:09 PM, Karol Dudzinski <karoldudzinski@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> Do you know if any of the services that use your ZK create ACLs that are
> >> potentially unique and one-time-ish?  I recently hit a similar problem and
> >> discovered that the DataTree has an ACL cache that never gets anything
> >> removed from it.  That was by far and away the largest memory consumer I
> >> found when analysing the heap dump.  If this is the case then you should
> >> see lots of ACL objects on the heap.
> >>
> >> I filed a JIRA for this and keep meaning to submit a patch but sadly
> >> haven't got round to it.  As an interim solution, I wrote a tool which uses
> >> the DataTree class and the serialisation utils to purge this cache of
> >> unused entries.  In my case it shrank the snapshot from 500MB to 12MB!  The
> >> time to write the snapshot went from 40 seconds to less than 1 second as a
> >> result.
> >>
> >> Thanks,
> >> Karol
> >>
> >>
> >>> On 24 Apr 2015, at 18:45, CP Mishra <mishracp@gmail.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I am running a 3 node ZK ensemble on 3 VMs (2 CPU, 32GB RAM) in the test
> >>> environment. Lately, I have been getting OutOfMemoryError on all three ZK
> >>> nodes. ZK has been configured with 6GB heap size. The same ZK ensemble is
> >>> shared between Kafka, HDFS HA and another custom service.
> >>>
> >>> I analyzed the heap dump and 5.8+ GB is being used by DataTree.  I don't
> >>> have a purge policy in place and size of ZK data directory stands at ~14 GB
> >>> now.  There is enough space on the disk holding ZK data (20% used).
> >>>
> >>> As soon as I restart a ZK node, it grows to use all 6GB and starts Full GC
> >>> every 1-2 sec. In 3-5 minutes, it throws OOM: GC overhead limit exceeded.
> >>>
> >>> I would appreciate any help in diagnosing the issue.
> >>>
> >>> Thanks,
> >>> CP Mishra
> >>
>
