zookeeper-user mailing list archives

From Karol Dudzinski <karoldudzin...@gmail.com>
Subject Re: OutOfMemory Error
Date Sat, 25 Apr 2015 11:12:12 GMT
Hi CP,

The JIRA is https://issues.apache.org/jira/plugins/servlet/mobile#issue/ZOOKEEPER-2141.

That doesn't sound like the same thing you're facing. However, we also had OOM errors,
which is what led us to dig through the snapshots in detail. As far as I can
tell, in your case the only option is to raise the max heap size enough for the server
to come up and then delete the rogue entries. One of the ZK devs may have other ideas.
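Once the server is back up with a larger heap, the cleanup amounts to listing the rogue children and deleting them. The following is a minimal sketch under stated assumptions: the rogue entries are leaf znodes under a single parent, and `NodeStore`, `InMemoryStore`, and `RogueCleanup` are hypothetical stand-ins, not ZooKeeper classes. With the real Java client the two operations would correspond to `ZooKeeper.getChildren(path, false)` and `ZooKeeper.delete(path, -1)`. Note that listing ~300k children in one call may exceed the client's `jute.maxbuffer` limit, so this has not been checked against a real ensemble.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical minimal view of a znode store (not a ZooKeeper API).
interface NodeStore {
    List<String> getChildren(String parent);
    void delete(String path);
}

// In-memory stand-in used only to exercise the sketch.
class InMemoryStore implements NodeStore {
    final TreeMap<String, byte[]> nodes = new TreeMap<>();

    @Override
    public List<String> getChildren(String parent) {
        String prefix = parent + "/";
        List<String> children = new ArrayList<>();
        for (String path : nodes.keySet()) {
            // Direct children only: starts with "parent/" and has no further '/'.
            if (path.startsWith(prefix) && path.indexOf('/', prefix.length()) < 0) {
                children.add(path.substring(prefix.length()));
            }
        }
        return children;
    }

    @Override
    public void delete(String path) {
        nodes.remove(path);
    }
}

class RogueCleanup {
    // Deletes every child of 'parent' accepted by 'isRogue', sleeping 'pauseMillis'
    // after each 'batchSize' deletes (batchSize must be positive). Returns the count.
    static int deleteChildren(NodeStore store, String parent, Predicate<String> isRogue,
                              int batchSize, long pauseMillis) {
        int deleted = 0;
        for (String child : store.getChildren(parent)) {
            if (!isRogue.test(child)) {
                continue; // keep the handful of legitimate entries
            }
            store.delete(parent + "/" + child);
            deleted++;
            if (pauseMillis > 0 && deleted % batchSize == 0) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return deleted; // stop early but report progress so far
                }
            }
        }
        return deleted;
    }
}
```

Pausing between batches is a design choice: every delete is a quorum write, so throttling keeps the cleanup from competing with Kafka and HDFS HA traffic on the shared ensemble.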

Karol

> On 24 Apr 2015, at 22:28, CP Mishra <mishracp@gmail.com> wrote:
> 
> Karol, that's interesting. Can you send the Jira ticket, please?
> 
> In our case, a rogue program added 300k entries via a service that persists
> data in ZK and is meant to hold only a handful of entries. Now we need to
> delete these entries, which take up more than 3 GB.
> 
> Thanks,
> CP
> 
> On Fri, Apr 24, 2015 at 1:09 PM, Karol Dudzinski <karoldudzinski@gmail.com> wrote:
> 
>> Hi,
>> 
>> Do you know if any of the services that use your ZK create ACLs that are
>> potentially unique and effectively single-use? I recently hit a similar
>> problem and discovered that the DataTree has an ACL cache from which
>> nothing is ever removed. That was by far the largest memory consumer I
>> found when analysing the heap dump. If this applies to you, you should
>> see lots of ACL objects on the heap.
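To see why one-off ACLs leak, here is a simplified, hypothetical model of an append-only ACL cache. It is not ZooKeeper's actual DataTree code: the class and method names are made up and ACLs are reduced to toy strings. Each distinct ACL gets a numeric id the first time it is seen, nodes store only that id, and deleting a node never releases the id.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of an append-only ACL cache (not ZooKeeper's real code).
class AppendOnlyAclCache {
    private final Map<String, Long> idByAcl = new HashMap<>();
    private final Map<Long, String> aclById = new HashMap<>();
    private long nextId = 1;

    // Returns the id for this ACL, allocating a new one the first time it is seen.
    long idFor(String acl) {
        Long id = idByAcl.get(acl);
        if (id == null) {
            id = nextId++;
            idByAcl.put(acl, id);
            aclById.put(id, acl);
        }
        return id;
    }

    int size() {
        return aclById.size();
    }
}

// Nodes keep only an ACL id; deleting a node never touches the cache.
class AclLeakModel {
    final AppendOnlyAclCache cache = new AppendOnlyAclCache();
    final Map<String, Long> aclIdByPath = new HashMap<>();

    void create(String path, String acl) {
        aclIdByPath.put(path, cache.idFor(acl));
    }

    void delete(String path) {
        aclIdByPath.remove(path); // the ACL id stays in the cache forever
    }
}
```

In this model, 1,000 nodes sharing one ACL add a single cache entry, whereas 1,000 nodes with one-off ACLs add 1,000 entries that remain after every node has been deleted.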
>> 
>> I filed a JIRA for this and keep meaning to submit a patch, but sadly I
>> haven't got round to it. As an interim solution, I wrote a tool that uses
>> the DataTree class and the serialisation utils to purge this cache of
>> unused entries. In my case it shrank the snapshot from 500MB to 12MB! As a
>> result, the time to write the snapshot went from 40 seconds to less than
>> 1 second.
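The tool itself isn't shown in the thread. The following is only a hypothetical sketch of its purge step, working on plain maps instead of ZooKeeper's DataTree and serialisation classes. It assumes the ACL id of every live node can be enumerated, keeps cache entries whose ids are still referenced, drops the rest, and keeps the id-to-ACL and ACL-to-id directions consistent. `AclCachePurge` and `purgeUnused` are illustrative names.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical purge step: drop ACL cache entries that no live node references.
class AclCachePurge {
    // aclById/idByAcl are the two directions of the cache; referencedIds holds the ACL id
    // of every live node (duplicates allowed). Returns the number of entries removed.
    static int purgeUnused(Map<Long, String> aclById, Map<String, Long> idByAcl,
                           Collection<Long> referencedIds) {
        Set<Long> live = new HashSet<>(referencedIds);
        int before = aclById.size();
        aclById.keySet().retainAll(live);
        idByAcl.values().retainAll(live); // keep the reverse map consistent
        return before - aclById.size();
    }
}
```

An offline purge like this is a one-off fix. A cache that tracks a reference count per id and releases the entry when the count drops to zero would prevent the growth in the first place.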
>> 
>> Thanks,
>> Karol
>> 
>> 
>>> On 24 Apr 2015, at 18:45, CP Mishra <mishracp@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> I am running a 3-node ZK ensemble on 3 VMs (2 CPUs, 32GB RAM) in the test
>>> environment. Lately, I have been getting OutOfMemoryError on all three ZK
>>> nodes. ZK has been configured with a 6GB heap. The same ZK ensemble is
>>> shared by Kafka, HDFS HA, and another custom service.
>>> 
>>> I analyzed the heap dump, and 5.8+ GB is being used by the DataTree. I don't
>>> have a purge policy in place, and the ZK data directory is now ~14 GB.
>>> There is enough space on the disk holding ZK data (20% used).
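Without a purge policy ZooKeeper keeps every snapshot and transaction log; it can trim them itself via the `autopurge.snapRetainCount` and `autopurge.purgeInterval` settings in zoo.cfg, or via the bundled `PurgeTxnLog` utility. Purging reclaims disk only: on startup the server loads just the most recent valid snapshot plus the logs after it into the in-memory DataTree, so this alone would not fix the heap exhaustion. The helper below is hypothetical and only illustrates the retention rule for snapshot files, named `snapshot.<zxid in hex>`; it ignores transaction logs, which a real purge also has to trim.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating "keep the N newest snapshots" (not PurgeTxnLog itself).
class SnapshotRetention {
    // Parses the zxid from a name like "snapshot.1a2b3c" (hex suffix); -1 for other files.
    static long zxidOf(String fileName) {
        String prefix = "snapshot.";
        if (!fileName.startsWith(prefix)) {
            return -1;
        }
        try {
            return Long.parseLong(fileName.substring(prefix.length()), 16);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Returns the snapshot files outside the 'retain' newest ones, newest first.
    static List<String> snapshotsToDelete(List<String> fileNames, int retain) {
        List<String> snaps = new ArrayList<>();
        for (String f : fileNames) {
            if (zxidOf(f) >= 0) {
                snaps.add(f);
            }
        }
        // Sort by zxid, highest (newest) first.
        snaps.sort((a, b) -> Long.compare(zxidOf(b), zxidOf(a)));
        return new ArrayList<>(snaps.subList(Math.min(retain, snaps.size()), snaps.size()));
    }
}
```

Sorting by the parsed zxid rather than by file name matters because the hex suffixes are not zero-padded, so lexical order would not match age.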
>>> 
>>> As soon as I restart a ZK node, it grows to use all 6GB and starts a Full GC
>>> every 1-2 seconds. Within 3-5 minutes, it throws OutOfMemoryError: GC overhead
>>> limit exceeded.
>>> 
>>> I would appreciate any help in diagnosing the issue.
>>> 
>>> Thanks,
>>> CP Mishra
>> 
