From: "Brian C. Huffman" <bhuffman@etinternational.com>
To: user@zookeeper.apache.org
Subject: Re: Consistently running out of heap space
Date: Fri, 05 Sep 2014 11:39:44 -0400
Message-ID: <5409D940.5080909@etinternational.com>

Thanks. Very helpful information.

-b

On 09/05/2014 11:31 AM, Camille Fournier wrote:
> You shouldn't use ZK to keep that data around. It's not designed to store a
> ton of historical information. Thousands of jobs is no big deal, but
> thousands of jobs and their history back through time is not what the
> system is designed for.
>
> C
>
> On Fri, Sep 5, 2014 at 11:29 AM, Brian C. Huffman
> <bhuffman@etinternational.com> wrote:
>
>> We use zookeeper to keep track of the jobs we run, and we run thousands of
>> jobs. When a job is finished it is no longer needed except for web
>> monitoring tools. Is that considered state? We want to keep that around so
>> we have a history of completed jobs. Will these stay in memory?
>>
>> Thanks,
>> Brian
>>
>> On 09/05/2014 11:26 AM, Camille Fournier wrote:
>>
>>> All state is stored in memory in ZK for performance reasons. It sounds
>>> like you're putting more data into it than the heap will accommodate.
>>> ZK is useful for references to data, but not for large amounts of actual
>>> data. It's not designed to be a large data store.
>>>
>>> Thanks,
>>> C
>>>
>>> On Fri, Sep 5, 2014 at 10:33 AM, Brian C. Huffman
>>> <bhuffman@etinternational.com> wrote:
>>>
>>>> Flavio,
>>>> I was having the same problems on 3.4.5 so I upgraded to 3.4.6. So it
>>>> doesn't seem to be related to the version.
>>>>
>>>> You might be right about the storing of state. I'm curious - does the
>>>> "state" consist of the entire node listing? Is there any way to tell
>>>> zookeeper to keep a node around but only on disk?
>>>>
>>>> Thanks,
>>>> Brian
>>>>
>>>> On 09/05/2014 09:47 AM, Flavio Junqueira wrote:
>>>>
>>>>> Brian,
>>>>> How much state are you storing in ZK? Can you check the size of the
>>>>> snapshots?
>>>>>
>>>>> One common problem when folks are testing is that they forget to delete
>>>>> the data from previous tests, so the state keeps accumulating and the
>>>>> server keeps crashing because the state is too large.
>>>>>
>>>>> Also, consider trying 3.4.5 just to see if it is a problem with 3.4.6
>>>>> alone.
>>>>>
>>>>> -Flavio
>>>>>
>>>>> On Friday, September 5, 2014 2:23 PM, Brian C. Huffman
>>>>> <bhuffman@etinternational.com> wrote:
>>>>>
>>>>>> We're running the latest version of the stable 3.4 branch (3.4.6) and
>>>>>> have been consistently having problems running out of heap space.
>>>>>>
>>>>>> We're running a single server (redundancy isn't a concern at this
>>>>>> point) and I've tried the defaults (which seems to use Java's default
>>>>>> heap of 8GB) as well as limiting to 3GB. Either way the Zookeeper
>>>>>> server eventually dies. With larger heap size it seems to take longer
>>>>>> to die.
>>>>>>
>>>>>> Here's the latest trace:
>>>>>> 2014-09-05 00:51:11,419 [myid:] - ERROR
>>>>>> [SyncThread:0:SyncRequestProcessor@183] - Severe unrecoverable error,
>>>>>> exiting
>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>>     at java.util.Arrays.copyOf(Arrays.java:2271)
>>>>>>     at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>>>>>>     at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>>>>>>     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
>>>>>>     at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>     at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
>>>>>>     at org.apache.jute.BinaryOutputArchive.writeBuffer(BinaryOutputArchive.java:119)
>>>>>>     at org.apache.zookeeper.txn.Txn.serialize(Txn.java:49)
>>>>>>     at org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
>>>>>>     at org.apache.zookeeper.txn.MultiTxn.serialize(MultiTxn.java:44)
>>>>>>     at org.apache.zookeeper.server.persistence.Util.marshallTxnEntry(Util.java:263)
>>>>>>     at org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:216)
>>>>>>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:314)
>>>>>>     at org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:476)
>>>>>>     at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:140)
>>>>>> 2014-09-05 00:51:07,866 [myid:] - WARN
>>>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught
>>>>>> end of stream exception
>>>>>> EndOfStreamException: Unable to read additional data from client
>>>>>> sessionid 0x14837ac98960071, likely client has closed socket
>>>>>>     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>>>>>>     at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>>
>>>>>> Here's my configuration:
>>>>>> [user@xyz conf]$ grep -v '^#' zoo.cfg
>>>>>> tickTime=2000
>>>>>> initLimit=10
>>>>>> syncLimit=5
>>>>>> dataDir=/usr/local/var/zookeeper
>>>>>> clientPort=2181
>>>>>> autopurge.snapRetainCount=3
>>>>>> autopurge.purgeInterval=1
>>>>>>
>>>>>> Can anyone suggest what the issue could be?
>>>>>>
>>>>>> Thanks,
>>>>>> Brian
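
A minimal sketch of the cleanup pattern Camille describes, assuming a hypothetical layout in which each finished job is a child of a /jobs/completed znode: copy each job's payload into an external store that the web monitoring tools can read, then delete the znode so it no longer sits in the server's heap. The connection string, the path, and the archiveToExternalStore helper are illustrative assumptions, not something taken from the thread.

import java.util.List;
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

/**
 * Hypothetical cleanup client: moves finished-job znodes out of ZooKeeper
 * into an external store and deletes them, keeping the in-memory data tree
 * (and therefore the server heap) small.
 */
public class CompletedJobArchiver {

    public static void main(String[] args) throws Exception {
        // Wait for the session to connect before issuing requests.
        final CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                if (event.getState() == Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            }
        });
        connected.await();

        // Assumed layout: one child znode per finished job, no grandchildren
        // (delete() fails on a znode that still has children).
        List<String> finished = zk.getChildren("/jobs/completed", false);
        for (String job : finished) {
            String path = "/jobs/completed/" + job;
            byte[] data = zk.getData(path, false, null);

            archiveToExternalStore(job, data); // placeholder: database, flat file, ...

            // Version -1 matches any version; pass a real version if other
            // writers might still be updating the node.
            zk.delete(path, -1);
        }
        zk.close();
    }

    private static void archiveToExternalStore(String jobId, byte[] data) {
        // Stand-in for whatever store the monitoring tools actually read.
        System.out.printf("archived job %s (%d bytes)%n",
                jobId, data == null ? 0 : data.length);
    }
}

Keeping only a small pointer in ZooKeeper (a job id or a key into the external store) preserves the "references to data, not the data itself" pattern from Camille's earlier reply.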
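
On the heap limits Brian mentions: with the stock zkServer.sh/zkEnv.sh scripts shipped in 3.4.x, a common way to pin the server heap is a conf/java.env file that sets JVMFLAGS (the 3g figure below just mirrors the value tried in the thread, not a recommendation):

# conf/java.env -- sourced by zkEnv.sh if the file exists
export JVMFLAGS="-Xmx3g $JVMFLAGS"

For Flavio's question about snapshot size, the snapshot files can be inspected directly on disk; with the dataDir above they live under /usr/local/var/zookeeper/version-2.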