From: "Brian C. Huffman" <bhuffman@etinternational.com>
To: user@zookeeper.apache.org
Subject: Re: Consistently running out of heap space
Date: Fri, 05 Sep 2014 11:39:44 -0400
Message-ID: <5409D940.5080909@etinternational.com>

Thanks. Very helpful information.

-b

On 09/05/2014 11:31 AM, Camille Fournier wrote:
> You shouldn't use ZK to keep that data around. It's not designed to store a
> ton of historical information. Thousands of jobs is no big deal, but
> thousands of jobs and their history back through time is not what the
> system is designed for.
>
> C
>
> On Fri, Sep 5, 2014 at 11:29 AM, Brian C. Huffman
> <bhuffman@etinternational.com> wrote:
>
>> We use zookeeper to keep track of the jobs we run, and we run thousands of
>> jobs. When a job is finished it is no longer needed except for web
>> monitoring tools. Is that considered state? We want to keep that around so
>> we have a history of completed jobs. Will these stay in memory?
>>
>> Thanks,
>> Brian
>>
>> On 09/05/2014 11:26 AM, Camille Fournier wrote:
>>
>>> All state is stored in memory in ZK for performance reasons. It sounds
>>> like you're putting more data into it than the heap will accommodate.
>>> ZK is useful for references to data, but not for large amounts of actual
>>> data. It's not designed to be a large data store.
>>>
>>> Thanks,
>>> C
>>>
>>> On Fri, Sep 5, 2014 at 10:33 AM, Brian C. Huffman
>>> <bhuffman@etinternational.com> wrote:
>>>
>>>> Flavio,
>>>> I was having the same problems on 3.4.5 so I upgraded to 3.4.6. So it
>>>> doesn't seem to be related to the version.
>>>>
>>>> You might be right about the storing of state. I'm curious - does the
>>>> "state" consist of the entire node listing? Is there any way to tell
>>>> zookeeper to keep a node around but only on disk?
>>>>
>>>> Thanks,
>>>> Brian
>>>>
>>>> On 09/05/2014 09:47 AM, Flavio Junqueira wrote:
>>>>
>>>>> Brian,
>>>>> How much state are you storing in ZK? Can you check the size of the
>>>>> snapshots?
>>>>>
>>>>> One common problem when folks are testing is that they forget to delete
>>>>> the data from previous tests, so the state keeps accumulating and the
>>>>> server keeps crashing because the state is too large.
>>>>>
>>>>> Also, consider trying 3.4.5 just to see if it is a problem with 3.4.6
>>>>> alone.
>>>>>
>>>>> -Flavio
>>>>>
>>>>> On Friday, September 5, 2014 2:23 PM, Brian C. Huffman
>>>>> <bhuffman@etinternational.com> wrote:
>>>>>
>>>>>> We're running the latest version of the stable 3.4 branch (3.4.6) and
>>>>>> have been consistently having problems running out of heap space.
>>>>>>
>>>>>> We're running a single server (redundancy isn't a concern at this
>>>>>> point) and I've tried the defaults (which seems to use Java's default
>>>>>> heap of 8GB) as well as limiting to 3GB. Either way the Zookeeper
>>>>>> server eventually dies. With larger heap size it seems to take longer
>>>>>> to die.
>>>>>>
>>>>>> Here's the latest trace:
>>>>>> 2014-09-05 00:51:11,419 [myid:] - ERROR
>>>>>> [SyncThread:0:SyncRequestProcessor@183] - Severe unrecoverable error,
>>>>>> exiting
>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>>     at java.util.Arrays.copyOf(Arrays.java:2271)
>>>>>>     at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>>>>>>     at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>>>>>>     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
>>>>>>     at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>     at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
>>>>>>     at org.apache.jute.BinaryOutputArchive.writeBuffer(BinaryOutputArchive.java:119)
>>>>>>     at org.apache.zookeeper.txn.Txn.serialize(Txn.java:49)
>>>>>>     at org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
>>>>>>     at org.apache.zookeeper.txn.MultiTxn.serialize(MultiTxn.java:44)
>>>>>>     at org.apache.zookeeper.server.persistence.Util.marshallTxnEntry(Util.java:263)
>>>>>>     at org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:216)
>>>>>>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:314)
>>>>>>     at org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:476)
>>>>>>     at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:140)
>>>>>> 2014-09-05 00:51:07,866 [myid:] - WARN
>>>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught
>>>>>> end of stream exception
>>>>>> EndOfStreamException: Unable to read additional data from client
>>>>>> sessionid 0x14837ac98960071, likely client has closed socket
>>>>>>     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>>>>>>     at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>>
>>>>>> Here's my configuration:
>>>>>> [user@xyz conf]$ grep -v '^#' zoo.cfg
>>>>>> tickTime=2000
>>>>>> initLimit=10
>>>>>> syncLimit=5
>>>>>> dataDir=/usr/local/var/zookeeper
>>>>>> clientPort=2181
>>>>>> autopurge.snapRetainCount=3
>>>>>> autopurge.purgeInterval=1
>>>>>>
>>>>>> Can anyone suggest what the issue could be?
>>>>>>
>>>>>> Thanks,
>>>>>> Brian
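
A minimal sketch of the cleanup pattern Camille describes, assuming a hypothetical layout in which each finished job is a child of a /jobs/completed znode: copy each job's payload into an external store that the web monitoring tools can read, then delete the znode so it no longer sits in the server's heap. The connection string, the path, and the archiveToExternalStore helper are illustrative assumptions, not something taken from the thread.

import java.util.List;
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

/**
 * Hypothetical cleanup client: moves finished-job znodes out of ZooKeeper
 * into an external store and deletes them, keeping the in-memory data tree
 * (and therefore the server heap) small.
 */
public class CompletedJobArchiver {

    public static void main(String[] args) throws Exception {
        // Wait for the session to connect before issuing requests.
        final CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                if (event.getState() == Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            }
        });
        connected.await();

        // Assumed layout: one child znode per finished job, no grandchildren
        // (delete() fails on a znode that still has children).
        List<String> finished = zk.getChildren("/jobs/completed", false);
        for (String job : finished) {
            String path = "/jobs/completed/" + job;
            byte[] data = zk.getData(path, false, null);

            archiveToExternalStore(job, data); // placeholder: database, flat file, ...

            // Version -1 matches any version; pass a real version if other
            // writers might still be updating the node.
            zk.delete(path, -1);
        }
        zk.close();
    }

    private static void archiveToExternalStore(String jobId, byte[] data) {
        // Stand-in for whatever store the monitoring tools actually read.
        System.out.printf("archived job %s (%d bytes)%n",
                jobId, data == null ? 0 : data.length);
    }
}

Keeping only a small pointer in ZooKeeper (a job id or a key into the external store) preserves the "references to data, not the data itself" pattern from Camille's earlier reply.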
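
On the heap limits Brian mentions: with the stock zkServer.sh/zkEnv.sh scripts shipped in 3.4.x, a common way to pin the server heap is a conf/java.env file that sets JVMFLAGS (the 3g figure below just mirrors the value tried in the thread, not a recommendation):

# conf/java.env -- sourced by zkEnv.sh if the file exists
export JVMFLAGS="-Xmx3g $JVMFLAGS"

For Flavio's question about snapshot size, the snapshot files can be inspected directly on disk; with the dataDir above they live under /usr/local/var/zookeeper/version-2.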