Subject: Re: ZK on EC2
From: Ted Dunning
To: zookeeper-user@hadoop.apache.org
Date: Tue, 10 Nov 2009 11:45:12 -0800

Several of our search engines use pretty large heaps (12-24GB). That means
that if they *ever* do a full collection, disaster ensues because it can
take so long. So we have to use concurrent collectors as much as possible
and make sure that the concurrent collectors get all the ephemeral garbage.

One server, for instance, uses the following Java options:

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution

These options give us lots of detail about what is happening in the
collections. Most importantly, we need to know that the tenuring
distribution never has any significant tail of objects that might survive
into the space that will cause a full collection. This is pretty safe in
general because our servers either create objects to respond to a single
request or create cached items that survive essentially forever.
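For reference, pulled together into a single launch line, the full set of
options discussed below would look roughly like this (the classpath and
main class here are placeholders, not from our actual setup):

    java -Xms8000m -Xmx8000m \
         -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
         -XX:+PrintTenuringDistribution \
         -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
         -XX:MaxTenuringThreshold=6 -XX:SurvivorRatio=6 \
         -XX:CMSInitiatingOccupancyFraction=60 \
         -XX:+UseCMSInitiatingOccupancyOnly \
         -XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC \
         -XX:ParallelGCThreads=8 \
         -Xdebug \
         -cp server.jar com.example.SearchServer  # placeholder jar and class

Each group of options is explained below.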
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC

Concurrent collectors are critical. We use the HBase recommendations here.

-XX:MaxTenuringThreshold=6 -XX:SurvivorRatio=6

The max tenuring threshold is related to what we saw in the tenuring
distribution. We very rarely see any objects last 4 collections, so we set
it so that an object would have to last two more collections in order to
become tenured. The survivor ratio is related to this and is set based on
recommendations for non-stop, low-latency servers.

-XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly

CMS collections have a couple of ways to be triggered. We limit it to a
single way to make the world simpler. Again, this is taken from outside
recommendations from the HBase guys and other commentators on the web.

-XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC

I doubt that these are important. It is always nice to get more
information, and I want to avoid any possibility of some library
triggering a huge collection.

-XX:ParallelGCThreads=8

If the parallel GC needs horsepower, I want it to get it.

-Xdebug

Very rarely useful, but a royal pain if not installed. I don't know if it
has a performance impact (I think not).

-Xms8000m -Xmx8000m

Setting the minimum heap helps avoid full GCs during the early life of the
server.

On Tue, Nov 10, 2009 at 11:27 AM, Patrick Hunt wrote:

> Can you elaborate on "gc tuning" - you are using the incremental
> collector?
>
> Patrick
>
> Ted Dunning wrote:
>
>> The server side is a fairly standard (but old) config:
>>
>> tickTime=2000
>> dataDir=/home/zookeeper/
>> clientPort=2181
>> initLimit=5
>> syncLimit=2
>>
>> Most of our clients now use 5 seconds as the timeout, but I think that
>> we went to longer timeouts in the past. Without digging in to determine
>> the truth of the matter, my guess is that we needed the longer timeouts
>> before we tuned the GC parameters and that after tuning GC, we were
>> able to return to a more reasonable timeout. In retrospect, I think
>> that we blamed EC2 for some of our own GC misconfiguration.
>>
>> I would not use our configuration here as canonical since we didn't
>> apply a whole lot of brainpower to this problem.
>>
>> On Tue, Nov 10, 2009 at 9:29 AM, Patrick Hunt wrote:
>>
>>> Ted, could you provide your configuration information for the cluster
>>> (incl the client timeout you use)? If you're willing, I'd be happy to
>>> put this up on the wiki for others interested in running in EC2.

--
Ted Dunning, CTO
DeepDyve
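For anyone adapting the quoted server config to a multi-node ensemble on
EC2, the usual additions are the server entries and a myid file per host.
A rough sketch, with illustrative hostnames rather than anything from the
original setup:

    tickTime=2000
    dataDir=/home/zookeeper/
    clientPort=2181
    initLimit=5
    syncLimit=2
    # illustrative ensemble members; replace with real EC2 hostnames
    server.1=ec2-zk1.example.com:2888:3888
    server.2=ec2-zk2.example.com:2888:3888
    server.3=ec2-zk3.example.com:2888:3888

with dataDir/myid on each host containing that host's server number
(1, 2, or 3 above).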