Mailing-List: contact zookeeper-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: zookeeper-user@hadoop.apache.org
Received-SPF: neutral (athena.apache.org: local policy)
Message-ID: <49A70E02.5060002@apache.org>
Date: Thu, 26 Feb 2009 13:47:46 -0800
From: Patrick Hunt <phunt@apache.org>
User-Agent: Thunderbird 2.0.0.19 (X11/20090105)
MIME-Version: 1.0
To: zookeeper-user@hadoop.apache.org
Subject: Re: Recommended session timeout
References: <C5C83638.188CF%mahadev@yahoo-inc.com>
	 <49A32522.7000205@apache.org>
	 <92eebe280902232337v2c6e2064oe05775534939cc40@mail.gmail.com>
	 <49A43906.30406@apache.org>
 <92eebe280902261331y63fd4e88ka185dd9b12c97e4d@mail.gmail.com>
In-Reply-To: <92eebe280902261331y63fd4e88ka185dd9b12c97e4d@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Those are very interesting results, good job sleuthing. You might try the 
concurrent collector?
http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html#available_collectors.selecting

Specifically item 4, "-XX:+UseConcMarkSweepGC".

I've never used it myself, but it's supposed to reduce GC pauses to 
less than a second. It might require some tuning though...
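
To compare collectors it may help to measure GC overhead from inside the 
client JVM. Below is a minimal sketch using the standard 
java.lang.management beans. The class name GcOverhead and the 
Thread.sleep() stand-in for the real workload are assumptions for 
illustration only, not anything from ZooKeeper or from the test setup 
described in the quoted message.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverhead {
    /** Total milliseconds spent in GC so far, summed over all collectors. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Fraction of wall-clock time spent in GC; returns 0 for a non-positive interval. */
    public static double fraction(long gcMillis, long wallMillis) {
        return wallMillis <= 0 ? 0.0 : (double) gcMillis / wallMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long gcStart = totalGcMillis();
        long wallStart = System.nanoTime();

        Thread.sleep(1000); // hypothetical stand-in for the real workload

        long wallMillis = (System.nanoTime() - wallStart) / 1000000L;
        long gcMillis = totalGcMillis() - gcStart;
        System.out.printf("GC: %d ms of %d ms (%.1f%%)%n",
                gcMillis, wallMillis, 100 * fraction(gcMillis, wallMillis));
    }
}
```

With the numbers reported in the quoted message (33 seconds of GC in a 
143 second test), fraction(33000, 143000) comes out to roughly 0.23, so 
about a quarter of the run was spent collecting.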

Patrick


Joey Echeverria wrote:
> I've answered the questions you asked previously below, but I thought
> I would open with the actual culprit now that we found it. When I said
> loading data before, what I was talking about was sending data via
> Thrift to the machine that was getting disconnected from zookeeper.
> This turned out to be the problem. Too much data was being sent in a
> short span of time, and this caused memory pressure on the heap. This
> increased the fraction of the time that the GC had to run to keep up.
> During a 143 second test, the GC was running for 33 seconds.
> 
> We found this by running tcpdump on both the machine running the
> ensemble server and the machine connecting to zookeeper as a client.
> We deduced it wasn't a network (lost packet) issue, as we never saw
> unmatched packets in our tests. What we did see were "long" 2-7 second
> pauses with no packets being sent. We first attempted to up the
> priority of the zookeeper threads to see if that would help. When it
> didn't, we started monitoring the GC time. We don't have a workaround
> yet, other than sending data in smaller batches and using a longer
> sessionTimeout.
> 
> Thanks for all your help!
> 
> -Joey
> 
>> As an experiment try increasing the timeout to say 30 seconds and re-run
>> your tests. Any change?
> 
> 30 seconds and higher works fine.
> 
>> "loading data" - could you explain a bit more about what you mean by this?
>> If you are able to provide enough information for us to replicate we could
>> try it out (also provide info on your ensemble configuration as Mahadev
>> suggested)
> 
> The ensemble config file looks as follows:
> 
> tickTime=2000
> dataDir=/data/zk
> clientPort=2181
> initLimit=5
> syncLimit=2
> skipACL=true
> 
> server.1=<server>1:2888:3888
> ...
> server.7=<server>7:2888:3888
> 
>> You are referring to startConnect in SendThread?
>>
>> We randomly sleep up to 1 second to ensure that the clients don't all storm
>> the server(s) after a bounce.
> 
> That makes some sense, but it might be worth tweaking that parameter
> based on sessionTimeout since 1 second can easily be 10-20% of
> sessionTimeout.
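
One way to read the suggestion above is to cap the random pre-connect 
sleep at a percentage of sessionTimeout while keeping the existing 
1 second ceiling. The following is only a sketch of that idea. The class 
ConnectJitter, its method names and the percentage parameter are 
hypothetical, and none of this is the actual SendThread code.

```java
import java.util.Random;

public class ConnectJitter {
    /** Ceiling matching the "up to 1 second" sleep described in the thread. */
    static final int MAX_JITTER_MS = 1000;

    /**
     * Upper bound (exclusive) for the random sleep: the given percentage of
     * the session timeout, never more than 1 second and never less than 1 ms.
     */
    public static int maxJitterMillis(int sessionTimeoutMs, int percent) {
        long scaled = (long) sessionTimeoutMs * percent / 100;
        return (int) Math.max(1, Math.min(MAX_JITTER_MS, scaled));
    }

    /** Picks a random sleep in [0, maxJitterMillis). */
    public static int nextJitterMillis(Random rnd, int sessionTimeoutMs, int percent) {
        return rnd.nextInt(maxJitterMillis(sessionTimeoutMs, percent));
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        // With a 5 second session timeout and a 5% budget the sleep stays under 250 ms.
        System.out.println(nextJitterMillis(rnd, 5000, 5));
    }
}
```

With a 30 second timeout the same 5% budget would exceed the ceiling, so 
the bound stays at 1 second and the behaviour is unchanged for clients 
with long timeouts.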
> 
>> 1) configure your test client to connect to 1 server in the ensemble
>> 2) run the srst command on that server
>> 3) run your client test
>> 4) run the stat command on that server
>> 5) if the test takes some time, run the stat a few times during the test
>>  to get more data points
> 
> The problem doesn't appear to be on the server end as max latency
> never went above 5ms. Also, no messages are shown as queued.