Message-ID: <49A70E02.5060002@apache.org>
Date: Thu, 26 Feb 2009 13:47:46 -0800
From: Patrick Hunt
To: zookeeper-user@hadoop.apache.org
Subject: Re: Recommended session timeout
References: <49A32522.7000205@apache.org> <92eebe280902232337v2c6e2064oe05775534939cc40@mail.gmail.com> <49A43906.30406@apache.org>
 <92eebe280902261331y63fd4e88ka185dd9b12c97e4d@mail.gmail.com>
In-Reply-To: <92eebe280902261331y63fd4e88ka185dd9b12c97e4d@mail.gmail.com>

Those are very interesting results, and good job sleuthing.

You might try the concurrent collector?
http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html#available_collectors.selecting

specifically item 4 "-XX:+UseConcMarkSweepGC"

I've never used this before myself but it's supposed to reduce the GC
pauses to less than a second. Might require some tuning though...

Patrick

Joey Echeverria wrote:
> I've answered the questions you asked previously below, but I thought
> I would open with the actual culprit now that we found it. When I said
> loading data before, what I was talking about was sending data via
> Thrift to the machine that was getting disconnected from zookeeper.
> This turned out to be the problem. Too much data was being sent in a
> short span of time and this caused memory pressure on the heap. This
> increased the fraction of the time that the GC had to run to keep up.
> During a 143 second test, the GC was running for 33 seconds.
>
> We found this by running tcpdump on both the machine running the
> ensemble server and the machine connecting to zookeeper as a client.
> We deduced it wasn't a network (lost packet) issue, as we never saw
> unmatched packets in our tests. What we did see were "long" 2-7 second
> pauses with no packets being sent. We first attempted to up the
> priority of the zookeeper threads to see if that would help. When it
> didn't, we started monitoring the GC time. We don't have a workaround
> yet, other than sending data in smaller batches and using a longer
> sessionTimeout.
>
> Thanks for all your help!
>
> -Joey
>
>> As an experiment try increasing the timeout to say 30 seconds and re-run
>> your tests. Any change?
>
> 30 seconds and higher works fine.
>
>> "loading data" - could you explain a bit more about what you mean by this?
>> If you are able to provide enough information for us to replicate we could
>> try it out (also provide info on your ensemble configuration as Mahadev
>> suggested)
>
> The ensemble config file looks as follows:
>
> tickTime=2000
> dataDir=/data/zk
> clientPort=2181
> initLimit=5
> syncLimit=2
> skipACL=true
>
> server.1=1:2888:3888
> ...
> server.7=7:2888:3888
>
>> You are referring to startConnect in SendThread?
>>
>> We randomly sleep up to 1 second to ensure that the clients don't all storm
>> the server(s) after a bounce.
>
> That makes some sense, but it might be worth tweaking that parameter
> based on sessionTimeout since 1 second can easily be 10-20% of
> sessionTimeout.
>
>> 1) configure your test client to connect to 1 server in the ensemble
>> 2) run the srst command on that server
>> 3) run your client test
>> 4) run the stat command on that server
>> 5) if the test takes some time, run the stat a few times during the test
>> to get more data points
>
> The problem doesn't appear to be on the server end as max latency
> never went above 5ms. Also, no messages are shown as queued.
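
For anyone who wants to watch GC time from inside the client JVM, here
is a minimal sketch using the standard java.lang.management beans. It is
only an illustration, not anything from the ZooKeeper client itself: the
class and method names (GcPauseCheck, totalGcMillis, fraction) are made
up for this example, and the 5 and 10 second session timeouts are
assumed values chosen to match the "10-20%" remark above. Note that
getCollectionTime() reports approximate accumulated collection time,
which is not necessarily the same as stop-the-world pause time,
especially with a concurrent collector.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcPauseCheck {

    /** Approximate total time (ms) this JVM has spent in GC so far, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // returns -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Ratio part/whole, e.g. GC time over elapsed time, or reconnect jitter over sessionTimeout. */
    static double fraction(long partMillis, long wholeMillis) {
        if (wholeMillis <= 0) {
            throw new IllegalArgumentException("wholeMillis must be positive");
        }
        return (double) partMillis / wholeMillis;
    }

    public static void main(String[] args) {
        // Figures quoted in the thread: 33 s of GC during a 143 s test.
        System.out.printf("GC share of the test run: %.1f%%%n",
                100 * fraction(33_000, 143_000));

        // The 1 s random reconnect sleep against two assumed session timeouts.
        for (long sessionTimeout : new long[] {5_000, 10_000}) {
            System.out.printf("1 s sleep is %.0f%% of a %d ms sessionTimeout%n",
                    100 * fraction(1_000, sessionTimeout), sessionTimeout);
        }

        // Live reading for the current JVM; sample it periodically to see the trend.
        System.out.println("GC time so far in this JVM (ms): " + totalGcMillis());
    }
}
```

Sampling totalGcMillis() at the start and end of a test run and dividing
the difference by the elapsed wall-clock time gives the same kind of
figure as the 33-out-of-143-seconds number above.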