From: Ted Dunning
Date: Mon, 25 Jan 2010 13:14:14 -0800
Subject: Re: Server exception when closing session
To: zookeeper-user@hadoop.apache.org
Cc: jscheid@velocetechnologies.com
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=0016e64c2782edf2a4047e03a74d

--0016e64c2782edf2a4047e03a74d
Content-Type: text/plain; charset=UTF-8

Be very cautious about misdirection here. It is easy to focus on the ZK
server-side GCs. In my experience, I have had many more GC-related ZK
problems caused by *client*-side GC. If the client checks out for a minute,
you get disconnects and session expiration, which is good for debugging that
code but generally indicates a really bad user experience. Often that bad
user experience is partly covered up by your load balancer and other general
redundancy, so you may notice it first when ZK forces you to think about
these things.

On Mon, Jan 25, 2010 at 9:43 AM, Patrick Hunt wrote:

> GC and disk IO (transactional log in particular) will cause significant
> latency in some cases. See this for details on the types of things you
> should look at:
>
> http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting
>
> I've seen cases where the JVM will pause for 2+ minutes for GC, in some
> cases I've seen 4+ and I've heard of worse than that. Tuning GC (in
> particular using incremental/cms gc) is critical for consistently low
> latencies.
>
> Patrick
>
> Josh Scheid wrote:
>
>> On Fri, Jan 22, 2010 at 17:48, Mahadev Konar wrote:
>>
>>> The server latency does seem huge. What os and hardware are you
>>> running it on?
>>
>> RHEL4 2.8GHz 2-core AMD. 8GB RAM.
>> I will check with my admin for evidence of swap activity, but I don't
>> anticipate any.
>>
>> This is bursty. I'm currently seeing ~200 connections, but maxlat in
>> the last hour has been 185ms with 4ms avg.
>>
>>> What is usage model of zookeeper?
>>
>> Distributed lock service. Using the lock recipe. Hosts hold a lock
>> for 5s to a couple of minutes with zero to dozens of waiters.
>>
>>> How much memory are you allocating to the server?
>>
>> That's a good question. I'm not an expert at java deployment. I just
>> use zkServer.sh defaults.
>> sun-jre 1.6.0_14. It's taking 1188MB of virtual memory right now,
>> 100MB resident.
>>
>>> The debug will exacerbate the problem.
>>
>> OK.
>>
>>> A dedicated disk means the following:
>>> Zookeeper has snapshots and transaction logs. The datadir is the
>>> directory that stores the transaction logs. It's highly recommended
>>> that this directory be on a separate disk that isn't being used by any
>>> other process. The snapshots can sit on a disk that is being used by
>>> the OS and can be shared.
>>
>> Yeah, I understand. I just need to get that set up and was hoping
>> that my load wouldn't warrant it.
>>
>>> Also, Pat ran some tests for server latencies at:
>>>
>>> http://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview
>>>
>>> You can take a look at that as well and see what the expected
>>> performance should be for your workload.
>>
>> I will take a look at that. Thank you for your time.
>>
>> -Josh

--
Ted Dunning, CTO
DeepDyve

--0016e64c2782edf2a4047e03a74d--
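
The dedicated-disk advice quoted in this thread maps onto two settings in the
server configuration. What follows is only a minimal sketch of a zoo.cfg, not
anyone's actual setup: both paths are hypothetical placeholders, and the
tickTime and clientPort values are the commonly used sample values rather than
anything stated in the thread. The one detail worth knowing is that
`dataLogDir` is the setting that moves the transaction log; when it is absent,
the log is written under `dataDir` alongside the snapshots.

```
# zoo.cfg (sketch; both directory paths are hypothetical placeholders)
tickTime=2000
clientPort=2181

# Snapshots. This directory can share a disk with the OS.
dataDir=/var/lib/zookeeper

# Transaction log. Ideally a device that no other process writes to,
# since every update waits on a sync to this log.
dataLogDir=/zk-txnlog/zookeeper
```

Separating the two directories only helps if `dataLogDir` really sits on its
own physical device; a second directory on the same busy disk would not be
expected to change the latency picture.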