Subject: Re: ZK on EC2
From: Ted Dunning
To: zookeeper-user@hadoop.apache.org
Date: Tue, 10 Nov 2009 11:45:12 -0800

Several of our search engines use pretty large heaps (12-24GB). That means
that if they *ever* do a full collection, disaster ensues because it can
take so long. So we have to use concurrent collectors as much as possible
and make sure that the concurrent collectors get all the ephemeral garbage.

One server, for instance, uses the following Java options:

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution

These options give us lots of detail about what is happening in the
collections. Most importantly, we need to know that the tenuring
distribution never has any significant tail of objects that might survive
into the space that will cause a full collection. This is pretty safe in
general because our servers either create objects to respond to a single
request or create cached items that survive essentially forever.
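For reference, pulled together into a single launch line, the full set of
options discussed below would look roughly like this (the classpath and
main class here are placeholders, not from our actual setup):

    java -Xms8000m -Xmx8000m \
         -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
         -XX:+PrintTenuringDistribution \
         -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
         -XX:MaxTenuringThreshold=6 -XX:SurvivorRatio=6 \
         -XX:CMSInitiatingOccupancyFraction=60 \
         -XX:+UseCMSInitiatingOccupancyOnly \
         -XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC \
         -XX:ParallelGCThreads=8 \
         -Xdebug \
         -cp server.jar com.example.SearchServer  # placeholder jar and class

Each group of options is explained below.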
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC

Concurrent collectors are critical. We use the HBase recommendations here.

-XX:MaxTenuringThreshold=6 -XX:SurvivorRatio=6

The max tenuring threshold is related to what we saw in the tenuring
distribution. We very rarely see any objects last 4 collections, so we set
it so that an object would have to last two more collections in order to
become tenured. The survivor ratio is related to this and is set based on
recommendations for non-stop, low-latency servers.

-XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly

CMS collections have a couple of ways to be triggered. We limit it to a
single way to make the world simpler. Again, this is taken from outside
recommendations from the HBase guys and other commentators on the web.

-XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC

I doubt that these are important. It is always nice to get more
information, and I want to avoid any possibility of some library
triggering a huge collection.

-XX:ParallelGCThreads=8

If the parallel GC needs horsepower, I want it to get it.

-Xdebug

Very rarely useful, but a royal pain if not installed. I don't know if it
has a performance impact (I think not).

-Xms8000m -Xmx8000m

Setting the minimum heap helps avoid full GCs during the early life of the
server.

On Tue, Nov 10, 2009 at 11:27 AM, Patrick Hunt wrote:

> Can you elaborate on "gc tuning" - you are using the incremental
> collector?
>
> Patrick
>
> Ted Dunning wrote:
>
>> The server side is a fairly standard (but old) config:
>>
>> tickTime=2000
>> dataDir=/home/zookeeper/
>> clientPort=2181
>> initLimit=5
>> syncLimit=2
>>
>> Most of our clients now use 5 seconds as the timeout, but I think that
>> we went to longer timeouts in the past. Without digging in to determine
>> the truth of the matter, my guess is that we needed the longer timeouts
>> before we tuned the GC parameters and that after tuning GC, we were
>> able to return to a more reasonable timeout. In retrospect, I think
>> that we blamed EC2 for some of our own GC misconfiguration.
>>
>> I would not use our configuration here as canonical since we didn't
>> apply a whole lot of brainpower to this problem.
>>
>> On Tue, Nov 10, 2009 at 9:29 AM, Patrick Hunt wrote:
>>
>>> Ted, could you provide your configuration information for the cluster
>>> (incl the client timeout you use)? If you're willing, I'd be happy to
>>> put this up on the wiki for others interested in running in EC2.

--
Ted Dunning, CTO
DeepDyve
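For anyone adapting the quoted server config to a multi-node ensemble on
EC2, the usual additions are the server entries and a myid file per host.
A rough sketch, with illustrative hostnames rather than anything from the
original setup:

    tickTime=2000
    dataDir=/home/zookeeper/
    clientPort=2181
    initLimit=5
    syncLimit=2
    # illustrative ensemble members; replace with real EC2 hostnames
    server.1=ec2-zk1.example.com:2888:3888
    server.2=ec2-zk2.example.com:2888:3888
    server.3=ec2-zk3.example.com:2888:3888

with dataDir/myid on each host containing that host's server number
(1, 2, or 3 above).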