Date: Mon, 3 Jan 2011 14:44:28 -0800
Subject: Re: CMF & NodeIsDeadException
From: Stack
To: user@hbase.apache.org

On Mon, Jan 3, 2011 at 2:13 PM, Wayne wrote:
> Here are the new settings we are trying out. They seemed to "help" with
> cass. In the end I assume we will need a script to do rolling restarts or
> better yet hbase does it on its own!!
>
> Thanks for the help!
>
>        -XX:+UseCMSInitiatingOccupancyOnly
>        -XX:CMSInitiatingOccupancyFraction=60

This seems low. It means lots of CPU spent GC'ing. That said, it's good
to start low and work up from there.

>        -XX:+CMSParallelRemarkEnabled
>        -XX:SurvivorRatio=8
>        -XX:NewRatio=3

This is fine to start with, but if it were me, I'd make the young gen
bigger (if objects don't make it up into the tenured heap, they won't
get in the way of subsequent promotions). What proportion of heap was
it when you had long pauses?

>        -XX:MaxTenuringThreshold=1
>

Setting this to 1 means objects get promoted to the tenured heap after
surviving only one young GC. I wonder how things would run if you set
this to a higher number?
(Again, my rationale is that if objects don't get into the tenured
space in the first place, then they can't be in the way when it comes
time to promote subsequent objects from young to tenured.) It might be
something to mess with later.

GC tuning, the "joy of java", is a little bit of a black art. It's
particularly black given that a bunch think there is no tuning that
will get you away from an occasional stop-the-world GC, at least when
running the CMS collector.

Keep us posted.
St.Ack

> On Mon, Jan 3, 2011 at 5:05 PM, Stack wrote:
>
>> On Mon, Jan 3, 2011 at 12:50 PM, Wayne wrote:
>> > We have an 8GB heap. What should newsize be? I just had another node die
>> > hard after going into a CMF storm. I swear it had solid CMFs 30+ in a row.
>> >
>>
>> Did a full stop-the-world GC run in between?  It should have cleaned
>> up fragmentation.
>>
>> > I have no idea what eden space is or how to see what it is. ??
>> >
>>
>> Sorry.  There's a bunch of 'cute' terms used for describing the two
>> heap areas in the JVM.  Basically, new stuff goes into the 'new' or
>> 'eden' area first.  If it sticks around through N (configurable) GCs,
>> it gets promoted to old or tenured generation (there are other names
>> for these notions of young and old).  The garbage collection
>> algorithms done in the two heaps differ.  See the Ted citation for
>> more on the gruesome details (though come up to a newer version of
>> that doc).  The JVM is supposed to work ergonomically but it just
>> ain't smart enough dealing w/ HBase/Cass loadings it seems (e.g. it
>> keeps growing the new/eden space pathologically it would seem).
>>
>>
>> > Not knowing what else to do I will start using some of the Cassandra
>> > settings I used to improve it by setting the occupancy fraction. Any other
>> > ideas???
>> >
>>
>> Which config. you talking of? -XX:+CMSInitiatingOccupancyFraction?
>> That's a good one to toggle down from defaults.  Should help put off
>> promotion failures a while.
>>
>>
>> St.Ack
>>
>
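
For anyone following along who wants to see what the ratios in this thread work out to, here is a minimal Java sketch. It assumes the JVM applies -XX:NewRatio and -XX:SurvivorRatio exactly as documented (old:young = N:1, eden:survivor = N:1 with two survivor spaces); real sizes get aligned and can differ a little. The class name HeapLayout and its method names are made up for illustration and are not part of HBase or the JVM. The second half of main uses the standard java.lang.management API to list the memory pools of the running JVM, which is one way to "see" eden, survivor and tenured. Pool names vary with the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper, for illustration only.
public class HeapLayout {

    // -XX:NewRatio=N means old:young = N:1, so young is 1/(N+1) of the heap.
    public static long youngGenBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1);
    }

    // -XX:SurvivorRatio=N means eden:survivor = N:1, and there are two
    // survivor spaces, so each survivor is 1/(N+2) of the young generation.
    public static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    // Eden is whatever is left of young after the two survivor spaces.
    public static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    public static void main(String[] args) {
        long heap = 8L << 30; // the 8GB heap mentioned in the thread
        long young = youngGenBytes(heap, 3);   // NewRatio=3
        long survivor = survivorBytes(young, 8); // SurvivorRatio=8
        long eden = edenBytes(young, 8);
        System.out.printf("young=%d eden=%d survivor(each)=%d old=%d%n",
                young, eden, survivor, heap - young);

        // List the memory pools of *this* JVM (not of a remote region server).
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            // getMax() returns -1 when the maximum is undefined.
            System.out.printf("%-30s %-16s used=%d max=%d%n",
                    pool.getName(), pool.getType(),
                    usage.getUsed(), usage.getMax());
        }
    }
}
```

Under those assumptions an 8GB heap with NewRatio=3 gives a young generation of a quarter of the heap, which is the number to compare against when asking whether the young gen should be bigger. To look at a running region server rather than a toy process, the same pools are visible over JMX (for example with jconsole) or in the GC logs.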