Subject: Re: GC makes ryan sad
From: Ryan Rawson <ryanobjc@gmail.com>
To: hbase-dev@hadoop.apache.org
Date: Sat, 25 Apr 2009 13:23:01 -0700

Heap seems a _bit_ bigger than perhaps it should be, but that mostly just gives
the CMS more work to do. The hidden problem might be heap fragmentation: since
Java uses lots of small objects, we might end up with an ever-growing heap.
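(To get this kind of logging yourself: entries like the ones below are what CMS
plus ParNew print with GC logging turned on. I haven't pasted the exact flags
from this cluster, so treat the following as an assumption about roughly what
is in play, not a copy of the real config:

  # assumed flags, not taken from the actual regionserver command line
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC

-XX:+PrintGCTimeStamps is what produces the seconds-since-JVM-start offsets at
the front of each line.)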
Here is a GC log during the CMS sweep (when it's reclaiming from the main heap):

42374.524: [CMS-concurrent-sweep-start]
42374.576: [GC 42374.576: [ParNew: 5560K->576K(5568K), 0.0115800 secs] 2709982K->2706003K(2947416K), 0.0117020 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]
42374.616: [GC 42374.616: [ParNew: 5568K->576K(5568K), 0.0147850 secs] 2703025K->2701054K(2947416K), 0.0148980 secs] [Times: user=0.08 sys=0.00, real=0.01 secs]
42374.685: [GC 42374.685: [ParNew: 5568K->574K(5568K), 0.0104920 secs] 2690723K->2687015K(2947416K), 0.0106060 secs] [Times: user=0.07 sys=0.00, real=0.01 secs]
42374.751: [GC 42374.752: [ParNew: 5566K->574K(5568K), 0.0118220 secs] 2672236K->2668439K(2947416K), 0.0119340 secs] [Times: user=0.08 sys=0.00, real=0.01 secs]
42375.483: [GC 42375.483: [ParNew: 5508K->575K(5568K), 0.0090170 secs] 2329262K->2325313K(2947416K), 0.0091370 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]
42375.614: [GC 42375.615: [ParNew: 5504K->467K(5568K), 0.0071550 secs] 2275783K->2271206K(2947416K), 0.0072510 secs] [Times: user=0.06 sys=0.00, real=0.00 secs]
42375.746: [GC 42375.746: [ParNew: 5451K->428K(5568K), 0.0064950 secs] 2210537K->2205514K(2947416K), 0.0066040 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]
42375.885: [GC 42375.885: [ParNew: 5420K->362K(5568K), 0.0080160 secs] 2149436K->2144384K(2947416K), 0.0081120 secs] [Times: user=0.07 sys=0.00, real=0.01 secs]
42376.006: [GC 42376.006: [ParNew: 5354K->548K(5568K), 0.0064720 secs] 2097343K->2092544K(2947416K), 0.0065760 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
42376.120: [GC 42376.120: [ParNew: 5540K->576K(5568K), 0.0089420 secs] 2044984K->2040242K(2947416K), 0.0090420 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
42376.336: [GC 42376.336: [ParNew: 5568K->576K(5568K), 0.0092920 secs] 1948100K->1943594K(2947416K), 0.0093910 secs] [Times: user=0.06 sys=0.01, real=0.01 secs]
42376.885: [GC 42376.885: [ParNew: 5540K->576K(5568K), 0.0119010 secs] 1677094K->1673092K(2947416K), 0.0120230 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]
42377.328: [CMS-concurrent-sweep: 2.679/2.804 secs] [Times: user=4.75 sys=0.10, real=2.80 secs]

Basically we go from about 2700 MB to 1670 MB, freeing roughly 1100 MB. The
sweep took about 2.8 s, and there were 12 minor collections during that time,
each one cutting the ParNew occupancy by about 10x. The GC pauses for ParNew
are higher than I want: about 6 ms each, up to a high of 11 ms.

My next thought/experiment is to move to a slightly larger ParNew, maybe 12 MB,
to keep more of this garbage from entering the main heap in the first place.
However, HBase won't take a rolling restart: as soon as you touch the .META.
regionserver, you're screwed.

On Sat, Apr 25, 2009 at 1:05 PM, Chad Walters wrote:

> Cool -- that seems like it could be a reasonable set of tradeoffs for
> latency-sensitive uses.
>
> How big is the memory increase?
>
> Do you know the chief source of the allocations that are GC'd?
>
> Chad
>
>
> On 4/25/09 12:47 PM, "Ryan Rawson" wrote:
>
> The incremental GC doesn't help on a multicore system; it's designed to
> break up the giant concurrent phase (which I've seen last 5 minutes) on
> systems with 1 or 2 CPUs, so the concurrent GC doesn't starve the app for
> CPU. My machines have 8 cores, so this won't really help.
>
> I talked to a friend, and I realize the problem is that the ParNew is
> growing too big... It starts out at about 15 MB and grows 10x to 150 MB.
> My friend suggested that I size ParNew to the L2 cache size, which is 6 MB
> on my Intel Xeon CPUs; I'm not sure whether the L2 is shared between pairs
> of cores or not.
>
> Switching to a much smaller ParNew has several effects:
> - GC pauses are small, tiny, ~1 ms
> - GC of ParNew happens 1-10x a second
> - Overall memory use goes up, as more stuff enters the general heap and has
>   to wait for the CMS to reclaim it. I watched the CMS prune 1 GB of
>   garbage in 3 seconds.
>
> However, the crippling pauses that were keeping my cluster from performing
> are gone, at the cost of higher RAM usage.
>
> On Sat, Apr 25, 2009 at 10:09 AM, Chad Walters wrote:
>
> > Done any experimentation with the incremental GC?
> >
> > Chad
> >
> > On 4/23/09 6:28 PM, "Ryan Rawson" wrote:
> >
> > From a log of 0.20 in action:
> >
> > 11061.089: [GC 11061.089: [ParNew: 153331K->17024K(153344K), 0.2249580 secs] 3396126K->3309484K(4579960K) icms_dc=0 , 0.2251020 secs] [Times: user=0.66 sys=0.04, real=0.23 secs]
> > 11062.439: [GC 11062.439: [ParNew: 153344K->17024K(153344K), 0.1625480 secs] 3445804K->3335150K(4579960K) icms_dc=0 , 0.1627020 secs] [Times: user=0.37 sys=0.02, real=0.17 secs]
> >
> > Yes, in 1.4 seconds we had a 225 ms pause and a 162 ms pause. This is not
> > atypical. Most pauses are in the 80-150 ms area.
> >
> > -ryan
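(As a footnote to the ParNew-sizing discussion above: the next experiment would
look roughly like the lines below. The 12m value and the placement in
HBASE_OPTS in conf/hbase-env.sh are my assumptions here, not the cluster's
actual settings; swap in 6m to match the L2-sized run.

  # assumed example only: pin the young gen (ParNew) at ~12 MB
  export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
      -XX:NewSize=12m -XX:MaxNewSize=12m"

Setting both NewSize and MaxNewSize keeps the JVM from growing the young gen
back toward the ~150 MB it drifts to when left alone.)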