hbase-dev mailing list archives

From Stack <saint....@gmail.com>
Subject Re: GC makes ryan sad
Date Sat, 25 Apr 2009 21:23:16 GMT
Just to say that in Ryan's case we are at an extreme, in that cell and
key sizes are very small.

Ryan, why do you think heap fragmentation is an issue when objects are small
(and I do not think you are reading -- right)?

Is there an issue filed for the .META. problem, do you know?

Good on you, Ryan.

On Apr 25, 2009, at 13:23, Ryan Rawson <ryanobjc@gmail.com> wrote:

> Heap seems a _bit_ bigger than perhaps it should be, but it just gives the
> CMS more work to do.
>
> The hidden problem might be heap fragmentation: since Java uses lots of
> small objects, we might end up with an ever-increasing heap size.
>
> Here is a GC log during the CMS sweep (when it's reclaiming from the main heap):
> 42374.524: [CMS-concurrent-sweep-start]
> 42374.576: [GC 42374.576: [ParNew: 5560K->576K(5568K), 0.0115800 secs] 2709982K->2706003K(2947416K), 0.0117020 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]
> 42374.616: [GC 42374.616: [ParNew: 5568K->576K(5568K), 0.0147850 secs] 2703025K->2701054K(2947416K), 0.0148980 secs] [Times: user=0.08 sys=0.00, real=0.01 secs]
> 42374.685: [GC 42374.685: [ParNew: 5568K->574K(5568K), 0.0104920 secs] 2690723K->2687015K(2947416K), 0.0106060 secs] [Times: user=0.07 sys=0.00, real=0.01 secs]
> 42374.751: [GC 42374.752: [ParNew: 5566K->574K(5568K), 0.0118220 secs] 2672236K->2668439K(2947416K), 0.0119340 secs] [Times: user=0.08 sys=0.00, real=0.01 secs]
> 42375.483: [GC 42375.483: [ParNew: 5508K->575K(5568K), 0.0090170 secs] 2329262K->2325313K(2947416K), 0.0091370 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]
> 42375.614: [GC 42375.615: [ParNew: 5504K->467K(5568K), 0.0071550 secs] 2275783K->2271206K(2947416K), 0.0072510 secs] [Times: user=0.06 sys=0.00, real=0.00 secs]
> 42375.746: [GC 42375.746: [ParNew: 5451K->428K(5568K), 0.0064950 secs] 2210537K->2205514K(2947416K), 0.0066040 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]
> 42375.885: [GC 42375.885: [ParNew: 5420K->362K(5568K), 0.0080160 secs] 2149436K->2144384K(2947416K), 0.0081120 secs] [Times: user=0.07 sys=0.00, real=0.01 secs]
> 42376.006: [GC 42376.006: [ParNew: 5354K->548K(5568K), 0.0064720 secs] 2097343K->2092544K(2947416K), 0.0065760 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
> 42376.120: [GC 42376.120: [ParNew: 5540K->576K(5568K), 0.0089420 secs] 2044984K->2040242K(2947416K), 0.0090420 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
> 42376.336: [GC 42376.336: [ParNew: 5568K->576K(5568K), 0.0092920 secs] 1948100K->1943594K(2947416K), 0.0093910 secs] [Times: user=0.06 sys=0.01, real=0.01 secs]
> 42376.885: [GC 42376.885: [ParNew: 5540K->576K(5568K), 0.0119010 secs] 1677094K->1673092K(2947416K), 0.0120230 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]
> 42377.328: [CMS-concurrent-sweep: 2.679/2.804 secs] [Times: user=4.75 sys=0.10, real=2.80 secs]
>
> Basically we go from 2700MB -> 1670MB, reclaiming about 1100MB. That took
> about 2.8s, and there were 12 minor collections during that time, each one
> cutting the ParNew occupancy by about 10x (e.g. 5568K -> 576K). The GC pauses
> for ParNew are higher than I want, about 6ms each, up to a high of 11ms.
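>
> For anyone who wants to pull those numbers out of a log mechanically, here
> is a minimal, untested sketch (my own, not anything running on the cluster):
> it only understands ParNew lines shaped like the ones above, and the class
> name is made up.
>
> import java.io.BufferedReader;
> import java.io.FileReader;
> import java.util.regex.Matcher;
> import java.util.regex.Pattern;
>
> // Summarize ParNew pauses and heap shrinkage from a
> // -verbose:gc -XX:+PrintGCDetails log.
> public class ParNewSummary {
>     private static final Pattern LINE = Pattern.compile(
>         "\\[ParNew: (\\d+)K->(\\d+)K\\(\\d+K\\), ([\\d.]+) secs\\] " +
>         "(\\d+)K->(\\d+)K");
>
>     public static void main(String[] args) throws Exception {
>         long firstHeap = -1, lastHeap = -1;
>         double maxPause = 0, totalPause = 0;
>         int count = 0;
>         BufferedReader in = new BufferedReader(new FileReader(args[0]));
>         for (String line; (line = in.readLine()) != null; ) {
>             Matcher m = LINE.matcher(line);
>             if (!m.find()) continue;
>             if (firstHeap < 0) firstHeap = Long.parseLong(m.group(4));
>             lastHeap = Long.parseLong(m.group(5));
>             double pause = Double.parseDouble(m.group(3));
>             maxPause = Math.max(maxPause, pause);
>             totalPause += pause;
>             count++;
>         }
>         in.close();
>         if (count == 0) { System.out.println("no ParNew lines"); return; }
>         System.out.printf("%d minor GCs, avg pause %.1f ms, max %.1f ms, "
>             + "heap %dK -> %dK%n", count, 1000 * totalPause / count,
>             1000 * maxPause, firstHeap, lastHeap);
>     }
> }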
>
> My next thought/experiment is to move to a slightly larger ParNew, maybe
> 12MB, to keep more garbage from being tenured into the main heap.
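>
> For reference, a sketch of the HotSpot flags that experiment amounts to
> (these are all standard options, but the exact values here are my guesses,
> not validated settings):
>
>   java -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
>        -XX:NewSize=12m -XX:MaxNewSize=12m \
>        -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps ...
>
> NewSize/MaxNewSize pin the young generation so it cannot grow back toward
> the 150MB that was producing the long copy pauses.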
>
> However, HBase won't take a rolling restart - as soon as you touch the
> .META. regionserver, you're screwed.
>
> On Sat, Apr 25, 2009 at 1:05 PM, Chad Walters <Chad.Walters@microsoft.com> wrote:
>
>>
>> Cool -- that seems like it could be a reasonable set of tradeoffs for
>> latency-sensitive uses.
>>
>> How big is the memory increase?
>>
>> Do you know the chief source of the allocations that are GC'd?
>>
>> Chad
>>
>>
>> On 4/25/09 12:47 PM, "Ryan Rawson" <ryanobjc@gmail.com> wrote:
>>
>> The incremental GC doesn't help on a multicore system; it's designed to
>> break up the giant concurrent phase (which I've seen last 5 minutes) on
>> systems with 1 or 2 CPUs, to prevent the concurrent GC from starving the
>> app for CPU. My machines have 8 cores, so this won't really help.
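>>
>> For concreteness, incremental mode is just an extra flag on top of CMS
>> (standard HotSpot options, shown only to pin down what "incremental GC"
>> means here, not as a recommendation):
>>
>>   -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
>>
>> It is what produces the icms_dc=N duty-cycle field visible in the 0.20
>> log quoted further down.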
>>
>> I talked to a friend, and I realized the problem is that the ParNew is
>> growing too big... It starts out at about 15 MB and grows 10x to 150MB.
>> My friend suggested that I size ParNew to the L2 cache size, which is 6 MB
>> on my Intel Xeon CPUs - not sure if the L2 is shared between pairs of
>> cores or not.
>>
>> Switching to a much smaller ParNew has several effects:
>> - GC pauses are tiny, around 1ms
>> - GC of ParNew happens 1-10x a second
>> - Overall memory use goes up, as more stuff enters the general heap and
>> has to wait for the CMS to reclaim it. I watched the CMS prune 1 GB of
>> garbage in 3 seconds.
>>
>> However, the crippling pauses that were keeping my cluster from performing
>> are gone, at the cost of higher RAM usage.
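>>
>> If you'd rather watch this from inside the JVM than from logs, here's a
>> minimal sketch using the standard java.lang.management API (the class
>> itself is made up; under CMS the collector names are typically "ParNew"
>> and "ConcurrentMarkSweep"):
>>
>> import java.lang.management.GarbageCollectorMXBean;
>> import java.lang.management.ManagementFactory;
>>
>> // Print per-collector counts and cumulative pause time once a second,
>> // to confirm how often ParNew runs and how cheap each run is.
>> public class GcWatcher {
>>     public static void main(String[] args) throws InterruptedException {
>>         while (true) {
>>             for (GarbageCollectorMXBean gc :
>>                     ManagementFactory.getGarbageCollectorMXBeans()) {
>>                 System.out.printf("%s: %d collections, %d ms total%n",
>>                         gc.getName(), gc.getCollectionCount(),
>>                         gc.getCollectionTime());
>>             }
>>             Thread.sleep(1000);
>>         }
>>     }
>> }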
>>
>> On Sat, Apr 25, 2009 at 10:09 AM, Chad Walters <Chad.Walters@microsoft.com> wrote:
>>
>>>
>>> Done any experimentation with the incremental GC?
>>>
>>> Chad
>>>
>>> On 4/23/09 6:28 PM, "Ryan Rawson" <ryanobjc@gmail.com> wrote:
>>>
>>> From a log of 0.20 in action:
>>>
>>> 11061.089: [GC 11061.089: [ParNew: 153331K->17024K(153344K), 0.2249580 secs] 3396126K->3309484K(4579960K) icms_dc=0 , 0.2251020 secs] [Times: user=0.66 sys=0.04, real=0.23 secs]
>>> 11062.439: [GC 11062.439: [ParNew: 153344K->17024K(153344K), 0.1625480 secs] 3445804K->3335150K(4579960K) icms_dc=0 , 0.1627020 secs] [Times: user=0.37 sys=0.02, real=0.17 secs]
>>>
>>>
>>> Yes, in 1.4 seconds we had a 225ms pause and a 162ms pause. This is not
>>> atypical; most pauses are in the 80-150ms range.
>>>
>>> -ryan
>>>
>>>
>>
>>
