hbase-user mailing list archives

From Derek Wollenstein <de...@klout.com>
Subject Re: Help Tuning HBase GC Settings
Date Wed, 07 Dec 2011 01:30:52 GMT
I will take a look at lowering this; unfortunately I'm inheriting existing
settings and trying to be as conservative as possible when making changes.
 I can definitely try lowering the memory -- I've gotten mixed messages on
how much to allocate to the HBase heap.  I'll start moving both of these
settings down and see how that affects performance (using
https://github.com/brianfrankcooper/YCSB/wiki for testing).  Thanks
for the suggestion.
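When comparing runs, it may also help to pull the ParNew pause times and promotion volumes out of gc-hbase.log directly. A minimal sketch (the field layout is assumed from the log excerpts quoted below; a real parser would also need to handle the other CMS record types):

```python
import re

# Matches ParNew records in the -XX:+PrintGCDetails format quoted below, e.g.
# "...: 49409.770: [GC 49409.771: [ParNew: 166477K->7992K(176960K), 0.0390800 secs]
#  11642960K->11486691K(19297180K), 0.0392470 secs] ..."
PARNEW = re.compile(
    r"(?P<uptime>\d+\.\d+): \[GC [\d.]+: \[ParNew: "
    r"(?P<young_before>\d+)K->(?P<young_after>\d+)K\(\d+K\), [\d.]+ secs\] "
    r"(?P<heap_before>\d+)K->(?P<heap_after>\d+)K\(\d+K\), (?P<pause>[\d.]+) secs\]"
)

def parnew_stats(lines):
    """Yield (uptime_secs, pause_secs, promoted_kb) for each ParNew collection.

    promoted_kb estimates the survivors pushed into the old generation:
    how much more the young generation shrank than the whole heap did.
    """
    for line in lines:
        m = PARNEW.search(line)
        if not m:
            continue
        young_freed = int(m["young_before"]) - int(m["young_after"])
        heap_freed = int(m["heap_before"]) - int(m["heap_after"])
        yield (float(m["uptime"]), float(m["pause"]), young_freed - heap_freed)
```

Gaps between successive uptimes give the collection frequency, and summing promoted_kb over a window shows how quickly the old generation is filling.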

On Tue, Dec 6, 2011 at 5:20 PM, Doug Meil <doug.meil@explorysmedical.com> wrote:

>
> There are others who know these GC issues better than I do, but setting
> hfile.block.cache.size to 0.5 seems a bit aggressive to me.  That's 50% of
> the heap right there.
>
> Also, the issue with setting the max heap to 24 GB is that whenever a full
> GC is required on a heap that size, it's a killer.  Folks at recent HBase
> hackathons were talking about not going higher than 16 GB for a RegionServer.
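For concreteness, the more conservative settings being suggested might look like this in hbase-env.sh (illustrative values in the spirit of the advice above, not tested recommendations):

```shell
# hbase-env.sh -- cap the RegionServer heap at 16 GB rather than 24 GB
export HBASE_HEAPSIZE=16384

# In hbase-site.xml, hfile.block.cache.size would come down from 0.5 as
# well (e.g. toward 0.25), so the block cache takes a quarter of the heap
# rather than half.
```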
>
>
>
>
>
> On 12/6/11 8:10 PM, "Derek Wollenstein" <derek@klout.com> wrote:
>
> >I've been working on improving GC settings for HBase RegionServer
> >instances, and I seem to have run into a bit of a dead end.
> >
> >Basically I've been trying to tune GC settings and memory settings
> >appropriately, but I keep on having my server reach something like GC
> >Death.
> >
> >My current memory settings are
> >HEAPSIZE=24576
> >-ea -Xmx24576m -Xms24576m -XX:+UseConcMarkSweepGC -XX:+UseCompressedOops
> >-XX:NewSize=192m -XX:MaxNewSize=192m
> >-XX:CMSInitiatingOccupancyFraction=60 -verbose:gc -XX:+PrintGCDetails
> >-XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log
> >-XX:ParallelGCThreads=6
> >
> >We've also set hfile.block.cache.size to 0.5 (believing that increasing
> >the space for the cache should improve performance; I'm willing to accept
> >that this could be causing problems, I just haven't seen it reported).
> >
> >Basically as the service is "warming up", we see reasonable GC Log
> >timings.
> >
> >...
> >2011-12-06T10:19:14.834+0000: 782.473: [GC [1 CMS-initial-mark:
> >2225311K(3705244K)] 2237042K(3882204K), 0.0028630 secs] [Times: user=0.00
> >sys=0.00, real=0.00 secs]
> >2011-12-06T10:19:14.837+0000: 782.476: [CMS-concurrent-mark-start]
> >2011-12-06T10:19:15.523+0000: 783.162: [CMS-concurrent-mark: 0.686/0.686
> >secs] [Times: user=1.98 sys=0.05, real=0.69 secs]
> >2011-12-06T10:19:15.523+0000: 783.162: [CMS-concurrent-preclean-start]
> >2011-12-06T10:19:15.533+0000: 783.172: [CMS-concurrent-preclean:
> >0.010/0.010 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
> >2011-12-06T10:19:15.534+0000: 783.172:
> >[CMS-concurrent-abortable-preclean-start]
> >2011-12-06T10:19:15.742+0000: 783.381: [GC 783.381: [ParNew:
> >168978K->9067K(176960K), 0.0076540 secs] 2394290K->2235970K(3882204K),
> >0.0077920 secs] [Times: user=0.05 sys=0.00, real=0.00 secs]
> >2011-12-06T10:19:16.351+0000: 783.990: [CMS-concurrent-abortable-preclean:
> >0.123/0.817 secs] [Times: user=0.87 sys=0.06, real=0.81 secs]
> >2011-12-06T10:19:16.352+0000: 783.990: [GC[YG occupancy: 97548 K (176960
> >K)]783.990: [Rescan (parallel) , 0.0153180 secs]784.006: [weak refs
> >processing, 0.0000170 secs] [1 CMS-remark: 2226903K(3705244K)]
> >2324451K(3882204K), 0.0154470 secs] [Times: user=0.09 sys=0.00, real=0.02
> >secs]
> >...
> >This seems to continue for about 16 hours.
> >But as we hit some critical point, we end up stuck running multiple young
> >GCs per second.  Note that they all appear to be successful (if I'm
> >reading the logs correctly):
> >2011-12-06T23:49:42.132+0000: 49409.770: [GC 49409.771: [ParNew:
> >166477K->7992K(176960K), 0.0390800 secs] 11642960K->11486691K(19297180K),
> >0.0392470 secs] [Times: user=0.22 sys=0.00, real=0.04 secs]
> >2011-12-06T23:49:42.522+0000: 49410.161:
> >[CMS-concurrent-abortable-preclean: 0.247/0.589 secs] [Times: user=1.04
> >sys=0.02, real=0.58 secs]
> >2011-12-06T23:49:42.523+0000: 49410.162: [GC[YG occupancy: 149306 K
> >(176960
> >K)]49410.162: [Rescan (parallel) , 0.0314250 secs]49410.193: [weak refs
> >processing, 0.0000890 secs] [1 CMS-remark: 11478698K(19120220K)]
> >11628005K(19297180K), 0.0316410 secs] [Times: user=0.17
> >sys=0.01, real=0.03 secs]
> >2011-12-06T23:49:42.555+0000: 49410.194: [CMS-concurrent-sweep-start]
> >2011-12-06T23:49:42.597+0000: 49410.236: [GC 49410.236: [ParNew:
> >165304K->7868K(176960K), 0.0405890 secs] 11643677K->11487303K(19297180K),
> >0.0407690 secs] [Times: user=0.23 sys=0.00, real=0.04 secs]
> >2011-12-06T23:49:43.048+0000: 49410.687: [GC 49410.687: [ParNew:
> >165180K->6485K(176960K), 0.0389860 secs] 11027946K->10869726K(19297180K),
> >0.0392000 secs] [Times: user=0.23 sys=0.00, real=0.04 secs]
> >2011-12-06T23:49:43.181+0000: 49410.819: [CMS-concurrent-sweep:
> >0.542/0.625
> >secs] [Times: user=1.73 sys=0.02, real=0.62 secs]
> >2011-12-06T23:49:43.181+0000: 49410.819: [CMS-concurrent-reset-start]
> >2011-12-06T23:49:43.232+0000: 49410.870: [CMS-concurrent-reset:
> >0.051/0.051
> >secs] [Times: user=0.10 sys=0.00, real=0.05 secs]
> >2011-12-06T23:49:43.481+0000: 49411.120: [GC 49411.120: [ParNew:
> >163797K->7150K(176960K), 0.0409170 secs] 10380339K->10224698K(19297180K),
> >0.0410870 secs] [Times: user=0.24 sys=0.00, real=0.04 secs]
> >2011-12-06T23:49:43.920+0000: 49411.559: [GC 49411.559: [ParNew:
> >164462K->8178K(176960K), 0.0394640 secs] 10382010K->10226321K(19297180K),
> >0.0396290 secs] [Times: user=0.22 sys=0.00, real=0.04 secs]
> >2011-12-06T23:49:44.355+0000: 49411.994: [GC 49411.994: [ParNew:
> >165490K->8303K(176960K), 0.0367330 secs] 10383633K->10227244K(19297180K),
> >0.0368970 secs] [Times: user=0.22 sys=0.00, real=0.03 secs]
> >2011-12-06T23:49:44.785+0000: 49412.424: [GC 49412.424: [ParNew:
> >165615K->10439K(176960K), 0.0398080 secs] 10384556K->10229598K(19297180K),
> >0.0399870 secs] [Times: user=0.23 sys=0.00, real=0.04 secs]
> >2011-12-06T23:49:45.225+0000: 49412.864: [GC 49412.864: [ParNew:
> >167751K->13171K(176960K), 0.0393970 secs] 10386910K->10233071K(19297180K),
> >0.0395730 secs] [Times: user=0.23 sys=0.00, real=0.04 secs]
> >...
> >Note that we are doing about two young GCs per second, each around 40 ms
> >(so roughly 80 ms of GC time per second).  I believe this situation
> >represents an undesirable use of memory resources, but I can't find any
> >particular reason why we go from a reasonable amount of GC to GC running
> >constantly.  One theory is that this is the result of something in the
> >young generation being promoted?  But I'm not really sure what would be
> >causing this pathological behavior.  Any ideas or alternative places to
> >look would be greatly appreciated.
> >
> >--Derek
>
>
>
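One arithmetic observation from the second log excerpt above (a hypothesis to check, not a confirmed diagnosis): with -XX:CMSInitiatingOccupancyFraction=60, CMS starts a cycle once the old generation passes 60% occupancy, and the CMS-remark entry shows the old generation sitting right at that threshold, so back-to-back CMS cycles (with their attendant ParNew activity) would be expected:

```python
# Figures copied from the "[1 CMS-remark: 11478698K(19120220K)]" entry above.
old_used_kb = 11478698       # old generation used at CMS-remark
old_capacity_kb = 19120220   # old generation capacity

occupancy = old_used_kb / old_capacity_kb
print(f"old generation occupancy: {occupancy:.2%}")  # just over the 60% trigger
```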
