hbase-user mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject Re: Help Tuning HBase GC Settings
Date Wed, 07 Dec 2011 01:20:28 GMT

There are others who know these GC issues better than I do, but setting
hfile.block.cache.size to 0.5 seems a bit aggressive to me.  That's 50% of
the heap right there.
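
For reference, the shipped default for that property is much lower (on the
order of 0.2-0.25 of heap).  If you wanted to back it off, it's a single
property in hbase-site.xml -- the 0.25 below is just an illustration, not a
tuning recommendation:

  <property>
    <name>hfile.block.cache.size</name>
    <value>0.25</value>
  </property>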

Also, the issue with setting the max heap to 24 GB is that whenever a full
GC is required on a heap that size, it's a killer.  Folks at recent HBase
hackathons were talking about not going higher than 16 GB for a RegionServer.
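
If you did drop the RegionServer to a 16 GB heap, the change lives in
hbase-env.sh.  A minimal sketch, assuming the stock scripts (which append "m"
to HBASE_HEAPSIZE) and keeping your existing GC logging flags -- the values
are illustrative, not a recommendation:

  # hbase-env.sh -- illustrative 16 GB RegionServer heap
  export HBASE_HEAPSIZE=16384
  export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC \
      -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCompressedOops"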

On 12/6/11 8:10 PM, "Derek Wollenstein" <derek@klout.com> wrote:

>I've been working on improving GC settings for HBase RegionServer
>instances, and I seem to have run into a bit of a dead end.
>
>Basically I've been trying to tune GC settings and memory settings
>appropriately, but I keep on having my server reach something like GC
>Death.
>
>My current memory settings are
>HEAPSIZE=24576
>-ea -Xmx24576 -Xms24576 -XX:+UseConcMarkSweepGC -XX:+UseCompressedOops
>-XX:NewSize=192m -XX:MaxNewSize=192m
>-XX:CMSInitiatingOccupancyFraction=60 -verbose:gc -XX:+PrintGCDetails
>-XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log
>-XX:ParallelGCThreads=6
>
>We've also set hfile.block.cache.size to 0.5 (believing that increasing
>the space for cache should improve performance; I'm willing to accept that
>this could be causing problems, I just haven't seen it reported).
>
>Basically as the service is "warming up", we see reasonable GC Log
>timings.
>
>...
>2011-12-06T10:19:14.834+0000: 782.473: [GC [1 CMS-initial-mark:
>2225311K(3705244K)] 2237042K(3882204K), 0.0028630 secs] [Times: user=0.00
>sys=0.00, real=0.00 secs]
>2011-12-06T10:19:14.837+0000: 782.476: [CMS-concurrent-mark-start]
>2011-12-06T10:19:15.523+0000: 783.162: [CMS-concurrent-mark: 0.686/0.686
>secs] [Times: user=1.98 sys=0.05, real=0.69 secs]
>2011-12-06T10:19:15.523+0000: 783.162: [CMS-concurrent-preclean-start]
>2011-12-06T10:19:15.533+0000: 783.172: [CMS-concurrent-preclean:
>0.010/0.010 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
>2011-12-06T10:19:15.534+0000: 783.172:
>[CMS-concurrent-abortable-preclean-start]
>2011-12-06T10:19:15.742+0000: 783.381: [GC 783.381: [ParNew:
>168978K->9067K(176960K), 0.0076540 secs] 2394290K->2235970K(3882204K),
>0.0077920 secs] [Times: user=0.05 sys=0.00, real=0.00 secs]
>2011-12-06T10:19:16.351+0000: 783.990: [CMS-concurrent-abortable-preclean:
>0.123/0.817 secs] [Times: user=0.87 sys=0.06, real=0.81 secs]
>2011-12-06T10:19:16.352+0000: 783.990: [GC[YG occupancy: 97548 K (176960
>K)]783.990: [Rescan (parallel) , 0.0153180 secs]784.006: [weak refs
>processing, 0.0000170 secs] [1 CMS-remark: 2226903K(3705244K)]
>2324451K(3882204K), 0.0154470 secs] [Times: user=0.09 sys=0.00, real=0.02
>secs]
>...
>This seems to continue for about 16 hours.
>But once we hit some critical point, we end up stuck running multiple young
>GCs per second.  Note that they all appear to be successful (if I'm reading
>the logs correctly):
>2011-12-06T23:49:42.132+0000: 49409.770: [GC 49409.771: [ParNew:
>166477K->7992K(176960K), 0.0390800 secs] 11642960K->11486691K(19297180K),
>0.0392470 secs] [Times: user=0.22 sys=0.00, real=0.04 secs]
>2011-12-06T23:49:42.522+0000: 49410.161:
>[CMS-concurrent-abortable-preclean: 0.247/0.589 secs] [Times: user=1.04
>sys=0.02, real=0.58 secs]
>2011-12-06T23:49:42.523+0000: 49410.162: [GC[YG occupancy: 149306 K (176960 K)]
>49410.162: [Rescan (parallel) , 0.0314250 secs]49410.193: [weak refs
>processing, 0.0000890 secs] [1 CMS-remark: 11478698K(19120220K)]
>11628005K(19297180K), 0.0316410 secs] [Times: user=0.17 sys=0.01, real=0.03
>secs]
>2011-12-06T23:49:42.555+0000: 49410.194: [CMS-concurrent-sweep-start]
>2011-12-06T23:49:42.597+0000: 49410.236: [GC 49410.236: [ParNew:
>165304K->7868K(176960K), 0.0405890 secs] 11643677K->11487303K(19297180K),
>0.0407690 secs] [Times: user=0.23 sys=0.00, real=0.04 secs]
>2011-12-06T23:49:43.048+0000: 49410.687: [GC 49410.687: [ParNew:
>165180K->6485K(176960K), 0.0389860 secs] 11027946K->10869726K(19297180K),
>0.0392000 secs] [Times: user=0.23 sys=0.00, real=0.04 secs]
>2011-12-06T23:49:43.181+0000: 49410.819: [CMS-concurrent-sweep:
>0.542/0.625
>secs] [Times: user=1.73 sys=0.02, real=0.62 secs]
>2011-12-06T23:49:43.181+0000: 49410.819: [CMS-concurrent-reset-start]
>2011-12-06T23:49:43.232+0000: 49410.870: [CMS-concurrent-reset:
>0.051/0.051
>secs] [Times: user=0.10 sys=0.00, real=0.05 secs]
>2011-12-06T23:49:43.481+0000: 49411.120: [GC 49411.120: [ParNew:
>163797K->7150K(176960K), 0.0409170 secs] 10380339K->10224698K(19297180K),
>0.0410870 secs] [Times: user=0.24 sys=0.00, real=0.04 secs]
>2011-12-06T23:49:43.920+0000: 49411.559: [GC 49411.559: [ParNew:
>164462K->8178K(176960K), 0.0394640 secs] 10382010K->10226321K(19297180K),
>0.0396290 secs] [Times: user=0.22 sys=0.00, real=0.04 secs]
>2011-12-06T23:49:44.355+0000: 49411.994: [GC 49411.994: [ParNew:
>165490K->8303K(176960K), 0.0367330 secs] 10383633K->10227244K(19297180K),
>0.0368970 secs] [Times: user=0.22 sys=0.00, real=0.03 secs]
>2011-12-06T23:49:44.785+0000: 49412.424: [GC 49412.424: [ParNew:
>165615K->10439K(176960K), 0.0398080 secs] 10384556K->10229598K(19297180K),
>0.0399870 secs] [Times: user=0.23 sys=0.00, real=0.04 secs]
>2011-12-06T23:49:45.225+0000: 49412.864: [GC 49412.864: [ParNew:
>167751K->13171K(176960K), 0.0393970 secs] 10386910K->10233071K(19297180K),
>0.0395730 secs] [Times: user=0.23 sys=0.00, real=0.04 secs]
>...
>Note that we are doing about 2 young GCs per second (roughly 80 ms of pause
>time per second).  I believe this situation has us spending an undesirable
>amount of resources on GC, but I can't find any particular reason why we go
>from a reasonable amount of GC to collecting constantly.  One theory is that
>this is the result of something in the young generation being promoted?  But
>I'm not really sure what would be causing this pathological behavior.  Any
>ideas or alternative places to look would be greatly appreciated.
>
>--Derek
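
A rough back-of-the-envelope from the ParNew lines above, assuming that
excerpt is representative: the collections at 49410.687, 49411.120,
49411.559, 49411.994, 49412.424 and 49412.864 are about 0.44 s apart, and
each one drains roughly 160 MB of young gen down to under 13 MB.  That works
out to something like

  160 MB / 0.44 s  ~  360 MB/s of allocation

into a young gen capped by -XX:MaxNewSize=192m, so two ParNew collections
per second is about what that NewSize forces at that allocation rate,
whether or not anything pathological is happening with promotion.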


