hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Varun Sharma <va...@pinterest.com>
Subject Re: 答复: GC pause issues
Date Fri, 25 Jan 2013 17:55:09 GMT
My total heap size is 12G and my young gen is 1G. I am getting 200ms pauses
every few seconds - the machine has 4 cores (its m1.xlarge on ec2), if I
set the young gen low. The one thing I noted today was that the pauses are
spaced @10 seconds until a minor compaction kicks in - and then then the
pauses happen every 3-4 seconds. I found this to be highly correlated on a
region server. These minor compactions are not huge - they are typically
compacting 3 files - upto 200M in total size. They are lasting for 20-30
seconds. Once they are over the GC pauses recover back to the 10 second
frequency.

I have the following settings enabled (index.cacheonwrite and
bloom.cacheonwrite) to cache the index/bloom blocks upon hfile writes
though i find it unlikely that they could be impacting my setup.

On Thu, Jan 24, 2013 at 11:46 PM, Jack Levin <magnito@gmail.com> wrote:

> Generally, the larger the flush to harder the GC will work. Flush more
> often to avoid this. What is your total heap size set at?
> On Jan 24, 2013 9:02 PM, "Varun Sharma" <varun@pinterest.com> wrote:
>
> > I do have significant block cache churn and this issue is typical
> > correlated with a huge increase in read latencies - could that be the
> > reason for this - mslab should be taking care of the memstore related
> heap
> > fragmentation ? Has anyone seen issues with block cache churn ?
> >
> > On Thu, Jan 24, 2013 at 6:30 PM, Varun Sharma <varun@pinterest.com>
> wrote:
> >
> > > I am curious how reducing the memstore size would help - I have 6
> regions
> > > and 3G total for memstore - so I would like max out on that by having a
> > > bigger flush size per region. Are you asking me to have more regions
> and
> > > smaller memstore flush size instead ? How is that likely to help
> > >
> > >
> > > On Thu, Jan 24, 2013 at 6:07 PM, 谢良 <xieliang@xiaomi.com> wrote:
> > >
> > >> Hi Varun,
> > >>
> > >> Please note if you try to increase new generation size, then the
> > "ParNew"
> > >> time will be up accordingly, and CMS YGC is also a STW.
> > >> could you have a try to reduce memstore size to a smaller value, e.g.
> > >> 128m or 256m ?
> > >>
> > >> Regards,
> > >> Liang
> > >> ________________________________________
> > >> 发件人: Varun Sharma [varun@pinterest.com]
> > >> 发送时间: 2013年1月25日 1:40
> > >> 收件人: user@hbase.apache.org
> > >> 主题: GC pause issues
> > >>
> > >> Hi,
> > >>
> > >> I have a region server which has the following logs. As you can see
> from
> > >> the log, ParNew is sufficiently big (450M) and there are heavy writes
> > >> going
> > >> in. I am seeing 200ms pauses which eventually build up and there is a
> > >> promotion failure. There is a parnew collection every 2-3 seconds so
> it
> > >> fills up real fast. My memstore size is bigger 512m for flushes and 4
> > >> regions per server. (overall size is 3G for all memstores) - I have
> > mslab
> > >> enabled
> > >>
> > >> 2013-01-24T13:08:16.870+0000: 63533.964: [GC 63533.964: [ParNew:
> > >> 471841K->52416K(471872K), 0.2251880 secs]
> 8733008K->8445039K(12727104K),
> > >> 0.2254100 secs] [Times: user=0.50 sys=0.18, real=0.22 secs]
> > >> 2013-01-24T13:08:19.546+0000: 63536.639: [GC 63536.639: [ParNew:
> > >> 461593K->52416K(471872K), 0.2812690 secs]
> 8854216K->8557572K(12727104K),
> > >> 0.2814870 secs] [Times: user=0.66 sys=0.09, real=0.29 secs]
> > >> 013-01-24T13:08:21.824+0000: 63538.917: [GC 63538.918: [ParNew:
> > >> 442836K->52416K(471872K), 0.2781490 secs]
> 8947992K->8705355K(12727104K),
> > >> 0.2783810 secs] [Times: user=0.58 sys=0.14, real=0.28 secs]
> > >> 2013-01-24T13:08:22.122+0000: 63539.216: [GC [1 CMS-initial-mark:
> > >> 8652939K(12255232K)] 8752914K(12727104K), 0.0365000 secs] [Times:
> > >> user=0.02
> > >> sys=0.00, real=0.04 secs]
> > >> 2013-01-24T13:08:22.159+0000: 63539.253: [CMS-concurrent-mark-start]
> > >> 2013-01-24T13:08:24.953+0000: 63542.047: [GC 63542.047: [ParNew:
> > >> 471872K->52251K(471872K), 0.1611970 secs]
> 9124811K->8831437K(12727104K),
> > >> 0.1614180 secs] [Times: user=0.37 sys=0.19, real=0.16 secs]
> > >> 2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-mark:
> > 4.105/4.268
> > >> secs] [Times: user=5.36 sys=0.34, real=4.27 secs]
> > >> 2013-01-24T13:08:26.434+0000: 63543.527:
> [CMS-concurrent-preclean-start]
> > >> 2013-01-24T13:08:26.597+0000: 63543.691: [CMS-concurrent-preclean:
> > >> 0.133/0.163 secs] [Times: user=0.16 sys=0.05, real=0.17 secs]
> > >> 2013-01-24T13:08:26.597+0000: 63543.691:
> > >> [CMS-concurrent-abortable-preclean-start]
> > >> 2013-01-24T13:08:27.401+0000: 63544.495:
> > >> [CMS-concurrent-abortable-preclean: 0.792/0.804 secs] [Times:
> user=1.46
> > >> sys=0.16, real=0.80 secs]
> > >> 2013-01-24T13:08:27.403+0000: 63544.496: [GC[YG occupancy: 274458 K
> > >> (471872
> > >> K)]63544.496: [Rescan (parallel) , 0.0540730 secs]63544.551: [weak
> refs
> > >> processing, 0.0001700 secs] [1 CMS-remark: 8779186K(12255232K)]
> > >> 9053645K(12727104K), 0.0544410 secs] [Times: user=0.20 sys=0.01,
> > real=0.06
> > >> secs]
> > >> 2013-01-24T13:08:27.458+0000: 63544.551: [CMS-concurrent-sweep-start]
> > >> 2013-01-24T13:08:27.955+0000: 63545.048: [GC 63545.049: [ParNew:
> > >> 471707K->44566K(471872K), 0.1371770 secs]
> 9044714K->8701862K(12727104K),
> > >> 0.1374060 secs] [Times: user=0.35 sys=0.12, real=0.14 secs]
> > >> 2013-01-24T13:08:29.285+0000: 63546.378: [GC 63546.478: [ParNew:
> > >> 445714K->52416K(471872K), 0.5626120 secs]
> 8648805K->8410223K(12727104K),
> > >> 0.5628610 secs] [Times: user=0.91 sys=0.08, real=0.66 secs]
> > >> 2013-01-24T13:08:32.308+0000: 63549.401: [GC 63549.402: [ParNew:
> > >> 471872K->52416K(471872K), 0.2300560 secs]
> 8247804K->7976043K(12727104K),
> > >> 0.2302900 secs] [Times: user=0.62 sys=0.17, real=0.23 secs]
> > >> 2013-01-24T13:08:34.844+0000: 63551.938: [GC 63551.938: [ParNew
> > (promotion
> > >> failed): 471872K->471872K(471872K), 0.2788500 secs]63552.217:
> > >> [CMS2013-01-24T13:08:37.256+0000: 63554.349: [CMS-concurrent-sweep:
> > >> 8.473/9.798 secs] [*Times: user=15.26 sys=1.11, real=9.80 secs*]
> > >>
> > >> What might be advised here - should I up the size to 1G for the new
> > >> generation ?
> > >>
> > >> Thanks
> > >> Varun
> > >>
> > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message