hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jack Levin <magn...@gmail.com>
Subject Re: 答复: GC pause issues
Date Sat, 26 Jan 2013 06:54:24 GMT
How much memory are you flushing?  Can you paste the log here?  The
large the chunk of the flush the longer  your GC is going to be.

-Jack

On Fri, Jan 25, 2013 at 9:55 AM, Varun Sharma <varun@pinterest.com> wrote:
> My total heap size is 12G and my young gen is 1G. I am getting 200ms pauses
> every few seconds - the machine has 4 cores (its m1.xlarge on ec2), if I
> set the young gen low. The one thing I noted today was that the pauses are
> spaced @10 seconds until a minor compaction kicks in - and then then the
> pauses happen every 3-4 seconds. I found this to be highly correlated on a
> region server. These minor compactions are not huge - they are typically
> compacting 3 files - upto 200M in total size. They are lasting for 20-30
> seconds. Once they are over the GC pauses recover back to the 10 second
> frequency.
>
> I have the following settings enabled (index.cacheonwrite and
> bloom.cacheonwrite) to cache the index/bloom blocks upon hfile writes
> though i find it unlikely that they could be impacting my setup.
>
> On Thu, Jan 24, 2013 at 11:46 PM, Jack Levin <magnito@gmail.com> wrote:
>
>> Generally, the larger the flush to harder the GC will work. Flush more
>> often to avoid this. What is your total heap size set at?
>> On Jan 24, 2013 9:02 PM, "Varun Sharma" <varun@pinterest.com> wrote:
>>
>> > I do have significant block cache churn and this issue is typical
>> > correlated with a huge increase in read latencies - could that be the
>> > reason for this - mslab should be taking care of the memstore related
>> heap
>> > fragmentation ? Has anyone seen issues with block cache churn ?
>> >
>> > On Thu, Jan 24, 2013 at 6:30 PM, Varun Sharma <varun@pinterest.com>
>> wrote:
>> >
>> > > I am curious how reducing the memstore size would help - I have 6
>> regions
>> > > and 3G total for memstore - so I would like max out on that by having a
>> > > bigger flush size per region. Are you asking me to have more regions
>> and
>> > > smaller memstore flush size instead ? How is that likely to help
>> > >
>> > >
>> > > On Thu, Jan 24, 2013 at 6:07 PM, 谢良 <xieliang@xiaomi.com> wrote:
>> > >
>> > >> Hi Varun,
>> > >>
>> > >> Please note if you try to increase new generation size, then the
>> > "ParNew"
>> > >> time will be up accordingly, and CMS YGC is also a STW.
>> > >> could you have a try to reduce memstore size to a smaller value, e.g.
>> > >> 128m or 256m ?
>> > >>
>> > >> Regards,
>> > >> Liang
>> > >> ________________________________________
>> > >> 发件人: Varun Sharma [varun@pinterest.com]
>> > >> 发送时间: 2013年1月25日 1:40
>> > >> 收件人: user@hbase.apache.org
>> > >> 主题: GC pause issues
>> > >>
>> > >> Hi,
>> > >>
>> > >> I have a region server which has the following logs. As you can see
>> from
>> > >> the log, ParNew is sufficiently big (450M) and there are heavy writes
>> > >> going
>> > >> in. I am seeing 200ms pauses which eventually build up and there is
a
>> > >> promotion failure. There is a parnew collection every 2-3 seconds so
>> it
>> > >> fills up real fast. My memstore size is bigger 512m for flushes and
4
>> > >> regions per server. (overall size is 3G for all memstores) - I have
>> > mslab
>> > >> enabled
>> > >>
>> > >> 2013-01-24T13:08:16.870+0000: 63533.964: [GC 63533.964: [ParNew:
>> > >> 471841K->52416K(471872K), 0.2251880 secs]
>> 8733008K->8445039K(12727104K),
>> > >> 0.2254100 secs] [Times: user=0.50 sys=0.18, real=0.22 secs]
>> > >> 2013-01-24T13:08:19.546+0000: 63536.639: [GC 63536.639: [ParNew:
>> > >> 461593K->52416K(471872K), 0.2812690 secs]
>> 8854216K->8557572K(12727104K),
>> > >> 0.2814870 secs] [Times: user=0.66 sys=0.09, real=0.29 secs]
>> > >> 013-01-24T13:08:21.824+0000: 63538.917: [GC 63538.918: [ParNew:
>> > >> 442836K->52416K(471872K), 0.2781490 secs]
>> 8947992K->8705355K(12727104K),
>> > >> 0.2783810 secs] [Times: user=0.58 sys=0.14, real=0.28 secs]
>> > >> 2013-01-24T13:08:22.122+0000: 63539.216: [GC [1 CMS-initial-mark:
>> > >> 8652939K(12255232K)] 8752914K(12727104K), 0.0365000 secs] [Times:
>> > >> user=0.02
>> > >> sys=0.00, real=0.04 secs]
>> > >> 2013-01-24T13:08:22.159+0000: 63539.253: [CMS-concurrent-mark-start]
>> > >> 2013-01-24T13:08:24.953+0000: 63542.047: [GC 63542.047: [ParNew:
>> > >> 471872K->52251K(471872K), 0.1611970 secs]
>> 9124811K->8831437K(12727104K),
>> > >> 0.1614180 secs] [Times: user=0.37 sys=0.19, real=0.16 secs]
>> > >> 2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-mark:
>> > 4.105/4.268
>> > >> secs] [Times: user=5.36 sys=0.34, real=4.27 secs]
>> > >> 2013-01-24T13:08:26.434+0000: 63543.527:
>> [CMS-concurrent-preclean-start]
>> > >> 2013-01-24T13:08:26.597+0000: 63543.691: [CMS-concurrent-preclean:
>> > >> 0.133/0.163 secs] [Times: user=0.16 sys=0.05, real=0.17 secs]
>> > >> 2013-01-24T13:08:26.597+0000: 63543.691:
>> > >> [CMS-concurrent-abortable-preclean-start]
>> > >> 2013-01-24T13:08:27.401+0000: 63544.495:
>> > >> [CMS-concurrent-abortable-preclean: 0.792/0.804 secs] [Times:
>> user=1.46
>> > >> sys=0.16, real=0.80 secs]
>> > >> 2013-01-24T13:08:27.403+0000: 63544.496: [GC[YG occupancy: 274458 K
>> > >> (471872
>> > >> K)]63544.496: [Rescan (parallel) , 0.0540730 secs]63544.551: [weak
>> refs
>> > >> processing, 0.0001700 secs] [1 CMS-remark: 8779186K(12255232K)]
>> > >> 9053645K(12727104K), 0.0544410 secs] [Times: user=0.20 sys=0.01,
>> > real=0.06
>> > >> secs]
>> > >> 2013-01-24T13:08:27.458+0000: 63544.551: [CMS-concurrent-sweep-start]
>> > >> 2013-01-24T13:08:27.955+0000: 63545.048: [GC 63545.049: [ParNew:
>> > >> 471707K->44566K(471872K), 0.1371770 secs]
>> 9044714K->8701862K(12727104K),
>> > >> 0.1374060 secs] [Times: user=0.35 sys=0.12, real=0.14 secs]
>> > >> 2013-01-24T13:08:29.285+0000: 63546.378: [GC 63546.478: [ParNew:
>> > >> 445714K->52416K(471872K), 0.5626120 secs]
>> 8648805K->8410223K(12727104K),
>> > >> 0.5628610 secs] [Times: user=0.91 sys=0.08, real=0.66 secs]
>> > >> 2013-01-24T13:08:32.308+0000: 63549.401: [GC 63549.402: [ParNew:
>> > >> 471872K->52416K(471872K), 0.2300560 secs]
>> 8247804K->7976043K(12727104K),
>> > >> 0.2302900 secs] [Times: user=0.62 sys=0.17, real=0.23 secs]
>> > >> 2013-01-24T13:08:34.844+0000: 63551.938: [GC 63551.938: [ParNew
>> > (promotion
>> > >> failed): 471872K->471872K(471872K), 0.2788500 secs]63552.217:
>> > >> [CMS2013-01-24T13:08:37.256+0000: 63554.349: [CMS-concurrent-sweep:
>> > >> 8.473/9.798 secs] [*Times: user=15.26 sys=1.11, real=9.80 secs*]
>> > >>
>> > >> What might be advised here - should I up the size to 1G for the new
>> > >> generation ?
>> > >>
>> > >> Thanks
>> > >> Varun
>> > >>
>> > >
>> > >
>> >
>>

Mime
View raw message