hbase-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Garbage collection issues
Date Mon, 29 Nov 2010 15:51:04 GMT
On Mon, Nov 29, 2010 at 6:33 AM, Sean Sechrist <ssechrist@gmail.com> wrote:

> Just an update, in case anyone's interested in our performance numbers:
>
> With the 512MB newSize, our minor GC pauses are generally less than .05s,
> although we see a fair amount get up around .15s. We still see some
> promotion failures causing full pauses over a minute occasionally. But we
> have a script running to automatically restart our regionservers if that
> happens. Things seem to be going ok right now.
>
> On a related note: If a region server encounters the GC pause of death,
> will all of the writes in its memstore at the time be lost (without using
> the WAL)? I think they would be.
>

Yep, they would be - that's why the WAL is important.

One thing I've been thinking about is a way to have an HBase-orchestrated
constant rolling System.gc(). If we can detect heap fragmentation before it
causes a long pause, we can shed regions gracefully, run System.gc(), and
then pick them up again. A little tricky, but it should solve these issues
once and for all, especially on big clusters where a constant rolling
restart isn't a big deal compared to total capacity.
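For what it's worth, that "shed, collect, reclaim" loop could be sketched as an operator script. Everything here is hypothetical: `shed_regions.sh` and `reclaim_regions.sh` are made-up helper names, not shipped HBase tooling. The one real detail is that `jmap -histo:live` forces a full collection on the target JVM as a side effect, which is one way to get the compacting GC without touching application code. The sketch only echoes its plan (a dry run), so it is safe to run anywhere:

```shell
#!/bin/sh
# Hypothetical sketch of the rolling-GC idea from this thread. The two
# helper scripts are assumptions, not existing HBase tools; jmap is real,
# and -histo:live does trigger a full GC on the target JVM.
RS_PID=${RS_PID:-12345}   # placeholder regionserver pid

rolling_gc() {
  # Dry run: print each step instead of executing it.
  echo "would run: ./shed_regions.sh $(hostname)"      # 1. move regions off
  echo "would run: jmap -histo:live $RS_PID"           # 2. force a full GC
  echo "would run: ./reclaim_regions.sh $(hostname)"   # 3. take regions back
}
rolling_gc
```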

-Todd


> On Mon, Nov 29, 2010 at 4:49 AM, Friso van Vollenhoven <fvanvollenhoven@xebia.com> wrote:
>
> > On a slightly related note, we've been running with G1 with default
> > settings on a 16GB heap for some weeks now. It's never given us trouble,
> > so I didn't do any real analysis on the GC times, just some eyeballing.
> >
> > I looked at the longer GCs (everything longer than 1 second: grep -C 5 -i
> > 'real=[1-9]' gc-hbase.log), which gives a list of full GCs all around 10s.
> > The minor pauses all appear to be around 0.2s. I can pastebin a GC log if
> > anyone is interested in the G1 behavior.
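Friso's grep can be turned into a self-contained sketch. The three log lines below are fabricated, but they follow the `-XX:+PrintGCDetails` format quoted elsewhere in this thread; the pattern is quoted so the shell can't glob the brackets:

```shell
# Flag any GC whose wall-clock ("real") time reached 1 second or more.
# The sample log is fabricated; point the grep at your real gc-hbase.log.
cat > gc-sample.log <<'EOF'
0.341: [GC 0.341: [ParNew: 1024K->128K(1152K), 0.0042310 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
61297.449: [GC 61297.449: [ParNew (promotion failed): 57425K->57880K(59008K), 0.1880950 secs] [Times: user=0.30 sys=0.01, real=0.19 secs]
61302.402: [Full GC (concurrent mode failure): 10126729K->5760080K(13246464K), 91.2530340 secs] [Times: user=24.47 sys=1.07, real=91.44 secs]
EOF

long_pauses() {
  # real=[1-9] matches only pauses whose seconds field starts with 1-9,
  # i.e. one second or longer.
  grep -i 'real=[1-9]' gc-sample.log
}
long_pauses
```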
> >
> >
> >
> > Friso
> >
> >
> >
> > On 29 nov 2010, at 09:47, Ryan Rawson wrote:
> >
> > > I'd love to hear the kinds of minor pauses you get... left alone to
> > > its own devices, 1.6.0_14 or so wants to grow the new gen to 1GB if
> > > your xmx is large enough, and at that size you are looking at 800ms
> > > minor pauses!
> > >
> > > It's a tough subject.
> > >
> > > -ryan
> > >
> > > On Wed, Nov 24, 2010 at 12:52 PM, Sean Sechrist <ssechrist@gmail.com> wrote:
> > >> Interesting. The settings we tried earlier today slowed jobs
> > >> significantly, but no failures (yet). We're going to try the 512MB
> > >> newSize and 60% CMSInitiatingOccupancyFraction. One-second pauses here
> > >> and there would be OK for us.... we just want to avoid the long pauses
> > >> right now. We'll also do what we can to avoid swapping. The Ganglia
> > >> metrics are on there.
> > >>
> > >> Thanks,
> > >> Sean
> > >>
> > >> On Wed, Nov 24, 2010 at 3:34 PM, Todd Lipcon <todd@cloudera.com> wrote:
> > >>
> > >>> On Wed, Nov 24, 2010 at 7:01 AM, Sean Sechrist <ssechrist@gmail.com> wrote:
> > >>>
> > >>>> Hey guys,
> > >>>>
> > >>>> I just want to get an idea about how everyone avoids these long GC
> > >>>> pauses that cause regionservers to die.
> > >>>>
> > >>>> What kind of java heap and garbage collection settings do you use?
> > >>>>
> > >>>> What do you do to make sure that the HBase vm never uses swap? I
> > >>>> have heard turning off swap altogether can be dangerous, so right
> > >>>> now we have the setting vm.swappiness=0. How do you tell if it's
> > >>>> using swap? On Ganglia, we see the "CPU wio" metric at around 4.5%
> > >>>> before one of our crashes. Is that high?
> > >>>>
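One hedged way to answer the "how do you tell if it's swapping" question is to watch the si/so (swap-in/swap-out) columns of `vmstat`: nonzero values there mean the kernel is actively paging. The sample output below is fabricated so the sketch is self-contained; on a live node you would pipe `vmstat 1 5` in instead:

```shell
# Sketch: decide whether a box is actively swapping from vmstat's si/so
# columns (fields 7 and 8 in this layout). Sample output is fabricated.
sample='procs -----------memory---------- ---swap-- -----io----
 r  b   swpd   free   buff  cache   si   so    bi    bo
 1  0 204800  81920  12288 614400  512  768   120   300
 2  0 204800  80120  12288 614200  256    0   110   280'

swapping() {
  printf '%s\n' "$sample" | awk '
    NR > 2 && ($7 > 0 || $8 > 0) { hit = 1 }     # any nonzero si/so
    END { if (hit) print "swapping"; else print "idle" }'
}
swapping
```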
> > >>>> To try to avoid using too much memory, is reducing the memstore
> > >>>> upper/lower limit, or the block cache size, a good idea? Should we
> > >>>> just tune down HBase's total heap to try to avoid swap?
> > >>>>
> > >>>> In terms of our specific problem:
> > >>>>
> > >>>> We seem to keep running into garbage collection pauses that cause
> > >>>> the regionservers to die. We have a mix of some random read jobs, as
> > >>>> well as a few full-scan jobs (~1.5 billion rows, 800-900GB of data,
> > >>>> 1500 regions), and we are always inserting data. We would rather
> > >>>> sacrifice a little speed for stability, if that means anything. We
> > >>>> have 7 nodes (RS + DN + TT) with 12GB max heap given to HBase, and
> > >>>> 24GB memory total.
> > >>>>
> > >>>> We were using the following garbage collection options:
> > >>>> -XX:+UseConcMarkSweepGC -XX:NewSize=64m -XX:MaxNewSize=64m
> > >>>> -XX:CMSInitiatingOccupancyFraction=75
> > >>>>
> > >>>> After looking at http://wiki.apache.org/hadoop/PerformanceTuning,
> > >>>> we are trying to lower NewSize/MaxNewSize to 6m as well as reducing
> > >>>> CMSInitiatingOccupancyFraction to 50.
> > >>>>
> > >>>
> > >>> Rather than reducing the new size, you should consider increasing
> > >>> new size if you're OK with higher latency but fewer long GC pauses.
> > >>>
> > >>> GC is a complicated subject, but here are a few rules of thumb:
> > >>>
> > >>> - A larger young generation means that the young GC pauses, which
> > >>> are stop-the-world, will take longer. In my experience it's somewhere
> > >>> around 1 second per GB of new size. So, if you're OK with periodic
> > >>> 1-second pauses, a large (1GB) new size should be fine.
> > >>> - A larger young generation also means that less data will get
> > >>> tenured to the old generation. This means that the old generation
> > >>> will have to collect less often and also that it will become less
> > >>> fragmented.
> > >>> - In HBase, the long (45-second+) pauses generally happen when
> > >>> promotion fails due to heap fragmentation in the old generation. So,
> > >>> it falls back to a stop-the-world compacting collection, which takes
> > >>> a long time.
> > >>>
> > >>> So, in general, a large young gen will reduce the frequency of
> > >>> super-long pauses, but will increase the frequency of shorter pauses.
> > >>>
> > >>> It sounds like you may be OK with longer young gen pauses, so maybe
> > >>> consider a new size of 512M with your 12G total heap?
> > >>>
> > >>> I also wouldn't tune CMSInitiatingOccupancyFraction below 60% - that
> > >>> will cause CMS to always be running, which isn't that efficient.
> > >>>
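Pulling Todd's numbers together, an hbase-env.sh fragment might look like the following. The heap size, new size, and occupancy fraction come straight from this thread; `-XX:+UseCMSInitiatingOccupancyOnly` is my addition (without it the JVM treats the 60% figure only as a starting hint), and the GC-logging flags are there so you can verify the result against your own logs. Treat all of it as a starting point, not a prescription.

```shell
# hbase-env.sh fragment sketching the settings discussed in this thread:
# 12 GB heap, 512 MB new size, CMS starting at 60% old-gen occupancy.
export HBASE_OPTS="-Xmx12g \
  -XX:+UseConcMarkSweepGC \
  -XX:NewSize=512m -XX:MaxNewSize=512m \
  -XX:CMSInitiatingOccupancyFraction=60 \
  -XX:+UseCMSInitiatingOccupancyOnly \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
```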
> > >>> -Todd
> > >>>
> > >>>
> > >>>>
> > >>>> We see messages like this in our GC logs:
> > >>>>
> > >>>> 2010-11-23T14:56:01.383-0500: 61297.449: [GC 61297.449: [ParNew
> > >>>> (promotion failed): 57425K->57880K(59008K), 0.1880950 secs]61297.637:
> > >>>> [CMS2010-11-23T14:56:06.336-0500: 61302.402: [CMS-concurrent-mark:
> > >>>> 8.844/17.169 secs] [Times: user=75.16 sys=1.34, real=17.17 secs]
> > >>>> (concurrent mode failure): 10126729K->5760080K(13246464K), 91.2530340
> > >>>> secs] 10181961K->5760080K(13305472K), [CMS Perm :
> > >>>> 20252K->20241K(33868K)], 91.4413320 secs] [Times: user=24.47
> > >>>> sys=1.07, real=91.44 secs]
> > >>>>
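That log excerpt contains the two signatures worth alerting on: "promotion failed" (ParNew couldn't tenure objects into a fragmented old gen) and "concurrent mode failure" (CMS lost the race and fell back to a stop-the-world compaction). A minimal scan, run here over a fabricated copy of those lines, might be:

```shell
# Count the two CMS failure signatures in a GC log. The excerpt is a
# fabricated copy of the log quoted in this thread.
cat > gc-excerpt.log <<'EOF'
2010-11-23T14:56:01.383-0500: 61297.449: [GC 61297.449: [ParNew (promotion failed): 57425K->57880K(59008K), 0.1880950 secs]
2010-11-23T14:56:06.336-0500: 61302.402: [CMS-concurrent-mark: 8.844/17.169 secs] [Times: user=75.16 sys=1.34, real=17.17 secs]
(concurrent mode failure): 10126729K->5760080K(13246464K), 91.2530340 secs]
EOF

gc_failures() {
  # -c counts matching lines; either signature counts as a failure event.
  grep -c -e 'promotion failed' -e 'concurrent mode failure' gc-excerpt.log
}
gc_failures
```

Feeding a counter like this into a monitoring check (or the restart script Sean mentions) would catch the failure mode before the 90-second pause kills the session with ZooKeeper.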
> > >>>> There's a lot of questions there, but I definitely appreciate any
> > >>>> advice or input anybody else has. Thanks so much!
> > >>>>
> > >>>> -Sean
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Todd Lipcon
> > >>> Software Engineer, Cloudera
> > >>>
> > >>
> >
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera
