hbase-user mailing list archives

From Sandy Pratt <prat...@adobe.com>
Subject RE: gc pause killing regionserver
Date Tue, 06 Mar 2012 18:56:54 GMT
> > Why use the following in your config?
> I use these GC tuning options because I found them somewhere on the
> mailing list, advertised as generally advisable GC options. I think it would be nice
> if the HBase ref guide recommended default GC settings; I can imagine that they
> are different for different heap sizes.
 [Sandy Pratt] 

Fair enough.  IIRC hbase-env.sh comes with "-ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
out of the box, so I took that to be the default.  It's my understanding that args like the
"max GC pause" one are hints to the ergonomics engine more than anything else.  I believe
the intention is that you can specify performance characteristics and let the JVM work out
generation ratios and such[1].  There's nothing wrong with using them; I was just curious about
the reasoning behind them.

1: http://docs.oracle.com/javase/1.5.0/docs/guide/vm/gc-ergonomics.html
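
To make that concrete, here's a rough sketch of what I mean.  This is from memory, not the
literal shipped file, and the ergonomics hints apply mainly to the throughput (parallel)
collector, so treat it as illustrative only:

    # hbase-env.sh (illustrative; check your distribution for the actual defaults)
    export HBASE_OPTS="-ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"

    # Ergonomics-style hints, instead of hand-tuning generation ratios:
    # export HBASE_OPTS="$HBASE_OPTS -XX:MaxGCPauseMillis=100 -XX:GCTimeRatio=19"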

> > It's possible that you actually are in swap due to uncollected
> > off-heap memory allocations.  I doubt that even severe fragmentation
> > on the heap would cause that kind of slowdown
> Munin does show some minor swapping (and memory overcommit), but
> considering the amount of free space left (os disk cache) and the fact that
> swappiness is set to zero, I was under the impression that it was harmless.
> On second thought, I will dig deeper into this.
[Sandy Pratt] 
I bring this up because I've had real problems with it, and my initial intuition was dead
wrong.  My advice is to look at the process size reported by top at various points in your
execution.  In some situations, it's not uncommon to see JVM processes with 2 GB heaps have
6-7 GB virtual sizes and bloated resident sizes as well.  That's not necessarily a problem
in all cases.  For example, if a process has a large virtual size because of a bunch of files
mapped in read-only mode, that's not a big deal.  I found that in many of my boxes, the oversized
memory footprint was correlated with crashes due to long GC.  My best guess was that the problem
was due to direct-allocated byte buffers not being cleaned up often enough, and when they
finally were, I would be effectively GCing in swap, which is a death sentence.
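
If you want a quick way to check, something like the following works (the pgrep pattern is
just an example and assumes one regionserver JVM per box):

    # Virtual and resident size of the regionserver JVM
    top -b -n 1 -p $(pgrep -f HRegionServer | head -1) | tail -2

    # Or with ps; compare RSS against the -Xmx you configured, and keep an eye on swap
    ps -o pid,vsz,rss,cmd -p $(pgrep -f HRegionServer | head -1)
    free -m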

I, and a few other people I've spoken to, have had some success with '-XX:+UseParallelOldGC
-XX:MaxDirectMemorySize=128m'.  I think the first argument is more important than the second.
This goes against the HBase defaults, so take it with a grain of salt.
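
If you want to try it, something along these lines in hbase-env.sh; this is just a sketch,
and I'd test it on one server before rolling it out:

    # Experimental, not the stock HBase settings: throughput collector plus a cap
    # on direct (off-heap) byte buffer allocations.  Drop the CMS flags if you go
    # this route, since the two collectors don't mix.
    export HBASE_OPTS="-XX:+UseParallelOldGC -XX:MaxDirectMemorySize=128m"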

I don't know for sure if you have the same problem, but if you suspect something, it can't
hurt to paste some output from top here.

Which JVM are you running?
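
For example, the output of

    java -version

from one of the regionserver boxes would tell us a lot (vendor, version, and 32- vs 64-bit).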

> > If I'm reading your log correctly, you have about 2.5 GB of heap, right?
> That's right, 2100100100 bytes to be exact.
> > Does this server have the same load as the other ones?
> Yes. They all run about the same amount of regions and generally have the
> same load. The hardware is (should be) identical.
[Sandy Pratt] 

Maybe it tends to be serving a hotspot region each time.  ISTR there was an HBase feature
to reassign regions to the same server each time; that could be what's happening.
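
A rough way to eyeball that is the shell status command; the exact fields vary by version,
but it should show request and region counts per server:

    hbase shell
    hbase(main):001:0> status 'simple'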
