hbase-user mailing list archives

From Bryan Beaudreault <bbeaudrea...@hubspot.com>
Subject Re: How to know the root reason to cause RegionServer OOM?
Date Wed, 13 May 2015 18:04:03 GMT
After moving to the G1GC we were plagued with random OOMs from time to
time.  We always thought it was due to people requesting a big row or group
of rows, but upon investigation noticed that the heap dumps were many GBs
less than the max heap at time of OOM.  If you have this symptom, you may
be running into humongous allocation issues.

I think HBase is especially prone to humongous allocations if you are
batching Puts on the client side, or have large cells.  Googling for
humongous allocations will return a lot of useful results.  I found
http://www.infoq.com/articles/tuning-tips-G1-GC to be especially helpful.

The bottom line is this:

- If an allocation is larger than 50% of the G1 region size, it is a
humongous allocation, which is more expensive to clean up.  We want to avoid
this.
- The default region size is only a few MB, so any big batch puts or scans
can easily be considered humongous.  If you don't set -Xms, it will be even
smaller.
- Make sure you are setting -Xms to the same value as -Xmx.  G1 uses these
values to calculate the default region size.
- Enable -XX:+PrintAdaptiveSizePolicy, which will print out information you
can use for debugging humongous allocations.  Any time an allocation is
considered humongous, it will print the size of the allocation.  For us,
enabling this setting made it immediately obvious there was an issue.
- Using the output of the above, determine your optimal region size.
Region sizes must be a power of 2, and you should generally target around
2000 regions.  So a compromise is sometimes needed, as you don't want to be
*too* far below this number.
- Use -XX:G1HeapRegionSize=xM to set the region size.  Like I said, use a
power of 2.
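
The sizing steps above can be sketched in a few lines of Java. This is only an approximation of the JDK 8 heuristic as I understand it (average of -Xms and -Xmx divided by a target of 2048 regions, rounded down to a power of 2, clamped to 1MB..32MB), not the JVM's actual code. The class name, method names, the 2GB initial heap, and the sample allocation sizes are all made up for illustration.

```java
public class G1RegionSizing {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION = 1 * MB;    // assumed lower bound (JDK 8)
    static final long MAX_REGION = 32 * MB;   // assumed upper bound (JDK 8)
    static final long TARGET_REGIONS = 2048;  // assumed ergonomic target

    /** Rough estimate of the region size G1 picks when -XX:G1HeapRegionSize is unset. */
    static long defaultRegionSize(long xmsBytes, long xmxBytes) {
        long average = (xmsBytes + xmxBytes) / 2;
        long raw = Math.max(average / TARGET_REGIONS, MIN_REGION);
        long powerOfTwo = Long.highestOneBit(raw); // round down to a power of 2
        return Math.min(powerOfTwo, MAX_REGION);
    }

    /** An allocation is humongous when it is larger than half a region. */
    static boolean isHumongous(long allocBytes, long regionBytes) {
        return allocBytes > regionBytes / 2;
    }

    /**
     * Smallest power-of-2 region size for which at least `fraction` of the
     * observed allocation sizes are NOT humongous.
     */
    static long chooseRegionSize(long[] allocSizes, double fraction) {
        for (long region = MIN_REGION; region <= MAX_REGION; region *= 2) {
            long notHumongous = 0;
            for (long size : allocSizes) {
                if (!isHumongous(size, region)) {
                    notHumongous++;
                }
            }
            if (notHumongous >= fraction * allocSizes.length) {
                return region;
            }
        }
        return MAX_REGION; // nothing smaller satisfied the target
    }

    public static void main(String[] args) {
        long gb = 1024 * MB;
        // Hypothetical heaps: -Xms equal to -Xmx versus a small initial heap.
        System.out.println(defaultRegionSize(25 * gb, 25 * gb) / MB + "MB");
        System.out.println(defaultRegionSize(2 * gb, 25 * gb) / MB + "MB");

        // Hypothetical allocation sizes read off the GC log.
        long[] observed = {3 * MB, 3 * MB + MB / 4, 3 * MB + MB / 2, 3 * MB + 3 * MB / 4, 5 * MB};
        System.out.println(chooseRegionSize(observed, 0.75) / MB + "MB");
    }
}
```

With a small initial heap the estimate drops, which is the reason for pinning -Xms to -Xmx before deciding whether an explicit -XX:G1HeapRegionSize is still needed.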

For us, we were getting a lot of allocations around 3-5MB.  The largest
percentage were around 3 to less than 4MB.  On our 25GB regionservers, we
set the region size to 8MB, so that the vast majority of allocations
fell under 50% of 8MB.  The remaining humongous allocations were low enough
volume to work fine.  On our 32GB regionservers, we set this to 16MB and
completely eliminated humongous allocations.
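
As a quick sanity check of those numbers, here is a small Java sketch that computes the region count and the humongous threshold for each setup. It assumes "25GB" and "32GB" are the full heap sizes (-Xmx), which the numbers above do not state explicitly. The class and method names are made up for illustration.

```java
public class RegionCountCheck {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    /** Number of regions a heap of the given size is divided into. */
    static long regionCount(long heapBytes, long regionBytes) {
        return heapBytes / regionBytes;
    }

    /** Allocations larger than this many bytes are humongous. */
    static long humongousThreshold(long regionBytes) {
        return regionBytes / 2;
    }

    public static void main(String[] args) {
        long[][] setups = {
            {25 * GB, 8 * MB},   // assumed heap size, region size
            {32 * GB, 16 * MB},
        };
        for (long[] setup : setups) {
            System.out.println(setup[0] / GB + "GB heap, "
                + setup[1] / MB + "MB regions: "
                + regionCount(setup[0], setup[1]) + " regions, humongous above "
                + humongousThreshold(setup[1]) / MB + "MB");
        }
    }
}
```

Under that assumption the 25GB heap ends up with 3200 regions and a 4MB threshold, and the 32GB heap with 2048 regions and an 8MB threshold, so both stay at or above the roughly 2000-region target.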

Since the above tuning, G1GC has worked great for us and we have not had
any OOMs in a couple months.

Hope this helps.

On Wed, May 13, 2015 at 10:37 AM, Stack <stack@duboce.net> wrote:

> On Tue, May 12, 2015 at 7:41 PM, David chen <c77_cn@163.com> wrote:
>
> > A RegionServer was killed because of OutOfMemory (OOM). Although the killed
> > process can be seen in the Linux message log, I still have the following
> > two problems:
> > 1. How do I find the root cause of the OOM?
> >
>
> Start the regionserver with -XX:+HeapDumpOnOutOfMemoryError, specifying a
> location for the heap to be dumped to on OOME (See
>
> http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
> ).
> Remove the -XX:OnOutOfMemoryError because now it will conflict with
> HeapDumpOnOutOfMemoryError.
> Then open the heap dump in Java Mission Control, JProfiler, etc., to
> see how the retained objects are associated.
>
>
> > 2. When a RegionServer encounters OOM, why can't it free some of the
> > memory it occupies? If it could, the killer would not be needed.
> >
>
> We require a certain amount of memory to process a particular work load. If
> the allocation is insufficient, we OOME. Once an application has OOME'd, its
> state goes indeterminate. We opt to kill the process rather than hang
> around in a damaged state.
>
> Enable GC logging to figure out why in particular you OOME'd (There are
> different categories of OOME [1]). We may have a sufficient memory
> allocation but an incorrectly tuned GC or a badly specified set of heap
> args may bring on OOME.
>
> St.Ack
> 1.
>
> http://www.javacodegeeks.com/2013/08/understanding-the-outofmemoryerror.html
>
>
> > Any ideas would be appreciated!
>
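
For the heap dump advice in the quoted reply, one way to confirm that a running JVM really has heap dumps on OOME enabled is to read the flag back through the HotSpot diagnostic MXBean. This is a minimal sketch that assumes a HotSpot-based JVM; on other JVMs the bean may be unavailable, in which case the sketch reports "unknown". The class and method names are made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDumpFlagCheck {
    /** Returns the current value of a HotSpot VM option, or "unknown" if it cannot be read. */
    static String vmOption(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unknown";
            }
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when the bean or the option does not exist on this JVM.
            return "unknown";
        }
    }

    public static void main(String[] args) {
        // "true" only if the JVM was started with -XX:+HeapDumpOnOutOfMemoryError.
        System.out.println("HeapDumpOnOutOfMemoryError = " + vmOption("HeapDumpOnOutOfMemoryError"));
        // Empty unless -XX:HeapDumpPath=... was given.
        System.out.println("HeapDumpPath = " + vmOption("HeapDumpPath"));
    }
}
```

Running the same check inside the regionserver process would need it to be wired into something that executes there, which is left out here.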
