hbase-user mailing list archives

From Otis Gospodnetic <otis.gospodne...@gmail.com>
Subject Re: Region Server OutOfMemory Error
Date Tue, 06 Jan 2015 20:06:00 GMT
Hi,

The first thing I'd want to know is which memory pool is getting filled.
There are several in the JVM.
Here's an example: https://apps.sematext.com/spm-reports/s/kZgBWLsJRd (this
one is actually from an HBase cluster).  If you see any of the lines at
100%, that's potential trouble.  If a line stays at 100%, it's trouble (i.e. an
OOM is about to happen).  If it's constantly close to 100%, that's an OOM
waiting to happen, and you should check your GC and CPU graphs to see how much
time the JVM is spending on GC.
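
If you don't have graphs handy, the time spent in GC can also be read from
the JVM's standard GarbageCollectorMXBeans. Below is a minimal sketch, not
anything HBase-specific: the class name GcOverhead is made up for
illustration, and it measures the JVM it runs in. For a region server you
would read the same beans remotely over JMX, assuming JMX is enabled there.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (name is illustrative): reports how much of the
// JVM's uptime has been spent in garbage collection.
final class GcOverhead {

    /** Total milliseconds spent in GC across all collectors since JVM start. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Accumulated GC time divided by JVM uptime. */
    static double gcFraction() {
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis <= 0 ? 0.0 : (double) totalGcMillis() / uptimeMillis;
    }

    public static void main(String[] args) {
        System.out.printf("GC time: %d ms (%.2f%% of uptime)%n",
                totalGcMillis(), 100.0 * gcFraction());
    }
}
```

A fraction that keeps climbing while the pools stay near 100% is the
pattern described above.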

Once you know which pool is problematic you'll be better informed and may
be able to increase the size of just that pool.
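
To see the individual pools without a monitoring tool, the JVM exposes them
through the standard MemoryPoolMXBeans. The following is a rough sketch under
the same assumptions as any in-process check: the class name PoolUsage is
hypothetical, it reports on the JVM it runs in, and the pool names it prints
depend on the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (name is illustrative): lists every JVM memory pool
// with its current usage, so the pool sitting near 100% stands out.
final class PoolUsage {

    /** One formatted line per memory pool known to this JVM. */
    static List<String> snapshot() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            long used = usage.getUsed();
            long max = usage.getMax(); // -1 when no maximum is defined
            String maxText = max > 0 ? (max >> 20) + "M" : "undefined";
            String pctText = max > 0
                    ? String.format("%.1f%%", 100.0 * used / max)
                    : "n/a";
            lines.add(String.format("%s [%s]: used=%dM max=%s (%s)",
                    pool.getName(), pool.getType(), used >> 20, maxText, pctText));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : snapshot()) {
            System.out.println(line);
        }
    }
}
```

Some pools report no maximum, in which case a percentage is not meaningful
and the sketch prints "n/a" instead.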

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Tue, Jan 6, 2015 at 6:32 AM, Shuai Lin <linshuai2012@gmail.com> wrote:

> Hi all,
>
> We have an HBase cluster of 5 region servers, each hosting 60+
> regions.
>
> But under heavy load the region servers crash with an OOME now and then:
>
> #
> # java.lang.OutOfMemoryError: Java heap space
> # -XX:OnOutOfMemoryError="kill -9 %p"
> #   Executing /bin/sh -c "kill -9 16820"...
>
> We have max heap size set to 22GB (-Xmx22528m) for each RS, and use
> G1GC (-XX:+UseG1GC). To debug the problem we have turned on the JVM GC
> log.  The last few lines of the GC log before each crash always look like
> this:
>
> 2015-01-06T11:10:19.087+0000: 5035.720: [Full GC 7122M->5837M(21G),
> 0.8867660 secs]
>    [Eden: 1024.0K(7278.0M)->0.0B(8139.0M) Survivors: 68.0M->0.0B Heap:
> 7122.7M(22.0G)->5837.2M(22.0G)]
>  [Times: user=1.42 sys=0.00, real=0.89 secs]
> 2015-01-06T11:10:19.976+0000: 5036.608: [Full GC 5837M->5836M(21G),
> 0.6378260 secs]
>    [Eden: 0.0B(8139.0M)->0.0B(8139.0M) Survivors: 0.0B->0.0B Heap:
> 5837.2M(22.0G)->5836.5M(22.0G)]
>  [Times: user=0.93 sys=0.00, real=0.63 secs]
>
> From the last line I see the heap only occupies 5837MB, and the capacity is
> 22GB, so how can the OOM happen? Or is my interpretation of the GC log
> wrong?
>
> I read some articles and only got a basic understanding of G1GC. I've tried
> tools like GCViewer, but none gives me a useful explanation of the details of
> the GC log.
>
>
> Regards,
> Shuai
>
