Yeah, I know a heap dump would work, but I'm a little worried about dumping 22GB of data on a production server, since it could take quite a while and make the recovery slower.

On Wed, Jan 7, 2015 at 10:51 AM, 谢良 wrote:

> Could you retry with "-XX:+HeapDumpOnOutOfMemoryError"?
> The heap dump will make things clear.
> ________________________________________
> From: Shuai Lin
> Sent: January 6, 2015 19:32
> To: user@hbase.apache.org
> Subject: Region Server OutOfMemory Error
>
> Hi all,
>
> We have an HBase cluster of 5 region servers, each hosting 60+
> regions.
>
> But under heavy load the region servers crash with an OOME now and then:
>
> #
> # java.lang.OutOfMemoryError: Java heap space
> # -XX:OnOutOfMemoryError="kill -9 %p"
> #   Executing /bin/sh -c "kill -9 16820"...
>
> We have the max heap size set to 22GB (-Xmx22528m) for each RS, and use
> G1GC (-XX:+UseG1GC). To debug the problem we have turned on the JVM GC
> log. The last few lines of the GC log before each crash always look like
> this:
>
> 2015-01-06T11:10:19.087+0000: 5035.720: [Full GC 7122M->5837M(21G),
> 0.8867660 secs]
>    [Eden: 1024.0K(7278.0M)->0.0B(8139.0M) Survivors: 68.0M->0.0B Heap:
> 7122.7M(22.0G)->5837.2M(22.0G)]
>  [Times: user=1.42 sys=0.00, real=0.89 secs]
> 2015-01-06T11:10:19.976+0000: 5036.608: [Full GC 5837M->5836M(21G),
> 0.6378260 secs]
>    [Eden: 0.0B(8139.0M)->0.0B(8139.0M) Survivors: 0.0B->0.0B Heap:
> 5837.2M(22.0G)->5836.5M(22.0G)]
>  [Times: user=0.93 sys=0.00, real=0.63 secs]
>
> From the last line I see the heap occupies only 5837MB, and the capacity is
> 22GB, so how can the OOM happen? Or is my interpretation of the GC log
> wrong?
>
> I read some articles and only got a basic understanding of G1GC. I've tried
> tools like GCViewer, but none gives me a useful explanation of the details of
> the GC log.
>
>
> Regards,
> Shuai
>
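
For anyone who wants to check the numbers in the quoted log without a GC viewer, here is a minimal sketch that pulls the before/after occupancy and capacity out of a "Full GC" summary line. It assumes the line format is exactly as in the log quoted above (other JVM versions or logging flags may print it differently), and the names parse_full_gc and to_mb are made up for illustration, not part of any tool.

```python
import re

# Matches the summary part of a G1 "Full GC" line as it appears in the
# quoted log, e.g. "[Full GC 7122M->5837M(21G), 0.8867660 secs]".
# Assumption: this format is taken from the log in this thread only.
FULL_GC_RE = re.compile(
    r"\[Full GC (?P<before>\d+)(?P<bu>[KMG])->(?P<after>\d+)(?P<au>[KMG])"
    r"\((?P<cap>\d+)(?P<cu>[KMG])\),\s*(?P<secs>[\d.]+) secs\]"
)

# Conversion factors from the unit suffix to megabytes.
_UNIT_TO_MB = {"K": 1.0 / 1024, "M": 1.0, "G": 1024.0}


def to_mb(value, unit):
    """Convert a number with a K/M/G suffix to megabytes."""
    return float(value) * _UNIT_TO_MB[unit]


def parse_full_gc(line):
    """Return heap figures from a Full GC summary line, or None if no match."""
    m = FULL_GC_RE.search(line)
    if m is None:
        return None
    return {
        "before_mb": to_mb(m.group("before"), m.group("bu")),
        "after_mb": to_mb(m.group("after"), m.group("au")),
        "capacity_mb": to_mb(m.group("cap"), m.group("cu")),
        "pause_secs": float(m.group("secs")),
    }


# First Full GC line from the quoted log, joined onto one line.
sample = (
    "2015-01-06T11:10:19.087+0000: 5035.720: "
    "[Full GC 7122M->5837M(21G), 0.8867660 secs]"
)
info = parse_full_gc(sample)
occupancy = info["after_mb"] / info["capacity_mb"]
print(info)
print("heap occupied after Full GC: {:.1%}".format(occupancy))
```

Run over the last Full GC lines before each crash, this only confirms what the log already shows (heap use after collection is well below capacity). It does not explain the OOME by itself.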