hbase-user mailing list archives

From Sandy Pratt <prat...@adobe.com>
Subject RE: FeedbackRe: Suspected memory leak
Date Mon, 05 Dec 2011 21:54:26 GMT
Gaojinchao,

I'm not certain, but this looks a lot like some of the issues I've been dealing with lately
(namely, non-Java-heap memory leakage).

First, -XX:MaxDirectMemorySize doesn't seem to be a solution.  This flag is poorly documented,
and moreover the problem appears to be related to releasing/reclaiming resources rather than
over-allocating them.  See http://bugs.sun.com/bugdatabase/view_bug.do;jsessionid=ae283c11508fb97ede5fe27a1554b?bug_id=4469299
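
To check whether direct (non-heap) buffer memory is what is actually growing, the JVM's buffer pool statistics can be polled from inside the process. The following is a minimal sketch, assuming Java 7 or later (where BufferPoolMXBean was introduced); the class and method names are hypothetical and not part of HBase.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.List;

public class DirectMemoryProbe {

    /** Returns the bytes currently used by the "direct" buffer pool, or -1 if no such pool is reported. */
    public static long directPoolUsedBytes() {
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = directPoolUsedBytes();
        // Allocate a direct buffer the same way NIO socket code does internally.
        ByteBuffer buf = ByteBuffer.allocateDirect(16 * 1024 * 1024);
        long after = directPoolUsedBytes();
        System.out.println("direct pool: before=" + before
                + " after=" + after + " allocated=" + buf.capacity());
    }
}
```

Logging directPoolUsedBytes() periodically alongside heap usage should show whether the direct pool keeps climbing between full GCs while the heap stays flat.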

Second, you may wish to experiment with "-XX:+UseParallelGC -XX:+UseParallelOldGC" rather
than CMS GC.  I have been trying this recently on some of my app servers and hadoop servers,
and it certainly does fix the problem of non-Java heap growth.  The concern with parallel
GC is that full GCs (which are the solution to the non-heap memory problem, it would seem)
take too long.  Personally, I consider this reasoning fallacious, since full GCs are bound to
occur sooner or later, and when using the CMS GC with this bug in effect, they can be fatal
(and even without this bug, CMS uses a single thread for a full GC AFAIK).  The numbers for
parallel GC on a 2G heap are not terrible, even without tuning, even with old processors (max
pause 2.8 sec, avg pause 1 sec for a full GC, with minor collections outnumbering the major
at least 3:1, total overhead 1.3%).  If your application can tolerate a second or two of latency
once in a while, you can switch to parallelOldGC and call it a day.  
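
When comparing collectors this way, it helps to confirm which collector the JVM is actually running and how many collections it has done. A minimal sketch using the standard GarbageCollectorMXBean API follows; the class name GcReport is hypothetical. On HotSpot the parallel collectors typically report as "PS Scavenge" and "PS MarkSweep", and CMS as "ParNew" and "ConcurrentMarkSweep", but treat those names as an assumption and check your own JVM's output.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcReport {

    /** Builds one line per collector: name, collection count, accumulated collection time in ms. */
    public static String summarize() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(" timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(summarize());
    }
}
```

Running this under each flag set (for example, once with "-XX:+UseParallelGC -XX:+UseParallelOldGC" and once with CMS) gives a rough view of the minor-to-major collection ratio and total GC time.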

The fact that some installations are trying to deal with ~24GB heaps sounds like a design
issue to me; HBase and Hadoop are already designed to scale horizontally, and this emphasis
on scaling vertically just because the hardware comes in a certain size sounds misguided.
But not having that hardware, I might be missing something.

Finally, you might look at changing the vm.swappiness parameter in the Linux kernel (I think
it's in sysctl.conf).  I have set swappiness to 0 for my servers, and I'm happy with it. 
I don't know the exact mechanism, but it certainly appears that there's a memory pressure
feedback of some sort going on between the kernel and the JVM.  Perhaps it has to do with
the total commit charge appearing lower (just physical instead of physical + swap) when swappiness
is low.  I'd love to hear from someone with a deep understanding of OS memory allocation about
this.
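
For reference, the setting is usually persisted as a line "vm.swappiness = 0" in /etc/sysctl.conf, and the kernel exposes the live value at /proc/sys/vm/swappiness. A minimal sketch of checking the live value from Java, assuming a Linux host; the class name SwappinessCheck is hypothetical:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SwappinessCheck {

    /** Parses the raw file content (a single integer followed by a newline). */
    public static int parse(String raw) {
        return Integer.parseInt(raw.trim());
    }

    /** Reads and parses the swappiness value from the given path. */
    public static int read(Path path) throws IOException {
        return parse(new String(Files.readAllBytes(path), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("/proc/sys/vm/swappiness");
        if (Files.isReadable(path)) {
            System.out.println("vm.swappiness = " + read(path));
        } else {
            System.out.println(path + " is not readable (not a Linux host?)");
        }
    }
}
```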

Hope this helps,
Sandy


> -----Original Message-----
> From: Gaojinchao [mailto:gaojinchao@huawei.com]
> Sent: Saturday, December 03, 2011 19:58
> To: user@hbase.apache.org; dev@hbase.apache.org
> Cc: Chenjian; wenzaohua
> Subject: FeedbackRe: Suspected memory leak
> 
> Thank you for your help.
> 
> This issue appears to be a configuration problem:
> 1. The HBase client uses the NIO (socket) API, which uses direct memory.
> 2. The default -XX:MaxDirectMemorySize value is equal to the -Xmx value, so if
> no "full gc" occurs, the direct memory can't be reclaimed. Unfortunately, the
> GC configuration parameters of our client don't produce any "full gc".
> 
> This is only a preliminary result. All tests are still running; if we have any
> further results, we will report back.
> Finally, I will update our story in the issue
> https://issues.apache.org/jira/browse/HBASE-4633.
> 
> If our digging is correct, should we set a default value for
> "-XX:MaxDirectMemorySize" to prevent this situation?
> 
> 
> Thanks
> 
> -----Original Message-----
> From: bijieshan [mailto:bijieshan@huawei.com]
> Sent: Friday, December 02, 2011 15:37
> To: dev@hbase.apache.org; user@hbase.apache.org
> Cc: Chenjian; wenzaohua
> Subject: Re: Suspected memory leak
> 
> Thank you all.
> I think it's the same problem as the one in the link provided by Stack, because the
> heap size has stabilized but the non-heap size keeps growing. So I don't think it is
> the CMS GC bug.
> We have also identified the content of the problem memory section; all the
> records contain info like the following:
> "|www.hostname00000000000002087075.comlhggmdjapwpfvkqvxgnskzzydi
> ywoacjnpljkarlehrnzzbpbxc||||||460|||||||||||Agent||||"
> "BBZHtable_UFDR_058,048342220093168-02570"
> ........
> 
> Jieshan.
> 
> -----Original Message-----
> From: Kihwal Lee [mailto:kihwal@yahoo-inc.com]
> Sent: Friday, December 02, 2011 4:20
> To: dev@hbase.apache.org
> Cc: Ramakrishna s vasudevan; user@hbase.apache.org
> Subject: Re: Suspected memory leak
> 
> Adding to the excellent write-up by Jonathan:
> Since finalizer is involved, it takes two GC cycles to collect them.  Due to a
> bug/bugs in the CMS GC, collection may not happen and the heap can grow
> really big.  See
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034 for
> details.
> 
> Koji tried "-XX:-CMSConcurrentMTEnabled" and confirmed that all the socket
> related objects were being collected properly. This option forces the
> concurrent marker to be one thread. This was for HDFS, but I think the same
> applies here.
> 
> Kihwal
> 
> On 12/1/11 1:26 PM, "Stack" <stack@duboce.net> wrote:
> 
> Make sure it's not the issue that Jonathan Payne identified a while
> back:
> https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357#
> St.Ack
