hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: FeedbackRe: Suspected memory leak
Date Mon, 05 Dec 2011 03:39:25 GMT
Jinchao:
Since we found the workaround, can you summarize the following statistics
on HBASE-4633?

Thanks

2011/12/4 Gaojinchao <gaojinchao@huawei.com>

> Yes, I have tested it; the system is fine.
> Roughly every hour, a full GC is triggered.
> 10022.210: [Full GC (System) 10022.210: [Tenured:
> 577566K->257349K(1048576K), 1.7515610 secs] 9651924K->257349K(14260672K),
> [Perm : 19161K->19161K(65536K)], 1.7518350 secs] [Times: user=1.75
> sys=0.00, real=1.75 secs]
> .........
>
> .........
> 13532.930: [GC 13532.931: [ParNew: 12801558K->981626K(13212096K),
> 0.1414370 secs] 13111752K->1291828K(14260672K), 0.1416880 secs] [Times:
> user=1.90 sys=0.01, real=0.14 secs]
> 13624.630: [Full GC (System) 13624.630: [Tenured:
> 310202K->175378K(1048576K), 1.9529280 secs] 11581276K->175378K(14260672K),
> [Perm : 19225K->19225K(65536K)], 1.9531660 secs]
>           [Times: user=1.94 sys=0.00, real=1.96 secs]
>
> 7543 root      20   0 17.0g  15g 9892 S    0 32.9   1184:34 java
> 7543 root      20   0 17.0g  15g 9892 S    1 32.9   1184:34 java
>
> -----Original Message-----
> From: Ted Yu [mailto:yuzhihong@gmail.com]
> Sent: December 5, 2011 9:06
> To: dev@hbase.apache.org
> Subject: Re: FeedbackRe: Suspected memory leak
>
> Can you try specifying -XX:MaxDirectMemorySize with a moderate value and
> see if the leak gets under control?
>
> Thanks
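
As a rough illustration of the cap Ted suggests: assuming the client JVM
picks up HBASE_OPTS from hbase-env.sh, and with 1g as a purely illustrative
value, the setting would look like

    export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize=1g"

Once the cap is hit, HotSpot triggers a System.gc() to free unreferenced
direct buffers before allocating more, and throws an OutOfMemoryError
("Direct buffer memory") if that is not enough, so the leak surfaces quickly
instead of silently growing the process RSS.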
>
> 2011/12/4 Gaojinchao <gaojinchao@huawei.com>
>
> > I have attached the stack trace in
> > https://issues.apache.org/jira/browse/HBASE-4633.
> > I will update our story.
> >
> >
> > -----Original Message-----
> > From: Ted Yu [mailto:yuzhihong@gmail.com]
> > Sent: December 5, 2011 7:37
> > To: dev@hbase.apache.org; lars hofhansl
> > Subject: Re: FeedbackRe: Suspected memory leak
> >
> > I looked through TRUNK and 0.90 code but didn't find
> > HBaseClient.Connection.setParam().
> > The method should be sendParam().
> >
> > When I was in China I tried to access Jonathan's post but wasn't able to.
> >
> > If Jinchao's stack trace resonates with the one Jonathan posted, we
> > should consider using netty for HBaseClient.
> >
> > Cheers
> >
> > On Sun, Dec 4, 2011 at 1:12 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:
> >
> > > I think HBASE-4508 is unrelated.
> > > The "connections" I am referring to are HBaseClient.Connection objects
> > > (not HConnections).
> > > It turns out that HBaseClient.Connection.setParam is actually called
> > > directly by the client threads, which means we can get an unlimited
> > > number of DirectByteBuffers (until we get a full GC).
> > >
> > > The JDK will cache up to 3 direct buffers per thread, each sized to
> > > serve the IO. So sending large requests from many threads will lead
> > > to an OOM.
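
A minimal sketch of the mechanism Lars describes, assuming a Unix-like
machine (it writes to /dev/null) and Java 7+ for BufferPoolMXBean; the class
name, thread count and 8 MB request size are hypothetical, chosen only to
make the per-thread direct-buffer cache visible:

    import java.io.IOException;
    import java.lang.management.BufferPoolMXBean;
    import java.lang.management.ManagementFactory;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class DirectBufferPerThreadDemo {
        public static void main(String[] args) throws Exception {
            final int requestSize = 8 * 1024 * 1024;  // pretend 8 MB RPC payload
            Thread[] workers = new Thread[50];
            for (int i = 0; i < workers.length; i++) {
                workers[i] = new Thread(new Runnable() {
                    public void run() {
                        try {
                            FileChannel ch = FileChannel.open(
                                Paths.get("/dev/null"), StandardOpenOption.WRITE);
                            // Writing a heap buffer makes the NIO layer copy it
                            // into a temporary direct buffer sized to the whole
                            // write, then cache it in a thread-local (up to 3
                            // buffers per thread).
                            ByteBuffer heap = ByteBuffer.allocate(requestSize);
                            ch.write(heap);
                            ch.close();
                        } catch (IOException e) {
                            e.printStackTrace();
                        }
                    }
                });
                workers[i].start();
            }
            for (Thread t : workers) {
                t.join();
            }
            // The direct memory stays reserved until the cached buffers are GC'd.
            for (BufferPoolMXBean pool
                    : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
                System.out.println(pool.getName() + ": count=" + pool.getCount()
                    + ", used=" + pool.getMemoryUsed() + " bytes");
            }
        }
    }

Each worker ends up with a direct buffer as large as its biggest write
pinned in a thread-local cache, so the direct footprint scales with
(number of calling threads x largest request), which matches the growing
RSS in the top output above.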
> > >
> > > I think that was a related thread that Stack forwarded a while back
> > > from the asynchbase mailing lists.
> > >
> > > Jinchao, could you add a text version (not a png image, please :-) ) of
> > > this to the jira?
> > >
> > >
> > > -- Lars
> > >
> > >
> > >
> > > ----- Original Message -----
> > > From: Ted Yu <yuzhihong@gmail.com>
> > > To: dev@hbase.apache.org; lars hofhansl <lhofhansl@yahoo.com>
> > > Cc: Gaojinchao <gaojinchao@huawei.com>; Chenjian <jean.chenjian@huawei.com>;
> > > wenzaohua <wenzaohua@huawei.com>
> > > Sent: Sunday, December 4, 2011 12:43 PM
> > > Subject: Re: FeedbackRe: Suspected memory leak
> > >
> > > I think Jinchao wasn't using HBASE-4508 in his 0.90 distribution
> > > because 0.90.5 hasn't been released.
> > > Assuming the NIO consumption is related to the number of connections
> > > from the client side, it would help to perform benchmarking on 0.90.5.
> > >
> > > Jinchao:
> > > Please attach the stack trace to HBASE-4633 so that we can verify our
> > > assumptions.
> > >
> > > Thanks
> > >
> > > On Sun, Dec 4, 2011 at 12:24 PM, lars hofhansl <lhofhansl@yahoo.com>
> > > wrote:
> > >
> > > > Thanks. Now the question is: How many connection threads do we have?
> > > >
> > > > I think there is one per regionserver, which would indeed be a
> > > > problem.
> > > > Need to look at the code again (I'm only partially familiar with the
> > > > client code).
> > > >
> > > > Either the client should chunk (like the server does), or there should
> > > > be a limited number of threads that perform IO on behalf of the client
> > > > (or both).
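
A rough sketch of the chunking idea, assuming a plain WritableByteChannel;
ChunkedWriter and the 64 KB chunk size are hypothetical and not taken from
the HBase code base:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.WritableByteChannel;

    public final class ChunkedWriter {
        private static final int CHUNK = 64 * 1024;  // cap on any single write

        // Writes a (possibly large) heap buffer in fixed-size slices, so the
        // temporary direct buffer the JDK caches for the calling thread never
        // grows beyond CHUNK instead of ballooning to the full request size.
        public static void write(WritableByteChannel ch, ByteBuffer src)
                throws IOException {
            while (src.hasRemaining()) {
                // Restrict this pass to at most CHUNK bytes of the source.
                int end = Math.min(src.position() + CHUNK, src.limit());
                ByteBuffer slice = src.duplicate();
                slice.limit(end);
                while (slice.hasRemaining()) {
                    ch.write(slice);
                }
                src.position(end);  // advance past the bytes just written
            }
        }
    }

Capping each write keeps the cached per-thread direct buffer small; a
bounded pool of sender threads would attack the same problem by limiting
how many such caches can exist at all.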
> > > >
> > > > -- Lars
> > > >
> > > >
> > > > ----- Original Message -----
> > > > From: Gaojinchao <gaojinchao@huawei.com>
> > > > To: "dev@hbase.apache.org" <dev@hbase.apache.org>; lars hofhansl
> > > > <lhofhansl@yahoo.com>
> > > > Cc: Chenjian <jean.chenjian@huawei.com>; wenzaohua <wenzaohua@huawei.com>
> > > > Sent: Saturday, December 3, 2011 11:22 PM
> > > > Subject: Re: FeedbackRe: Suspected memory leak
> > > >
> > > > This is the dumped stack.
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: lars hofhansl [mailto:lhofhansl@yahoo.com]
> > > > Sent: December 4, 2011 14:15
> > > > To: dev@hbase.apache.org
> > > > Cc: Chenjian; wenzaohua
> > > > Subject: Re: FeedbackRe: Suspected memory leak
> > > >
> > > > Dropping user list.
> > > >
> > > > Could you (or somebody) point me to where the client is using NIO?
> > > > I'm looking at HBaseClient and I do not see references to NIO; also,
> > > > it seems that all work is handed off to separate threads
> > > > (HBaseClient.Connection), and the JDK will not cache more than 3
> > > > direct buffers per thread.
> > > >
> > > > It's possible (likely?) that I missed something in the code.
> > > >
> > > > Thanks.
> > > >
> > > > -- Lars
> > > >
> > > > ________________________________
> > > > From: Gaojinchao <gaojinchao@huawei.com>
> > > > To: "user@hbase.apache.org" <user@hbase.apache.org>;
> > > > "dev@hbase.apache.org" <dev@hbase.apache.org>
> > > > Cc: Chenjian <jean.chenjian@huawei.com>; wenzaohua <wenzaohua@huawei.com>
> > > > Sent: Saturday, December 3, 2011 7:57 PM
> > > > Subject: FeedbackRe: Suspected memory leak
> > > >
> > > > Thank you for your help.
> > > >
> > > > This issue appears to be a configuration problem:
> > > > 1. The HBase client uses the NIO (socket) API, which uses direct memory.
> > > > 2. The default -XX:MaxDirectMemorySize value is equal to the -Xmx value,
> > > > so if there is no full GC, the direct memory can't be reclaimed.
> > > > Unfortunately, the GC configuration parameters of our client never
> > > > produce a full GC.
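
A small sketch of point 2, assuming Java 7+ for BufferPoolMXBean and enough
direct-memory headroom to allocate roughly 512 MB; the class name and buffer
sizes are made up for illustration. It shows that unreachable direct buffers
keep their native memory reserved until a (full) GC runs their cleaners:

    import java.lang.management.BufferPoolMXBean;
    import java.lang.management.ManagementFactory;
    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;

    public class DirectMemoryReclaimDemo {
        // Native memory currently reserved by direct ByteBuffers.
        private static long directUsed() {
            for (BufferPoolMXBean pool
                    : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
                if ("direct".equals(pool.getName())) {
                    return pool.getMemoryUsed();
                }
            }
            return -1;
        }

        public static void main(String[] args) throws Exception {
            List<ByteBuffer> buffers = new ArrayList<ByteBuffer>();
            for (int i = 0; i < 64; i++) {
                buffers.add(ByteBuffer.allocateDirect(8 * 1024 * 1024));
            }
            buffers.clear();  // now unreachable, but the native memory is not freed
            System.out.println("before GC: " + directUsed() + " bytes");
            System.gc();      // a full GC runs the cleaners that release the memory
            Thread.sleep(1000);
            System.out.println("after GC:  " + directUsed() + " bytes");
        }
    }

With a client GC configuration that never triggers a full collection, the
"before" number simply keeps climbing, which is what the growing RSS above
showed.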
> > > >
> > > > This is only a preliminary result; all tests are still running, and
> > > > if we have any further results, we will feed them back.
> > > > Finally, I will update our story on
> > > > https://issues.apache.org/jira/browse/HBASE-4633.
> > > >
> > > > If our digging is correct, should we set a default value for
> > > > "-XX:MaxDirectMemorySize" to prevent this situation?
> > > >
> > > >
> > > > Thanks
> > > >
> > > > -----Original Message-----
> > > > From: bijieshan [mailto:bijieshan@huawei.com]
> > > > Sent: December 2, 2011 15:37
> > > > To: dev@hbase.apache.org; user@hbase.apache.org
> > > > Cc: Chenjian; wenzaohua
> > > > Subject: Re: Suspected memory leak
> > > >
> > > > Thank you all.
> > > > I think it's the same problem as in the link provided by Stack,
> > > > because the heap size is stabilized but the non-heap size keeps
> > > > growing. So I think it is not the CMS GC bug.
> > > > And we have examined the content of the problem memory section; all
> > > > the records contain info like the following:
> > > >
> > > > "|www.hostname00000000000002087075.comlhggmdjapwpfvkqvxgnskzzydiywoacjnpljkarlehrnzzbpbxc||||||460|||||||||||Agent||||";
> > > > "BBZHtable_UFDR_058,048342220093168-02570"
> > > > ........
> > > >
> > > > Jieshan.
> > > >
> > > > -----Original Message-----
> > > > From: Kihwal Lee [mailto:kihwal@yahoo-inc.com]
> > > > Sent: December 2, 2011 4:20
> > > > To: dev@hbase.apache.org
> > > > Cc: Ramakrishna s vasudevan; user@hbase.apache.org
> > > > Subject: Re: Suspected memory leak
> > > >
> > > > Adding to the excellent write-up by Jonathan:
> > > > Since a finalizer is involved, it takes two GC cycles to collect
> > > > them. Due to a bug/bugs in the CMS GC, collection may not happen and
> > > > the heap can grow really big. See
> > > > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034 for details.
> > > >
> > > > Koji tried "-XX:-CMSConcurrentMTEnabled" and confirmed that all the
> > > > socket-related objects were being collected properly. This option
> > > > forces the concurrent marker to be one thread. This was for HDFS, but
> > > > I think the same applies here.
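
For anyone repeating Koji's experiment, the flags would look roughly like
this (illustrative; only -XX:-CMSConcurrentMTEnabled comes from this thread,
the rest is a common CMS baseline):

    -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled -XX:+PrintGCDetails

Disabling the multi-threaded concurrent marking phase sidesteps the bug
referenced above, at the cost of a longer concurrent mark.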
> > > >
> > > > Kihwal
> > > >
> > > > On 12/1/11 1:26 PM, "Stack" <stack@duboce.net> wrote:
> > > >
> > > > Make sure it's not the issue that Jonathan Payne identified a while
> > > > back:
> > > > https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357#
> > > > St.Ack
> > > >
> > > >
> > >
> > >
> >
>
