hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Suspected memory leak
Date Mon, 05 Dec 2011 18:49:24 GMT
Lars:
What you proposed below should be close to what netty does.
Instead of managing the complexity of NIO-related code ourselves, we can delegate to
netty, as asynchbase does.
This discussion should be under a different thread / JIRA.
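
For illustration only, a minimal sketch (neither HBase nor asynchbase code) of what delegating the socket handling to netty could look like with the Netty 3.x API that asynchbase uses: a client bootstrap whose small fixed pool of NIO worker threads owns all reads and writes, instead of each application thread driving its own socket. The host, port, and empty pipeline are placeholders.

import java.net.InetSocketAddress;
import java.util.concurrent.Executors;

import org.jboss.netty.bootstrap.ClientBootstrap;
import org.jboss.netty.channel.ChannelPipeline;
import org.jboss.netty.channel.ChannelPipelineFactory;
import org.jboss.netty.channel.Channels;
import org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory;

public class NettyClientSketch {
  public static void main(String[] args) {
    // One boss pool for connects; two worker threads perform all NIO.
    ClientBootstrap bootstrap = new ClientBootstrap(
        new NioClientSocketChannelFactory(
            Executors.newCachedThreadPool(), Executors.newCachedThreadPool(), 2));
    bootstrap.setPipelineFactory(new ChannelPipelineFactory() {
      public ChannelPipeline getPipeline() {
        // RPC encoder/decoder/handler would be added here.
        return Channels.pipeline();
      }
    });
    // Placeholder endpoint; 60020 is the default 0.90 regionserver port.
    bootstrap.connect(new InetSocketAddress("regionserver-host", 60020));
  }
}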

sendParam() is called by HBaseClient.call() which is called by
WritableRpcEngine and SecureRpcEngine.
Can you elaborate on what you think the call hierarchy should be?

Overall, I think we can resolve HBASE-4633 and put further discussion under
https://issues.apache.org/jira/browse/HBASE-4956

Cheers

On Sun, Dec 4, 2011 at 10:08 PM, Lars <lhofhansl@yahoo.com> wrote:

> To Ted... yes, sorry, sendParam.
>
> Any better solution involves changing the code.
>
> I could envision a form of active object where all NIO is handled by a
> small pool of threads, and/or chunking requests into (say) 8k chunks on the
> client. Or both.
>
> In both cases there would be less direct buffer garbage produced by the
> client.
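
For illustration, a minimal sketch of the chunking idea, assuming a plain WritableByteChannel rather than the real HBaseClient plumbing (the class name, method name, and 8 KB constant are all placeholders):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

public final class ChunkedWriter {
  private static final int CHUNK_SIZE = 8 * 1024;

  // Write 'data' in slices of at most CHUNK_SIZE bytes, so the temporary direct
  // buffer the JDK allocates (and caches per thread) for each write stays small
  // instead of being sized to the whole request.
  static void writeChunked(WritableByteChannel channel, ByteBuffer data) throws IOException {
    ByteBuffer whole = data.duplicate();
    while (whole.hasRemaining()) {
      int end = Math.min(whole.position() + CHUNK_SIZE, whole.limit());
      ByteBuffer slice = whole.duplicate();
      slice.limit(end);
      while (slice.hasRemaining()) {
        channel.write(slice);
      }
      whole.position(end);
    }
  }
}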
>
> Why is sendParam called directly by the client (app) threads? Is it to
> enforce ordering?
>
> Lastly, -XX:MaxDirectMemorySize should definitely be documented.
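
If it were documented, the entry might look roughly like the following (purely illustrative: the 256m value is arbitrary, and hbase-env.sh only affects JVMs started by the HBase scripts; a standalone client application would pass the same flag on its own java command line):

# Illustrative only: cap the direct memory used by NIO so temporary direct
# buffers stay bounded instead of defaulting to the -Xmx value.
export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize=256m"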
>
> -- Lars
>
> Gaojinchao <gaojinchao@huawei.com> wrote:
>
> >Ok. Does anyone have a better solution? Should we document this in the book?
> >
> >
> >-----Original Message-----
> >From: Ted Yu [mailto:yuzhihong@gmail.com]
> >Sent: December 5, 2011 11:39
> >To: dev@hbase.apache.org
> >Subject: Re: FeedbackRe: Suspected memory leak
> >
> >Jinchao:
> >Since we found the workaround, can you summarize the following statistics
> >on HBASE-4633?
> >
> >Thanks
> >
> >2011/12/4 Gaojinchao <gaojinchao@huawei.com>
> >
> >> Yes, I have tested; the system is fine.
> >> Roughly once an hour, a full GC is triggered.
> >> 10022.210: [Full GC (System) 10022.210: [Tenured: 577566K->257349K(1048576K), 1.7515610 secs] 9651924K->257349K(14260672K), [Perm : 19161K->19161K(65536K)], 1.7518350 secs] [Times: user=1.75 sys=0.00, real=1.75 secs]
> >> .........
> >>
> >> .........
> >> 13532.930: [GC 13532.931: [ParNew: 12801558K->981626K(13212096K), 0.1414370 secs] 13111752K->1291828K(14260672K), 0.1416880 secs] [Times: user=1.90 sys=0.01, real=0.14 secs]
> >> 13624.630: [Full GC (System) 13624.630: [Tenured: 310202K->175378K(1048576K), 1.9529280 secs] 11581276K->175378K(14260672K), [Perm : 19225K->19225K(65536K)], 1.9531660 secs] [Times: user=1.94 sys=0.00, real=1.96 secs]
> >>
> >> 7543 root      20   0 17.0g  15g 9892 S    0 32.9   1184:34 java
> >> 7543 root      20   0 17.0g  15g 9892 S    1 32.9   1184:34 java
> >>
> >> -----Original Message-----
> >> From: Ted Yu [mailto:yuzhihong@gmail.com]
> >> Sent: December 5, 2011 9:06
> >> To: dev@hbase.apache.org
> >> Subject: Re: FeedbackRe: Suspected memory leak
> >>
> >> Can you try specifying -XX:MaxDirectMemorySize with a moderate value and see
> >> if the leak gets under control?
> >>
> >> Thanks
> >>
> >> 2011/12/4 Gaojinchao <gaojinchao@huawei.com>
> >>
> >> > I have attached the stack in
> >> > https://issues.apache.org/jira/browse/HBASE-4633.
> >> > I will update our story.
> >> >
> >> >
> >> > -----Original Message-----
> >> > From: Ted Yu [mailto:yuzhihong@gmail.com]
> >> > Sent: December 5, 2011 7:37
> >> > To: dev@hbase.apache.org; lars hofhansl
> >> > Subject: Re: FeedbackRe: Suspected memory leak
> >> >
> >> > I looked through TRUNK and 0.90 code but didn't find
> >> > HBaseClient.Connection.setParam().
> >> > The method should be sendParam().
> >> >
> >> > When I was in China I tried to access Jonathan's post but wasn't able to.
> >> >
> >> > If Jinchao's stack trace resonates with the one Jonathan posted, we should
> >> > consider using netty for HBaseClient.
> >> >
> >> > Cheers
> >> >
> >> > On Sun, Dec 4, 2011 at 1:12 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:
> >> >
> >> > > I think HBASE-4508 is unrelated.
> >> > > The "connections" I referring to are HBaseClient.Connection objects
> >> (not
> >> > > HConnections).
> >> > > It turns out that HBaseClient.Connection.setParam is actually called
> >> > > directly by the client threads, which means we can get
> >> > > an unlimited amount of DirectByteBuffers (until we get a full GC).
> >> > >
> >> > > The JDK will cache 3 per thread, with a size necessary to serve the IO. So
> >> > > sending some large requests from many threads will lead to OOM.
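
To make that mechanism concrete, here is a standalone sketch (not HBase code) of the JDK behaviour being described: every thread that writes a large heap ByteBuffer through an NIO channel gets a temporary direct buffer sized to that write, cached per thread. A FileChannel is used purely because it exercises the same sun.nio.ch copy path as a socket; the 64 MB size and 16 threads are arbitrary.

import java.io.File;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class DirectBufferCacheDemo {
  public static void main(String[] args) {
    // A 64 MB heap buffer: each channel write of it is copied through a
    // temporary direct buffer of the same size, cached per writing thread.
    final ByteBuffer big = ByteBuffer.allocate(64 * 1024 * 1024);
    for (int i = 0; i < 16; i++) {
      new Thread(new Runnable() {
        public void run() {
          try {
            File f = File.createTempFile("nio-demo", ".bin");
            f.deleteOnExit();
            FileChannel ch = new RandomAccessFile(f, "rw").getChannel();
            ch.write(big.duplicate()); // reserves a 64 MB direct buffer for this thread
            ch.close();
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      }).start();
    }
    // Until a full GC runs, those cached direct buffers (up to 16 x 64 MB here)
    // remain reserved against the direct memory limit.
  }
}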
> >> > >
> >> > > I think that was a related thread that Stack forwarded a while back from the
> >> > > asynchbase mailing lists.
> >> > >
> >> > > Jinchao, could you add a text version (not a png image, please :-) ) of this
> >> > > to the jira?
> >> > >
> >> > >
> >> > > -- Lars
> >> > >
> >> > >
> >> > >
> >> > > ----- Original Message -----
> >> > > From: Ted Yu <yuzhihong@gmail.com>
> >> > > To: dev@hbase.apache.org; lars hofhansl <lhofhansl@yahoo.com>
> >> > > Cc: Gaojinchao <gaojinchao@huawei.com>; Chenjian <jean.chenjian@huawei.com>;
> >> > > wenzaohua <wenzaohua@huawei.com>
> >> > > Sent: Sunday, December 4, 2011 12:43 PM
> >> > > Subject: Re: FeedbackRe: Suspected memory leak
> >> > >
> >> > > I think Jinchao wasn't using HBASE-4508 in his 0.90 distribution because
> >> > > 0.90.5 hasn't been released.
> >> > > Assuming the NIO consumption is related to the number of connections from
> >> > > the client side, it would help to perform benchmarking on 0.90.5.
> >> > >
> >> > > Jinchao:
> >> > > Please attach stack trace to HBASE-4633 so that we can verify our
> >> > > assumptions.
> >> > >
> >> > > Thanks
> >> > >
> >> > > On Sun, Dec 4, 2011 at 12:24 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:
> >> > >
> >> > > > Thanks. Now the question is: How many connection threads do we have?
> >> > > >
> >> > > > I think there is one per regionserver, which would indeed be a problem.
> >> > > > Need to look at the code again (I'm only partially familiar with the client
> >> > > > code).
> >> > > >
> >> > > > Either the client should chunk (like the server does), or there should be a
> >> > > > limited number of threads that perform IO on behalf of the client (or both).
> >> > > >
> >> > > > -- Lars
> >> > > >
> >> > > >
> >> > > > ----- Original Message -----
> >> > > > From: Gaojinchao <gaojinchao@huawei.com>
> >> > > > To: "dev@hbase.apache.org" <dev@hbase.apache.org>; lars
hofhansl
> <
> >> > > > lhofhansl@yahoo.com>
> >> > > > Cc: Chenjian <jean.chenjian@huawei.com>; wenzaohua <
> >> > wenzaohua@huawei.com
> >> > > >
> >> > > > Sent: Saturday, December 3, 2011 11:22 PM
> >> > > > Subject: Re: FeedbackRe: Suspected memory leak
> >> > > >
> >> > > > This is the dumped stack.
> >> > > >
> >> > > >
> >> > > > -----Original Message-----
> >> > > > From: lars hofhansl [mailto:lhofhansl@yahoo.com]
> >> > > > Sent: December 4, 2011 14:15
> >> > > > To: dev@hbase.apache.org
> >> > > > Cc: Chenjian; wenzaohua
> >> > > > Subject: Re: FeedbackRe: Suspected memory leak
> >> > > >
> >> > > > Dropping user list.
> >> > > >
> >> > > > Could you (or somebody) point me to where the client is using NIO?
> >> > > > I'm looking at HBaseClient and I do not see references to NIO, also it seems
> >> > > > that all work is handed off to separate threads: HBaseClient.Connection, and
> >> > > > the JDK will not cache more than 3 direct buffers per thread.
> >> > > >
> >> > > > It's possible (likely?) that I missed something in the code.
> >> > > >
> >> > > > Thanks.
> >> > > >
> >> > > > -- Lars
> >> > > >
> >> > > > ________________________________
> >> > > > From: Gaojinchao <gaojinchao@huawei.com>
> >> > > > To: "user@hbase.apache.org" <user@hbase.apache.org>; "
> >> > > dev@hbase.apache.org"
> >> > > > <dev@hbase.apache.org>
> >> > > > Cc: Chenjian <jean.chenjian@huawei.com>; wenzaohua <
> >> > wenzaohua@huawei.com
> >> > > >
> >> > > > Sent: Saturday, December 3, 2011 7:57 PM
> >> > > > Subject: FeedbackRe: Suspected memory leak
> >> > > >
> >> > > > Thank you for your help.
> >> > > >
> >> > > > This issue appears to be a configuration problem:
> >> > > > 1. The HBase client uses the NIO (socket) API, which uses direct memory.
> >> > > > 2. The default -XX:MaxDirectMemorySize value is equal to the -Xmx value, so if
> >> > > > there is no "full gc", the direct memory can't be reclaimed. Unfortunately,
> >> > > > the GC configuration of our client doesn't produce any "full gc".
> >> > > >
> >> > > > This is only a preliminary result; all tests are still running. If we have
> >> > > > any further results, we will feed them back.
> >> > > > Finally, I will update our story in
> >> > > > https://issues.apache.org/jira/browse/HBASE-4633.
> >> > > >
> >> > > > If our digging is correct, should we set a default value for
> >> > > > "-XX:MaxDirectMemorySize" to prevent this situation?
> >> > > >
> >> > > >
> >> > > > Thanks
> >> > > >
> >> > > > -----Original Message-----
> >> > > > From: bijieshan [mailto:bijieshan@huawei.com]
> >> > > > Sent: December 2, 2011 15:37
> >> > > > To: dev@hbase.apache.org; user@hbase.apache.org
> >> > > > Cc: Chenjian; wenzaohua
> >> > > > Subject: Re: Suspected memory leak
> >> > > >
> >> > > > Thank you all.
> >> > > > I think it's the same problem as the one in the link provided by Stack,
> >> > > > because the heap size has stabilized but the non-heap size keeps growing. So
> >> > > > I don't think it is the CMS GC bug.
> >> > > > And we have examined the content of the problematic memory section; all the
> >> > > > records contain info like the following:
> >> > > >
> "|www.hostname00000000000002087075.comlhggmdjapwpfvkqvxgnskzzydiywoacjnpljkarlehrnzzbpbxc||||||460|||||||||||Agent||||";
> >> > > > "BBZHtable_UFDR_058,048342220093168-02570"
> >> > > > ........
> >> > > >
> >> > > > Jieshan.
> >> > > >
> >> > > > -----Original Message-----
> >> > > > From: Kihwal Lee [mailto:kihwal@yahoo-inc.com]
> >> > > > Sent: December 2, 2011 4:20
> >> > > > To: dev@hbase.apache.org
> >> > > > Cc: Ramakrishna s vasudevan; user@hbase.apache.org
> >> > > > Subject: Re: Suspected memory leak
> >> > > >
> >> > > > Adding to the excellent write-up by Jonathan:
> >> > > > Since a finalizer is involved, it takes two GC cycles to collect them. Due
> >> > > > to a bug/bugs in the CMS GC, collection may not happen and the heap can
> >> > > > grow really big. See
> >> > > > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034 for details.
> >> > > >
> >> > > > Koji tried "-XX:-CMSConcurrentMTEnabled" and confirmed that all the
> >> > > > socket-related objects were being collected properly. This option forces the
> >> > > > concurrent marker to be one thread. This was for HDFS, but I think the same
> >> > > > applies here.
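
For reference, a sketch of how that workaround could be applied to an HBase daemon through hbase-env.sh (the flag is the one named above; the use of HBASE_OPTS and the placement are just an example):

# Illustrative only: fall back to a single-threaded CMS concurrent marker to
# sidestep the multi-threaded marking bug referenced above.
export HBASE_OPTS="$HBASE_OPTS -XX:-CMSConcurrentMTEnabled"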
> >> > > >
> >> > > > Kihwal
> >> > > >
> >> > > > On 12/1/11 1:26 PM, "Stack" <stack@duboce.net> wrote:
> >> > > >
> >> > > > Make sure it's not the issue that Jonathan Payne identified a while back:
> >> > > > https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357#
> >> > > > St.Ack
> >> > > >
> >> > > >
> >> > >
> >> > >
> >> >
> >>
>
