hbase-issues mailing list archives

From "Shrijeet Paliwal (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4633) Potential memory leak in client RPC timeout mechanism
Date Sun, 04 Dec 2011 20:48:39 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162475#comment-13162475 ]

Shrijeet Paliwal commented on HBASE-4633:
-----------------------------------------

Recent updates: 
* In my case the leak (or memory hold) does not appear to be in the HBase client; I could not
find enough evidence to conclude that it is. What I did find is that our application holds one
heavy object in memory, and this object is shared between threads. Every N minutes the
application creates a new instance of this class. Unless a thread is still holding on to an old
instance, all old instances are GCed in time. Hence, in theory, there should be only one active
instance of the heavy object at any time.

* Under heavy load, with the client operation RPC timeout enabled, some threads get stuck. The
stuck threads keep old instances reachable, so multiple instances of the heavy object stay alive
and the heap grows.
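
To make the two points above concrete, here is a minimal Java sketch of the pattern, not our actual application code. The names HeavyObject, HeavyObjectDemo, CURRENT and refresh() are hypothetical, and the "stuck" thread is simulated with a latch rather than a real RPC. It only shows that a thread which grabbed an instance before a refresh keeps that old instance reachable for as long as it is blocked.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the application's heavy shared object.
final class HeavyObject {
    private static final AtomicInteger NEXT_ID = new AtomicInteger();
    final int id = NEXT_ID.incrementAndGet();
    final byte[] payload = new byte[1 << 20]; // 1 MiB placeholder for real state
}

final class HeavyObjectDemo {
    // The single "current" instance that all worker threads read.
    static final AtomicReference<HeavyObject> CURRENT =
            new AtomicReference<>(new HeavyObject());

    // Called every N minutes in the real application; here it is called by hand.
    static void refresh() {
        CURRENT.set(new HeavyObject());
    }

    // Returns {id held by the stuck worker, id of the current instance}.
    static int[] run() throws InterruptedException {
        CountDownLatch grabbed = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        AtomicReference<HeavyObject> heldByWorker = new AtomicReference<>();

        Thread stuck = new Thread(() -> {
            HeavyObject mine = CURRENT.get(); // the worker pins this instance
            heldByWorker.set(mine);
            grabbed.countDown();
            try {
                release.await(); // simulates a call that does not return in time
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        stuck.start();
        grabbed.await();

        // Two refresh cycles pass while the worker is still blocked.
        refresh();
        refresh();

        int held = heldByWorker.get().id;
        int current = CURRENT.get().id;
        release.countDown();
        stuck.join();
        return new int[] {held, current};
    }

    public static void main(String[] args) throws InterruptedException {
        int[] ids = run();
        System.out.println("stuck worker holds instance " + ids[0]
                + ", current instance is " + ids[1]);
    }
}
```

With many stuck threads spread over several refresh cycles, each distinct old instance they hold stays on the heap, which matches the growth described above.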

After reading the client code several times, I cannot see why an application thread would get
stuck for several minutes. We have safeguards that forcefully clean up calls that have been
alive for longer than the RPC timeout interval.
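
For readers unfamiliar with this kind of safeguard, the idea can be sketched as below. This is an illustration only and is not the HBase client implementation; CallSweeper and its methods are hypothetical names, and timestamps are passed in explicitly so the behavior is easy to follow.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical tracker for in-flight calls; not an HBase class.
final class CallSweeper {
    private final Map<Integer, Long> startMillisByCallId = new ConcurrentHashMap<>();
    private final long rpcTimeoutMillis;

    CallSweeper(long rpcTimeoutMillis) {
        this.rpcTimeoutMillis = rpcTimeoutMillis;
    }

    // Record the time at which a call was sent.
    void register(int callId, long nowMillis) {
        startMillisByCallId.put(callId, nowMillis);
    }

    // Normal path: the response arrived, so the call is no longer tracked.
    void complete(int callId) {
        startMillisByCallId.remove(callId);
    }

    // Forceful path: drop every call older than the timeout and report its id,
    // so the caller can fail the waiting thread instead of leaving it blocked.
    List<Integer> sweep(long nowMillis) {
        List<Integer> expired = new ArrayList<>();
        Iterator<Map.Entry<Integer, Long>> it = startMillisByCallId.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Integer, Long> entry = it.next();
            if (nowMillis - entry.getValue() > rpcTimeoutMillis) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }

    int pending() {
        return startMillisByCallId.size();
    }
}
```

If a sweep like this runs reliably, no call should outlive the timeout by much, which is why threads stuck for several minutes are surprising.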

I had planned to update the title of this Jira to reflect the above finding, but Gaojinchao
observed something interesting at his end, so I am keeping the title the same for now.
Gaojinchao's thread is here: http://search-hadoop.com/m/teczL8KvcH

                
> Potential memory leak in client RPC timeout mechanism
> -----------------------------------------------------
>
>                 Key: HBASE-4633
>                 URL: https://issues.apache.org/jira/browse/HBASE-4633
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.3
>         Environment: HBase version: 0.90.3 + Patches , Hadoop version: CDH3u0
>            Reporter: Shrijeet Paliwal
>
> Relevant Jiras: https://issues.apache.org/jira/browse/HBASE-2937,
> https://issues.apache.org/jira/browse/HBASE-4003
> We have been using the 'hbase.client.operation.timeout' knob
> introduced in HBASE-2937 for quite some time now. It helps us enforce our SLA.
> We have two HBase clusters and two HBase client clusters. One of them
> is much busier than the other.
> We have seen deterministic behavior in clients running in the busy
> cluster: their memory footprint increases consistently
> after they have been up for roughly 24 hours.
> This memory footprint almost doubles from its usual value (usual case
> == RPC timeout disabled). After much investigation nothing concrete
> came out, and we had to put in a hack
> that keeps heap size under control even when the RPC timeout is enabled. Also
> note that the same behavior is not observed in the 'not so busy'
> cluster.
> The patch is here: https://gist.github.com/1288023


        
