hbase-issues mailing list archives

From "Guanghao Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16165) Decrease RpcServer.callQueueSize before writeResponse causes OOM
Date Wed, 14 Sep 2016 09:29:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15489936#comment-15489936 ]

Guanghao Zhang commented on HBASE-16165:
----------------------------------------

We observed an OOM in our production cluster. Table A has 500+ regions in the source cluster
but only 1 region in the slave cluster. An MR job then wrote a large amount of data to the
source cluster. The data was replicated to the slave cluster, where all of it was written to
a single regionserver, and that regionserver crashed with an OOM. One fix is to decrease
RpcServer.callQueueSize only when the responder has actually written out the response.
Another fix is to nullify the param early. I have uploaded a small fix for this that sets
the param to null when the response is sent.
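
The two fixes mentioned above can be sketched roughly as follows. This is only an illustration, not the attached patch: the class CallSketch and its methods setResponse and onResponseWritten are hypothetical names, and the real RpcServer code is structured differently.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a server-side call object; not the actual HBase class.
class CallSketch {
    private final AtomicLong callQueueSizeInBytes; // shared accounting counter
    private final long size;                       // bytes charged for this call
    byte[] param;                                  // request payload
    byte[] response;                               // encoded response, once ready

    CallSketch(byte[] param, AtomicLong callQueueSizeInBytes) {
        this.param = param;
        this.size = param.length;
        this.callQueueSizeInBytes = callQueueSizeInBytes;
        callQueueSizeInBytes.addAndGet(size); // charge the call when it is queued
    }

    // Fix 2: once the response exists, the request payload is no longer needed,
    // so drop the reference and let it be garbage collected.
    void setResponse(byte[] response) {
        this.response = response;
        this.param = null;
    }

    // Fix 1: release the charge only after the responder has written the bytes out.
    void onResponseWritten() {
        this.response = null;
        callQueueSizeInBytes.addAndGet(-size);
    }
}
```

With fix 1 the counter keeps covering a call until its response has left the server. With fix 2 the counter is still released early, but the large request payload no longer stays reachable while the response waits.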

> Decrease RpcServer.callQueueSize before writeResponse causes OOM
> ----------------------------------------------------------------
>
>                 Key: HBASE-16165
>                 URL: https://issues.apache.org/jira/browse/HBASE-16165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Duo Zhang
>            Priority: Minor
>         Attachments: HBASE-16165.patch
>
>
> In RpcServer, we use {{callQueueSizeInBytes}} to avoid queuing too many calls, which would
cause OOM. But in {{CallRunner.run}}, we decrease it before sending the response back. Even
after calling {{sendResponseIfReady}}, the call object can stay in the heap for a long time
if we cannot write out the response (that is why we need a Responder thread...). This makes
it possible for the total size of all call objects in the heap to exceed
{{maxQueueSizeInBytes}}, which causes OOM.
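
The accounting gap described in the issue can be modeled in a few lines. The sketch below is a simplified, single-threaded model under stated assumptions, not HBase code: the class CallQueueAccountingGap and its methods admit, runNext and writeNext are hypothetical, and only the names callQueueSizeInBytes and maxQueueSizeInBytes are taken from the description.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of the accounting gap; not HBase's RpcServer code.
class CallQueueAccountingGap {
    final long maxQueueSizeInBytes;
    long callQueueSizeInBytes = 0; // what the admission check sees
    long retainedBytes = 0;        // what call objects actually hold on the heap
    private final Deque<byte[]> callQueue = new ArrayDeque<>();
    private final Deque<byte[]> unwritten = new ArrayDeque<>();

    CallQueueAccountingGap(long maxQueueSizeInBytes) {
        this.maxQueueSizeInBytes = maxQueueSizeInBytes;
    }

    // Admission control: reject the call if the counter would exceed the limit.
    boolean admit(byte[] param) {
        if (callQueueSizeInBytes + param.length > maxQueueSizeInBytes) {
            return false;
        }
        callQueueSizeInBytes += param.length;
        retainedBytes += param.length;
        callQueue.add(param);
        return true;
    }

    // A handler runs the call and decreases the counter before the response is written.
    void runNext() {
        byte[] param = callQueue.poll();
        if (param == null) {
            return;
        }
        callQueueSizeInBytes -= param.length;
        unwritten.add(param); // the call object is still reachable
    }

    // The responder finally writes the response; only now can the call be collected.
    void writeNext() {
        byte[] param = unwritten.poll();
        if (param != null) {
            retainedBytes -= param.length;
        }
    }
}
```

In this model, with a limit of 1000 bytes and ten 400-byte calls that are each admitted and run while no response is written, callQueueSizeInBytes returns to 0 every time, so every call passes the admission check, while retainedBytes grows to 4000.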



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
