hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-3813) Change RPC callQueue size from "handlerCount * MAX_QUEUE_SIZE_PER_HANDLER;"
Date Thu, 05 May 2011 05:41:03 GMT

     [ https://issues.apache.org/jira/browse/HBASE-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3813:
-------------------------

    Attachment: 3813.txt

Patch for 0.90 branch.

My thinking is this needs a fix for 0.90.3.  100 times the handler count can turn ugly real
fast if cells are of any significant size and the RS stalls for a moment and queues back up.
 This patch makes it configurable at least, w/ the default tuned down from 100 to more like
10 or so.
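
A minimal sketch of what "configurable w/ a lower default" could look like, not the contents of 3813.txt. The property key, the class name, and the use of java.util.Properties in place of the Hadoop Configuration are all assumptions made to keep the example self-contained; only the default of 10 per handler and the "handlers * per-handler size" formula come from this issue.

```java
import java.util.Properties;

final class QueueSizeConfig {
    // Hypothetical key, invented for this sketch; the real patch may use a different name.
    static final String PER_HANDLER_KEY = "example.ipc.server.queue.size.per.handler";

    // Default proposed in this issue (tuned down from the hardcoded 100).
    static final int DEFAULT_PER_HANDLER = 10;

    /** Returns handlerCount * per-handler queue size, reading the override if one is set. */
    static int maxQueueSize(Properties conf, int handlerCount) {
        String raw = conf.getProperty(PER_HANDLER_KEY);
        int perHandler = (raw == null) ? DEFAULT_PER_HANDLER : Integer.parseInt(raw.trim());
        if (handlerCount < 1 || perHandler < 1) {
            throw new IllegalArgumentException("handlerCount and per-handler size must be >= 1");
        }
        // multiplyExact fails loudly instead of silently overflowing on absurd settings.
        return Math.multiplyExact(handlerCount, perHandler);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // 25 handlers is an assumed figure for illustration.
        System.out.println("default capacity:  " + maxQueueSize(conf, 25));
        conf.setProperty(PER_HANDLER_KEY, "100"); // restore the old behaviour
        System.out.println("override capacity: " + maxQueueSize(conf, 25));
    }
}
```

Keeping the per-handler multiplier (rather than a single absolute queue size) means the capacity still scales when the handler count is changed.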

Todd and Gary, you fellas are talking about a more correct fix. This unaccounted memory usage
is going to mess us up over and over again, so I think it is a critical issue in need of a proper
fix, but I'm thinking the proper fix is over in 0.92.0?

I'm fine w/ this workaround not going into 0.90.3.  Just putting it up here in case folks
are amenable.

> Change RPC callQueue size from "handlerCount * MAX_QUEUE_SIZE_PER_HANDLER;"
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-3813
>                 URL: https://issues.apache.org/jira/browse/HBASE-3813
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: stack
>            Priority: Critical
>         Attachments: 3813.txt
>
>
> Yesterday debugging w/ Jack we noticed that with few handlers on a big box, he was seeing
stats like this:
> {code}
> 2011-04-21 11:54:49,451 DEBUG org.apache.hadoop.ipc.HBaseServer: Server connection from
X.X.X.X:60931; # active connections: 11; # queued calls: 2500
> {code}
> We had 2500 items in the rpc queue waiting to be processed.
> Turns out he had too few handlers for the number of clients (but he also seems to have found
hw issues: his RAM bus was running at 1/4 the rate that it should have been running
at).
> Chatting w/ J-D this morning, he asked if the queues hold 'data'.  The queues hold 'Calls'.
 Calls are the client request.  They contain data.
> Jack had 2500 items queued.  If each item to insert was 1MB, that's 2500 * 1MB (roughly 2.5GB) of memory
that is outside of our general accounting.
> Currently the queue size is handlers * MAX_QUEUE_SIZE_PER_HANDLER where MAX_QUEUE_SIZE_PER_HANDLER
is hardcoded to be 100.
> If the queue is full we block (LinkedBlockingQueue).
> Going to change the queue size from 100 to 10 by default -- but also will make it configurable
and will doc. this as possible cause of OOME.  Will try it on production here before committing
patch.
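
The arithmetic and the blocking behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not HBaseServer code: the class and method names are made up, byte arrays stand in for 'Calls', and the figure of 25 handlers is assumed (it is simply 2500 / 100; the issue does not state the handler count).

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class CallQueueSketch {
    /** Worst-case bytes parked in a full queue: handlers * perHandler * bytes per call. */
    static long worstCaseQueuedBytes(int handlerCount, int perHandler, long bytesPerCall) {
        return Math.multiplyExact((long) handlerCount * perHandler, bytesPerCall);
    }

    /** Fills a bounded queue, then tries one more insert; returns whether it was accepted. */
    static boolean offerToFullQueue(int capacity) {
        LinkedBlockingQueue<byte[]> queue = new LinkedBlockingQueue<>(capacity);
        try {
            for (int i = 0; i < capacity; i++) {
                queue.put(new byte[1]); // stand-in for a queued Call
            }
            // put() would block indefinitely here; a timed offer() gives up and returns false.
            return queue.offer(new byte[1], 10, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        long oneMb = 1024L * 1024L;
        System.out.println("100/handler: " + worstCaseQueuedBytes(25, 100, oneMb) + " bytes");
        System.out.println(" 10/handler: " + worstCaseQueuedBytes(25, 10, oneMb) + " bytes");
        System.out.println("extra call accepted when full? " + offerToFullQueue(3));
    }
}
```

With these assumed numbers, dropping the per-handler size from 100 to 10 cuts the worst case from about 2.5GB to about 250MB of queued request data. It bounds the problem rather than accounting for it, which is why it is described as a workaround.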

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
