hbase-issues mailing list archives

From "Gary Helmling (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17114) Add an option to set special retry pause when encountering CallQueueTooBigException
Date Thu, 17 Nov 2016 01:02:36 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15672280#comment-15672280 ]

Gary Helmling commented on HBASE-17114:

The new CoDel may help in successfully processing more requests in these overloaded situations.

But, in general, I'm not sure we should handle CQTBE differently from any other retry-triggering
exception (other than RetryImmediatelyException), and giving another knob to configure seems
like it would just further complicate HBase tuning.

Another approach to this would be to allow the server to hint back to the client how long
it should back off.  In this case, the exception itself could carry a multiplier as part of
the payload.  As the server remains overloaded for a longer and longer period of time, it
could increase the multiplier returned in the exception, which would allow it to hint to clients
that they should back off for longer.  The heuristics for doing this correctly may be tricky
to get right, but I think this could be more generally applicable.  We could introduce a new
parent exception (RetryIOException) to contain the multiplier and apply this in all situations
that make sense.  However, this would also require a change to RPC to carry through the multiplier
value.  This isn't perfect either -- the multiplier received by the client represents the
server state at a previous point in time, which may already have changed.  But I think this
is better than just statically configuring different pauses for different exceptions.
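
To make the multiplier idea concrete, here is a minimal sketch, not a proposed patch. Everything in it is an assumption for illustration: RetryIOException does not exist in HBase today, OverloadTracker and ServerHintedBackoff are made-up names, the 10-second step and the cap of 16 are arbitrary, and the backoff table is only illustrative. The RPC change needed to carry the multiplier to the client is left out entirely.

```java
import java.io.IOException;

// Hypothetical parent exception sketched from the proposal; not an existing HBase class.
class RetryIOException extends IOException {
    private final int backoffMultiplier;

    RetryIOException(String message, int backoffMultiplier) {
        super(message);
        // Treat anything below 1 as "no hint" so a hint never shortens the pause.
        this.backoffMultiplier = Math.max(1, backoffMultiplier);
    }

    int getBackoffMultiplier() {
        return backoffMultiplier;
    }
}

// Hypothetical server-side heuristic: the hint grows the longer the overload lasts.
class OverloadTracker {
    private static final int MAX_MULTIPLIER = 16; // assumed cap
    private long overloadedSinceMillis = -1;      // -1 means "not overloaded"

    // Called whenever the call queue is found full; returns the multiplier to send back.
    int onQueueFull(long nowMillis) {
        if (overloadedSinceMillis < 0) {
            overloadedSinceMillis = nowMillis;
        }
        long overloadedSeconds = (nowMillis - overloadedSinceMillis) / 1000;
        // One extra step for every 10 seconds of sustained overload (assumed step size).
        return (int) Math.min(MAX_MULTIPLIER, 1 + overloadedSeconds / 10);
    }

    // Called once the queue has room again.
    void onRecovered() {
        overloadedSinceMillis = -1;
    }
}

// Client side: scale the normal retry pause by the server's hint, if there is one.
class ServerHintedBackoff {
    // Illustrative per-attempt backoff factors, not taken from any HBase release.
    private static final int[] RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100};

    static long pauseMillis(long basePauseMillis, int tries, IOException cause) {
        int index = Math.min(Math.max(tries, 0), RETRY_BACKOFF.length - 1);
        long pause = basePauseMillis * RETRY_BACKOFF[index];
        if (cause instanceof RetryIOException) {
            pause *= ((RetryIOException) cause).getBackoffMultiplier();
        }
        return pause;
    }
}
```

With a 100ms base pause, a second attempt that failed with a multiplier of 3 would wait 100 * 2 * 3 = 600ms, while any exception without a hint keeps the normal 200ms.  The staleness caveat still applies: the client acts on whatever multiplier the server computed when it rejected the call.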

> Add an option to set special retry pause when encountering CallQueueTooBigException
> -----------------------------------------------------------------------------------
>                 Key: HBASE-17114
>                 URL: https://issues.apache.org/jira/browse/HBASE-17114
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Yu Li
>            Assignee: Yu Li
> As titled, after HBASE-15146 we throw {{CallQueueTooBigException}} instead of dead-waiting.
> This is good for performance in most cases, but it has a side effect: if too many clients
> connect to a busy RS, their retries may arrive over and over again and the RS never gets
> a chance to recover. The issue becomes especially critical when the target region is META.
> So in this JIRA we propose a special retry pause for CQTBE, named
> {{hbase.client.pause.special}}, defaulting to 500ms (5 times the default value of
> {{hbase.client.pause}}).
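
For comparison, the statically configured pause described in the issue could look roughly like the sketch below. It is an illustration under assumptions, not the actual patch: SpecialPauseSelector is a made-up name, a plain Map stands in for the HBase Configuration object, and the local CallQueueTooBigException is a stand-in for the real HBase class so the example compiles on its own. The two property names and the 500ms default come from the description; the 100ms default for {{hbase.client.pause}} is implied by "5 times".

```java
import java.io.IOException;
import java.util.Map;

// Stand-in for the real HBase exception so this sketch is self-contained.
class CallQueueTooBigException extends IOException {
    CallQueueTooBigException(String message) {
        super(message);
    }
}

// Hypothetical helper: picks the base retry pause according to the failure type.
class SpecialPauseSelector {
    static final String PAUSE_KEY = "hbase.client.pause";
    static final String SPECIAL_PAUSE_KEY = "hbase.client.pause.special";
    static final long DEFAULT_PAUSE_MILLIS = 100;
    static final long DEFAULT_SPECIAL_PAUSE_MILLIS = 500;

    // conf is a plain map here; real client code would read an HBase Configuration.
    static long basePauseMillis(Map<String, String> conf, Throwable cause) {
        if (cause instanceof CallQueueTooBigException) {
            return readMillis(conf, SPECIAL_PAUSE_KEY, DEFAULT_SPECIAL_PAUSE_MILLIS);
        }
        return readMillis(conf, PAUSE_KEY, DEFAULT_PAUSE_MILLIS);
    }

    private static long readMillis(Map<String, String> conf, String key, long fallback) {
        String value = conf.get(key);
        return value == null ? fallback : Long.parseLong(value.trim());
    }
}
```

With an empty configuration this returns 500 for a CallQueueTooBigException and 100 for any other failure; the usual per-attempt backoff would then be applied on top of whichever base pause is chosen.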

