hbase-issues mailing list archives

From "Yu Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17114) Add an option to set special retry pause when encountering CallQueueTooBigException
Date Fri, 18 Nov 2016 03:13:59 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15675588#comment-15675588 ]

Yu Li commented on HBASE-17114:
-------------------------------

bq. Another approach to this would be to allow the server to hint back to the client how long
it should back off
I guess "back off" in the above statement refers to the back-off policy rather than the
exponential backoff array? If so, I checked the default value of {{ClientBackoffPolicy}}; or
could you please explain how to make the server hint back? [~ghelmling]

bq. If you want to make this overridable for some exception types, that seems ok, but in that
case the config property for overriding the value should be more closely tied to the exception.
Well, if you check the uploaded patch, it is indeed tied to CQTBE only. Introducing a new
property only makes things more flexible; of course we could instead hard-code a value for
CQTBE, such as 5 times the existing pause. But I'd say this is a trade-off: waiting longer on
CQTBE could prevent the vicious circle but also causes higher latency, and IMHO users should
be able to control that trade-off. If they don't want CQTBE to be special, they can set
{{hbase.client.pause.special}} to the same value as {{hbase.client.pause}}, so the property
gives them more options.
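
To illustrate the override, here is a minimal sketch, not the patch itself. It assumes a plain {{Map}} in place of the HBase configuration object, and the class and method names ({{PauseConfigSketch}}, {{getLong}}, {{cqtbeIsSpecial}}) are hypothetical. The 500ms default comes from the issue description; the 100ms base default is implied by it.

```java
import java.util.HashMap;
import java.util.Map;

public class PauseConfigSketch {
    static final String PAUSE_KEY = "hbase.client.pause";
    static final String SPECIAL_PAUSE_KEY = "hbase.client.pause.special";
    // Defaults taken from (or implied by) the issue description.
    static final long DEFAULT_PAUSE_MS = 100L;
    static final long DEFAULT_SPECIAL_PAUSE_MS = 500L;

    // Reads a millisecond value from the map, falling back to the default.
    static long getLong(Map<String, String> conf, String key, long dflt) {
        String v = conf.get(key);
        return v == null ? dflt : Long.parseLong(v);
    }

    // True when CQTBE would use a different pause than other retriable failures.
    static boolean cqtbeIsSpecial(Map<String, String> conf) {
        return getLong(conf, SPECIAL_PAUSE_KEY, DEFAULT_SPECIAL_PAUSE_MS)
            != getLong(conf, PAUSE_KEY, DEFAULT_PAUSE_MS);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // With the defaults, the two pauses differ, so CQTBE is special.
        System.out.println(cqtbeIsSpecial(conf));
        // Setting both properties to the same value opts out.
        conf.put(SPECIAL_PAUSE_KEY, "100");
        System.out.println(cqtbeIsSpecial(conf));
    }
}
```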

No offense, but I'm even thinking of making the throwing of CQTBE optional, because in some
cases users would prefer that the request dead-wait in RpcServer until it is executed or times
out, rather than receive an exception, retry, and fail again. But obviously this is another
topic (Smile).

bq. It's only special in the sense that it should not clear the client meta cache
Sorry, but I don't see any difference between "should not clear the client meta cache" and
"should not retry so frequently"; both try to resolve some problem and make things better.

OTOH, we already have {{RetryImmediatelyException}} precisely because in some cases retrying
w/o waiting is good, so why is retrying more slowly not acceptable? Now that the retry pause
is already split into immediate and wait, I think it's ok to further split the wait case into
quick and slow, wdyt?
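
The three-way split could look roughly like the sketch below. It is an illustration only, under these assumptions: the two nested exception classes are local stand-ins for {{RetryImmediatelyException}} and {{CallQueueTooBigException}}, {{pauseFor}} is a hypothetical helper, and the exponential backoff multiplier is left out for brevity.

```java
public class RetryTierSketch {
    // Local stand-ins, not the real HBase exception classes.
    static class RetryImmediatelyStandIn extends Exception {}
    static class CallQueueTooBigStandIn extends Exception {}

    // Picks the pause before the next retry based on the failure type.
    static long pauseFor(Throwable t, long pauseMs, long specialPauseMs) {
        if (t instanceof RetryImmediatelyStandIn) {
            return 0L;              // immediate: retry without waiting
        }
        if (t instanceof CallQueueTooBigStandIn) {
            return specialPauseMs;  // slow: give the busy server time to recover
        }
        return pauseMs;             // quick: the ordinary retry pause
    }

    public static void main(String[] args) {
        long pause = 100L;
        long special = 500L;
        System.out.println(pauseFor(new RetryImmediatelyStandIn(), pause, special));
        System.out.println(pauseFor(new CallQueueTooBigStandIn(), pause, special));
        System.out.println(pauseFor(new java.io.IOException("other"), pause, special));
    }
}
```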

Thanks.

> Add an option to set special retry pause when encountering CallQueueTooBigException
> -----------------------------------------------------------------------------------
>
>                 Key: HBASE-17114
>                 URL: https://issues.apache.org/jira/browse/HBASE-17114
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Yu Li
>            Assignee: Yu Li
>         Attachments: HBASE-17114.patch
>
>
> As titled, after HBASE-15146 we throw {{CallQueueTooBigException}} instead of dead-waiting.
This is good for performance in most cases, but it may have a side effect: if too many
clients connect to a busy RS, the retried requests may arrive over and over again, so the RS
never gets a chance to recover. The issue becomes especially critical when the target region
is META.
> So in this JIRA we propose a special retry pause for CQTBE, named
{{hbase.client.pause.special}}, which defaults to 500ms (5 times the default value of
{{hbase.client.pause}}).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
